Match between dictionary and array

GOAL:

I’m trying to match a dictionary against an array. I want to check whether the contents of the array appear under the NAME key of the dictionary; if they do, it should return the whole row of the dictionary.

import re

dict = {
    "NAME": "LUIS", "AGE": "25", "CITY": "SAO PAULO", 
    "NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA", 
    "NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA"
}

array = ["LUCAS","LUIS"]

matches = []
for row in range(len(dict)):
    pattern = re.compile(r'^'+dict["NAME"][row]+'$')
    matches.append([x for x in array if pattern.match(x)])

for linha in matches:
    print(linha)

OUTPUT

C:\Users\Luis\Desktop>python regex.py

[]

  • The way you are using the dictionary is wrong. First, do not name the variable dict: that is the name of a built-in Python type, and reusing it shadows the built-in. Second, the way your dictionary is structured, it contains only one row. If you print it, the output will be: {'CITY': 'BAHIA', 'NAME': 'CARLOS', 'AGE': '35'}

  • Okay, once the dictionary is correct, how should the search be done?

  • The search is right. The only things I would do are convert everything to lower case to avoid problems (unless that is not possible), and turn the data into an array (a list in Python) in which each item is a dictionary.

  • This sounds more like an XY problem. It was mentioned that the dictionary data comes from a database, so this filter should be done in the query with a WHERE clause. Example: WHERE NAME LIKE '%LUCAS%' OR NAME LIKE '%LUIS%'

  • It needs to be processed in the back end, and there are many elements in the array. @hkotsubo's answer solved the problem, thank you!

1 answer

First of all, your dictionary is wrong: it repeats the same keys, and each repetition overwrites the previous value, so in fact it only contains this:

{'NAME': 'CARLOS', 'AGE': '35', 'CITY': 'BAHIA'}
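
This is easy to confirm with a minimal sketch. The variable name `registro` is only illustrative (it also avoids reusing the built-in name `dict`):

```python
# A dict literal with repeated keys: each repeated key silently
# replaces the value stored by its previous occurrence.
registro = {
    "NAME": "LUIS", "AGE": "25", "CITY": "SAO PAULO",
    "NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA",
    "NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA",
}

print(len(registro))  # 3 keys, not 9 entries
print(registro)       # only the last value of each key survives
```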

Maybe what you want is a list containing 3 dictionaries:

dados = [
    { "NAME": "LUIS", "AGE": "25", "CITY": "SAO PAULO" }, 
    { "NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA" }, 
    { "NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA" }
]

Now we can do the following:

import re

dados = [
    { "NAME": "LUIS", "AGE": "25", "CITY": "SAO PAULO" }, 
    { "NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA" }, 
    { "NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA" }
]

array = ["LUCAS", "LUIS"]

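matches = []
for linha in dados:
    # regex that matches exactly the NAME of this dictionary
    pattern = re.compile('^{}$'.format(linha["NAME"]))
    # keep the dictionary if at least one element of array matches
    if any(pattern.match(x) for x in array):
        matches.append(linha)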

for linha in matches:
    print(linha)

For each dictionary, I check whether any element of array corresponds to the NAME of that dictionary, using the function any, which returns True if at least one element satisfies the criterion.

The output is:

{'NAME': 'LUIS', 'AGE': '25', 'CITY': 'SAO PAULO'}
{'NAME': 'LUCAS', 'AGE': '30', 'CITY': 'SANTA CATARINA'}
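
As a sketch, the same filter can be condensed into a list comprehension. Here `any` receives a generator of match results and stops at the first element of `array` that matches. The call to `re.escape` is an addition that is not in the code above; it assumes a name might contain regex metacharacters such as `.` or `(`, which would otherwise be interpreted as part of the pattern.

```python
import re

dados = [
    {"NAME": "LUIS", "AGE": "25", "CITY": "SAO PAULO"},
    {"NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA"},
    {"NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA"},
]
array = ["LUCAS", "LUIS"]

# Keep each dictionary whose NAME is matched exactly by some element of array.
# re.escape (an assumption, see above) makes the name match literally.
matches = [
    linha
    for linha in dados
    if any(re.match('^{}$'.format(re.escape(linha["NAME"])), x) for x in array)
]

for linha in matches:
    print(linha)
```

The result follows the order of `dados`, not the order of `array`.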

If you want the search to be case insensitive (that is, to make no distinction between upper and lower case, so that an array containing 'luis' would also be found), just use the flag IGNORECASE when creating the regex:

pattern = re.compile('^{}$'.format(linha["NAME"]), re.IGNORECASE)
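
A small sketch of the difference the flag makes. The value of `name` stands in for `linha["NAME"]`, and the candidate strings are made up for illustration:

```python
import re

name = "LUIS"  # hypothetical value taken from linha["NAME"]
case_sensitive = re.compile('^{}$'.format(name))
case_insensitive = re.compile('^{}$'.format(name), re.IGNORECASE)

# For each candidate, record (match without the flag, match with the flag).
results = {}
for candidate in ["LUIS", "luis", "Luis"]:
    results[candidate] = (
        bool(case_sensitive.match(candidate)),
        bool(case_insensitive.match(candidate)),
    )
    print(candidate, results[candidate])
```

Only the exact-case "LUIS" matches without the flag; all three match with it.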

Keep in mind that your regex uses the markers ^ and $, which are respectively the beginning and end of the string. That is, if the name contained in the dictionary is "LUIS", the regex will be ^LUIS$, which means it looks for exactly the string "LUIS". If you want to search for something that contains the string "LUIS" but may have other characters before or after it, remove the ^ and the $ and use pattern.search instead of pattern.match, since match only looks at the beginning of the string.
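
One detail worth checking when dropping the anchors: `match` still only matches at the start of the string, so finding the name anywhere in the text requires `search`. A minimal sketch, with made-up test strings:

```python
import re

name = "LUIS"  # hypothetical value taken from linha["NAME"]
anchored = re.compile('^{}$'.format(name))
unanchored = re.compile(name)

print(bool(anchored.match("LUIS")))       # True: exactly the same string
print(bool(anchored.match("LUIS01")))     # False: extra characters after the name
print(bool(unanchored.match("LUIS01")))   # True: the name is at the start
print(bool(unanchored.match("01LUIS")))   # False: match only tries position 0
print(bool(unanchored.search("01LUIS")))  # True: search scans the whole string
```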


For the case you described in the comments (the dictionary has "LUIS01" and the array has "LUIS"), it is the other way around: the array element is what should be used as the regex, and you check whether it finds a match in the contents of the dictionary. Because re.match only looks at the beginning of the string, "LUIS" matches "LUIS01":

import re

dados = [
    { "NAME": "LUIS01", "AGE": "25", "CITY": "SAO PAULO" }, 
    { "NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA" }, 
    { "NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA" }
]
array = ["LUCAS", "LUIS"]
matches = []
for linha in dados:
    # each element of array is the regex, tested against the start of NAME
    if any(re.match(x, linha['NAME'], re.IGNORECASE) for x in array):
        matches.append(linha)

for linha in matches:
    print(linha)

Output:

{'NAME': 'LUIS01', 'AGE': '25', 'CITY': 'SAO PAULO'}
{'NAME': 'LUCAS', 'AGE': '30', 'CITY': 'SANTA CATARINA'}
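
Since `re.match` only looks at the start of the string, the check above amounts to a case-insensitive prefix test. Assuming the array holds plain names rather than regex patterns, the same result can be sketched without regex using `str.startswith`. The helper name `filter_by_prefix` is hypothetical, and `lower()` is assumed to be an adequate case fold for this data:

```python
dados = [
    {"NAME": "LUIS01", "AGE": "25", "CITY": "SAO PAULO"},
    {"NAME": "LUCAS", "AGE": "30", "CITY": "SANTA CATARINA"},
    {"NAME": "CARLOS", "AGE": "35", "CITY": "BAHIA"},
]
array = ["LUCAS", "LUIS"]


def filter_by_prefix(rows, names):
    """Return the rows whose NAME starts with any of the names, ignoring case."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    prefixes = tuple(n.lower() for n in names)
    return [row for row in rows if row["NAME"].lower().startswith(prefixes)]


matches = filter_by_prefix(dados, array)
for linha in matches:
    print(linha)
```

This avoids any surprise from regex metacharacters in the array, at the cost of not supporting real patterns.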

  • That was exactly it, my friend. The dictionary is built from a SELECT on a DB2 database. This helped me a lot, thank you!

  • Just one question: if the names under the NAME key were different, for example LUIS01 in the dictionary and Luis in the array, what would this part look like: pattern = re.compile('^{}$'.format(linha["NAME"])) ?

  • @Luisv. I updated the answer (based on what I understood; see if this is it)

  • I tested it this way: pattern = re.compile('{}'.format(linha["NAME"]), re.IGNORE ?

  • @Luisv. Now I think I understand; I updated the answer again

  • Perfect, that was my only question, thank you very much!

  • If the dictionary comes from the database, why are you filtering in Python instead of adding a WHERE clause to your query?

