Complementing @fernandosavio's answer (which already explains why using `zip` is wrong in this case), here is an explanation of why your regex failed.
You are using `'\\{}\\b'.format(x)` to create the regex. Inside a string literal, `\\` is interpreted as a single `\` character: since `\` is used for escape sequences, such as `\n` for a line break, the character `\` itself has to be written as `\\`.
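A quick check in the interpreter (assuming Python 3) shows that `'\\'` is a single character, and which pattern string the `format` call from the question actually builds:

```python
# In a regular (non-raw) string literal, '\\' is ONE backslash character
assert len('\\') == 1

# Pattern string built by the code in the question, for x = 'brasil'
pattern = '\\{}\\b'.format('brasil')
print(pattern)       # \brasil\b
print(len(pattern))  # 9
```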
So when `x` has the value `brasil`, the result is the string `\brasil\b`. The regex then interprets the `\b` at the start as the shortcut for a "word boundary" (a position that has a word character, i.e. a letter, digit or underscore, before it and a non-word character after it, or vice versa; the start and end of the string also count). That is, the regex only looks for the word "rasil".
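A minimal sketch of that behavior, reusing the exact pattern from the question: the leading `\b` is a boundary, so only the literal text "rasil" is searched for.

```python
import re

# Same construction as in the question; the regex sees \b + "rasil" + \b
r = re.compile('\\{}\\b'.format('brasil'))

m1 = r.search('brasil')  # the "b" before "rasil" prevents a word boundary there
m2 = r.search('rasil')
print(m1)          # None
print(m2.group())  # rasil
```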
When `x` has the value `argentina`, the result is the string `\argentina\b`. The escape sequence `\a` corresponds to the BELL character (a control character that was used to make the terminal beep, although nowadays not every terminal does this). Either way, this regex does not search for the word "argentina", but for a `\a` followed by "rgentina".
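To see it in practice, the pattern below is built the same way as in the question; it fails on the plain word and only matches when the text actually contains a BEL character (written `'\a'` in a normal Python string literal):

```python
import re

r = re.compile('\\{}\\b'.format('argentina'))  # pattern text: \argentina\b

print(r.search('argentina'))  # None: \a in the regex needs a BEL character
m = r.search('\argentina')    # in this literal, '\a' is BEL (0x07) + "rgentina"
print(repr(m.group()))        # '\x07rgentina'
```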
And finally, when `x` has the value `canada`, the result is the string `\canada\b`. But `\c` is an invalid escape sequence (the documentation of the `re` module lists the valid ones), and that documentation says unknown escapes of ASCII letters are treated as errors.
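A small check, assuming Python 3.6 or later (the version in which unknown escapes consisting of `\` and an ASCII letter became errors in `re`):

```python
import re

pattern = '\\{}\\b'.format('canada')  # pattern text: \canada\b
try:
    re.compile(pattern)
    compiled_ok = True
except re.error as err:
    compiled_ok = False
    print('re.error:', err)  # complains about the bad escape \c
```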
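You probably wanted to use `\b` at the beginning too, so just do:

```python
import re

# termos and palavras are the lists from the question
for termo in termos:
    # compile the whole-word pattern once per term
    r = re.compile(r'\b{}\b'.format(termo))
    for palavra in palavras:
        if r.search(palavra):
            print(f'Encontrado {termo!r} em {palavra!r}')
            break
    else:
        # runs only if the inner loop finished without a break
        print(f'{termo!r} não encontrado.')
```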
Note the `r` before the opening quote (`r'\b{}\b'`). It indicates a raw string literal, in which the character `\` does not need to be written as `\\`, which makes the regex a little more readable. Now the expressions will be `\bbrasil\b`, `\bargentina\b` and `\bcanada\b`, all of which are valid regexes.
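A quick way to confirm this: build the three patterns with the raw string and compile each one; none of them raises `re.error` anymore.

```python
import re

patterns = [r'\b{}\b'.format(termo) for termo in ['brasil', 'argentina', 'canada']]
for p in patterns:
    re.compile(p)  # would raise re.error if the pattern were invalid
    print(p)
# \bbrasil\b
# \bargentina\b
# \bcanada\b
```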
I also compile the expression once, before going through the list of words, so the same regex is reused in the inner `for` (it's true that the `re` module keeps a cache of compiled regexes, but I still think there is no need to rebuild the pattern several times inside the loop).
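The same compile-once idea can be wrapped in a function. This is only a sketch; `encontra` is a hypothetical helper name, not something from the question:

```python
import re

def encontra(termo, palavras):
    """Return the first item of palavras containing termo as a whole word, or None."""
    r = re.compile(r'\b{}\b'.format(termo))  # compiled once per term...
    for palavra in palavras:                 # ...and reused for every word
        if r.search(palavra):
            return palavra
    return None

print(encontra('chile', ['brasil.sao_paulo', 'chile', 'argentina']))  # chile
```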
This solution is almost the same as @fernandosavio's, with one small difference. For example, if we have something like:

```python
termos = ['brasil']
palavras = ['brasileiro', 'brasil']
```
Since his solution uses `termo.casefold() in palavra.casefold()`, it finds `brasileiro` first, because the term `brasil` is indeed contained in that string.
The regex `\bbrasil\b`, on the other hand, only finds the second word ("brasil"), because it looks for `brasil` only when there is a `\b` before and after it. Since `\b` is the "boundary between words", the regex only matches when "brasil" appears as a whole word, not as part of a longer word such as "brasileiro".
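Side by side, with the two lists above (a sketch using `next()` to take the first hit of each approach):

```python
import re

termos = ['brasil']
palavras = ['brasileiro', 'brasil']
termo = termos[0]

# Substring test (the `in` approach): "brasileiro" already contains "brasil"
por_substring = next(p for p in palavras if termo.casefold() in p.casefold())

# Word-boundary regex: "brasileiro" is skipped because "e" follows "brasil"
r = re.compile(r'\b{}\b'.format(termo))
por_regex = next(p for p in palavras if r.search(p))

print(por_substring)  # brasileiro
print(por_regex)      # brasil
```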
Note also that his solution uses `casefold()` to make the search case-insensitive. For the regex to behave the same way, you can use the flag `re.I`:

```python
r = re.compile(r'\b{}\b'.format(termo), re.I)
```
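With the flag, case no longer matters, while the whole-word requirement stays (`re.I` is the short alias of `re.IGNORECASE`):

```python
import re

termo = 'brasil'
r = re.compile(r'\b{}\b'.format(termo), re.I)

print(bool(r.search('BRASIL')))      # True
print(bool(r.search('Brasil.SP')))   # True
print(bool(r.search('Brasileiro')))  # False: still requires a whole word
```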
Or use `casefold()` itself on the strings (both on the search terms and on the words being searched):

```python
import re

for termo in termos:
    # casefold both sides so the comparison ignores case
    r = re.compile(r'\b{}\b'.format(termo.casefold()))
    for palavra in palavras:
        if r.search(palavra.casefold()):
            print(f'Encontrado {termo!r} em {palavra!r}')
            break
    else:
        print(f'{termo!r} não encontrado.')
```
The difference shows up in some cases, such as the character `ß`, whose uppercase version is `SS`. A case-insensitive search should therefore find `ß` as well as `SS` or `ss`. But the flag `re.I` is not enough for this case; only `casefold` works. It's up to you, because depending on the strings you search, it might not make a difference (in any case, it's good to know that these options exist).
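A sketch of that case, using the German word "straße" as an assumed example (it is not from the question). `re.I` only relies on one-to-one character case mappings, so the single `ß` cannot match the two letters `SS`, whereas `casefold()` turns both into `ss` before the regex runs:

```python
import re

termo = 'straße'
palavra = 'STRASSE'

# Flag only: 'ß' is compared against a single character, so no match
com_flag = re.search(r'\b{}\b'.format(termo), palavra, re.I)

# casefold(): 'straße' -> 'strasse' and 'STRASSE' -> 'strasse'
com_casefold = re.search(r'\b{}\b'.format(termo.casefold()), palavra.casefold())

print(com_flag)              # None
print(com_casefold.group())  # strasse
```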
Finally, adjusting the messages to what you wanted:

```python
import re

termos = ['brasil', 'argentina', 'chile', 'canada']
palavras = ['brasil.sao_paulo', 'chile', 'argentina']
for termo in termos:
    r = re.compile(r'\b{}\b'.format(termo), re.I)
    for palavra in palavras:
        if r.search(palavra):
            print(f'MATCH: {termo}')
            break
    else:
        print(f'NOT MATCH: {termo}')
```
Output:

```
MATCH: brasil
MATCH: argentina
MATCH: chile
NOT MATCH: canada
```
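If you prefer to get the matching terms back instead of printing them, the same loop can collect them in a list. This is only a sketch; `termos_em_comum` is a hypothetical name. Since every term is checked against every item, the result does not depend on the order of `palavras`:

```python
import re

def termos_em_comum(termos, palavras):
    """Return the terms that appear as a whole word in at least one item of palavras."""
    encontrados = []
    for termo in termos:
        r = re.compile(r'\b{}\b'.format(termo), re.I)
        if any(r.search(palavra) for palavra in palavras):
            encontrados.append(termo)
    return encontrados

termos = ['brasil', 'argentina', 'chile', 'canada']
palavras = ['brasil.sao_paulo', 'chile', 'argentina']
print(termos_em_comum(termos, palavras))  # ['brasil', 'argentina', 'chile']
```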
Luis, apparently you're trying to search `array2` using the items of `array` as search terms. But if that's really what you want to do, why are you using `zip`? It's hard to understand exactly what you want or what your code is trying to do. I strongly advise you to [edit] your question and explain your intentions and doubts better. – fernandosavio
I would like it to search regardless of the array order. – Luis Henrique
And return to me what they have in common. – Luis Henrique
For example: Array 1 -> Line1 == Array2 -> All Lines – Luis Henrique