According to the documentation, it is possible to pass a second parameter to extract, containing flags that alter the behaviour of the regular expression.
In this case, just use the flag re.I, which makes the regex case-insensitive (it does not differentiate between upper and lower case):
import re
data['text'].str.extract('(#flamengo)', re.I)
It is also possible to use the inline modifier (?i) in the expression itself, which has the same effect as the flag:
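# a global inline flag must come at the start of the pattern
# (placing it anywhere else is an error since Python 3.11)
data['text'].str.extract('(?i)(#flamengo)')
# or, scoped to a single group (Python 3.6+)
data['text'].str.extract('((?i:#flamengo))')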
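To check that the flag and the inline modifiers behave the same way, here is a minimal sketch. The sample DataFrame is hypothetical (only the column name 'text' comes from the question), and the flag is passed by keyword as flags=re.I, which is equivalent to passing it positionally.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name 'text' comes from the question.
data = pd.DataFrame({'text': ['Vamos #Flamengo!', 'hoje tem #FLAMENGO', 'sem hashtag']})

# Flag passed as an argument to extract.
with_flag = data['text'].str.extract('(#flamengo)', flags=re.I)

# Global inline modifier, placed at the start of the pattern.
with_inline = data['text'].str.extract('(?i)(#flamengo)')

# Inline modifier scoped to a single group.
with_scoped = data['text'].str.extract('((?i:#flamengo))')

# All three return the matched text with its original casing,
# and NaN for the row without a match.
print(with_flag)
print(with_flag.equals(with_inline) and with_flag.equals(with_scoped))
```

In all three cases the captured value keeps the casing found in the text; only the matching is case-insensitive.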
Another answer suggested using [f|F] to match both a lowercase and an uppercase "f". However, this expression also matches the character |, because inside a character class the | is a literal character and not an alternation. If you're going to follow that idea, then the right one would be [fF][lL]... . But using the flags is simpler.
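The problem with [f|F] can be seen with the re module alone. This is just an illustrative sketch; the test strings are made up.

```python
import re

# Character class that accidentally includes the literal '|'.
wrong = re.compile(r'#[f|F]lamengo')
# Case-insensitive flag, as recommended above.
right = re.compile(r'#flamengo', re.I)

print(bool(wrong.fullmatch('#|lamengo')))   # True: '|' is accepted by the class
print(bool(wrong.fullmatch('#FLAMENGO')))   # False: only the first letter is covered
print(bool(right.fullmatch('#FLAMENGO')))   # True
print(bool(right.fullmatch('#|lamengo')))   # False
```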
The other answer is wrong. It is a typical answer from someone who knows the hammer and treats everything like a nail (and doesn't even know the hammer well, since you can set the case-insensitive flag, as you did).
– jsbueno
If it were not possible to solve the problem with regular expression parameters alone (let's suppose you also have to ignore accents, and find both "árvore" and "arvore"), the correct approach is to create another column in the dataframe, using the "apply" method, that holds the normalized text (that is: accents removed, converted to lowercase, spaces and uninteresting characters converted to "_"), and search in that other column.
– jsbueno
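The normalized-column idea from the comment above could be sketched roughly as follows. The normalize helper, the text_norm column name and the sample rows are all hypothetical, and which characters count as "not interesting" is an assumption made here for illustration.

```python
import re
import unicodedata
import pandas as pd

def normalize(text):
    # Hypothetical helper: decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then replace every run of characters other than
    # ASCII letters, digits and '#' with a single '_' (an assumed rule).
    return re.sub(r'[^a-z0-9#]+', '_', no_accents.lower())

# Hypothetical sample data.
data = pd.DataFrame({'text': ['Uma ÁRVORE grande', 'outra arvore', 'nada aqui']})

# Build the normalized column once, then search it with a plain pattern.
data['text_norm'] = data['text'].apply(normalize)
found = data['text_norm'].str.contains('arvore')
print(data[found])
```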