Because the question is unclear, it can be interpreted in the following ways:
- How to know if a word is present in a text?
- How to get the index of the first occurrence of a word in a text?
- How to get the slice of a string where the first occurrence of a word is found?
Whichever question was actually meant, none of the three cases calls for a regex. Regular expressions are an expensive and cumbersome tool; they should be used to find patterns of characters or bytes, never to find a fixed word, because cheaper tools exist for that.
Don't get me wrong, I love working with regex. But despite the nice name, regular expressions are a case of (right- or left-) linear grammar in the Chomsky hierarchy, i.e. by using regex you are loading a parser (conceptually a finite automaton) into your program, which means giving up memory and processing time. So yes, use regex, but only when necessary, for example to:
- find words in a text that contain special or unusual spellings.
- find words in a text that contain certain vowel clusters.
- find special alphanumeric sequences.
- break text using variable separators.
- separate text into lexical symbols.
- do anything that involves searching based on patterns of character repetition.
With the introduction out of the way, on to the question(s).
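To make a few of the cases above concrete, here is a minimal sketch of jobs where a regex does earn its cost. The sample strings and the patterns are my own illustrations, not something taken from the question, and the vowel pattern only covers unaccented lowercase vowels.

```python
import re

# Break text using variable separators (; , | or runs of whitespace).
text = "maçã;banana, laranja  uva|kiwi"
parts = re.split(r"[;,|\s]+", text)
print(parts)

# Separate text into lexical symbols: integers, identifiers, operators.
expression = "total = (a1 + 22) * b"
tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/=()]", expression)
print(tokens)

# Find words containing a vowel cluster (two adjacent vowels).
s = ' o primeiro o segundo primeiro novamente'
clusters = re.findall(r"\b\w*[aeiou]{2}\w*\b", s)
print(clusters)
```

In all three cases the thing being searched for is a pattern rather than a fixed word, which is exactly where plain string methods stop being enough.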
How to know if a word is present in a text?
To know whether a word is present in a text, use the Python operator in. The operators in and not in test for membership: x in s returns True if x is contained in s and False otherwise; x not in s returns the negation of x in s.
s = ' o primeiro o segundo primeiro novamente'
print('primeiro' in s) #True
print('segundo' in s) #True
print('terceiro' in s) #False
How to get the index of the first occurrence of a word in a text?
To get the index of the first occurrence of a word in a text, use the builtin method str.find().
str.find(sub[, start[, end]])
Returns the lowest index in the string where the substring sub is found within the slice s[start:end]. The optional arguments start and end are interpreted as in slice notation. Returns -1 if sub is not found.
s = ' o primeiro o segundo primeiro novamente'
print(s.find('primeiro')) #3
print(s.find('segundo')) #14
print(s.find('terceiro')) #-1
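The optional start and end arguments are not used in the example above, so here is a small sketch of what they do. The variable names first, second and third are only illustrative.

```python
s = ' o primeiro o segundo primeiro novamente'

# First occurrence, searching the whole string.
first = s.find('primeiro')

# Passing start resumes the search after the previous hit,
# which gives the second occurrence.
second = s.find('primeiro', first + 1)

# No third occurrence exists, so find() returns -1.
third = s.find('primeiro', second + 1)

# With end, the search is limited to the slice s[0:10];
# 'segundo' starts later, so it is not found there.
limited = s.find('segundo', 0, 10)

print(first, second, third, limited)  # 3 22 -1 -1
```

Note that the returned index is always relative to the whole string, not to the slice given by start and end.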
How to get the slice inside a string where the first occurrence of a word in a text is found?
To get the slice of the string where the first occurrence of a word is found, take the index of the first occurrence as the start and add the word's length, obtained with the builtin function len(), to get the end. Keep in mind that the length of a string is not always the same as its visual length.
s = ' o primeiro o segundo primeiro novamente'
for p in ("primeiro", "segundo", "terceiro"):
    if p not in s:
        print(f"Word \"{p}\" not found.")
    else:
        # (word, start index, end index of the slice); := needs Python 3.8+
        print((p, i := s.find(p), len(p) + i))
#('primeiro', 3, 11)
#('segundo', 14, 21)
#Word "terceiro" not found.
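As a quick check of the two points above, the sketch below uses the computed bounds to slice the string, and then shows one way the length of a string can differ from its visual length. The accented example word and the use of unicodedata are my own illustration, not part of the question.

```python
import unicodedata

s = ' o primeiro o segundo primeiro novamente'
word = 'primeiro'

start = s.find(word)       # index of the first occurrence
end = start + len(word)    # one past the last character of the word
found = s[start:end]       # the slice is the word itself
print(start, end, repr(found))

# len() counts code points, not what the eye sees: the same visible
# text can be stored composed (one code point for the accented letter)
# or decomposed (base letter plus a combining mark).
composed = 'n\u00e3o'                                # 'não', precomposed
decomposed = unicodedata.normalize('NFD', composed)  # 'a' + combining tilde
print(len(composed), len(decomposed))                # 3 4
```

If the text may come in decomposed form, normalizing both the text and the word to the same form (for example with unicodedata.normalize('NFC', ...)) before calling find() keeps the indices meaningful.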
Use re.search() instead of findall. See the documentation here. – Paulo Marques
It doesn't make much sense, because if you search for "primeiro", the result will be the word "primeiro" itself. A regex search would only make sense if the pattern were not a fixed word. What exactly do you want to do?
– hkotsubo
The question makes no sense: if you want the index of the first occurrence of a given word, you do not need regex, use the method str.find(). Example: print(' o primeiro o segundo primeiro novamente'.find('primeiro'))
– Augusto Vasques
Thanks @Paulomarques, Ruan's answer below covered this well.
– Perciliano
@hkotsubo the word "primeiro" has to be part of the search; it appears twice in the string, but I only want its first occurrence. Ruan's answer below covered this well, thank you.
– Perciliano
I beg your pardon for my ignorance on the subject and for not being able to state the question more clearly. Ruan's answer below covered it well, thank you.
– Perciliano
But if you search for the regex "primeiro", the result is the word "primeiro" itself, so in practice it would be enough to know whether the word "primeiro" is in the string: if 'primeiro' in texto, or something like that. Do you see that using regex is rather pointless in this case? It's as if I wanted to find the letter "a" in the word "banana" and get the letter "a" itself as the result. I don't need regex for that, I just need to know whether the word contains the letter "a"... The answers below are overkill, a cannon to kill a fly, and it is a pity that no one has even mentioned it... – hkotsubo
Got it, @hkotsubo, but as this work continues I will need more than the first occurrence itself: for example, the 30 characters that follow the first occurrence only, and then identifying a pattern within that selected range. I didn't put all these details in the original question because it would have become very long.
– Perciliano
Well, did you see what I mentioned in your other question? Here. Maybe it'll help...
– hkotsubo
Yes, I was already using something similar, thank you.
– Perciliano