I know I can access the position of a regex match in a string using the methods start, end and span. Here is an example of a program that identifies repetitions in a text:
import re
from colorama import Fore,Style
text='''Dorsey was born and raised in St. Louis, Missouri,[8][9]
the son of Tim and Marcia (née Smith) Dorsey.[10][11][12] He is of English, Irish, and Italian descent.[13]
His father worked for a company that developed mass spectrometers and his mother was a homemaker.[14]
He was raised Catholic, and his uncle is a Catholic Catholic priest in Cincinnati.[15] He attended the Catholic
Bishop DuBourg High School. In his younger days, Dorsey worked occasionally as a fashion model.
[16][17][18][19][20] By age 14, Dorsey had become interested in dispatch dispatch routing. Some of the open-source software he created in the area of dispatch logistics is still used by taxicab companies.[10] Dorsey enrolled at the University of Missouri–Rolla in 1995 and attended for two-plus years[15] before transferring to New York University in 1997, but he dropped out two years later,[21] one semester short of graduating.[15]
He came up with the idea that he developed as Twitter while studying at NYU.[15][22]
'''
print("Searching for repeated words ...", "\n")
try:
    result = re.search(r'(\w{3,}\s)\1', text)
    start = result.start()
    end = result.end()
    value = result.group()
    print("The word \"{}\" is repeated at: ".format(value.split(' ')[0]), "\n\n")
    print(text[start-100:start] + Fore.RED + text[start:end] + Style.RESET_ALL + text[end:end+200])
except AttributeError:  # re.search returned None, so there was no match
    print("No repeated words found")
Returns:
The problem with this program is that it identifies only one occurrence. I expected the method start to return a list or tuple when there is more than one match, but that is not what happens.
How can I access the positions of all matches of a regular expression in a string? For example, the word dispatch is also repeated in the text, but I do not know how to get its position.
Not directly related, but when search finds nothing it returns None, so instead of try/except, just do if result: (found) else: (not found)
– hkotsubo
Do you want only consecutive repeated words, e.g. Catholic Catholic and dispatch dispatch, or all repeated words, e.g. Dorsey?
– Augusto Vasques
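One possible way to get every match position, sketched under a few assumptions: re.finditer yields one match object per non-overlapping match, and each of those objects has its own start, end and span. The helper name find_repeated_words and the shortened sample string are hypothetical, introduced only for illustration. The pattern is also slightly tightened with \b word boundaries, which is a choice of this sketch and not part of the original program. It only covers consecutive repetitions.

```python
import re

# Shortened stand-in for the question's `text` (an assumption, for brevity).
sample = (
    "his uncle is a Catholic Catholic priest in Cincinnati. "
    "Dorsey had become interested in dispatch dispatch routing."
)

def find_repeated_words(text):
    """Return (word, start, end) for every consecutively repeated word."""
    # \b keeps the back-reference from matching only part of a longer word.
    pattern = re.compile(r'\b(\w{3,})\s+\1\b')
    # finditer yields a match object for each non-overlapping match,
    # so start() and end() are available for all of them, not just the first.
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(text)]

for word, start, end in find_repeated_words(sample):
    print(f'"{word}" is repeated at {start}-{end}: {sample[start:end]!r}')
```

An empty list means no repetition was found, so no try/except is needed around the loop.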