The function open returns a file object, while the function findall expects a string. That is what the error message is saying:
TypeError: expected string or bytes-like object
You passed the return value of open (namely, a file object) instead of a string.
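To see the problem in isolation, here is a minimal sketch that reproduces the error. It assumes a throwaway temporary file in place of your real file, and the names path, is_string and error_message are only illustrative:

```python
import os
import re
import tempfile

# Throwaway file so the sketch does not depend on a real 'infos' file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one SSBR two\n')
    path = tmp.name

try:
    with open(path, 'r') as f:
        is_string = isinstance(f, str)  # False: f is a file object
        try:
            re.findall(r'\sSSBR\s', f)  # passing the file object itself
        except TypeError as e:
            error_message = str(e)
finally:
    os.remove(path)

print(is_string)
print(error_message)
```

The exact wording of the message can vary a little between Python versions, but it should always start with "expected string or bytes-like object".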
To check the contents of the file, you must first use the file object to read those contents as a string, and then pass that string to findall. I also recommend using a with statement, because it closes the file automatically:
import re

with open('infos', 'r') as f:
    for line in f:  # for each line of the file
        print(re.findall(r'\sSSBR\s', line))
Remember that findall returns a list of the regex matches found in the string, so you can just print it to see the results (if there are no matches, it returns an empty list).
The code above loops over all the lines of the file and, for each one, applies the regex. If you prefer, you can also read the entire file contents into a single string and then apply the regex:
with open('infos', 'r') as f:
    tudo = f.read()  # the whole file as a single string
    print(re.findall(r'\sSSBR\s', tudo))
But for very large files, loading everything at once can consume a lot of memory, so it is best to use the first approach and read one line at a time.
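One thing to keep in mind when choosing between the two approaches: with this particular regex they are not guaranteed to return exactly the same matches, because \s also matches the line break at the end of the previous line. A small sketch, assuming made-up contents and using io.StringIO as a stand-in for the real file:

```python
import io
import re

# Hypothetical contents standing in for the 'infos' file.
sample = "first SSBR line\nSSBR starts this line\nno match here\n"
pattern = r'\sSSBR\s'

# Approach 1: one line at a time.
per_line = []
for line in io.StringIO(sample):
    per_line.extend(re.findall(pattern, line))

# Approach 2: the whole contents at once.
all_at_once = re.findall(pattern, io.StringIO(sample).read())

print(per_line)     # [' SSBR ']
print(all_at_once)  # [' SSBR ', '\nSSBR ']
```

In the second line "SSBR" is the very first thing on the line, so reading line by line finds no whitespace before it, while reading everything at once sees the preceding line break.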
Remember that findall returns a list of the snippets found in the string. Since your regex contains "fixed" text (the letters "SSBR", exactly in this order, with a whitespace character before and after), the return value of findall will be a list with one or more " SSBR " strings (or an empty list if nothing is found).
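A quick illustration with literal strings instead of a file (the sample strings are made up):

```python
import re

found = re.findall(r'\sSSBR\s', 'a SSBR b SSBR c')
not_found = re.findall(r'\sSSBR\s', 'nothing here')

print(found)      # [' SSBR ', ' SSBR ']
print(not_found)  # []
```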
If you just want to know whether the line contains "SSBR" or not, you can use search:
with open('infos', 'r') as f:
    for line in f:
        if re.search(r'\sSSBR\s', line):
            print('line contains SSBR')
        else:
            print('line does not contain SSBR')
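The if works because search returns a match object when the regex is found and None otherwise. A minimal sketch with made-up strings:

```python
import re

hit = re.search(r'\sSSBR\s', 'a SSBR b')
miss = re.search(r'\sSSBR\s', 'nothing here')

print(hit.span())  # (1, 7): start and end positions of the match
print(miss)        # None
```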
When you use the same regex several times, it is worth compiling it beforehand with compile:
r = re.compile(r'\sSSBR\s')  # compiled once, outside the loop
with open('infos', 'r') as f:
    for line in f:
        if r.search(line):
            print('line contains SSBR')
        else:
            print('line does not contain SSBR')
This way you reuse the regex, because it does not need to be recompiled on each iteration of the loop (although the documentation mentions that the most recently used patterns are cached, so for small programs, or programs with only a few regexes that are not used often, it won't make much difference).
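If you want to reuse the compiled regex in more than one place, one option is to wrap it in a small function. This is only a sketch: the names SSBR and lines_with_ssbr are hypothetical, and io.StringIO stands in for the real file.

```python
import io
import re

SSBR = re.compile(r'\sSSBR\s')  # compiled once, reused on every call

def lines_with_ssbr(lines):
    """Return the lines in which the compiled regex is found."""
    return [line for line in lines if SSBR.search(line)]

# Hypothetical contents standing in for the 'infos' file.
sample = io.StringIO("x SSBR y\nnothing\nz SSBR w\n")
matching = lines_with_ssbr(sample)
print(matching)  # ['x SSBR y\n', 'z SSBR w\n']
```

Since a file object is also an iterable of lines, the same function should work with the f obtained from open.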
Another detail is that you used \s (which matches spaces, tabs, and line breaks; see the documentation for the full list), and that whitespace is part of what findall returns (that is, it will return " SSBR ", with the spaces before and after). If you want only the string "SSBR" in the results, you can change the regex to r'\s(SSBR)\s': the parentheses form a capture group, and when groups are present, findall returns only the groups.
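Comparing the two versions side by side, with a made-up string:

```python
import re

text = 'a SSBR b'
without_group = re.findall(r'\sSSBR\s', text)  # whole match, spaces included
with_group = re.findall(r'\s(SSBR)\s', text)   # only the captured group

print(without_group)  # [' SSBR ']
print(with_group)     # ['SSBR']
```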
Or you can use r'\bSSBR\b'. The \b means "word boundary" and matches positions where there is a word character (alphanumeric or underscore) before and a non-word character after, or vice versa. That is, it matches the string "SSBR" even if it has things other than \s before or after it (such as punctuation marks, the beginning or end of the string, etc.).
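To make the difference concrete, here is a sketch that runs both regexes over the same made-up string:

```python
import re

# Hypothetical text with "SSBR" in several different contexts.
text = 'SSBR, (SSBR) xSSBR SSBR_1 SSBR'

with_boundary = re.findall(r'\bSSBR\b', text)
with_whitespace = re.findall(r'\sSSBR\s', text)

print(with_boundary)    # ['SSBR', 'SSBR', 'SSBR']
print(with_whitespace)  # []
```

With \b, the occurrences at the start of the string, inside the parentheses, and at the end are found. "xSSBR" and "SSBR_1" are not, because letters, digits and the underscore all count as word characters, so there is no boundary there. With \s nothing is found, since no occurrence has whitespace on both sides.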
Post a snippet of what you have inside the infos file
– FourZeroFive
All right, buddy!!
– Matheus Henrique