You can use regular expressions for this.
The regular expression below finds every sequence of at least 17 characters made up of digits, "-" and ".". It would be possible to refine the expression until it finds exactly 17 digits on its own, but I think that gets too complex, so I prefer to combine the regular expression with some logic in Python.
Since the files are small (10 KB, and this would hold even if they were 30 times larger), there is no need to read only part of each file and search there. But nothing stops you from reading just the first 4 KB of each file either, if the sequence is always there (about 400 lines, if the lines are not long).
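    import os
    import re

    def encontra_nome(pasta, nome_do_arquivo):
        # Read only the first 4 KB; the sequence is expected to be there
        with open(os.path.join(pasta, nome_do_arquivo)) as arquivo:
            dados = arquivo.read(4096)
        # Candidates: runs of 17 to 35 digits, "." or "-"
        sequencias = re.findall(r"[0-9\.\-]{17,35}", dados)
        for seq in sequencias:
            # Drop the separators and keep only the digits
            sequencia_limpa = re.sub(r"\-|\.", "", seq)
            if len(sequencia_limpa) >= 17:
                return sequencia_limpa
        raise ValueError("17-digit sequence not found")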
The regular expression r"[0-9\.\-]{17,35}" searches, as described, for any run of 17 to 35 characters drawn from the digits, "-" and ".". This allows up to one separator after each digit, so it should cover all possible formats. I preferred this to complicating the regular expression, because regular expressions are neither especially readable nor easy to write when the goal is to "count only the digits, ignore the other characters, and find exactly 17". A single regular expression for this would certainly be possible. Instead, once all candidates have been found, I go through them with a linear search in a for loop and filter them by removing the "-" and ".", this time with a simple regular expression that replaces every "-" and "." with "".
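For comparison, here is a rough sketch of what the single-expression approach mentioned above might look like. It is not what I use in the answer; the function name find_17_digits is made up for illustration, and it assumes at most one "." or "-" after each digit. The lookbehinds and the lookahead are there so that a longer run of digits is rejected rather than truncated to 17.

```python
import re

# Hypothetical single-regex variant: exactly 17 digits, each optionally
# followed by one "." or "-", and not embedded in a longer digit run.
PADRAO_17 = re.compile(
    r"(?<![0-9])(?<![0-9][.\-])"   # not preceded by a digit or digit + separator
    r"(?:[0-9][.\-]?){16}[0-9]"    # 16 digits with optional separators, then the 17th
    r"(?![.\-]?[0-9])"             # not followed by (separator +) another digit
)

def find_17_digits(texto):
    """Return the 17 digits without separators, or None if there is no match."""
    achado = PADRAO_17.search(texto)
    if achado is None:
        return None
    # Remove the separators from the matched text
    return re.sub(r"[.\-]", "", achado.group(0))

print(find_17_digits("Processo 00588.2007.011.02.00-9 foo"))
```

It works, but the pattern is already harder to read than the two-step version, which is the trade-off I was trying to avoid.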
I sometimes prefer two calls to the strings' replace method instead, but since we are already using regular expressions there is no reason not to use one more. There are no performance barriers or anything like that; the barriers are of the "oops, here comes a regular expression" kind, for the people maintaining the code.
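A minimal sketch of the alternative with two calls to str.replace, in case it reads better to whoever maintains the code. The helper name remove_separadores is hypothetical; it should give the same result as the re.sub call for these two separators.

```python
def remove_separadores(seq):
    """Strip "-" and "." from a candidate sequence using plain str.replace."""
    return seq.replace("-", "").replace(".", "")

print(remove_separadores("00588.2007.011.02.00-9"))
```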
When the number is separated by dots and hyphens, are these characters always in the same positions within the number, or can they vary?
– Woss
@Andersoncarloswoss from what I checked by hand, when the characters appear they are always in the same positions
– stacker
And why does the number have 20 digits when there are characters separating it? Shouldn't it always be 17?
– Woss
I just noticed that I copied the wrong sequence of numbers into my example; it is corrected now, thank you. The correct example would be 00588.2007.011.02.00-9
– stacker