I’m new to Python and have a problem I can’t find a solution to. I have a folder with about 10,000 .txt files, written in many different formats. I need to extract the FIRST sequence of 17 digits found in the first lines of each file and rename the file to the extracted sequence.
The sequence sometimes appears as one unbroken run of digits and sometimes with dots and a hyphen as separators (e.g. 00273200844202003, 00588.2007.011.02.00-9). PS: the text contains other numeric sequences, some with 17 digits and some with other lengths, but the one I need is always the first 17-digit sequence that appears.
I stored the current document names in a list and tried to find the sequence in the text using the NLTK package, without success.
import os
# Folder holding the .txt documents; os.listdir returns the names of its entries.
pasta_de_documentos = r'C:\Users\mateus.ferreira\Desktop\Estudos\Python\Doc_Classifier\TXT'
documentos = os.listdir(pasta_de_documentos)
If anyone knows a better approach, or can point me to a way to keep working on the problem, I’d be grateful. (I’m using Python 3.)
When the number is separated by dots and hyphens, are those characters always in the same positions within the number, or can the positions vary?
– Woss
@Andersoncarloswoss From what I checked by hand, when the separators appear they are always in the same positions.
– stacker
And why does the number have 20 digits when it contains separators? Shouldn’t it always have 17?
– Woss
I just noticed that I copied the wrong sequence of numbers into my example; it is fixed now, thank you. The correct example is 00588.2007.011.02.00-9
– stacker
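One possible way to approach this is with a regular expression from the standard `re` module instead of NLTK. The sketch below is only an illustration under several assumptions that are not stated in the question: the helper names (`extract_number`, `first_lines`, `rename_documents`) are hypothetical, "the first lines" is taken to mean the first 30 lines, the files are decoded as UTF-8 with undecodable bytes replaced, and the separators are limited to single dots or hyphens between digits. The pattern accepts exactly 17 digits and rejects candidates that are part of a longer digit run, so it may need tightening if the real files contain other layouts.

```python
import re
from itertools import islice
from pathlib import Path

# Exactly 17 digits, each pair optionally separated by one dot or hyphen.
# The lookarounds reject candidates embedded in a longer run of digits.
PATTERN = re.compile(r"(?<!\d)(?<!\d[.-])\d(?:[.-]?\d){16}(?![.-]?\d)")


def extract_number(text):
    """Return the first 17-digit sequence in text as digits only, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return re.sub(r"\D", "", match.group())


def first_lines(path, max_lines=30):
    """Read at most max_lines lines (30 is an assumed limit) from a file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return "".join(islice(handle, max_lines))


def rename_documents(folder, dry_run=True):
    """Map old names to new names; rename on disk only if dry_run is False."""
    renamed = {}
    planned = set()
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(folder).glob("*.txt")):
        number = extract_number(first_lines(path))
        if number is None:
            continue  # no 17-digit sequence found in the first lines
        target = path.with_name(number + ".txt")
        if target == path or target.exists() or target in planned:
            continue  # already named correctly, or the name is taken
        if not dry_run:
            path.rename(target)
        planned.add(target)
        renamed[path.name] = target.name
    return renamed
```

Calling `rename_documents(pasta_de_documentos)` first, with the default `dry_run=True`, returns the planned renames without touching any file, which makes it possible to review the result on a folder of this size before running it again with `dry_run=False`. Files with no match, or whose target name is already taken, are skipped rather than overwritten.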