Replace function does not work for all cases


2

I wrote this script to read a TXT file, find a sequence of 20 digits in the text, and rename the file to the digit sequence found. I used replace to remove every character that appears between the digits, but for some reason it did not remove the hyphens when renaming.

name_files5 = os.listdir(path_txt)

for TXT in name_files5:
    with open(path_txt + '\\' + TXT, "r") as content:
        search = re.search(r'(?:\d(?:[\s,.\-\xAD_]|(?:\\r)|(?:\\n))*){20}', content.read())
    if search is not None:
        name5 = search.group(0)
        name5 = name5.replace("\n", "")
        name5 = name5.replace("\r", "")
        name5 = name5.replace("n", "")
        name5 = name5.replace("r", "")
        name5 = name5.replace("-", "")
        name5 = name5.replace("\\", "")
        name5 = name5.replace("/", "")
        name5 = name5.replace(".", "")
        name5 = name5.replace(" ", "")
        fp = os.path.join("20_digitos", name5 + "_%d.txt")
        postfix = 0
        while os.path.exists(fp % postfix):
            postfix += 1
        os.rename(
            os.path.join(path_txt, TXT),
            fp % postfix
        )

I wrote other loops to find sequences with more or fewer digits, using replace in the same way, including for the hyphen, and they worked without problems.

Edit: an example of how the sequence appears in the text and how the file was renamed. The "_0" is just a counter to differentiate the files when one with the same name already exists.

As it appears in the text:

0001018-88.2011.5.02.0002

How the file was renamed:

0001018-8820115020002_0
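
One way to check what the leftover "hyphen" really is: list each non-digit character of the name with its code point. This is a minimal sketch, assuming the surviving character is a soft hyphen (U+00AD), which the regex above accepts through `\xAD` but which `replace("-", "")` does not touch. The helper name `describe_non_digits` is hypothetical, and the actual character in the file may be a different one.

```python
import unicodedata


def describe_non_digits(text):
    """Return (character, code point, Unicode name) for each non-digit."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ch not in "0123456789"
    ]


# Assumption: the character left in the file name is U+00AD, not "-" (U+002D).
name5 = "0001018\u00ad8820115020002"
for ch, code_point, label in describe_non_digits(name5):
    print(code_point, label)
```

If the printed name is anything other than HYPHEN-MINUS, then `replace("-", "")` was never going to remove it.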

  • Please also add an example of text for which this code fails.

  • @Pabloalmeida Done.

  • 1

    @Matt If you only want the digits, you can do something simpler, such as ''.join([letter for letter in name5 if letter.isdigit()]) or ''.join(filter(lambda x: x.isdigit(), name5)). This would greatly simplify the code and perhaps solve the problem.

  • 2

    Several Unicode characters look like a hyphen but are not one. The tip from @Klaus above, filtering only the digits, is better than what you are doing anyway.
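
The two comments above can be illustrated with a short sketch. It assumes a handful of hyphen look-alikes (which one is actually in the file is unknown) and uses a hypothetical helper, `only_digits`, that keeps ASCII digits instead of listing every character to remove.

```python
import re


def only_digits(text):
    """Drop every character that is not an ASCII digit 0-9."""
    return re.sub(r"[^0-9]", "", text)


# Hyphen look-alikes: soft hyphen, hyphen, non-breaking hyphen,
# en dash, minus sign. None of them equals "-" (U+002D).
lookalikes = ["\u00ad", "\u2010", "\u2011", "\u2013", "\u2212"]

for ch in lookalikes:
    raw = f"0001018{ch}88.2011.5.02.0002"
    # replace("-", "") only removes U+002D, so the look-alike survives.
    assert ch in raw.replace("-", "").replace(".", "")
    # Keeping only digits works no matter which separator was used.
    assert only_digits(raw) == "00010188820115020002"
```

Filtering for what should stay is more robust than chaining replace calls, because it does not depend on knowing every separator in advance.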

1 answer

0

Here is a reference for you. Starting from it, you should be able to implement the file renaming yourself.

import os
import re
from os import path


def search_files(prefix: 'Usage: /home/user'):
    # Fail early when the given path does not exist.
    if not path.lexists(prefix):
        raise FileNotFoundError('Directory or file does not exist!')

    # A single file is returned as a one-element list
    # (list(prefix) would split the string into characters).
    if path.isfile(prefix):
        return [prefix]

    # List the entries of prefix itself, not of the current working directory.
    return [
        path.join(prefix, name)
        for name in os.listdir(prefix)
        if path.isfile(path.join(prefix, name))
    ]


def map_numbers_in_files(prefix: 'Usage: /home/user or /home/user/file.txt'):
    absolute_path_files = search_files(prefix)

    if not absolute_path_files:
        raise FileNotFoundError('There are no files in this directory.')

    pattern = re.compile(r"\d+")

    print('Number'.ljust(25), 'File', sep='\t', end='\n\n')
    for path_file in absolute_path_files:
        with open(path_file) as file:
            for line in file:
                # Join every run of digits found on the line into one string.
                _matchs = pattern.findall(line)
                numbers = ''.join(_matchs)
                if numbers:
                    print(numbers.ljust(25), path_file, sep='\t')


map_numbers_in_files('/home/runner')

"""
Number                      File

0                           /home/runner/_test_runner.py
1                           /home/runner/_test_runner.py
1                           /home/runner/_test_runner.py
1                           /home/runner/.bashrc
1                           /home/runner/.bashrc
1                           /home/runner/.bashrc
1000                        /home/runner/.bashrc
2000                        /home/runner/.bashrc
1                           /home/runner/.bashrc
1                           /home/runner/.bashrc
48                          /home/runner/.bashrc
6429                        /home/runner/.bashrc
1033013203300033013403300   /home/runner/.bashrc
1                           /home/runner/.bashrc
101                         /home/runner/.bashrc
01310135013601320101        /home/runner/.bashrc
1                           /home/runner/.profile
022                         /home/runner/.profile
1                           /home/runner/.bash_logout
1                           /home/runner/.bash_logout"""
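
To connect this back to the question, here is a hedged sketch of the renaming step that keeps only the digits of the match. It reuses the question's pattern with `[0-9]` in place of `\d`. The function names (`digits_only`, `unique_path`, `rename_by_digits`) and the `encoding` parameter are assumptions made for illustration, not part of the original script.

```python
import os
import re

# The question's pattern, with [0-9] instead of \d so only ASCII digits count.
SEQUENCE = re.compile(r'(?:[0-9](?:[\s,.\-\xAD_]|(?:\\r)|(?:\\n))*){20}')


def digits_only(text):
    """Keep ASCII digits and drop every separator, whatever it is."""
    return re.sub(r"[^0-9]", "", text)


def unique_path(directory, stem, ext=".txt"):
    """Return directory/stem_N.ext for the first N that does not exist yet."""
    postfix = 0
    while True:
        candidate = os.path.join(directory, f"{stem}_{postfix}{ext}")
        if not os.path.exists(candidate):
            return candidate
        postfix += 1


def rename_by_digits(src, dest_dir, encoding=None):
    """Move src into dest_dir, named after its 20-digit sequence.

    Returns the new path, or None when no sequence is found.
    encoding=None keeps the platform default, as in the question.
    """
    with open(src, "r", encoding=encoding) as content:
        search = SEQUENCE.search(content.read())
    if search is None:
        return None
    os.makedirs(dest_dir, exist_ok=True)
    target = unique_path(dest_dir, digits_only(search.group(0)))
    os.rename(src, target)
    return target
```

Calling `rename_by_digits(os.path.join(path_txt, TXT), "20_digitos")` inside the question's loop would take the place of the chain of replace calls.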
