I was asked to write code that walks the server folders looking for files with invalid names (accents, spaces, punctuation and special characters). I am stuck on the regex logic, because every file has an extension such as .txt, .pdf or .doc, among others.
That blessed dot leaves my code in an all-or-nothing situation ("it's either 8 or 80", as the saying goes). In the regex, \W captures all the other characters I want (everything non-alphanumeric), but since the "." is captured along with them, files with a correct name, for example a plain .txt file, are flagged as having an invalid name just because of the dot.
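A minimal reproduction of the problem, separate from my full script. The two file names here are made up for illustration only:

```python
import re

# \W matches any character that is not a letter, digit or underscore,
# so the "." in front of the extension always counts as a match.
pattern = re.compile(r'\W')

print(pattern.findall('file4.txt'))   # ['.']
print(pattern.findall('file 1.txt'))  # [' ', '.']
```

Both names produce at least one match, so a test of the form "did \W find anything?" cannot tell them apart.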
Here is the code:
import os, re

def encontraArquivosEmPastaRecursivamente(pasta):
    arquivosTxt = []
    caminhoAbsoluto = os.path.abspath(pasta)
    # Walk every subfolder and keep the files whose name matches the regex.
    for pastaAtual, subPastas, arquivos in os.walk(caminhoAbsoluto):
        # Problem: \W also matches the "." before the extension.
        arquivosTxt.extend([os.path.join(pastaAtual, arquivo)
                            for arquivo in arquivos
                            if re.findall(r'[áàâãéèêíïóôõöúçñÁÀÂÃÉÈÍÏÓÔÕÖÚÇÑ\s\W]', arquivo)])
    # Write one flagged path per line.
    arquivo = open('lista_de_arquivo.txt', 'w')
    for caminho in arquivosTxt:
        arquivo.write(caminho + '\n')
    arquivo.close()

encontraArquivosEmPastaRecursivamente('c:/Users/paulo/Desktop/Ambiente_de_arquivos')
File names that should be in "lista_de_arquivo.txt" after the code runs:
file 1.txt (file with spaces);
archive1! @#$% &()_+` {[ª ~}]º,. ;-.txt (file with special characters);
file 1.txt (file with accents);
In turn, what should not be on the list:
file4.txt (file with no spaces, no accents and no special characters), but it appears on the list because of the ".".
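One direction that might satisfy the lists above, as a rough sketch only: split off the extension with os.path.splitext and test just the remaining part of the name. The helper name `has_bad_name`, the sample names, and the choice to allow underscores are my assumptions, not part of the original script.

```python
import os
import re

# Assumption: only ASCII letters, digits and underscores are valid in the stem.
INVALID = re.compile(r'[^A-Za-z0-9_]')

def has_bad_name(filename):
    """Hypothetical helper: True if the name, minus its final extension,
    contains a character outside the allowed set."""
    stem, _extension = os.path.splitext(filename)
    return bool(INVALID.search(stem))

print(has_bad_name('file4.txt'))     # False
print(has_bad_name('file 1.txt'))    # True (space)
print(has_bad_name('árquivo1.txt'))  # True (accent)
print(has_bad_name('file1!@#.txt'))  # True (special characters)
```

Only the last dot is treated as the extension separator here, so a name such as backup.tar.gz, or a dot-file with no extension, would still be flagged.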
This is my test environment:
Ignore "pasta1" and "pasta2"; they are only there to test that the code recurses into subfolders.
This was exactly the logic I couldn't work out. I'm very new to regex. Thank you very much for your help!
– Paulo César