I wrote this script to read a TXT file, find a sequence of 20 digits in the text, and rename the file to the digit sequence found.
I used replace to remove all the characters that appear between the digits, but for some reason it did not remove the hyphens when renaming.
name_files5 = os.listdir(path_txt)
for TXT in name_files5:
    with open(path_txt + '\\' + TXT, "r") as content:
        search = re.search(r'(?:\d(?:[\s,.\-\xAD_]|(?:\\r)|(?:\\n))*){20}', content.read())
    if search is not None:
        name5 = search.group(0)
        name5 = name5.replace("\n", "")
        name5 = name5.replace("\r", "")
        name5 = name5.replace("n", "")
        name5 = name5.replace("r", "")
        name5 = name5.replace("-", "")
        name5 = name5.replace("\\", "")
        name5 = name5.replace("/", "")
        name5 = name5.replace(".", "")
        name5 = name5.replace(" ", "")
        fp = os.path.join("20_digitos", name5 + "_%d.txt")
        postfix = 0
        while os.path.exists(fp % postfix):
            postfix += 1
        os.rename(
            os.path.join(path_txt, TXT),
            fp % postfix
        )
I wrote other loops to find sequences with more or fewer digits, using replace in the same way, including for the hyphen, and they worked without problems.
Edit: here is an example of how the sequence appears in the text and how the script renamed the file. The "_0" is just a counter to differentiate files when one with the same name already exists.
As it appears in the text:
0001018-88.2011.5.02.0002
As it was renamed:
0001018-8820115020002_0
Please also add an example of text for which this code fails.
– Pablo Almeida
@Pabloalmeida Done.
– matt
@Matt If you only want the digits, you can do something simpler, like:
''.join([letter for letter in name5 if letter.isdigit()])
or
filter(lambda x: x.isdigit(), name5)
This would greatly simplify the code and perhaps solve the problem.
– klaus
Unicode has several characters that look like a hyphen but are not one. The tip from @Klaus above, filtering only the digits, is better than what you are doing anyway.
– jsbueno
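To make the two comments above concrete, here is a minimal sketch of the digit-filter approach. It assumes, purely for illustration, that the hyphen-looking character in the text is the soft hyphen U+00AD: the pattern in the question allows it (`\xAD`), but none of the replace calls removes it. The actual character in the original files is not known. The names `PATTERN`, `digits_only`, and `sample` are hypothetical.

```python
import re

# Same pattern as in the question: 20 digits, each optionally followed by separators.
PATTERN = re.compile(r'(?:\d(?:[\s,.\-\xAD_]|(?:\\r)|(?:\\n))*){20}')


def digits_only(text):
    """Keep only the digit characters, dropping every separator at once."""
    return ''.join(ch for ch in text if ch.isdigit())


# Hypothetical sample: the "hyphen" is assumed to be a soft hyphen (U+00AD),
# not the ASCII hyphen-minus that replace("-", "") targets.
sample = "Case 0001018\u00ad88.2011.5.02.0002 filed"

raw = PATTERN.search(sample).group(0)

# A chain of replace calls only removes the characters it lists explicitly.
replaced = raw.replace("-", "").replace(".", "").replace(" ", "")
print("\u00ad" in replaced)   # True: the soft hyphen survives

# Filtering on isdigit() does not need to know which separators occur.
print(digits_only(raw))       # 00010188820115020002
```

With this approach the whole chain of replace calls reduces to a single call, and any separator the pattern lets through (comma, underscore, soft hyphen) is dropped as well. On Python 3, `filter(...)` returns an iterator, so that variant also needs to be wrapped in `''.join(...)`.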