Compare files in folder and delete specific ones in Python

Question

Asked 4 years, 8 months ago

2

I am a complete beginner and have a question about something I am building. I would be grateful for any help.

I am trying to develop a program in Python that solves my daily stress of deleting some files in folders. I have to check the files by name in this format:

01 - Carlos
01 - TSE
02 - John
02 - TSE
03 - TSE
04 - Mary
04 - TSE

I want to delete each "TSE" file whose number is shared with a named file, and keep any "TSE" file that has no other file with the same number. The list would then look like this:

01 - Carlos
02 - John
03 - TSE
04 - Mary

I can already delete all the files that have TSE in their name, but I need to keep the ones whose number is not shared with a named file. My code is like this:

from time import sleep
import os
import glob

pasta = os.getcwd()
tipo = '*.txt'
texto = 'TSE'
arquivos = glob.glob1(pasta, tipo)

n = 1
y = 1
ok = int(input(f'''
<====>\033[1;31mOS arquivos "{tipo}" com "{texto}" em seus nomes serão DELETADOS!\033[m<====>
\n
\033[0;33m[1] OK
[2] CANCELAR\033[m \n
Opção: '''))
if ok == 1:
    for x in arquivos:
        txnum = str(n).zfill(2)
        if texto and txnum in x:
            os.remove(arq)
            print(f'Arquivo "{x}" DELETADO!!')
            print('=' * 20)
    n += 1
print('Finalizando o Programa!!')
sleep (3)
exit()

I would suggest first building a list of all the filenames in the folder, then iterating over that list and checking whether each file ends in TSE and whether any other item in the list starts with the same number (skipping, of course, the item being iterated).

– Costamilam

2020/10/24 at 14:14
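
The approach suggested in the comment above can be sketched roughly as follows. This is only an illustration, not the commenter's code: the function name `tse_files_to_delete`, the `marker` parameter, and the assumption that the number is whatever comes before the first hyphen are all hypothetical choices. The function only returns the names that would be deleted, so nothing on disk is touched.

```python
import os


def tse_files_to_delete(filenames, marker="TSE"):
    """Return the marker files whose number is shared with another file."""
    to_delete = []
    for current in filenames:
        stem = os.path.splitext(current)[0]         # drop an extension such as ".txt"
        if not stem.rstrip().endswith(marker):
            continue                                # only marker files are candidates
        number = current.split("-", 1)[0].strip()   # assumed: number precedes the first hyphen
        has_partner = any(
            other != current and other.split("-", 1)[0].strip() == number
            for other in filenames
        )
        if has_partner:
            to_delete.append(current)
    return to_delete


names = ["01 - Carlos.txt", "01 - TSE.txt", "02 - John.txt", "02 - TSE.txt",
         "03 - TSE.txt", "04 - Mary.txt", "04 - TSE.txt"]
print(tse_files_to_delete(names))
# ['01 - TSE.txt', '02 - TSE.txt', '04 - TSE.txt']
```

Because every file is compared with every other file, this version does not depend on the order of the list, at the cost of more comparisons.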

2 answers

3

Code:

    # Point 1:
    for indice in range(len(arquivos) - 1):

        # Point 2:
        if arquivos[indice][:3] == arquivos[indice + 1][:3]:

            # Point 3:
            if texto in arquivos[indice + 1]:
                os.remove(arquivos[indice + 1])
                print(f'Arquivo "{arquivos[indice + 1]}" DELETADO!!')
                print('=' * 20)

Point 1:

Change the loop to use for i in range(len(x)) instead of for i in x. This is useful whenever items in the list have to be compared with one another: the loop is driven by the size of the list, so each item and its neighbours can be reached by index. The comparison relies on the list being sorted, which simplifies several comparisons within the list, so if the operating system does not return the files in order, sort the list first. More details about loop structures here.
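
As a small illustration of Point 1, the sketch below sorts an unordered list and then walks it by index, pairing each item with the next one. The sample names and the variables `pares`, `atual` and `proximo` are made up for this example; `arquivos` and `indice` follow the answer's code.

```python
arquivos = ["02 - TSE.txt", "01 - TSE.txt", "02 - John.txt", "01 - Carlos.txt"]
arquivos = sorted(arquivos)  # do not rely on the order returned by the OS

# Stop one short of the end so that arquivos[indice + 1] always exists.
pares = []
for indice in range(len(arquivos) - 1):
    pares.append((arquivos[indice], arquivos[indice + 1]))

for atual, proximo in pares:
    print(atual, "|", proximo)
```

After sorting, files with the same number sit next to each other, which is what makes the neighbour comparison possible.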

Point 2:

This line compares the file numbers using substrings. Because the number has only two digits, the slice stops at index 3, so it covers the two digits plus the character that follows them. For more information about strings, see the documentation here.
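
To make the slicing in Point 2 concrete, here is a minimal sketch using file names from the question (the ".txt" extension and the variable names are assumptions made for the example):

```python
nome = "01 - Carlos.txt"

print(repr(nome[:3]))  # '01 '  (two digits plus the space after them)
print(repr(nome[:2]))  # '01'   (the two digits only)

# Two files share a number when their prefixes are equal.
mesmo_numero = "01 - Carlos.txt"[:3] == "01 - TSE.txt"[:3]
outro_numero = "01 - TSE.txt"[:3] == "02 - John.txt"[:3]
print(mesmo_numero, outro_numero)  # True False
```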

Point 3:

Since the list is sorted, simply delete the next file in the sequence, after first checking that its name really contains the term 'TSE'.

Complete code:

from time import sleep
import os
import glob

pasta = os.getcwd()
tipo = '*.txt'
texto = 'TSE'
# Sort the names so that files with the same number sit next to each other.
arquivos = sorted(glob.glob1(pasta, tipo))

ok = int(input(f'\033[1;31mOS arquivos "{tipo}" com "{texto}" em seus nomes serão DELETADOS!\n\n' +
               '\033[0;33m[1] OK\n[2] CANCELAR\033[m\nOpção: '))

if ok == 1:
    # Compare each file with the next one in the sorted list.
    for indice in range(len(arquivos) - 1):
        # Same number prefix (two digits plus the following character)?
        if arquivos[indice][:3] == arquivos[indice + 1][:3]:
            if texto in arquivos[indice + 1]:
                os.remove(arquivos[indice + 1])
                print(f'Arquivo "{arquivos[indice + 1]}" DELETADO!!')
                print('=' * 20)

print('Finalizando o Programa!!')
sleep(3)
exit()

1

Got it! I’m still new to the community and I’m adapting. Thanks for the tips @Augusto Vasques!

– Octávio Lage

2020/10/24 at 15:27

Thank you, Octávio. The way you posted it really does seem much simpler. I'll try it here and let you know!

– Veloso

2020/10/24 at 15:52

It worked great and solved my problem! I only reduced the substring to [:1], so that typing inconsistencies in the names do not stop a file from being deleted. For example, 01-aasd, 01 - Asd and 01--Asd differ under [:3] but are equal under [:1].

– Veloso

2020/10/24 at 16:05
1

Glad I could help! As soon as you can, mark one of the answers as accepted, so other users with the same question will know that you already have a solution.

– Octávio Lage

2020/10/24 at 16:10

Octávio, I made a small change. I used [:2] instead of [:3] and repeated the if structure so that the file is also deleted when arquivos[indice] contains the TSE text, not only when arquivos[indice + 1] does, because there are cases where the order is different due to the way Windows sorts the files. Thank you very much! You taught me a lot!

– Veloso

2020/10/26 at 23:11
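
The change described in the last comment might look something like the sketch below. It is an interpretation, not the commenter's actual code: the function name `escolher_para_apagar` and the `apagar` list are hypothetical, the names are only collected rather than removed, and the duplicate check is an addition so that a file is not listed twice when three files share a number.

```python
def escolher_para_apagar(arquivos, texto="TSE"):
    """Return the TSE files that share a two-digit prefix with a neighbour."""
    arquivos = sorted(arquivos)
    apagar = []
    for indice in range(len(arquivos) - 1):
        atual, proximo = arquivos[indice], arquivos[indice + 1]
        if atual[:2] != proximo[:2]:
            continue  # different numbers, nothing to do
        # Check both neighbours, since the TSE file may sort before or after.
        for candidato in (proximo, atual):
            if texto in candidato and candidato not in apagar:
                apagar.append(candidato)
    return apagar


nomes = ["01 - Vera.txt", "01 - TSE.txt", "02 - John.txt", "02 - TSE.txt", "03 - TSE.txt"]
print(escolher_para_apagar(nomes))
# ['01 - TSE.txt', '02 - TSE.txt']
```

Here "01 - TSE.txt" sorts before "01 - Vera.txt", which is the kind of ordering case the comment mentions. Note that if both neighbours contain the text, both are listed, so this sketch assumes at most one TSE file per number.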


by Rfroes87 • **465** points · Answer 1 · 2020-10-24T15:31:03+00:00

It was not clear what should happen in this case:

01 - Carlos
01 - TSE
02 - TSE
02 - Mary
03 - Mary
03 - Carlos

As shown above, the same name appears under several different numbers, and no name is exclusive to a single number.

Now, if this is not a real possibility, I believe that the following code should solve your problem:

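import os
from collections import defaultdict

arquivos = ['01 - Carlos', '01 - TSE', '02 - Joao', '02 - TSE', '03 - TSE', '04 - Maria', '04 - TSE']

d = defaultdict(list)  # number -> names found under that number
l = []                 # every name seen, used to count occurrences

formatar_arquivo = lambda num, nome: '{} - {}'.format(num, nome)

for arquivo in arquivos:
    # Split "01 - Carlos" into "01" and "Carlos", dropping empty pieces.
    num, nome = map(str.strip, filter(bool, arquivo.split('-')))
    arq = formatar_arquivo(num, nome)
    # Normalise the file name on disk if it was typed differently.
    if arq != arquivo:
        os.rename(arquivo, arq)
    l.append(nome)
    d[num].append(nome)

for k, v in d.items():
    # A number with a single file has nothing to compare against.
    if len(v) == 1:
        continue
    for i in v:
        # Keep a name that appears only once overall and remove
        # the other files that share its number.
        if l.count(i) == 1:
            for j in filter(lambda x: x != i, v):
                arq = formatar_arquivo(k, j)
                os.remove(arq)
            break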

This solution is compatible with Python versions 2 and 3.
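
To see what the grouping step in the code above builds before any file is renamed or removed, the same loop can be run on the sample list without the os calls. This is only an inspection sketch: the list comprehension stands in for the map/filter expression, and the variable `parte` is made up for the example.

```python
from collections import defaultdict

arquivos = ['01 - Carlos', '01 - TSE', '02 - Joao', '02 - TSE',
            '03 - TSE', '04 - Maria', '04 - TSE']

d = defaultdict(list)
for arquivo in arquivos:
    # Same split as the answer: drop empty pieces, then strip whitespace.
    num, nome = [parte.strip() for parte in arquivo.split('-') if parte]
    d[num].append(nome)

print(dict(d))
# {'01': ['Carlos', 'TSE'], '02': ['Joao', 'TSE'], '03': ['TSE'], '04': ['Maria', 'TSE']}
```

Number 03 has a single entry, so the second loop skips it and its TSE file is kept.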