Compare files in a folder and delete specific ones in Python

I am a complete beginner and I have a question about something I am building. Thanks in advance to anyone who can help.

I am trying to write a Python program that spares me the daily chore of deleting certain files from folders. I have to check the files by name, and the names follow this format:

  • 01 - Carlos
  • 01 - TSE
  • 02 - John
  • 02 - TSE
  • 03 - TSE
  • 04 - Mary
  • 04 - TSE

A "TSE" file whose number matches a named file should be deleted, while a "TSE" file with no other file sharing its number should be kept. The list would end up like this:

  • 01 - Carlos
  • 02 - John
  • 03 - TSE
  • 04 - Mary

I can already delete every file that has TSE in its name, but I need to keep the ones whose number does not match any named file. My code looks like this:

from time import sleep
import os
import glob

pasta = os.getcwd()
tipo = '*.txt'
texto = 'TSE'
arquivos = glob.glob1(pasta, tipo)

n = 1
y = 1
ok = int(input(f'''
<====>\033[1;31mOS arquivos "{tipo}" com "{texto}" em seus nomes serão DELETADOS!\033[m<====>
\n
\033[0;33m[1] OK
[2] CANCELAR\033[m \n
Opção: '''))
if ok == 1:
    for x in arquivos:
        txnum = str(n).zfill(2)
        if texto and txnum in x:
            os.remove(arq)
            print(f'Arquivo "{x}" DELETADO!!')
            print('=' * 20)
    n += 1
print('Finalizando o Programa!!')
sleep (3)
exit()
  • I would suggest first building a list with all the filenames in the folder, then iterating over that list and, for each file that ends in TSE, checking whether any other item in the list starts with the same number (skipping, of course, the item being iterated).
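
The suggestion in the comment above could be sketched roughly as follows. This is only an illustration under some assumptions: the names `numero_de` and `tse_a_remover` are hypothetical, the number is taken to be whatever precedes the first hyphen, and nothing is deleted (the function just returns the names that would be removed).

```python
import os

def numero_de(nome):
    # Hypothetical helper: the number is the text before the first hyphen.
    return nome.split("-", 1)[0].strip()

def tse_a_remover(nomes, texto="TSE"):
    """Return the names ending in `texto` whose number also appears on another file."""
    remover = []
    for nome in nomes:
        base = os.path.splitext(nome)[0].strip()  # drop ".txt" before checking the ending
        if not base.endswith(texto):
            continue
        # Look for a different file that carries the same number.
        if any(outro != nome and numero_de(outro) == numero_de(nome) for outro in nomes):
            remover.append(nome)
    return remover

nomes = ["01 - Carlos.txt", "01 - TSE.txt", "02 - John.txt", "02 - TSE.txt",
         "03 - TSE.txt", "04 - Mary.txt", "04 - TSE.txt"]
print(tse_a_remover(nomes))
# ['01 - TSE.txt', '02 - TSE.txt', '04 - TSE.txt']
```

Because every file is compared against every other one, this version does not depend on the order in which the names are listed.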

2 answers

3


Code:

    # Point 1:
    for indice in range(len(arquivos) - 1):

        # Point 2:
        if arquivos[indice][:3] == arquivos[indice + 1][:3]:

            # Point 3:
            if texto in arquivos[indice + 1]:
                os.remove(arquivos[indice + 1])
                print(f'Arquivo "{arquivos[indice + 1]}" DELETADO!!')
                print('=' * 20)

Point 1:

The loop changes from for i in x to for i in range(len(x)). We use this form whenever we need to compare items of the list with one another: the repetition is controlled by the size of the list, so we can move between the indexes of the items. For this to work the list must always be sorted, which also simplifies many other comparisons within it, so if the operating system does not return the names in order, sort them yourself.
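
As a small illustration of Point 1 (the file names here are made up for the example), sorting the list first and then walking it by index gives pairs of neighbouring names to compare:

```python
arquivos = ["02 - TSE.txt", "01 - TSE.txt", "02 - John.txt", "01 - Carlos.txt"]
arquivos.sort()  # plain string sort; equal zero-padded numbers end up side by side

pares = []
for indice in range(len(arquivos) - 1):
    # Stop one item early so that indice + 1 is always a valid index.
    pares.append((arquivos[indice], arquivos[indice + 1]))

for atual, proximo in pares:
    print(atual, "<->", proximo)

# zip over the list and a copy shifted by one builds the same pairs.
print(pares == list(zip(arquivos, arquivos[1:])))  # True
```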

Point 2:

This line compares the file numbers using substrings. Because the number has only two digits, the slice [:3] stops after the third character, covering the two digits and the space that follows them. For more information about strings, see the documentation.
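
To make the slicing concrete, here is a short sketch with made-up names. The helper `numero` is hypothetical and is only one possible way to compare numbers when the names are not typed consistently:

```python
com_espaco = "01 - Carlos.txt"
sem_espaco = "01-TSE.txt"

print(repr(com_espaco[:3]))              # '01 '
print(repr(sem_espaco[:3]))              # '01-'
print(com_espaco[:3] == sem_espaco[:3])  # False: the third character differs
print(com_espaco[:2] == sem_espaco[:2])  # True: only the two digits are compared

def numero(nome):
    """Hypothetical helper: the text before the first hyphen, without spaces."""
    return nome.split("-", 1)[0].strip()

print(numero(com_espaco) == numero(sem_espaco))  # True
```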

Point 3:

Since the list is sorted, it is enough to delete the next file in the sequence, after first checking that it really contains the term 'TSE'.

Complete code:

from time import sleep
import os
import glob

pasta = os.getcwd()
tipo = '*.txt'
texto = 'TSE'
# Sort the names so that files sharing a number end up next to each other.
arquivos = sorted(glob.glob1(pasta, tipo))

ok = int(input(f'\033[1;31mOS arquivos "{tipo}" com "{texto}" em seus nomes serão DELETADOS!\n\n' +
                '\033[0;33m[1] OK\n[2] CANCELAR\033[m\nOpção: '))

if ok == 1:
  for indice in range(len(arquivos)-1):
    if arquivos[indice][:3] == arquivos[indice + 1][:3]:      
      if texto in arquivos[indice + 1]:
        os.remove(arquivos[indice + 1])
        print(f'Arquivo "{arquivos[indice + 1]}" DELETADO!!')
        print('=' * 20)

print('Finalizando o Programa!!')
sleep (3)
exit()
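
Since the loop deletes files for real, it may be worth trying it on throwaway files first. The sketch below is one way to do that, assuming the same naming pattern as in the question: it creates empty files in a temporary folder, runs the same comparison, and lists what is left. It uses os.listdir with sorted in place of glob.glob1 (an undocumented helper), and the variable `restantes` is made up for the example.

```python
import os
import tempfile

texto = "TSE"
nomes = ["01 - Carlos.txt", "01 - TSE.txt", "02 - John.txt", "02 - TSE.txt",
         "03 - TSE.txt", "04 - Mary.txt", "04 - TSE.txt"]

with tempfile.TemporaryDirectory() as pasta:
    # Create empty test files so nothing real is at risk.
    for nome in nomes:
        with open(os.path.join(pasta, nome), "w"):
            pass

    arquivos = sorted(os.listdir(pasta))
    for indice in range(len(arquivos) - 1):
        mesmo_numero = arquivos[indice][:3] == arquivos[indice + 1][:3]
        if mesmo_numero and texto in arquivos[indice + 1]:
            os.remove(os.path.join(pasta, arquivos[indice + 1]))

    restantes = sorted(os.listdir(pasta))

print(restantes)
# ['01 - Carlos.txt', '02 - John.txt', '03 - TSE.txt', '04 - Mary.txt']
```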
  • 1

    Got it! I’m still new to the community and I’m adapting. Thanks for the tips @Augusto Vasques!

  • Thank you, Octavian. The way you posted it really does look much simpler. I'll try it here and let you know!

  • It worked great and solved my problem! I only reduced the substring to [:1], because that keeps typing mistakes in the names from blocking the deletion. For example, 01-aasd, 01 - Asd and 01--Asd do not all match under [:3] but do match under [:1].

  • 1

    Glad I could help! As soon as you can, mark one of the answers as accepted, so other users with the same question will know that you already have a solution.

  • Octavian, I made a small change. I used [:2] instead of [:3] and repeated the if block so that it also deletes when the file at [indice] contains the TSE text, not only the one at [indice + 1], because there are cases where the order is different due to how Windows sorts the names. Thank you very much! You taught me a lot!
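
The change described in the last comment could look roughly like this. It is a sketch, not the commenter's actual code: the function name `tse_para_apagar` is hypothetical, and it only returns the names instead of calling os.remove, so it can be checked safely.

```python
def tse_para_apagar(arquivos, texto="TSE"):
    """Return the TSE files that share their two-digit number with a neighbour."""
    arquivos = sorted(arquivos)
    apagar = []
    for indice in range(len(arquivos) - 1):
        atual, proximo = arquivos[indice], arquivos[indice + 1]
        if atual[:2] != proximo[:2]:
            continue  # different numbers, nothing to compare
        # The TSE file may come after or before the named file.
        if texto in proximo:
            candidato = proximo
        elif texto in atual:
            candidato = atual
        else:
            continue
        if candidato not in apagar:  # avoid listing the same file twice
            apagar.append(candidato)
    return apagar

exemplo = ["05 - TSE.txt", "05 - Zeca.txt", "01 - Carlos.txt", "01 - TSE.txt", "03 - TSE.txt"]
print(tse_para_apagar(exemplo))
# ['01 - TSE.txt', '05 - TSE.txt']
```

Here "05 - TSE.txt" sorts before "05 - Zeca.txt", which is the kind of case the second check is meant to catch.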

2

It was not clear what should happen in this case:

  • 01 - Carlos
  • 01 - TSE
  • 02 - TSE
  • 02 - Mary
  • 03 - Mary
  • 03 - Carlos

As can be seen above, the same names appear under several numbers, and no name is exclusive to a single number.

Now, if this is not a real possibility, I believe that the following code should solve your problem:

import os
from collections import defaultdict

arquivos = ['01 - Carlos', '01 - TSE', '02 - Joao', '02 - TSE', '03 - TSE', '04 - Maria', '04 - TSE']

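d = defaultdict(list)  # number -> names found under that number
l = []                 # every name seen, used to count occurrences

formatar_arquivo = lambda num, nome: '{} - {}'.format(num, nome)

for arquivo in arquivos:
    # Split "01 - Carlos" into number and name, dropping empty parts and stray spaces.
    num, nome = map(str.strip, filter(bool, arquivo.split('-')))
    arq = formatar_arquivo(num, nome)
    # Rename files whose name is not in the normalized "NN - Name" form.
    if arq != arquivo:
        os.rename(arquivo, arq)
    l.append(nome)
    d[num].append(nome)

for k, v in d.items():
    # A number with a single file has nothing to delete.
    if len(v) == 1:
        continue
    for i in v:
        # A name that occurs only once overall is the one to keep;
        # every other file with this number is removed.
        if l.count(i) == 1:
            for j in filter(lambda x: x != i, v):
                arq = formatar_arquivo(k, j)
                os.remove(arq)
            break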

This solution is compatible with Python versions 2 and 3.
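
To see what this selection rule does with a given list before touching any file, the same logic can be wrapped in a function that only reports what would be removed. This is a sketch for Python 3; the name `planejar_remocoes` is hypothetical, and the renaming step is left out.

```python
from collections import defaultdict

def planejar_remocoes(arquivos):
    """Return the names the rule above would delete, without deleting anything."""
    por_numero = defaultdict(list)  # number -> names under that number
    todos_os_nomes = []             # every name, to count occurrences
    for arquivo in arquivos:
        num, nome = map(str.strip, filter(bool, arquivo.split('-')))
        por_numero[num].append(nome)
        todos_os_nomes.append(nome)

    remover = []
    for num, nomes in por_numero.items():
        if len(nomes) == 1:
            continue
        for nome in nomes:
            if todos_os_nomes.count(nome) == 1:
                # Keep the unique name, plan to remove the others.
                remover.extend('{} - {}'.format(num, outro) for outro in nomes if outro != nome)
                break
    return remover

normal = ['01 - Carlos', '01 - TSE', '02 - Joao', '02 - TSE', '03 - TSE', '04 - Maria', '04 - TSE']
ambiguo = ['01 - Carlos', '01 - TSE', '02 - TSE', '02 - Mary', '03 - Mary', '03 - Carlos']

print(planejar_remocoes(normal))   # ['01 - TSE', '02 - TSE', '04 - TSE']
print(planejar_remocoes(ambiguo))  # []
```

With the ambiguous list every name occurs twice, so under this rule nothing would be deleted.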

  • 1

    Thank you very much, Rfroes87. Octavian's reply already cleared up my question perfectly, but I will go through your code calmly, because it is more complex.
