List iteration problem using any() function

I created a script to delete files from a folder and its subfolders. The subfolders are named 1.GERAL_%i, with i ranging from 1 to 137. However, I only want to delete the files from folder 89 onward; I don't want to touch folders 1 to 88. I wrote the script below:

import os

my_path = os.getcwd()
# Folders 1.GERAL_1 .. 1.GERAL_88 must be left alone
foldersToSkip = [f'1.GERAL_{i}' for i in range(1, 89)]
# filesToDel: list of filename substrings to delete (definition omitted here)

for root, dirs, files in os.walk(my_path):
    if not any(x in root for x in foldersToSkip):
        for file in files:
            if any(y in file for y in filesToDel):
                fileToDel = os.path.join(root, file)
                os.remove(fileToDel)

It looked like it should work, but the code didn't delete any files. When I debugged it, I found the problem: the expression x in root for x in foldersToSkip does not compare root exactly against each entry of foldersToSkip. Instead, x in root only checks whether the entry x appears somewhere inside the path string root. For example, the script does not delete files from subfolder 1.GERAL_90 because the string 1.GERAL_9 is in foldersToSkip, and '1.GERAL_9' in '.../1.GERAL_90' evaluates to True.
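A quick check in the interpreter reproduces this. The path /data/1.GERAL_90 is made up for illustration; only the 1.GERAL_<n> naming comes from the script above.

```python
# Hypothetical path; only the 1.GERAL_<n> naming comes from the question.
folders_to_skip = [f'1.GERAL_{i}' for i in range(1, 89)]  # 1.GERAL_1 .. 1.GERAL_88

root = '/data/1.GERAL_90'

# Substring test: '1.GERAL_9' occurs inside '1.GERAL_90'
print('1.GERAL_9' in root)  # True

# any() over substring tests is therefore True, so the folder is wrongly skipped
print(any(x in root for x in folders_to_skip))  # True

# The skip-list entries that "matched" the path
print([x for x in folders_to_skip if x in root])  # ['1.GERAL_9']
```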

My question, then, is: how do I change my code so that this comparison is exact?

1 answer

For each directory it visits, os.walk yields a tuple of three values:

  1. dirpath, a string
  2. dirnames, a list
  3. filenames, a list

When you do x in root, root is a string, so the in operator checks whether x occurs as a substring of root. That is why you saw the directory 1.GERAL_90 being skipped: the check that actually ran was '1.GERAL_9' in '1.GERAL_90', which is True.
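If you want to keep os.walk, a minimal sketch of an exact comparison is to test each directory name against a set of folder names and prune dirs in place, which os.walk honours in its default top-down mode. The function name delete_matching_files and the files_to_del parameter are hypothetical; files_to_del stands in for the question's filesToDel, assumed to be a list of filename substrings.

```python
import os


def delete_matching_files(base, files_to_del):
    """Delete files under base whose names contain any substring in
    files_to_del, skipping the folders named exactly 1.GERAL_1 .. 1.GERAL_88."""
    # A set of exact folder names: membership is an equality test, not a substring test
    folders_to_skip = {f'1.GERAL_{i}' for i in range(1, 89)}
    removed = []
    for root, dirs, files in os.walk(base):
        # Prune skipped folders in place so os.walk never descends into them
        # (this only has an effect with the default top-down walk)
        dirs[:] = [d for d in dirs if d not in folders_to_skip]
        for file in files:
            if any(y in file for y in files_to_del):
                file_path = os.path.join(root, file)
                os.remove(file_path)
                removed.append(file_path)
    return removed
```

In the question's setting this would be called as delete_matching_files(os.getcwd(), filesToDel). Note that, like the original loop, it also deletes matching files sitting directly in the base folder.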

To do what you need, you can simplify the code by using the pathlib module together with shutil:

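from pathlib import Path
import shutil
import re

def must_be_deleted(name):
    # Match names like 1.GERAL_<n> and compare n as an integer
    groups = re.match(r'1\.GERAL_(\d+)', name)
    if groups and int(groups[1]) >= 89:
        return True
    return False


path = Path.cwd()

# iterdir() lists the entries directly inside path (a Path object itself is not iterable)
for item in path.iterdir():
    if item.is_dir() and must_be_deleted(item.name):
        shutil.rmtree(item)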

The loop goes through every item directly inside path; if an item is a directory whose name matches 1.GERAL_<n> with n greater than or equal to 89, the whole directory is deleted.
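The integer conversion is what avoids the substring problem from the question: the regex captures the digits and int() compares them as numbers, so 9 and 90 are no longer confused. A quick check of the function on a few example names:

```python
import re

def must_be_deleted(name):
    # Capture the number after "1.GERAL_" and compare it as an integer
    groups = re.match(r'1\.GERAL_(\d+)', name)
    return bool(groups) and int(groups[1]) >= 89

for name in ['1.GERAL_9', '1.GERAL_88', '1.GERAL_89', '1.GERAL_90', '1.GERAL_137', 'other']:
    print(name, must_be_deleted(name))
# 1.GERAL_9 False
# 1.GERAL_88 False
# 1.GERAL_89 True
# 1.GERAL_90 True
# 1.GERAL_137 True
# other False
```

One design detail: re.match only anchors at the start of the string, so a name such as 1.GERAL_90_old would also be accepted. Using re.fullmatch instead would restrict it to names that are exactly 1.GERAL_<digits>.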


Since the question already fixes the range (directories numbered 89 to 137), the regular expression is not needed: just iterate over that range and delete each directory that exists. I missed this detail at first, which is why I suggested the regular expression.

So it could look like this:

import pathlib
import shutil

cwd = pathlib.Path.cwd()

for i in range(89, 138):  # 89 to 137 inclusive
    item = cwd / f'1.GERAL_{i}'
    if item.is_dir():
        shutil.rmtree(item)
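Since the asker only wants to remove the files and keep the folders, a sketch of the range-based version could replace shutil.rmtree with Path.unlink on each file. The function name delete_files_in_range and its parameters are hypothetical; base would be pathlib.Path.cwd() in the question's setting.

```python
import pathlib

def delete_files_in_range(base, start=89, stop=137):
    """Delete every file under base/1.GERAL_<start> .. base/1.GERAL_<stop>,
    including files in nested subfolders, but keep the folders themselves."""
    base = pathlib.Path(base)
    removed = []
    for i in range(start, stop + 1):  # stop is inclusive
        folder = base / f'1.GERAL_{i}'
        if not folder.is_dir():
            continue
        # Collect the files first (rglob('*') is recursive), then delete them,
        # so nothing is removed while the directory tree is still being walked
        files = [p for p in folder.rglob('*') if p.is_file()]
        for f in files:
            f.unlink()
            removed.append(f)
    return removed
```

The folders, including empty nested ones, are left in place; only regular files (and symlinks pointing to files) are removed.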
  • Actually, I don't want to delete the folders, only the files inside them, but that only changes the final part of the code. The solution I was looking for is the must_be_deleted function you presented. Thank you very much!

  • Could you explain how this comparison between a string and an integer is supposed to work here: if groups and int(groups[1]) >= 89?

  • @jsbueno Which string comparison with integer?

  • I used shutil to remove non-empty directories because I had understood that the directories themselves should be deleted, not just the files in them.

  • Ah, there is a conversion there. I saw a regexp dealing with numbers in sequence and assumed "this doesn't work" before reading everything. Looking at it calmly, it's fine.

  • Right, I was thinking here: pathlib has no equivalent for everything in shutil, and no equivalent to os.walk, which is a shame. It has Path.iterdir for a non-recursive listing and Path.glob for a recursive search, but that one is "eager", not lazy like os.walk.

  • I did think about not using a regex: the first version of the code was name[name.rfind('_')+1:], but since that would also accept other name formats ending with the same kind of suffix, I ended up using a regex to keep it simple.

  • @jsbueno I only now saw that the question limits the folder names to 137, and I understood what you meant. I had not noticed this limit at first, so it can be done without the regex just fine. I will edit.
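The trade-off described in the comments between slicing with rfind and using a regex can be seen on a few made-up names. The helper names are hypothetical, and this sketch uses re.fullmatch rather than the answer's re.match so that only names that are exactly 1.GERAL_<digits> are accepted.

```python
import re

def suffix_number_rfind(name):
    # Slice approach: everything after the last '_' (the whole name if there is none)
    return name[name.rfind('_') + 1:]

def suffix_number_regex(name):
    # Regex approach: only names of the form 1.GERAL_<digits> are accepted
    m = re.fullmatch(r'1\.GERAL_(\d+)', name)
    return m[1] if m else None

for name in ['1.GERAL_90', 'backup_90', 'notes']:
    print(name, repr(suffix_number_rfind(name)), repr(suffix_number_regex(name)))
# 1.GERAL_90 '90' '90'
# backup_90 '90' None
# notes 'notes' None
```

The slice happily returns '90' for backup_90 and the whole name for notes, so a later int() call would either match the wrong folder or raise ValueError; the regex rejects both.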
