How to safely delete files and folders recursively with Python?

2

Guys, I'm trying to create a script to delete my files and folders in a safe way, just like shred, srm, etc. do, but I would like to do this with a Python script. I found a function on the Internet that writes random values to the file before deleting it. I know nothing is truly unrecoverable, but as a learning exercise I would like to implement a script like this.

I have this function:

def secure_delete(file_, steps=3):
    import os
    with open(file_, "ba+", buffering=0) as f:
        data = f.tell()
    f.close()
    with open(file_, "br+", buffering=0) as f:
        for i in range(steps):
            f.seek(0, 0)
            f.write(os.urandom(data))
        f.seek(0)
        for _ in range(data):
            f.write(b'\x00')
    os.remove(file_)

Passing a file as an argument I can perform the action, but I would like to do this recursively for all files in a directory, rather than opening a specific file or passing it as an argument.

Does anyone have any idea?

4 answers

3

Overwriting a file's data is not "safe" by itself, and the reason is that it is up to the operating system's filesystem (FS) layer to decide what to do when you open a file for writing. If you look closely, none of them, for various reasons, will write the data at the same physical position on the disk where the previous data was.

The idea that you can open a file for reading and writing, modify a single byte, close it, read it again and get all the original data with just that one byte changed is convenient for high-level programs, but it is only an abstraction provided by the operating system.

In practice, because of how disk access has evolved historically, the system can only write blocks of at least 512 bytes, and most likely 4 KB (4096 bytes), at a time. What actually happens when you modify a single byte is:

  • the lowest layers of the OS read 4 KB from the disk (even if the file is smaller, whatever garbage sits on the disk after the file is read into memory as well);

  • the higher-level layer of the OS changes the desired byte within that block;

  • the 4 KB block is written back to the disk, but not at the same position;

  • if everything went well so far, the FS layer of the operating system updates the file's metadata to read the 4 KB from the new position rather than the old one (especially on modern filesystems with a journaling mechanism, which allows the old version of the file to be read if a failure happens before the whole process finishes).
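
If you are curious, you can check the block size your filesystem actually works with from Python. This is only an illustrative snippet (the mount point '/home' is an arbitrary example), and os.statvfs is available only on Unix-like systems:

import os

# Ask the OS which block size the filesystem uses for this mount point.
st = os.statvfs('/home')   # '/home' is just an example path
print(st.f_bsize)          # typically 4096 - the smallest unit written at once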

Any tool with access to the raw data on the disk partition can then access the "overwritten" file data. It can take quite a bit of work to reconstruct these files from the pieces found, and many parts may indeed have been physically overwritten by the very process of rewriting the files, but that is a matter of chance rather than deliberate destruction.

If you are on a Linux operating system, the raw partition data is accessible simply by opening the special devices under /dev/ as if they were normal files. (In this case, reading and writing those device files does make bytes be read and written at the same physical disk positions - that is precisely what the filesystem code the kernel uses to expose physical devices as files does.)
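
Just to illustrate that point, here is a minimal sketch of reading raw bytes straight from a partition on Linux. The device path is an assumption (adjust it to your system) and this needs root privileges:

import os

DEVICE = '/dev/sdb1'   # hypothetical partition - replace with a real one
BLOCK = 4096           # typical filesystem block size

with open(DEVICE, 'rb', buffering=0) as raw:
    raw.seek(10 * BLOCK)       # jump to an arbitrary block on the partition
    data = raw.read(BLOCK)     # these bytes come from that exact physical offset

print(data[:64])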

So, if you have a special library that understands the data structures of the particular filesystem you intend to change, it is possible, through the raw device, to overwrite the exact bytes of the files irreversibly, as you want.

The work to do this correctly is at least an order of magnitude more complex than just using the user layer to open and write files, as you want to do. (Some filesystems are so complex that dozens of the best developers in the world took more than 10 years to access their data directly and correctly in a parallel implementation - see the history of NTFS drivers on Linux, for example, or ZFS.) But if it is FAT or FAT32 - still used on some USB drives - it is fairly straightforward to re-implement the access, and it can be a pretty fun project. (On USB drives, though, I do not know whether the low-level layer, the device firmware, remaps blocks - meaning that for any access done in software the data does look "overwritten", but it may physically still be there; on SSDs this certainly happens.)

As for overwriting the files recursively, I believe the other answers here already cover that, so I won't add yet another example of how to do it.

  • Wow, thank you for this text full of knowledge. So would it be better if, instead of overwriting the data this way, I just encrypted the file?

  • Encrypting the files is of no use in this context: the encrypted data will be written to a different position on the disk, and the original content will still be there for anyone accessing the device directly. Full-disk encryption, on the other hand, is something all operating systems allow today, and without typing the correct password when turning the computer on there is no way to read the data.

  • Now, to make access harder without going as far as overwriting the data at the physical layer of the disk, writing a file with the same name in the same folder and then deleting it is enough: with that, someone using a tool that only looks at the filesystem metadata to "undelete" a file will land at the wrong position (a sketch of this idea appears just after these comments). But you should also clear the operating system caches, and take care that mechanisms like macOS's "Time Machine" do not keep copies of previous versions of the file.

  • I understand. And in the case of Linux, how do I clear the caches...?
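
Below is a minimal sketch of the "same-name decoy" idea mentioned in the comments above. It is illustrative only: the function name is made up, it assumes a regular file path, and it does not touch the physical layer of the disk at all:

import os

def decoy_then_delete(path):
    # Delete the original, then create and delete a decoy with the same name,
    # so a tool that only follows filesystem metadata to undelete the file
    # finds the decoy's random content instead of the original data.
    size = os.path.getsize(path)
    os.remove(path)
    with open(path, 'wb') as decoy:
        decoy.write(os.urandom(size))
        decoy.flush()
        os.fsync(decoy.fileno())   # push the decoy's bytes out to the disk
    os.remove(path)                # delete the decoy too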

2


Good, building on that example you can do it like this:


import os
import shutil
import uuid

def recursive_listing(path):
    files = []

    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            files.append(os.path.join(r, file))
        for dirs in d:
            files.append(os.path.join(r, dirs))

    return files


def secure_delete_recursive(path, steps=5):

    objects = recursive_listing(path)

    for obj in objects:
        # For files (overwriting, renaming and deleting)
        if os.path.isfile(obj):
            try:
                data = os.path.getsize(obj)

                with open(obj, "br+", buffering=0) as f:
                    for i in range(steps):
                        f.seek(0, 0)
                        f.write(os.urandom(data))
                    f.seek(0)
                    for _ in range(data):
                        f.write(b'\x00')

                name = str(uuid.uuid4())
                new_file_rename = os.path.join(os.path.split(obj)[0], name)
                os.rename(obj, new_file_rename)
                # Uncomment the line below to actually delete the files recursively.
                # os.remove(new_file_rename)

            except PermissionError as p:
                print(p)

    # Go through the directories deepest-first, so renaming a parent
    # does not invalidate the recorded paths of its children.
    for obj in reversed(objects):
        # For directories (renaming and deleting)
        if os.path.isdir(obj):
            try:
                name = str(uuid.uuid4())
                new_file_rename = os.path.join(os.path.split(obj)[0], name)
                os.rename(obj, new_file_rename)
                # Uncomment the line below to actually delete the folders recursively.
                # shutil.rmtree(new_file_rename, ignore_errors=False, onerror=None)

            except PermissionError as p:
                print(p)


if __name__ == '__main__':
    secure_delete_recursive('/tmp')

You can also use the cryptography library together with the same idea as the example above; it is even safer than just overwriting with binary data, or you can combine the two approaches, which strengthens it even more. I hope this helps.
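
As a rough illustration of that suggestion, here is a minimal sketch using the cryptography package's Fernet recipe to encrypt a file's content before it is wiped and deleted (the file path is just an example, and the key handling is deliberately simplistic):

from cryptography.fernet import Fernet

# Generate a throwaway key and encrypt the file's content in place.
key = Fernet.generate_key()
fernet = Fernet(key)

path = '/tmp/example.txt'          # example path - adjust as needed
with open(path, 'rb') as f:
    token = fernet.encrypt(f.read())

with open(path, 'wb') as f:
    f.write(token)                 # the plaintext no longer appears in this file
# ...then overwrite and delete the file as in the function above.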

NOTE: If you don't have permission on some files and folders, it won't work for those objects.

  • Wow, that was just what I wanted. Thank you.

  • I will have a look at the library, yes.

  • If no call to f.flush() is made between the step that writes the random numbers and the step that writes the zeros, it is as if the first step did not even exist - in most cases the random numbers will not even be transferred to the disk.

  • Something else: for _ in range(data): f.write(b'\x00') is inefficient - it scores 15 on an inefficiency scale from 0 to 10: there are two context switches into the Python application for every byte of the file, so for a 1 GB file you could be waiting a few hours. Use f.write(b'\x00' * len(data)) instead (of course, for files on the order of gigabytes it is better to write a strategy that does this in blocks, but that is only a few more lines - see the sketch after these comments).

  • @jsbueno yes, that is true. With large files the script's performance would take a huge hit. I had not noticed this in his script.

  • So I should take out the for _ in range(data): f.write(b'\x00'), it isn't needed? I tried using * len(data), but they are different types.
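
Here is a minimal sketch of the two fixes suggested in these comments - flushing/syncing between passes and writing in blocks rather than one byte at a time (the block size and function name are just illustrative choices):

import os

def overwrite_file(path, steps=3, block=1024 * 1024):
    size = os.path.getsize(path)
    with open(path, 'rb+', buffering=0) as f:
        for _ in range(steps):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(block, remaining)
                f.write(os.urandom(chunk))   # random pass, written block by block
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())             # make sure this pass reaches the disk
        f.seek(0)
        remaining = size
        while remaining > 0:
            chunk = min(block, remaining)
            f.write(b'\x00' * chunk)         # final pass of zeros, also in blocks
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())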

2

You can list the files in the current directory using os.listdir('.'), or indicate the directory you want to read.

Another way is to use glob.glob('*.dat'), which returns a list of files, in this example those with the ".dat" extension.

The function's input could be the whole list of files, handled with a single "for" loop, or you could write a "for" loop and call the function on each file.

If you want to include files in subdirectories, consider using os.walk.
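
A minimal sketch of what this answer describes, reusing the secure_delete function from the question (the directory path is only an example):

import glob
import os

# Option 1: only the .dat files in a single directory
for path in glob.glob('/some/dir/*.dat'):
    secure_delete(path)

# Option 2: every file in the tree, including subdirectories
for root, dirs, files in os.walk('/some/dir'):
    for name in files:
        secure_delete(os.path.join(root, name))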

1

I have a script that deletes all entries from Windows cache folders, I think this can help you.

import os
import win32con, win32api

# Folders to clean (raw strings avoid problems with backslash escape sequences)
folders = [r'C:\Windows\Prefetch', r'C:\Windows\Temp']

def clear_data(locate):
    for raiz, diretorios, arquivos in os.walk(locate):
        for arquivo in arquivos:
            try:
                print(arquivo)
                # Clear read-only/hidden attributes so the file can be removed
                win32api.SetFileAttributes(os.path.join(raiz, arquivo), win32con.FILE_ATTRIBUTE_NORMAL)
                os.remove(os.path.join(raiz, arquivo))
            except Exception:
                print(arquivo + ' Error')


# Add the user's temp folder as well
temp = os.getenv('temp')
temp = temp.replace('Roaming', r'\Local\Temp')
folders.append(temp)


for folder in folders:
    clear_data(folder)

  • I understand... but does the function I wrote do the same thing as srm or shred? Like, open the file and write binary data into it so it can't be recovered?

  • The one I presented only deletes cache files from some folders such as Temp and Prefetch.

  • Inside the try block you can replace all of that with a call to your own function, and put in the list the directories you want to delete.

  • Hmm, I see. But even if I manage to do that, I would still need to know how to make the file unrecoverable and unreadable in case it is recovered. I'll take your tip for making it recursive.

  • I would do the following: turn the file into a TXT by changing its extension, then use a lib to encrypt the TXT content with a strong hash, save it back to the TXT and after that delete the file.

  • Hmm, I hadn't thought of that. Thank you :)

  • And if you only want to destroy the file, you don't need to "encrypt" it - overwriting alone already does the whole job. Another thing: a file's extension does not make the slightest difference to libraries or to running programs; it is just a hint (in the case of Windows, to the operating system - in the case of Unix, a hint only to the users) about which program should open that file. The bytes read from a file do not change no matter what extension it has.

  • But then - @Willianjesusdasilva - I hope you get the idea that you don't even need to encrypt (random data is even better), and that the extension doesn't make the slightest difference. That said, your point about deleting the system cache files is quite pertinent - I'll leave an upvote for it.
