I need to write a program that finds duplicate files on my computer so that the user can decide what to do with them (e.g., delete the copies). For now, I only care about binary comparison (i.e., a file counts as a duplicate only if it is byte-for-byte identical to another).
I know that searching only by file name is insufficient, since the same file may have been saved under another name.
Is there any algorithm to compare the files?
I imagine that computing a checksum for every file and comparing all against all is wasteful, since duplicates are normally rare. I also imagine that file size alone is not enough. And a file may be duplicated more than once.
Yes, such an algorithm exists. Git itself does something similar, comparing files to determine which is the latest version, for example. I believe this link can help you: http://algorithms.openmymind.net/search/binarysearch.html
– Luiz Picolo
Compare the sizes and, if they are equal, run cksum? Are you looking for a filesystem search method, a comparison method, or both?
– Vitor Py
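The size check followed by a checksum suggested in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `file_digest` and `find_duplicates` are hypothetical, SHA-256 from Python's `hashlib` stands in for `cksum`, and symbolic links are skipped. Files are first grouped by size, and only files that share a size with at least one other file are hashed, so unique sizes never cost a read.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups (sorted lists of paths) whose contents hash identically."""
    # Step 1: group by size. Files with different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: only regular files are of interest; symlinks are ignored.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Each returned group can hold more than two paths, which covers the case of a file duplicated several times. Equal digests make identical content overwhelmingly likely but are not a proof; a final byte-by-byte check can confirm a match before anything is deleted.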
@Vitorbraga I updated the question; I am looking for a comparison method.
– woliveirajr
The size and content have to be the same, but should the creation and modification timestamps also be part of the duplicate comparison?
– Guilherme Nascimento
@Guilhermenascimento files are identical when they have the same content (i.e., the two files have the same bytes) and thus the same size. So the name, timestamps, access date, and so on do not matter; these are file "metadata".
– woliveirajr
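The definition in the comment above (same bytes, metadata ignored) can be written down directly. A minimal sketch, assuming a hypothetical helper named `same_bytes`: it rejects pairs with different sizes without reading them, then compares the contents chunk by chunk and stops at the first difference.

```python
import os


def same_bytes(path_a, path_b, chunk_size=1 << 16):
    """Return True if both files contain exactly the same bytes."""
    # Different sizes can never be identical, so skip reading in that case.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False  # first differing chunk ends the comparison
            if not a:
                return True  # both files reached end-of-file together
```

Names and timestamps never enter the comparison. Python's standard library offers `filecmp.cmp(a, b, shallow=False)` for the same purpose.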
@woliveirajr OK, I just wanted to get a sense of it, since there are several types of comparison.
– Guilherme Nascimento
@Guilhermenascimento :) yes, yes. I even thought about adding some complications in the future (such as treating an image as the same even at a different resolution), but I gave up; it is already enough work as it is. Sorry if my previous comment seemed rude!
– woliveirajr
@woliveirajr No worries, it did not seem rude at all. I found your question interesting.
– Guilherme Nascimento