Without arguments,
dups.py
checks the current directory, recursively. This has been tested on Mac OS X and Cygwin, and should also work with Python for Windows.

$ dups.py
Duplicates found:
./Data/2004/05_4/015_12A.jpg
./Data/2004/2004.09.29 Grandma/015_12A.jpg
Duplicates found:
./Data/2002/19/uvs021219-008.jpg
./Data/2006/01_2/uvs040430-006.jpg
...
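The scan behind this output can be sketched in a few lines: walk the tree, skip hidden files, empty files, and symbolic links (the script's defaults), and group paths by MD5 digest. This is a minimal reconstruction under those assumptions, not the actual dups.py source:

```python
import hashlib
import os

def find_duplicates(root="."):
    """Group files under root by MD5 digest; files sharing a digest
    are reported as duplicates. A sketch, not the actual dups.py."""
    by_hash = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # skip hidden directories as well as hidden files
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.startswith(".") or os.path.islink(path):
                continue  # default: ignore hidden files and symbolic links
            if os.path.getsize(path) == 0:
                continue  # default: ignore empty files
            digest = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            by_hash.setdefault(digest.hexdigest(), []).append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

for group in find_duplicates("."):
    print("Duplicates found:")
    for path in group:
        print(path)
```

Hashing in 64 KB chunks keeps memory flat even for large photo files.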
There are lots of nerdy options, like filtering by file size and following symbolic links. Try
dups.py -h
to see them all:

usage: dups.py [options] [<file_or_directory> ...]
Find duplicate files in the given path(s). Defaults to searching files recursively,
except for hidden files (beginning with "."), empty files, and symbolic links.
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-v, --verbose verbose
Exclusion Options:
-f, --flat do not scan directories recursively
-g n, --greater-than=n
only scan files of size greater than n bytes
-l n, --less-than=n
only scan files of size less than n bytes
Inclusion Options:
-L, --follow-links follow symbolic links (warning: beware of infinite
loops)
-H, --hidden-files include hidden files
-z, --zero-files include empty files
Miscellaneous:
-D, --delete delete subsequent duplicates (files are scanned in
argument-list order)
-c, --create-rel-links
replace subsequent duplicates with relative links
(non-Windows only)
-C, --create-abs-links
same as "-c", but links are absolute
-s, --special-hidden
changes meaning of "hidden files" (-H) depending on
platform: cygwin - uses Windows file attributes
(warning: slow); win32 - files with names starting
with "." considered hidden
P.S. I hacked together a way to detect Windows hidden files from Cygwin, but it's ugly and slow.
4/6/08 update: I added the ability to delete duplicates (-D), and create relative (-c) or absolute (-C) symbolic links.
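Replacing a duplicate with a relative link, as -c does, comes down to os.path.relpath plus os.symlink. A sketch with a hypothetical helper name, assuming a non-Windows platform:

```python
import os

def replace_with_rel_link(duplicate, original):
    """Replace `duplicate` with a relative symlink to `original`.
    A sketch of what -c could do; not dups.py's actual code."""
    # Compute the target relative to the directory holding the duplicate,
    # so the link survives moving the whole tree elsewhere.
    rel_target = os.path.relpath(original, os.path.dirname(duplicate))
    os.remove(duplicate)
    os.symlink(rel_target, duplicate)
```

With -C, the same step would simply link to os.path.abspath(original) instead.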
Brendan, are you saying the results were different for different runs on the same data?
Would it be possible for you to send me some of the files that were reported duplicates?
The script actually returns whether the md5 hash of files match. I suppose with diverse enough data there could be some false positives, but it's pretty unlikely. I can add the final comparison check to eliminate these, if this is what is actually happening.
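That final comparison could be added with a byte-for-byte check, e.g. the standard library's filecmp.cmp with shallow=False. A sketch, with a hypothetical helper name:

```python
import filecmp

def confirm_duplicates(paths):
    """Given files whose MD5 digests match, keep only those whose contents
    are byte-for-byte identical to the first one. A sketch of the final
    check mentioned above; not part of the published script."""
    reference = paths[0]
    confirmed = [reference]
    for candidate in paths[1:]:
        # shallow=False forces an actual content comparison instead of
        # trusting os.stat metadata
        if filecmp.cmp(reference, candidate, shallow=False):
            confirmed.append(candidate)
    return confirmed
```

This would only run on the rare hash-collision candidates, so the extra reads cost almost nothing in practice.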
I would suggest you try DuplicateFilesDeleter; it can help resolve duplicate file issues.