Finding Duplicate Files

Moderator: Dposcorp

Posted on Wed Nov 29, 2006 11:35 am

Is there a program to find duplicate files across multiple hard drives? Basically, I have a shared folder that too many people have had access to over the years, and I would like to see whether there are duplicate files saved in multiple places in the folder under different file names. I am using Windows XP Pro/MCE 2005.

Thank you for any advice.
apkellogg
Gerbil Elite
 
Posts: 921
Joined: Wed Feb 25, 2004 10:15 am

Re: Finding Duplicate Files

Posted on Wed Nov 29, 2006 11:47 am

apkellogg wrote: Is there a program to find duplicate files across multiple hard drives? Basically, I have a shared folder that too many people have had access to over the years, and I would like to see whether there are duplicate files saved in multiple places in the folder under different file names. I am using Windows XP Pro/MCE 2005.

Thank you for any advice.


If they have different names, then they are different files.

You would probably need a program to search by size, but I am just guessing at this point.
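
As a rough sketch of the size-based idea (assuming GNU find, sort, and awk are available, e.g. under Cygwin; the function name same_size is made up for illustration), the following lists every file that shares its size with at least one other file. A matching size only makes two files candidates, so the contents would still need to be checked.

```shell
#!/bin/bash
# Sketch: list files whose size matches at least one other file.
# Assumes GNU find (for -printf); same_size is an illustrative name.
# File names containing newlines are not handled.
same_size() {
    # Emit "SIZE PATH" for every regular file under the given directories.
    find "$@" -type f -printf '%s %p\n' |
        awk '{
                 count[$1]++                        # files seen with this size
                 lines[$1] = lines[$1] $0 "\n"      # remember the full lines
             }
             END {
                 for (s in count)
                     if (count[s] > 1)
                         printf "%s", lines[s]      # only sizes seen twice or more
             }' |
        sort -n
}
```

Invoked as same_size d:/ e:/, it prints one "size path" line per candidate, sorted by size.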
Dposcorp
Minister of Gerbil Affairs
Silver subscriber
 
 
Posts: 2415
Joined: Thu Dec 27, 2001 7:00 pm
Location: Detroit, Michigan

Posted on Wed Nov 29, 2006 11:52 am

red0510
Gerbil Elite
 
Posts: 614
Joined: Fri Mar 29, 2002 7:00 pm

Posted on Wed Nov 29, 2006 1:43 pm

Stupid shell tricks FTW! If you have a set of Unix-style shell tools (like Cygwin) available, the following script will do it:
Code:
#!/bin/bash
find "$@" -type f -exec sha1sum {} + | sort | uniq --all-repeated=separate --check-chars=40 | sed -e 's/^[^ ]* .//'

(If your browser has wrapped the above code, note that the only line break is after the "#!/bin/bash"; the rest is all one long line.)

Just invoke the script, passing the names of one or more drives or directories on the command line. The script searches all of the listed drives/folders, and lists each group of duplicate files it finds.

It works by recursively walking all of the specified drives/folders, generating a 160-bit checksum for each file, then finding all groups of files which have matching checksums.

So, e.g. if you've saved it as a script named dupfiles, the command:
dupfiles d:/ e:/
would find all duplicate files on your D: and E: drives.
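
To see the grouping on a small scale, here is a minimal sketch that wraps a variant of the same pipeline in a shell function and tries it on a scratch directory. It assumes GNU find, sha1sum, sort, and uniq are on the PATH; the function name list_dups and the sample file names are invented for the demonstration. It hands file names to sha1sum through find -exec, so names containing quotes or dollar signs are passed along untouched.

```shell
#!/bin/bash
# Sketch: list groups of files with identical SHA-1 checksums.
# Assumes GNU find and coreutils; list_dups is an illustrative name.
list_dups() {
    find "$@" -type f -exec sha1sum {} + |                   # "HASH  PATH" per file
        sort |                                               # equal hashes become adjacent
        uniq --all-repeated=separate --check-chars=40 |      # keep repeated hashes only
        sed -e 's/^[^ ]* .//'                                # strip the hash, leave the path
}

# Small self-contained trial in a scratch directory.
scratch=$(mktemp -d)
mkdir "$scratch/a" "$scratch/b"
printf 'same contents\n' > "$scratch/a/report.txt"
printf 'same contents\n' > "$scratch/b/copy of report.txt"
printf 'something else\n' > "$scratch/b/notes.txt"

# Lists the two files with matching contents; notes.txt is not listed.
list_dups "$scratch"
rm -rf "$scratch"
```

Groups are separated by a blank line, so each block of output is one set of files with the same contents.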

I love little scripting puzzles like this... and it is also an excellent illustration of why I install Cygwin on all of my Windows boxes, and why IMO everyone should learn how to use UNIX-style shell commands. You can accomplish a whole lot with very little code.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37705
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Posted on Wed Nov 29, 2006 3:38 pm

That doublekiller thing looks to be able to compare even by size and dates. :o
The Model M is not for the faint of heart. You either like them or hate them.

Gerbils unite! Fold for UnitedGerbilNation, team 2630.
Flying Fox
Gerbil God
 
Posts: 24422
Joined: Mon May 24, 2004 2:19 am

Posted on Wed Nov 29, 2006 3:40 pm

Flying Fox wrote: That doublekiller thing looks to be able to compare even by size and dates. :o

Checksums are more reliable than looking at size and date though... :D
just brew it!

Posted on Wed Nov 29, 2006 4:12 pm

http://www.tucows.com/preview/373411

http://noclone.net/

There are lots of these types of apps floating around. However, I have yet to find one that does true byte-for-byte compares, doesn't produce false positives, and is easy to use.

So far the best I have come across, aside from a very few false positives, is the now very old ACDSee 3.2 with the Duplicate File Finder plugin. It works on more than just images, too.

Of course, finding an app that old is troublesome...
TheDVDMan
Graphmaster Gerbil
 
Posts: 1276
Joined: Fri Apr 30, 2004 2:34 pm
Location: Not TR anymore!

Posted on Wed Nov 29, 2006 4:29 pm

I'm surprised that the tools give false positives; if the lengths of the files match, the tool should then do a byte-for-byte comparison of the contents to verify the match.
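
For what it's worth, that check is only a few lines of shell. This is a sketch of the idea, not a claim about how any of the tools mentioned here work; the helper name same_bytes is made up, and it assumes the standard wc and cmp utilities.

```shell
#!/bin/bash
# Sketch: confirm that two candidate files really are identical.
# same_bytes is an illustrative helper name.
same_bytes() {
    local a=$1 b=$2
    # Cheap check first: files of different lengths cannot match.
    [ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ] || return 1
    # cmp -s prints nothing and exits 0 only when every byte matches.
    cmp -s -- "$a" "$b"
}
```

A call such as same_bytes first.doc second.doc && echo identical (file names hypothetical) reports a match only when the contents are the same byte for byte, so it cannot produce a false positive.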

While false positives are theoretically possible with a checksum-based approach like the script I gave above, the odds are so low (it's a 160-bit hash, so the chance of two different files accidentally producing the same checksum is vanishingly small) that practically speaking you'll never see one.
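
To put a rough number on that: assuming the hash behaves like a uniformly random 160-bit value on ordinary, non-malicious files (an idealization), the chance that any two of n distinct files share a checksum is bounded by the usual birthday estimate. The figure of one million files is just an illustrative choice.

```latex
P(\text{collision}) \;\le\; \binom{n}{2} \cdot 2^{-160} \;=\; \frac{n(n-1)}{2^{161}},
\qquad
n = 10^{6} \;\Rightarrow\; P \;\lesssim\; \frac{10^{12}}{2.9 \times 10^{48}} \;\approx\; 3.4 \times 10^{-37}
```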
just brew it!

Posted on Wed Nov 29, 2006 5:16 pm

just brew it! wrote: I'm surprised that the tools give false positives; if the lengths of the files match, the tool should then do a byte-for-byte comparison of the contents to verify the match.

While false positives are theoretically possible with a checksum-based approach like the script I gave above, the odds are so low (it's a 160-bit hash, so the chance of two different files accidentally producing the same checksum is vanishingly small) that practically speaking you'll never see one.


Yup. I'm not sure what method ACDSee uses. It may only be a 32-bit checksum.

I can't remember the name of the other app now, the one that did give more false positives than ACDSee. I think it's just called "DupFinder" or something.

I love ACDSee's dup finder; it has great options for auto-deleting the dups it finds...very few other apps seem to have that ability.
TheDVDMan

