i-DeClone Returns 100% similarity on completely different TIFF images

Moderator: nikos

Mike707
Member
Posts: 10
Joined: 2024 Jan 02, 12:52

Post by Mike707 »

Hello,

While searching for duplicate photographs across different volumes, i-DeClone grouped various TIFF photographs together, with reported similarity ranging from 97% to 100%. I thought that was odd, because the names are different, and I know what the photos are about.

Sure enough, I opened them and they are absolutely different. I mean totally different: different places, indoors/outdoors, people/no people, etc.

Below is an example group of results:

Name          Size     Date modified     Group #  % Similar
que_003_.tif  18.8 MB  03/12/2020 18:15  2        97
que_009_.tif  17.7 MB  03/12/2020 18:17  2        100
que_008_.tif  17.6 MB  03/12/2020 18:17  2        100
que_007_.tif  17.6 MB  03/12/2020 18:16  2        100
que_006_.tif  18.3 MB  03/12/2020 18:16  2        98
que_005_.tif  17.6 MB  03/12/2020 18:16  2        98
que_004_.tif  17.6 MB  03/12/2020 18:15  2        100

I can post a snippet of the photographs if necessary.

I don't think i-DeClone uses a hash comparison to check for files that are 100% the same, but I checked just in case and, of course, they are completely different:

The only similarity is that all of these were scanned with the same scanner at the same settings, but I have hundreds of other photos scanned the same way that don't show up as duplicates when I scan their folders.

If I change the scan to find only 100% similarity and compare file contents, then indeed those files are not identified as duplicates.

Does that mean that, to be on the safe side, we should always compare the contents when seeking duplicates in photographs?

Thanks

nikos
Site Admin
Posts: 16295
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

I take it you are scanning for similar photos. The question is, what kind of options did you use? Did you leave the i-DeClone defaults or have you added your own criteria?

Post by Mike707 »

Hello,

The photos are quite (or completely) different. Here's a sample: https://imgur.com/a/oD1sqME
  • Blurred sample of 6 photos. Sufficient to see how different they are.
  • Selected settings
  • Results

Maybe I'm picking the wrong settings, but I had initially gone for the default ones (in fact, I've never changed the settings until I realized this was happening with these photographs).

Thanks

Post by nikos »

I see the problem is your settings. First, you should be searching for PHOTOS, not all files.
Then make sure that CONTENT is ticked in the advanced scan property page (it should be on by default if you ticked similarity search).
See here for more information: www.zabkat.com/declone/phone-remove-similar-photos.htm

Post by Mike707 »

Hello,

OK, so when it comes to photographs (and other file types), the only way to really make a good comparison is by comparing the file contents. That makes sense.

Thanks a lot!

Post by Mike707 »

Hello, Nikos,

So, I've been following your suggestions (search for Photos, compare content), but I'm still getting "100% match" photographs that are quite different.

I uploaded a set of "100% match" photos to the link below (available for 14 days). I pixelated some photos that have people in them, but even so it is very obvious that they have different content.

I notice that for one set, the photos were taken on the same day at the same event, but even then, sizes and/or content are different.

I've been doing hash comparisons using other tools, and so far that's been very efficient in identifying photographs that are 100% equal. I do miss out on similar photographs (same picture in a different format or with different dimensions), but at least I don't run the risk of deleting photographs that aren't equal.

Link: https://app.filen.io/#/f/eecd6461-000d- ... bzlwsvh5se

Password: I'll send you a direct message with it.

Post by nikos »

Again, the question is: what settings did you use?

Post by Mike707 »

The following:
  • Search for: Photos
  • Search for Sub-folders too
  • Find files similar by at least 94%
  • Properties that determine if two files are duplicate or not: Automatic, based on scan type
  • Files must have same extension: yes
  • Compare file content (slower): yes
  • Skip protected system files and folders

Post by nikos »

Taking a sample of your pictures, these are correctly considered "similar", IMHO:
[Image]

i-DeClone's picture similarity algorithm has no way of giving an exact similarity measure. It statistically samples parts of the photos a few times. This means that completely different photos can occasionally be deemed "similar", and the similarity measure will not represent the actual similarity that your own eyes would expect.
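To illustrate why comparing only a few sampled spots can report 100% for different pictures, here is a minimal Python sketch. It is not i-DeClone's actual algorithm, which is not shown in this thread; the function names, the brightness tolerance and the synthetic "scans" (same light border, different centre) are all assumptions made for the example.

```python
import random


def sampled_similarity(img_a, img_b, points, tolerance=10):
    """Percentage of the given (row, col) points whose brightness differs by at most `tolerance`."""
    hits = sum(1 for y, x in points if abs(img_a[y][x] - img_b[y][x]) <= tolerance)
    return 100 * hits // len(points)


def random_points(height, width, count, seed=None):
    """Pick `count` random (row, col) sample positions (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randrange(height), rng.randrange(width)) for _ in range(count)]


def full_similarity(img_a, img_b, tolerance=10):
    """Same measure, but computed over every pixel instead of a few samples."""
    height, width = len(img_a), len(img_a[0])
    every_pixel = [(y, x) for y in range(height) for x in range(width)]
    return sampled_similarity(img_a, img_b, every_pixel, tolerance)


def make_scan(center_value, size=100, border=20, border_value=240):
    """Synthetic grayscale scan: a light border around a uniform centre."""
    def inside(i):
        return border <= i < size - border
    return [[center_value if inside(y) and inside(x) else border_value
             for x in range(size)] for y in range(size)]


# Two made-up pictures that share only their border.
beach = make_scan(center_value=200)
power_supply = make_scan(center_value=30)

# If every sample happens to land on the shared border, the pictures look identical.
border_points = [(0, 0), (5, 50), (50, 5), (99, 99), (95, 50), (50, 95)]
print(sampled_similarity(beach, power_supply, border_points))  # 100
print(full_similarity(beach, power_supply))                    # 64
```

With `random_points` the outcome depends on where the samples fall, which is why a handful of samples can occasionally score unrelated pictures as a perfect match.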

The photo comparison was the first one I developed a few years ago, and it needs improving. In the meantime, use a similarity level above 95% (do you have the latest version, 1.92?), which uses a stricter comparison for fewer false positives.

And always double-check the results before you start deleting!

Post by Mike707 »

Indeed they are "very similar", but not 100%, and to be honest I expected that comparing the contents would be the same as doing a hash comparison.

Actually, is there such an option as a hash or checksum comparison? I would suggest including that (even if that means taking longer to do the scan), say CRC32 or MD5, or whatever is more efficient (I'm not an expert). By checking these, the user can be certain whether the files are the same or not, so one could use "same CRC" as an option for marking/grouping.

I don't have version 1.92; I will download and try that.
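The checksum idea can be sketched outside i-DeClone in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 is swapped in for the CRC32/MD5 mentioned above, the function names are hypothetical, and nothing here reflects how i-DeClone works internally.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large TIFFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical_files(paths):
    """Return lists of paths whose contents hash to the same digest."""
    # Files of different sizes cannot be identical, so only hash same-size candidates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[Path(path).stat().st_size].append(path)

    by_digest = defaultdict(list)
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        for path in candidates:
            by_digest[file_digest(path)].append(path)

    return [sorted(map(str, group)) for group in by_digest.values() if len(group) > 1]


# Example usage (the folder name is made up):
#   groups = group_identical_files(Path("scans").rglob("*.tif"))
#   for group in groups:
#       print(group)
```

A matching digest only says the files are byte-for-byte the same; the same picture saved in another format or at another size will not match, which is the trade-off described above.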

Thanks.

Post by nikos »

Binary content comparison is very easy and accurate; see here how to turn it on:
www.zabkat.com/declone/find-renamed-duplicates.htm

I agree that in photo similarity i-DeClone shouldn't return 100% for images that are not exactly the same. I will take care of it in the next update.
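For readers who want to double-check a suspicious pair by hand, a byte-for-byte comparison can be done with Python's standard `filecmp` module. This is a generic sketch rather than i-DeClone's implementation, and `are_identical` is a hypothetical helper name.

```python
import filecmp


def are_identical(path_a, path_b):
    """True only if both files contain exactly the same bytes.

    shallow=False makes filecmp read and compare the contents
    instead of trusting size and modification time alone.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)


# Example usage with two files from the results table earlier in the thread:
#   are_identical("que_008_.tif", "que_007_.tif")
```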

Post by Mike707 »

I tried the latest version and got similar results, with an interesting addition (a power supply was a 100% match with a beach). I uploaded file "07" to the same shared folder, or direct link to file (no password required, expires in 14 days). Too bad we can't upload images directly here, but I guess it's a limitation of the forum backend.

Indeed, I'll look into binary comparison more often, to be 100% sure.

Thanks again.

Post by Mike707 »

UPDATE: I tried unticking the "Automatic" option and keeping "Compare file content", and it worked well: it only scanned photographs and only identified photographs that were 100% equal.

And it was pretty fast.

Thanks again.