Finding duplicate files based on content (2)

Discussion & Support for xplorer² professional

Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

I wasn't belittling your frustration with x2's "anal precision" by meandering into esoterics, I was simply pointing out how your problem came to be, historically.

You're asking x2's "compare by content" to become (almost) "content aware" instead.  Yes, for you in this context it is logical to backstep and supply an ASCII switch... but what happens when people start asking for DOC (text-content-only) or PDF (text-content-only) type comparisons?  This is where the other utilities you have (reluctantly) discovered such as CSDiff have found their feeding ground (much the same way as we've learned to integrate 3rd party renaming utilities into x2).

x2 is, after all, a purposefully lean file manager, not a desktop suite.

At the end of the day, x2 is comparing the files themselves (and indeed, their true contents), which have obviously been altered.  How significant that alteration is remains, as fgagnon pointed out, subject to human interpretation, and limited to that.  Whether HTML files are generalised as "Text" or "ASCII" is somewhat beside the point.

Again, I'm not belittling: just saying that the human-judgement of content is paramount in your case.  And, well, despite the ubiquitously commented '2' after your name, in this case you are the best human judge, and thus forced to hone your perspicuity to find out wherein the content differences lie.  (Sorry, couldn't resist a little flavour. :wink:)

(The argument that "everyone else does it" has never seemed to hold much sway with Nikos, and that can be as frustrating as it is respectable.)
Robert2
Gold Member
Posts: 673
Joined: 2004 Jun 17, 15:39

Post by Robert2 »

FileZilla has several “File Types” options. The “Default transfer type” can be “Auto”, “ASCII”, or “Binary”. “Auto” is the default value and was on when I originally transferred my files. It is now set to “Binary” to comply with xplorer² Pro's idiosyncrasy regarding HTML files.

FileZilla also has “Automatic file type classification”. It includes a list of the file types it treats as ASCII files by default. Out of the box, that list includes the “.htm” and “.html” file types. Here it is (now without the “.htm” and “.html” file types on my system):

am
asp
bat
c
cfm
cgi
conf
cpp
css
dhtml
diz
h
hpp
in
inc
js
jsp
m4
mak
nfo
nsi
pas
patch
php
phtml
pl
po
py
qmail
sh
shtml
sql
svg
tcl
tpl
txt
vbs
xhtml
xml
xrc

That list makes sense to me. And, it seems, to most of the available file/folder comparison utilities.
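As a rough illustration of how an extension list like this one can drive the text/binary decision, here is a minimal Python sketch. The function name `is_text_by_extension` is made up for the example, and the code only mirrors the list quoted above (with “htm” and “html” put back); it is not how FileZilla itself is implemented.

```python
from pathlib import Path

# Extensions copied from the list above, plus the two that were removed
# from it on this system (htm, html).
ASCII_EXTENSIONS = frozenset("""
am asp bat c cfm cgi conf cpp css dhtml diz h hpp in inc js jsp m4 mak nfo nsi
pas patch php phtml pl po py qmail sh shtml sql svg tcl tpl txt vbs xhtml xml xrc
htm html
""".split())


def is_text_by_extension(filename: str) -> bool:
    """Return True if the file's extension is on the ASCII list (case-insensitive)."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in ASCII_EXTENSIONS
```

A comparison tool could use such a check to decide whether line-ending differences should count as real differences for a given file.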

So it isn’t really a question of “human-judgement of content”. It is a question of universally accepted standards of what is a “text” file.

But then again, “.doc” or “.pdf” files are universally considered as “binary” files.

Try to open any DOC or PDF file in a “pure text” editor, and you’ll see what I mean. On the other hand, you’ll have no problem opening any HTML file in the same pure text editor. “Go figure!”, as they say…

Now if I had to rely solely on my “perspicacity” to determine in what way these files differ, it would take ages! And it would be a complete waste of time! There are hundreds of these files! Only a few of them actually differ in what really matters (textual content). Xplorer² Pro reports most of them as “different in content”. Whether their new lines are coded in Unix or Windows style does not affect in any significant way how they display in the browser. These HTML files are, to all intents and purposes, identical. And this is all that matters.

I don’t see why anybody would be interested in finding out that the new lines in their HTML files are sometimes coded in Unix style and sometimes in Windows style. The difference seems to be purely internal, and does not affect the actual operation of the files, whether off or online.

My argument wasn’t so much that everyone else does it, but that all free utilities do it that way. After all, xplorer² Pro is paid software…

I know that Nikos would probably answer “tough luck!”, but still…

As far as “Robert2” is concerned, a funny thing happened to me recently. I wanted to post a message on a forum. I tried to register as simply “Robert”. The answer was “try another name. This one is already in use”. I tried “Robert2”. Same answer. Then I tried “Robert3”, just for fun. Same answer. I tried all subsequent numbers, up to “Robert10”. All got rejected. Then I tried “Robert11”. It got accepted! So I can now log into that forum as “Robert11”. I searched that forum for postings by any “Robert2/3/4, etc”. No “Robert” of any order ever posted any message on that forum…
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

I'll accept that generalised argument, though a quick google to Define: text file seems to throw up a world of vagaries.

How or never, considering this thread is now at around 230-odd views, we can always pretend that it's a democracy (as they say in the movies, "it's a pleasant fiction") and see what the demand from the haunting masses might be.

At least Nikos covers himself by pointing out in the tooltip for Sync by File Content: "Compare files byte by byte; accurate but slow", so it doesn't pretend to the other (perspicaciousness counts :wink:).
Robert2
Gold Member
Posts: 673
Joined: 2004 Jun 17, 15:39

Post by Robert2 »

As far as I am concerned, Googling for anything always throws up a world of vagaries with a diamond core, generally (or luckily) located in the first page(s). This is an area where human perspicacity is indispensable.

The “Compare Suite Light” options include “Always check content (slowest)”. The results of such comparisons are as expected (at least by me): apparently, their only worry is with the textual content.

“ExamDiff” has an option to “Treat both files as text files”. This is ideal. No-nonsense interface, practically-minded.

So on both sides, we have claims of “accuracy”. Guess which results I personally regard as accurate?… Nikos could rightly say that, stricto sensu, the “Compare Suite Light” results are inaccurate. But, they are accurate enough to tell me which textual content is actually different from one file to the next. Xplorer² does nothing of the kind. If xplorer² said “one file has Windows coding for new lines, the other has Unix coding, but the textual content is the same (or is different)”, I would be very happy. Instead of that, I am left to my own devices, potentially with a mass of files to examine manually two by two…
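The kind of three-way report described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (whole files read into memory, line endings limited to CRLF, CR and LF); the function names are invented for the example and have nothing to do with how xplorer² or any of the utilities mentioned here work internally.

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def classify(a: bytes, b: bytes) -> str:
    """Say whether two byte strings are identical, differ only in line endings, or differ in content."""
    if a == b:
        return "identical"
    if normalize_newlines(a) == normalize_newlines(b):
        return "same text, different line endings"
    return "different content"


def classify_files(path_a: str, path_b: str) -> str:
    """Apply classify() to the raw bytes of two files."""
    return classify(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

The byte-by-byte check still runs first, so a strict comparison loses nothing; the newline normalisation only adds a middle verdict for files such as HTML pages uploaded in ASCII mode.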

Have no worry though, I’ll turn to any of the free comparison utilities for this job. And I won’t have to go through all this again. I’ll upload all my files in “binary” mode in future. As I understand things, this is supposed to forestall any such problems.
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

Slightly off-topic here, but as you've reluctantly become the local expert in file comparison utilities :D, is it just me or is ExamDiff Pro not a bit steep at $35US for a measly 1 year license?  I know there's a free version as well, but the comparison chart doesn't give it a lot of credence; besides, there's just something under-class about using something when you know it's inevitable that you'll be hitting your head off the ceiling every few weeks.  (I mention ExamDiff simply because it looks rather nice, from a malapropishly alliterative comparison of conspiratorial comparators point of view.)  It also has a humorous twist on the idea of a program icon, which is always something which catches my frivolous eye. :wink:

Pity that the field of (proper freeware) dedicated compare utilities isn't as rich in quality as the field of renaming utilities (of which it's actually rather difficult to find one you have to pay for).
Flux
Member
Posts: 34
Joined: 2005 Oct 04, 12:04

Post by Flux »

Pure binary compare is slow? Are you kidding?
It's the fastest (and easiest) way to compare two files!
In all the other cases you have to actually *parse* the files and that needs time.
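To make the point concrete, a plain binary comparison needs no parsing at all: check the sizes, then read both files in chunks and stop at the first mismatch. The following Python sketch shows the idea; the function name and the 64 KB chunk size are arbitrary choices for the example.

```python
import os


def same_bytes(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> bool:
    """Return True if the two files have exactly the same bytes."""
    # Files of different sizes cannot be identical, so no reading is needed.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first difference found, stop early
            if not chunk_a:
                return True   # both files ended without a mismatch
```

Python's standard library offers the same kind of check as `filecmp.cmp(path_a, path_b, shallow=False)`.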

I'm a very happy ExamDiff Pro user.
In the many options provided, there is the use of plugins.
Via the plugins you can effectively compare doc, pdf, html...
Robert2
Gold Member
Posts: 673
Joined: 2004 Jun 17, 15:39

Post by Robert2 »

First, I am no expert. Not in anything, not even in file comparison utilities! And I only have basic needs in the matter. This is why I am not prepared to spend any money on such utilities.

What's more, I am still using a German shareware called Miedit dating back to the time of Windows 3.1! It is a pretty good simple pure text editor with file comparison capability. So, most of the time when I really need to compare text files, I use Miedit which does a pretty good job of it. It has the added advantage that I can edit either of the 2 files within Miedit itself.
This said, the free version of ExamDiff has 2 shortcomings in my eyes, i.e. it does not compare folders, only documents, and it does not show which part of a line is actually different when lines are only partially different. ExamDiff has an option to “treat both files as text files”, but I am not sure this matters because the other comparison utilities seem to have that option on by default when they deal with text files.

CSDiff, on the other hand, can compare folders, and shows which actual words are different in a line. Note that there is a paid version called HTMLDiff which, they claim, shows differences “in a [more] graphical way” and, as the name indicates, deals specifically with HTML files.

Compare Suite Light (http://www.freefilecompare.com) has a rather more impressive graphical interface. It can compare both files and folders, and has more options (search, exclusion list, etc). It might be worth a look. It seems to be the most elaborate of the 3.

HTH.
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

For those with a peculiar brain type (mine?) playing with all these at the same time is extremely entertaining.

Direct integration with x2 for user-commands is unfortunately limited by a distinct lack of all-selected equivalents for $G (in other words, $A from the inactive pane), but accepting the basics of $L and $R one can just let the application sort out the flotsam and jetsam.

You're right though, Compare Suite Light is certainly the flashiest to play with... CSDiff's approach to showing a pre-merged (single-pane) revision takes a little getting used to; CSL's dual pane revision is more familiar, given that I'm used to using Notepad++'s Compare plug-in (which is great for giving distinct differences in files, but falls down on the merge-front).

The limitations of these "lite" versions as regard non-binary-only (not a problem in ExamDiff Pro or CS Pro - hence the "Pro" bit :sad:) are a bit frustrating as far as being a full time solution, but that's the trouble with "free stuff", isn't it?  Always a hiccup.

Thanks for the suggestions... (though you owe me a few hours of my life back for making me fascinated with this weirdness :wink:).  That said, I don't consider the time spent to be a waste.

I really should get a nice, safe hobby, like building model planes or something.  But then I suppose I'd just fret over the differences in brands of glue.  Maybe sword-fighting?  Too many types of wounds to heal.  Horse riding?  Bloody big animals don't come with reliable brakes.

<Sigh>

Sometimes I hate my brain. :D
vserghi
Silver Member
Posts: 309
Joined: 2002 Mar 19, 08:54
Location: UK

Post by vserghi »

Have you guys tried WinMerge?
Vas
Robert2
Gold Member
Posts: 673
Joined: 2004 Jun 17, 15:39

Post by Robert2 »

I just quickly ran WinMerge.
WinMerge has an option to “Ignore carriage return differences (Unix/Mac/Windows)”. Go figure!
When that option is off, all the files in my 2 folders are reported as different.
When that option is on, all the files in the same 2 folders are reported as identical, except for the one (test) file whose textual content is actually different.
On a given line, WinMerge does not highlight the differences quite as accurately as CSDiff, unless you activate “View line differences at character level” (“Editor” options).
WinMerge also has “Enable moved block detection”, which can be quite useful.
WinMerge seems to be a very good free offer.
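For anyone curious what such an option amounts to, here is a small Python sketch of a folder comparison with a switch for ignoring line-ending differences. It is only an approximation of the behaviour described above, not WinMerge's actual implementation; the names `differing_files` and `ignore_eol` are invented, and the sketch only looks at files sitting directly in the two folders (no subfolders).

```python
from pathlib import Path


def _normalized(path: Path) -> bytes:
    """Read a file and convert CRLF and CR line endings to LF."""
    return path.read_bytes().replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def differing_files(left: str, right: str, ignore_eol: bool = True) -> list[str]:
    """Return the names of files present in both folders whose content differs."""
    left_dir, right_dir = Path(left), Path(right)
    load = _normalized if ignore_eol else Path.read_bytes
    result = []
    for left_file in sorted(left_dir.iterdir()):
        right_file = right_dir / left_file.name
        if left_file.is_file() and right_file.is_file():
            if load(left_file) != load(right_file):
                result.append(left_file.name)
    return result
```

With `ignore_eol=False` this behaves like a strict byte-by-byte folder comparison; with `ignore_eol=True` only files whose text really differs are listed.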
HTH.
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

Robert2 wrote: WinMerge seems to be a very good free offer.
And she does binary, plus x64 integration.

The only thing I don't like (aside from default pane-size - is this adjustable?) is that it seems to insist upon single-pane folder comparison.  Am I the only one who doesn't find that intuitive, or is it just a symptom of being spoilt by x2?

Curiously, the internal left and right shell menus do not respect disabled shell extensions (you get them even if they are intentionally disabled, say with ShellExView, etc - be they active [dll] or passive [registry]).  How very odd. :shock:

vserghi, you just took all the fun out of it: instant winner, for me, anyway.  Cheers!
snakebyte
Gold Member
Posts: 430
Joined: 2003 May 07, 07:14
Location: Seattle

Post by snakebyte »

My choice of tool in this case would be Beyond Compare. BC has two comparison modes: 1) side-by-side folders and 2) side-by-side files. You have the option of defining "unimportant text" rules. While doing file comparison in either of these modes you can ask BC to ignore unimportant text.

The UI to define unimportant text rules is very flexible. The following screenshot shows an unimportant-text rule for C/C++/C# source files. If this rule is applied during file comparison, all differences in white space and code comments are ignored.
[Image: screenshot of the rule definition, not preserved]

BC is not free but you can download the evaluation version to try this feature.
Help! I'm an AI running around in someone's universe simulator.
narayan
Platinum Member
Posts: 1430
Joined: 2002 Jun 04, 07:01

Post by narayan »

kdiff3, meld and FreeFileSync are also good apps.

A powerful feature in kdiff3/WinMerge: it aligns the matching lines vertically across the two or three files being compared, by inserting gaps. This enables you to see the lines side-by-side. In other apps, fancy color-bands correlate the matching lines, but you have to compare lines located at different heights.

AFAIK kdiff3 is the only app that actually highlights the difference in characters. All the other applications just color-code the entire lines, but do not point out exactly which characters are different. That is for you to find out!
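Character-level highlighting of this kind can be approximated with Python's standard `difflib` module. The sketch below is only an illustration of the idea, not a description of how kdiff3 does it; `char_differences` is a name invented for the example.

```python
from difflib import SequenceMatcher


def char_differences(old: str, new: str) -> list[tuple[str, str, str]]:
    """List the character runs that differ between two lines.

    Each entry is (operation, text_in_old, text_in_new), where operation
    is one of "replace", "delete" or "insert".
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

A viewer would colour only the returned runs instead of the entire line, so the reader does not have to hunt for the changed characters.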
nikos
Site Admin
Posts: 15791
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

the diff tool in TSVN (tortoisemerge) is also good and free but i always find myself using the basic windiff which also tracks lines changed just in position (moved up or down). TSVN diff tool considers these as changed, doh!
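One simple way to tell "moved" lines from genuinely changed ones is to run an ordinary diff and then look for lines that show up on both the removed side and the added side. The Python sketch below illustrates that idea with the standard `difflib` module; it is an assumption-laden toy, not a description of how windiff or TortoiseMerge work, and `moved_lines` is an invented name.

```python
from difflib import SequenceMatcher


def moved_lines(old: list[str], new: list[str]) -> list[str]:
    """Return lines that a diff reports as removed from one place and added in another."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    added_set = set(added)
    # A line that vanished in one spot and reappeared in another has only moved.
    return [line for line in removed if line in added_set]
```

Lines that appear only on the removed side or only on the added side remain real changes.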
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

snakebyte wrote: My choice of tool in this case would be Beyond Compare.
One rather nice feature of this one (for those who have a need for it) is proper image (png, jpg, etc.) comparison with a graphical representation of the differences between pics.  While this is nothing that can't easily be done manually in layer differentiation with a decent image editor, it is very convenient to have on the file-managing level.

A bit pricey, though, for the Pro version, but probably one of the best of the "paid-for" lot.