Checksum Results Are Not As Expected

Discussion & Support for xplorer² professional

pj
Gold Member
Posts: 477
Joined: 2006 Jan 26, 14:01
Location: Florida

Post by pj »

Nikos,

Background:
I have a group of files that I have carried over from several computers and hard drives.  I recently went to open one of the Excel files and got an "Invalid format" error.  When I opened the file with Editor2, it contained nothing but x00h bytes.  I then discovered that the files in a whole group of my folders had been overwritten with x00h bytes.

The Real Question:
I went to use the Checksum column to see which files had been zeroed, so I could go back through my backups, but the Checksum column didn't show 00000000 as expected; it showed other, seemingly "random" numbers. I can provide examples if you wish.  The real question is how the Checksum column is calculated: do you use the file contents only, or do you also include the directory entry information (filename, date, time, etc.)?

My 2-cents worth:
If you include more than the file contents (including the Ctrl-Z EOF if that's included in the file size), then Checksum becomes less useful for locating exact copies of files, because NTFS, FAT and FAT32 directory entries may all hold different values for the same file.  My present home computer has a mix of formats across drives and partitions for legacy reasons. I especially have fun comparing one set of files that shows the "1 hour off" timestamp problem depending on where they're located :?.

Just looking for an answer to "The Real Question", and comments on "My 2-cents worth".  :)

-----------------------------
PJ in (sunny / rainy / stormy) Fla
(All three have happened while writing this)   :o
johngalt
Gold Member
Posts: 561
Joined: 2008 Feb 10, 19:41
Location: 3rd Rock

Post by johngalt »

Well,

I took one file and copied it from one directory to another (both NTFS under Vista) and the checksum didn't change, so the directory info is definitely *not* being taken into consideration.

I then created a small FAT32 partition in some spare space I had and copied the same file to that FAT32 partition.

In all three locations the checksum was exactly the same.
nikos
Site Admin
Posts: 15799
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

The checksum column isn't MD5 or anything that robust; it's just a simple position-aware sum, chosen for speed. The algorithm is:

state = 101
sum = 0
prb = file buffer

while (file)
    sum += (*prb++) | state++;
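For readers who want to try it, here is a runnable C sketch of the loop above. It is not xplorer²'s actual source: the struct and function names are hypothetical, and it assumes `sum` and `state` are 32-bit unsigned integers that wrap modulo 2^32, since the post does not give their types. The bitwise OR (`|`) is kept exactly as posted.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; mirrors the posted loop, not xplorer2's real code.
 * Assumption: 32-bit unsigned sum and state, wrapping modulo 2^32. */
typedef struct {
    uint32_t state; /* per-byte counter, starts at 101 as in the post */
    uint32_t sum;   /* running checksum */
} pos_sum;

void pos_sum_init(pos_sum *ps)
{
    ps->state = 101;
    ps->sum = 0;
}

/* Fold one buffer into the running checksum: sum += byte | state++ */
void pos_sum_update(pos_sum *ps, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        ps->sum += (uint32_t)buf[i] | ps->state++;
}

/* Checksum a whole stream in 4 KiB chunks.
 * Returns 0 on success, -1 on a read error. */
int pos_sum_file(FILE *fp, uint32_t *out)
{
    unsigned char buf[4096];
    size_t n;
    pos_sum ps;

    pos_sum_init(&ps);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        pos_sum_update(&ps, buf, n);
    if (ferror(fp))
        return -1;
    *out = ps.sum;
    return 0;
}
```

Because `state` is carried in the struct, feeding a file in one buffer or in several chunks gives the same value. Since `0 | state` is just `state`, 16 zero bytes come out as 0x6C8, which matches the `zerotest.hex` result reported later in the thread.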
pj
Gold Member
Posts: 477
Joined: 2006 Jan 26, 14:01
Location: Florida

Post by pj »

I don't understand: if the contents of a file are all x00h's, how can the displayed checksum be anything other than 00000000?

I created a simple file with 16 x00h's in it (I wish I knew how to upload a file or image <sigh>) and received the following checksum:

Name	Size	Checksum
zerotest.hex	16 B	000006C8
Is the "sum" variable a floating-point type, or something else that stores a mantissa and exponent?  I'm too many years away from bit twiddling to really understand what's happening, but I do remember that adding a bunch of binary 0s always resulted in 0.

----------------------

puzzled PJ in (cloudy) FL
pj
Gold Member
Posts: 477
Joined: 2006 Jan 26, 14:01
Location: Florida

Post by pj »

Answering myself:

OK, I now see: you are ADDING 101 to the first byte, then 102 to the next byte, and so on.  For 16 bytes, those values sum to 1736, or 000006C8h.
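A worked check of that arithmetic: combining a zero byte with `state` (by OR or by addition) just yields `state`, so each of the 16 bytes contributes its counter value, 101 through 116. The second line generalizes to any all-zero file of n bytes, ignoring wrap-around of the accumulator, whose width the thread does not state.

```latex
\sum_{k=101}^{116} (0 \mathbin{|} k) = \sum_{k=101}^{116} k = \frac{16\,(101+116)}{2} = 1736 = \mathtt{0x6C8}

\sum_{k=101}^{100+n} k = \frac{n\,(n+201)}{2} \qquad \text{(all-zero file of $n$ bytes)}
```

The two operations only agree here because every byte is zero. The posted loop uses a bitwise OR, which differs from addition once bytes are non-zero: 1 | 101 = 101, whereas 1 + 101 = 102.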

Now that I know how you get the answer, my question is: WHY?!

Is this some "best practices" algorithm for generating a "checksum"? (I put "checksum" in quotes because when I was programming EPROMs using the Intel HEX format, the checksum was just the sum of the bytes and nothing else.)
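For contrast, a C sketch of the classic checksum pj expected: a plain byte sum, which is 0 for any all-zero file. One detail from the Intel HEX format itself: each record's checksum byte is the two's complement of the low byte of the sum of the record's byte-count, address, record-type and data bytes, so the whole record sums to 00h. The function names here are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic additive checksum: every byte weighted the same, no position term. */
uint32_t plain_byte_sum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Intel HEX record checksum: two's complement of the low byte of the sum of
 * the byte-count, address, record-type and data bytes. */
uint8_t intel_hex_checksum(const unsigned char *rec, size_t len)
{
    uint8_t low = (uint8_t)plain_byte_sum(rec, len);
    return (uint8_t)(0x100 - low); /* low == 0 wraps back to 0 */
}
```

For example, the record `:0300300002337A1E` has bytes 03 00 30 00 02 33 7A, which sum to E2h, and its checksum byte is 1Eh.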

-----------------
PJ in (sunny :D ) FL
fgagnon
Site Admin
Posts: 3737
Joined: 2003 Sep 08, 19:56
Location: Springfield

Post by fgagnon »

I can verify your conclusion, pj. :!:  :cry:  

... I had never suspected that what nikos calculates and shows in the "checksum" column was anything other than the simple sum of the bytes (my expectation too, from PROM programming daze experience)  :shock:
nikos
Site Admin
Posts: 15799
Joined: 2002 Feb 07, 15:57
Location: UK
Contact:

Post by nikos »

Without the order-sensitive part of the sum, a file containing "ab" would get the same checksum as one containing "ba".
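A small C sketch makes the point concrete, assuming 32-bit unsigned arithmetic; the function names are hypothetical. It also shows something easy to miss: it is the bitwise OR in the posted loop that makes the sum order-sensitive. If the counter were simply added to each byte, the counter terms would total the same constant for every file of a given length, so "ab" and "ba" would still collide.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain sum: insensitive to byte order. */
uint32_t plain_sum(const unsigned char *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* The posted loop: each byte ORed with a counter starting at 101. */
uint32_t or_position_sum(const unsigned char *p, size_t n)
{
    uint32_t sum = 0, state = 101;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)p[i] | state++;
    return sum;
}

/* Hypothetical variant that ADDS the counter instead of ORing it. The counter
 * terms always total the same for a given length, so this tells permutations
 * apart no better than plain_sum does. */
uint32_t add_position_sum(const unsigned char *p, size_t n)
{
    uint32_t sum = 0, state = 101;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)p[i] + state++;
    return sum;
}
```

With ASCII input, "ab" gives 203 and "ba" gives 206 under the OR version, while the other two versions return equal values for the pair. The OR is not a cure-all, though: it discards any byte bits that are already set in `state`, so a first byte of 00h and a first byte of 01h both contribute 101.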
fgagnon
Site Admin
Posts: 3737
Joined: 2003 Sep 08, 19:56
Location: Springfield

Post by fgagnon »

Yes, I was just thinking about that, and decided that it is a simple (even clever) way to return a different 'sum' for files which are merely byte-swizzled, and would otherwise return the same (classic) checksum.  8)
pj
Gold Member
Posts: 477
Joined: 2006 Jan 26, 14:01
Location: Florida

Post by pj »

<sigh> OK, OK!  I surrender!  Nikos, you are absolutely correct that the byte order has to be accounted for, hence the incrementing "state" variable.

Actually, I do agree it's very clever!

Now I need to find a byte-sum utility I can run in batch to locate all my zeroed files, so I can restore the ones (pun intended) that need it.
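For that batch job, testing the bytes directly is more reliable than any checksum. With the posted loop, for instance, a 16-byte file whose first byte is 01h and whose other bytes are 00h also comes out as 000006C8, because 1 | 101 = 101. Below is a minimal C sketch (the function name `stream_is_all_zero` is hypothetical) that reads a stream and reports whether every byte is 00h.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if every byte in the stream is 00h (an empty stream also returns 1),
 * 0 as soon as a non-zero byte is found, and -1 on a read error. */
int stream_is_all_zero(FILE *fp)
{
    unsigned char buf[8192];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (buf[i] != 0)
                return 0;
    }
    return ferror(fp) ? -1 : 1;
}
```

A scanner built on it would open each file with `fopen(path, "rb")`, print the path when the function returns 1, and skip zero-length files separately if those should not count. Standard C has no portable directory walk, so the list of files would come from the shell.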

---------------------------
(searching) PJ in (dark) FL