Editor² 2.6.0.4 (Unicode) corrupts file when edited

Questions & Answers

Moderators: fgagnon, nikos, Site Mods

JRz
Gold Member
Posts: 560
Joined: 2003 Jun 10, 23:19
Location: NL

Post by JRz »

Nikos,

I found that editing a UTF-8 file (the status bar correctly reports it as UTF-8) results in a corrupted file when saved.

I've reported anomalies when viewing files before (see weird characters in plain text files), with all kinds of weird characters showing up. You said it had to do with the RTF control you use and that the underlying data was not corrupted.

But now it is!! When I save such a file (usually after I've changed something) it sometimes gets truncated, probably because there's an end-of-file character in there somewhere, which I suspect is due to the RTF anomalies.

This will have to be fixed, because as it stands you cannot rely on E² saving the file correctly :(

I'll send you a PM with the file I'm having trouble with. Maybe you can make something out of it (I hope).
Dumb questions are the ones that are never asked :turn:
nikos
Site Admin
Posts: 15771
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

does this also happen for wordpad? because if it does, there's not much i can do...
fgagnon
Site Admin
Posts: 3737
Joined: 2003 Sep 08, 19:56
Location: Springfield

Post by fgagnon »

Is this related to anomalous treatment of line termination character(s), as mentioned in another thread?
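One quick way to check the line-termination angle is to count the kinds of line endings in the raw bytes of the file before and after saving. This is only a rough sketch in Python; `count_line_endings` is a made-up helper name, not anything from E².

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contributes one CR and one LF, so subtract the pairs.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}
```

A file that mixes CRLF with lone CR or lone LF would be a candidate for the kind of anomaly asked about here.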
nikos
Site Admin
Posts: 15771
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

:shrug:
JRz
Gold Member
Posts: 560
Joined: 2003 Jun 10, 23:19
Location: NL

Post by JRz »

nikos wrote: does this also happen for wordpad? because if it does, there's not much i can do...
No, it doesn't. But I've observed a few strange things. I'll try to describe them as briefly as possible (it's complicated :( )

I edit the file using E². On one particular line a strange character shows at the end of the line (the hex view shows a CR-LF pair there).
I copy a block and paste it at another location in the file.
Then I save it as UTF-8 and the file is broken (truncated at the location where the strange character was). It now shows the same strange character at another location (in the middle of some text).

When I repeat these steps but save as an OEM file, the result is an unbroken file, still with the strange character showing at its original location.

When I use Wordpad, it sometimes shows the strange char, but it never breaks when going through the same procedure as above.
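To pin down where the saved copy goes wrong, the original and the saved file could be compared byte by byte, and the original scanned for control bytes such as NUL (0x00) or Ctrl-Z (0x1A) that some software treats as end-of-file. The Python below is only a sketch of that check; the helper names are invented for illustration, and whether such a byte is really involved here is a guess.

```python
def find_control_bytes(data: bytes) -> list:
    """Return (offset, byte) for control bytes other than TAB, LF and CR."""
    allowed = {0x09, 0x0A, 0x0D}
    return [(offset, byte) for offset, byte in enumerate(data)
            if byte < 0x20 and byte not in allowed]


def first_difference(original: bytes, saved: bytes):
    """Return the first offset where the two byte strings differ, or None."""
    for offset, (a, b) in enumerate(zip(original, saved)):
        if a != b:
            return offset
    if len(original) != len(saved):
        # One is a prefix of the other: the shorter one was cut off here.
        return min(len(original), len(saved))
    return None
```

If the offset from `first_difference` lines up with the place where the strange character shows, that would point at the spot where the save is cut short.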

I'll send you the file later (too busy now) :(
nikos
Site Admin
Posts: 15771
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

however, wordpad cannot save utf-8 at all!
so how exactly do you manage?
nikos
Site Admin
Posts: 15771
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

i have a conjecture for this
when a file is loaded the control asks for something like 4K at a time. If this boundary happens to be in the middle of a multi-byte character...

however shouldn't the control take care of that? i don't know a thing about utf-8, never mind knowing which characters are lead-only!
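To illustrate the conjecture: if UTF-8 bytes are decoded one fixed-size block at a time, a multi-byte character that straddles a block boundary has to be carried over into the next block instead of being decoded on its own. The sketch below is plain Python, not the RTF control's actual streaming code; the function names and the 4096-byte block size are assumptions for illustration only.

```python
import codecs

CHUNK_SIZE = 4096  # assumed block size ("something like 4K")


def decode_naive(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Decode each block on its own; a split character becomes U+FFFD."""
    return "".join(
        data[i:i + chunk_size].decode("utf-8", errors="replace")
        for i in range(0, len(data), chunk_size)
    )


def decode_incremental(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Decode block by block, carrying incomplete sequences across blocks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [
        decoder.decode(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]
    # final=True makes the decoder raise if bytes are still pending at the end.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

With a deliberately tiny block size, for example `decode_naive("E²".encode("utf-8") * 10, 2)`, the naive version produces replacement characters while the incremental one returns the original text. Whoever feeds the blocks to the control would need to do something like the second version, or make sure a block never ends in the middle of a sequence.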

another question:
those files that get mangled, do they have a utf-8 BOM or do you force utf from the command line?
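For anyone wanting to check this on their own files: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. A small Python sketch of the check follows; `has_utf8_bom` and `file_has_utf8_bom` are invented names for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of the file and test them."""
    with open(path, "rb") as handle:
        return has_utf8_bom(handle.read(3))
```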
JRz
Gold Member
Posts: 560
Joined: 2003 Jun 10, 23:19
Location: NL

Post by JRz »

nikos wrote: however, wordpad cannot save utf-8 at all!
so how exactly do you manage?
No, I don't save as UTF-8 from WordPad, but from E²!!

See the mail I sent you.
JRz
Gold Member
Posts: 560
Joined: 2003 Jun 10, 23:19
Location: NL

Post by JRz »

nikos wrote: [...] another question:
those files that get mangled, do they have a utf-8 BOM or do you force utf from the command line?
I believe the files have a UTF-8 BOM (see the files I sent you).

I've tried to describe as accurately as possible what I did to the files. If something's not clear, don't hesitate to ask.
I'd really like this 'feature' to go away :)