Notepad bug in detail
I previously gave you a brief explanation of how the “Bush hid the facts” Notepad bug occurs.
It is caused by the IsTextUnicode function, which Notepad uses to guess which encoding a file being opened uses (the traditional ANSI encoding or a Unicode encoding).
For long sentences the guess is fine. But for a short piece of text it becomes pretty tricky. Let’s go into the details.
To store “Hello”, Notepad converts the text to bytes using an encoding. Notepad supports several encodings. Most of them start with a specific prefix that tells the program which encoding is in use. The tricky part is that Notepad also supports encodings with NO prefix, such as the traditional ANSI encoding (i.e., “plain ASCII”) and the Unicode (little-endian) encoding with no BOM.
Confused? Here is “Hello” in the different encodings.
48 65 6C 6C 6F This is the traditional ANSI encoding. (No prefix)
48 00 65 00 6C 00 6C 00 6F 00 This is the Unicode (little-endian) encoding with no BOM. (No prefix)
FF FE 48 00 65 00 6C 00 6C 00 6F 00 This is the Unicode (little-endian) encoding with BOM. (FF FE is prefix)
FE FF 00 48 00 65 00 6C 00 6C 00 6F This is the Unicode (big-endian) encoding with BOM. Notice that this BOM is in the opposite order from the little-endian BOM. (FE FF is prefix)
EF BB BF 48 65 6C 6C 6F This is UTF-8 encoding. The first three bytes are the UTF-8 encoding of the BOM. (EF BB BF is prefix)
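The byte sequences above can be reproduced with a short Python sketch. The codec names are Python’s, not Notepad’s, and using cp1252 for “ANSI” is an assumption (the real ANSI code page depends on the Windows locale). The hexdump helper is only for illustration.

```python
import codecs

def hexdump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

text = "Hello"

# Assumption: cp1252 stands in for "ANSI"; the actual code page is locale-dependent.
encodings = {
    "ANSI": text.encode("cp1252"),
    "Unicode LE, no BOM": text.encode("utf-16-le"),
    "Unicode LE, BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "Unicode BE, BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "UTF-8, BOM": codecs.BOM_UTF8 + text.encode("utf-8"),
}

for name, data in encodings.items():
    print(f"{name:20} {hexdump(data)}")
```

Only the first two entries have no prefix, which is why they are the ones a reader has to guess between.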
A file that starts with a prefix is easy: plain ASCII text can never begin with EF BB BF or FF FE, because those bytes are outside the ASCII range. The trouble is with the two encodings that have NO prefix. A short run of ANSI bytes with an even length is also a valid sequence of Unicode (little-endian) characters. The 18 bytes of “Bush hid the facts” can just as well be read as 9 two-byte characters.
When that happens, the IsTextUnicode function might guess wrong and treat the ANSI file as Unicode (little-endian) with no BOM. Remember that no matter which encoding you saved with, Notepad will guess again when you open the file.
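To see why a guess is needed at all, here is a minimal Python sketch. It does not reproduce how IsTextUnicode decides; it only shows that the bytes of “Bush hid the facts” carry no prefix and decode without error both as ANSI and as Unicode (little-endian). cp1252 as the ANSI code page is an assumption, and sniff_bom is a hypothetical helper, not a Windows API.

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: name the encoding announced by a leading BOM, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None  # no prefix: the reader has to guess

# Assumption: cp1252 stands in for the ANSI code page.
raw = "Bush hid the facts".encode("cp1252")

print(len(raw))        # number of bytes in the file
print(sniff_bom(raw))  # no BOM is found, so this prints None

# Both decodings succeed, so the bytes alone do not settle the question.
as_ansi = raw.decode("cp1252")
as_utf16 = raw.decode("utf-16-le")

print(as_ansi)
print([f"U+{ord(ch):04X}" for ch in as_utf16])
```

The second decoding yields code points in the CJK range, which matches the unexpected characters Notepad shows when the guess goes the wrong way.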
But I don’t see this bug in Windows Vista anymore, so I think Microsoft has either improved the IsTextUnicode function or made Notepad open files in the encoding they were saved with.
Reference: MSDN blog
1 Comment


ok… on a lighter note, I tried “bush did the f…s” [fully typed though] and it returned normal. :p