# 3 bytes at beginning of xml files?

Since importing my XML document into .NET, .NET seems to be prepending 3 bytes to the beginning of each file. I've seen 16-bit Unicode files, and my parser (expat) can eat those, but it chokes on these files. Notepad seems to understand these bytes, as it hides them. XML Notepad seems to handle them as well, but when saving from there it doesn't write them back out. The bytes are always (hex) ef bb bf, or (ascii) ï»¿. Anyone know what these could be?

Chris Brodie http:\\fourth.flipcode.com

##### Share on other sites
Sounds like .NET is generating non-standard XML. The first thing in any XML file has to be the <?xml?> processing instruction. Anything else isn't really XML.

##### Share on other sites
Yes, Big B is right. This is yet another blatant attempt by Microsoft to take over the world. By simply placing three bytes in front of XML files generated with VS.NET, they render the whole technology useless to non-Windows platforms thus sealing their stranglehold on the operating system market.

But seriously, back to the real world: those three bytes are a standard marker (the byte order mark, or BOM) that identifies the XML file as being in UTF-8 format, similar to the marker that UTF-16 files have. There's a command in VS.NET under "File->Advanced Save Options..." that lets you set it to not include the marker.
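For anyone who cannot change how the files are saved, a minimal Python sketch of stripping the marker before handing the data to a parser might look like the following. The helper names `strip_utf8_bom` and `decode_utf8` are made up for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs

# The three bytes from the original post are the character U+FEFF
# encoded as UTF-8. Viewed as Windows-1252 text they show up as "ï»¿".
assert "\ufeff".encode("utf-8") == b"\xef\xbb\xbf" == codecs.BOM_UTF8
assert b"\xef\xbb\xbf".decode("cp1252") == "ï»¿"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 marker, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes; the 'utf-8-sig' codec drops a leading marker."""
    return data.decode("utf-8-sig")
```

Calling `strip_utf8_bom` on the raw file contents leaves files without the marker untouched, so it should be safe to apply to every file before parsing.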

##### Share on other sites
Ahhh... thanks much...

##### Share on other sites
What's wrong with

 

(This is taken from an XML parser sample.)

This concerns me, as I am writing an SGML/XML parser right now. A standard XML parser is supposed to stop parsing as soon as it finds an XML error. Most parsers will see the first byte and quit since it's not a '<', unless they are non-standard.
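How a particular parser reacts to the three leading bytes is easy to check empirically. The sketch below uses the expat binding that ships with Python (`xml.parsers.expat`); the helper name `root_name` is made up for illustration, and the behaviour shown is what I would expect from a reasonably current expat, not necessarily from older builds.

```python
import xml.parsers.expat


def root_name(data: bytes) -> str:
    """Parse data with expat and return the name of the root element."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Record every start tag; the first one recorded is the root element.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # True marks this as the final chunk
    return names[0]


def accepts(data: bytes) -> bool:
    """Return True if expat parses data without raising an error."""
    try:
        root_name(data)
        return True
    except xml.parsers.expat.ExpatError:
        return False
```

With this, `accepts(b"\xef\xbb\xbf<root/>")` tells you whether the marker is tolerated, while something like `accepts(b"xyz<root/>")` confirms that arbitrary junk before the first `<` is still rejected.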

##### Share on other sites
Uhm... isn't this against the XML standard, if only because at least the tag structure has to be built from the standard 7-bit ASCII charset (00-7F)? I see a _big_ problem here. It's a thing that should be brought up against M$, because they're trying to pollute a standard again. If they continue doing such crap, they shouldn't be allowed to declare .NET "XML compliant" or anything, since their own parser isn't XML compliant.

##### Share on other sites
I'll just ignore the ignoramus Shadowdancer...

Big B, the XML standard states that conforming implementations must support the Unicode standard in its UTF-8 and UTF-16 encodings. The three bytes that gimp was having trouble with are a standard (optional) marker for identifying UTF-8 documents, just as there are standard (not optional, I think) markers for UTF-16 documents (useful for identifying byte order).

The reason this is needed, besides just the encoding declaration in the XML itself, is that, as I'm sure you are aware, Unicode isn't used just for XML. If you open a text file in Notepad, the only way it's going to know, without you telling it (if you even know), that the file is encoded using UTF-8 (or UTF-16 or UTF-32 or whatever) is if the marker is present. This is why VS.NET adds the byte marker by default when you save XML to a text file. The byte marker is optional because in some cases it's not necessary, for example on the web, where the character encoding is specified in the HTTP header. There are all sorts of interesting arguments between the people on the standards committees over what happens when the HTTP header, the byte marker, or the encoding declaration disagree. To find out more about this stuff you can check out w3.org, xml.org, and unicode.org.
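To illustrate how a program like Notepad could guess the encoding from the marker alone, here is a rough Python sketch. The function names `sniff_bom` and `decode_text` are made up, and falling back to UTF-8 when no marker is present is an assumption of this sketch (it matches XML's default, but a plain text editor might choose differently). The marker constants come from the standard `codecs` module.

```python
import codecs

# Order matters: the UTF-32 LE marker (FF FE 00 00) begins with the
# UTF-16 LE marker (FF FE), so the longer markers are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, marker_length) guessed from the leading bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No marker found: fall back to the assumed default encoding.
    return default, 0


def decode_text(data: bytes) -> str:
    """Decode data using the sniffed encoding, skipping the marker."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding)
```

This only looks at the marker. It does not consult an HTTP header or anything inside the document, so it sidesteps the question of which source wins when they disagree.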
