Odd 3 characters added to the beginning of file in vb

1 comment, last by NotAYakk 17 years, 9 months ago
Hello. Just a quick question. I made a small scripting language in C++, and an editor for the script files in VB.NET. However, when I save the files in VB.NET Express, it adds 3 characters that are invisible in the text file but are read in by my program. It was a bit of a headache figuring out that this was the problem, and I did a pretty nasty thing in C++. I just..

        if (rawLines[0][0]==-17 && rawLines[0][1]==-69 && rawLines[0][2]=='¿') lastspace=3;

Which causes my scripting interpreter to skip over the first 3 characters of the file if it is that strange set of 3 characters at the beginning.
The 1st character is an i with 2 dots instead of 1 on the top: ï
The 2nd character is a double greater-than sign: »
The 3rd character is an upside-down question mark: ¿

I was just wondering exactly why VB added those characters, why they are invisible in Notepad, and if there is an easy way to make VB stop putting them in. I just did a simple file write from a textbox.

        My.Computer.FileSystem.WriteAllText(SaveFileDialog1.FileName, TextBox1.Text, False)

My 'hack' works, but I'd really like to know why it is doing that in the first place. Thanks for your help, Adam
Open the saved file in a hex editor (not a text editor) to see the true contents of the file (your text editor will likely omit characters its encoding cannot display). If the 3 characters are in fact part of the file, then your problem is in the write stage. If not, your problem is likely to be in the read stage.
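
If you don't have a hex editor handy, a few lines of C++ can do the same check. This is only a rough sketch: the function name firstBytesAsHex and the default of 16 bytes are made up for illustration, and it assumes the file is small enough that peeking at the start is all you need.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Returns the first `count` bytes of the file as uppercase hex pairs
// separated by spaces. Returns an empty string if the file cannot be opened.
// (Hypothetical helper, not part of any library.)
std::string firstBytesAsHex(const std::string& path, std::size_t count = 16)
{
    std::ifstream in(path, std::ios::binary); // binary mode: no newline translation
    std::string out;
    char c;
    while (count-- > 0 && in.get(c)) {
        char buf[4];
        // Go through unsigned char so bytes above 0x7F are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(c)));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

If the string returned for the saved script starts with EF BB BF, the extra bytes really are in the file and the write stage is where to look.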

I've never used vb.net, but perhaps vb.net is even adding those characters as part of some sort of functionality. Is the text you're writing in rich-text format? It could also be some sort of header or version code.

Found this on Google.

Quote:I attempted to make use of a function in the new My namespaces (My.Computer.FileSystem.WriteAllText), but ended up with some strange characters in the first line of the file.


...

Quote:However, this method appears to work without any issues.


        Dim sOutput As String = ""
        sOutput = DelimitData(dsData, sDelim, bNoHeader)
        Dim sw As New StreamWriter(sOutputFile, False)
        sw.Write(sOutput)
        sw.Close()


And also found this.

Quote:Maybe the problem on your PC is that your System's default text encoding might be different from the .NET Framework's default text-encoding (By default the .NET Framework uses UTF8).
Notice that the My.Computer.FileSystem.WriteAllText() method has another overload with 4 parameters. The 4th parameter specifies the text encoding to be used. If you use the overload with 3 parameters, .NET defaults to UTF8, which is different from your system's encoding, so you're getting the garbage values.
So all you have to do is specify the 4th parameter to that of your system, in the 2nd overload. If you don't know your system's text encoding, just use System.Text.Encoding.Default as your 4th parameter.
http://blog.protonovus.com/
What you are seeing is a Byte Order Mark, aka a BOM.

From
http://en.wikipedia.org/wiki/Byte_Order_Mark

Quote:While UTF-8 does not have byte order issues, a BOM encoded in UTF-8 may be used to mark text as UTF-8. Quite a lot of Windows software (including Windows Notepad) adds one to UTF-8 files. However in Unix-like systems (which make heavy use of text files for configuration) this practice is not recommended, as it will interfere with correct processing of important codes such as the hash-bang at the start of an interpreted script. It may also interfere with source for programming languages that don't recognise it. For example, gcc reports stray characters at the beginning of a source file, and in PHP, if output buffering is disabled, it has the subtle effect of causing the page to start being sent to the browser, preventing custom headers from being specified by the PHP script. The UTF-8 representation of the BOM is the byte sequence EF BB BF, which appears as the ISO-8859-1 characters "ï»¿" in most text editors and web browsers not prepared to handle UTF-8.


Your VB.NET program is quite correctly including a byte order mark at the start of your text file, so other applications that open it know the encoding. In UTF-16 and UTF-32 the BOM also tells them the big/little endianness of the data; in UTF-8 there is no byte order to signal, so it only marks the file as UTF-8.
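
On the C++ side, a slightly more portable version of the skip-the-first-3-characters hack might look like the sketch below. The names hasUtf8Bom and stripUtf8Bom are invented for this example, and it assumes the script line is already sitting in a std::string. Comparing through unsigned char avoids depending on whether plain char is signed on your compiler.

```cpp
#include <string>

// Returns true if `text` begins with the UTF-8 byte order mark (EF BB BF).
// (Hypothetical helper name, chosen for illustration.)
bool hasUtf8Bom(const std::string& text)
{
    return text.size() >= 3 &&
           static_cast<unsigned char>(text[0]) == 0xEF &&
           static_cast<unsigned char>(text[1]) == 0xBB &&
           static_cast<unsigned char>(text[2]) == 0xBF;
}

// Removes a leading UTF-8 BOM in place, if there is one; otherwise leaves
// the string untouched.
void stripUtf8Bom(std::string& text)
{
    if (hasUtf8Bom(text)) {
        text.erase(0, 3);
    }
}
```

Calling stripUtf8Bom on the first line read from the file, before handing it to the interpreter, should make the reader indifferent to whether the editor wrote a BOM or not.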

(ISO/IEC 8859-1 is better known as Latin-1, one of the most common 8-bit extensions of the 7-bit ASCII character standard.)
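
That also explains the -17 and -69 in the original check: they are most likely the same BOM bytes seen through a signed char. A tiny sketch to show the arithmetic (bomAsSignedChars is a made-up name, and it assumes the usual 8-bit two's complement char):

```cpp
#include <array>

// The UTF-8 BOM bytes reinterpreted as signed 8-bit values. This is what a
// comparison against plain `char` sees on compilers where `char` is signed
// (an assumption; the C++ standard leaves that choice to the implementation).
std::array<int, 3> bomAsSignedChars()
{
    const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    std::array<int, 3> out{};
    for (int i = 0; i < 3; ++i) {
        // 0xEF = 239 -> 239 - 256 = -17, 0xBB = 187 -> -69, 0xBF = 191 -> -65
        out[i] = static_cast<signed char>(bom[i]);
    }
    return out;
}
```

So EF, BB and BF come out as -17, -69 and -65, and in Latin-1 those same bytes are ï, » and ¿.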

