[.net] Preventing reading of invalid or empty xml files

2 comments, last by Headkaze 16 years, 3 months ago
Two users of my program have reported the following errors when my program reads a folder containing xml files. One user is getting
Quote:
Root element is missing.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res)
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlTextReader.Read()
   at bo.a(String A_0, Dictionary`2 A_1)
Another user is getting
Quote:
Data at the root level is invalid. Line 1, position 1.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlTextReader.Read()
   at bo.a(String A_0, Dictionary`2 A_1)
Obviously I'm trapping these errors, but for some strange reason they are causing problems in other parts of my program (I'm not sure why), so I would like to prevent these exceptions from being thrown in the first place.

The first error is caused by attempting to read XML from an empty file, so my first question is: how do I check whether a file is empty before reading it? I'm not quite sure about the second error. Any ideas on how to check whether an XML file is valid before reading it?

Another reason I would like to prevent these errors is that I write exceptions to a log, and it looks ugly when the log is full of them. I have to expect that some of these XML files will be either empty or invalid.
An empty file has a size of 0, so it should be trivial to check this on your stream before you begin reading.

An XML file 'ought' to begin with "<?xml ", but in reality some don't. The first real (non-whitespace) character in an XML file should always be "<".

This should cover some of your cases.
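
Something along these lines would do both checks. This is only a rough sketch: the class and method names (XmlPreCheck, LooksLikeXml) are made up for illustration, and passing the check only means the file is non-empty and starts with "<", not that it is well-formed.

```csharp
using System;
using System.IO;

public static class XmlPreCheck
{
    // Hypothetical helper: true when the file is non-empty and its first
    // non-whitespace character is '<'. It does not prove well-formedness.
    public static bool LooksLikeXml(string path)
    {
        FileInfo info = new FileInfo(path);
        if (!info.Exists || info.Length == 0)
            return false;

        // StreamReader checks for a byte order mark by default, so UTF-8 and
        // UTF-16 files that carry a BOM are decoded correctly.
        using (StreamReader reader = new StreamReader(path))
        {
            int c;
            while ((c = reader.Read()) != -1)
            {
                if (!char.IsWhiteSpace((char)c))
                    return c == '<';
            }
        }

        return false; // the file contained only whitespace
    }
}
```

Only the leading characters are read, so the check stays cheap even on large files.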

Hope this helps.
You indicate you're catching the exceptions; are you perhaps leaving some variables around that were only partially initialized? That could cause problems later in the application.

That is...

    XmlDocument document = new XmlDocument();
    collection.Add(document);
    try
    {
        document.Load(filename);
    }
    catch (Exception ex)
    {
        // Log exception.
    }


In the above code the XmlDocument object is added to the collection before it is loaded and is never removed when the load fails, so an empty document remains there for further iteration/modification/manipulation.

Also, if you're expecting these errors it is more than acceptable to attempt to load an XML document and catch - and discard - the exception if it occurs.

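    // Returns true if the file loads as well-formed XML, false otherwise.
    private bool PoorMansDocumentValidator(string filename)
    {
        try
        {
            XmlDocument document = new XmlDocument();
            document.Load(filename);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }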


The best way to handle the situation is to have a schema and validate the XML file against that schema using an XmlValidatingReader. Do the files you're looking for have a specified schema available?
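
For what it's worth, XmlValidatingReader is marked obsolete from .NET 2.0 onwards; the usual replacement there is an XmlReader created with validating XmlReaderSettings. A rough sketch of that, assuming a schema file exists (the SchemaCheck class, the method name, and both path parameters are placeholders, not anything from your program):

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Hypothetical helper: xmlPath and schemaPath are placeholder parameters.
    public static bool IsValidAgainstSchema(string xmlPath, string schemaPath)
    {
        bool valid = true;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // A null namespace means "use the targetNamespace declared in the schema".
        settings.Schemas.Add(null, schemaPath);
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error)
                valid = false;
        };

        try
        {
            using (XmlReader reader = XmlReader.Create(xmlPath, settings))
            {
                // Reading to the end drives validation of the whole document.
                while (reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            // Not well-formed at all (empty file, junk at the root level, ...).
            return false;
        }

        return valid;
    }
}
```

This reads the whole file, so it is slower than a quick look at the first few characters; it only makes sense if a schema is actually available.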
There is no schema and this needs to be a very fast check. Since this is generated xml I always know it will have "<?" at the beginning if it's valid.

Here is the working method I came up with. It also works with Unicode files which is a bonus.

    public bool IsValidXmlFile(string Filename)
    {
        FileInfo fi = new FileInfo(Filename);

        // An empty file cannot contain a root element.
        if (fi.Length == 0)
            return false;

        // OpenText() decodes as UTF-8 and honours a byte order mark.
        using (StreamReader sr = fi.OpenText())
        {
            char[] buffer = new char[2];

            // Generated files always start with "<?" when they are valid.
            if (sr.ReadBlock(buffer, 0, 2) == 2)
            {
                if (new string(buffer) == "<?")
                    return true;
            }
        }

        return false;
    }


Thanks for the advice :)

