[.net] Preventing reading of invalid or empty xml files

This topic is 3624 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Two users of my program have reported the following errors when my program reads a folder containing xml files. One user is getting
Quote:
Root element is missing. at System.Xml.XmlTextReaderImpl.Throw(Exception e) at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res) at System.Xml.XmlTextReaderImpl.ParseDocumentContent() at System.Xml.XmlTextReaderImpl.Read() at System.Xml.XmlTextReader.Read() at bo.a(String A_0, Dictionary`2 A_1)
Another user is getting
Quote:
Data at the root level is invalid. Line 1, position 1. at System.Xml.XmlTextReaderImpl.Throw(Exception e) at System.Xml.XmlTextReaderImpl.Throw(String res, String arg) at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace() at System.Xml.XmlTextReaderImpl.ParseDocumentContent() at System.Xml.XmlTextReaderImpl.Read() at System.Xml.XmlTextReader.Read() at bo.a(String A_0, Dictionary`2 A_1)
I'm trapping these errors, but for some reason they are causing problems in other parts of my program (I'm not sure why), so I would like to prevent the exceptions from being thrown in the first place. The first error is caused by attempting to read XML from an empty file, so my first question is: how do I check whether a file is empty before reading it? I'm not sure what causes the second error. Any ideas on how to check whether an XML file is valid before reading it?

Another reason I would like to prevent these errors is that I write exceptions to a log, and it looks ugly when the log is full of them. I have to expect that some of these XML files will be either empty or invalid.

An empty file has a size of 0, so it should be trivial to check this on your stream before you begin reading.

An XML file 'ought' to begin with "<?xml ", but in reality they sometimes don't. The first real (non-whitespace) character in an XML file should always be "<".
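As a rough sketch of those two checks (the class and method names here are made up for illustration, not taken from the original program), something like the following could be run before handing the file to the parser. It only filters out the obvious failures; it does not prove the file is well-formed.

```csharp
using System;
using System.IO;

static class XmlPrecheck
{
    // Hypothetical helper: returns true when the file is non-empty and its
    // first non-whitespace character is '<'.
    public static bool LooksLikeXml(string path)
    {
        // A zero-length file is what produces "Root element is missing".
        if (new FileInfo(path).Length == 0)
            return false;

        // StreamReader(path) defaults to UTF-8 and honours a byte order mark,
        // so the mark itself is not returned as a character.
        using (StreamReader reader = new StreamReader(path))
        {
            int c;
            while ((c = reader.Read()) != -1)
            {
                if (!char.IsWhiteSpace((char)c))
                    return c == '<';
            }
        }

        // The file contained nothing but whitespace.
        return false;
    }
}
```

A file that passes this check can still fail to parse later, so the exception handler around the real read should stay in place.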

This should cover some of your cases.

Hope this helps.

You indicate you're catching the exceptions; are you perhaps leaving some variables around that were only partially initialized? That could cause problems later in the application.

That is...


    XmlDocument document = new XmlDocument();
    collection.Add(document);

    try
    {
        document.Load(filename);
    }
    catch(Exception ex)
    {
        // Log exception.
    }



In the above code the XmlDocument object is added to the collection before Load is called and is never removed when Load fails, so an empty document remains in the collection for further iteration/modification/manipulation.
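One way to avoid that situation, sketched here with a hypothetical `LoadAll` helper (the original program's structure is not shown in this thread), is to add the document to the collection only after it has loaded successfully:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class DocumentLoader
{
    // Hypothetical helper: loads each file and keeps only the documents
    // that parsed without an XmlException.
    public static List<XmlDocument> LoadAll(IEnumerable<string> filenames)
    {
        List<XmlDocument> collection = new List<XmlDocument>();

        foreach (string filename in filenames)
        {
            XmlDocument document = new XmlDocument();
            try
            {
                document.Load(filename);
            }
            catch (XmlException ex)
            {
                // Stand-in for the real logging; the failed document is dropped.
                Console.Error.WriteLine("Skipping " + filename + ": " + ex.Message);
                continue;
            }

            // Reached only when Load succeeded.
            collection.Add(document);
        }

        return collection;
    }
}
```

Only XmlException is caught in this sketch; I/O failures such as a missing file would still propagate to the caller.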

Also, if you're expecting these errors it is more than acceptable to attempt to load an XML document and catch - and discard - the exception if it occurs.


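    private bool PoorMansDocumentValidator(string filename)
    {
        try
        {
            // XmlDocument has no constructor that takes a file name,
            // so create the document first and then load the file.
            XmlDocument document = new XmlDocument();
            document.Load(filename);
            return true;
        }
        catch(XmlException)
        {
            // The file is empty or not well-formed XML.
            return false;
        }
    }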
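If building a full XmlDocument just to throw it away is a concern, a similar check can be sketched with a forward-only XmlReader instead. This is a swapped-in variation rather than the code above, and `IsWellFormed` is a name invented for the example:

```csharp
using System.Xml;

static class WellFormedCheck
{
    // Hypothetical helper: reads the whole file with a streaming reader and
    // reports whether the parser raised an XmlException along the way.
    public static bool IsWellFormed(string filename)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(filename))
            {
                // Read() advances node by node; no DOM is built.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // Empty file, stray data at the root level, unclosed tags, etc.
            // Depending on the framework version, default settings may also
            // reject a file that contains a DOCTYPE declaration.
            return false;
        }
    }
}
```

This still parses the entire file, so it is a thorough check rather than a cheap one.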



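The best way to handle the situation is to have a schema and validate the XML file against that schema using an XmlValidatingReader (on .NET 2.0 and later that class is marked obsolete; the replacement is an XmlReader created with XmlReaderSettings whose ValidationType is Schema). Do the files you're looking for have a specified schema available?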
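For reference, a minimal sketch of schema validation using XmlReaderSettings might look like this. The schema path and the `IsValidAgainstSchema` name are assumptions made for the example, since no schema is shown in this thread:

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

static class SchemaCheck
{
    // Hypothetical helper: schemaPath points at an .xsd file that is assumed
    // to exist and to describe the documents being read.
    public static bool IsValidAgainstSchema(string xmlPath, string schemaPath)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;

        // Passing null uses the target namespace declared in the schema itself.
        settings.Schemas.Add(null, schemaPath);

        bool valid = true;
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            // Validation errors are reported here instead of being thrown.
            if (e.Severity == XmlSeverityType.Error)
                valid = false;
        };

        try
        {
            using (XmlReader reader = XmlReader.Create(xmlPath, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            // Not well-formed at all (for example, an empty file).
            return false;
        }

        return valid;
    }
}
```

Well-formedness problems still surface as XmlException, while schema violations arrive through the event handler, which is why the sketch handles both.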

There is no schema and this needs to be a very fast check. Since this is generated xml I always know it will have "<?" at the beginning if it's valid.

Here is the working method I came up with. It also works with Unicode files which is a bonus.

    // Requires: using System.IO;
    public bool IsValidXmlFile(string Filename)
    {
        FileInfo fi = new FileInfo(Filename);

        // An empty file cannot contain a root element.
        if (fi.Length == 0)
            return false;

        // OpenText returns a StreamReader that decodes the file as text.
        using (StreamReader sr = fi.OpenText())
        {
            char[] buffer = new char[2];

            // The generated files always start with "<?" when they are valid.
            if (sr.ReadBlock(buffer, 0, 2) == 2)
                return new string(buffer) == "<?";
        }

        return false;
    }


Thanks for the advice :)
