Sign in to follow this  
jamesleighe

Error Management

Recommended Posts

(c++)

Making code robust in the face of errors such as missing textures, files and models.

For textures, if I cannot find a texture I simply return a debug texture.
This code will always return a valid texture, preventing any possibility of crashes since it doesn't return NULL:
[CODE]
ITexture* texture = MaterialAPI()->GetTexture( "chicken" );
[/CODE]

However, how can I check to see if an error has occurred in the process of loading a texture?
Something like this might work:
[CODE]
ITexture* texture;
bool succeeded = MaterialAPI()->GetTexture( "chicken", texture );
[/CODE]
But how would you do it?

Similarly, some of my code just returns NULL on failure, such as when a file isn't found.
[CODE]
IStreamFile* file = FileAPI()->GetStreamFile( "config.txt" );
[/CODE]
Here a failure to check for NULL will result in a crash. Maybe I can return a 'debug' empty file to prevent this in case of programmer error?

It seems to me returning NULL when you don't have to is asking for crashes as soon as something isn't just so, but I'm having some trouble deciding the best way to prevent this; At least as far as the engine interface APIs are concerned. Not only that, but returning NULL in some cases, and debug versions in others is simply not consistent.

Bonus Question:
Also, I surround my entry point with a 'try' statement, and during a fatal error I 'throw' with a specific type. It also allows me to catch any other potentially uncaught 'throw' statements and shutdown reasonably gracefully, but how would you guys handle this type of thing?

Thanks as usual!

Share this post


Link to post
Share on other sites
[quote name='James Leighe' timestamp='1344019758' post='4965915']
However, how can I check to see if an error has occurred in the process of loading a texture?
[/quote]

You don't. The system returns a debug texture. If it can't load *that* texture, then it's time to die. Personally I favor returning null and letting the inevitable uncaught exception to take down the app. No need to write and test code that you've done everything you reasonably can to prevent and cannot really do anything about anyways.

Share this post


Link to post
Share on other sites
Interesting philosophy and it makes sense.

I was trying too hard to prevent it from ever crashing I suppose and should take a more moderate approach.

Share this post


Link to post
Share on other sites
Yeah the others had answered very well.
What I want to say is, never try to patch for your mistake (or to say, bugs).

I had worked on a software, when there were any null pointer crash, the original developers said "hey, let's check when it's null we just don't do that function".
What's that? They were hiding the problems rather than solving them. The result is, the software became harder and harder to maintain and we had to rewrite quite a bit of parts. If we continued that kind of "hiding problem", the software would have died before our expectation.

Now when I find a bug, I would think why it happens, rather than try to get a workaround for it.

Just my personal experience.

Share this post


Link to post
Share on other sites
You definitely want to catch and handle your errors in some way. Allowing the application to just work with a NULL pointer and crash when it tries to dereference it is going to tell you that you have a bug, but it's not going to help you find it. The same is true of letting exceptions bubble up and crash the application.

This is what I like to do. Every function that has any potential at all to cause an exception, I will enclose it in a try-catch block, add some context information to the exception for function/file/line number, then rethrow the exception so the next level function in the call stack can handle it and it's information. Then the main function, or the main thread function, will handle the exception by throwing a dialog box, writing to a log file, or both. This way you can get a lot of information about where and how the error is occurring, even in a release build.

Writing those try/catch blocks can be pretty tedious, so I use some macros to make life easier. i.e.:

[CODE]
#define TRYCATCH_OPEN \
try {

#define TRYCATCH_CLOSE \
} \
catch( Exception& e ) \
{ \
std::ostringstream dump; \
dump << e.message() << " " << __FUNCTION__ << "[" << __FILE__ << "(" << __LINE__ << ")]" << std::endl; \
if (e.nested() == nullptr) \
throw Exception( dump.str(), e); \
else \
throw Exception( dump.str(), *e.nested() ); \
} \
catch( std::exception& e ) \
{ \
std::ostringstream dump; \
dump << e.what() << " " << __FUNCTION__ << "[" << __FILE__ << "(" << __LINE__ << ")]" << std::endl; \
throw Exception( dump.str() ); \
}
[/CODE]

Then when I throw my exceptions, I also use a macro so I can pin down exactly where the exception is being thrown. i.e.:

[CODE]
#define jp_throw(ExceptionType, ...) \
throw ExceptionType( what_(__FUNCTION__, __FILE__, __LINE__, __VA_ARGS__ ) );

[/CODE]


Then every function/method has something like this:

[CODE]
void Foo::Bar()
{
TRYCATCH_OPEN

Some logic...
if (somecondition != expected)
jp_throw(SomeException, "somecondition is: ", somecondition, " expected: ", expected );

TRYCATCH_CLOSE
}
[/CODE]


Still a little bit tedious, but much easier and cleaner than writing out all of those try/catch blocks, and it pays off.

I use custom exception classes that can contain nested exceptions, so I can preserve the original exception object while bubbling it up the chain.

This works out very well for me, and makes it much easier to track down bugs in beta and production builds without having to include debug information in the executable. The information isn't as precise as a true call stack, but it's much better than nothing.

As far as things like not being able to find a texture, I do it the way you do it, by returning a default debug texture. The only thing I do different is that I have a command-line option that can indicate that I want to throw an assertion first, so that I have the option of having the application complain loudly and definitively if there are missing resources, but also have the option of gracefully handling the situation so that development doesn't grind to a halt just because somebody broke the build with a missing resource. Edited by krippy2k8

Share this post


Link to post
Share on other sites
[quote name='krippy2k8' timestamp='1344109299' post='4966174']
it is going to tell you that you have a bug, but it's not going to help you find it.
[/quote]

I should've maybe clarified that I use a [b]vaguely [/b]modern programming language that includes stack traces with its exceptions.

Share this post


Link to post
Share on other sites
[quote name='Telastyn' timestamp='1344124914' post='4966208']
I should've maybe clarified that I use a vaguely modern programming language that includes stack traces with its exceptions.
[/quote]

A stack trace from a NULL pointer exception will only tell you where it is dereferenced, not where it originated.

And by "vaguely modern programming language" I guess you either A) are not talking about C++, which the question is about, B) think that errors are nice and friendly and only happen in debug mode, or C) bloat your release code with debug symbols and disable compiler optimizations. Edited by krippy2k8

Share this post


Link to post
Share on other sites
The thing about the unforgiving crash, is that it can hinder progress on large teams, you use function A developed by programmer 1 which processes data introduced by scriptwriter or game designer X who uploaded a last minute change who then went to his dentist appointment and forgot his cellphone... you just lost a whole afternoon in the task you were doing because of someone else's mistake and there is nothing to do about it.

That is, after it happened, you can do things to minimize the impact of this situations, which in a 10+ team, tend to happen often, even in AAA development.
What you can do is having builds with different error tolerance:
The development build would be most permissive, trying to keep on going as much as possible and returning placeholder data or content whenever the real thing cannot be delivered.
The debug build can be the one that provides the most data, and is the least forgiving of all, crashing as soon as an error is detected and performing extra sanity checks (conditioned by pre processor code not to be part of the release).
Then the release build would lean towards speed (if we are speaking of videogames this is crucial) and resort to try catch code with error reporting support.

This way, if an error is introduced you can still keep on working without wasting your time, perhaps your progress would not be in a deliverable state due to the broken data on debug and release, but at least you will be able to make progress.

Share this post


Link to post
Share on other sites
[quote name='krippy2k8' timestamp='1344109299' post='4966174']...[/quote]This can be refined quite a bit. This system only prints out the location of your error handling block, and throws away all other information about the crash, such as where it actually occurred, and the state of local variables in that stack frame.
C++ exceptions are simply not a good tool for this job, as they throw away all that information.

You should instead let the program crash, but catch the crash as an SEH exception, which allows you to collect as much information as possible and save it in a mini-dump file ([i]and continue from the catch-point if you want[/i]). You can collect the dump file from the user, and open it in your debugger, and you will have all the same information as if the game had crashed while you were connected to it with a debugger yourself -- such as the full intact call stack ([i]which includes where the dereference occurred, and where that funciton was called from)[/i], local variable values, etc... This gives you much, much more information, and doesn't require you to bloat your code text with macros everywhere, nor bloat your binaries with C++ exception handling.

[quote name='NEXUSKill' timestamp='1344136204' post='4966246']The thing about the unforgiving crash, is that it can hinder progress on large teams, you use function A developed by programmer 1 which processes data introduced by scriptwriter or game designer X who uploaded a last minute change who then went to his dentist appointment and forgot his cellphone... you just lost a whole afternoon in the task you were doing because of someone else's mistake and there is nothing to do about it.[/quote]Large teams usually have a build machine to catch these things.
e.g. at work, when we commit changes, they go into a queue at the build-box. This machine builds the changes, runs through all the tests, and only once they've passed successfully does it commit your changes into the shared repository. This means it's pretty hard to break the build.
Also, it can be argued that crashing on simple bugs like this encourages them to be fixed, and to keep the game bug-free, instead of allowing these things to accumulate to the point where the whole team has to crunch for a week before a milestone ;)

That said, yeah returning default data for missing/broken assets (such as a 3d model of the word "error") is pretty standard fare. And attempting to limp on after catching/dumping a crash is possible, though I'd recommend turning on a flag that prints something like "broken build" on the screen in red text from that point onwards, so it's hard to ignore. Edited by Hodgman

Share this post


Link to post
Share on other sites
[CODE]This system only prints out the location of your error handling block, and throws away all other information about the crash, such as where it actually occurred, and the state of local variables in that stack frame.[/CODE]

Look again. This part:

[CODE]
#define jp_throw(ExceptionType, ...) \
throw ExceptionType( what_(__FUNCTION__, __FILE__, __LINE__, __VA_ARGS__ ) );

[/CODE]

adds the function, file and line number to the original Exception object where it is thrown, and that original Exception object is added as a nested exception as it is bubbled up the stack. It won't automatically record local variables, but the jp_throw macro and the what_ function suite allows you to easily add whatever information you think is relevant at the exception site. It also won't give precise information about the line numbers downstream from the actual exception site, but it gives quite a bit of useful information without requiring debugging information to be embedded in the application or any additional performance overhead until an exception is thrown. It's also completely cross-platform and doesn't break in the face of compiler optimizations.

Letting the program crash is just a horrible idea. You can get more detailed call-stack information at the expense of the much more useful knowledge of [i]where the actual error occurs.[/i] Knowing where a NULL pointer is being derefenced is far less useful than knowing where a NULL pointer is originally generated, and unless it is generated immediately before the dereferencing occurs, all of the local variables in the world won't help you very much in figuring out what caused it.

It's still useful to catch SEH exceptions for those errors that you haven't thought about yet, but they are no substitute for assigning your own meaningful error messages where you know that errors can occur. I used the whole SEH/mini-dump system for a long time, but found that a very small percentage of error reports were useful when the application was built with full compiler optimizations. I have been using this system for about 2 years now, and it has been much more useful to me. Edited by krippy2k8

Share this post


Link to post
Share on other sites
[quote name='krippy2k8' timestamp='1344140850' post='4966260']Look again. This part ... adds the function, file and line number to the original Exception object where it is thrown, and that original Exception object is added as a nested exception as it is bubbled up the stack[/quote]Sorry, I missed that. Yes, you're printing the correct file/line.
However, an actual dump file that can be opened by your debugger is still better than just a manually printed line ([i]you can print this line to a log as well as generating the dump[/i]). Also, this print solution only works for explicitly caught errors, not unexpected crashes. Surely a solution that prints file/line numbers for any kind of crash is preferably over one that only prints from assertions ([i]your [font=courier new,courier,monospace]if/throw[/font] statements are just verbose [font=courier new,courier,monospace]assert[/font] statements, right?)[/i][quote]Letting the program crash is just a horrible idea.[/quote]Read the linked articles above for arguments as to why it's helpful. Also, I didn't say to just let it crash -- I said to let it crash and then catch the exception generated by the crash ([i]which allows you to respond at the crash site and catch site automatically[/i]) -- you can still handle this exception as in your example if you like.[quote]You can get more detailed call-stack information at the expense of the much more useful knowledge of where the actual error occurs. Knowing where a NULL pointer is being derefenced is far less useful than knowing where a NULL pointer is originally generated, and unless it is generated immediately before the dereferencing occurs, all of the local variables in the world won't help you very much in figuring out what caused it.[/quote]There's nothing stopping you from logging that kind of information as well......? How does an [font=courier new,courier,monospace]if/throw/catch[/font] allow you to collect more meaningful information than an [font=courier new,courier,monospace]assert/crash/catch[/font]? They're starting to sound pretty equivalent...
[quote name='krippy2k8' timestamp='1344109299' post='4966174']without having to include debug information in the executable[/quote]If you don't count the extreme code bloat from all that C++ exception handling code as 'debug information'. The standard approach only requires debug symbols to be kept privately by developers, and the debug-free optimized build made public (stripped of debug symbols). Edited by Hodgman

Share this post


Link to post
Share on other sites
[quote name='Hodgman' timestamp='1344142308' post='4966263']
Read the linked articles above for arguments as to why it's helpful. Also, I didn't say to just let it crash -- I said to let it crash and then catch the exception generated by the crash -- you can still handle this exception as in your example if you like.
[/quote]

The linked articles above basically agree with me. The author states that a call stack is more useful than a crash dump because you can see where the error is with a glance without having to load it up in a debugger. When he says "crash as early as possible" he doesn't mean don't handle the error and let some operating system level error handling kick in with undefined behavior and crash the application. He just means not to just show a message or write to a log file and continue running, or to expect the caller of an API to handle the exception and try to figure out how to deal with it. He says he doesn't like exceptions, but his dislike of exceptions is assuming that they are meant to be gracefully handled by the caller. My exceptions are always meant to result in a controlled crash, they are not expected to be "handled" by the callers, only the top level main function that writes the log file, displays an error message, or something else, and then crashes. If I were building library code for external use I would use a different approach.

[quote name='Hodgman' timestamp='1344142308' post='4966263']
There's nothing stopping you from logging that kind of information as well......? How does an if/throw/catch allow you to collect more meaningful information than an assert/crash/catch? They're starting to sound pretty equivalent...
[/quote]

Because you're crashing and recording the call stack at the site of the error, not at the site of the symptom. You can log the function, file, line of the exact place where the error occurs, but you can't log a call stack at that point without bubbling it up. You're also guaranteed that those bugs WILL result in a crash with information. Most bugs aren't guaranteed to result in a crash on their own, they just end up causing weird and possibly detrimental behavior if you ignore the unexpected values, and when they do crash they will often corrupt the stack, rendering that crash dump completely useless.

If you really feel the need to catch a mini-dump then you should raise an SEH exception as close to the actual error site as possible so you can get a dump that is relevant to where the error is first found. There is some merit to that approach, but only if you are developing strictly for Windows. Edited by krippy2k8

Share this post


Link to post
Share on other sites
[quote name='krippy2k8' timestamp='1344144650' post='4966270']...[/quote]Sorry, you've misunderstood me, and how modern assertions/crash handlers work.
An assertion catches bugs at the cause, just like your if/throw statements do. All good code should be full of assertions to catch problems at their immediate cause.
Your game's assertion system logs important information, and then crashes. This doesn't fall back to some poorly defined OS dump handler -- it triggers your game's crash handler, which is able to do further data capture such as stack-traces ([i]both at the sites of an unexpected crashes, and at the sites of your manually written assert statements[/i]). The most basic crash handler should record a full call-stack automatically ([i]yes, even in fully optimized/debug-stripped builds[/i]), without you having to pollute your code with manual instrumentation.

This is all equivalent to the code you posted earlier -- I'm not arguing against this design, but suggesting refinements to that C++-exception-based system which is unnecessarily bloated.

i.e. everything that your system does, is done by standard crash handling systems. Your try/catch blocks can be replaced by a stack-walker, and your if/throw statements can be replaced with assert statements, and no benefits are lost (however, we do end up with a huge reduction in binary size and don't require manual macro placement). Edited by Hodgman

Share this post


Link to post
Share on other sites
Another vote for the "crash early and force the offending team member to fix the thing" party here.

In C++ you can have the call stack dumped to a log file on crash, it's hacky but it is doable and it'll save you days of work in the long run.

In my experience the way to handle these errors varies a lot with the team size, the bigger the team the more automated the process has to be. Having a system that silently goes on loading a "default" texture when an asset cannot be deferenced, without some automatic way to check this, is guarantee to ship the game with some magenta texture popping here and there.

I have a small team, I designed the system to handle NULL textures, but quickly discovered it was NOT a good idea... the way I forced the inclusion of correct asset later is that the file converter will refuse to save a "map" with NULL texture, forcing the implementer to go and fix it.

The real problem is not really how to handle the errors, you can have a logger with stack dump and crash, or anything else, the real problem is to design a system that will stay up regardless of the exception.. this is very application specific, and trying to handle every possible production mistake with exception equals polluting the code with unnecessary checks all over the place... and what's the point of that?

While I understand the search for "uncrashable" software and I salute you for that, it's also important to be pragmatic and understand that this is not the space shuttle control software where a crash will result in a CNN special report and few coffins with a flag, it's just a game.. and it's much better to catch a bug in production than getting obscure weird things later. Edited by kunos

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this