Efficient Error Handling Methodology/Best Practices

Started by
22 comments, last by Servant of the Lord 8 years, 4 months ago

I believe that handling the input and output errors of each and every function is absolutely vital, and that doing so helps a code base's long-term debuggability because errors are caught immediately. However, I'm not exactly sure what the industry standard is for doing this, both from an efficiency standpoint and a maintainability standpoint, since introducing additional code paths increases code complexity. I believe the long-term cost of assert statements is very minimal: they don't introduce additional potential code paths that you have to maintain, and they're generally compiled out of release builds anyway for efficiency. So my questions are as follows:

1 - What errors do you if check? (Or more specifically...)

2 - Do you if check every single potentially null pointer?

3 - Do you if check programmer errors, or do you just leave it up to the assert to catch it?

4 - If you believe you should if check absolutely everything, do you think that this has an impact on the maintainability and readability of the code base?

5 - How would your answers change in a team of 5, 20, or 100 people?

Thoughts?

First: The more you can do to make errors impossible to add to the code in the first place, the better. In C++, this means things like passing by reference instead of by pointer when null is not valid. Or using the new strongly-typed enums. Or making new light-weight "stub" types that don't support implicit casting, so you don't mix up parameters that look similar, like position and size. The list generally goes on and on. :)
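As a minimal sketch of the "stub" type idea (the `Width`/`Height` names and `area` function are made up for illustration), `explicit` constructors stop implicit conversions, so swapped or raw arguments become compile errors instead of runtime bugs:

```cpp
#include <cassert>

// Hypothetical light-weight "stub" types: no implicit conversion from int,
// so a Width can never be passed where a Height is expected.
struct Width  { int value; explicit Width(int v)  : value(v) {} };
struct Height { int value; explicit Height(int v) : value(v) {} };

int area(Width w, Height h) { return w.value * h.value; }

// area(3, 4);                // compile error: no implicit int -> Width
// area(Height(4), Width(3)); // compile error: arguments in the wrong order
```

A call site then has to spell out `area(Width(3), Height(4))`, which documents itself and makes the mix-up impossible.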

However, in general, you should never ignore an error. Errors are there for a reason, to tell you you've done something wrong.

Error checking, when done properly, should not obscure the maintainability and readability of the code. This is generally easier to do with exceptions and exception-safe coding practices, but some companies will not allow exceptions in their codebase for a variety of reasons (some of them valid).

If it's a "public-facing" function, then I always use pre-condition checks, or code so that pre-condition checks are unnecessary. If it is an internal function that I control all calls in, then I'm more comfortable with using asserts to catch contract violations.
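A small sketch of that split, with hypothetical function names chosen for illustration: the public-facing function validates its contract even in release builds, while the internal helper only documents its assumption with an assert.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Public-facing API: validate the contract explicitly, in every build.
int parsePercent(const std::string& s) {
    int v = std::stoi(s); // throws std::invalid_argument on garbage input
    if (v < 0 || v > 100)
        throw std::out_of_range("percent must be in [0, 100]");
    return v;
}

// Internal helper: every caller is under our control and has already
// validated the range, so an assert documenting the contract is enough.
static int scaleByPercent(int base, int percent) {
    assert(percent >= 0 && percent <= 100); // contract: validated upstream
    return base * percent / 100;
}
```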

And finally - sometimes you can't handle an error, so the best thing you can do is crash. Remember: crashing is fine and is sometimes far better than the alternative. Do not be afraid of crashes; crashes are the developer's friend. It's much easier to debug a crash than memory corruption because you ignored an error. :)

The only valid cases I've seen for ignoring errors are where a profiler says that error handling code is significantly impacting performance - and in those cases it is generally possible to hoist the error handling to a higher level to eliminate the performance issue.
Best practice is to error check every return value, and to check every pointer.

Each function should check its parameters and shouldn't assume the validity of any input.

Others might tell you otherwise, but if you don't error check liberally and stay strict about what you expect, you are just opening yourself up to a world of hurt later.

It's fine saying "but the error checking costs me in performance in my game loop" but you have to balance game performance with your ability to debug and finish the project and player perception if you release a buggy game.

Hope this helps!


It's fine saying "but the error checking costs me in performance in my game loop" but you have to balance game performance with your ability to debug and finish the project and player perception if you release a buggy game.

Agreed. The tiny cost is worth it.

Also you can lean on your compiler for additional benefits. Several of the good compilers provide compiler hints LIKELY and UNLIKELY. These guide the compiler to optimize the code such that the most likely side of the branch is the fast one, and the less likely side of the branch is out of the way.
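One common way to spell those hints (this wrapper is a sketch; GCC and Clang provide `__builtin_expect`, and C++20 later standardized the `[[likely]]`/`[[unlikely]]` attributes):

```cpp
// Map LIKELY/UNLIKELY onto __builtin_expect where available,
// and compile them away to a plain condition elsewhere.
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

int process(const int* data) {
    if (UNLIKELY(data == nullptr))
        return -1;        // error path laid out off the hot path
    return *data * 2;     // the common case stays in the fast path
}
```

The branch still executes the same way; the hint only influences how the compiler lays out the two sides of it.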

Profile-guided optimizations can do this for you as well, if you use them. You run the program through its paces as the compiler tools create profile results looking at the branches taken. The next compilation will use those results to figure out which branches are more or less likely.

I believe the long term cost of assert statements is very minimal as it doesn't introduce additional potential code paths that you have to maintain and they're generally compiled out for release builds anyway

(emphasis mine)

Debug-only asserts may seem like a good idea in theory, but they are an absolute nightmare in production systems. Every piece of software I've worked on has ended up with a hard rule of no debug-only asserts.

The issue is that as soon as you have asserts that don't trigger in release builds, you are now testing fundamentally different code paths than are executing in production. That assert(x != nullptr) seems like a great idea at the time, but did you realise that you've just guaranteed that every piece of code after that line has never been tested with null values?

Typically the die-hard assert() fans end up creating two other macros instead, logAssert() which always logs violations, but never takes an action, and assertAlways() which dumps core in production builds as well (the latter used very sparingly, as it is equivalent to a crash-to-desktop for the end user).
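The two macro names come from the post above, but their implementation is not specified there; this is one illustrative way they might look:

```cpp
#include <cstdio>
#include <cstdlib>

// logAssert: always evaluates the condition and logs violations,
// in every build, but never halts the program.
#define logAssert(cond)                                            \
    do {                                                           \
        if (!(cond))                                               \
            std::fprintf(stderr, "assert failed: %s (%s:%d)\n",    \
                         #cond, __FILE__, __LINE__);               \
    } while (0)

// assertAlways: checks in release builds too, and aborts on violation
// (equivalent to a crash-to-desktop, so use very sparingly).
#define assertAlways(cond)                                         \
    do {                                                           \
        if (!(cond)) {                                             \
            std::fprintf(stderr, "fatal assert: %s (%s:%d)\n",     \
                         #cond, __FILE__, __LINE__);               \
            std::abort();                                          \
        }                                                          \
    } while (0)
```

Because neither macro is compiled out, the code paths you test in debug are the same ones that run in production.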

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

That assert(x != nullptr) seems like a great idea at the time, but did you realise that you've just guaranteed that every piece of code after that line has never been tested with null values?

If x can be a nullptr at that point, the assert is wrong in the first place.

I usually only check the return values of calls, and invariants and entry conditions of a function against the interface contract.

Null pointers aren't worth checking much, since any use of them will crash anyway, so they won't get far.

It was just an example. The point is, assertions check the wrong thing. They attempt to ensure that invalid values never occur, instead of ensuring that invalid values are handled gracefully.

It is impossible to ensure that invalid values never occur - you can't ever be sure that you've run every single branch during testing.


First: The more you can do to make errors impossible to add to the code in the first place, the better. In C++, this means things like passing by reference instead of by pointer when null is not valid. Or using the new strongly-typed enums. Or making new light-weight "stub" types that don't support implicit casting, so you don't mix up parameters that look similar, like position and size. The list generally goes on and on. :)

Agreed. I try to do this as much as I can in my personal projects. If I can design code in a manner such that it does not require obscene amounts of error checking, then that's a good thing.

However, in general, you should never ignore an error. Errors are there for a reason, to tell you you've done something wrong.

Also agreed. Unless you mean that ignoring an error is only using an assert and not an if check, in which case, that's what I'm not sure about yet, heh.

If it's a "public-facing" function, then I always use pre-condition checks, or code so that pre-condition checks are unnecessary. If it is an internal function that I control all calls in, then I'm more comfortable with using asserts to catch contract violations.

This seems more reasonable than using if checks everywhere in some functions (in this case, internal functions).

It's fine saying "but the error checking costs me in performance in my game loop" but you have to balance game performance with your ability to debug and finish the project and player perception if you release a buggy game.

Yeah. I guess my next questions would be: how do you personally decide what to do when an error has occurred? Where do you draw the line for performance and error handling? As an extreme example...are you really going to handle NaNs with if checks in a math/physics library?

Debug-only asserts may seem like a good idea in theory, but they are an absolute nightmare in production systems. Every piece of software I've worked on has ended up with a hard rule of no debug-only asserts.



The issue is that as soon as you have asserts that don't trigger in release builds, you are now testing fundamentally different code paths than are executing in production. That assert(x != nullptr) seems like a great idea at the time, but did you realise that you've just guaranteed that every piece of code after that line has never been tested with null values?

And this is why I'm on the fence about how I should approach error handling. If you even use a single assert (without a matching if check to exit the function or otherwise handle the problem) to eliminate one possible code path in a Debug build, your code is technically broken (or has a bug, at least) in Release builds.

Maybe what I should also ask is that in what cases should you: only use asserts, use asserts and some appropriate if checks*, or use asserts and if check* absolutely everything? Or is the slight performance hit and the development time cost of gracefully handling every single error worth it?

*By if check, I either mean setting the value to something appropriate so the function can continue running, exiting the function and/or deciding to crash the program (if the error is severe enough).

It's fine saying "but the error checking costs me in performance in my game loop" but you have to balance game performance with your ability to debug and finish the project and player perception if you release a buggy game.


Yeah. I guess my next questions would be: how do you personally decide what to do when an error has occurred? Where do you draw the line for performance and error handling? As an extreme example...are you really going to handle NaNs with if checks in a math/physics library?


In your math/physics library? That is unlikely. There are really only two ways you can end up with NaNs: invalid math operations (which should be caught by error checking higher up, like making sure you don't divide by 0), or loading data with invalid values. In the latter case, put your NaN checks in the data load code, since that code is likely already loading from a very slow hard drive, and the addition of NaN checks won't slow you down, relatively speaking.

In other words, it's kind of an extension of the idea that internal functions need less error checking - the more exposed a function is, the more contract checking it should probably do.
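A sketch of the "validate at the slow boundary" idea (the function name is made up; the point is that the check lives in load code, not in the hot math routines):

```cpp
#include <cmath>
#include <vector>

// Run once at asset-load time: returns false if any value read from
// disk is NaN or infinite, so bad data never reaches the math library.
bool validateLoadedFloats(const std::vector<float>& values) {
    for (float v : values) {
        if (!std::isfinite(v)) // rejects NaN and +/-inf
            return false;
    }
    return true;
}
```

The per-frame math code can then assume finite inputs, and the cost of the check disappears into the file I/O it sits next to.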

Debug-only asserts may seem like a good idea in theory, but they are an absolute nightmare in production systems. Every piece of software I've worked on has ended up with a hard rule of no debug-only asserts.



The issue is that as soon as you have asserts that don't trigger in release builds, you are now testing fundamentally different code paths than are executing in production. That assert(x != nullptr) seems like a great idea at the time, but did you realise that you've just guaranteed that every piece of code after that line has never been tested with null values?


And this is why I'm on the fence about how I should approach error handling. If you even use a single assert (without a matching if check to exit the function or otherwise handle the problem) to eliminate one possible code path in a Debug build, your code is technically broken (or has a bug, at least) in Release builds.


Maybe what I should also ask is that in what cases should you: only use asserts, use asserts and some appropriate if checks*, or use asserts and if check* absolutely everything? Or is the slight performance hit and the development time cost of gracefully handling every single error worth it?

*By if check, I either mean setting the value to something appropriate so the function can continue running, exiting the function and/or deciding to crash the program (if the error is severe enough).


That's really a case-by-case basis.

Asserts give you nothing in release. If they are used, they should only be used to indicate contract violations, and only to catch places where the coder himself has made a mistake. They are not for anything that may be reading or processing outside data.

Error recovery is only useful if you have some way to actually recover at the level that the error is detected. For example, a file open function should report an error to the caller if the file does not exist (via exception, return code, whatever). It should not try to "recover" by guessing another filename based off the one it was given, or behaving as if it was "empty", or some other mechanism. The reason is that it doesn't have enough information to know what to do if it can't do its job.

Now, higher up, say your texture loader gets an error from the file loader that says a texture file can't be opened. Now you can use error recovery and, say, substitute a bright pink texture with the words "MISSING FILE" in blue or something. That lets you continue working on your game without being stopped by an assert or crash in a way that makes it obvious something is wrong, but the game is still "playable". (Well, unless the missing texture is your HUD and now you have a bright pink screen with blue letters obscuring everything...)
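That texture-loader recovery might be sketched like this (the `Texture` struct and both function names are hypothetical; the low-level loader here is a self-contained stand-in that always fails, where a real one would read from disk):

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Texture {
    int width = 0, height = 0;
    std::vector<uint32_t> pixels; // 0xAARRGGBB
};

// Stand-in for the real file loader; always fails so the example runs
// on its own. A real implementation would report whether the read worked.
bool loadTextureFile(const std::string& /*path*/, Texture& /*out*/) {
    return false;
}

// Recovery happens at this level, because this is where we know what a
// sensible fallback is: an impossible-to-miss solid magenta placeholder.
Texture loadTextureOrPlaceholder(const std::string& path) {
    Texture tex;
    if (loadTextureFile(path, tex))
        return tex;
    tex.width = tex.height = 64;
    tex.pixels.assign(64 * 64, 0xFFFF00FFu); // opaque "error pink"
    // A real version would also log the missing path so the asset gets fixed.
    return tex;
}
```

The file loader itself stays dumb and just reports failure; only the caller, which knows a placeholder is acceptable, decides to substitute one.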

However, let's say the file loader is reporting an error when you're trying to load, say, your game code DLL file (let's assume you're an older id Tech game for a moment and your "game" is a DLL run by the "engine" exe). Not much you can do here besides exit. In this case you can actually crash, or perhaps log the error somewhere and display a message box before terminating the program. Either way, something essential that cannot be replaced is missing, and so the game cannot continue.

And this is why I'm on the fence about how I should approach error handling. If you even use a single assert (without a matching if check to exit the function or otherwise handle the problem) to eliminate one possible code path in a Debug build, your code is technically broken (or has a bug, at least) in Release builds.
That depends on how you use assert. Breaches of contract are "impossible", so if you check the contract of a function, the assert should never ever fire. If it does, you're in serious trouble, since the impossible just happened.

Asserts are not for error handling, they are for checking of basic sanity (as you almost never get complete coverage).

If you have a breach of sanity in a release build, you're doomed already, one more code path is not going to make a difference other than not catching a crash-in-progress early.

Checking the error returns of a function call with assert makes no sense in general. If a function can fail, it will at some point (otherwise, what use is the error return?). As SmkViper said, unless you can continue without whatever the function was doing, there is basically only one solution, namely die with a nice error message.

In particular, never do


assert(f(a, b, c) != 0);

In a release build, the assert is removed, and with it any side effects it has, like performing the function call itself!
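A minimal sketch of the safe pattern (the function `f` here is just a stand-in): make the call unconditionally, then assert on the stored result, so stripping the assert under NDEBUG removes only the check.

```cpp
#include <cassert>

int f(int a, int b, int c) { return a + b + c; } // stand-in function

void caller() {
    // Wrong: with NDEBUG defined, the assert vanishes AND so does the call.
    // assert(f(1, 2, 3) != 0);

    // Right: the call always happens; only the check can be compiled out.
    int result = f(1, 2, 3);
    assert(result != 0);
    (void)result; // silence the unused warning when NDEBUG strips the assert
}
```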

Maybe what I should also ask is that in what cases should you: only use asserts, use asserts and some appropriate if checks*, or use asserts and if check* absolutely everything?
Not checking and acting on return values from function calls is an open door to problems. You call the function for a reason; if it fails, you didn't get what you wanted, and you have to cope with it.

How to check depends on how "impossible" the condition is.

- If a user needs to know of the problem, produce an error, then decide how to go on.

- If sanity is at stake, use an assert (you're doomed already).

- If you cannot continue, a crash or exit is the only way left.

- Otherwise, move on.

Note that the user report and deciding the next step are separate things. If you do report to the user, make sure you phrase the problem in user terms. "Cannot find foo.dat" has no meaning for a user; "Tried loading '/path/to/the/foo.dat' file, but could not find it" has that little extra information that gives a user a handle on fixing the problem. It usually means extra work in programming, but it's very nice, and it works (even for yourself).

In the end, bad code will eventually crash no matter what. The question is how quickly you can catch it. In general, quicker is better: it reduces development time and increases code quality. Always explicitly test all cases, including the last one (if (x == 0) {...} else { assert(x == 1); ... }). Always add a 'default: crash("Oops");' to your switch. Verify post-conditions if the code computing them is tricky, e.g. assert(data[computed_value] == nullptr);. Reduce the space you work in as much as possible, and try to catch as many false moves as you can.
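The defensive-switch idea above might look like this in practice (the enum and function are made up for illustration; the default arm plays the role of 'crash("Oops")', using assert plus abort so it also fires in release builds):

```cpp
#include <cassert>
#include <cstdlib>

enum class State { Idle, Running, Done };

const char* stateName(State s) {
    switch (s) {
        case State::Idle:    return "idle";
        case State::Running: return "running";
        case State::Done:    return "done";
        // If someone adds a State value later and forgets this switch,
        // we crash immediately instead of silently misbehaving.
        default: assert(!"unhandled State"); std::abort();
    }
}
```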

On the other hand, don't go overboard:


x = 1;
assert (x == 1); // duh!

or


index = -1;
for (i = 0; i < N; i++) if (d[i] == 5) { index = i; break; }
assert(index < 0 || d[index] == 5);

There is a balance here. Code that you know will work does not benefit from a check.

Last but not least, all that you do in this area is just damage control. If you wrote perfect code, you wouldn't need any of it. Unfortunately, very few people write perfect code, and we end up adding more code (with more mistakes) to catch errors. So while it's useful to keep errors from escaping, to reduce their impact, your primary aim should be writing correct code.

