Descriptive Error Handling

Started by
21 comments, last by frob 10 years, 3 months ago

In this example, the error could be formatted and displayed to the user if they supplied the regex, or a higher-level file that happens to contain an invalid regex. If instead this is a regex compilation call that is never intended to fail, because the regex comes from the code rather than from user input, the information would be formatted and placed in an assertion. Optionally, it may suffice for the calling code to see the type of error and ignore the specifics if they are irrelevant.

On a side note that furthers the discussion: how would you choose to represent syntax error return codes for a regex engine? Say we're restricted to ERE, nothing fancy. Would you return a single generic syntax error code, or a unique code for every possibility?

Furthermore, would these codes be in their own enumeration? That is, however you choose to represent them, their values are not unique across all modules, but a separate type keeps them from being compared against a different group of return codes. Or would they be part of one giant collection of codes, so that a single type can represent any code and you can propagate codes to anywhere that expects them?
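For what it's worth, standard C++ has a middle ground between the two options in that question: each module keeps its own enumeration, and `std::error_code` pairs the value with a category so one type can carry any module's code. The following is only a sketch; `RegexErrc`, its three enumerators, and the message texts are made up for illustration and are not a real ERE error list.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical ERE syntax errors, kept in their own scoped enumeration.
enum class RegexErrc {
    UnbalancedParen = 1,   // 0 is conventionally "no error"
    BadRepetition,
    UnterminatedBracket
};

// Opt the enumeration in to implicit conversion to std::error_code.
namespace std {
template <> struct is_error_code_enum<RegexErrc> : true_type {};
}

// The category keeps regex codes distinct from other modules' codes and
// keeps the code-to-text translation inside the regex module.
class RegexCategory final : public std::error_category {
public:
    const char* name() const noexcept override { return "regex"; }
    std::string message(int code) const override {
        switch (static_cast<RegexErrc>(code)) {
            case RegexErrc::UnbalancedParen:     return "unbalanced parenthesis";
            case RegexErrc::BadRepetition:       return "repetition operator has no operand";
            case RegexErrc::UnterminatedBracket: return "unterminated bracket expression";
        }
        return "unknown regex error";
    }
};

inline const std::error_category& regex_category() {
    static const RegexCategory instance;
    return instance;
}

// Found by argument-dependent lookup during the implicit conversion.
inline std::error_code make_error_code(RegexErrc e) {
    return { static_cast<int>(e), regex_category() };
}

void ExampleUsage() {
    std::error_code ec = RegexErrc::UnbalancedParen;   // implicit conversion
    assert(ec.category() == regex_category());
    // Same-looking integers from another category never compare equal.
    assert(ec != std::make_error_code(std::errc::invalid_argument));
}
```

The integer values only need to be unique within the regex module, yet any caller that accepts a `std::error_code` can propagate them.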

You don't really need a generic system then. The regex part needs to be able to spit out a human readable error of why it's failed. You can have bool TryCompile(string) and string GetReasonWhyCompilationFailed(void), etc, etc...

Someone who's using hard-coded regexes can then assert that TryCompile succeeds, and they can log the reason if the assertion fails (assuming you've got an assertion macro that takes a reason string).

Someone who's parsing a user-generated file can then check that all the regexes in the file compile successfully, and if they don't, they can return early from parsing in error, with their own reason string. This reason string could prepend the filename and line number that it was up to, and append the error string from the regex engine.

The GUI system that launched the file-parser can then take that returned string and shove it in a pop-up box, etc.

If that kind of thing covers all your use-cases, then there's no real need for error code enums... There's not a real need for exceptions either, unless it makes aborting the file-parser easier.
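A rough sketch of that layering, assuming a hypothetical `Pattern` class whose "compiler" only checks that parentheses balance (a stand-in for a real regex compiler), and a hypothetical `CheckLine` helper playing the file parser:

```cpp
#include <cassert>
#include <string>

// Hypothetical pattern class. The "compiler" only checks parenthesis
// balance (escapes are ignored) so that the sketch stays self-contained.
class Pattern {
public:
    // Returns true on success. On failure the previously compiled pattern
    // is kept and the reason is available from the getter below.
    bool TryCompile(const std::string& source) {
        int depth = 0;
        for (std::string::size_type i = 0; i < source.size(); ++i) {
            if (source[i] == '(') ++depth;
            if (source[i] == ')' && --depth < 0) {
                reason_ = "unmatched ')' at column " + std::to_string(i + 1);
                return false;
            }
        }
        if (depth != 0) {
            reason_ = "missing ')' at end of pattern";
            return false;
        }
        source_ = source;
        reason_.clear();
        return true;
    }
    const std::string& GetReasonWhyCompilationFailed() const { return reason_; }

private:
    std::string source_;
    std::string reason_;
};

// Hard-coded regex: failure is a programming error, so assert.
void CompileBuiltIn() {
    Pattern p;
    bool ok = p.TryCompile("(a|b)*c");
    assert(ok && "hard-coded regex must compile");
    (void)ok;
}

// File parser: wraps the engine's reason with file name and line number.
// Returns an empty string on success.
std::string CheckLine(const std::string& file, int line, const std::string& regex) {
    Pattern p;
    if (p.TryCompile(regex)) return "";
    return file + ":" + std::to_string(line) + ": " + p.GetReasonWhyCompilationFailed();
}
```

The string returned by `CheckLine` is what the GUI layer would put in its pop-up box; it never needs to know anything about regexes.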


You don't really need a generic system then. The regex part needs to be able to spit out a human readable error of why it's failed. You can have bool TryCompile(string) and string GetReasonWhyCompilationFailed(void), etc, etc...

Out of curiosity, should the code care about why the function failed? I have an integer return code that is returned, and if it isn't successful, it propagates the code upward, but with a boolean approach, it either failed, or it didn't.

Now, this approach has overhead. Every object that can fail must now have a dynamic string object, whether it fails or not.

I have to wonder, am I out of options here? Either exceptions with overhead whether I use them or not, millions of return code checks with minimal information, or some Frankenstein's monster combination of both using polymorphism and dynamic allocation?

You could implement setjmp/longjmp exceptions, which would not be disabled by compiler settings (provided that the program is linked with a complete C library). With some suitable macros you could make the syntax halfway decent. The problem is that calling setjmp is expensive and must occur at every would-be try block, even in the normal flow of code.

Just listing this as an option.

While I agree that it has its uses, it majorly disrupts the flow of execution, without providing any more information than a return code. My main goal is providing as much information as possible, in a way that I can propagate errors to callers, without the pollution of a billion different return codes in the same namespace.

You don't really need a generic system then. The regex part needs to be able to spit out a human readable error of why it's failed. You can have bool TryCompile(string) and string GetReasonWhyCompilationFailed(void), etc, etc...


Out of curiosity, should the code care about why the function failed? I have an integer return code that is returned, and if it isn't successful, it propagates the code upward, but with a boolean approach, it either failed, or it didn't.

Now, this approach has overhead. Every object that can fail must now have a dynamic string object, whether it fails or not.

You said above that the code doesn't care why, but it must print the reason why to the user. So you end up with a boolean success, and a message for the user.

If storing the error in the regex is a concern, then TryCompile can return the reason string instead of a bool if you like -- where a null error string indicates success.
Or it could return a pair{ bool success, string error };
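Both return shapes might look something like this in C++17. This is a sketch only: the two checks are placeholders for a real compiler, and `TryCompilePair` is a name invented here to show the pair variant next to the first one.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Variant 1: the error text is the return value; std::nullopt means success.
// The two checks below are placeholders, not real ERE validation.
std::optional<std::string> TryCompile(const std::string& source) {
    if (source.empty()) return std::string("pattern is empty");
    if (source.back() == '\\') return std::string("trailing backslash");
    return std::nullopt;
}

// Variant 2: an explicit success flag next to the error text.
std::pair<bool, std::string> TryCompilePair(const std::string& source) {
    if (std::optional<std::string> error = TryCompile(source)) {
        return { false, *error };
    }
    return { true, std::string() };
}

void ExampleUsage() {
    // Nothing is stored in the pattern object; the caller owns the message.
    if (std::optional<std::string> error = TryCompile("ab\\")) {
        assert(*error == "trailing backslash");
    }
}
```

Either way the error only exists while the caller holds it, so a successfully compiled pattern carries no string around.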

Alternatively, you can copy the Windows API's solution. It stores the last error per thread, not per object. Many functions fail with a return value that's basically equivalent to ERROR__SUCCESS_IS_FALSE. If you get a failure like this, then you call a global function (GetLastError) to retrieve the most recent error code on the current thread, which FormatMessage can turn into a human-readable string.
Under this design, you'd have a global ThreadLocal<string> g_error, and TryCompile would write to g_error before returning false.
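In standard C++ the `ThreadLocal<string>` above maps onto the `thread_local` keyword. A minimal sketch, assuming a placeholder validity check and a hypothetical accessor named `GetLastErrorText`:

```cpp
#include <cassert>
#include <string>

// One error string per thread rather than per object.
thread_local std::string g_error;

// Placeholder check only; a real compiler would set g_error at each
// point where it detects a syntax error.
bool TryCompile(const std::string& source) {
    if (source.empty()) {
        g_error = "pattern is empty";
        return false;
    }
    return true;   // like the Windows API, success leaves g_error untouched
}

// Hypothetical accessor: most recent error text on the calling thread.
const std::string& GetLastErrorText() {
    return g_error;
}

void ExampleUsage() {
    if (!TryCompile("")) {
        assert(GetLastErrorText() == "pattern is empty");
    }
}
```

The text is only meaningful immediately after a call that returned false, which is the usual caveat with this style.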


My thinking is that if the ultimate destination of the errors is always a human -- either a user-facing GUI, or an assertion/log message for a debugging programmer -- then what you really want is a very descriptive string, to which you can append specific information, such as line numbers, values of variables involved, etc, etc. An error code doesn't allow for that kind of extension.

Also, if the destination is a human, then at some point you need a mechanism to convert the error code into a string anyway. For the sake of loose-coupling, it seems bad for a regex module to return a code, which is then converted to a string by a completely different module (say, the GUI). This would mean that the GUI module has intimate knowledge about regexes, so that it can translate regex-specific errors to English... It seems cleaner to keep that translation local to the regex module.

However, this then complicates international localization -- you don't really want all your text hard-coded to English...
To get around this, you'd need your error objects to be quite fat, containing all the extended information (line numbers, values of bad variables, etc), and a string which acts as a key into a localization dictionary.

That ends up being something pretty complex, like:


Dict[string, string] locale = { "Err_regex_foo" -> "A regex was bad, see column %d" };
 
struct Error { string key; vector<Variant> data; }
 
function TryCompile(...)
  ...
  return Error{ "Err_regex_foo", { 42 } }
 
string TranslateError( Error e, Dict locale )
  format = locale[e.key]
  return sprintf( format, e.data )
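The pseudocode above could be fleshed out roughly as follows in C++17. Two assumptions to flag: `Variant` is narrowed to `std::variant<int, std::string>`, and the `%d` format is swapped for positional `{0}`, `{1}` placeholders, since a vector cannot be handed to `sprintf` and positional placeholders let a translation reorder its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

using Variant = std::variant<int, std::string>;
using Locale  = std::map<std::string, std::string>;

struct Error {
    std::string key;            // key into the localization dictionary
    std::vector<Variant> data;  // values substituted into the message
};

// Converts one piece of extended information to text.
std::string ToText(const Variant& v) {
    if (const int* i = std::get_if<int>(&v)) return std::to_string(*i);
    return std::get<std::string>(v);
}

// Replaces the first occurrence of {0}, {1}, ... with the matching datum.
std::string TranslateError(const Error& e, const Locale& locale) {
    Locale::const_iterator it = locale.find(e.key);
    if (it == locale.end()) return e.key;   // fall back to the raw key
    std::string text = it->second;
    for (std::size_t i = 0; i < e.data.size(); ++i) {
        const std::string placeholder = "{" + std::to_string(i) + "}";
        const std::string::size_type pos = text.find(placeholder);
        if (pos != std::string::npos) {
            text.replace(pos, placeholder.size(), ToText(e.data[i]));
        }
    }
    return text;
}

void ExampleUsage() {
    const Locale en = { { "Err_regex_foo", "A regex was bad, see column {0}" } };
    const Error e{ "Err_regex_foo", { Variant(42) } };
    assert(TranslateError(e, en) == "A regex was bad, see column 42");
}
```

The regex module only ever produces the key and the data; the dictionary can live wherever the application keeps its translations.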


You said above that the code doesn't care why, but it must print the reason why to the user.

The truth is, a regex can be used for anything. I have no way of predicting where it will be used, even if it is me that does it. Thus, I cannot know if the code will or won't care, and thus, I leave the opportunity to do so.


If storing the error in the regex is a concern, then TryCompile can return the reason string instead of a bool if you like -- where a null error string indicates success.

The way it is set up in my library is to have a pattern class, and a matcher class. It is possible to compile a new pattern in the same pattern object, and if compilation fails, it has the same state as it did before attempting compilation. It seems odd to keep error information in an object that has a valid state.

Since this is a lower-level library to be used by higher objects, this needs to be written first, yet this is sounding as if the design depends upon what will use it, not the other way around.

While I agree that it has its uses, it majorly disrupts the flow of execution, without providing any more information than a return code. My main goal is providing as much information as possible, in a way that I can propagate errors to callers, without the pollution of a billion different return codes in the same namespace.

You could implement that with a global variable that stores a pointer to the thrown exception. In fact, you can implement all of C++ exception handling and more with setjmp/longjmp. (For instance, you can implement multiple concurrent exceptions, as in D.) The two downsides are speed, and ugly macros.

From your later post it sounds like you're trying to pass error information across a library boundary -- in this case setjmp/longjmp is probably too ugly to expose, but you could use it for internal error propagation.
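As a very rough illustration of the internal-propagation idea, here is a single-threaded sketch in which a global pointer carries the "thrown" error and `setjmp`/`longjmp` do the jump. All names (`ErrorInfo`, `Throw`, `CompileOrThrow`, `TryCompile`) are hypothetical. In C++ this is only safe when no object with a non-trivial destructor lives in the frames that get jumped over, which the sketch respects.

```cpp
#include <cassert>
#include <csetjmp>

struct ErrorInfo {
    const char* message;
    int column;
};

// Innermost active "try" block and the most recently "thrown" error.
static std::jmp_buf* g_handler = nullptr;
static const ErrorInfo* g_thrown = nullptr;

// "throw": record the error, then jump back to the innermost handler.
[[noreturn]] void Throw(const ErrorInfo& error) {
    assert(g_handler != nullptr && "Throw called outside a try block");
    g_thrown = &error;
    std::longjmp(*g_handler, 1);
}

// Internal code signals failure without threading return codes upward.
// The empty-pattern check is a placeholder for real parsing.
void CompileOrThrow(const char* source) {
    static const ErrorInfo empty_pattern = { "pattern is empty", 1 };
    if (source[0] == '\0') Throw(empty_pattern);
}

// Library boundary: converts the jump back into an ordinary return value.
// Returns nullptr on success, otherwise the error that was thrown.
const ErrorInfo* TryCompile(const char* source) {
    std::jmp_buf env;
    std::jmp_buf* previous = g_handler;   // allows nested "try" blocks
    g_handler = &env;
    const ErrorInfo* result = nullptr;
    if (setjmp(env) == 0) {
        CompileOrThrow(source);           // "try" body
    } else {
        result = g_thrown;                // "catch" body
    }
    g_handler = previous;
    return result;
}
```

Note that `setjmp` runs on every call to `TryCompile`, including the successful ones, which is the cost mentioned earlier in the thread.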

There are other ways to be fancy: you could pass in an "on error" function, you could pass in a "continuation" which doesn't get called when there's an error, or you can pass around an "Object-or-error" class template. But being fancy is usually bad for a library interface.
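To make the "Object-or-error" option concrete, a stripped-down class template might look like this in C++17. It is a sketch under assumptions: `Expected`, `CompileError`, and the stub `Pattern` are invented for illustration, and the template assumes `T` and `E` are different types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload for the regex module.
struct CompileError {
    std::string message;
    int column;
};

// Holds either a T or an E (assumes T and E are different types).
template <typename T, typename E>
class Expected {
public:
    Expected(T value) : state_(std::in_place_index<0>, std::move(value)) {}
    Expected(E error) : state_(std::in_place_index<1>, std::move(error)) {}

    bool has_value() const { return state_.index() == 0; }
    const T& value() const { assert(has_value());  return std::get<0>(state_); }
    const E& error() const { assert(!has_value()); return std::get<1>(state_); }

private:
    std::variant<T, E> state_;
};

// Stub pattern type; a real one would hold the compiled automaton.
struct Pattern {
    std::string source;
};

// The empty-pattern check is a placeholder for real compilation.
Expected<Pattern, CompileError> Compile(const std::string& source) {
    if (source.empty()) return CompileError{ "pattern is empty", 1 };
    return Pattern{ source };
}
```

The error storage exists only in the return value, so a valid pattern object never carries error state, and the caller can still inspect `error().column` if it cares why compilation failed.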

You could implement that with a global variable that stores a pointer to the thrown exception. In fact, you can implement all of C++ exception handling and more with setjmp/longjmp. (For instance, you can implement multiple concurrent exceptions, as in D.) The two downsides are speed, and ugly macros.

My goal isn't stack unwinding and disrupting the flow of execution, but the informative aspect; an exception object is constructed on throw, and thus, you don't pay for it if you don't use it. However, you must pay for the potential to unwind the stack and RTTI, whether you use it or not.


From your later post it sounds like you're trying to pass error information across a library boundary

Yes, in a way. Other modules within the same library use it, so they're in the same translation unit, but code outside of the regex class will use it. There will be code outside of the library that uses it, too.

Though, since it is a header-only templated implementation, technically, it's in every translation unit that uses it.


But being fancy is usually bad for a library interface.

If there is no solution to this request, then I can just go back to using return codes, and make do. There isn't a point in coming up with exotic solutions that have most of the same pitfalls as the established best practices.


King Mir, on 11 Feb 2014 - 7:03 PM, said:

You could implement that with a global variable that stores a pointer to the thrown exception. In fact, you can implement all of C++ exception handling and more with setjmp/longjmp. (For instance, you can implement multiple concurrent exceptions, as in D.) The two downsides are speed, and ugly macros.

My goal isn't stack unwinding and disrupting the flow of execution, but the informative aspect; an exception object is constructed on throw, and thus, you don't pay for it if you don't use it. However, you must pay for the potential to unwind the stack and RTTI, whether you use it or not.

You pay for it either way.

Exceptions can be implemented with code or by data, or a mixed approach, depending on the compiler.

If they are implemented in code, a runtime cost is incurred every time a try or catch block is placed in your code, even if you never throw an exception. It also adds a small cost to every function call's prologue. Based on numbers I've read, the cost is about a 6% penalty globally.

If they are implemented as data tables, there is an executable size cost with a minor lookup fee when exceptions trigger. The exact cost depends on the implementation details. The system normally piggybacks on RTTI. If RTTI is not enabled, then you can expect around a 10%-15% increase in executable size, depending on how you use classes, vtables, and other factors. If RTTI was already enabled, the increase may be under 5%.

So if you use RTTI to get the info you pay for it. If you use exceptions to get the info you pay for it. You get to decide how affordable the payment is.

One of the major long-term complaints about C++ exceptions (and also RTTI) is that, unlike many other language features, you absolutely pay a cost for their mere presence. This was a major sticking point in the initial standardization of the language and it remains an open performance issue. These two features have always been among the first to go in game development, in large part because of the size and speed costs. When you are on a game console or embedded system, having that much permanently wasted memory or that much globally lost performance is a very real concern. There is less pressure than there used to be, but when you are programming on a 66MHz console and paying for cartridge storage by the bit, the cost is huge.
