Safety vs Efficiency

Started by
30 comments, last by Ravyne 8 years, 8 months ago


An error should indeed be reported loudly, but it shouldn't necessarily fail hard; that depends on the nature of the error.

I think perhaps we are operating under differing definitions of the term 'error'.

I generally define 'error' as any event with a risk of data loss or a security breach. An artist placing a mesh in the wrong place seems at worst to be some sort of validation warning.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


I agree. The points I was trying to make were more about daily production and how to deal with bad data or when some assumption in your algorithm is not satisfied.

Fail fast and fail loud, there is no other way :)

I recommend rethinking this. I do quite a bit of interviewing these days, and saying something like this in a phone screen or interview would be a dark red flag. I don't know what experience you have in professional development, but maybe think less about shipping and more about daily development in large teams of maybe 100 people. If you crash loud and hard on every little bug because you didn't program defensively, I don't see such a team working effectively.
FWIW, in my professional experience, crashing on assertion failure (if a debugger isn't attached) is very common.

You should also have a basic automated build-and-test in between commits being pushed to the central repo, and commits actually being merged into the main branch.
i.e. if you try to push code that contains an assertion failure on any platform, your code should be automatically rejected before it has a chance to reach the other 100 people on the team.

Even at jobs where most asserts were continuable, we'd routinely have "broken builds", where someone's committed a bug (often non-crashing, e.g. can't navigate the menus, but it would still have been caught by a simple autotest) or a broken art asset... and dozens of people waste half an hour waiting for a fix.
Most of the time, even if you do continue from an assertion, the game is in a corrupt/invalid state anyway and will likely crash on its own, unless you've also spent time writing error-handling code designed to work around buggy, assertion-failing code. So you may as well crash, produce a useful memory dump, and force the bug to be fixed/reverted right away.
IMHO testing all commits is a better solution to the broken-build problem, rather than just hoping that invalid, invariant-violating code happens to work well enough to not impact the team.
Well, my professional experience is the opposite. E.g. we were using some popular middleware which crashed on assertions as you describe. We got rid of it because it was a major annoyance. Our assertion system is similar to SDL's and the user can decide what to do. At Havok, assertions are tagged and can be disabled. This is a very useful feature which, in my opinion, shows their long experience in the industry. If crashing in an assertion is the only way to get bugs fixed in a team, I have a hard time seeing how this is effective.

Looking through the whole thread again, I think there is no 'one' answer to that question. There are so many factors that come into play, and I think it takes some experience to decide what the right choice is.

Automated tests and build systems are great indeed. I totally agree those should be used.
As a side note: I never liked the assert that comes with the CRT as it doesn't break at the location of the actual assertion and you need to move up the call stack.

FWIW, I found this post very useful in the past. The implementation also describes a way to register different handlers, which can be used to implement a system that allows for continuation:

http://cnicholson.net/2009/02/stupid-c-tricks-adventures-in-assert/
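To make the "user can decide" idea concrete, here is a minimal sketch of an assertion macro with a replaceable handler. All names (GAME_ASSERT, AssertAction, set_assert_handler, log_and_continue) are made up for illustration; this is not SDL's, Havok's, or the linked post's actual code. The abort sits inside the macro itself, so a debugger or crash dump points at the failing assertion rather than somewhere further down the call stack.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical names throughout; a sketch, not any real library's API.
enum class AssertAction { Abort, Continue };

struct AssertInfo {
    const char* expr;
    const char* file;
    int line;
};

using AssertHandler = AssertAction (*)(const AssertInfo&);

// Default policy: report loudly, then ask the macro to abort.
inline AssertAction default_assert_handler(const AssertInfo& info) {
    std::fprintf(stderr, "%s(%d): assertion failed: %s\n",
                 info.file, info.line, info.expr);
    return AssertAction::Abort;
}

// Storage for the currently installed handler.
inline AssertHandler& assert_handler_slot() {
    static AssertHandler handler = default_assert_handler;
    return handler;
}

// Installs a new handler and returns the previous one.
// Passing a null pointer restores the default policy.
inline AssertHandler set_assert_handler(AssertHandler handler) {
    AssertHandler previous = assert_handler_slot();
    assert_handler_slot() = handler ? handler : default_assert_handler;
    return previous;
}

inline AssertAction on_assert_failure(const char* expr, const char* file, int line) {
    const AssertInfo info = {expr, file, line};
    return assert_handler_slot()(info);
}

#define GAME_ASSERT(cond)                                                          \
    do {                                                                           \
        if (!(cond) &&                                                             \
            on_assert_failure(#cond, __FILE__, __LINE__) == AssertAction::Abort)   \
            std::abort(); /* abort in the macro, so the crash site is the caller */ \
    } while (0)

// Example policy for a tool or editor plugin: log, count, and keep running.
inline int& ignored_assert_count() {
    static int count = 0;
    return count;
}

inline AssertAction log_and_continue(const AssertInfo& info) {
    std::fprintf(stderr, "%s(%d): ignoring failed assertion: %s\n",
                 info.file, info.line, info.expr);
    ++ignored_assert_count();
    return AssertAction::Continue;
}
```

A game build would keep the default handler and crash with a dump, while a tool could call set_assert_handler(log_and_continue) at startup. Which policy is right is exactly what this thread disagrees about; the sketch only shows that the choice can live in one place.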

Continuing after an assertion is such a brilliant idea, what could possibly go wrong by running the program in an invalid state..

From personal experience: I worked under a "senior" developer who thought allowing the user to continue after assertions and unhandled exceptions (..) was a good idea. I don't remember one day where I didn't waste my time investigating bogus bug reports that only existed because of invalid state. Seriously, fuck that.

Sure, I don't think it is a good idea to send bug reports after you decided to ignore an assert. I didn't advocate this.

Maybe an example to keep things less abstract: I recently wrote a Maya plugin for our physics engine Rubikon. In the solver I compute values which are supposed to be larger than zero. This is an invariant, so I enforce it with an assertion. If I crashed in that case, it would shut down Maya and the artist would lose all his work. The only thing that should happen (in my opinion!) is that the simulation looks bogus, he resets the scene, and he can ask me what went wrong. The assert gives me a hint to identify the problem. I walk over to his desk, have a look with him, and fix his scene. Then I fix the bug that led to the assertion by catching wrong user input much earlier (e.g. by ignoring the invalid constraint when building the scene and giving a proper warning), and the problem is solved.
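A rough sketch of what "catching wrong user input much earlier" could look like. The types and the stiffness field are invented for illustration and are not Rubikon's actual data structures; the assumption is only that the solver needs some per-constraint value to be greater than zero.

```cpp
#include <string>
#include <vector>

// Hypothetical types; not Rubikon's real API.
struct Constraint {
    std::string name;
    float stiffness;  // assumed invariant: the solver requires this to be > 0
};

struct Scene {
    std::vector<Constraint> constraints;  // only validated constraints
    std::vector<std::string> warnings;    // shown to the artist
};

// Validate at scene-build time, so the solver's assertion is never reached
// with bad user input. Invalid constraints are skipped with a warning.
inline Scene build_scene(const std::vector<Constraint>& input) {
    Scene scene;
    for (const Constraint& c : input) {
        if (!(c.stiffness > 0.0f)) {  // written this way so NaN is rejected too
            scene.warnings.push_back("ignoring constraint '" + c.name +
                                     "': stiffness must be greater than zero");
            continue;
        }
        scene.constraints.push_back(c);
    }
    return scene;
}
```

The assertion in the solver stays in place as a check on the programmer; the validation above handles the artist's mistake before it can turn into an invariant violation.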

I don't want to be pedantic about assertions. Having a similar macro (e.g. Verify) would be totally fine with me. Then we can have assertions for the case where things get into a really bad state (e.g. memory corruption). It might be arguable whether the extra complexity of several macros buys you anything. Probably a matter of personal taste.

Well, yesterday I hit a boost uuid "invalid string" exception, which got caught in the network worker thread, even though the network code itself never generates a uuid from a string. The error was not caught where it happened; it went down the chain until it found a try/catch block. The only thing I knew was that a string uuid had failed somewhere, but not where. It turned out to be in the disconnect for connections that had exceeded their connection time. I was calling the uuid generator on an empty string without a try/catch around the statement. I would rather have a "return false". In fact, all my own code uses return values, and all errors are handled where they happen.


The error was not caught where it happened and went down the chain until it found a try/catch block.

You are aware that you can set your debugger to break whenever an exception is thrown (not just when it is caught)?

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

You are aware that you can set your debugger to break whenever an exception is thrown

Umm, no, I did not. I'm new to the whole exception thing, which was forced on me (not really: boost allows you to pass an error object, boost::system::error_code& ec, to functions).

We did learn about them in class some 20 years ago; I just forgot. I never used them, did not like them at the time, and never thought of them again until yesterday.

Searching through the code base for boost::uuids::string_generator, I found this:



boost::uuids::string_generator gen;
boost::uuids::uuid u1 = gen(appguid);


Where appguid, on a client that never finished connecting, can be NULL. Totally my fault for thinking I'd get the supported app class done quickly with no error checking needed.

That goes against the protocol I set in the past for that very reason, but every so often I breach it, for whatever reason.
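For what it's worth, the "return false" style can be layered on top of a throwing parser with a small wrapper. The sketch below is an assumption-laden illustration: try_parse and fake_uuid_parse are made-up names, and fake_uuid_parse is only a stand-in for a throwing parser such as boost::uuids::string_generator, used here so the example needs nothing beyond the standard library. The wrapper rejects null and empty input up front and turns any std::exception from the parser into a false return at the call site.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: calls a parser that may throw and reports failure by
// return value instead. `out` is only written on success.
template <typename Parser, typename Result>
bool try_parse(Parser&& parse, const char* text, Result& out) {
    if (text == nullptr || *text == '\0') {
        return false;  // handle the "client never finished connecting" case here
    }
    try {
        out = parse(std::string(text));
        return true;
    } catch (const std::exception&) {
        return false;  // the error is handled where it happened
    }
}

// Stand-in for a throwing parser (assumption: NOT the real boost behaviour,
// it only checks the length of the canonical 36-character form).
inline std::string fake_uuid_parse(const std::string& text) {
    if (text.size() != 36) {
        throw std::runtime_error("invalid uuid string");
    }
    return text;
}
```

At the call site this reads as: if try_parse fails, log it and drop the connection, rather than letting the exception travel up to the worker thread's catch block.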

