How do you reproduce a bug when you don't know what caused it?

Started by
9 comments, last by Steve_Segreto 6 years, 10 months ago

So I'm doing game testing, and there is a bug that I encountered once. Even when doing the exact same thing, I can't get it to happen again. I have spent hours trying to reproduce it, and even when I did, the programmer who is supposed to fix it asked me to do it again, which is annoying because it took so long to trigger in the first place.

The worst part is that it is a very rare bug and a major one as well.

So how do you reproduce a bug if you don't know how it happened? What's the best way to figure out how that bug happened?

 

15 minutes ago, Paarth said:

So how do you reproduce a bug if you don't know how it happened? What's the best way to figure out how that bug happened?

 

No one can tell except maybe the programmer who wrote that code. The best way to figure it out would be to recall how you encountered that bug and repeat that process in the hope of finding it again.

A) You can try to repeat your steps to reproduce the bug.
B) You can create a log file which writes any buggy behavior to a file.
C) You can create a system to record your game, and then replay it. If you encountered a bug, then replaying your input actions and events should reproduce the conditions which created the bug.
D) At a bare minimum, write down what you were doing and the unexpected behavior you experienced.
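Option C can be sketched roughly as below. This is only a minimal illustration, assuming the game's randomness comes from a single seeded generator; `InputRecorder`, `simulate`, and `replay` are hypothetical names, and `simulate` is a stand-in for a real game loop.

```python
import json
import random


class InputRecorder:
    """Stores the RNG seed and every (frame, action) pair of a session."""

    def __init__(self, seed):
        self.seed = seed
        self.events = []

    def record(self, frame, action):
        self.events.append({"frame": frame, "action": action})

    def dumps(self):
        # Serialize the session so it can be attached to a bug report.
        return json.dumps({"seed": self.seed, "events": self.events})


def simulate(seed, events):
    """Hypothetical stand-in for the game loop.

    It is deterministic for a given seed and input list, which is the
    property a real game needs for replays to reproduce a bug.
    """
    rng = random.Random(seed)
    state = 0
    for event in events:
        # Mix player input with seeded randomness.
        state = (state * 31 + len(event["action"]) + rng.randint(0, 9)) % 1_000_003
    return state


def replay(recording_json):
    """Re-run a recorded session from its serialized form."""
    data = json.loads(recording_json)
    return simulate(data["seed"], data["events"])


recorder = InputRecorder(seed=1234)
for frame, action in [(1, "jump"), (2, "move_left"), (7, "fire")]:
    recorder.record(frame, action)

live_state = simulate(recorder.seed, recorder.events)
replayed_state = replay(recorder.dumps())
```

If the live and replayed states ever differ, something other than input and the seed is influencing the game (timing, threads, uninitialized memory), which is itself a useful clue.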

There is also the possibility that the bug itself is non-deterministic. For example, many forms of memory corruption manifest differently depending on where in the address space individual objects happen to be allocated.

For a bug of that type, exactly reproducing the steps doesn't offer any guarantee that the bug will reproduce. You may have to fall back to stressing out the system in the hopes that it reoccurs randomly.
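A stress loop along these lines might look like the sketch below. It assumes the rare failure depends on random inputs the program controls, so recording the seed makes a failing run repeatable; it will not capture failures that depend on memory layout. `flaky_operation` is a hypothetical stand-in for the system under test.

```python
import random


def flaky_operation(rng):
    """Hypothetical system under test that fails for rare input combinations."""
    values = [rng.randint(0, 999) for _ in range(5)]
    if sum(values) % 97 == 0:
        raise RuntimeError(f"rare failure with values {values}")


def stress(iterations, base_seed=0):
    """Run the operation many times and return every seed that failed."""
    failing_seeds = []
    for i in range(iterations):
        seed = base_seed + i
        try:
            flaky_operation(random.Random(seed))
        except RuntimeError:
            # Keep the seed: re-running with it repeats the same inputs.
            failing_seeds.append(seed)
    return failing_seeds


failing = stress(2000)
```

Any seed in `failing` can be handed to the programmer, who can re-run `flaky_operation(random.Random(seed))` under a debugger.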

Or, if you can't reproduce the bug, then the programmer may have some luck running Valgrind (or an equivalent tool) to chase down buffer overruns and other common causes of memory corruption.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

2 hours ago, Paarth said:

The worst part is that it is a very rare bug and a major one as well.

So how do you reproduce a bug if you don't know how it happened? What's the best way to figure out how that bug happened?

I can identify fully with this. The difference is that I am both the programmer and the tester, but that makes testing even more difficult. While testing one of the projects I'm working on, I encountered a serious bug but lost track of how to recreate it. It was rare, but when it occurred it was serious (though it wasn't a crash).

There is no shortcut here: you have to gain real familiarity with the game/tool/program. At first I had only a very vague idea of which action paths led to the bug, so I set out to recreate it by process of elimination. It wasn't any fun. I kept working through various action paths, and sometimes the bug popped up again, but because I had zillions of attempts in between that didn't reproduce it, I was often caught off guard. I stayed relentless, though, and slowly but surely I gained enough familiarity. It took more than four days of continuous testing, but I finally nailed it. Afterwards, fixing the code was still moderately difficult, as it involved keeping track of several variables in a huge code base with some refactoring... but still easier than recreating the bug path.

Yeah... just continue testing. Slowly but surely you will get familiar enough to recreate it.

can't help being grumpy...

Just need to let some steam out, so my head doesn't explode...

Is networking or 3rd party code involved? Those can cause random shit too.

Anyway, when I run into bugs like this, I usually apply a fallback-like "fix" at the smallest scope I can locate where the bug occurred. I check that this ""fix"" doesn't cause problems, and then I am done with it for the time being. I cannot put enough quotation marks around the word "fix".

It also helps if others test or use the software:
The latest of this random crap I ran into (a measuring amplifier fails to work in buffered data acquisition mode, even after) eventually came up in a rather deterministic way (always) for a specific use case. It turned out that the fix I applied months earlier worked pretty well, and I could improve it (still no buffered DAQ, but at least the program properly falls back to software-timed sampling). The root cause is still not clear. Our best guess is that our special/custom/motherfucker corporate firewall is blocking a port for this program only (or rather, it opens the port only for a specific other program, which works fine with buffered DAQ). The device can connect and read samples, but buffered DAQ requires another port to be open.

Dunno, maybe there's a proper way to test/debug these problems, but I'm not a real programmer, so :ph34r:

Story time over.

Can you get more testers? The more people trying to duplicate it, the better the chance that someone will.

 

edit: And the more you'll learn about circumstances that cause it to occur.

As mentioned, log everything whenever there is a crash. Even if you can't reproduce it today, a pattern may eventually emerge.
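As a rough sketch of what "log everything" could look like, the snippet below wraps a step of the program and writes the traceback plus some context whenever it fails. It uses Python's standard `logging` module; the logger name, `make_logger`, `run_guarded`, and the context fields are illustrative assumptions, not part of any particular engine.

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes timestamped records to the given stream."""
    logger = logging.getLogger("game.crash")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def run_guarded(logger, step, context):
    """Run one step; on failure, log the traceback and context, then re-raise."""
    try:
        return step()
    except Exception:
        # logger.exception appends the full traceback to the record.
        logger.exception("crash during step; context=%r", context)
        raise


log_stream = io.StringIO()  # a real build would log to a file instead
crash_logger = make_logger(log_stream)
```

The timestamp in every record is what lets a pattern such as "always on the same weekday" show up later when logs from many sessions are compared.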

Among the non-reproducible crash bugs in one project, one tester noted a pattern for some of them. Occasionally, someone would crash after winning a multiplayer game. That tester noted the crash typically happened on a Thursday, mid-day to afternoon. That was finally the key information for a crash we'd seen a few times.

When the log files on the server were rotated and the weekly scores reset, the server would send back an empty set of high scores to the first victor, and the code blindly dereferenced the first item on the list. It could happen under other circumstances, but it always happened on Thursday mornings when the rotation and reset happened.
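The shape of that bug, and one possible guard, can be illustrated as follows. This is not the project's actual code; the function names and the (name, score) tuples are assumptions made for the example.

```python
def top_score_unsafe(high_scores):
    # Mirrors the bug: assumes the list always has at least one entry.
    return high_scores[0]


def top_score_safe(high_scores):
    # Treat an empty list (for example, right after a weekly reset)
    # as "no scores yet" instead of indexing into it.
    if not high_scores:
        return None
    return high_scores[0]
```

The caller then has to handle `None` explicitly, which forces the "no scores yet" case to be designed rather than discovered by a crash.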

Thank you all for the advice! I managed to find it, and hopefully it got fixed.

 

Ask your programmers to add asserts that fast_fail in their code, then run your test code with asserts enabled. Capture the crash dump plus heap and link it to the bug so the programmer can investigate the failed assert at their leisure. If possible, link a keystroke or other input to an assert that always crashes the program. Use this only if you are experiencing a show-stopping glitch that doesn't actually trip an assert or crash in any other way.
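A small sketch of both ideas, in Python rather than a native language, might look like this. `apply_damage` and `handle_debug_key` are hypothetical names and the F12 binding is an assumption; a native game would capture a crash dump where this example only raises an error.

```python
def apply_damage(health, damage):
    # Fail immediately on an impossible input instead of letting a
    # corrupted value travel further. Python skips assert statements
    # when run with the -O flag, so test builds must not use it.
    assert damage >= 0, f"negative damage: {damage}"
    new_health = health - damage
    return max(new_health, 0)


def handle_debug_key(key):
    # Hypothetical debug binding: pressing F12 deliberately fails so the
    # current state is captured while a non-crashing glitch is on screen.
    # An explicit raise is used so the binding still works under -O.
    if key == "F12":
        raise AssertionError("tester-triggered crash for state capture")
    return None
```

The assert message should include the offending value, since that is often the first thing the programmer needs from the report.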

This topic is closed to new replies.
