
Test Driven Development in Game Dev

Some things are very easy to test, some are quite difficult. For example, if you are working on a spacecraft for a game, it's a simple task to test the propulsion system: check the location of the ship, fire the engines in a certain way, check the location of the ship again, and if it moved as you expected the test passes.

But say you are testing the graphical representation of the ship on screen. How do you test that the ship has been rendered in the correct place, in the correct orientation, in its entirety? I can only imagine that it would entail checking, pixel by pixel, the area that you expect the ship to be in. But the actual representation is essentially arbitrary; it only has meaning in a greater context that only the human brain can understand. How can a test be made to "know" that the representation is "correct"? Have I merely not thought about the problem enough, or are there certain aspects for which TDD is inappropriate or prohibitively difficult? I can see similar problems with the testing of playing sounds.
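For what it's worth, that "easy" case might look something like the sketch below - a hypothetical Ship type with a toy propulsion model, plain asserts rather than any particular test framework, everything made up purely for illustration:

#include <cassert>
#include <cmath>

// Hypothetical ship with a 1D position and a trivial propulsion model.
struct Ship {
    float x = 0.0f;
    float velocity = 0.0f;

    void fireEngines(float thrust, float seconds) {
        velocity += thrust * seconds;   // assume unit mass
        x += velocity * seconds;
    }
};

// The propulsion test described above: record the position, fire the
// engines, and check the ship ended up where we expected.
void testPropulsionMovesShipForward() {
    Ship ship;
    const float before = ship.x;

    ship.fireEngines(10.0f, 2.0f);      // thrust of 10 for 2 seconds

    assert(ship.x > before);                    // it moved at all
    assert(std::fabs(ship.x - 40.0f) < 0.001f); // and by the expected amount
}

int main() {
    testPropulsionMovesShipForward();
    return 0;
}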

It's not hard to take some screen captures as part of your testing process and then compare against those images. As long as your graphics can be run at an arbitrary time (not tied to wall-clock time), you can get repeatable results.

GPUs and graphics drivers are tested this way, using saved off images to test for regressions.

You obviously need a very specific test - you can't tell that all space ships are rendering correctly unless you render all possible space ships, but if you test 10% of them, and they all pass, you can have more confidence in your basic rendering loop.
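As a rough sketch of what such a comparison could look like - the readback source and the reference file name below are placeholders, and a byte-for-byte match is only meaningful on the same hardware and driver:

#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Load the reference image that was saved from a run inspected by hand.
std::vector<uint8_t> loadReferenceImage(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
}

// Regression check: 'currentFrame' holds the pixels read back from the
// framebuffer (e.g. via glReadPixels or GetRenderTargetData) after
// rendering a fixed scene at a fixed simulation time. On the same
// hardware and driver the bytes should match the reference exactly.
bool frameMatchesReference(const std::vector<uint8_t>& currentFrame,
                           const char* referencePath) {
    return currentFrame == loadReferenceImage(referencePath);
}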

After having spent some time working on tests for hybrid PC/embedded, highly asynchronous communications and software rendering systems, I've come to the same opinion. Some things are difficult to test with standard unit testing frameworks. I've also come to realise that there are some things that you don't need to test at all - like third-party APIs that you have to rely on.

For example, unless you are writing a software renderer, are the output images really what your tests should be examining? Are you questioning the correct rendering of your models in DirectX?

I'm sure you're not, but I'm sure that you do have 'business rules' that you can check every frame. If something goes wrong with the actual rendering output, having a human in the loop may be the only way to perform thorough tests, but for some subtle things, would even a good beta tester realise that a problem had occurred?

I'm no expert on DirectX or OpenGL, but how about arranging your code so that a simple sequence of images can be written to buffers that can be compared, for testing the obvious failures very quickly (a sketch of one such test follows the lists):

Images
------
a) just the background / terrain
b) a) + spaceship
c) b) + enemies
d) c) + particles

Tests
------
1/ That the spaceship is always rendered (it's not being clipped out)
2/ That particle systems are changing the output image in the correct area.
...
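Here's a minimal sketch of test 1/ under those assumptions. The Image and Rect types are hypothetical, and the expected ship area is assumed to come from projecting the ship's bounding box to screen space:

#include <cstdint>
#include <vector>

// A captured offscreen buffer: one of the stages a) .. d) above.
struct Image {
    int width = 0, height = 0;
    std::vector<uint32_t> pixels;      // packed RGBA, row-major

    uint32_t at(int x, int y) const { return pixels[y * width + x]; }
};

// Screen-space rectangle where the ship is expected to appear.
struct Rect { int left, top, right, bottom; };

// Test 1/ : rendering the spaceship on top of the background must change
// at least some pixels inside its expected screen rectangle; if nothing
// changed, the ship was clipped, culled, or drawn somewhere else entirely.
bool shipWasRendered(const Image& backgroundOnly,
                     const Image& backgroundPlusShip,
                     const Rect& expectedShipArea) {
    for (int y = expectedShipArea.top; y < expectedShipArea.bottom; ++y)
        for (int x = expectedShipArea.left; x < expectedShipArea.right; ++x)
            if (backgroundOnly.at(x, y) != backgroundPlusShip.at(x, y))
                return true;           // something ship-shaped showed up
    return false;                      // images identical: ship missing
}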

Once you start writing tests for the very obvious things, it becomes much easier to write tests for subtle things - which you generally add when things break. The hardest part is writing the first few tests. Even if you decide against testing, I believe that time spent considering which areas of your code are high risk and prone to failure is useful - even if all it means is that you change the documentation of those classes to mark them as high risk for future work, and note what failures are possible from changes.

One thing I have found is that design for code with tests and without tests is substantially different, but that testable code makes up for the hassle when it comes to optimisation, refactoring and correcting faults introduced later when you can just run a set of regression tests. At times in the past I've seen teams (especially student teams) abandon testing entirely because a few aspects were too difficult to test rigorously. I think this becomes an excuse far too often to write no tests at all.

No matter how or what you decide to test, I would strongly recommend chaining testing to the end of your build cycle (in Visual Studio, write to the output window, or produce HTML results which you can refresh in a window of your web browser). Having test results 'in your face' encourages you when they pass, and encourages you not to let failures hang around for long when they don't.

Quote:
Original post by XXX_Andrew_XXX
For example, unless you are writing a software renderer, are the output images really what your tests should be examining? Are you questioning the correct rendering of your models in DirectX?


I would! The issue, however, is that the conditions that cause problems tend to be esoteric and are unlikely to be discovered through a 'compare to reference rendering' unit test. Or the reference renderer is broken too.

To test something like this you need a 'known good' starting point. So the first few times you manually inspect the rendering to ensure it is what you expect. After that you capture it to an image file and compare that against the current 'experimental' rendering - so you have to have functional screen capture working first. The image ought to be exactly the same every time after that on the same hardware and drivers.

Crude image comparisons would break the image into macro-blocks, average the color of each, compare those, and then report a % match; from that you could say something like a 99.99% or better match is a pass. Measure the amount of time it takes too; that way, if something suddenly starts taking twice (or half!) as long, you know. IIRC, there was once a change to the way nVidia handled compiled vertex buffers (OpenGL) that caused a big change in performance.
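A crude macro-block comparison of that sort might look roughly like this - hypothetical Image type, fixed block size, tolerance picked arbitrarily, and averaging brightness rather than full colour to keep it short:

#include <cmath>
#include <cstdint>
#include <vector>

struct Image {
    int width = 0, height = 0;
    std::vector<uint8_t> rgb;          // 3 bytes per pixel, row-major
};

// Average brightness of one macro-block; blocks at the right/bottom edge
// may be smaller than blockSize.
static double blockAverage(const Image& img, int bx, int by, int blockSize) {
    double sum = 0.0;
    int count = 0;
    for (int y = by; y < by + blockSize && y < img.height; ++y)
        for (int x = bx; x < bx + blockSize && x < img.width; ++x) {
            const uint8_t* p = &img.rgb[(y * img.width + x) * 3];
            sum += (p[0] + p[1] + p[2]) / 3.0;
            ++count;
        }
    return count ? sum / count : 0.0;
}

// Break both images into macro-blocks, average each block, and report the
// percentage of blocks whose averages are within a small tolerance.
double percentMatch(const Image& a, const Image& b, int blockSize = 16) {
    if (a.width != b.width || a.height != b.height) return 0.0;
    int matched = 0, total = 0;
    for (int by = 0; by < a.height; by += blockSize)
        for (int bx = 0; bx < a.width; bx += blockSize) {
            ++total;
            if (std::fabs(blockAverage(a, bx, by, blockSize) -
                          blockAverage(b, bx, by, blockSize)) < 1.0)
                ++matched;
        }
    return total ? 100.0 * matched / total : 100.0;
}

// Usage in a regression test: pass if percentMatch(reference, current) >= 99.99.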

This might be useful for graphics engines and their games that have many configuration options; it would let you show your players exactly what works and does not work on a given 3D card, and even show them the performance hit enabling/disabling a given option yields (for their setup, you could run your regression test on their PC).


With complicated options, I can see a shader being misconfigured and either rendering something it shouldn't or not rendering something it should. If you start using more sophisticated progressive meshing techniques you could make a mistake in the layout of the vertex/color/texture buffers, etc. It'd be nice to know that after today's 'improvements' to the CLoD algorithm there was a 7% decrease in the image match. It would signal you to go look at that rendering and see what's going on; maybe the CLoD just caused that much of a decrease and it should be considered normal (keep a history in the regression test), or maybe there's a bad triangle.

I don't think it would be that hard either - or rather, it's not much more work than what you already plan to do, assuming you have a regression test plan and plan on having screen captures and in-game performance monitoring. You'd have to set up the unit test, but the value of the regression test is supposed to offset that cost. So the only thing left to do is write the image compare function, which could be tricky.

Definitely go with mock objects.

There are similar issues with unit testing GUIs. A good idea is to make the GUI component as thin as possible and pass the real work onto something which can be tested.

This would apply in your case by having a mock renderer which, instead of drawing to the screen, could log all commands to a file, or build a composite object of commands which can be compared with a test fixture, for example.

If you are refactoring then you don't want changes to one part of your program to affect another part. So you'll be testing that the order of function calls hasn't changed, and the parameters passed through are the same, and assume that the screen will look the same.
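A minimal sketch of that idea: a hypothetical IRenderer interface, a mock that records each call as a string, and a test that compares the recorded command list against what it expects - no pixels involved:

#include <cassert>
#include <string>
#include <vector>

// The interface the game code draws through.
struct IRenderer {
    virtual ~IRenderer() = default;
    virtual void drawSprite(const std::string& name, float x, float y) = 0;
};

// Mock renderer: instead of touching the screen, log every command.
struct MockRenderer : IRenderer {
    std::vector<std::string> commands;
    void drawSprite(const std::string& name, float x, float y) override {
        commands.push_back("drawSprite " + name + " " +
                           std::to_string(x) + "," + std::to_string(y));
    }
};

// Hypothetical game code under test: draws the ship, then its shield.
void renderShip(IRenderer& r) {
    r.drawSprite("ship", 100.0f, 200.0f);
    r.drawSprite("shield", 100.0f, 200.0f);
}

// The test checks call order and parameters, not what the screen looks like.
void testShipDrawOrder() {
    MockRenderer mock;
    renderShip(mock);

    assert(mock.commands.size() == 2);
    assert(mock.commands[0] == "drawSprite ship 100.000000,200.000000");
    assert(mock.commands[1] == "drawSprite shield 100.000000,200.000000");
}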

If you are changing the way you're trying to achieve a graphics effect then that's not refactoring so a mock object wouldn't be helpful.

All nice ideas. Now, if someone could tell me how to robustly unit test multi-threaded code, I'd be happy.

Sometimes unit testing is more trouble than it's worth. Logic suggests that in those cases it shouldn't be used.

I love the idea of comparing current data with known-good reference data. I have yet to try it out, but using the same technique it should be possible to automate play-testing of long game-play sessions. Here's the idea:

Have a tester play through a completed and working level. Record the tester's actions along with the visual output. Use the tester's actions as input to a demo mode - and as long as the game is deterministic, the visual output should be the same. Compare the automated demo's output to the saved known-good output and you have automated play-testing!

If the portion being tested is still in a heavy state of flux, this obviously will not be that productive, as reference sessions will need to be re-recorded constantly. But I think it would be a huge help for making sure that the first level of the game, made six months ago, still works correctly without requiring someone to play through it constantly. This idea could be extended to sound output and all development targets. For consoles, it will most likely not be possible to stream the framebuffer to a mass-storage device, so perhaps it should be recorded by intermediary hardware between the video-out and the TV.
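A minimal sketch of the record/replay half of that idea, with everything hypothetical - the key point being that events are stamped with the simulation frame number rather than wall-clock time, so replaying them through a deterministic simulation should reproduce the same output:

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// One recorded input event, stamped with the simulation frame it arrived on.
struct InputEvent {
    uint32_t frame;
    uint8_t  keyCode;
    bool     pressed;
};

struct InputRecorder {
    std::vector<InputEvent> events;

    void record(uint32_t frame, uint8_t key, bool pressed) {
        events.push_back({frame, key, pressed});
    }

    void save(const char* path) const {
        std::ofstream out(path, std::ios::binary);
        for (const InputEvent& e : events)
            out.write(reinterpret_cast<const char*>(&e), sizeof(e));
    }
};

// Replay side: feed back exactly the events recorded for this frame.
// Because the simulation steps by frame number rather than wall time,
// the same inputs should produce the same frames, which can then be
// compared against the saved known-good output.
struct InputPlayer {
    std::vector<InputEvent> events;    // loaded from the recorded file
    std::size_t next = 0;

    template <typename Handler>
    void playFrame(uint32_t frame, Handler&& apply) {
        while (next < events.size() && events[next].frame == frame)
            apply(events[next++]);
    }
};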

I'm curious if anyone has done anything like this. Feedback ahoy!

We are currently using test-first development for our game engine (2D). We broke our tests into a few categories: automated, slow-automated, and interactive.

The automated tests are a set of test cases that can all be run in under 20 seconds or so (and growing as we add more tests).

The slow-automated tests cover our GUI. You can run these tests in recording mode or testing mode. Recording mode lets you play, and it captures screenshots after each event has played out. It's up to the recorder to make sure everything is in order. If it is, the session is saved, and the next time you run the test in testing mode it compares the screenshots and outputs a screenshot with the differences highlighted (if anything different is found). In the initial stages these changed a lot, since a change in fonts or a tweak to a graphic's position caused failures. Now, I'd say it's worth it.
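The highlighting step could be as simple as something like this (hypothetical Screenshot type; differing pixels painted a flat marker colour):

#include <cstddef>
#include <cstdint>
#include <vector>

struct Screenshot {
    int width = 0, height = 0;
    std::vector<uint32_t> pixels;      // packed RGBA, row-major
};

// Returns a copy of 'current' with every pixel that differs from the
// recorded reference replaced by a solid highlight colour, so a failing
// GUI test can show exactly where the two screenshots disagree.
Screenshot highlightDifferences(const Screenshot& reference,
                                const Screenshot& current,
                                uint32_t highlight = 0xFF0000FFu) {
    Screenshot out = current;
    if (reference.width != current.width || reference.height != current.height)
        return out;                    // size change: caller flags it separately
    for (std::size_t i = 0; i < out.pixels.size(); ++i)
        if (reference.pixels[i] != current.pixels[i])
            out.pixels[i] = highlight;
    return out;
}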

The interactive tests are just that: testing external devices like printers or scanners that need input or output. We do have virtual devices when testing the application in automated mode, but we still need to test those devices eventually. It's nice to have a set of test cases that someone can follow and be prompted for a response (e.g. "Here's how the ticket should look - is it similar?").

Shadx

Well, the TDD was successful; I found some integrity problems in my random map data and was able to fix them based on the information from the tests. I guess things are working out okay then.

Gamasutra recently posted a nice article "Automated Tests and Continuous Integration in Game Projects"
http://www.gamasutra.com/features/20050329/roken_01.shtml

It discusses regression tests (automated comparison against manually verified data) for graphics testing.

Quote:
Original post by Deyja
All nice ideas. Now, if someone could tell me how to robustly unit test multi-threaded code, I'd be happy.


Buy a NUMA SMP server and present your software with the maximum workload it can encounter (suspend worker threads, queue up requests, and then resume the workers?). Ensure the end results are identical to the single-processor case.

Code reviews are probably the most effective way to find concurrency bugs.
