
Eat Your Dogfood. It's Tasty.

Posted by ApochPiQ, 20 November 2012 · 677 views

So a few days ago I shipped Release 13 of the Epoch programming language.

Turns out, that was a bad idea.

R13 has some seriously aggressive features in it. There's support for native algebraic sum types, type aliases (both "weak" in the sense of C/C++ typedefs and "strong" in the not-weak sense), and - most interestingly for me - templates. Moreover, R13 includes a lot of back-end refactoring to support all these features in a clean and relatively nice way.

As it happens, all of the unit tests for R13 functionality pass. So I was comfortable firing it out the gate, thinking that it was covered well enough that I could start writing some test software against R13's compiler and get some cool demo programs done.

One of the demo programs I wanted to write is a simple raytracer. This is for two reasons: first, a good raytracer can be a compact piece of elegant code. A bad raytracer can be a nightmare, as I know from firsthand experience. If Epoch allows the program to be written in a clean and elegant way, that's good; if the implementation has to be messy and gross, that's bad.

The other reason is it'll give me a nice foundation for improving the native-code JIT system, which will be crucial for keeping performance acceptable in running Epoch programs.

I had barely started working on the raytracer implementation when I found my first type system bug in the R13 compiler.

Long story short, I've spent almost all my time post-R13 fixing bugs that shouldn't have shipped in the first place.

Why did this happen? I have a suite of unit tests, including tests for all the new features. They all passed. Where did the bugs creep in?

The problem with unit testing something like a compiler is combinatorial explosion. Sure, templates work. Type aliases work. Sum types work. But once you reach the point where you need to use a templated sum type to store a type alias which refers to a sum type that might contain a templated structure instance, stuff gets messy.
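A rough back-of-the-envelope sketch of that explosion (plain Python arithmetic, not Epoch; the feature names just mirror the R13 list): per-feature unit tests cover each feature once, but the moment features can nest inside each other, the number of distinct combinations grows as a power of the nesting depth.

```python
# Illustration only: counting how feature nestings outgrow per-feature tests.
# The feature names echo the R13 release; the counts are simple arithmetic.
from itertools import product

features = ["template", "weak_alias", "strong_alias", "sum_type"]

single_tests = len(features)                       # what a per-feature unit suite covers
pairs = len(list(product(features, repeat=2)))     # two-level nestings, e.g. a templated sum type
triples = len(list(product(features, repeat=3)))   # three-level nestings, e.g. alias -> sum -> template

print(single_tests, pairs, triples)  # 4 16 64
```

Four unit tests pass, but they exercise only 4 of the 16 two-level (and 64 three-level) interactions, which is exactly where the raytracer found bugs.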

Epoch is still a pretty fast-and-loose project in terms of development discipline, and I rather appreciate the freedom to fire off quick releases and move forward at my own pace. I'm interested in preserving that "culture", but at the same time, I really need to revisit my reliance on simplistic unit tests.

The only remotely non-trivial program I routinely test in Epoch is the Era IDE prototype, which isn't exactly stretching the compiler's limits at this point. Adding the raytracer to the test suite will be good in multiple ways: it will ensure that a decent cross-section of features gets tested as it would be used in real software; it will demonstrate good practices for writing software in Epoch; and it will provide a benchmark of both compiler and VM performance that I can hopefully continually improve on over time.

But that may not be enough. I think I need to start writing borderline-pathological test cases to make sure that all the cobwebs are swept out of the corners of the language - both in terms of design and in terms of implementation.

Good times.

I am very interested to see where you go with testing.

At work, we developed a proprietary image analysis and measurement library. There are more than a hundred separate measurements to be made on the various blobs in an image. Blobs can overlap, and completely separate sub-blobs can all be parts of a single blob. It's complete freedom.

Because you want to detect and then measure moving blobs as fast as possible (some cameras deliver more than 2k x 2k resolution at 100 fps), the measurement library is designed to reuse as much computation as possible. Some measurements require other measurements, so we add those behind the scenes when needed - but we don't want to measure anything twice, and we don't want to measure anything that isn't needed at all, since some measurements are fairly expensive. If an object didn't change structure, we can reuse many of its measurements; if it moved, we have to remeasure the intensity-based ones.
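As a minimal sketch of that reuse scheme, assuming a cache keyed by measurement name - all names here (`Blob`, `area`, `mean_intensity`, `moved`) are hypothetical illustrations, not the real library's API:

```python
# Hypothetical sketch of a lazy, dependency-aware measurement cache.
class Blob:
    def __init__(self, pixels):
        self.pixels = pixels   # list of (x, y, intensity) tuples
        self._cache = {}       # measurement name -> computed value

    def measure(self, name):
        """Compute a measurement at most once; dependencies are pulled in
        behind the scenes via nested measure() calls."""
        if name not in self._cache:
            self._cache[name] = getattr(self, "_" + name)()
        return self._cache[name]

    def _area(self):
        return len(self.pixels)

    def _total_intensity(self):
        return sum(i for _, _, i in self.pixels)

    def _mean_intensity(self):
        # Depends on two other measurements; both are cached on first use,
        # so neither is ever computed twice.
        return self.measure("total_intensity") / self.measure("area")

    def moved(self, new_pixels):
        """Structure unchanged but the blob moved: keep the shape-based
        results, discard intensity-based ones so they get remeasured."""
        self.pixels = new_pixels
        for key in ("total_intensity", "mean_intensity"):
            self._cache.pop(key, None)

blob = Blob([(0, 0, 10), (1, 0, 20)])
print(blob.measure("mean_intensity"))  # 15.0, computing area + total_intensity once
blob.moved([(5, 5, 30), (6, 5, 10)])
print(blob.measure("mean_intensity"))  # 20.0; area was reused from the cache
```

The real library presumably does this per measurement group rather than per measurement, but the invalidation split (geometry survives a move, intensity does not) is the part that makes the testing hard.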

Add to this that some images are mono and some are color, some are 8-bit and some are 16-bit - and other variables besides, but I'll stop there - and you have a very difficult testing environment.
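Enumerated naively, those axes simply multiply (the axis names below are assumptions drawn from the description, not the real test harness):

```python
# Sketch of the test matrix implied by the variables above; each new
# binary variable doubles the number of cases.
from itertools import product

axes = {
    "color": ["mono", "color"],
    "depth": ["8bit", "16bit"],
    "arch":  ["32bit", "64bit"],
}

cases = list(product(*axes.values()))
print(len(cases))  # 8 cases before the measurement groups multiply in

axes["overlap"] = ["yes", "no"]  # "just 1 more variable"
print(len(list(product(*axes.values()))))  # 16
```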

Currently we actually do test every possible combination, by group, given internal knowledge of which groups of measurements use the same data. It takes 3 or 7 days to do 1 iteration (64-bit and 32-bit environments, respectively), and it just repeats endlessly on a couple dedicated machines in an application which automatically updates tests and dlls (and itself) as the library changes. But what happens when we add just 1 more variable? Eek.