Scalability of Unit Tests

CoffeeMug
I am trying to understand how to properly do test-driven development. I know I'm doing it wrong, but I can't figure out how to do it right. Here's a simple example that illustrates my problem.

Suppose I wrote a function, sqrt, that estimates the square root of a number with Newton's method. I also wrote a test for it, sqrt_test, that tests it for some input. Now, suppose I wrote a solve_quadratic function that takes the coefficients of a quadratic equation and solves it using the well-known formula, utilizing my sqrt function above. I also wrote a test for it, solve_quadratic_test, that tests it for some input. So far so good. (See the sketch at the end of this post.)

Now, there is an obscure bug in my sqrt function. I have to fix the bug. Now I've broken two unit tests - the one for the sqrt function, and the one for solve_quadratic, even though I only changed the code in sqrt. You could make a number of arguments about this example (that both tests are supposed to fail because my change in sqrt affects other parts of the program, that my tests aren't actually unit tests but some other kind of tests, etc.) but the fact of the matter is that this type of testing does not scale. If a small change in the lower layer propagates throughout my software and almost every minor change requires me to change tests that are responsible for other layers, very soon I'll spend most of my time fixing tests rather than adding features.

It seems that the fundamental problem here is that solve_quadratic_test doesn't just test the algorithm in solve_quadratic, but also its dependencies. I can think of a few ways to get around this problem (pass a function pointer to sqrt to solve_quadratic so that during the test I can pass a mock, start abstracting away unit tests, etc.) but none of these seem really workable. So, how am I supposed to go about solving this problem?
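For concreteness, here's roughly the shape of the code I mean (a minimal sketch; the names and implementations are made up and the details don't matter):

// Stand-in for my Newton's-method square root (called my_sqrt here only
// to avoid clashing with the standard library's sqrt).
float my_sqrt(float x)
{
    if (x == 0.0f) return 0.0f;
    float guess = x;
    for (int i = 0; i < 20; ++i)               // refine the guess a fixed number of times
        guess = 0.5f * (guess + x / guess);
    return guess;
}

// Solves a*x^2 + b*x + c = 0 via the quadratic formula, returning the larger root.
// Note that it depends on my_sqrt above.
float solve_quadratic(float a, float b, float c)
{
    float discriminant = b * b - 4.0f * a * c;
    return (-b + my_sqrt(discriminant)) / (2.0f * a);
}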

Quote:
Original post by CoffeeMug
Now, there is an obscure bug in my sqrt function. I have to fix the bug. Now I've broken two unit tests - the one for the sqrt function, and the one for solve_quadratic, even though I only changed the code in sqrt.

What's this about breaking unit tests? You add unit tests for bugs you've fixed. Add a unit test for the bug to the sqrt test suite and you're done.

Quote:
If a small change in the lower layer propagates throughout my software and almost every minor change requires me to change tests that are responsible for other layers, very soon I'll spend most of my time fixing tests rather than adding features.

Except that they don't propagate.

Quote:

It seems that the fundamental problem here is that solve_quadratic_test doesn't just test the algorithm in solve_quadratic, but also its dependencies.

Then you're writing your tests wrong. Write your quadratic test to test the quadratic solver. Let the sqrt unit tests test the sqrt function.

It seems to me that having both tests fail is a GOOD thing, and part of the point of unit tests. That is, to have automated code helping you in tracking down all the dependencies and verifying that all code dependent on code A still works after you change code A.

The problem you're describing seems more like one in the unit test reporting mechanism itself. That is, you still want to know that all of your unit tests are failing, but you also want to be able to see the most likely candidates for a common/shared piece of code that's causing the problem; perhaps some analyzer that goes through, finds all the code that's failing, determines the common functions, and sees if those are failing. You could help the analyzer by explicitly specifying dependencies in your test data structures (i.e., a UnitTest class would take as input the UnitTests it depends on), as in the sketch below.
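Just as a sketch of the idea (the names here are made up, not from any particular framework):

#include <string>
#include <vector>

// A test that knows which other tests cover the code it depends on.
struct UnitTest {
    std::string name;
    bool (*run)();                     // returns true if the test passes
    std::vector<UnitTest*> depends_on; // tests covering code this test's subject uses
};

// When a test fails, walk its dependencies: if one of them also fails,
// report that one as the more likely root cause.
const UnitTest* likely_culprit(const UnitTest& failed)
{
    for (const UnitTest* dep : failed.depends_on)
        if (!dep->run())
            return likely_culprit(*dep);
    return &failed;
}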

Passing a function pointer into solve_quadratic so the test can avoid the sqrt you use in practice seems bad, because then you're not actually testing the real code. What if the bug in sqrt is only exposed by the way that function uses it?

All that said, I'm not a test-driven development junkie or anything... all I know is I want strong assurances that when I change a piece of code that's shared and used throughout many layers, I want as many ways of catching side-effects as early as possible. The more tests that run that piece of code the better IMHO.

Quote:
Original post by SiCrane
What's this about breaking unit tests?

Well, the fix for my bug broke my existing unit tests. Suppose originally sqrt returned 1.415, and my test asserted that sqrt(2) returns this value. Now I've fixed it to return the correct value, 1.414 - my existing test is now broken. Furthermore, my test of the quadratic solver is broken, because that test was a simple assertion that the quadratic solver returns certain values for certain coefficients.
Quote:
Original post by SiCrane
Except that they don't propagate.

Well, I just showed you that they do. I understand this means I am writing the tests wrong, but I'd like to figure out how to write them right.
Quote:
Original post by SiCrane
Then you're writing your tests wrong. Write your quadratic test to test the quadratic solver. Let the sqrt unit tests test the sqrt function.

But how? This is what I mean: how can I structure the code in such a way that solve_quadratic_test tests the quadratic solution algorithm but not the square root? If I do a simple assertion that solve_quadratic returns the correct value for some coefficients, I can't ignore the fact that this value depends on the square root. And if I don't do that, how can I structure the code otherwise? I could have solve_quadratic return a high-level description of the equation, test that the *equation* is right, and then write an interpreter for it, but that's clearly not a workable solution.

Quote:
Original post by CoffeeMug
Well, the fix for my bug broke my existing unit tests. Suppose originally sqrt returned 1.415, and my test asserted that sqrt(2) returns this value. Now I've fixed it to return the correct value, 1.414 - my existing test is now broken.

That's not a bug fix breaking a unit test. That's an improperly written unit test, which is a different animal. You fix the unit test, check it to see if it works, and move on.
Quote:

Furthermore, my test of the quadratic solver is broken, because that test was a simple assertion that the quadratic solver returns certain values for certain coefficients.

Why? What unit test broke?

Quote:
Well, I just showed you that they do.

No, you didn't. You stated it without demonstrating it.

Quote:

But how? This is what I mean, how can I structure the code in such a way that solve_quadratic_test tests the quadratic solution algorithm but not the square root?

By writing tests that test the quadratic solution. If the test fails, then you determine why it fails. If the failure is because your sqrt function is borked, you fix the sqrt function, add a unit test for the bug, and rerun your quadratic equation solver test.

Quote:
Original post by SiCrane
By writing tests that test the quadratic solution. If the test fails, then you determine why it fails. If the failure is because your sqrt function is borked, you fix the sqrt function, add a unit test for the bug, and rerun your quadratic equation solver test.

I don't understand how that would work. Here's some pseudocode:

float sqrt(float x)
{
    // Newton's method, with a bug
}

void sqrt_test()
{
    assert(sqrt(2) == 1.415);
}

float solve_quadratic(float a, float b, float c)
{
    // Calculate via the quadratic formula, using sqrt
}

void solve_quadratic_test()
{
    assert(solve_quadratic(1, 2, 3) == some_value);
}

I now fix the bug in sqrt so that it returns 1.414 for 2. Now the sqrt_test assertion is broken, yes? I need to fix the test. And solve_quadratic_test is broken too, because now that I've fixed sqrt, solve_quadratic returns a different value.

How should I be writing the tests correctly?

Quote:
Original post by emeyex
It seems to me that having both tests fail is a GOOD thing, and part of the point of unit tests.

Maybe. But in practice, what happens in my project is that I change lower-level layers - I *want* them to change - and then I have to go through dozens of unrelated unit tests that merely demonstrate dependencies and fix them. It is very, very rare that this actually catches an undesirable situation. The reason I ask the question is that the process hinders me so often, and helps me so rarely, that I've started questioning its value.

Well, the very first step is to never write a unit test that compares floating point values with an equality comparison. Use an epsilon value. The second step is to write tests for things you actually know the answer to.
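Something along these lines, say (the epsilon and the helper names are arbitrary, and my_sqrt stands in for your own implementation):

#include <cassert>
#include <cmath>

float my_sqrt(float x);   // your Newton's-method implementation, defined elsewhere

// Compare floats within a tolerance instead of with ==.
bool nearly_equal(float a, float b, float epsilon = 0.001f)
{
    return std::fabs(a - b) < epsilon;
}

void sqrt_test()
{
    // 1.4142135 is the known answer for sqrt(2), not whatever my_sqrt
    // happened to return when the test was written.
    assert(nearly_equal(my_sqrt(2.0f), 1.4142135f));
}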

Seriously, why is your test

assert(solve_quadratic(1, 2, 3) == some_value);

if some_value isn't the answer to the quadratic in the first place? Where did some_value come from? If you pulled it out of nowhere and dumped it into the test at random, then of course you're writing your test wrong.

Quote:
Original post by SiCrane
if some_value isn't the answer to the quadratic in the first place? Where did some_value come from? If you pulled it out of nowhere and dumped it into the test at random, then of course you're writing your test wrong.

Well, presumably I took the existing result from my solver and used it as the expected value in the test. In this example I can look up the value on the internet, but suppose I couldn't? In most cases what you're testing for isn't as solid as the result of a quadratic equation - you take the values you think are right, code until you get them, and then find out that what you thought was right isn't.

You're writing your unit tests wrong, and now you know how. Seriously, why would you consider taking the result of a function that you don't even know does what you want and putting it down as the test value to be a reasonable test? The test comes first and you get the function to conform to the test, not the other way around.

Quote:
Original post by SiCrane
You're writing your unit tests wrong, and now you know how. Seriously, why would you consider taking the result of a function that you don't even know does what you want and putting it down as the test value to be a reasonable test? The test comes first and you get the function to conform to the test, not the other way around.

Because I adapted a real-world situation into an example I could quickly describe in a single post [smile] Just assume that the expected result of the quadratic solver changes from time to time (which is the situation 99% of the time in the real world). How should I write the tests then?

Try giving a real example. A quadratic equation solver whose results change isn't what you would call a reasonable real-world situation. In any case, you don't have to test exact results. For example, let's say you want a pseudo-sigmoid function f(x): something that has a range from -1 to 1, is monotonic, and is an odd function. This description doesn't pin down a concrete formula, but we can still write reasonable unit tests for it. For example, one test could sample a number of points and make sure that f(x) is always greater than -1 and less than 1. Another test could sample successive points and make sure that f(xi) is less than f(xj) whenever xi is less than xj, and a third test could check that f(x) equals -f(-x) (within a suitable epsilon) for a number of points.
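For instance, something like this (the sample range, step, and epsilon are all arbitrary, and the f below is only a stand-in for whatever function you're actually testing):

#include <cassert>
#include <cmath>

// Stand-in for the pseudo-sigmoid under test.
float f(float x) { return x / (1.0f + std::fabs(x)); }

// Range: f(x) stays strictly between -1 and 1.
void test_range()
{
    for (float x = -5.0f; x <= 5.0f; x += 0.25f)
        assert(f(x) > -1.0f && f(x) < 1.0f);
}

// Monotonic: f(xi) < f(xj) whenever xi < xj.
void test_monotonic()
{
    for (float x = -5.0f; x < 5.0f; x += 0.25f)
        assert(f(x) < f(x + 0.25f));
}

// Odd: f(x) == -f(-x), within an epsilon.
void test_odd()
{
    const float epsilon = 0.0001f;
    for (float x = -5.0f; x <= 5.0f; x += 0.25f)
        assert(std::fabs(f(x) + f(-x)) < epsilon);
}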

Quote:
Original post by CoffeeMug
Because I adapted a real-world situation into an example I could quickly describe in a single post [smile] Just assume that the expected result of the quadratic solver changes from time to time (which is the situation 99% of the time in the real world). How should I write the tests then?


The idea is that you write tests that give verifiably correct results from the beginning. You can't rely on the original output of the function being tested to be the correct answer for the test; that defeats the entire purpose.

In your example, you'd want to test sqrt and solve_quadratic with values that you already know the answer to, because you solved them on paper or with an existing solver that's proven to work (e.g., for sqrt you could test your output against the sqrt function provided by your standard library).
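For instance, a sketch of the sqrt case (my_sqrt stands for your own implementation; the epsilon and sample range are arbitrary):

#include <cassert>
#include <cmath>

float my_sqrt(float x);   // your Newton's-method implementation, defined elsewhere

// Check my_sqrt against the standard library's sqrt, which is known to be correct.
void sqrt_reference_test()
{
    const float epsilon = 0.001f;
    for (float x = 0.0f; x <= 100.0f; x += 0.5f)
        assert(std::fabs(my_sqrt(x) - std::sqrt(x)) < epsilon);
}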

SiCrane, sorry for the misunderstanding; I now see why my example was confusing. Let me try a more realistic example.

I have an HTML rendering library. Let's consider a subset of it: a function that renders an input box into HTML (render_input_box), and a function that, based on some arguments, renders a form consisting of a number of input boxes (render_form). So render_form uses render_input_box - it calls it a number of times, depending on its parameters.

When I write the code, much of my programming is exploratory. I come up with what I think is good HTML, write a test for it, and then code the function to output it. I then play around and find out that the HTML I thought was good really isn't (because of browser compatibility issues, accessibility problems, etc.), so I modify the unit test to test for what I now think is the correct HTML, rerun the test suite, and fix the functions to output the correct HTML based on the broken tests. I repeat this process as I discover new information about how my library behaves (people submit bug reports, etc.).

How should I structure my tests? In the current library I simply compare the function's output HTML to HTML hardcoded in the test. If I do this for render_form, the expected output will include form-specific HTML as well as a number of HTML outputs from render_input_box. If, during development, I discover that render_input_box should output slightly different HTML, I change the test case for it, change render_input_box, rerun the tests, and find out that I've broken the test for render_form. Furthermore, if render_form is used elsewhere and I use the same testing strategy there, the tests for that code are broken as well. Now a simple change in the low-level code affects tests all over the application.

The way I currently solve this is to refactor the unit tests a little so that my test for render_form has logic to look up the expected HTML for render_input_box and repeat it. Doing this for the whole library significantly complicates the tests. I wanted to find out what the "proper" solution to this problem would be.

From your description I have no idea what the input and output for your functions are and what they are supposed to do. Are you feeding them HTML and getting HTML back?

Quote:
I have an HTML rendering library. Let's consider a subset of it: a function that renders an input box into HTML (render_input_box), and a function that, based on some arguments, renders a form consisting of a number of input boxes (render_form). So render_form uses render_input_box - it calls it a number of times, depending on its parameters.


Writing tests at this point when you cannot verify correctness is basically pointless.

But other than that: you write a test for render_input_box. Then you write a test for render_form. The render_form test doesn't care about render_input_box - you're not testing render_input_box there. As such, you provide a mock object as render_form's target, and measure whether it performed OK.


// Java-ish something...
interface OutputRenderer {
    void render( InputBox b );
    void render( String s );
    void render( FooBar b );
}

interface Renderable {
    void render_to( OutputRenderer r );
}

class Form implements Renderable {
    List<Renderable> renderables = new ArrayList<Renderable>();

    void add( Renderable rr ) { renderables.add(rr); }

    public void render_to( OutputRenderer r ) {
        for (Renderable rr : renderables) {
            rr.render_to(r);   // each child renders itself through the renderer
        }
    }
}

// The real renderer turns render() calls into actual HTML output.
class HtmlRenderer implements OutputRenderer {
    public void render( InputBox b ) { /* emit <input ...> */ }
    public void render( String s )   { /* emit text */ }
    public void render( FooBar b )   { /* emit markup */ }
}



Those are our real objects.

Now we write the tests.

// Test renderer. It only counts how many of each kind of element were rendered.
class MockRenderer implements OutputRenderer {
    int inputBoxes, strings, foobars;

    public void render( InputBox b ) { inputBoxes++; }
    public void render( String s )   { strings++; }
    public void render( FooBar b )   { foobars++; }

    void assert_all( int a, int b, int c ) {
        assert( inputBoxes == a );
        assert( strings == b );
        assert( foobars == c );
    }
}




void FormTest()
{
    Form f = new Form();
    f.add( new InputBox("Hello") );
    f.add( new InputBox("World") );
    f.add( new InputBox("!") );

    MockRenderer m = new MockRenderer();
    f.render_to(m);            // render the form into the mock
    m.assert_all(3, 0, 0);     // three input boxes, no strings, no foobars
}



Tests apply only to the class they are testing. If you find yourself in a dependency situation, then you can only test the top-most object, or refactor.

In your sqrt example, you cannot write an isolated test for solve_quadratic, since sqrt isn't decoupled from it. You need to provide the ability to replace sqrt with a mock inside the quadratic solver, or settle for testing the quadratic solver only as a whole.
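In the same C-style terms as your original pseudocode, that could look something like this (just a sketch - the parameter and the fixed mock value are made up):

#include <cassert>
#include <cmath>
#include <functional>

// The sqrt to use is passed in, so the test can substitute a fake one.
float solve_quadratic(float a, float b, float c,
                      std::function<float(float)> sqrt_fn)
{
    float discriminant = b * b - 4.0f * a * c;
    return (-b + sqrt_fn(discriminant)) / (2.0f * a);
}

void solve_quadratic_test()
{
    // Mock sqrt: always returns a fixed, known value, so this test exercises
    // only the quadratic formula itself, not the real sqrt.
    auto fake_sqrt = [](float) { return 3.0f; };

    // With sqrt(discriminant) forced to 3, a = 1, b = -1:
    // expected result = (1 + 3) / 2 = 2.
    assert(std::fabs(solve_quadratic(1.0f, -1.0f, 0.0f, fake_sqrt) - 2.0f) < 0.001f);
}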

No, they're just high-level functions for outputting HTML:


render_input_box("some-name", "some-value")
=> <input name="some-name" value="some-value" />

render_form("form-action", fields_array)
=> <form action="form-action">
<input name="some-name-1" value="some-value-1" />
<input name="some-name-2" value="some-value-2" />
<input name="some-name-3" value="some-value-3" />
</form>

Ok, when you said HTML rendering I was thinking of taking HTML and drawing pictures. Without knowing more about your development environment and the library it's hard to say, but from your two examples there are a few different approaches you can take.

The most straightforward would be to maintain a set of parsing functions, one for each rendering function, and use those parsing functions in your tests. For example, when testing render_input_box(), write parse_input_box() and have it check that the rendered input box is of the right form and contains the right data. (Not knowing what language you're using, I can't really say what the signature for this function would look like.) Then when testing render_form(), write parse_form(), which uses parse_input_box() internally. Use parse_input_box() for render_input_box()'s unit tests and parse_form() for render_form()'s unit tests. When you modify render_input_box(), you also modify parse_input_box(). This will be necessary to get render_input_box()'s unit tests to pass, but it will also have the side effect of updating the parsing and testing of all functionality built on render_input_box()/parse_input_box().
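A rough sketch of the shape this takes (C++-ish; the signatures and the exact HTML are made up, since I don't know your library):

#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Your library's functions, declared here only for the sketch.
std::string render_input_box(const std::string& name, const std::string& value);
std::string render_form(const std::string& action,
                        const std::vector<std::pair<std::string, std::string>>& fields);

// What a correct input box looks like, encoded in exactly one place.
bool parse_input_box(const std::string& html,
                     const std::string& name, const std::string& value)
{
    std::string expected = "<input name=\"" + name + "\" value=\"" + value + "\" />";
    return html.find(expected) != std::string::npos;
}

void render_input_box_test()
{
    assert(parse_input_box(render_input_box("some-name", "some-value"),
                           "some-name", "some-value"));
}

void render_form_test()
{
    std::vector<std::pair<std::string, std::string>> fields = {
        {"some-name-1", "some-value-1"},
        {"some-name-2", "some-value-2"}
    };
    std::string html = render_form("form-action", fields);

    assert(html.find("<form action=\"form-action\">") != std::string::npos);

    // Reuse parse_input_box() rather than hardcoding the input-box HTML here,
    // so a change to render_input_box() only requires updating parse_input_box().
    for (const auto& f : fields)
        assert(parse_input_box(html, f.first, f.second));
}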

Less straightforward methods include mock objects and modifying your interface to be more testable.

