Multithreading issues: Unit test fails sometimes

Started by
5 comments, last by Hodgman 9 years, 6 months ago
I'm at a loss here because I don't know how to debug this. Failure occurs on the line ASSERT_EQ at the bottom of the test case.

The test works most of the time, but has a 30% chance of producing the following error:

tests/src/test_StressTests.cpp:58: Failure
Value of: entity.getComponent<Position>()
  Actual: Position(1994, 1000)
Expected: Position(2000, 1000)
How is this possible? How can position.x have missed three updates while position.y missed none? Sometimes it's also Position(1998, 1000) or Position(1996, 1000). The one thing I can consistently observe is that position.y is never incorrect.



This is my test case:
#include <gmock/gmock.h>
#include <ontology/Ontology.hpp>
#include <math.h>

#define NAME Stress

using namespace Ontology;

// ----------------------------------------------------------------------------
// test fixture
// ----------------------------------------------------------------------------

struct Position : public Component
{
    Position(int x, int y) : x(x), y(y) {}
    unsigned int x, y;
};
inline bool operator==(const Position& lhs, const Position& rhs)
{
    return lhs.x == rhs.x && lhs.y == rhs.y;
}


struct Movement : public System
{
    void initialise() override {}
    void processEntity(Entity& e) override
    {
        e.getComponent<Position>().x += 2;
        e.getComponent<Position>().y += 1;
        // busy-wait to simulate some per-entity work
        for(volatile int i = 0; i != 10000; ++i)
            sqrt(938.523);
    }
};

// ----------------------------------------------------------------------------
// tests
// ----------------------------------------------------------------------------

TEST(NAME, ThousandEntities)
{
    World world;
    world.getSystemManager()
        .addSystem<Movement>()
        .initialise()
        ;
    for(int i = 0; i != 1000; ++i)
        world.getEntityManager().createEntity("entity")
            .addComponent<Position>(0, 0)
            ;
    // update world 1000 times
    for(int i = 0; i != 1000; ++i)
        world.update();


    // all entities should have moved to 1000*[2,1] = [2000, 1000]
    for(auto& entity : world.getEntityManager().getEntityList())
    {
        ASSERT_EQ(Position(2000, 1000), entity.getComponent<Position>());
    }
}



Could someone take a look at the way I'm dispatching the worker threads and tell me if they spot something blatantly obvious? The relevant sections of code are as follows.

World class creates the thread pool as follows:

World::World() :
    m_IoService(),
    m_ThreadPool(),
    m_Work(m_IoService)
{
    // populate thread pool with as many threads as there are cores
    for(int i = 0; i != getNumberOfCores(); ++i)
        m_ThreadPool.create_thread(
            boost::bind(&boost::asio::io_service::run, &m_IoService)
        );
}
Where m_IoService, m_ThreadPool and m_Work are declared as:

    boost::asio::io_service         m_IoService;
    boost::thread_group             m_ThreadPool;
    boost::asio::io_service::work   m_Work;
World::update() calls SystemManager::update(). SystemManager holds a vector of System object pointers.

void SystemManager::update()
{
    for(const auto& system : m_ExecutionList)
        system->update();
}
System::update() pushes as many System::processEntity calls into the worker queue as there are entities (in this case 1000). System::processEntity is pure virtual.

    for(auto& it : m_EntityList)
        this->world->getIoService().post(
                boost::bind(&System::processEntity, this, boost::ref(it))
        );
    // wait for all entities to be processed
    this->world->getIoService().poll();
I have a hunch that the very last line, this->world->getIoService().poll(), only waits until the queue is empty instead of waiting for all worker threads to be idle (which is the behaviour I actually want, but I couldn't find any way to achieve it). If that's the case, the test could start asserting the expected entity positions before the last few entities have actually been processed.

It doesn't explain why the X component of Position is screwed while the Y component is always fine.

[EDIT] In case someone wishes to obtain the project and run it themselves:
https://www.github.com/TheComet93/ontology
git checkout optimise
mkdir build && cd build
cmake -DBUILD_TESTS=ON ..
"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

It doesn't explain why the X component of Position is screwed while the Y component is always fine.

I have no experience with boost - so, not sure i understand what it is expected to do here.

Anyway, if at any time multiple threads can work at the same data then what you are experiencing is exactly what one would expect.

In other words: "It doesn't explain why the X component of Position is screwed while the Y component is always fine." - No, that (unguarded overlapping data usage) explains it perfectly.

PS. testing MT parts of code only counts for the "basic-sanity-check" case - they are fundamentally utterly useless to check the actual validity of it beyond that.

Take a paper and pencil and revalidate your MT logic (in the end, that is the only way to write MT code) - you have a race condition (assuming you are proficient in MT land - then read relevant boost documentation to find where your assumptions clash with what it says).
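To make the lost-update mechanism concrete, here is a minimal sketch of the race being described (standalone C++11, not the actual Ontology code): two threads do an unsynchronized read-modify-write on a plain int, so increments can be silently dropped, while a std::atomic counter never loses one. The function name `run_counters` is made up for the example.

```cpp
#include <atomic>
#include <thread>
#include <utility>
#include <cassert>

// Two threads increment both counters 100000 times each.
// `plain` is racy (load, add, store can interleave between threads),
// `atomic` does an indivisible read-modify-write.
std::pair<int, int> run_counters()
{
    int plain = 0;
    std::atomic<int> atomic(0);

    auto work = [&]() {
        for (int i = 0; i != 100000; ++i)
        {
            plain += 2;   // two threads can interleave here and drop an update
            atomic += 2;  // never drops an update
        }
    };

    std::thread a(work), b(work);
    a.join();
    b.join();

    // plain is often less than 400000; atomic is always exactly 400000.
    return { plain, atomic.load() };
}
```

That "often less than, never more than" behaviour is exactly the Position(1994, 1000) symptom: a handful of x increments vanish, nondeterministically.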
Apparently poll returns "The number of handlers that were executed" - you should probably assert that the return value is 1000 as part of your test.

Looks like there's a race condition where the same entity is updated simultaneously by different threads. That could cause some updates to be lost. The fact that you're always witnessing 'x' updates be lost is probably down to the horribly random nature of MT bugs, combined with your hardware.


Could someone take a look at the way I'm dispatching the worker threads and tell me if they spot something blatantly obvious?

I can tell you what I don't see, which might be blatantly obvious.

There are no synchronization points, no locks around containers and/or use of lockless thread safe containers, there are functions that are not reentrant safe, and so on.

There is no locking, no verification that another thread is not in the middle of using an object.

What I suspect is happening is that your threads run a load/modify/store pattern, and some of the processor's built-in cache synchronization is kicking in.

Say it's a 4-processor system with 4 simultaneous threads. The first time through, each core hits the load instruction; none of the caches are dirty, so everyone reads 0. All of them add a number and store the result. Multiple writes are not a problem at the hardware level, they all just overwrite the same cache line. Next, each core goes to load x. It sits in the same cache line, which is now dirty, so they all wait until synchronization is complete. The first core gets the line and immediately updates the value; the second waits for the line and so gets the updated value in its load, updates y, and writes it; the next core, still waiting for the line, loads the twice-updated y, updates and writes it; and the final core gets the triply-updated y and updates it a fourth time. If you locked the data between read and write and used proper memory barriers, none of these updates would be lost.

There is something else I notice in your code, not immediately obvious but a concern.

I see the use of 'volatile' in various places. Volatile in C++ is not necessarily a memory barrier, does not necessarily stop relocation and reordering of the code, and is not necessarily a portable thing. Some compilers will do some of those things, for example, most versions of Visual Studio when you set the parameter /volatile:ms on the compiler and it is targeting x86 or x86-64, volatile becomes a memory lock on the specific object for the duration of the single operation. But the behavior can be changed by compiler options, and change between versions of the compiler, and is not portable with other compilers.

Coming up with a consistent, easy to use, reliable locking system across an application requires a bunch of careful design. You'll typically need a series of priority-ordered locks, a system that locks and unlocks in priority order and reverse priority order to prevent deadlocks when operations involve multiple locks, and often want to use carefully crafted C++ objects to help ensure locks don't remain after exceptions or other bad behavior.
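As an illustration of the "carefully crafted C++ objects" point, here is a sketch of RAII locking with deadlock-free acquisition of multiple locks (assuming C++11; the Account type and transfer function are invented for the example, not part of any library mentioned in this thread):

```cpp
#include <mutex>
#include <thread>
#include <cassert>

// Hypothetical shared object guarded by its own mutex.
struct Account
{
    std::mutex mutex;
    int balance = 0;
};

void transfer(Account& from, Account& to, int amount)
{
    // std::lock acquires both mutexes deadlock-free, no matter which
    // order two different threads name them in.
    std::lock(from.mutex, to.mutex);
    // lock_guard with adopt_lock takes ownership of the already-held locks
    // and releases them on scope exit, even if an exception is thrown.
    std::lock_guard<std::mutex> g1(from.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.mutex, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}
```

The guards are what prevent a lock from staying held after an exception; std::lock (or std::scoped_lock in C++17) is what removes the need to hand-maintain a global lock ordering for this pair.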

Apparently poll returns "The number of handlers that were executed" - you should probably assert that the return value is 1000 as part of your test.

I just tested this and the value returned by poll() is never 1000. Not even close to 1000, it's hovering somewhere around 100-300.

Is there a way to suspend the main thread until all 1000 entities have been processed? This is the core of my problem and would solve everything.

    for(auto& entity : m_EntityList)
        this->world->getIoService().post(
                boost::bind(&System::processEntity, this, boost::ref(entity))
        );
    // this here is not doing what I thought it did. How can I wait until all entities are processed here?
    this->world->getIoService().poll();
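One common way to get that "wait until all N tasks are done" behaviour, independent of ASIO specifics, is to count completions under a condition variable. This is a sketch in plain C++11 (the class name and methods are made up): each posted task calls done() when it finishes, and the main thread blocks in wait() until all of them have.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>
#include <atomic>
#include <cassert>

// Sketch of a completion latch: construct with the number of tasks,
// have each task call done() when finished, and wait() blocks the
// caller until the count reaches zero.
class CompletionLatch
{
public:
    explicit CompletionLatch(int count) : m_Remaining(count) {}

    void done()
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (--m_Remaining == 0)
            m_Cond.notify_all();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Cond.wait(lock, [this]{ return m_Remaining == 0; });
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_Cond;
    int m_Remaining;
};
```

Applied to the code above, each posted handler would call processEntity and then latch.done(), and the poll() line would become latch.wait(). C++20's std::latch is the standardized version of this idea.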

There are no synchronization points, no locks around containers and/or use of lockless thread safe containers, there are functions that are not reentrant safe, and so on.

The design I'm aiming for should fundamentally not require any of that. The only time I will need to synchronize objects is in the event of entities accessing other entities. What I'm trying to do is something like this:

System1 update
    --> process all of System1's entities asynchronously
    --> wait for processing to finish
System2 update
    --> process all of System2's entities asynchronously
    --> wait for processing to finish
etc.
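That fork-join pattern can be sketched with std::async and futures (Entity and processEntity here are stand-ins, not the real Ontology types): each system launches its tasks, then blocks on the futures before the next system runs, so the futures themselves are the "wait for processing to finish" step. This is only safe because each task touches a distinct entity.

```cpp
#include <future>
#include <vector>
#include <functional>
#include <cassert>

// Stand-in entity and per-entity work, mirroring the Movement system above.
struct Entity { unsigned x = 0, y = 0; };

void processEntity(Entity& e)
{
    e.x += 2;
    e.y += 1;
}

void updateSystem(std::vector<Entity>& entities)
{
    std::vector<std::future<void>> tasks;
    // One task per entity for clarity; a real pool would hand each worker
    // a contiguous batch of entities instead of spawning a task apiece.
    for (auto& e : entities)
        tasks.push_back(std::async(std::launch::async, processEntity, std::ref(e)));
    // Barrier: don't return until every entity has been processed.
    for (auto& t : tasks)
        t.get();
}
```

With this structure, System2 can never observe a half-updated System1 result, because updateSystem does not return until every future has completed.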


I just tested this and the value returned by poll() is never 1000. Not even close to 1000, it's hovering somewhere around 100-300.

Is there a way to suspend the main thread until all 1000 entities have been processed? This is the core of my problem and would solve everything.

The trouble is that you're using this ASIO hack thread pool instead of a real one, so there's no way to wait for anything. You're doing all of your work in a "completion handler", the thing that would normally either only run in one thread (single-threaded servers are fairly popular nowadays) or have some kind of synchronization mechanism inside. If you look at something like std::async, it returns a std::future, which can be waited upon. It's a neat little trick to make a thread pool out of this library, but that's not what it was intended for.

Here's my guess: The poll command most likely takes some/all completion handlers out of the queue and executes them, and keeps doing so until it is empty. Meanwhile, your run() commands in other threads are trying to execute both actual async io tasks (of which you have none) and completion handlers. The poll command does not know whether the handlers that are not in the queue have actually been executed, and it's not really the point of ASIO for it to know that. Your program is supposed to know what's finished by the fact that the "completion handler" itself was run, and you programmed it to do something appropriate in response to that.

Yeah, you'd probably be better off using an existing library that's built around this type of threading, such as TBB, PPL, or OpenMP.

e.g. in OpenMP the code to make your entity updating loop run across multiple cores is as simple as:


#pragma omp parallel for
for( int i=0; i!=1000; ++i )
  ProcessEntity(i);

In my framework, it would look something like:


class MySystem
{
  struct Task { TaskSection update; }; // a TaskSection is a block of code that is run by multiple threads.

  Task* PrepareFrame(Scope& frameData)
  {
    return eiNew(frameData) Task();
  }

  void ExecuteFrame(Scope& frameData, Task* task)
  {
    eiBeginSectionTask( task->update );
    {
      int begin, end;
      eiDistributeTask( 1000, begin, end );//grab a unique begin/end range for this thread, out of the 0-1000 items
      for( int i=begin; i!=end; ++i )
        ProcessEntity(i);
    }
    eiEndSection( task->update );

    eiWaitForSection( task->update );//don't continue until all threads have finished the above section
  }
};

This topic is closed to new replies.
