
JoeJ

Member Since 30 Aug 2006
Online Last Active Today, 06:25 AM

Posts I've Made

In Topic: Is there any reason to prefer procedural programming over OOP

19 June 2016 - 02:33 PM

I've been steadily moving away from OOP over the years. The idea of inheritance never made much sense to me - it just complicates things and forces you to make decisions about software design up front. To me that's mostly busywork, and I prefer to spend that time solving real problems.

 

So I ended up with a 'C with classes' style, but I moved away from that too, mainly because of this:

 

Class member functions hide some of the data they use: you don't know which member variables they access without looking at the implementation.

That makes it hard to see the data complexity, which matters for optimizing and refactoring.

Often I ended up making member functions static, forcing me to pass all the data as function parameters - just to see how many there are (ALWAYS more than you would expect).
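A made-up example of the difference:

struct Particle
{
    float pos[3];
    float vel[3];
    float invMass;

    // Member function: from the call site you can't tell which members it reads or writes.
    void ApplyImpulse (const float* impulse);
};

// Static / free version: everything the function touches is visible in the parameter list.
void ApplyImpulse (float* vel, const float invMass, const float* impulse)
{
    for (int i = 0; i < 3; i++) vel[i] += impulse[i] * invMass;
}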

 

Next I realized that static member functions can be called from anywhere - how practical.

So why was I still using classes?

My answer was simply: to group related functions together by 'topic', so I can find them again.

 

But there is something better for that: namespaces.

With namespaces it's possible to group stuff into hierarchies without any of the restrictions or problems known from inheritance.
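For example (made-up names, just to illustrate the grouping):

#include <cmath>

namespace Math
{
    namespace Vec3
    {
        // Free functions grouped by topic - all data flows through the parameters.
        void Add (float* r, const float* a, const float* b)
        {
            for (int i = 0; i < 3; i++) r[i] = a[i] + b[i];
        }

        float Length (const float* v)
        {
            return std::sqrt (v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
        }
    }
}

// The call site reads like a path through the hierarchy: Math::Vec3::Add (r, a, b);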

 

Today I create classes very rarely, using them only as the interface to a larger system that is implemented mostly procedurally under the hood.

But I still use a lot of small structs with member functions for trivial functionality like indexing arrays or packing/unpacking data.
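A typical example of such a struct:

// Tiny helper struct: trivial functionality, nothing hidden behind it.
struct GridIndex2D
{
    int sizeX, sizeY;

    int ToIndex (const int x, const int y) const { return y * sizeX + x; }
    int ToX (const int index) const              { return index % sizeX; }
    int ToY (const int index) const              { return index / sizeX; }
};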


In Topic: Is C++11's library a magic bullet for ALL multicore/multithread programming?

17 June 2016 - 11:37 PM

Let's say we have 3 tasks to do:

Software occlusion culling (front to back dependency -> serial algorithm -> not good to parallelize)

Animating 100 characters

Physics simulation (100 islands of rigid bodies in contact)

 

The easy route would be to use one thread per task - maybe suboptimal, but good enough if your speedup ends up near the number of cores.

The hard and error-prone route would be trying to parallelize the occlusion culling - ending up with a small speedup for a lot of work and debugging time.

 

The practical route would be: one thread for occlusion culling, while the other threads run a job system that processes all characters and, after that, all physics islands.

If a single character is very fast to process, we would choose to process e.g. 4 characters per job to hide the synchronization costs.

 

std::async and other high-level functionality can be used to achieve this; my approach using atomics is more the low-level kind.

Looking at http://en.cppreference.com/w/cpp/thread/async I tend to think: there is no control over the creation of threads (which is expensive), and there is no guarantee that multithreading is used at all. So I'll probably never use it, but that may just be a matter of personal preference.
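For completeness, the high-level version of the plan above could look roughly like this (untested sketch; function names are just placeholders):

#include <future>
#include <vector>

// Placeholder workloads standing in for the tasks above (hypothetical names).
void OcclusionCulling ()              { /* serial front-to-back culling */ }
void AnimateCharacter (int /*index*/) { /* skinning, blending, ... */ }

void UpdateFrame ()
{
    // The serial occlusion culling gets its own thread.
    std::future<void> culling = std::async (std::launch::async, OcclusionCulling);

    // The characters are independent, so they can be farmed out as well.
    // Batching e.g. 4 characters per task hides some overhead, but std::launch::async
    // may still spin up a new thread per call - exactly the cost a persistent job system avoids.
    std::vector<std::future<void>> jobs;
    for (int i = 0; i < 100; i += 4)
    {
        jobs.push_back (std::async (std::launch::async, [i] ()
        {
            for (int c = i; c < i + 4; c++) AnimateCharacter (c);
        }));
    }

    for (auto& j : jobs) j.wait();   // characters done (physics islands would follow the same pattern)
    culling.wait();                  // culling done
}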


In Topic: Is C++11's library a magic bullet for ALL multicore/multithread programming?

17 June 2016 - 01:18 PM

I had a similar thread a while back: http://www.gamedev.net/topic/679252-multithreading-c11-vs-openmp/

 

The little job system I made there may have some advantages over your current approach:

You divide your work into four 'groups' - but what if one group finishes early and another takes very long? The job system balances itself here (if the jobs are small enough).

Also, if those groups do very different things, they access different memory. With the job system it's easy to parallelize a single workload at a time, which utilizes the cache better.
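The self-balancing part is basically just a shared atomic counter every worker thread pulls small batches from - rough sketch:

#include <atomic>
#include <algorithm>

// Each thread grabs the next small batch from the shared counter,
// so a thread that finishes its batch early simply takes the next one.
void ProcessAll (std::atomic<int>& workIndex, const int workCount, const int batchSize,
                 void (*jobFunction)(const int, void*), void* data)
{
    for (;;)
    {
        int begin = workIndex.fetch_add (batchSize);
        if (begin >= workCount) break;

        int end = std::min (begin + batchSize, workCount);
        for (int i = begin; i < end; i++) jobFunction (i, data);
    }
}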


In Topic: Multithreading - C++11 vs. OpenMP

11 June 2016 - 11:43 AM

It's not a binary tree I'm using; I just used that as a worst-case example.

 

In case someone is interested, here's the correct code I'm using now. After two days of using it, I'm sure this time :)

It serves well enough as a minimal job system, and I was able to speed up my application by roughly the number of cores - even in places where I would have expected bandwidth limits.

 

Usage example is:

int num_threads = std::thread::hardware_concurrency();
num_threads = std::min (64, std::max (4, num_threads));
std::thread threads[64];
 
ThreadsForLevelLists workList;
workList.AddLevel (0, 4); // first execute nodes 0-3
workList.AddLevel (4, 6); // then nodes 4-9, ensuring the previous level is done
//...
 
workList.SetJobFunction (ComputeBoundingBox, this);
for (int tID=1; tID<num_threads; tID++) threads[tID] = std::thread (ThreadsForLevelLists::sProcessDependent, &workList);
workList.ProcessDependent();
for (int tID=1; tID<num_threads; tID++) threads[tID].join();

 

Instead of dividing each level's work by the number of cores, it uses work stealing of small batches.

The advantage is that this compensates for differing runtimes of the job function.

 

 

 

#include <atomic>
#include <thread>
#include <algorithm>
#include <cassert>

struct ThreadsForLevelLists
{
    // Calls jobFunction in order:
    // for (int level=0; level<numLevels; level++)
    // {
    //     for (int i=0; i<levelSize; i++) jobFunction (levelStartIndex + i);
    //     barrier in case of ProcessDependent() to ensure the previous level has been completed
    // }

    enum
    {
        MAX_LEVELS = 32,
    };

    int firstIteration[MAX_LEVELS];
    unsigned int firstIndex[MAX_LEVELS+1];

    int numLevels;
    void (*jobFunction)(const int, void*);
    void* data;

    std::atomic<int> workIndex;
    std::atomic<int> workDone;
    int iterations;

    ThreadsForLevelLists ()
    {
        numLevels = 0;
        firstIndex[0] = 0;
        workIndex = 0;
        workDone = 0;
    }

    void Reset ()
    {
        workIndex = 0;
        workDone = 0;
    }

    void SetJobFunction (void (*jobFunction)(const int, void*), void *data, int iterations = 64)
    {
        Reset ();
        this->jobFunction = jobFunction;
        this->data = data;
        this->iterations = iterations;
    }

    void AddLevel (const int levelStartIndex, const unsigned int size)
    {
        assert (numLevels < MAX_LEVELS);
        firstIteration[numLevels] = levelStartIndex;
        firstIndex[numLevels+1] = firstIndex[numLevels] + size;
        numLevels++;
    }

    void ProcessDependent ()
    {
        const int wEnd = (int)firstIndex[numLevels]; // total work item count over all levels
        int level = 0;
        int levelReady = 0;

        for (;;)
        {
            // grab the next batch of work items from the shared counter
            int wI = workIndex.fetch_add (iterations);
            if (wI >= wEnd) break;

            int wTarget = std::min (wI + iterations, wEnd);
            while (wI != wTarget)
            {
                while (wI >= (int)firstIndex[level+1]) level++;

                int wMax = std::min (wTarget, (int)firstIndex[level+1]);
                int numProcessed = wMax - wI;

                // wait until the previous level has been completed
                for (;;)
                {
                    int dI = workDone.load();
                    while (dI >= (int)firstIndex[levelReady+1]) levelReady++;
                    if (levelReady >= level) break;
                    std::this_thread::yield();

                    // todo: optionally store a pointer to another ThreadsForLevelLists and process it instead of yielding
                }

                int indexOffset = firstIteration[level] - (int)firstIndex[level];
                for (; wI < wMax; wI++)
                    jobFunction (indexOffset + wI, data);

                workDone.fetch_add (numProcessed);
            }
        }
    }

    void ProcessIndependent ()
    {
        const int wEnd = (int)firstIndex[numLevels];
        int level = 0;

        for (;;)
        {
            int wI = workIndex.fetch_add (iterations);
            if (wI >= wEnd) break;

            int wTarget = std::min (wI + iterations, wEnd);
            while (wI != wTarget)
            {
                while (wI >= (int)firstIndex[level+1]) level++;

                int wMax = std::min (wTarget, (int)firstIndex[level+1]);

                int indexOffset = firstIteration[level] - (int)firstIndex[level];
                for (; wI < wMax; wI++)
                    jobFunction (indexOffset + wI, data);
            }
        }
    }

    static void sProcessDependent (ThreadsForLevelLists *ptr) // todo: move Process() to cpp file to avoid the need for a static function
    {
        ptr->ProcessDependent();
    }
    static void sProcessIndependent (ThreadsForLevelLists *ptr) // todo: move Process() to cpp file to avoid the need for a static function
    {
        ptr->ProcessIndependent();
    }
};

In Topic: Multithreading - C++11 vs. OpenMP

09 June 2016 - 10:59 AM

Hehe, I already discovered that my code does not work... as always :)

 

But I know those problems from GPGPU, and usually after a first day of headaches things get better and 'thinking multithreaded' becomes easier.

 

The loop I'm talking about is basically a breadth-first tree traversal with 11 tree levels - total runtime is 27 ms.

The first level has 1 node, the second 2, the fourth 8, and so forth - so I process the first levels with a single thread while the other threads are wasted on waiting.

The bottom tree level has 60000 nodes, so multithreading is a big win at the end (8 ms).
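With the job system from the post above, setting up such a traversal would be roughly one AddLevel call per tree level (node counts here are just the binary-tree example, not my real tree):

ThreadsForLevelLists workList;
int levelStart = 0;
int levelSize = 1;
for (int depth = 0; depth < 11; depth++)
{
    workList.AddLevel (levelStart, levelSize); // nodes of this level, processed after the level before
    levelStart += levelSize;
    levelSize *= 2;                            // a real tree would use its actual per-level node counts
}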

 

The waiting could be avoided with a more advanced job system, so the threads could grab other independent work instead.

Using this job system everywhere should, over time, reduce the risk of an undiscovered bug - and the bad gut feeling that comes with it - to zero.

