A. Heap contention
After I finished my starfield implementation, I tried to move it into a separate thread so that rendering could continue in parallel. It worked well, except that the framerate, although still very high (300 fps; it's only displaying a sky box after all), was not smooth.
I tracked it down to the lines that temporarily allocate buffers for blurring the starfield (an operation done up to 30 times during the starfield generation process). Simply calling malloc with a large number of bytes was enough to "suspend" the rendering thread for a few hundred milliseconds. A pause of a few hundred milliseconds, happening every few seconds or so, was causing the stuttering.
I decided to investigate the issue a bit, and found that the C runtime malloc uses a single heap. Two threads calling malloc at the same time are hence synchronized by a mutex to avoid corrupting the heap, so one of them has to wait.
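To make the contention concrete, here is a minimal toy model of a single heap guarded by one lock. It is only a sketch: SingleHeap and hammerHeap are hypothetical names invented for this illustration, and the real C runtime heap is more complex than a mutex wrapped around malloc. The point it shows is that every thread has to pass through the same lock, so a slow allocation in one thread stalls all the others.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Toy model of a single shared heap (hypothetical, not the CRT's code):
// every allocation and free takes the same lock, so threads that call in
// at the same time are forced to run one after another.
class SingleHeap {
public:
    void* allocate(std::size_t bytes) {
        std::lock_guard<std::mutex> guard(lock_);  // other threads wait here
        void* block = std::malloc(bytes);
        if (block != nullptr) {
            ++liveBlocks_;
            ++totalAllocations_;
        }
        return block;
    }

    void release(void* block) {
        if (block == nullptr) return;
        std::lock_guard<std::mutex> guard(lock_);
        --liveBlocks_;
        std::free(block);
    }

    long totalAllocations() const {
        std::lock_guard<std::mutex> guard(lock_);
        return totalAllocations_;
    }

    long liveBlocks() const {
        std::lock_guard<std::mutex> guard(lock_);
        return liveBlocks_;
    }

private:
    mutable std::mutex lock_;
    long liveBlocks_ = 0;
    long totalAllocations_ = 0;
};

// Starts threadCount threads that each allocate and free `iterations`
// buffers of `bytes` bytes, then returns the number of allocations made.
long hammerHeap(SingleHeap& heap, int threadCount, int iterations,
                std::size_t bytes) {
    assert(threadCount > 0 && iterations > 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&heap, iterations, bytes] {
            for (int i = 0; i < iterations; ++i) {
                void* block = heap.allocate(bytes);
                heap.release(block);
            }
        });
    }
    for (std::thread& worker : workers) worker.join();
    return heap.totalAllocations();
}
```

Allocators built for multithreading avoid funnelling every thread through one lock like this, which is why swapping the allocator can remove the stalls without touching the calling code.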
After some googling, I found a good library called Hoard, an improved memory allocator designed for multithreading (it solves the heap contention problem). I tested it and it worked flawlessly (fixing my slowdowns), but it is under the GPL. So I tried an alternative, ptmalloc2, which is under the LGPL (much better). It was a bit tricky to compile under MSVC, but in the end I got a DLL and a LIB to link my engine against, and the problems were gone too. As a bonus, ptmalloc2 is on average 4 to 5 times faster than the standard malloc, even in non-multithreaded applications!
B. Task scheduler
The Windows thread scheduler is BAD. I cannot stress this enough. One of the biggest issues, in my opinion, is its lack of fine control over thread priorities. OK, you've got the SetThreadPriority function, but if you've tried using it to control your threads, you know how bad it is.
The problem with SetThreadPriority is this: given two threads X and Y that both want the CPU, if you set X's priority to THREAD_PRIORITY_NORMAL and Y's priority to THREAD_PRIORITY_BELOW_NORMAL (one level lower), X will get approximately 90% of the CPU and Y only 10%.
Now, let's say you want 70% of your CPU in X and 30% in Y. How do you do it? Answer: you can't with SetThreadPriority. There is no combination of flags that gives you this kind of CPU balancing.
I've been aware of this problem for months, so in my engine I decided to implement my own thread scheduler. And it works surprisingly well!
The idea is the following: I create a scheduler thread which keeps an array of all the threads with their associated priorities. This scheduler runs a loop which picks the threads one after another: it resumes a thread, goes to sleep for the amount of time that thread should run, then wakes up, suspends the thread, and moves on to the next one. For two threads it looks like this:
My code is obviously generalized. It works with any number of threads and any number of CPUs. I might publish an article about it some day, since I hadn't found anything similar on the net last time I checked.
I use my scheduler like this:
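The resume / sleep / suspend loop described above can be sketched as a small simulation. This is not the engine's actual code: ScheduledThread and runScheduler are hypothetical names, a fake clock stands in for real sleeping, and the points where a real implementation would resume or suspend an OS thread (presumably with the Win32 ResumeThread and SuspendThread calls, which is an assumption) are only marked by comments and log entries. The 7 ms and 3 ms slices in the usage note are likewise assumed values, chosen to match the 70% / 30% split mentioned earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry in the scheduler's array (hypothetical layout).
struct ScheduledThread {
    std::string name;
    int sliceMs;  // how long the thread may run on each turn
    int ranMs;    // total simulated run time granted so far
};

// Simulates `rounds` passes of the scheduler loop over all threads and
// returns a log of the actions a real scheduler thread would perform.
std::vector<std::string> runScheduler(std::vector<ScheduledThread>& threads,
                                      int rounds) {
    std::vector<std::string> log;
    for (int round = 0; round < rounds; ++round) {
        for (ScheduledThread& thread : threads) {
            assert(thread.sliceMs > 0);
            // Real code: resume the OS thread here.
            log.push_back("resume " + thread.name);
            // Real code: the scheduler thread sleeps for sliceMs here while
            // the resumed thread runs. The simulation just adds up the time.
            thread.ranMs += thread.sliceMs;
            // Real code: suspend the OS thread again here.
            log.push_back("suspend " + thread.name);
        }
    }
    return log;
}
```

With two entries {"X", 7, 0} and {"Y", 3, 0}, two rounds give X 14 ms and Y 6 ms of simulated run time, which is the 70% / 30% balance that SetThreadPriority cannot express. A real version would also have to deal with threads that block or finish during their slice, which this sketch ignores.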
CScheduler *scheduler = new CScheduler();
CThread *thread1 = new CThread();
CThread *thread2 = new CThread();
CThread *thread3 = new CThread();
Assuming a single-CPU machine, thread1 will run for 10 milliseconds, then thread2 for 20 milliseconds, then thread3 for 3 milliseconds. This is roughly equivalent to a CPU usage of 30% vs 60% vs 10%.
Now I will be working on planet texture generation.
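The percentages follow directly from the time slices: each thread's share is its slice divided by the length of one full cycle (10 + 20 + 3 = 33 ms). A small helper makes the arithmetic explicit; cpuShares is a hypothetical function written for this illustration, not part of the scheduler.

```cpp
#include <cassert>
#include <vector>

// Converts per-thread time slices (in milliseconds) into CPU shares
// (in percent), assuming one CPU and that each thread uses its whole slice.
std::vector<double> cpuShares(const std::vector<int>& slicesMs) {
    int cycleMs = 0;
    for (int slice : slicesMs) {
        assert(slice >= 0);
        cycleMs += slice;
    }
    assert(cycleMs > 0);  // at least one thread must get some time

    std::vector<double> shares;
    for (int slice : slicesMs) {
        shares.push_back(100.0 * slice / cycleMs);
    }
    return shares;
}
```

For slices of 10, 20 and 3 ms this gives about 30.3%, 60.6% and 9.1%, which rounds to the 30% / 60% / 10% split quoted above.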