Multithreaded slower (despite Hyper-Threading)!

26 comments, last by CGameProgrammer 19 years, 4 months ago
Quote:Original post by CGameProgrammer
I did it that weird way because I couldn't find a good way in Windows of waiting for a thread to finish; the available functions seemed weird.

You can synchronize on the thread handle. When the thread exits, its handle is signalled, so you can call WaitForSingleObject() or WaitForMultipleObjects() on the actual Threads[] variables. Your Windows implementation is extremely badly written. Assuming your other two threads are completely CPU bound and have nothing to wait on, your original thread will take up about 33% of the process's CPU time just waiting for them. I'm surprised you even see a small speedup; I'd expect this code to run longer than the single-threaded version.
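A rough portable sketch of the same idea, in case it helps. It uses std::thread::join() as a stand-in for waiting on the thread handles with WaitForMultipleObjects(); the names sum_range and run_two_workers and the dummy workload are made up for illustration and are not from the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical CPU-bound job standing in for one half of the AI update.
static std::uint64_t sum_range(std::uint64_t begin, std::uint64_t end) {
    assert(begin <= end);
    std::uint64_t total = 0;
    for (std::uint64_t i = begin; i < end; ++i) total += i;
    return total;
}

// Starts two workers and blocks until both have exited.
// join() puts the calling thread to sleep, so it burns no CPU while
// waiting, unlike a while loop that keeps polling a "done" flag.
std::uint64_t run_two_workers(std::uint64_t n) {
    std::uint64_t results[2] = {0, 0};
    std::thread a([&] { results[0] = sum_range(0, n / 2); });
    std::thread b([&] { results[1] = sum_range(n / 2, n); });
    a.join();
    b.join();
    return results[0] + results[1];
}
```

The point is only that the waiting thread blocks inside the OS instead of spinning, so the two workers get the CPU to themselves.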
Quote:Original post by Anonymous Poster
It seems very odd that you get a 50% boost in Linux. When you do two sequential things as multithreaded, the time will be 100% + "relative time spent for switching between threads". Hyperthreading will only make the "relative time spent for switching between threads" lower, not negative! (At least that's how I remember hyperthreading.) The multithreaded version should always be slower, but not necessarily by much.

Well, I think it could give a (small) boost in performance. The thing about HT is that when one thread is stalling, waiting for memory, flushing the pipeline or whatever else, another can use the execution units. So in effect, you get more effective use of the CPU. Since you get a lot of wasted idle time on a P4, HT can give you performance improvements. I doubt you'd get a 50% boost though. Sounds like there was something else holding back your single-threaded version for some reason. Not sure if it speeds up the time for switching threads, but I'm pretty sure it allows one thread to work when another would otherwise stall the CPU.

However, HT isn't some magic performance-wizard. It's more of a hotfix for the horrible design of the Pentium 4... ;)
It only boosts performance because the cpu has so much wasted idle time normally.

Quote:
One possibility is that in Linux you have several computation-intensive threads running in the background and all threads have the same priority. So by doubling your own app's active thread count you will make Linux spend more time executing your app instead of the other background threads. But the better solution would be to do it single-threaded and just increase that thread's priority.

Not sure how it works on Linux, but does a process get more cpu time if it has more threads? I thought it just split the process' timeslot between the threads, without actually allocating more time.
I have a P4 3GHz Northwood with HT, 512MB RAM and a Radeon 9500 Pro 1280, WinXP SP2.

MT runs @ 12fps, ST @ 25fps.

1) I'd say you're probably using your MT methods incorrectly. Are you pipelining or using parallelization (I think this method would suit AI)?

2) Don't use mutexes unless you need cross-PROCESS synchronization (hint: you don't). Mutexes will use around 600 cycles to sync; critical sections are much more lightweight.

3) You're wasting your main process thread. Put it to sleep, or do your rendering preparation in it?

4) There are guides to increasing performance on HT systems on Intel's website, use them :)

http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p01_abstract.htm
There are a few more, but Intel's site is so bad and I'm really tired.

Cheers,
-Dan
Grrr, GDnet login is F'ing around with me, that was me ^^^^^^^^^
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website
Hmm, even I can see that you need a sleep in that while loop in your main function.

That first AP post was quite insightful.

A 50% speed boost might seem high for the Linux version, but I've always understood that Windows' threading isn't particularly grand.
I think that an extra thread in Linux may actually get you more total CPU time for your program (an equal amount per thread of equal priority, perhaps).
Whereas under windows your single thread already took most of the processor time, and your second one meant that you had to divide it up.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Ysaneya: I have a theory why it's doing that, which I'm working on fixing. I think the problem is the two threads probably have wildly different workloads. They both evaluate the same number of ships, but ships that aren't moving are evaluated much quicker than moving ships, so they could be doing 16 moving ships in one thread and 16 stationary ones in the other. I'm changing the code to fix this and should have an updated build Wednesday.
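One way to even out the load, sketched below, is to let each thread claim the next unprocessed ship from a shared counter instead of giving each thread a fixed half of the list. This is only an illustration of that general technique, not the fix that went into the game; Ship, evaluate and evaluate_all are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical ship record; in the real game a moving ship costs more to evaluate.
struct Ship {
    bool moving;
    int  evaluated;  // how many times this ship has been evaluated
};

// Placeholder for the real per-ship AI.
static void evaluate(Ship& ship) {
    ++ship.evaluated;
}

// Every worker repeatedly claims the next unclaimed ship index, so a thread
// that happened to draw cheap (stationary) ships comes back for more instead
// of idling while the other thread works through the expensive ones.
void evaluate_all(std::vector<Ship>& ships, unsigned thread_count) {
    assert(thread_count > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= ships.size()) break;
            evaluate(ships[i]);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Claiming one ship at a time costs one atomic operation per ship; claiming small batches of ships per fetch would probably reduce that overhead if it turns out to matter.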
~CGameProgrammer( );Developer Image Exchange -- New Features: Upload screenshots of your games (size is unlimited) and upload the game itself (up to 10MB). Free. No registration needed.
Hey Ysaneya, I've updated GDomin.zip and I think this version will run much faster on your multi-CPU setup. Please try it out. With just hyper-threading, there doesn't seem to be a speed-up, but it's probably just not the type of thing that benefits from HT.
~CGameProgrammer( );
You need to use a semaphore to synchronize the execution of your worker threads.
They should only run when there's work for them to do.

Creating and destroying threads is expensive. I'd create a thread pool (with 2 threads) and bump the semaphore by 2 when it's time to do the physics calculations. Then wait for the semaphore count to return to 0; each worker thread should reduce the count when it's done with its calculations. The main thread could make the first work split and configure some data structure with the part of the world each thread needs to handle. This ensures everything runs smoothly. You have to make special OS calls that put your threads to sleep until they are told it's time to wake up. The worker threads need to sleep until the semaphore is incremented by the main thread, and the main thread needs to sleep until the count returns to 0. You might need to use two semaphores to make that happen: one to queue the worker threads and another to tell the main thread when they are done.
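Here is a minimal sketch of the two-semaphore variant, under some assumptions. The Semaphore class is hand-rolled from a mutex and a condition variable so the example is portable (on Windows, CreateSemaphore, ReleaseSemaphore and WaitForSingleObject would play that role), the "world" is just a vector of ints, and TwoWorkerPool and its members are made-up names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore, written here only for portability.
class Semaphore {
public:
    void release(int n = 1) {
        { std::lock_guard<std::mutex> lock(m_); count_ += n; }
        cv_.notify_all();
    }
    void acquire() {  // sleeps until the count is above 0, then decrements it
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

class TwoWorkerPool {
public:
    TwoWorkerPool() {
        for (int i = 0; i < 2; ++i) workers_.emplace_back([this] { run(); });
    }
    ~TwoWorkerPool() {
        quit_ = true;
        work_ready_.release(2);  // wake both workers so they can exit
        for (auto& w : workers_) w.join();
    }
    // Called by the main thread; sleeps until both halves are finished.
    void update(std::vector<int>& world) {
        assert(world.size() % 2 == 0);
        world_ = &world;
        next_half_ = 0;
        work_ready_.release(2);  // one token per half of the world
        work_done_.acquire();
        work_done_.acquire();
    }
private:
    void run() {
        for (;;) {
            work_ready_.acquire();
            if (quit_) return;
            // Claim a half by counter, not by thread identity, so it is
            // harmless if one thread happens to take both tokens.
            std::size_t half = static_cast<std::size_t>(next_half_.fetch_add(1));
            std::size_t n = world_->size() / 2;
            for (std::size_t i = half * n; i < (half + 1) * n; ++i) ++(*world_)[i];
            work_done_.release();
        }
    }
    Semaphore work_ready_, work_done_;
    std::atomic<int> next_half_{0};
    std::atomic<bool> quit_{false};
    std::vector<int>* world_ = nullptr;
    std::vector<std::thread> workers_;
};
```

The threads are created once and reused every frame, and both the workers and the main thread sleep inside acquire() rather than polling.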

Now with hyper-threading, I would not expect a performance increase with two busy threads. Hyper-threading works best if the competing threads are waiting for data most of the time.
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
CGameProgrammer, I think your current version of the game doesn't work right. An older version (maybe a week old?) used to tell me the %'s for each category as well as run at a great speed. Now I get 0% for everything, except for Misc-AI which is -1%, and the FPS is extremely low. Do you have an older version I could take a look at? Thanks.
Neat idea, Magmai. So I create two threads and a semaphore (initialized to 0) initially, and the threads sleep until the semaphore is above 0, and then they decrement it and do their calculations. So when it's time for the calculations, the main thread just sets the semaphore to 2 and waits for it to be 0 again. That makes a lot of sense.

EDIT: I started to code this, but I am confused on how to implement it in Windows. On initialization, I create the two threads, and I call:

ThreadSignal = CreateSemaphore( NULL, 0, 2, NULL );


So that creates a semaphore with an initial value of 0 (and max of 2). The body of the thread function looks like this:

while( 1 )
{
    WaitForSingleObject( ThreadSignal, INFINITE );
    //  Do AI
}


So that should wait for ThreadSignal to be greater than zero, then decrement it and proceed with the AI. The main thread does this when it's time to run the AI:

ReleaseSemaphore( ThreadSignal, 2, NULL );


Setting the semaphore to 2, thus allowing the threads to run. But my problem is the main thread needs to wait for the semaphore to be 0, and I don't know how to do that. Also, it seems like it could be theoretically possible for a single thread to decrement the semaphore twice before the other thread ever gets a chance to look at it. So I don't know how the semaphore code is supposed to work.
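For what it's worth, both worries can be sidestepped by swapping the single semaphore for a round counter guarded by a mutex and condition variable. This is a different technique from the semaphore code above, sketched in portable C++ rather than Win32. Each worker remembers the last round it served, so it cannot run the same round twice, and the main thread sleeps until a pending count reaches 0. All names here (FrameSync, worker, run_frame, run_rounds) are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Shared state; every field is protected by the mutex m.
struct FrameSync {
    std::mutex m;
    std::condition_variable cv;
    int frame = 0;      // bumped by the main thread to start a round
    int pending = 0;    // workers that have not finished the current round
    bool quit = false;
};

void worker(FrameSync& s, int& rounds_run) {
    int seen = 0;  // last round this worker has served
    for (;;) {
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&] { return s.quit || s.frame != seen; });
        if (s.quit) return;
        seen = s.frame;
        lock.unlock();
        ++rounds_run;   // stand-in for "Do AI" on this worker's share
        lock.lock();
        if (--s.pending == 0) s.cv.notify_all();  // last one wakes the main thread
    }
}

// Main thread: start one round and sleep until every worker has finished it.
void run_frame(FrameSync& s, int workers) {
    assert(workers > 0);
    std::unique_lock<std::mutex> lock(s.m);
    s.pending = workers;
    ++s.frame;
    s.cv.notify_all();
    s.cv.wait(lock, [&] { return s.pending == 0; });
}

// Runs `rounds` rounds with two workers; returns how many rounds each one ran.
std::vector<int> run_rounds(int rounds) {
    FrameSync s;
    std::vector<int> counts(2, 0);
    std::thread a(worker, std::ref(s), std::ref(counts[0]));
    std::thread b(worker, std::ref(s), std::ref(counts[1]));
    for (int r = 0; r < rounds; ++r) run_frame(s, 2);
    { std::lock_guard<std::mutex> lock(s.m); s.quit = true; }
    s.cv.notify_all();
    a.join();
    b.join();
    return counts;
}
```

Because the main thread does not start the next round until pending is back at 0, a fast worker simply goes back to sleep until the round number changes.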

Drew_Benton: I do not, but it's worrying that you're having problems... all statistics are reported correctly for me. I assume you downloaded it after I made my latest post? Oh, also it would be useful to know what OS you're using.

[Edited by - CGameProgrammer on December 17, 2004 3:52:31 AM]
~CGameProgrammer( );

