This topic is 4746 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

##### Share on other sites
Do you have any kind of synchronization ?

I'm actually surprized at your Linux result. 50% is a lot, especially for hyper threading. Are you sure the number is correct ?

Don't forget that HT is fake parallelism. You still only have a single CPU, so your two threads are going to fight for CPU/memory/IO access. Plus there is a cost in context switching threads.. especially in Windows, it seems :)

I do have a real dual core machine (dual AMD 2400). If you want i can test your program. The results should be a lot different.

Y.

##### Share on other sites
Windows doesn't support multiple CPUs - you need to get a multiple CPU version of the OS (NT only IIRC and I'm not sure it even supports HT). You pay a license per physical CPU, so a dual CPU in one chip counts as one. So, in Windows, your multithreaded app was running on one CPU so it was never going to be quicker.

Skizz

##### Share on other sites
Forgive me my ignorance of threading apps but I believe you have a busy loop here. The proggie will spent lots of time in that while loop, whereas it's probably not so critical to be checking those SimTime variables billions of times per second. Try adding an OS sleep call within the loop.

When you run it single threaded, you do the work of the 2 threads and nothing more, and now you have three threads. So in single threaded mode, it takes 1/17=0.0588 seconds per frame, that is 0.0588/2*3 seconds per frame for three threads that get even time. And 1/(0.0588/2*3)=11, the framerate you are getting in Windows when multithreaded..

On the other hand, this might be BS, but I'm convinced!

##### Share on other sites
Quote:
 Original post by SkizzWindows doesn't support multiple CPUs - you need to get a multiple CPU version of the OS (NT only IIRC and I'm not sure it even supports HT). You pay a license per physical CPU, so a dual CPU in one chip counts as one. So, in Windows, your multithreaded app was running on one CPU so it was never going to be quicker.

That's only true for a certain licence agreements with Windows Server on high-grade server machines, certainly not true for workstation editions.

NT, Win2K and XP Pro handle multiple CPUs and HT processors fine, right out of the box. XP Home, however, has been crippled and only supports one CPU (and HT is disabled).

##### Share on other sites
Ysaneya: I have no synchronization of any sort. Yes I'm aware of how Hyper-Threading works, and my framerate had increased by roughly 50%, though my performance in Linux was bad to begin with for some reason (partly ATI-related). Anyway, I have uploaded a new GDomin.zip. Press 'm' to toggle multithreading on or off (it's off by default). Note that you will see no difference unless you have a bunch of ships flying around, so select alot of ships and right-click somewhere on screen so they'll go there (don't forget to press Space first so you can select stuff). Let me know if it crashes; the MT conversion is not totally finished, but it should be stable. I can run the program forever and it never crashes. Also, ignore the workload statistics; they're wrong... the framerate is correct however.

Oh, I should probably mention the game will run much slower than it did with the first release; this is not due to multithreading, but due to improved collision avoidance. It's slower though, but there are still ways to speed it up.

Skizz: Windows XP Professional supports multiple CPUs; I can open the Task Manager and see the workload on each virtual CPU.

Anonymous: Interesting... very interesting. What you say makes alot of sense. I did it that weird way because I couldn't find a good way in Windows of waiting for a thread to finish; the available functions seemed weird. Linux just has a nice simple pthread_join function. Ah well.

##### Share on other sites
You can use Mutexes in Windows and WaitForSingleObject() to wait for a thread to finish.

##### Share on other sites
Thanks darookie. I haven't looked at mutexes yet, but I used WaitForSingleObject (and then switched to WaitForMultipleObjects which is better for me) and I no longer get a framerate decrease when being multithreaded, so the AP was correct. But I don't get a noticeable framerate increase either.

I've now updated GDomin.zip with the 'fixed' version, Ysaneya, so I'd be interested to here your results. I probably need some sort of synchronization, though I'm not sure how that works... I get using a mutex variable to prevent conflicting writes, but no memory is written to by both threads.

##### Share on other sites
Works. No crash.. but i don't see any framerate difference between ST and MT. However i do see that the CPU usage in the task manager changes a lot. Between 20 and 50% in ST, and 50 to 80% in MT.

Y.

##### Share on other sites
It seems very odd that you get 50% boost in Linux. When you do two sequentical things as multithreaded, the time will be 100% + "relative time spent for switching between threads". Hyperthreading will only make the "relative time spent for switching between threads" lower -- Not negative! (At least that's how I remember hyperthreading) The multithreaded version should always be slower, but not necessarily much.

One possibility is that in Linux you have several computation-intensive threads running on background and all threads are same priority. So by doubling your own app's active thread count you will make Linux spend more time executing your app instead of other background threads. But the better solution would be to do it single-threaded and just increase that thread's priority.

##### Share on other sites
Quote:
 Original post by CGameProgrammerI did it that weird way because I couldn't find a good way in Windows of waiting for a thread to finish; the available functions seemed weird.

You can synchronize on the thread handle. When the thread exits its handle will be signalled so you can call WaitForSingleObject() or WaitForMultipleObjects() on the actual Threads[] variables. Your windows implementation is extremely badly written. Assuming your other two threads are completely CPU bound and have nothing to wait on, your original thread will take up 33% of the process' CPU time. I'm surprised you even see a small speedup, I'd expect this code to run longer than the single threaded version.

##### Share on other sites
Quote:
 Original post by Anonymous PosterIt seems very odd that you get 50% boost in Linux. When you do two sequentical things as multithreaded, the time will be 100% + "relative time spent for switching between threads". Hyperthreading will only make the "relative time spent for switching between threads" lower -- Not negative! (At least that's how I remember hyperthreading) The multithreaded version should always be slower, but not necessarily much.

Well, I think it could give a (small) boost in performance. The thing about HT is that when one thread is stalling, waiting for memory, flushing the pipeline or whatever else, another can use the execution units. So in effect, you get a more effective use of the cpu. Since you get a lot of wasted idle time on a P4, HT can give you performance improvements. I doubt you'd get a 50% boost though. Sounds like there were something else holding back your singlethreaded version for some reason. Not sure if it speeds up the time for switching threads, but I'm pretty sure it allows on thread to work when another would otherwise stall the cpu.

However, HT isn't some magic performance-wizard. It's more of a hotfix for the horrible design of the Pentium 4... ;)
It only boosts performance because the cpu has so much wasted idle time normally.

Quote:
 One possibility is that in Linux you have several computation-intensive threads running on background and all threads are same priority. So by doubling your own app's active thread count you will make Linux spend more time executing your app instead of other background threads. But the better solution would be to do it single-threaded and just increase that thread's priority.

Not sure how it works on Linux, but does a process get more cpu time if it has more threads? I thought it just split the process' timeslot between the threads, without actually allocating more time.

##### Share on other sites
i have a P4 3Ghz Northwood with HT, 512MB ram and a Radeon 9500 Pro 1280, WinXP
SP2.

MT runs @ 12fps, ST @ 25fps.

1) id say your probably using you MT methods incorrectly, are you piplining or using parallellization (I think this method would suit AI)?

2) dont use Mutexes' unless you need cross PROCESS synchronization (hint: you don't) mutexes will use around 600 cycles to sync, critical sections are much more light weight.

3) you're wasting your main process thread, put it to sleep or do your rendering preperation in it?

4) there are guides to increasing performace on HT systems on intel's website, use them :)

http://www.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p01_abstract.htm
there are a few more, but intels site is so bad and im really tired.

Cheers,
-Dan

##### Share on other sites
Grrr, GDnet login is F'ing arund with me, that was me ^^^^^^^^^

##### Share on other sites
Hmm, even I can see that you need a sleep in that while loop in your main function.

That first AP post was quite insightful.

50% speed boost might seem high for the linux version, but I've always understood that Windows' threading isn't particularly grand.
I think that an extra thread in linux may actually get you more total CPU time for your program (equal amount per thread of equal priority perhaps).

##### Share on other sites
Ysaneya: I have a theory why it's doing that, which I'm working on fixing. I think the problem is the two threads probably have wildly different workloads. They both evaluate the same number of ships, but ships that aren't moving are evaluated much quicker than moving ships, so they could be doing 16 moving ships in one thread and 16 stationary ones in the other. I'm changing the code to fix this and should have an updated build Wednesday.

##### Share on other sites
Hey Ysaneya, I've updated GDomin.zip and I think this version will run much faster on your multi-CPU setup. Please try it out. With just hyper-threading, there doesn't seem to be a speed-up, but it's probably just not the type of thing that benefits from HT.

##### Share on other sites
You need to use a semaphore to syncronize the execution of your worker threads.
They should only run when there's work for them to do.

Now with hyper-threading, I would not expect a performance increase with two busy threads. Hyper-threading works best if the competing threads are waiting for data most of the time.

##### Share on other sites
CGameProgrammer, I think your current version of the game doesnt work right. An older version (maybe a week?) used to tell me the %'s for each catagory as well as run at a great speed. Now I get 0% for everything, except for Misc-AI which is -1% and the FPS are extremely low. Do you have a older version I could take a look at? Thanks.

##### Share on other sites
Neat idea, Magmai. So I create two threads and a semaphore (initialized to 0) initially, and the threads sleep until the semaphore is above 0, and then they decrement it and do their calculations. So when it's time for the calculations, the main thread just sets the semaphore to 2 and waits for it to be 0 again. That makes alot of sense.

EDIT: I started to code this, but I am confused on how to implement it in Windows. On initialization, I create the two threads, and I call:

ThreadSignal = CreateSemaphore( NULL, 0, 2, NULL );

So that creates a semaphore with an initial value of 0 (and max of 2). The body of the thread function looks like this:

while( 1 ){    WaitForSingleObject( ThreadSignal, INFINITE );//  Do AI}

So that should wait for ThreadSignal to be greater than zero, then decrement it and proceed with the AI. The main thread does this when it's time to run the AI:

ReleaseSemaphore( ThreadSignal, 2, NULL );

Setting the semaphore to 2, thus allowing the threads to run. But my problem is the main thread needs to wait for the semaphore to be 0, and I don't know how to do that. Also, it seems like it could be theoretically possible for a single thread to decrement the semaphore twice before the other thread ever gets a chance to look at it. So I don't know how the semaphore code is supposed to work.

Drew_Benton: I do not, but it's worrying that you're having problems... all statistics are reported correctly for me. I assume you downloaded it after I made my lastest post? Oh, also it would be useful to know what OS you're using.

[Edited by - CGameProgrammer on December 17, 2004 3:52:31 AM]

##### Share on other sites
A little improvement now. Framerate is 18 ST and 22 MT when the 3 groups fight. 60 when all's idle (vsync *is* disabled).

Y.

##### Share on other sites
Quote:
 Original post by YsaneyaA little improvement now. Framerate is 18 ST and 22 MT when the 3 groups fight. 60 when all's idle (vsync *is* disabled).Y.

That's a smaller change than I would have expected, though it's possible it's "correct". For me, the AI is about 60% of the workload (50% collision, 10% sim), so if it takes 10 seconds (bear with me) to render a frame, with 6 seconds for the AI, and that changed to 3 seconds, it would be 7 seconds per frame, or 10/7 = 42% improvement. That would make 18 FPS change to 25. Maybe the AI is a smaller portion of the workload for you.

##### Share on other sites
What you say would be true if you had two threads, having the same workload, working in paralell. I'm suspecting it's not the case.

What probably happens is that your first thread has a lot more work to do than your second one, because the first thread does a lot more things than A.I.: rendering, collisions, logic, input, etc.. while your second thread only does A.I. The second thread is probably sleeping a bit while the first one orders it to start the A.I. work.

Y.

##### Share on other sites
No no no, you misunderstand how the threads work. The main process spawns two threads and then sleeps until those threads have finished. The child threads do AI and nothing more.

The way the workload was split between the two threads has changed a number of times, but with the last build I posted, the way I do it is I have 32 ships evaluated each frame, but the threads do not necessarily do 16 ships each. In pseudocode, the thread function is this:
while(1){    EnterCriticalSection      Index = SharedData->Counter      SharedData->Counter ++    LeaveCriticalSection    Ship[Index].DoAI( );    if( Index == SharedData->LastShip )        break;}

Counter could initially be 0, for example, and LastShip could be 31. So each thread evaluates ships until a specified number of them have been evaluated. This code ensures the threads do roughly the same amount of work, even if that means one thread does 24 ships and the other does 8.

##### Share on other sites
But it doesn't matter if you have a parent thread and then two threads for AI, the problem remains.

Your two AI threads (now balanced) are working in parallel. But they're not doing that continuously, they are interrupted when the parent is doing its own stuff. So you have the two processors busy for a short amount of time (when doing the AI), but the rest of the time only one CPU is really working. So you cannot expect your framerate to double. Maybe you can check exactly how much time the A.I. threads are taking, and how much the parent thread is using.

Y.

##### Share on other sites

This topic is 4746 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
628719
• Total Posts
2984388

• 25
• 11
• 10
• 16
• 14