[C++] Threads and CPU usage

Started by
24 comments, last by Super Llama 13 years, 1 month ago
Hey everyone, sorry for having three topics on the first page of the same forum section, but they are very separate and arrived at different times. My new situation is this-- I just modified my fledgling game engine so that logic, rendering, and input handling each run in a different thread (input handling stays on the main thread, the other two are started with CreateThread). My reason for this is mainly that my engine so far does not rely at all on the order that these three systems are executed in, and the three of them seem to cooperate very well asynchronously, so I thought I might as well go ahead and multithread it before I end up relying on synchronous systems. It works very well and is running lightning fast-- but that's the problem. According to task manager, it's running impossibly fast, using 0 seconds of CPU time after being open for 10 minutes. This is definitely, definitely, NOT possible, so I'm suspecting that I initialized my threads in a way that makes it so taskmgr can't properly judge the CPU usage of the collective threads.

Is it possible to change this so that each thread's cpu usage contributes to that of the actual process? Is it possibly going into rundll32 or something that I didn't notice? Is it even a good idea to go multithreaded at all? I mean, it looks like the pros outweigh the cons given that my engine was asynchronous anyway, and the fact that most people have multicore processors now (I myself have a quad core).

Here's my CreateThread code, it's pretty simple:

DWORD RTI, TTI;
CreateThread(NULL,0,&RenderThread,NULL,0,&RTI);
CreateThread(NULL,0,&ThinkThread,NULL,0,&TTI);

RenderThread and ThinkThread are, of course, my thread functions, which each contain a while loop for their task and a Sleep call for 15 ms.
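
For readers following along, the loop structure described above (a thread function with a while loop and a 15 ms sleep) might look roughly like the sketch below. It deliberately swaps CreateThread and Sleep for std::thread and std::this_thread::sleep_for so that it builds on any platform. WorkerLoop, RunWorkerForTicks and the tick counter are made-up names for illustration, not the engine's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for RenderThread/ThinkThread: do one unit of work,
// then sleep 15 ms so the thread spends most of its time in a wait state.
void WorkerLoop(std::atomic<bool>& running, std::atomic<int>& ticks) {
    while (running.load()) {
        ++ticks;  // placeholder for "render a frame" or "run game logic"
        std::this_thread::sleep_for(std::chrono::milliseconds(15));
    }
}

// Starts the worker, waits until it has ticked at least min_ticks times,
// then asks it to stop and joins it. Returns the final tick count.
int RunWorkerForTicks(int min_ticks) {
    assert(min_ticks > 0);
    std::atomic<bool> running{true};
    std::atomic<int> ticks{0};
    std::thread worker(WorkerLoop, std::ref(running), std::ref(ticks));
    while (ticks.load() < min_ticks) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    running.store(false);  // cooperative shutdown instead of killing the thread
    worker.join();
    return ticks.load();
}
```

With CreateThread, the rough equivalent of join() is to keep the HANDLE it returns, wait on it with WaitForSingleObject, and release it with CloseHandle. The snippet in the post discards both handles.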

EDIT:
It appears to only happen some of the time-- I just launched it again and it looked normal, though it was using 4% CPU instead of 2%-- I'm assuming that's just the threading overhead and is worth it when the engine has a lot of jobs to do. I'm definitely going to make multithreading an option that users can enable or disable, though.

EDIT 2:
Wow, I really need to get my facts straight. Apparently the cpu usage is identical whether threading is turned on or off. Discussion on whether threading is worth it or not is still welcome, and perhaps someone can explain my initial issue though it seems to have fixed itself.
It's a sofa! It's a camel! No! It's Super Llama!
It is very possible that what you are seeing is a problem of sampling frequency.

Task manager is a sample-based tool, meaning that at a given interval it queries the OS for information about the running processes. If your process is sleeping at the sample time, then task manager will assume that it was sleeping for the whole sample duration.

A Sleep() call on Windows puts the thread into a wait state. Threads in a wait state aren't consuming CPU time. Only threads in the running state show load in task manager, and then only if they are sampled while in the running state.

In general, task manager is a really poor tool for profiling during development. It will generally only catch infinite loops or excessive memory leakage.
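
To make the "a sleeping thread costs no CPU" point concrete, here is a minimal sketch that compares wall-clock time with processor time across a sleep. It assumes std::clock() reports processor time, as it does on POSIX systems. Microsoft's C runtime is known to report elapsed wall-clock time from std::clock() instead, so on Windows something like GetProcessTimes would be the call to reach for. Usage and MeasureSleep are illustrative names.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct Usage {
    double wall_ms;  // elapsed real time
    double cpu_ms;   // processor time charged to the process
};

// Sleeps for the requested time and reports how much wall-clock time passed
// versus how much CPU time the process was charged for in that window.
Usage MeasureSleep(int sleep_ms) {
    assert(sleep_ms >= 0);
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Usage u;
    u.wall_ms =
        std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
    u.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return u;
}
```

On a system where the assumption holds, MeasureSleep(100) should report roughly 100 ms of wall time and close to zero CPU time, which is why a loop that mostly sleeps can legitimately show almost no CPU usage.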
I see, thanks for the insight. I ran a code profiler (called Very Sleepy) and I'm very happy with the results-- it seems that my main users of cpu are ID3DXFont and std::vector, rofl. I actually tried to avoid vectors because I knew they were really slow, but I assumed if I only used them for functions that don't run every tick then it'd be fine... but then I realized that the offset operators called the iterator functions and it appears to be slowing down the code regardless. I'm probably just going to replace my vectors with a custom dynamic array class or something. Obviously the cpu use is very trivial on my own system, but a friend with a fairly modern pc said it was using 20% of his cpu, so I'm kind of making sure that every last bottleneck is eliminated to the best of my ability.
It's a sofa! It's a camel! No! It's Super Llama!

I actually tried to avoid vectors because I knew they were really slow


99% says you didn't disable iterator debugging.
In case the problem lies in the remaining 1%, then your custom implementation will not do better. Only changing the algorithm will improve running times.
I would be 100% surprised if your custom dynamic array class comes even close to the performance of std::vector<>, used properly. Most of the time when people say "vectors are slow", they are actually saying "I don't know how to use vectors".

To eliminate bottlenecks, you must first understand them. I can guarantee you that either std::vector<> is being used incorrectly, or your bottleneck is that you are using the wrong data structure/algorithm and replacing it with a custom dynamic array will not help.



As Antheus mentioned, just passing the right set of flags to the compiler can result in a massive performance boost.
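
One common example of what "used properly" can mean (an illustration, not a diagnosis of the code in question, which was never posted): calling reserve() when the final size is known up front, so push_back never has to reallocate and copy. The sketch below counts reallocations by watching capacity() change. CountReallocations is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n ints and counts how often the vector had to reallocate,
// detected by watching capacity() change after each push_back.
std::size_t CountReallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set, the count is zero. Without it, the vector grows several times along the way, and how many times depends on the standard library's growth policy.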
99% says you didn't disable iterator debugging.

this term is new to me, lol.

I just disabled it and re-ran the profiler, and the vector iterators are still using more cycles than even my most complicated functions.
It's a sofa! It's a camel! No! It's Super Llama!
First off, std::vector IS an array. When compiled properly, with the appropriate flags set, it will have no more overhead than AN ARRAY. Iterators, in release mode (again with the appropriate flags) will devolve into pointers, and as such shouldn't be costing you much of anything.
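
A small sketch of that claim: the elements of a std::vector sit in one contiguous block, so indexing, iterating, and walking a raw pointer all touch the same memory in the same order. Sums, SumThreeWays and IsContiguous are names invented for this example. It shows equivalence of results, not timings, which depend on the build flags discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sums {
    long by_index;
    long by_iterator;
    long by_pointer;
};

// Sums the elements three ways: by index, by iterator, and by raw pointer.
Sums SumThreeWays(const std::vector<int>& v) {
    Sums s = {0, 0, 0};
    for (std::size_t i = 0; i < v.size(); ++i) {
        s.by_index += v[i];
    }
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        s.by_iterator += *it;
    }
    const int* p = v.data();
    const int* end = p + v.size();
    for (; p != end; ++p) {
        s.by_pointer += *p;
    }
    return s;
}

// Checks that element i lives exactly i slots past the start of the buffer.
bool IsContiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) return false;
    }
    return true;
}
```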

The fact that you're getting "iterators" in your output suggests you're doing something naughty. Like trying to profile debug mode. Of course, without seeing your usage we can't even tell you what you're doing wrong, if anything.

Furthermore, blindly profiling will not a bottleneck find. As an example, if you're iterating over arrays constantly then you'll find that code to be at the top of the profiling charts, but it may not be your BOTTLENECK. It just might be what is doing the majority of the work.

So a few things:
1. As you're clearly inexperienced I would suggest posting some code.
2. Unless you actually HAVE a bottleneck, there's no reason to profile yet. You're just sampling noise.
3. Sleep will surrender your thread's remaining execution time to the operating system. This does not guarantee that other threads of yours will run. Do note that your threads can be scheduled in any order, or not at all, depending on system load. Don't expect to see 100% CPU usage, and don't expect (even if you change your code to not have sleeps) that the OS will even schedule your threads on the same cores. The OS tries to keep threads localized to individual cores, but if another thread is running on that core... guess what?
4. Task manager can't tell you anything. Its memory usage figures are commit amounts (which do not reflect actual usage), and its CPU usage figures are best guesses based on samples.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.


Furthermore, blindly profiling will not a bottleneck find. As an example, if you're iterating over arrays constantly then you'll find that code to be at the top of the profiling charts, but it may not be your BOTTLENECK. It just might be what is doing the majority of the work.
I see. Then how exactly would I go about finding the cause of the high cpu usage on said friend's computer?


1. As you're clearly inexperienced I would suggest posting some code.
Yeah, it's funny how everyone I know thinks I'm somehow extremely good at programming. Obviously they've never been to this forum. I'm quite experienced with programming in general, but I have PLENTY to learn as far as how the deep insides of C++ work. My code is quite far from simple though due to the basic structure of the engine and my coding style, so it'd probably have to be modified a lot before I could post it here. I have a slight problem with obsessively searching for optimization methods when I'm not really sure what kinds of low-level things need to be optimized...


2. Unless you actually HAVE a bottleneck, there's no reason to profile yet. You're just sampling noise.
Indeed. I'm just a little worried that my built-from-scratch basic scene graph and logic engine with no more than 10 objects in the test scene is using 20% cpu on a not-quite-terrible computer, though no actual performance issues have been spotted yet.


3. Sleep will surrender your thread's remaining execution time to the operating system. This does not guarantee that other threads of yours will run. Do note that your threads can be scheduled in any order, or not at all, depending on system load. Don't expect to see 100% CPU usage, and don't expect (even if you change your code to not have sleeps) that the OS will even schedule your threads on the same cores. The OS tries to keep threads localized to individual cores, but if another thread is running on that core... guess what?
I'm not quite sure what you're saying here, but isn't the benefit of multithreading the fact that it gives clear separation to the OS so it can throw the threads around as needed?
It's a sofa! It's a camel! No! It's Super Llama!
[quote name='Super Llama']
[quote name='Washu' timestamp='1299614143' post='4783250']
Furthermore, blindly profiling will not a bottleneck find. As an example, if you're iterating over arrays constantly then you'll find that code to be at the top of the profiling charts, but it may not be your BOTTLENECK. It just might be what is doing the majority of the work.
[/quote]
I see. Then how exactly would I go about finding the cause of the high cpu usage on said friend's computer?
[/quote]
Most improvements will be algorithmic. If you're doing things like looping through an array frequently to find an element, why not sort the array, or use a sorted/hash based container? Similarly, what are you doing in the loop? Does it need to be looped over, or just some subset? Are you saving your results and reusing them, or recalculating each time?

[quote name='Super Llama']
[quote name='Washu']
2. Unless you actually HAVE a bottleneck, there's no reason to profile yet. You're just sampling noise.
[/quote]
Indeed. I'm just a little worried that my built-from-scratch basic scene graph and logic engine with no more than 10 objects in the test scene is using 20% cpu on a not-quite-terrible computer, though no actual performance issues have been spotted yet.
[/quote]
20% CPU use for 10 objects suggests you're doing something wrong. Even just a brute force submit to the GPU shouldn't cost that much.
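
The algorithmic suggestions above (a sorted array or hash-based container instead of repeated linear scans, and reusing results instead of recalculating them) can be sketched as follows. SceneObject and the three lookup helpers are hypothetical, since the engine's real data structures were not posted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scene object, used only for this example.
struct SceneObject {
    int id;
    std::string name;
};

// O(n): scans every element until the id matches.
const SceneObject* FindLinear(const std::vector<SceneObject>& objects, int id) {
    for (const SceneObject& o : objects) {
        if (o.id == id) return &o;
    }
    return nullptr;
}

// O(log n): requires `objects` to already be sorted by id.
const SceneObject* FindSorted(const std::vector<SceneObject>& objects, int id) {
    auto it = std::lower_bound(
        objects.begin(), objects.end(), id,
        [](const SceneObject& o, int wanted) { return o.id < wanted; });
    return (it != objects.end() && it->id == id) ? &*it : nullptr;
}

// O(1) on average per lookup: builds an id -> index table once so that
// later lookups reuse it instead of searching again.
std::unordered_map<int, std::size_t> BuildIndex(
    const std::vector<SceneObject>& objects) {
    std::unordered_map<int, std::size_t> index;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        index[objects[i].id] = i;
    }
    return index;
}
```

For a scene of 10 objects the difference between these is negligible, which fits the point that 20% CPU at that size most likely comes from somewhere other than container choice.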

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Turns out it was my input handling loop-- I was using PeekMessage every frame instead of GetMessage only when input was needed. After separating things into threads, I made the main thread, which is responsible for input handling, use GetMessage, which of course sleeps until Windows gives it a message. That brought it from 20% CPU down to 8%, which I'm sure is a lot better. Also, it's not just a quick render or anything; it has a fairly complicated scene graph system which can also handle logic.
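
The polling-versus-blocking difference described above can be illustrated with a portable analogy. This is not the Win32 message API: MessageQueue, TryGet, WaitGet and PostAndWait are invented names, with TryGet standing in for a PeekMessage-style call that returns immediately and WaitGet for a GetMessage-style call that puts the thread to sleep until something arrives.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical message queue, loosely analogous to a window's message queue.
class MessageQueue {
public:
    void Post(int msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            messages_.push(msg);
        }
        ready_.notify_one();
    }

    // Polling style: returns immediately; false means nothing was waiting.
    // Calling this in a tight loop keeps a core busy even when idle.
    bool TryGet(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (messages_.empty()) return false;
        out = messages_.front();
        messages_.pop();
        return true;
    }

    // Blocking style: the calling thread sleeps until a message arrives.
    int WaitGet() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !messages_.empty(); });
        int msg = messages_.front();
        messages_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> messages_;
};

// Demonstration helper: posts msg from a second thread while the calling
// thread blocks in WaitGet().
int PostAndWait(MessageQueue& q, int msg) {
    std::thread producer([&q, msg] { q.Post(msg); });
    int received = q.WaitGet();
    producer.join();
    return received;
}
```

A loop built around TryGet needs its own sleep or frame pacing to avoid spinning, whereas a thread parked in WaitGet uses no CPU time until a message shows up.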
It's a sofa! It's a camel! No! It's Super Llama!

This topic is closed to new replies.
