Inaccuracy of Windows thread waits


AndyPandyV2    298
Is anyone aware of a method to get better than the 15 millisecond or so accuracy of the Windows thread wait functions (such as Sleep(), WaitForSingleObject(), SignalObjectAndWait())? MSDN claims timeBeginPeriod() should increase the accuracy, but even setting it to 1 does not help. I really need far greater accuracy for what I am doing and am rather amazed at how inaccurate these functions are. Thanks

Shannon Barber    1681
You can call timeBeginPeriod (and the matching timeEndPeriod) to lower the slicing time to roughly 1 ms. Out of the box, many PCs are configured to slice at 16 ms intervals today (in the past it was 10 ms!).

Note that Sleep() does not make any timing guarantees other than that your thread will be serviced again sometime 'later'.

To do better than that you would have to hook into a hardware timer and I don't think there's an easy way to do that.
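
For reference, a minimal sketch of how the timeBeginPeriod/timeEndPeriod pair is typically used (the measurement around Sleep(1) is only illustrative; even at 1 ms resolution the OS still only promises "at least" the requested time):

    // Sketch: raise the timer resolution around a timing-sensitive section
    // and measure what Sleep(1) actually gives you.
    // Link against winmm.lib for timeBeginPeriod/timeEndPeriod.
    #include <windows.h>
    #include <mmsystem.h>
    #include <cstdio>

    int main()
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);

        timeBeginPeriod(1);               // request ~1 ms timer resolution

        QueryPerformanceCounter(&t0);
        Sleep(1);                         // still "at least 1 ms", never "exactly 1 ms"
        QueryPerformanceCounter(&t1);

        timeEndPeriod(1);                 // always undo the request when done

        double ms = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;
        printf("Sleep(1) actually took %.3f ms\n", ms);
        return 0;
    }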

frob    44962
Quote:
Original post by AndyPandyV2
I really need far greater accuracy for what I am doing and am rather amazed at how inaccurate these functions are.
The OS is not designed for hard Real-Time computing requirements.

The OS works on scheduled time slices, and yielding the rest of your time slice sticks you at the back of the queue within your priority level. You have no guarantees about when (if ever) the OS will return to you.

What exactly are you trying to do that requires being woken up at more frequent intervals?

There may be other options available, such as the Sleep(0) mentioned above (so you wait until the others in the slice have all yielded their time), using a callback from some other system or driver, operating at the kernel or device-driver level, or otherwise digging your application into the OS internals. Deciding on the best course of action depends on your eventual goal.

AndyPandyV2    298
I was hoping to use these wait functions so that the primary threads of the game could surrender any leftover time in the frame to other secondary threads. And while setting timeBeginPeriod seems to slightly increase the accuracy, there are still instances where it takes up to 45+ milliseconds for the Wait to return which is causing very ugly stuttering. I guess I'll just have to waste those extra cycles spinning and doing nothing...
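
A common compromise (just a sketch of the usual pattern, not something the API guarantees) is to Sleep() away most of the leftover frame time and only spin on the high-resolution counter for the last millisecond or two; this assumes timeBeginPeriod(1) is in effect and that the deadline parameter is a QueryPerformanceCounter value:

    // Hybrid wait: sleep through most of the remaining frame time,
    // then busy-wait on the high-resolution counter for the final ~2 ms.
    #include <windows.h>

    void WaitUntil(const LARGE_INTEGER& deadline)
    {
        LARGE_INTEGER freq, now;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&now);

        // Sleep while more than ~2 ms remain, leaving a margin for scheduler jitter.
        while (deadline.QuadPart - now.QuadPart > 2 * freq.QuadPart / 1000)
        {
            Sleep(1);
            QueryPerformanceCounter(&now);
        }

        // Spin for the final stretch to hit the deadline precisely.
        while (now.QuadPart < deadline.QuadPart)
            QueryPerformanceCounter(&now);
    }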

iMalc    2466
Quote:
Original post by AndyPandyV2
I was hoping to use these wait functions so that the primary threads of the game could surrender any leftover time in the frame to other secondary threads. And while setting timeBeginPeriod seems to slightly increase the accuracy, there are still instances where it takes up to 45+ milliseconds for the Wait to return which is causing very ugly stuttering. I guess I'll just have to waste those extra cycles spinning and doing nothing...
If other games can manage without going to this trouble, then your game should be able to as well. If you're getting inconsistencies in your frame rate then you're probably hitting the disk during play, even if it's just paging memory.
Spinning until the desired time elapses probably won't help. Hogging the CPU can make your app less smooth, as the task scheduler compensates somewhat, I believe.
Any chance you can profile your code?

AndyPandyV2    298
I've profiled it; it is the Wait calls, which sometimes take 10x longer than they should. It makes sense now that I know the OS schedules rather lengthy time slices for different threads: likely there just isn't a slice available when the wait runs out, so it has to wait until the next round of slices. I was assuming the OS would return control when the time was up, but it is obviously not consistent, so I'll just have to rework it to function differently.

Antheus    2409
Quote:
Original post by AndyPandyV2
I was hoping to use these wait functions so that the primary threads of the game could surrender any leftover time in the frame to other secondary threads.


Wrong reason. Concurrency is all about *not* waiting for anything, but simply going on and doing useful work.

If you need to wait in this manner, then cooperative threads or a single-threaded approach will be much better.


Multi-threaded programming is annoying because of such details. One solution is to simply not wait:
while (true) {
    if (currentTime < nextFrameTime) {
        doSomethingUseful();   // still time left in the frame: do other work
    } else {
        renderFrame();         // frame is due; assume this also advances nextFrameTime
    }
}
The other, when using worker threads:

while (scheduler->running()) {
    Task* task = NULL;
    if (this->isRenderThread()) {
        task = scheduler->getRenderTask();   // render work stays on the main thread
    } else {
        task = scheduler->getNextTask();
    }
    if (task) {
        task->run();
    } else {
        scheduler->waitForTask();            // block until new work is queued
    }
}
You can then just spawn a number of these workers, and they'll wait for work to come in. Rendering, however, will always be performed on the same main thread. Obviously, more generic approaches exist for this.

WaitForSingleObject() would be inside waitForTask().
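
A minimal sketch of how that waitForTask()/wake-up pairing could look with a Win32 auto-reset event (the Scheduler members shown here are illustrative, not part of the code above):

    // Illustrative fragment: workers block on an auto-reset event inside
    // waitForTask(), and whoever enqueues work signals it to wake one worker.
    #include <windows.h>

    class Scheduler {
        HANDLE workAvailable;
    public:
        Scheduler() { workAvailable = CreateEvent(NULL, FALSE, FALSE, NULL); } // auto-reset, initially unsignalled
        void notifyTaskAdded() {
            SetEvent(workAvailable);                      // wake exactly one sleeping worker
        }
        void waitForTask() {
            WaitForSingleObject(workAvailable, INFINITE); // sleep without burning CPU until work arrives
        }
    };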

Quote:
SetThreadPriority


This is generally the wrong solution to the problem. There are uses for priorities, but this isn't one of them.

Quote:
I was assuming the OS would return control when the time was up


Unless the OS and hardware you're running on come with a large fancy certificate saying "Real-Time" and cost a fortune, you will never have such guarantees.

Quote:
I really need far greater accuracy for what I am doing and am rather amazed at how inaccurate these functions are


For concurrency, you don't need a timer at all, at least not for this. OS times the slices quite well. Even for cooperative threads.

Hodgman    51324
Quote:
Original post by Antheus
For concurrency, you don't need a timer at all, at least not for this. OS times the slices quite well. Even for cooperative threads.
That is, unless there are far too many processes/threads running - I've seen a 1ms Sleep take 5 seconds to return before on a badly designed multi-process system :(

Raghar    96
Quote:
Original post by Antheus
Wrong reason. Concurrency is all about *not* waiting for anything, but simply going on and doing useful work.

Using CPU power unnecessarily just increases power consumption for no reason. Some people like to play games while rendering in the background; unnecessary consumption of CPU resources might alienate them.

Quote:
If you need to wait in this manner, then cooperative threads or a single-threaded approach will be much better.

Isn't the scheduler much better than a single programmer? Current CPUs love multithreaded applications; single-threaded applications are quite outdated.





I recall that in my Java programs, I try to do ONLY the things that the thread must do, and then yield quickly. (Or interrupt the thread willingly. The OS looks smoother with threads that sometimes sleep than with 200 threads that compete for CPU resources. With 200 CPUs there would be about a 200x speedup. ^_^)

Quote:
Original post by Adam_42
You may find that instead of sleeping, blocking waiting for a critical section / mutex / event will work better.

Sadly, yes. While preemptive multithreading should be the norm, current OSes somehow didn't catch up reasonably quickly. The ability of a low-priority thread to preempt a high-priority thread is sometimes nifty as well (extremely low-priority AI threads that should run about once per hour get scheduled no matter the CPU load).

stonemetal    288
Quote:
Original post by Hodgman
That is, unless there are far too many processes/threads running - I've seen a 1ms Sleep take 5 seconds to return before on a badly designed multi-process system :(


How is that a badly designed system? You asked to not run for at least 1 ms, not for exactly 1 ms or at most 1 ms. When you understand what the code does, you understand the output you get. Mostly what happens is that after the timer expires, your process is put at the back of the ready-to-run queue.

Hodgman    51324
Quote:
Original post by stonemetal
Quote:
Original post by Hodgman
That is, unless there are far too many processes/threads running - I've seen a 1ms Sleep take 5 seconds to return before on a badly designed multi-process system :(
How is that a badly designed system? You asked to not run for at least 1 ms, not for exactly 1 ms or at most 1 ms. When you understand what the code does, you understand the output you get. Mostly what happens is that after the timer expires, your process is put at the back of the ready-to-run queue.
Well, haven't you just shown that expecting a 1ms sleep to be predictable *is* a bad design? ;)

It was actually a bad design though, because:
* Most of the processes were unnecessary; they could easily have been rolled into a single multi-threaded process.
* Most of the threads were unnecessary - lots were being used in places where a queue/call-back structure would have worked just as well and wouldn't have caused the OS's scheduler to crap its pants.
* This particular sleep call was hidden under many layers of APIs, in what was supposed to be an "asynchronous network send" function that could be called from the main graphics loop. When the graphics loop can stall for 5 seconds (when it's supposed to have a guaranteed 33ms turn-around), I'd say it's a bad design!

stonemetal    288
Quote:
Original post by Hodgman

* Most of the processes were unnecessary; they could easily have been rolled into a single multi-threaded process.
If the threads are scheduled by the OS, then there is no difference.
Quote:

* Most of the threads were unnecessary - lots were being used in places where a queue/call-back structure would have worked just as well and wouldn't have caused the OS's scheduler to crap its pants.
* This particular sleep call was hidden under many layers of APIs, in what was supposed to be an "asynchronous network send" function that could be called from the main graphics loop. When the graphics loop can stall for 5 seconds (when it's supposed to have a guaranteed 33ms turn-around), I'd say it's a bad design!


OK, so again someone didn't understand what their code does and got burned by it. Sleep(0) is not a no-op; it is an "I am done, see you soon". It happens to everyone from time to time. It even has a name: in this case it is called a leaky abstraction.
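
For illustration only, a tiny sketch of the difference: Sleep(0) merely gives up the rest of the current time slice (and returns immediately if no other ready thread of equal priority wants it), while Sleep(1) actually blocks until at least the next scheduler tick:

    // Illustrative only: count how often each call returns within one second.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        DWORD start = GetTickCount();
        long yields = 0;
        while (GetTickCount() - start < 1000)
        {
            Sleep(0);   // yield the rest of the slice; returns immediately if nothing else is ready
            ++yields;
        }

        start = GetTickCount();
        long sleeps = 0;
        while (GetTickCount() - start < 1000)
        {
            Sleep(1);   // blocks until at least the next tick (~15 ms with the default timer period)
            ++sleeps;
        }

        printf("Sleep(0): %ld iterations, Sleep(1): %ld iterations\n", yields, sleeps);
        return 0;
    }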

Hodgman    51324
Quote:
Original post by stonemetal
If the threads are scheduled by the OS, then there is no difference.
Almost no difference - the process that a thread belongs to can affect scheduling. If you've got one process that Windows has deemed 'interactive', then it can starve threads belonging to other processes.
On a related note, you can kind of predict the upper bound on a Sleep call (~5 seconds) thanks to the starvation-fighting system:
Quote:
Whenever a thread in the foreground process completes a wait operation on a kernel object, the kernel function KiUnwaitThread boosts its current (not base) priority by the current value of PsPrioritySeparation.
The reason for this boost is to improve the responsiveness of interactive applications...
...
Threads that own windows receive an additional boost of 2 when they wake up because of windowing activity, such as the arrival of window messages. The windowing system (Win32k.sys) applies this boost when it calls KeSetEvent to set an event used to wake up a GUI thread. The reason for this boost is similar to the previous one—to favor interactive applications.
...
Once per second, the balance set manager (a system thread that exists primarily to perform memory management functions and is described in more detail in Chapter 7) scans the ready queues for any threads that have been in the ready state (that is, haven’t run) for approximately 4 seconds. If it finds such a thread, the balance set manager boosts the thread’s priority to 15.

Oluseyi    2103
Quote:
Original post by Hodgman
Well, haven't you just shown that expecting a 1ms sleep to be predictable *is* a bad design? ;)

You can't request a 1ms sleep. You can only request a sleep of at least 1ms.

Quote:
It was actually a bad design though...

If stonemetal interpreted your earlier quote like I did, then he thought you were saying the OS was badly designed, because a "1ms sleep" took 45ms. If, in fact, you're saying someone badly designed/implemented their own multi-threading/multi-process application or system on top of the OS services, well...

Antheus    2409
Quote:
Original post by Raghar
Using CPU power unnecessarily just increases power consumption for no reason. Some people like to play games while rendering in the background; unnecessary consumption of CPU resources might alienate them.


There is X work to do. That determines the CPU load. Whether it is done on one core or on 16 makes no difference.

Not waiting does not necessarily mean running a tight loop.

Quote:
Isn't the scheduler much better than a single programmer?


Nope. It's merely more convenient, at a possibly high run-time cost.

Quote:
I recall that in my Java programs, I try to do ONLY the things that the thread must do, and then yield quickly. (Or interrupt the thread willingly. The OS looks smoother with threads that sometimes sleep than with 200 threads that compete for CPU resources. With 200 CPUs there would be about a 200x speedup. ^_^)


Interestingly enough, Java threads have more in common with co-routines than system threads. They are light-weight, and do not necessarily have a 1:1 mapping to OS threads. Context switches are much less expensive (relatively). Same goes for timing. Same goes for other VMs, or similar light-weight threading run-times.

If anything, hundreds of threads in Java are much less of an issue than equivalent in native application (not that there would be realistic or practical need for that many).

Hodgman    51324
Quote:
Original post by Oluseyi
If stonemetal interpreted your earlier quote like I did, then he thought you were saying the OS was badly designed...
Ahh, I didn't even consider that it might be interpreted that way! Yes, the "bad system" I was talking about was an app built on top of windows.

Raghar    96
Quote:
Original post by Antheus
Quote:
Original post by Raghar
Using CPU power unnecessarily just increases power consumption for no reason. Some people like to play games while rendering in the background; unnecessary consumption of CPU resources might alienate them.


There is X work to do. That determines the CPU load. Whether it is done on one core or on 16 makes no difference.

Imagine there is A work to do in each of B parallel work units. The total time on a single core would be B*A; on B cores it would be A. With 100 threads, a scheduler that can use 8 CPUs could offload 12-13 work units to each core. Because 1/100 < 1/12 (each thread gets a larger share of a core), the chance of the GUI/input thread getting to act is higher, and the program would be snappier with great throughput.

Of course a computer with 256 cores would work even better.
Quote:
Quote:
Isn't the scheduler much better than a single programmer?

Nope. It's merely more convenient, at a possibly high run-time cost.

I agree it's convenient; however, the programmer can't know the runtime characteristics of the other programs that run on the same OS. The scheduler does. In addition, the scheduler can switch cores to load each core evenly. (Basically, threads with the same core mask run on the same set of cores, yet the physical core can change without the threads knowing anything.)
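
For reference, the "core mask" here is the Win32 thread affinity mask; a minimal illustrative sketch (pinning the calling thread to core 0, which you would normally only do for measurements, since it takes the decision away from the scheduler):

    // Illustrative only: restrict the calling thread to logical core 0.
    #include <windows.h>

    int main()
    {
        DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), 1); // bit 0 = first logical core
        if (previous == 0)
            return 1;   // call failed (e.g. the mask names no core this process may use)

        // ...run the work that should stay on core 0...
        return 0;
    }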

Quote:
Quote:
I recall that in my Java programs, I try to do ONLY the things that the thread must do, and then yield quickly. (Or interrupt the thread willingly. The OS looks smoother with threads that sometimes sleep than with 200 threads that compete for CPU resources. With 200 CPUs there would be about a 200x speedup. ^_^)


Interestingly enough, Java threads have more in common with co-routines than system threads. They are light-weight, and do not necessarily have a 1:1 mapping to OS threads. Context switches are much less expensive (relatively). Same goes for timing. Same goes for other VMs, or similar light-weight threading run-times.

If anything, hundreds of threads in Java are much less of an issue than equivalent in native application (not that there would be realistic or practical need for that many).


Is that so? I ran a test a few minutes ago; Windows reported 300+ threads working heavily and competing for resources. If they are using non-native threads, they are hiding it fairly well. (Sun Java 6, properly updated.)

I also recall a Java bug that was caused by changing OS thread scheduling parameters.

An application designed to run on a cluster, or as parallel as possible so it scales well on things like Larrabee or dual octo-core CPUs, basically needs as many native threads as possible, without any synchronization whatsoever.
