
FGBartlett

Member Since 22 Sep 2004
Offline Last Active Aug 06 2012 09:26 AM

Posts I've Made

In Topic: How many average number of threads does a game needs, regardless of simplicity?

06 August 2012 - 08:17 AM

Sorry, I meant lag the audio-- the audio in must be delayed to match the video lag. These are easy to build-- continuously running samplers, to a ring buffer, with an offset to drive the output sampling. The offset determines the audio delay.
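A minimal sketch of such a delay line (C++, all names hypothetical, and assuming a fixed nonzero delay): a ring buffer whose read position trails the write position by a fixed number of samples, so the audio out lags the audio in by exactly that offset.

```cpp
#include <cstddef>
#include <vector>

// Fixed-delay line: output trails input by `delaySamples` through a ring buffer.
// Assumes delaySamples > 0.
class DelayLine {
public:
    explicit DelayLine(std::size_t delaySamples)
        : buf_(delaySamples, 0.0f), pos_(0) {}

    // Push one input sample; get back the sample written `delaySamples` calls ago.
    float process(float in) {
        float out = buf_[pos_];            // the sample from `delaySamples` ago
        buf_[pos_] = in;                   // overwrite with the newest sample
        pos_ = (pos_ + 1) % buf_.size();   // advance circularly
        return out;
    }

private:
    std::vector<float> buf_;
    std::size_t pos_;
};
```

The offset (here, the buffer length) is the knob: set it to the measured video processing lag, in samples, and the audio stays in sync with the delayed video.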

In Topic: How many average number of threads does a game needs, regardless of simplicity?

06 August 2012 - 08:01 AM


You can still benefit from more threads than cores; here is an extreme example: a single core/single lane machine.

Is there ever any need for multithreading on such a machine?

No, there isn't very often a need for it -- it may be one way to solve the problem, but I assure you there's probably a single-threaded solution as well.

But it provides performance and behavior you can't achieve in a single threaded model.
as in --------Please wait....scene loading---------...

For example, background loading of scenes definitely is possible in a single-threaded game...

Sure; suppose you have a prep thread that is waiting on I/O or some other condition, like a FIFO being less than half full. It can yield to another thread while waiting. If you are single threaded, then that single thread eats all the latency in your model, and latency can't be hidden at all.
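One way to sketch that yielding-while-waiting behavior (a hypothetical watermark guard using std::condition_variable, not the poster's actual code): the prep thread blocks, consuming no core time, until the draining side signals that the FIFO has dropped below half full.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Queue with a "below half full" watermark the prep thread can sleep on.
struct WatermarkedQueue {
    std::queue<int> q;
    std::size_t capacity = 8;
    std::mutex m;
    std::condition_variable below_half;

    // Prep thread: block (yielding the core) until there is real work to do.
    void wait_until_below_half() {
        std::unique_lock<std::mutex> lk(m);
        below_half.wait(lk, [this] { return q.size() < capacity / 2; });
    }

    // Draining thread: pop one item, then wake the prep thread if it is parked.
    void pop_one() {
        {
            std::lock_guard<std::mutex> lk(m);
            if (!q.empty()) q.pop();
        }
        below_half.notify_one();
    }
};
```

The predicate form of `wait` also handles spurious wakeups: the prep thread only proceeds once the watermark condition actually holds.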

Threading should not be used for I/O-bound tasks (only processing-bound tasks). The OS is designed to handle I/O-bound tasks asynchronously without using extra threads already -- use the OS's functionality instead of reinventing it with extra threads.
If a single-threaded program wanted something to occur when a FIFO was half-full, it would likely use a callback that is triggered by the push function.

If you have threads that never yield at all, then each thread will try to consume a complete core. (Depending on the OS, it will still occasionally lose bandwidth and be parked, but it will always be crying for attention.) Sometimes that is a necessity, but if such tight polls or freewheeling threads can be minimized, a modern OS can manage lots of well-behaved yielding threads far in excess of the number of available cores/lanes...

But if you're writing a high-performance real-time system (such as a modern game engine), then you want a small number of threads that hardly ever yield to get predictable performance. Yielding a thread on Windows is totally unpredictable in length, with your only guarantee being that it's unlikely to be longer than 5 seconds (although yes: that case shouldn't occur unless you're massively oversubscribed)...

Heads-up on the design of the thread-safe FIFO: it must use a two-step allocate-and-release model, because there is finite execution time between when a handle is pulled and when it is prepped or consumed. But that is easily done.

What's this "two-step allocate and release model"? Is that specific to your OpenGL resource FIFO, or are you talking about thread-shared FIFOs in general?

A FIFO object basically tracks a head and a tail in a circular fashion, with some maximum FIFO size. The FIFO should provide booleans for IsFull, IsEmpty, etc.


Functions like IsFull and IsEmpty are nonsensical in the context of a shared structure like a multiple-producer/multiple-consumer FIFO -- they can never be correct. It makes sense for Push and Pop to be able to fail (if the queue was full or empty), but simply querying some state of the structure, such as IsFull, is useless, because by the time you've obtained your return value it may well be incorrect, and any branch you make on that value is very dubious.


Re: FIFO half full. There is no need for this to be precise; the point is, if you respond to the event FIFO Empty, it is too late. The assumption is that 'about half a FIFO' is enough latency to respond and keep the FIFO 'not empty', which is all the draining thread cares about. You never want to starve the draining thread, or you get a stall.

Re: two-step FIFO accessors. They can always be safely used; the STL variants can sometimes be used, with more care, and if your resource model changes you have to review each usage. So I always use the two-step scheme. In other words, if the two-step variants are used, they are always thread safe. If the STL single-step variants are used, they are sometimes thread safe. It depends on your resource model, and on how and whether resources are cycled, reused, or shared.

If you are using a pooled resource model (resources cycled/reused), where both the filling thread and the draining thread take finite time between accessing a FIFO member and doing something with it, you don't want the act of accessing the FIFO member to change the state of the FIFO; you want the act of releasing that FIFO member to change the state of the FIFO. The two-step FIFO usage makes that explicit. (It can usually be handled implicitly without the two-step process, and that will work as long as it holds. It can be, and usually is, arranged that every filling thread is done with the resource before touching the FIFO state; the explicit two-step process just makes it harder to implement this wrong. Ditto the considerations for the draining thread.)

The assumption is that only the filling thread adds to the FIFO and only the draining thread pulls from it. So the filling thread a] gets the next FIFO slot, b] does something to the associated resource (even if just to define it), and c] releases the FIFO slot, changing the FIFO state. Ditto the draining thread. Otherwise, if the act of accessing the FIFO slot simultaneously changed the FIFO state, you could have a condition where an EMPTY FIFO immediately changes state to NOT EMPTY, the waiting draining thread accesses the resource in process, and the filling thread has not finished prepping the resource.
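A minimal sketch of that two-step protocol, assuming a single-producer/single-consumer ring of integer handles (all names hypothetical, not the poster's actual code): acquiring a slot does not publish it; only the explicit release advances the head or tail, so the other thread never sees a slot whose resource is still being prepped.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// SPSC FIFO of resource handles with two-step access: acquire a slot,
// work on the associated resource, then release the slot to publish the change.
template <std::size_t N>
class TwoStepFifo {
public:
    // Step a] (filling thread): peek the next free slot without changing FIFO state.
    int* acquire_fill() {
        std::size_t t = tail.load(std::memory_order_relaxed);
        if ((t + 1) % N == head.load(std::memory_order_acquire)) return nullptr; // full
        return &slots[t];
    }
    // Step c] (filling thread): publish the slot only after the resource is prepped.
    void release_fill() {
        tail.store((tail.load(std::memory_order_relaxed) + 1) % N,
                   std::memory_order_release);
    }

    // Step a] (draining thread): peek the oldest slot without changing FIFO state.
    int* acquire_drain() {
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return nullptr; // empty
        return &slots[h];
    }
    // Step c] (draining thread): free the slot only after the resource is consumed.
    void release_drain() {
        head.store((head.load(std::memory_order_relaxed) + 1) % N,
                   std::memory_order_release);
    }

private:
    std::array<int, N> slots{};        // handles, not the resources themselves
    std::atomic<std::size_t> head{0};  // draining position
    std::atomic<std::size_t> tail{0};  // filling position
};
```

The point of the split is that the filling thread can spend arbitrary time prepping the resource between acquire_fill() and release_fill(), and the draining thread never observes a half-built slot.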

The above assumes that some kind of pool of resources is being reused/cycled, not being continuously allocated anew by the filling thread. In the latter case, a filling thread could create the resource completely and then change the state of the FIFO, with no harm done. And the draining thread can do likewise, because that scheme is not re-using resources (like a buffer, FBO, VBO, or texture handle) but continuously allocating and destroying them. But if you switch to a rotating pool of resources (to eliminate the constant creation/destruction of resources), then you might run into this need for 'two-step' FIFOs.

These two-step thread-safe FIFO things are actually pretty simple. They track a few integers (head, tail, max, count) and maybe maintain a few booleans (FIFO empty, FIFO full, FIFO half full, etc.) to drive events.

The models I usually use don't actually try to push resource objects themselves through any FIFOs -- the resources come from a pool and aren't copied, but recycled. A PoolManager allocates a new resource if a free one of the requested flavor/size isn't available. When the first filling thread in a chain needs a resource, it requests it from the PoolManager. When the last draining thread in a chain is done with a resource, it returns it to the PoolManager. The PoolManager, most of the time, is simply changing some state value on the resource, to mark it served or available. I also usually wrap the resource with some pool attributes so I can trace which thread is currently banging on a particular resource. But the models I use push handles to resources through the FIFOs. Because the locks are on the FIFOs and not the resources, because thread access to these FIFOs is a low percentage of total thread bandwidth (the beginning and end of each thread's process cycle), and because modifications to the FIFO are trivial, you really have to work at it to serialize your threads using this model. The concept of the FIFO itself isolates resources between the threads. The locks are not on the resources, but on the objects that isolate the resources. In that sense the resources themselves are never locked, but they are isolated just the same. The things that are locked are seldom accessed (on a percentage basis), so no thread is ever left starved by a resource conflict while any lengthy process is being done on the resource.

A draining thread only cares about FIFO EMPTY. If its resource source FIFO is not EMPTY, it can process. If it is EMPTY, only its filling thread can change its state. Same thing with the filling thread. It only cares about FIFO FULL. If its output FIFO is not FULL, it can process. If it is FULL, only its draining thread can change its state.

In most chained thread models, the gating thread is the final compositor thread that pulls resources from its source FIFOs at whatever frame rate is required. The filling threads that service the FIFOs either need to keep up or else the render thread will be starved and frames will be skipped. But that is always the case, even in a single thread model. The output is usually driven by some target frame rate.

This gets hairy in real-time video processing models, in which there exist both an input contract (sampled video input frames) and an output contract (output video frames). This is a 'two clock' gating problem, even if it is the same clock, and in this case the function of all those FIFOs in the process is to provide compliance for latency. This is why video processors almost always have a video processing delay; there is significant compliance in the streaming model, to accommodate latency. Video processors must lag the output video to accommodate this. You can always tell when they don't, because the audio will be ahead of the video by the amount of the video processing lag.

This is why, in the old days with DirectShow, you always saw canned examples of video-to-disk and disk-to-video, but never video-to-video... it was a largely rigid model tolerant of only one gating sink or gating source. You can always cache disk access ahead, smooth it out and gate video out, or gate video in and cache it to disk; but gating both video in and video out in a streaming model is a challenge. And FIFOs as caches are critical elements. Also, no way anything like that happens in a single-threaded model. If live video input (not from store, but live video) ever becomes a significant part of game processing, this will become apparent. Games might tolerate glitched/missed/stuttering frames in playback, but broadcasters definitely do not.

I also disagree re: multithreading I/O, even async. If your process spends any time at all waiting for an async I/O to complete, that waiting time can be put to better use. I just completed a project that demanded the highest possible throughput to disk, and it was a streaming model that was not only async but multithreaded; in practical terms, it was the difference between a disk access light that blinked and one that was solid, running at full bus bandwidth. This also required lining up write sizes with multiples of the target sector size, which is unrelated to threading; but while this is occurring, a 400 Hz update streaming waterfall plot is being handled as well, as part of the same streaming chain. (The GUI thread isn't updated at 400 Hz, but the FBOs are updated in the background at 400 Hz and presented to the foreground GUI at a reduced frame rate as a streaming, freezable/scrollable waterfall plot, without interrupting the continuous stream to disk.) I don't think anything close to that is possible in a single-threaded model. Not only would you be trying to do it with maybe 1/8th the available bandwidth, but any time spent waiting for async I/O to complete is lost.

In Topic: fragment shader wipe effect

06 August 2012 - 06:36 AM

Here is an approach.

You have two full-frame textures you want to composite in some fashion, TLeaving and TArriving. You've set those up with Texture2D samplers so that your fragment shader can sample each of them. At the beginning of the transition, you want 100% TLeaving and 0% TArriving. At the end of the transition, you want 0% TLeaving and 100% TArriving. In between, you want a uniform (set on each frame by your driving CPU code) that varies from 0% to 100% to drive your transition. If it's a wipe, then you can imagine the transformed X coordinate in clip space as what is driven as the boundary between sampling TLeaving and TArriving in your shader.

(By 'frame' I mean a render cycle during this transition. You decide how fast you want to drive the transition: how many render cycles it is going to take to complete. You drive that externally; your shader responds to the current render cycle/frame. If it is 60 frames, then frame 0 is 0%, frame 59 is 100%, etc.)

You could use an if, but don't. Instead, calculate the clip X (gl_PointCoord.x, 0 to 1) for this transition %, divide every clip X by that value, and assign the result to an integer to get 0 or 1 (the index of the sampler to use, not to be confused with the normalized range of the clip space, 0 to 1). Use that as an index to sample either sampler [0] or sampler [1]. The alternative, using an if (pos.x > transClipX), will create unique paths in your shader, causing a stall. A stall isn't fatal, but your shader will execute faster if every fragment instance takes the same conditional path. (I.e., if you can arrange it, only use conditionals when each path will be the same for every instance in a workset.)
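The branchless selection above can be sketched outside the shader (plain C++ standing in for the GLSL, names hypothetical): dividing the fragment's x by the current wipe boundary and truncating yields 0 left of the boundary and 1 at or right of it, which then picks the sampler index.

```cpp
#include <algorithm>

// Branchless wipe selector: 0 -> sample the leaving texture, 1 -> the arriving one.
// x and boundary are in normalized [0, 1] space; boundary > 0 is assumed.
int wipeIndex(float x, float boundary) {
    // x / boundary is < 1 left of the wipe line and >= 1 at or right of it;
    // truncation gives 0 or >= 1, and the clamp folds everything into {0, 1}.
    return std::min(static_cast<int>(x / boundary), 1);
}
```

In the shader itself the same expression would index the sampler pair, so every fragment follows the identical instruction path regardless of which side of the wipe it is on.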

Make it a vertical wipe by using clip Y (gl_PointCoord.y).

Make it a fade by alpha blending the two samples, etc.

In Topic: How many average number of threads does a game needs, regardless of simplicity?

01 July 2012 - 08:46 PM

You can still benefit from more threads than cores; here is an extreme example: a single core/single lane machine.

Is there ever any need for multithreading on such a machine?

Sure; suppose you have a prep thread that is waiting on I/O or some other condition, like a FIFO being less than half full. It can yield to another thread while waiting. If you are single threaded, then that single thread eats all the latency in your model, and latency can't be hidden at all.

Another approach would be to put that thread in a tight loop constantly polling to see if the FIFO was less than half full, but that is the point of using FIFOs... let's assume it's a filling or prep thread; its job is to make sure that its draining thread never sees an empty FIFO. The prep thread can periodically detect 'half empty', wake up and process some number of resources, and then yield again.

Same for multi-core/multi-lane machines. As long as threads can intelligently yield when possible (they are waiting for some condition, like a FIFO to be less than half full), then that yielded time can be used by another thread.

If you have threads that never yield at all, then each thread will try to consume a complete core. (Depending on the OS, it will still occasionally lose bandwidth and be parked, but it will always be crying for attention.) Sometimes that is a necessity, but if such tight polls or freewheeling threads can be minimized, a modern OS can manage lots of well-behaved yielding threads far in excess of the number of available cores/lanes...

In Topic: Dynamic branching in shader not working. Keeps jumping out.

01 July 2012 - 08:25 PM

I think conditional branching in shaders only incurs a performance hit if each SIMD instance might take a unique branch. If so, you will get a SIMD 'stall'. The instances sharing one branch will execute in the current active working set, then the other branch set will execute, and eventually they will 'sync' up after all branch sets and once again grind away efficiently as SIMD across the entire working set.

But just because you take a hit doesn't mean you can't do it. Just be aware there is a hit. Instrument and measure the hit; compare normal cases with extremes. It might not be so bad; that totally depends on the logic.

This issue becomes more front-and-center with OpenCL, but it also applies to shaders.
