Graphics Hardware: simultaneous or asynchronous?

Started by
5 comments, last by Anthony Serrano 16 years, 9 months ago
I guess this is a question about how the operating system of a (WinXP) computer interacts with the graphics accelerator card. I don't completely understand whether the GPU and graphics card are actually operating "concurrently" with the rest of the computer, or whether the GPU must complete all its processing before returning control back to the executable/operating system. What difference this makes to a programmer will become clear after I show an example. Let me be a little more explicit about what I am trying to ask here. We have two possibilities:

[Possibility A] The GPU sits idle and waits for commands from an executable in memory. Upon receiving a command from the executable, it stops the processing of any further code, performs the action, and then returns control of the computer back to the executable. Possibility A means that the CPU in a computer runs asynchronously with the GPU. The "acceleration" gained from a graphics accelerator here means that the GPU is merely compressing the time it takes to perform 3D graphics, rather than actually offloading the work from the main CPU while the CPU does "other things".

[Possibility B] The GPU is never idle, and is constantly processing graphics operations regardless of what the rest of the computer is doing, and regardless of the speed at which the rest of the computer is doing it. In this scenario, the GPU runs *simultaneously*, or "concurrently", with the CPU, the operating system, and the executable itself. An executable that is issuing OpenGL functions is actually "dispatching" commands to the GPU; after dispatching such commands, it does not sit there and wait for a result, but goes on to the rest of the code in the main executable.

I have a few large books on programming 3D games using graphics hardware, and none of them address this issue with any clarity. I don't know whether I am programming a machine that is engaging in Possibility A, or one that is engaging in Possibility B! If you think that this issue doesn't matter, consider the following code from a callback function for a WinXP executable:

// Data initialized elsewhere before OnIdle()
Vector4f* avInput  = <array of vectors>
Vector4f* avOutput = <array of transformed vectors>
float* afPIn  = (float *)avInput;
float* afPOut = (float *)avOutput;

BOOL CGameApp::OnIdle(LONG lCount)
{
    // Let the base class perform its overhead tasks.
    CWinApp::OnIdle(lCount);

    if( firstidlecall )
    {
        firstidlecall = false;   // Never do this again.
        // Perform initialization.
        glMatrixMode(GL_MODELVIEW);
    }
    else
    {
        // Otherwise get the results of commands issued the last time.
        glGetFloatv(GL_MODELVIEW_MATRIX, afPOut);
        glPopMatrix();
    }

    AppDefined_UpdateMatrix(avInput);
    glPushMatrix();
    glMultMatrixf(afPIn);

    return TRUE;   // call ::OnIdle() again.
}

If Possibility A is going on, then placing the glMultMatrixf() right before the return call will not gain any speed over just performing all the gl operations in a row and waiting for the result. However, if the GPU is running "concurrently" with the CPU, then the glMultMatrixf() may be performed while the rest of the system is performing other tasks, such as dispatching messages and executing other active threads elsewhere in memory. This issue is very important if you are considering writing code that is going to use the GPU as a "second processor" in the computer, meant to perform computations other than 3D graphics.
If they do not run concurrently, then you are really not offloading anything onto the graphics card, but merely comparing the relative speed of the GPU against the same operations performed in software.
Possibility A is actually synchronous, not asynchronous.

At the hardware level, the CPU and GPU are highly asynchronous: they can operate independently of one another. In fact, the CPU runs independently of nearly everything: it sends queries through the various buses (including the PCI Express or AGP bus used by the GPU) and to the various computing units, and then starts doing other things while waiting for the results. If there are no other things to be done, it stalls (becomes idle until it gets the results to do something else).

At the software level, things are a little bit more complex. The CPU interacts with the peripherals by sending data along buses using protocols described by drivers. If the protocol provided by the driver involves waiting for the GPU computation to end, then this is what the CPU will do. If the protocol provided by the driver involves queuing commands and using another thread to send the batches to the GPU when it's done with the previous one, then the CPU will be able to do other things. It all depends on the driver.
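To make the "queue and flush" idea concrete, here is a purely conceptual sketch of what driver-side command buffering might look like; every name in it (GpuCommand, CommandQueue, submitToHardware) is made up for illustration and does not correspond to any real driver API.

#include <queue>

struct GpuCommand { int opcode; /* ...payload... */ };

class CommandQueue {
public:
    // Called by the API layer on the application's thread: just records the
    // command and returns immediately, so the GL call appears "instant".
    void enqueue(const GpuCommand& cmd) { m_pending.push(cmd); }

    // Called when the batch is full, or at a sync point: hands the whole
    // batch to the hardware. Only here might the CPU have to wait.
    void flush() {
        while (!m_pending.empty()) {
            submitToHardware(m_pending.front());   // hypothetical
            m_pending.pop();
        }
    }
private:
    void submitToHardware(const GpuCommand&);   // hypothetical hardware interface
    std::queue<GpuCommand> m_pending;
};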

OpenGL itself is asynchronous by design. It was designed to be able to send rendering commands across a network back in a time when networks were slow, and is still used today to send very costly and complex rendering commands without losing interactivity. So, I would assume that, unless the driver is very nasty about it (for instance, on embedded systems with limited thread support), OpenGL should be able to make asynchronous calls.
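You can actually observe this yourself by timing around an explicit sync point. A minimal sketch, assuming a valid OpenGL context is already current and that drawScene() (a placeholder name) issues a decent amount of work:

#include <windows.h>
#include <GL/gl.h>
#include <cstdio>

void drawScene();   // placeholder: issues a batch of GL drawing commands

double seconds()
{
    LARGE_INTEGER t, f;
    QueryPerformanceCounter(&t);
    QueryPerformanceFrequency(&f);
    return (double)t.QuadPart / (double)f.QuadPart;
}

void measureAsync()
{
    double t0 = seconds();
    drawScene();             // the GL calls usually just get queued...
    double t1 = seconds();   // ...so t1 - t0 tends to be tiny
    glFinish();              // explicit sync point: blocks until the GPU is done
    double t2 = seconds();
    std::printf("issuing: %.4f s, waiting for GPU: %.4f s\n", t1 - t0, t2 - t1);
}

If the issuing time is small and the waiting time is large, the calls were queued asynchronously; if the driver were fully synchronous, nearly all of the time would show up in the issuing phase.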
Quote:Original post by ToohrVyk
Possibility A is actually synchronous, not asynchronous.

[snip]

OpenGL itself is asynchronous by design. It was designed to be able to send rendering commands across a network back in a time when networks were slow, and is still used today to send very costly and complex rendering commands without losing interactivity. So, I would assume that, unless the driver is very nasty about it (for instance, on embedded systems with limited thread support), OpenGL should be able to make asynchronous calls.


So you seem to be saying that some of the commands are asynchronous and some of them are not, depending on the driver used. If I am writing a single-threaded WinXP application that hopes to move some of the computational burden to the GPU, what steps should I take to ensure that I'm not simply idling while waiting for results?

In animated games, I have noticed that it is up to the application itself to tell the graphics card when to swap the rendered "backbuffer" onto the screen. I assume that the application simply idles while that takes place, and regains control after the graphics card has finished doing this. It seems to me that it HAS to work that way if the application code itself is telling the graphics card when to swap in the "backbuffer". The alternative would be the graphics card updating the frames all by itself. But that simply doesn't happen... or am I wrong?


It seems this problem cannot be solved by simply using multiple threads, since multitasking is really an illusion created by the operating system, which juggles pieces of threads so quickly that the human eye is tricked into thinking they are happening at the same time. So even if I were rendering every even row of an image in software, and having some other thread render every odd row using the GPU, they really would not be happening at the same time, since each thread runs to the exclusion of the other. (Maybe I'm wrong?)
The key to it comes down to two things:
1) the command buffer
2) reading back results

When you submit instructions to the graphics card they are queued up for execution by the driver, and the GPU works its way through this buffer. It's kept topped up while there are things to draw; however, certain operations act as a break in this stream of data because they rely on the results of the operations before them.

A good example is glReadPixels; for the readback to happen, all the commands issued before it must have completed. The readback can then take place, and once that has finished the GPU can continue to execute drawing commands.

However, while it is waiting for the read to take place, the CPU has stalled and is waiting on the GPU to finish its work.
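To put that in code, a sketch (assuming a current GL context, and that width and height match the drawable; drawScene is a placeholder name):

#include <windows.h>
#include <GL/gl.h>
#include <vector>

void drawScene();   // placeholder: queues a batch of drawing commands

void readbackStall(int width, int height)
{
    std::vector<unsigned char> pixels(width * height * 4);

    drawScene();   // these calls return almost immediately; the work is queued

    // glReadPixels cannot return until everything queued before it has
    // finished executing, so the CPU sits here waiting for the GPU.
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
}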

The other way a stall can happen is if the command buffer gets full and the driver needs to wait to be able to flush some of the work to the GPU's buffers before continuing. There isn't anything you can really do about this one.

When it comes to the backbuffer swapping, I believe current drivers (NV and AMD on Vista, anyway) just insert the swap into the command stream and return right away. But don't quote me on that [smile]

In the end, avoiding readbacks is the best way to avoid stalls, as well as arranging your loop so that while the rendering is going on you are updating your logic; this should overlap things nicely.
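Roughly this kind of frame arrangement, as a sketch (renderScene and updateGameLogic are placeholder names):

#include <windows.h>

void renderScene();      // placeholder: issues this frame's GL draw commands
void updateGameLogic();  // placeholder: pure CPU work (AI, physics, input...)

void frame(HDC hdc)
{
    renderScene();       // mostly just queues work for the GPU
    updateGameLogic();   // CPU work overlaps with the GPU chewing through the queue
    SwapBuffers(hdc);    // present; whether this blocks depends on the driver
}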

(granted, PBOs allow non-stalling async transfer of data, which can remove some of the 'sync points' which occur; these are basically any time data needs to be sent or pulled back from the card).
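For completeness, a sketch of a PBO readback, assuming GLEW (or equivalent) has been initialized so the buffer-object entry points and the GL_PIXEL_PACK_BUFFER token are available; width and height are placeholders for the drawable size:

#include <GL/glew.h>

void asyncReadback(int width, int height)
{
    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL, GL_STREAM_READ);

    // With a pack PBO bound, glReadPixels returns right away; the copy into
    // the buffer happens on the GPU's own schedule.
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

    // ... do other CPU work here ...

    // Mapping the buffer is the real sync point, so do it as late as possible.
    void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    // ... use the pixel data pointed to by 'data' ...
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    glDeleteBuffers(1, &pbo);
}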
Quote:Original post by phantom
The key to it comes down to two things:
1) the command buffer
2) reading back results

[snip]

A good example is glReadPixels; for the readback to happen, all the commands issued before it must have completed. The readback can then take place, and once that has finished the GPU can continue to execute drawing commands.

However, while it is waiting for the read to take place, the CPU has stalled and is waiting on the GPU to finish its work.



OK, I think I really have an idea of how this works now. The main executable sends commands to the graphics card through a command buffer. The graphics card pulls the commands off the command buffer at whatever pace it operates. Then the main CPU will idle if it needs some data from the GPU (glReadPixels()), and this idle time will depend on how backed up the command buffer has become. I think it would be interesting to write the code for a driver for a modern GPU.

Phantom's answer is basically correct for most modern GPUs.

To be more specific, the GPU has a command processor that DMAs commands from a buffer in system memory in kernel space. The driver is responsible for allocating this buffer and configuring the GPU to use it.

However, not all GPUs will do this... on some systems (perhaps embedded systems with low memory), the GPU will operate in a mode where all commands are sent directly to the card via register writes.

About the frame buffer swapping, the CPU is free to run on its own once the command has been sent to the card (either via DMA or direct register writes). In fact, by the time the CPU reaches the next frame, it is possible that the GPU has not yet swapped the previous frame or is only partially done swapping the frame. This is actually what causes tearing to occur in your image and it also explains why enabling VSYNC slows things down, since the CPU has to wait until it knows the GPU is done swapping the frame before drawing the next frame.

That's what I know to the best of my knowledge... please correct me if I'm wrong about anything :D
Quote:Original post by Clapfoot
About the frame buffer swapping, the CPU is free to run on its own once the command has been sent to the card (either via DMA or direct register writes). In fact, by the time the CPU reaches the next frame, it is possible that the GPU has not yet swapped the previous frame or is only partially done swapping the frame. This is actually what causes tearing to occur in your image and it also explains why enabling VSYNC slows things down, since the CPU has to wait until it knows the GPU is done swapping the frame before drawing the next frame.


Swapping buffers is generally a very fast operation.

Tearing happens when the buffers are swapped while the frame is actively being scanned (because actually DISPLAYING the image on the monitor isn't instantaneous). Enabling VSYNC tells the driver to only swap buffers during the vertical blanking period; therefore, with VSYNC on, you can't swap buffers more often than the monitor refresh rate.
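On Windows you control this through the swap interval. A minimal sketch, assuming the WGL_EXT_swap_control extension is present (the entry point has to be fetched at runtime):

#include <windows.h>
#include <GL/gl.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void setVSync(bool enabled)
{
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");

    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(enabled ? 1 : 0);   // 1 = wait for vertical blank, 0 = swap immediately
}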

