Difficult Question: input/output calcs gpu

Started by BetaASM. 9 comments, last by mattnewport 18 years, 9 months ago
Hi, I'm not a game programmer, but I figured you guys (knowing the GPU architecture better than I do) would be able to help me out. I'm trying to run some non-graphics-related calculations on the GPU and get the results of those calculations. There are programs like BrookGPU and whatnot, but I'd like to do this in ASM, so I need a lower-level solution.

OK, here's the question. Using the API in d3dx9_26.dll, how can I give the GPU, say, 3 floats, have the GPU add them together, then add a constant, and retrieve the result of the accumulation? I don't need any graphics or drawing on the screen. I assume I'd need a shader like this (I started learning GPU asm a few minutes ago):

    vs_1_1                  ; v0.xyz = 3 floats, c0.x = constant to add to the accumulated value
    ; #define               ; don't really know
    ; #decl                 ; what these are for
    add r1.x, v0.x, v0.y    ; add input floats
    add r1.x, r1.x, v0.z
    add oPos.x, r1.x, c0.x  ; add the constant to the accumulation

Then an API call to AssembleShader, which points to a buffer that holds the shader code above. I assume there's some API to set the constant register, but how do I set the input register? And after that, I don't know how to run the program on the GPU and retrieve the result, or how to input an array of floats and retrieve the output.

SORRY FOR THE TOTALLY OFF TOPIC QUESTION. But running non-graphics code on the GPU is going to become very popular very soon, and you guys (being game programmers) seem to have the best grasp of how to implement it.
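For context, the D3DX calls this post is reaching for look roughly like the sketch below (assuming a valid IDirect3DDevice9* device, with error checking omitted; this only assembles the shader and fills c0, and the 42.0 is just an example value. Feeding in the three input floats and reading the result back are the harder part, discussed in the replies below.)

    #include <string.h>
    #include <d3d9.h>
    #include <d3dx9.h>   // the runtime DLL is d3dx9_26.dll; link with d3dx9.lib

    // Assemble the vertex shader source above and set constant register c0.
    void SetupShader(IDirect3DDevice9* device, const char* asmSource)
    {
        ID3DXBuffer* code = NULL;
        ID3DXBuffer* errors = NULL;
        D3DXAssembleShader(asmSource, (UINT)strlen(asmSource),
                           NULL, NULL, 0, &code, &errors);

        IDirect3DVertexShader9* shader = NULL;
        device->CreateVertexShader((const DWORD*)code->GetBufferPointer(), &shader);
        device->SetVertexShader(shader);

        // Constant registers are written from the CPU as float4 vectors;
        // this sets c0 = (42.0, 0, 0, 0).
        const float c0[4] = { 42.0f, 0.0f, 0.0f, 0.0f };
        device->SetVertexShaderConstantF(0, c0, 1);

        // The input registers (v0 etc.) are not set one value at a time:
        // they are fed from a vertex buffer via SetStreamSource() and a
        // vertex declaration, once per vertex drawn.
    }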
What's wrong with SSE/SSE2 and/or 3DNow! ?
Quote:Original post by BetaASM
Hi, I'm not a game programmer, but I figured you guys (knowing the GPU architecture better than I do) would be able to help me out.

I'm trying to run some non-graphics-related calculations on the GPU and get the results of those calculations. There are programs like BrookGPU and whatnot, but I'd like to do this in ASM, so I need a lower-level solution.

What's wrong with the level of abstraction offered by Brook? Remember that in GPU terms, shader asm is just as much an abstraction as HLSL, Cg, or Brook.
Quote:OK, here's the question.
Using the API in d3dx9_26.dll, how can I give the GPU, say, 3 floats, have the GPU add them together, then add a constant, and retrieve the result of the accumulation?

If you're doing GPGPU stuff, I suggest you use OpenGL, not Direct3D. It has a lot better support for this sort of thing. In OpenGL, you'd render to a pbuffer, and then read back from the pbuffer. For more information, you should check out GPGPU.org; it's sort of a clearinghouse for this stuff. If you really wanted to do just one addition, it'd be a 1x1 pbuffer. Of course, this would be incredibly inefficient. GPGPU stuff thrives on parallelization.
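The readback itself is just a glReadPixels call. A minimal sketch (assuming a 1x1 floating-point pbuffer, or other render target, has already been created and made current; the pbuffer setup via WGL_ARB_pbuffer is lengthy and omitted here):

    #include <GL/gl.h>

    // Read one RGBA float result back from the (already current) 1x1 render
    // target after the GPGPU pass has been drawn into it.
    float result[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    glReadPixels(0, 0, 1, 1, GL_RGBA, GL_FLOAT, result);
    // result[0] now holds whatever the fragment shader wrote to the red channel.

Note that this trip back across the bus is the slow part; it's the same readback cost mentioned further down the thread.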
Quote:SORRY FOR THE TOTALLY OFF TOPIC QUESTION
But running non-graphics code on the GPU is going to become very popular very soon, and you guys (being game programmers) seem to have the best grasp of how to implement it.

As a graduate researcher in graphics programming and in GPGPU, I disagree. Vectorized processing in consumer-level hardware? Sure, it'll be popular. But it won't be hosted in the GPU. The GPU is just the wrong place to do this stuff, for many reasons. The current popularity of GPGPU is a response to the current (transient) situation where everyone has a nice GPU and nobody has a general-purpose vectorized coprocessor.
-Level of abstraction
I was ambiguous: I meant that the programs I would be writing would be in x86 asm (x86-64 in the near future); I wasn't referring to the GPU assembly language. That's why I wanted to use the lowest valid level of shader programming available, which is the assembly-style syntax.

-SSE/SSE2 and 3DNow! instructions
They can't touch the single-precision vector FP speed of the GPU. If you don't believe me, check out http://sourceforge.net/projects/ffff/ and run the benchmark built into that program; you'll see around a 4x speed increase going from the optimized SSE code to the GPU processing.

-The simple addition
Was an EXAMPLE to help me get a handle on the technique: K.I.S.S., a la HelloWorld. Obviously it's inefficient to run 3 instructions on the GPU; that's not the point at all.

Well, I guess I'm forced to look into GLUT and OpenGL.
I know it was unintentional, but you've managed to dodge helping me altogether, LOL. Thanks anyway.

Quote:Original post by BetaASM
-SSE/SSE2 and 3DNow! instructions
They can't touch the single-precision vector FP speed of the GPU. If you don't believe me, check out http://sourceforge.net/projects/ffff/ and run the benchmark built into that program; you'll see around a 4x speed increase going from the optimized SSE code to the GPU processing.

That's true; currently high-end GPUs can easily beat CPUs in some non-graphics computational tasks. However, this will most likely not last for more than 2-3 more years. Once architectures similar to that of IBM's Cell processor become more commonplace, it will be more or less useless to do general computation on the GPU.
Quote:Original post by BetaASM
But running non-graphics code on the GPU is going to become very popular very soon

I doubt it. Technical issues aside, it's never going to give you a significant gain for any practical app. Either you need a certain degree of precision (scientific apps), in which case GPUs just don't cut it, or you can live with the inaccuracy, in which case you're probably doing something realtime like a game, where your graphics card is already busy doing its proper display work.

Slightly more practically: you can't just write a GPU shader and tell it to 'do stuff'. You need to render dummy geometry(*) in order for the shader to actually process the vertices and fragments generated. Your end result becomes available in the form of your framebuffer (or other render target). You'll have to read back the framebuffer (backwards over the AGP bus - slow!) to actually get the results on the CPU.

How you actually set constant input registers depends on your API. If you're looking at OpenGL, then the orange book (OpenGL Shading Language) is a good place to start.

(*) Or more likely, carefully massaged geometry. Another example of why this is impractical: you end up with lots of non-trivial setup and teardown just so your GPU can rattle through the instructions.
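For illustration, the 'dummy geometry' pass being described looks roughly like this in OpenGL with GLSL (a minimal sketch using the GL 2.0 entry points, assuming a linked program object prog and a float render target are already set up; the uniform name addConstant is made up for the example):

    // Inside your per-pass draw function.
    // Bind the GPGPU fragment shader and set its 'constant register'
    // (a uniform, in GLSL terms).
    glUseProgram(prog);
    GLint loc = glGetUniformLocation(prog, "addConstant");  // hypothetical uniform
    glUniform1f(loc, 3.14f);

    // Set up a 1:1 mapping between output pixels and data elements.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, 1.0, 0.0, 1.0, -1.0, 1.0);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // Draw one screen-filling quad: this is the dummy geometry that makes the
    // GPU run the fragment shader once per output element.
    glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f, 0.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f(1.0f, 0.0f);
        glTexCoord2f(1.0f, 1.0f); glVertex2f(1.0f, 1.0f);
        glTexCoord2f(0.0f, 1.0f); glVertex2f(0.0f, 1.0f);
    glEnd();

    // The results now sit in the render target; read them back with
    // glReadPixels as shown earlier in the thread.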
GPUs are getting a lot more general purpose - in the Longhorn time frame with WGF 2.0 it will become a lot easier to use them for non-graphics computations as a lot of the current difficulties and restrictions are relaxed. The GPU in the Xbox 360 hints at the sort of things that we will probably see on the PC in the future - unified vertex and pixel shaders that can write to main memory as well as to the frame buffer for example. The Cell is an example of a CPU becoming more like a GPU (in some ways it is more restrictive) whilst GPUs are becoming more general purpose and more like a CPU. At the moment I think it's far from clear that CPUs will eventually displace GPUs rather than the other way around.

Game Programming Blog: www.mattnewport.com/blog

Quote:Original post by mattnewport
GPUs are getting a lot more general purpose - in the Longhorn time frame with WGF 2.0 it will become a lot easier to use them for non-graphics computations as a lot of the current difficulties and restrictions are relaxed. The GPU in the Xbox 360 hints at the sort of things that we will probably see on the PC in the future - unified vertex and pixel shaders that can write to main memory as well as to the frame buffer for example. The Cell is an example of a CPU becoming more like a GPU (in some ways it is more restrictive) whilst GPUs are becoming more general purpose and more like a CPU. At the moment I think it's far from clear that CPUs will eventually displace GPUs rather than the other way around.

Oh, I don't think either will displace the other.

GPUs are indeed getting more general purpose; they need to, to provide the shading capabilities developers are demanding. But at the same time, their basic data transfer paradigm has not shifted much: They stream data at blistering speeds but do not reverse their flow direction easily.

When you say "The Cell is an example of a CPU becoming more like a GPU" what you really mean is "The Cell is an example of a CPU incorporating vectorized processing". It's important to recognize that this isn't a capability particular to GPUs. DSP hardware, in particular, has been doing this stuff for a long time. And a GPU is not JUST a vectorized processor. Despite the move towards programmable pipelines, a GPU is still optimized for performing very specific tasks, such as triangle rasterization and 4x4 matrix ops, to the detriment of other tasks. For instance, the GPU lacks a "scatter" operation.
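To make the gather/scatter distinction concrete, here's a CPU-side sketch in plain C: a fragment shader can do the first loop (a gather is just a dependent texture read) but not the second, because each fragment can only write to its own pixel location.

    // Gather: read from a computed address, write to a fixed one.
    void gather(float* out, const float* in, const int* index, int n)
    {
        for (int i = 0; i < n; ++i)
            out[i] = in[index[i]];
    }

    // Scatter: write to a computed address. Pixel shaders of this era have no
    // equivalent, which rules out a whole class of algorithms.
    void scatter(float* out, const float* in, const int* index, int n)
    {
        for (int i = 0; i < n; ++i)
            out[index[i]] = in[i];
    }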

Ultimately, as the Cell processor has shown, there's nothing magical about a GPU. Intel could decide tomorrow to stick a GeForce 6800 GPU on a P4 die and get that sort of vectorized performance. But if Intel really wanted vectorization, they wouldn't use a GeForce; they'd use something with more general vectorization. I think what we'll see is the GPU becoming more general but remaining a GPU, while CPUs evolve vectorization support similar to that found in GPUs and other vectorized processors.
Quote:Original post by Sneftel
In OpenGL, you'd render to a pbuffer, and then read back from the pbuffer.


How is that different from rendering to a texture in DirectX and reusing that texture as input in the next iteration?
Bulma
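The Direct3D render-to-texture approach mentioned here plays essentially the same role; the ping-pong pattern being described would look roughly like the sketch below (assuming a valid IDirect3DDevice9* device and a DrawFullScreenQuad() helper of your own; error checking and shader setup omitted):

    #include <d3d9.h>

    // Hypothetical helper (defined elsewhere): draws one screen-filling quad.
    void DrawFullScreenQuad(IDirect3DDevice9* device);

    // Ping-pong between two float render-target textures: read from one while
    // writing into the other, then swap roles each iteration.
    void RunPasses(IDirect3DDevice9* device, UINT width, UINT height, int numPasses)
    {
        IDirect3DTexture9* tex[2];
        IDirect3DSurface9* surf[2];
        for (int i = 0; i < 2; ++i) {
            device->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
                                  D3DFMT_A32B32G32R32F, D3DPOOL_DEFAULT, &tex[i], NULL);
            tex[i]->GetSurfaceLevel(0, &surf[i]);
        }

        int src = 0, dst = 1;
        for (int pass = 0; pass < numPasses; ++pass) {
            device->SetRenderTarget(0, surf[dst]);   // write new values here
            device->SetTexture(0, tex[src]);         // previous results as input
            device->BeginScene();
            DrawFullScreenQuad(device);
            device->EndScene();
            int tmp = src; src = dst; dst = tmp;     // swap for the next pass
        }

        // tex[src] now holds the final results; copy them into a system-memory
        // surface with GetRenderTargetData() to read them back on the CPU.
    }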
I kind of agree with Sneftel, in that I don't think GPUs will be used that much for general computation... They are so stream-oriented, and much of their speed comes from their pipelined nature. What I think the current GPGPU interest shows is that some people want more general-purpose math acceleration, and the only place they are finding it right now is the GPU. That doesn't mean that GPUs are the solution.

On another note, there seems to be a lot of hyping of GPGPU stuff these days. I know there are many demos of GPU computation and I am very familiar with the power of the GPU's vector float processing, but does anyone actually use the GPU for something useful and practical besides graphics and GPGPU demos? Pardon my ignorance if they do, I'm just curious.

