hick18

Passing work to GPU

This topic is 2900 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I wanted to test the performance of passing work down to the GPU and then reading the results back, as opposed to doing it all on the CPU, and I'm finding it slower than I'd like. Could someone check that I'm doing this correctly?

I want to pass a buffer down to the GPU, let the GPU work on it in a vertex shader, and then read it back on the CPU. So I'm doing the following:

Creation
----------
- Create a Dynamic vertex Buffer, with CPU write
- Create a Staging vertex Buffer, with CPU read
( both the exact same size )


Each Frame
----------
- Map dynamic buffer as write discard, write to it, and unmap
- Run vertex shader to perform work on dynamic buffer
- Copy Dynamic resource to Staging resource using CopyResource()
- Map Staging buffer as read, read results, and unmap

I haven't got around to testing the stream-out part yet. But that would require 3 buffers, right? Two to ping-pong and a third for the staging?

What kind of work are you trying to do on the GPU?

I generally use pixel shaders to do any GPGPU stuff. Pass the data in as a texture, draw a full-screen quad, then have the pixel shader do whatever you want with the input data. It works great for image-manipulation kinds of things. I've also done water simulation and seen others use it for cloth physics calculations.

It's not such a big deal now with unified shaders, but in the past GPUs generally had more pixel shader units than vertex shader units, so doing the work in pixel shaders made more sense.

It was just a test, and I wanted to see the performance. But my plan was to try a dynamic occlusion technique by sending down the AABBs of all objects that need to be rendered, and testing these against the depth buffer. Basically my plan was to do something like this.

For each frame
------------------------------
- Create list of items that need to be drawn
- Run list through AABB/frustum test
- Render remaining objects to the depth buffer (no colour, just depth)
- Unbind the depth buffer and set it as a shader resource
- Set a vertex buffer containing the remaining objects' AABBs, where each vertex is one object's AABB
- Set a stream out buffer
- Run a vertex shader on the vertex buffer which tests each AABB against the depth buffer, and outputs true if any of the AABB's points pass the depth test, or false otherwise. So basically, transform each of the AABB's 8 corner points into NDC space and compare against the depth texture (like shadow mapping)
- Render, with full colour, all objects in the list that passed the occlusion test.
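For reference, the per-AABB test in the vertex shader step can be sketched on the CPU roughly like this. It is only a minimal sketch under assumptions: `Vec3`, `AABB`, `Mat4`, `DepthBuffer`, `ndcToTexel` and `aabbVisible` are made-up names, the matrix is row-major and applied to column vectors, depth is D3D-style (0 = near, 1 = far) and the comparison is less-or-equal. Note that testing only the 8 corners, as described above, can report a box as hidden when a face or edge is visible between hidden corners.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 lo, hi; };            // opposite corners of the box
struct Mat4 { float m[4][4]; };          // row-major, clip = M * (x, y, z, 1)

struct DepthBuffer {
    int width, height;
    std::vector<float> depth;            // row-major, 0 = near, 1 = far
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Map an NDC coordinate in [-1, 1] to a texel index in [0, size - 1].
inline int ndcToTexel(float ndc, int size) {
    float t = (ndc * 0.5f + 0.5f) * static_cast<float>(size);
    t = std::min(std::max(t, 0.0f), static_cast<float>(size - 1));
    return static_cast<int>(t);
}

// Returns true if any of the 8 corners passes the depth test.
bool aabbVisible(const AABB& box, const Mat4& viewProj, const DepthBuffer& db) {
    assert(db.width > 0 && db.height > 0);
    assert(db.depth.size() == static_cast<std::size_t>(db.width) * db.height);

    for (int i = 0; i < 8; ++i) {
        // Bits 0..2 of i select lo or hi on each axis.
        const Vec3 c{ (i & 1) ? box.hi.x : box.lo.x,
                      (i & 2) ? box.hi.y : box.lo.y,
                      (i & 4) ? box.hi.z : box.lo.z };

        float clip[4];
        for (int r = 0; r < 4; ++r) {
            clip[r] = viewProj.m[r][0] * c.x + viewProj.m[r][1] * c.y +
                      viewProj.m[r][2] * c.z + viewProj.m[r][3];
        }

        // Corner at or behind the camera plane: assume visible (assumption).
        if (clip[3] <= 0.0f) return true;

        const float ndcX = clip[0] / clip[3];
        const float ndcY = clip[1] / clip[3];
        const float ndcZ = clip[2] / clip[3];

        // Texture rows run top to bottom, so Y is flipped.
        const int px = ndcToTexel(ndcX, db.width);
        const int py = ndcToTexel(-ndcY, db.height);

        if (ndcZ <= db.at(px, py)) return true;   // corner is in front of stored depth
    }
    return false;
}
```

In a shader the same loop would sample the depth texture instead of calling `at()`, and the boolean would be written to the stream out buffer.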

I figure it's going to be slow, as it requires a write to a buffer and then a read from a buffer as well, but I wanted to test it.

The other way would be to simply copy the depth buffer across to the CPU and do on the CPU what the vertex shader was doing. The depth buffer could also maybe be downsized to a quarter of the size, so less data would need to be copied.
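A rough sketch of the downsizing idea, assuming "a quarter of the size" means half the width and half the height. The function name `downsampleDepthMax` is made up, and taking the maximum (farthest) depth of each 2x2 block is my assumption: with 0 = near and 1 = far it keeps the test conservative, so an object is never culled because of the smaller buffer alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Halve a row-major depth buffer in each dimension.
// Each output texel holds the farthest depth of its 2x2 source block.
std::vector<float> downsampleDepthMax(const std::vector<float>& src,
                                      int width, int height) {
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);   // sketch only handles even sizes
    assert(src.size() == static_cast<std::size_t>(width) * height);

    const int outW = width / 2;
    const int outH = height / 2;
    std::vector<float> dst(static_cast<std::size_t>(outW) * outH);

    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            const std::size_t top = static_cast<std::size_t>(2 * y) * width + 2 * x;
            const std::size_t bottom = top + static_cast<std::size_t>(width);
            dst[static_cast<std::size_t>(y) * outW + x] =
                std::max(std::max(src[top], src[top + 1]),
                         std::max(src[bottom], src[bottom + 1]));
        }
    }
    return dst;
}
```

On the GPU the same reduction could be done in a pixel shader pass before the copy, so only the small buffer has to cross to the CPU.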

Reading data back from the GPU every frame is extremely slow. It forces a sync point: the CPU has to sit and wait until the GPU has finished everything that writes to the resource before the data can be retrieved, so the two stop working in parallel. I've found that some GPUs can do this without too much of a performance hit, but others are diabolically slow, taking 50-100ms to get a texture's data back from the GPU.
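One common way to soften this, if results that are a frame or two old are acceptable, is to keep several staging buffers and read the oldest one instead of the one just written. The sketch below is only a CPU-side model of that bookkeeping, with no Direct3D calls: `ReadbackRing` and `submitAndFetch` are made-up names, and each `std::vector<int>` slot stands in for a staging buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Models a ring of Latency "staging buffers". Each frame writes one slot and
// reads back the slot that was written Latency frames earlier.
template <std::size_t Latency>
class ReadbackRing {
    static_assert(Latency >= 1, "need at least one slot");

public:
    // Stores this frame's results and, once the ring has filled, copies the
    // results from Latency frames ago into 'ready'. Returns false while the
    // ring is still filling (no old results exist yet).
    bool submitAndFetch(const std::vector<int>& thisFrame, std::vector<int>& ready) {
        const std::size_t slot = frame_ % Latency;
        const bool haveOld = frame_ >= Latency;
        if (haveOld) {
            ready = slots_[slot];      // stands in for mapping the oldest staging buffer
        }
        slots_[slot] = thisFrame;      // stands in for copying this frame's output
        ++frame_;
        return haveOld;
    }

private:
    std::array<std::vector<int>, Latency> slots_;
    std::size_t frame_ = 0;
};
```

With `ReadbackRing<2>`, the third call returns what was submitted on the first call. The trade-off is that the occlusion results lag the camera by that many frames, so fast movement can show stale visibility.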
