hick18

Best way to downsize buffer?


Under DirectX 10, what's the best way to downsize a render target? Would it be to render a screen quad into another, much smaller render target? Or would it be to create mipmaps on that buffer and then copy one of those across?

I want to copy the depth buffer across to the CPU so I can use it for occlusion culling, and figured this would be quicker if I were copying a much smaller buffer.

3 things:

1) Don't do what you are doing. Use a hardware occlusion query instead.
2) People often forget that PCIe is symmetric. This means the bandwidth up is equal to the bandwidth down. This means that you might (might) be able to just download the whole thing without a problem. Try it first. Premature Optimization is the root of all evil.
3) I imagine that making a new render target and rendering into it will be better. You will have more flexibility and you won't have to generate an ENTIRE mipmap chain.
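
To make the "render into a smaller target" option concrete, here is a minimal CPU-side sketch of what such a downsize pass would compute for a depth buffer. It is not D3D10 code; on the GPU the same reduction would be done by a pixel shader drawing a full-screen quad into the smaller render target. The function name `downsampleDepthMax`, the float depth format, and the choice of keeping the farthest depth per block are all assumptions made for illustration, not something stated in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a depth "downsize" pass (hypothetical helper).
// Assumption: depth is a float in [0, 1] with 1.0 at the far plane, so keeping
// the MAX of each block is the conservative choice for occlusion tests: the
// reduced buffer never reports an occluder as closer than it really is.
std::vector<float> downsampleDepthMax(const std::vector<float>& src,
                                      std::size_t width, std::size_t height,
                                      std::size_t factor)
{
    assert(factor > 0 && width % factor == 0 && height % factor == 0);
    assert(src.size() == width * height);

    const std::size_t outW = width / factor;
    const std::size_t outH = height / factor;
    std::vector<float> dst(outW * outH, 0.0f);

    for (std::size_t oy = 0; oy < outH; ++oy) {
        for (std::size_t ox = 0; ox < outW; ++ox) {
            float farthest = 0.0f;
            // Reduce one factor x factor block of source texels.
            for (std::size_t dy = 0; dy < factor; ++dy) {
                for (std::size_t dx = 0; dx < factor; ++dx) {
                    const std::size_t sx = ox * factor + dx;
                    const std::size_t sy = oy * factor + dy;
                    farthest = std::max(farthest, src[sy * width + sx]);
                }
            }
            dst[oy * outW + ox] = farthest;
        }
    }
    return dst;
}
```

One reason a custom pass may be preferable to copying a mip level: automatically generated mipmaps average the texels in each block, and an averaged depth is not conservative for culling, whereas a custom pass can pick whichever reduction suits the test.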

Well, from what I've read, hardware occlusion queries are expensive as they themselves are a draw call. And by having a copy of the depth buffer accessible by the CPU, I figure it will provide you with more freedom and power to do other things with it too. But I shall do more research into hardware queries to see if they are a better option.

Not sure what you mean with your second point. I don't see how the comparison of the upload/download speeds will affect how fast uploading different sized buffers will be.

2) I just mean that there's this common knowledge (from the AGP days) that GPU readback should be avoided at ALL COSTS NO MATTER WHAT, and people spent a lot of time developing (expensive) algorithms to do things (like hierarchical GPU gather using mipmaps) that are actually slower on modern hardware than just doing a readback is. People forget that while your PCIe bus might not be the greatest thing in the whole world, it IS designed to upload hundreds of megabytes of game textures in only a few seconds. At the very least, even crappy graphics cards are generally capable of real-time or pseudo-real-time rendering of a full-screen framebuffer from CPU memory. My point was that, therefore, since PCIe is symmetric, it might be completely efficient enough to read back the whole framebuffer even though a lot of people assume that it wouldn't be.
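
A quick back-of-the-envelope calculation shows the scale involved. The sketch below only does the arithmetic; the resolution, the 4 bytes per pixel, and the 60 frames per second used in the note after it are assumptions chosen for illustration, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to read back one buffer of the given dimensions.
std::uint64_t bufferBytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t bytesPerPixel)
{
    assert(bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// Bytes per second transferred if that buffer is read back every frame.
std::uint64_t readbackBytesPerSecond(std::uint64_t bytesPerFrame,
                                     std::uint64_t framesPerSecond)
{
    return bytesPerFrame * framesPerSecond;
}
```

For an assumed 1280x720 buffer at 4 bytes per pixel, `bufferBytes(1280, 720, 4)` is 3,686,400 bytes, or about 221 MB/s at 60 frames per second. A 320x180 version is 230,400 bytes per frame, about 14 MB/s. Both are small next to the nominal 4 GB/s per direction of a first-generation PCIe x16 link, which supports the "try it first" advice. The arithmetic covers bandwidth only; it says nothing about the latency of waiting for the GPU to finish before the copy can complete.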

Hardware occlusion queries are not necessarily expensive, and they will definitely be a LOT cheaper than what you're proposing to do. Never mind that bandwidth up == bandwidth down with PCIe; you'll still need to stall the pipeline to do the readback. (Not optimizing at all is the root of even more evil. Or Neverwinter Nights 2. Take your pick.)

The naive version of hardware occlusion queries will do a separate draw pass to lay down the occluding objects, then draw the potential occludees, then try to read back the results, all in the same frame. This is expensive, and will stall the pipeline.

The non-naive version will use the regular draw pass for objects that are not to be occluded, run the queries against that, and will assume that things are not going to change much from frame-to-frame (this is more valid than you might think, even in a fast-action FPS) and therefore read back the results in the following frame. If the results are not yet ready it will use the most recent set of valid results. Objects smaller than a certain number of triangles (200 or so works well for me) don't get queries run as it will be just as cheap to draw them anyway as it would be to test.
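
The decision logic of that non-naive scheme can be sketched without any graphics API. In the code below, `MockOcclusionQuery` is a hypothetical stand-in that only models "the result shows up some frames after the query was issued"; with Direct3D 10 the real object would be an `ID3D10Query` created with `D3D10_QUERY_OCCLUSION` and polled through `GetData`. The struct and function names are illustrative assumptions, and the 200-triangle threshold is the figure quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mock of a GPU occlusion query (not a real D3D10 type).
struct MockOcclusionQuery {
    bool          pending = false;
    int           framesUntilReady = 0;
    std::uint64_t visiblePixels = 0;

    // Pretend the GPU will report `pixels` after `latencyFrames` frames.
    void issue(std::uint64_t pixels, int latencyFrames) {
        pending = true;
        framesUntilReady = latencyFrames;
        visiblePixels = pixels;
    }

    // Non-blocking poll: returns true and fills `out` only if a result is ready.
    bool tryGetResult(std::uint64_t& out) {
        if (!pending || framesUntilReady > 0) return false;
        pending = false;
        out = visiblePixels;
        return true;
    }

    void advanceFrame() { if (framesUntilReady > 0) --framesUntilReady; }
};

struct Occludee {
    int  triangleCount = 0;
    bool lastKnownVisible = true;  // assume visible until a query says otherwise
    MockOcclusionQuery query;
};

const int kMinTrianglesForQuery = 200;  // threshold quoted in the post

// Decide whether to draw this frame using only results that are already
// available. It never waits on the query, so it cannot stall the pipeline.
bool shouldDraw(Occludee& obj) {
    if (obj.triangleCount < kMinTrianglesForQuery) return true;  // cheaper to draw
    std::uint64_t pixels = 0;
    if (obj.query.tryGetResult(pixels)) {
        obj.lastKnownVisible = (pixels > 0);
    }
    return obj.lastKnownVisible;  // most recent valid result if none is ready
}
```

A typical frame would call `shouldDraw` for each object, draw the ones that pass, and issue fresh queries for next frame. A real implementation would also have to keep issuing queries for objects currently considered hidden (for example against a bounding box) so they can become visible again; this sketch leaves that out.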

The end result here is an implementation of hardware occlusion queries that does not stall the pipeline, is just as fast as if you weren't using them for scenes where you don't need them, and will kick in with the necessary perf boost for scenes where you do.

Don't believe everything you read. ;)

Quote:
Original post by mhagain
Objects smaller than a certain number of triangles (200 or so works well for me) don't get queries run as it will be just as cheap to draw them anyway as it would be to test.


But generally it's not the number of triangles being drawn that takes up the time, it's the setting up of the draw call.
