Best way to downsize buffer?

Started by
4 comments, last by hick18 13 years, 9 months ago
Under DirectX10, what's the best way to downsize a render target? Would it be to render a screen quad into another, much smaller render target? Or would it be to generate mipmaps on that buffer and then copy one of those across?

I want to copy the depth buffer across to the CPU so I can use it for occlusion culling, and figured this process would be quicker if I were copying a much smaller buffer.
3 things:

1) Don't do what you are doing. Use a hardware occlusion query instead.
2) People often forget that PCIe is symmetric. This means the bandwidth up is equal to the bandwidth down. This means that you might (might) be able to just download the whole thing without a problem. Try it first. Premature Optimization is the root of all evil.
3) I imagine that creating a new, smaller render target and rendering into it will be better. You'll have more flexibility, and you won't have to generate an ENTIRE mipmap chain.
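One detail worth flagging if you do end up downsizing a depth buffer for culling: an ordinary averaging filter (which is what mipmap generation gives you) is the wrong reduction. A sketch of the idea, as CPU-side code with names of my own invention — each small texel keeps the farthest depth in its source block, so a test against the small buffer never culls something that was actually visible:

```cpp
#include <vector>
#include <cstddef>
#include <algorithm>

// Conservative 4x4 downsample of a depth buffer. Each output texel keeps the
// FARTHEST depth in its 4x4 source block (assuming the usual 0 = near,
// 1 = far convention), so an occlusion test against the small buffer can
// produce false "visible" results but never a false "occluded" one.
std::vector<float> downsampleDepthMax(const std::vector<float>& depth,
                                      std::size_t width, std::size_t height,
                                      std::size_t block = 4)
{
    std::size_t outW = width / block, outH = height / block;
    std::vector<float> out(outW * outH, 0.0f);
    for (std::size_t y = 0; y < outH; ++y)
        for (std::size_t x = 0; x < outW; ++x) {
            float farthest = 0.0f;
            for (std::size_t by = 0; by < block; ++by)
                for (std::size_t bx = 0; bx < block; ++bx)
                    farthest = std::max(farthest,
                        depth[(y * block + by) * width + (x * block + bx)]);
            out[y * outW + x] = farthest;
        }
    return out;
}
```

The same max-reduction can of course be done on the GPU in the downsizing pass itself (a pixel shader gathering the block and writing the max), which is part of why a custom smaller render target is more flexible than the fixed mipmap filter.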
Well, from what I've read, hardware occlusion queries are expensive, as they are themselves draw calls. And by having a copy of the depth buffer accessible by the CPU, I figure it will provide more freedom and power to do other things with it too. But I shall do more research into hardware queries to see if they are a better option.

Not sure what you mean with your second point. I don't see how the comparison of upload/download speeds affects how fast uploading different-sized buffers will be.
2) I just mean that there's this common knowledge (from the AGP days) that GPU readback should be avoided at ALL COSTS NO MATTER WHAT, and people spent a lot of time developing (expensive) algorithms to do things (like hierarchical GPU gather using mipmaps) that are actually slower on modern hardware than just doing a readback is. People forget that while your PCI bus might not be the greatest thing in the whole world, it IS designed to upload hundreds of megabytes of game textures in only a few seconds. At the very least, even crappy graphics cards are generally capable of real-time or pseudo-real-time rendering of a full-screen framebuffer from CPU memory. My point was that, therefore, since PCIe is symmetric, it might be completely efficient enough to read back the whole framebuffer even though a lot of people assume that it wouldn't be.
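The "just try the full download first" advice is easy to sanity-check with back-of-the-envelope arithmetic. A sketch, where the 1 GB/s effective readback rate is an assumption for illustration, not a measured figure:

```cpp
#include <cstddef>

// Rough transfer time for reading a buffer back over the bus.
// gigabytesPerSec is whatever effective readback rate you assume or measure.
double readbackMillis(std::size_t width, std::size_t height,
                      std::size_t bytesPerPixel, double gigabytesPerSec)
{
    double bytes = double(width) * double(height) * double(bytesPerPixel);
    return bytes / (gigabytesPerSec * 1e9) * 1000.0; // milliseconds
}
```

For a 1280x1024 32-bit depth buffer at an assumed 1 GB/s, that works out to roughly 5 ms of pure transfer time per frame — which is why the real cost is usually the pipeline stall the readback forces, not the bandwidth itself.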
Hardware occlusion queries are not necessarily expensive, and they will definitely be a LOT cheaper than what you're proposing to do (never mind that bandwidth up == bandwidth down with PCIe, you'll still need to stall the pipeline to do the readback. Not optimizing at all is the root of even more evil. Or Neverwinter Nights 2. Take your pick.)

The naive version of hardware occlusion queries will do a separate draw pass to lay down the occluding objects, then draw the potential occludees, then try to read back the results, all in the same frame. This is expensive, and will stall the pipeline.

The non-naive version will use the regular draw pass for objects that are not to be occluded, run the queries against that, and will assume that things are not going to change much from frame-to-frame (this is more valid than you might think, even in a fast-action FPS) and therefore read back the results in the following frame. If the results are not yet ready it will use the most recent set of valid results. Objects smaller than a certain number of triangles (200 or so works well for me) don't get queries run as it will be just as cheap to draw them anyway as it would be to test.
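The bookkeeping behind that non-naive scheme is small. A minimal sketch of the per-object decision, with names of my own choosing (the actual query issue/poll would go through the D3D10 query API):

```cpp
#include <optional>

// Frame-delayed occlusion-query bookkeeping, per object.
// Each frame we poll LAST frame's query; if the GPU hasn't finished it yet,
// we reuse the most recent valid result instead of stalling the pipeline.
struct OcclusionState {
    bool lastKnownVisible = true; // assume visible until proven otherwise
    int  triangleCount    = 0;

    // queryResult is empty when last frame's query result isn't ready yet.
    // Below smallObjectThreshold triangles, testing costs as much as drawing.
    bool shouldDraw(std::optional<bool> queryResult,
                    int smallObjectThreshold = 200)
    {
        if (triangleCount < smallObjectThreshold)
            return true;                      // cheaper to just draw it
        if (queryResult.has_value())
            lastKnownVisible = *queryResult;  // fresh result from last frame
        return lastKnownVisible;              // otherwise reuse last valid one
    }
};
```

The one-frame latency is the whole trick: the result is stale, but since visibility rarely flips between consecutive frames, the worst case is drawing an object one frame longer than strictly necessary.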

The end result here is an implementation of hardware occlusion queries that does not stall the pipeline, is just as fast as if you weren't using them for scenes where you don't need them, and will kick in with the necessary perf boost for scenes where you do.

Don't believe everything you read. ;)

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Quote:Original post by mhagain
Objects smaller than a certain number of triangles (200 or so works well for me) don't get queries run as it will be just as cheap to draw them anyway as it would be to test.


But generally it's not the number of triangles being drawn that takes the time; it's the setup cost of the draw call itself.

This topic is closed to new replies.
