200 passes, 1 stream?
I read in an NVIDIA paper that when you render many instances of a mesh you can send the transformation matrices as a stream, to minimize overhead. I have a mesh that I want to render 200 times, but with 200 passes, one pass per instance. Could I still use auxiliary streams, and how would I do it (I work in DirectX)? A related question: can I tell the GPU at runtime how many passes it should do on one technique? One frame I want to render 190 instances and the next 210; ideally the pass count would be determined by the length of the auxiliary stream. I know I can do it with 200 application calls, but that's a lot of overhead.
To do stream instancing you need two vertex buffers, and you set the streams accordingly. The first buffer holds the normal geometry, and the second buffer contains all the transformations.
There are some demos out there that show how to do it.
Of course I don't know what exactly you want to do, but generally, if you are doing 200 passes, then chances are good that you're doing something wrong.
If this is, for example, about rendering 200 lights, then you are a lot better off using a different method (for example deferred shading). Alternatively, if you don't do shadow computations, you may consider collapsing all those lights into one shader.
Also, doing 200 lights is usually not sensible, as most people can't even distinguish 4 lights from 5 lights.
Lastly, I am not sure if instancing really does what you want. What you normally use instancing for is to render large numbers of generally similar objects that have the same state, but for example different translations/orientations/scales and modified texture coordinates (or whatever else).
Multiple passes normally involve changing textures, shaders, and shader constants, i.e. rendering identical objects with identical translations/orientations/etc. but different state.
Telling the GPU how many "passes" to run is a matter of changing one number. That's the easiest thing about it.
Quote:Original post by Damon Shamkite
Of course I don't know what exactly you want to do, but generally, if you are doing 200 passes, then chances are good that you're doing something wrong.
If this is, for example, about rendering 200 lights, then you are a lot better off using a different method (for example deferred shading). Alternatively, if you don't do shadow computations, you may consider collapsing all those lights into one shader.
Also, doing 200 lights is usually not sensible, as most people can't even distinguish 4 lights from 5 lights.
Lastly, I am not sure if instancing really does what you want. What you normally use instancing for is to render large numbers of generally similar objects that have the same state, but for example different translations/orientations/scales and modified texture coordinates (or whatever else).
Multiple passes normally involve changing textures, shaders, and shader constants, i.e. rendering identical objects with identical translations/orientations/etc. but different state.
I am using deferred shading, every pass is for one light volume. I'm working with a GI method so I need many lights.
Quote:Original post by Damon Shamkite
Telling the GPU how many "passes" to run is a matter of changing one number. That's the easiest thing about it.
I had the impression that you had to specify explicitly in advance every pass that goes into a technique. But maybe I was wrong? Where do you set this number? I'm using DirectX/HLSL/fx-file.
For deferred shading, you do not really do 200 "passes". Well, at least I'd not call it that. It's one pass for the geometry (with n objects) and one pass for the light volumes (with 200 light objects/instances).
Though it is unclear whether you would really gain from it, you could use instancing for this.
For example, if you render a z-culled light sphere for each light, you could put a static mesh into a buffer, and render it 200 times. You would also for example put 200 translation vectors (to address the simplest case only, no rotation/scale) into a second buffer and call SetStreamSourceFreq accordingly, so DX reads one complete mesh and one translation vector from the two buffers, respectively. Then you need to translate your static light mesh in the vertex shader using the uniform value that is read from the stream.
I don't use fx, so I wouldn't know if/how that can be done in there (doubt it). The way it works using the API is described in http://download.nvidia.com/developer/presentations/2005/GDC/Direct3D_Day/D3DTutorial05_Instancing_FPSpecials.pdf
However, I don't think you will really get a considerable performance improvement. When rendering the light geometry, you are normally doing considerable pixel shader work, so the cost of a humble 200 API calls will probably be hidden by that latency.
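The vertex-shader side of the scheme described above might look like this in HLSL (a sketch only: the `TEXCOORD1` semantic for the per-instance translation and the `ViewProj` constant are assumptions, and must match whatever your vertex declaration actually maps stream 1 to):

```hlsl
// Per-vertex data comes from stream 0, per-instance data from stream 1;
// D3D9 merges both into one input struct via the vertex declaration.
struct VS_INPUT
{
    float3 pos     : POSITION;   // static light-sphere mesh vertex
    float3 instPos : TEXCOORD1;  // per-instance translation (stream 1)
};

float4x4 ViewProj;               // assumed shader constant

float4 main(VS_INPUT input) : POSITION
{
    // Translate the shared mesh by this instance's light position
    // (translation only -- no per-instance rotation/scale, as in the
    // simplest case described above).
    float3 world = input.pos + input.instPos;
    return mul(float4(world, 1.0f), ViewProj);
}
```

Because the translation arrives as an ordinary shader input rather than a constant, no `SetVertexShaderConstantF` call is needed per light.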
Quote:Original post by Damon Shamkite
For deferred shading, you do not really do 200 "passes". Well, at least I'd not call it that. It's one pass for the geometry (with n objects) and one pass for the light volumes (with 200 light objects/instances).
If I have 200 light volumes then it has to go 200 times through VS/PS unless I concatenate some lights, right?
Quote:Original post by Damon Shamkite
Though it is unclear whether you would really gain from it, you could use instancing for this.
For example, if you render a z-culled light sphere for each light, you could put a static mesh into a buffer, and render it 200 times. You would also for example put 200 translation vectors (to address the simplest case only, no rotation/scale) into a second buffer and call SetStreamSourceFreq accordingly, so DX reads one complete mesh and one translation vector from the two buffers, respectively. Then you need to translate your static light mesh in the vertex shader using the uniform value that is read from the stream.
The thing is - for every pass I do wouldn't the auxiliary stream go back to the beginning again? Could I increment it even between draw calls?
Quote:Original post by Damon Shamkite
However, I don't think you will really get a considerable performance improvement. When rendering the light geometry, you are normally doing considerable pixel shader work, so the cost of a humble 200 API calls will probably be hidden by that latency.
Maybe you're right about this; the only reason I was considering it was that I read about it in several places. So sending many calls from the CPU to the GPU doesn't slow things down when the shader is doing considerable amounts of work? Is that because the calls are queued up, so the latency doesn't matter?