HLSL - storing geometry on the video card

This topic is 2847 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

DirectX makes it easy to init global variables in your fx file...
g_pEffect->SetMatrix( "worldViewProj", &worldViewProjection );
g_pEffect->SetTexture( "testTexture", g_pTexture );

But what if you want to store the verts, UVs, and vertex normals? Is there a way to dynamically allocate this on the video card? If not, that would suggest the fx file would have to be hard-coded for that one mesh. Even so, how would you transfer that data to the video card? And at that level, I assume all you would need is a vertex shader that passes the data along. Would you have to do the transformations yourself, or is that handled somewhere else down the line?

As I tried to explain in the other thread, you upload geometry to GPU memory by creating default pool (or managed pool) vertex and index buffers. Then you set the active geometry source for the device by calling SetStreamSource (and SetIndices if you use indices). When you call the draw functions, the pipeline loads the geometry (from GPU memory into the GPU's internal registers) and passes it through the vertex shader one vertex at a time. The buffers themselves stay in GPU memory unless they are evicted by a memory manager (which you cannot forcibly prevent).
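
To make the data flow concrete, here is a rough CPU-side model of an indexed draw. It is only a sketch: Float4, Matrix4, vertexShader, drawIndexed and makeTranslation are hypothetical names invented for illustration, not Direct3D types or calls, and the real work happens on the GPU after SetStreamSource, SetIndices and a draw call. The point it tries to show is that the transformation is done by the vertex shader itself, once per fetched vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-ins; none of these are Direct3D types.
struct Float4 { float x, y, z, w; };
struct Matrix4 { float m[4][4]; };  // row-major, row-vector convention (v * M)

// Plays the role of a trivial vertex shader: output = position * worldViewProj.
Float4 vertexShader(const Float4& p, const Matrix4& wvp) {
    return Float4{
        p.x * wvp.m[0][0] + p.y * wvp.m[1][0] + p.z * wvp.m[2][0] + p.w * wvp.m[3][0],
        p.x * wvp.m[0][1] + p.y * wvp.m[1][1] + p.z * wvp.m[2][1] + p.w * wvp.m[3][1],
        p.x * wvp.m[0][2] + p.y * wvp.m[1][2] + p.z * wvp.m[2][2] + p.w * wvp.m[3][2],
        p.x * wvp.m[0][3] + p.y * wvp.m[1][3] + p.z * wvp.m[2][3] + p.w * wvp.m[3][3]};
}

// Models an indexed draw: each index selects one vertex from the bound
// vertex buffer, and that vertex goes through the shader on its own.
std::vector<Float4> drawIndexed(const std::vector<Float4>& vertexBuffer,
                                const std::vector<std::uint16_t>& indexBuffer,
                                const Matrix4& worldViewProj) {
    std::vector<Float4> transformed;
    transformed.reserve(indexBuffer.size());
    for (std::uint16_t index : indexBuffer) {
        assert(index < vertexBuffer.size());  // an index must name an existing vertex
        transformed.push_back(vertexShader(vertexBuffer[index], worldViewProj));
    }
    return transformed;
}

// Translation matrix for the row-vector convention (translation in the last row).
Matrix4 makeTranslation(float tx, float ty, float tz) {
    Matrix4 t = {{{1.0f, 0.0f, 0.0f, 0.0f},
                  {0.0f, 1.0f, 0.0f, 0.0f},
                  {0.0f, 0.0f, 1.0f, 0.0f},
                  {tx, ty, tz, 1.0f}}};
    return t;
}
```

Calling drawIndexed with a three-vertex buffer, an index list and makeTranslation(10, 20, 30) returns one transformed position per index. The mesh data never has to be written into the fx file, because the same shader runs for whatever buffers are bound.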

It is not practical to store large amounts of geometry in shader constants (which store effect variables) on pre-D3D10 cards, because the register space for constants is very limited. It is still somewhat limited on newer cards too, which makes the standard input assembler (see note 1) the preferred way to feed mesh-type data.
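
A quick back-of-the-envelope calculation shows how small the constant space is. The sketch below assumes 256 float4 vertex shader constant registers, which to my knowledge is the guaranteed minimum for vs_2_0 and vs_3_0; actual hardware may expose more. The function name and the vertex layout are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 256 float4 constant registers (believed to be the guaranteed
// minimum for vs_2_0 / vs_3_0). Check the device caps for the real value.
const std::size_t kConstantRegisters = 256;
const std::size_t kFloatsPerRegister = 4;

// Number of whole vertices that fit in the constant registers after
// `reservedRegisters` have been set aside for matrices and other parameters.
std::size_t verticesThatFit(std::size_t floatsPerVertex, std::size_t reservedRegisters) {
    assert(floatsPerVertex > 0);
    assert(reservedRegisters <= kConstantRegisters);
    // Each vertex is rounded up to whole float4 registers.
    const std::size_t registersPerVertex =
        (floatsPerVertex + kFloatsPerRegister - 1) / kFloatsPerRegister;
    return (kConstantRegisters - reservedRegisters) / registersPerVertex;
}
```

With position (3 floats), normal (3 floats) and uv (2 floats) packed into 2 registers per vertex, and 4 registers reserved for a worldViewProj matrix, verticesThatFit(8, 4) gives 126 vertices. That is far short of a typical mesh.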

With shader model 5.0 (d3d11 hardware), it is possible to treat vertex buffers as general data buffers in the shaders. However, you will lose the benefits of automatic post-transform cache, thus losing at least 50% of vertex throughput performance as compared to just doing it the classical way. And constant registers (constant buffers in d3d10+) are still limited in space.

1: Even though the public D3D9 pipeline diagrams do not usually include the input assembler, legacy systems still have one. The D3D10+ documentation explains what it is; I recommend that you read about it in the SDK.

Quote:
Original post by Nik02
...you upload geometry to the GPU memory by creating default pool (or managed pool) vertex and index buffers. Then you set the active geometry source for the device by calling SetStreamSource...
Now I understand what you meant. Yeah, I'm already doing that. Now that I've looked closer at the SetStreamSource function, I see you can specify a stream number, which suggests you can load different streams onto the video card. I just don't see a way to specify which stream to use when it's time to draw.

The input assembler (which constructs primitives from the geometry streams) is partially controlled by the vertex declaration. When you create the declaration object, you can specify which vertex elements come from which stream. In this context, the "streams" are just vertex buffer contents which are read in stream style (strictly sequentially) by the IA.

See the topic "Programming One or More Streams" in the SDK docs for more info.
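
As a rough illustration of what the declaration does, the sketch below models it on the CPU. Element loosely mirrors the stream and offset fields of D3DVERTEXELEMENT9 but is a simplified, hypothetical structure; Stream, assembleVertex and makeStream are likewise invented for this example and are not part of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified declaration entry (not the real D3DVERTEXELEMENT9).
struct Element {
    std::size_t stream;      // which bound stream the element is read from
    std::size_t offset;      // byte offset inside one vertex of that stream
    std::size_t floatCount;  // element size in floats (3 for a float3)
};

// Hypothetical model of a bound stream: raw bytes plus the per-vertex stride.
struct Stream {
    std::vector<unsigned char> bytes;
    std::size_t stride;
};

// Gathers every declared element of vertex `vertexIndex` into one flat list
// of floats, in declaration order, i.e. the inputs one shader invocation sees.
std::vector<float> assembleVertex(const std::vector<Element>& declaration,
                                  const std::vector<Stream>& streams,
                                  std::size_t vertexIndex) {
    std::vector<float> inputs;
    for (const Element& e : declaration) {
        assert(e.stream < streams.size());
        assert(e.floatCount > 0);
        const Stream& s = streams[e.stream];
        const std::size_t begin = vertexIndex * s.stride + e.offset;
        const std::size_t size = e.floatCount * sizeof(float);
        assert(begin + size <= s.bytes.size());  // element must lie inside the stream
        const std::size_t oldCount = inputs.size();
        inputs.resize(oldCount + e.floatCount);
        std::memcpy(inputs.data() + oldCount, s.bytes.data() + begin, size);
    }
    return inputs;
}

// Helper that packs tightly laid out floats into a stream's byte storage.
Stream makeStream(const std::vector<float>& values, std::size_t floatsPerVertex) {
    Stream s;
    s.stride = floatsPerVertex * sizeof(float);
    s.bytes.resize(values.size() * sizeof(float));
    if (!values.empty()) {
        std::memcpy(s.bytes.data(), values.data(), s.bytes.size());
    }
    return s;
}
```

With positions in stream 0 and normals in stream 1, a declaration of {0, 0, 3} followed by {1, 0, 3} makes assembleVertex return six floats per vertex: the position followed by the normal. So you never pick a stream at draw time; the declaration has already said where each element comes from.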

Note that when you use multiple streams with different read frequencies, you are effectively using hardware instancing, which is a very useful technique for drawing many slightly differing copies of a mesh. The frequency for a stream can be set with SetStreamSourceFreq.
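
The different read frequencies can be pictured with a small CPU-side model. This is a sketch under assumptions: Float3 and drawInstanced are hypothetical names, and the "shader" here just adds a per-instance offset. On the device, the equivalent setup would mark one stream as indexed geometry data and the other as per-instance data through SetStreamSourceFreq.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side type, not a Direct3D structure.
struct Float3 { float x, y, z; };

// Models a draw with two streams read at different frequencies:
//  - the geometry stream is replayed in full for every instance,
//  - the instance stream advances by one element per instance.
std::vector<Float3> drawInstanced(const std::vector<Float3>& meshPositions,
                                  const std::vector<Float3>& instanceOffsets) {
    std::vector<Float3> out;
    out.reserve(meshPositions.size() * instanceOffsets.size());
    for (const Float3& offset : instanceOffsets) {   // one step per instance
        for (const Float3& p : meshPositions) {      // whole mesh per instance
            // Stand-in for the vertex shader: combine both stream inputs.
            out.push_back(Float3{p.x + offset.x, p.y + offset.y, p.z + offset.z});
        }
    }
    return out;
}
```

A 3-vertex mesh with 2 instance offsets yields 6 output vertices, while the mesh itself is stored only once.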

Out of curiosity, how efficient is this interleaving of data from multiple streams?

I know from experience it's quite fast, but let's say it would be useful for me to split vertex Position and Normal data into 2 separate buffers. Can I just stick these on 2 streams without a second thought, or could the interleaving (same frequency) be a serious performance hit compared to 1 continuous buffer? Does modern hardware handle this better than older (SM2) hardware?

There is still a small overhead due to the interleaving operation. A single stream is always faster to read because the hardware doesn't need to apply any combining logic and can just put the data in the shader source registers. Think about a scenario where you asynchronously read multiple files versus reading one file. Physical memory is essentially the same, although you don't have the seek latency - the memory controller can still access only one address at a time per bank.

Combining two streams with the same frequency will cause a performance hit at render time, but in addition to the convenience factor with certain techniques, it may actually end up having the same effective performance as using a single stream because you can potentially save some driver calls (though your uploading logic must be smart).

That said, if you have (complex geometry)*(many instances), the benefits greatly overshadow the interleaving overhead.

Thanks for your input Nik. Splitting the data up in multiple buffers would allow for both more efficient calculation and more selective uploads of fresh data in my scenario. I'm pretty confident now that the interleaving won't kill any performance gains [smile] This could be a clear win, but I guess I'll have to run some tests to make sure.
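
For a rough idea of the upload side of that trade-off, the arithmetic can be sketched as below. The Layout structure and both functions are hypothetical, and the model is deliberately simplistic: it assumes the interleaved buffer is refilled in full whenever any element changes, and it ignores the read-side interleaving cost discussed above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a vertex: how many bytes stay fixed and how
// many are rewritten every frame.
struct Layout {
    std::size_t staticBytesPerVertex;   // e.g. a float3 normal = 12 bytes
    std::size_t dynamicBytesPerVertex;  // e.g. a float3 position = 12 bytes
};

// Single interleaved buffer: updating the dynamic elements means rewriting
// whole vertices (assumption: the buffer is refilled in full).
std::size_t bytesPerFrameInterleaved(const Layout& l, std::size_t vertexCount) {
    return vertexCount * (l.staticBytesPerVertex + l.dynamicBytesPerVertex);
}

// Two streams: only the stream holding the dynamic elements is re-uploaded.
std::size_t bytesPerFrameSplit(const Layout& l, std::size_t vertexCount) {
    return vertexCount * l.dynamicBytesPerVertex;
}
```

For 10,000 vertices with a 12-byte static normal and a 12-byte dynamic position, the interleaved layout moves 240,000 bytes per frame and the split layout 120,000. Whether that saving outweighs the fetch overhead is exactly what profiling has to answer.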

The theoretically optimal scenario for same-frequency interleaving would be if 50% of the elements of your vertices were static and the other 50% dynamic. This way, you'll balance the bandwidth savings and utilize the memory controllers most efficiently (GPU and AGP memory working in unison).

Note that this is not a concrete recommendation, as the performance depends entirely on what else is going on in the system, as well as the system itself. Therefore, it is always wise to profile your particular app (on many systems) for bottlenecks.
