SetStreamSource() [Updated Benchmark]

Started by
8 comments, last by c t o a n 20 years, 5 months ago
When using SetStreamSource(), you must specify an offset into the vertex buffer and a vertex stride. Now, if you call SetStreamSource() with the same vertex buffer twice, but with different strides, does this incur the same performance cost as calling SetStreamSource() twice with two different vertex buffers? I merely need to change the vertex stride, not the vertex buffer. If there is another way of doing this (besides SetStreamSource()), or a way to have DirectX use its calculated stride (from the FVF or shader declaration) instead (which would work as well), please feel more than free to tell me!

Chris Pergrossi
My Realm | "Good Morning, Dave"

[edited by - c t o a n on November 8, 2003 1:49:01 AM]
well the docs say that the stride should match the size of the FVF. Now if both your FVFs are the same size, I would assume you don't need to change the stride at all. But why do you have 2 different vertex types in the same vertex buffer? isn't that against the "rules"?

dunno how SetStreamSource acts internally in the case of only a stride being changed, but I'd guess that since the actual vertex buffer is the same d3d would filter the call. That is unless d3d compares strides *as well* as buffers

anyway, so...why 2 strides in same vb? are they really different sizes or just different formats?

:::: [ Triple Buffer ] ::::
aliak.net
Well, DirectX does nothing to prevent you from putting more than one vertex type in a buffer (in fact, I'd say it implicitly encourages you to do it), and these are potentially different format AND different size vertices. This is meant to reduce the amount of processor blocking time required to perform the vertex buffer switch, simply because you can store all the data in one buffer. I made the buffer object versatile enough to wrap one vertex type just as easily as 20, so it's up to the programmer how to organize the data (read: flexibility).
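For anyone following along, here is a rough sketch of how two vertex types packed into one buffer map onto the offset and stride arguments of SetStreamSource(). The struct layouts are assumptions (the thread never shows Type1 or Type2); they are guessed from the FVF flags used in the benchmark further down, and the count of 102 matches its 34-triangle draw. StreamRange and SecondTypeRange are hypothetical helper names, not part of Direct3D.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMPTION: hypothetical layouts, guessed from the FVF flags in the thread.
// Type1 ~ D3DFVF_XYZ | D3DFVF_DIFFUSE
struct Type1 {
    float x, y, z;        // position
    std::uint32_t color;  // diffuse color
};

// Type2 ~ D3DFVF_XYZ | D3DFVF_DIFFUSE | D3DFVF_TEX2
struct Type2 {
    float x, y, z;        // position
    std::uint32_t color;  // diffuse color
    float u0, v0;         // first texture coordinate set
    float u1, v1;         // second texture coordinate set
};

static_assert(sizeof(Type1) == 16, "Type1 is expected to be tightly packed");
static_assert(sizeof(Type2) == 32, "Type2 is expected to be tightly packed");

// Hypothetical helper type: the two values that would be passed as the
// OffsetInBytes and Stride arguments of SetStreamSource().
struct StreamRange {
    std::size_t offsetInBytes;
    std::size_t stride;
};

// Range of the Type2 vertices when they are stored directly after
// `firstCount` Type1 vertices in the same buffer.
constexpr StreamRange SecondTypeRange(std::size_t firstCount) {
    return StreamRange{firstCount * sizeof(Type1), sizeof(Type2)};
}

// 34 triangles in a triangle list use 102 vertices, so the second
// region starts 102 * 16 = 1632 bytes into the buffer.
static_assert(SecondTypeRange(102).offsetInBytes == 1632, "offset check");
static_assert(SecondTypeRange(102).stride == 32, "stride check");
```

Under these assumed layouts the first region would be bound with offset 0 and stride 16, and the second with offset 1632 and stride 32, all against the same buffer pointer.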

I guess I'll write a simple app that benchmarks this ability, I suppose by switching vertex buffers between two different buffers, say, 500 times, then changing the vertex stride of the buffer but not the buffer itself 500 times. Perhaps I'll call DrawPrimitive() as well, so we know the buffer is getting referenced. Hmm...Thanks for your time. I'll post the results of this benchmark when it's finished.

Chris Pergrossi
My Realm | "Good Morning, Dave"
cool idea,
please let us know what you come up with
Ok, I've finished the benchmarking program, and I have the results. Here are the two tests I performed, with one switching between two buffers (rendering primitives after each switch, to make sure the buffers are used), and the other changing vertex stride and vertex offset (in SetStreamSource()) but using the same vertex buffer (so there are two vertex types in the same buffer in the second test). Here's the source code for the two tests:

// Test 1: alternate between two separate vertex buffers.
double RunSwitchTest( u32 NumTimes, IDirect3DDevice9* Device, IDirect3DVertexBuffer9** Buffers )
{
    s64 Start, Stop;

    Start = GetTime();
    for( u32 k = 0; k < NumTimes; k ++ )
    {
        Device->SetFVF( D3DFVF_XYZ | D3DFVF_DIFFUSE );
        Device->SetStreamSource( 0, Buffers[ 0 ], 0, sizeof( Type1 ) );
        Device->DrawPrimitive( D3DPT_TRIANGLELIST, 0, 34 );

        Device->SetFVF( D3DFVF_XYZ | D3DFVF_DIFFUSE | D3DFVF_TEX2 );
        Device->SetStreamSource( 0, Buffers[ 1 ], 0, sizeof( Type2 ) );
        Device->DrawPrimitive( D3DPT_TRIANGLELIST, 0, 40 );
    }
    Stop = GetTime();

    cout << setw( 6 ) << ConvertToMSec( Stop - Start ) << " ";
    return ConvertToMSec( Stop - Start );
}

// Test 2: keep one vertex buffer bound, change only byte offset and stride.
double RunStrideTest( u32 NumTimes, IDirect3DDevice9* Device, IDirect3DVertexBuffer9** Buffers )
{
    s64 Start, Stop;    // same timer type as RunSwitchTest

    Start = GetTime();
    for( u32 k = 0; k < NumTimes; k ++ )
    {
        Device->SetFVF( D3DFVF_XYZ | D3DFVF_DIFFUSE );
        Device->SetStreamSource( 0, Buffers[ 2 ], 0, sizeof( Type1 ) );
        Device->DrawPrimitive( D3DPT_TRIANGLELIST, 0, 34 );

        // The Type2 vertices start after the 102 Type1 vertices (34 triangles * 3).
        Device->SetFVF( D3DFVF_XYZ | D3DFVF_DIFFUSE | D3DFVF_TEX2 );
        Device->SetStreamSource( 0, Buffers[ 2 ], sizeof( Type1 ) * 102, sizeof( Type2 ) );
        Device->DrawPrimitive( D3DPT_TRIANGLELIST, 0, 40 );
    }
    Stop = GetTime();

    cout << setw( 6 ) << ConvertToMSec( Stop - Start ) << endl;
    return ConvertToMSec( Stop - Start );
}
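The tests call GetTime() and ConvertToMSec() without showing them. For anyone wanting to reproduce the timing loop, here is one way such helpers might look. This is only a sketch built on std::chrono, not the code that produced the numbers in this thread; the nanosecond tick unit and the TimeLoop wrapper are assumptions made for illustration.

```cpp
#include <chrono>
#include <cstdint>

using s64 = std::int64_t;
using u32 = std::uint32_t;

// ASSUMPTION: hypothetical stand-in for the thread's GetTime().
// Returns a monotonic timestamp in nanoseconds.
inline s64 GetTime() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// ASSUMPTION: hypothetical stand-in for ConvertToMSec().
// Converts a nanosecond tick difference to milliseconds.
inline double ConvertToMSec(s64 ticks) {
    return static_cast<double>(ticks) / 1.0e6;
}

// Runs `body` numTimes times and returns the elapsed time in milliseconds.
// `body` would hold the SetFVF / SetStreamSource / DrawPrimitive calls.
template <typename Body>
double TimeLoop(u32 numTimes, Body body) {
    const s64 start = GetTime();
    for (u32 k = 0; k < numTimes; ++k) {
        body();
    }
    const s64 stop = GetTime();
    return ConvertToMSec(stop - start);
}
```

Keeping the timestamps in a signed 64-bit type avoids truncating the counter, which is why both tests above use the same timer type.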

I tried to be as consistent as possible, so the same number of primitives is drawn in the first test as in the second, and only the SetStreamSource() call differs between the two. With NumTimes = 10000, I get the following results (averaged over many, many iterations):

Test 1 (Switch Test): avg 521.406311 ms
Test 2 (Stride Test): avg 393.875275 ms

So there IS a clear-cut difference between the two. I'm now going to try the test without the DrawPrimitive() code:

Test 1 (Switch Test): avg 2.815491 ms
Test 2 (Stride Test): avg 2.790567 ms

With the same NumTimes value. So the cost of the SetStreamSource() call itself barely changes when the vertex buffer changes, but the DrawPrimitive() call that follows DOES depend on whether you're using the same buffer or not. I'm guessing this might have to do with cache coherency, or possibly DirectX loaded the entire vertex buffer in the background while drawing the first call in the second test, which means it was all ready to go for the second call... This might need some more testing, but for the moment, it's definitely faster to use the same vertex buffer and call SetStreamSource() with the appropriate byte offset and vertex stride than to use two different buffers and call SetStreamSource() with the same byte offset and a different stride.
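To put the averages above in per-iteration terms, the arithmetic can be sketched like this. The constants are the numbers reported above; the helper name PerIterationGapUs is made up for illustration. Note that one loop iteration contains two SetFVF(), two SetStreamSource() and (in the first run) two DrawPrimitive() calls.

```cpp
#include <cmath>

// Averages reported in this thread, in milliseconds, for NumTimes = 10000.
constexpr double kIterations       = 10000.0;
constexpr double kSwitchWithDrawMs = 521.406311;
constexpr double kStrideWithDrawMs = 393.875275;
constexpr double kSwitchNoDrawMs   = 2.815491;
constexpr double kStrideNoDrawMs   = 2.790567;

// Hypothetical helper: gap between two total times, expressed in
// microseconds per loop iteration (1 ms = 1000 us).
constexpr double PerIterationGapUs(double slowerMs, double fasterMs) {
    return (slowerMs - fasterMs) * 1000.0 / kIterations;
}

// Gap per iteration with the draw calls included.
constexpr double kGapWithDrawUs =
    PerIterationGapUs(kSwitchWithDrawMs, kStrideWithDrawMs);

// Gap per iteration with the draw calls removed.
constexpr double kGapNoDrawUs =
    PerIterationGapUs(kSwitchNoDrawMs, kStrideNoDrawMs);
```

With the draw calls, the gap works out to about 12.75 microseconds per iteration (roughly 24% of the switch test's total). Without them it is about 0.0025 microseconds, i.e. around 2.5 nanoseconds per iteration, which is likely within measurement noise.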

Thanks for your help guys.

Chris Pergrossi
My Realm | "Good Morning, Dave"
If your two vertex types are roughly equal in size, it might be worthwhile to pad out the smaller one so they are exactly equal in size.

quote:Thanks for your help guys.


no.

.... thank *you*

:::: [ Triple Buffer ] ::::
aliak.net
oh this is really cool,
there is probably some maximum size beyond which there is nothing more to gain, but this seems really interesting. wonder how changing textures in between affects it, if at all.

another thought: if you have X objects in one stream, and the culling procedure tells me not to render the first X-1 objects, this will probably degrade performance, right? so, if you want to do this maybe try to put static things in there that are close together, or something.

well just some thoughts, anyone know the answer to that?
Well, that's why I did the second set of tests, to show that SetStreamSource() with parameters 2 and 3 other than zero is only slightly (read: minimally) slower than SetStreamSource() with unique parameters. This means that if you have lots of vertex types in one buffer, but only need to render one vertex type (and lose out on the vertex caching benefits), it will only be about 2/1000 / 10000 = 2e-7 of a second slower than in a single vertex type buffer (because the Draw[Indexed]Primitive() call doesn't change between the two cases). Sorta like a win-win, isn't it?

EDIT: missing some zeros...

Chris Pergrossi
My Realm | "Good Morning, Dave"

[edited by - c t o a n on November 10, 2003 2:53:56 AM]
yes it sure is,
i missed that... ^_^

This topic is closed to new replies.
