Draigan

Weird Dynamic VertexBuffer Problem

Draigan    130
I've been profiling my game and found some interesting and troubling results. The slow point seems to be adding vertices to my dynamic vertex buffer. I lock with NOOVERWRITE or DISCARD depending on how much space is left in the vertex buffer, and I create it with the WRITEONLY flag. I've commented out all the code around my drawing routine. If I lock the vertex buffer and then unlock it without writing anything to it, my game runs at 1000fps. But if I do: VB->Lock(..., (void**)&pVertex, ...); pVertex->x = 1.0f; VB->Unlock(); then my frame rate drops WAY down, and the more of the locked memory I modify, the slower it gets. I've tested it on both a GeForce2 MX and a GeForce4 Ti4200 with the same result. Anyone have any ideas? When I lock, I lock enough room for 1000 vertices, and I'm not making any calls to DrawIndexedPrimitive() or anything. I'm just locking the VB and filling it with data, and that's what's killing the performance.

Guest Anonymous Poster
Well... probably your drivers check to see if there is an alteration and, from there on, once it knows it has data it needs to compress the data...

Another bottleneck you might be facing is that you reserved a lot of space. Try reserving only the amount of space you actually need for each call...

If possible, don't refill your VB every frame (though you can if you have no other solution); instead, update an index buffer every frame. If you're making a terrain engine: all that ROAM and CLOD stuff has nice results, but it's come to my attention that working with blocks of terrain (of various sizes) works better than working with individual verts. Not only can you fit more verts, you can also do it many times faster, and the only problem that arises is how to handle the transitions between blocks. There's a presentation about this on the net, but I don't remember where. All I know is that the author was a former Bullfrog employee who worked on the earlier Populous games and Magic Carpet (he even presents some techniques they used in those games). I think it was either a ppt or a pdf. If you need more keywords for Google, I'm not sure (COMPLETELY NOT SURE), but the company name was something like clockworks or clock-something (you should only try these words as a last resort)...

Draigan    130
Okay, I've done some more testing, and here's where I am:

Case 0:
DynamicVB->Lock(space for 1089 vertices, &pVertex)
DynamicVB->Unlock()
Frame rate: 2000fps

Case 1:
DynamicVB->Lock(space for 1089 vertices, &pVertex)
for(i = 0; i < 1089; i++) {
pVertex[0].x = 1.0f;
}
DynamicVB->Unlock()
Frame Rate: 2000fps

Case 2:
i = 0
DynamicVB->Lock(space for 1089 vertices, &pVertex)
for(i = 0; i < 1089; i++) {
pVertex[i].x = 1.0f;
}
DynamicVB->Unlock()
Frame Rate: 100fps


Anyone have any ideas what's going on here? I'm correctly locking the VB using NOOVERWRITE or DISCARD depending on if there is room or not. It seems that filling the memory that Lock returns is killing my performance.
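
[Editor's sketch] The "NOOVERWRITE or DISCARD depending on if there is room" policy described above is usually implemented as a write cursor that walks through the buffer and wraps when the next batch no longer fits. The following is only a minimal illustration of that bookkeeping, not the poster's code. `LockMode`, `LockRequest` and `DynamicVbCursor` are hypothetical names, and the enum merely stands in for the D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD flags, so the sketch compiles without the DirectX SDK.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD (illustrative names only).
enum class LockMode { NoOverwrite, Discard };

// Describes the range to lock and the flag to lock it with.
struct LockRequest {
    std::size_t firstVertex;  // where the locked range starts, in vertices
    std::size_t vertexCount;  // how many vertices the range covers
    LockMode mode;            // which flag the caller should pass to Lock()
};

// Hypothetical helper that tracks the next free vertex in a dynamic buffer.
class DynamicVbCursor {
public:
    explicit DynamicVbCursor(std::size_t capacityInVertices)
        : capacity_(capacityInVertices), next_(0) {}

    // Reserve room for 'count' vertices. While the batch fits behind the
    // cursor, append with NoOverwrite; otherwise wrap to the start and
    // ask for Discard so a fresh buffer can be handed out.
    LockRequest reserve(std::size_t count) {
        assert(count <= capacity_);
        LockMode mode = LockMode::NoOverwrite;
        if (count > capacity_ - next_) {
            next_ = 0;
            mode = LockMode::Discard;
        }
        const LockRequest request{next_, count, mode};
        next_ += count;
        return request;
    }

private:
    std::size_t capacity_;  // total vertices the buffer can hold
    std::size_t next_;      // first vertex not yet written since the last wrap
};
```

With a hypothetical 3000-vertex buffer and 1089-vertex leaves, the first two reservations append at vertices 0 and 1089, and the third wraps back to vertex 0 with the discard mode. The point of the pattern is that only the range about to be written is locked each time.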

Draigan    130
It's probably a little too much to add here. Not THAT much, but more than I'm willing to sift through and post. If anyone is willing to give me a hand, I can send you a zip of all the source. It's just a simple little terrain engine using DirectX8, and the problem seems to be with copying data to a dynamic vertex buffer.

So if anyone out there has a terrain engine going with a dynamic vertex buffer, or anything that uses a dynamic vertex buffer, I'd love to take a look, and I'd love it even more if you'd request the zip and give me a hand. It's really getting to me that I can't figure out what the problem is.


Here's my basic setup...

1. I create a dynamic vertex buffer
2. For each visible leaf in my quad-tree, I create those vertices, lock the VB, and upload to the vertex buffer, then unlock VB.
3. Then I render those polygons
4. Repeat for all visible terrain leaves.

You can see from my previous post that filling the vertex buffer with data seems to be what's killing the performance. Each leaf contains 1089 vertices (33x33) and there are a maximum of 64 leaves visible each frame. That's 1089x64 vertices, and each vertex is 40 bytes. That works out to just under 3 megs of data per frame, which should be nothing for a 4x AGP Ti4200 vid-card.

HELP! (hahahah, that's my girly-man scream for assistance)
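
[Editor's sketch] The per-frame figure quoted above can be checked with a few lines of arithmetic. This is only a worked calculation using the numbers given in the thread (33x33 vertices per leaf, 64 leaves, 40 bytes per vertex); the function names are made up for illustration, and the 1000 megs/s bus figure used in the note below is the poster's own estimate, not a measured value.

```cpp
#include <cassert>
#include <cstddef>

// Bytes uploaded per frame for square terrain leaves.
std::size_t bytesPerFrame(std::size_t vertsPerSide,
                          std::size_t visibleLeaves,
                          std::size_t bytesPerVertex) {
    return vertsPerSide * vertsPerSide * visibleLeaves * bytesPerVertex;
}

// Upper bound on frame rate if the upload were the only cost and the bus
// sustained 'bytesPerSecond' (an idealised assumption).
double maxFramesPerSecond(double bytesPerSecond, std::size_t frameBytes) {
    assert(frameBytes > 0);
    return bytesPerSecond / static_cast<double>(frameBytes);
}
```

bytesPerFrame(33, 64, 40) gives 2,787,840 bytes, so the "3 megs" estimate holds. Pushing that much data at 100 fps would need roughly 279 million bytes per second, and at an idealised 1000 megs/s the upload alone would cap the frame rate at about 358 fps.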

Draigan    130
Also, I get this message in the output window of Visual C++. I don't think it's serious...

Direct3D8: (INFO) :Win2K SP1 or above detected - enabling VB swap workaround

I think that Win2K without SP1 had problems with DirectX that were fixed with SP1+ and this message is just informing me of that.

It sounds like it's telling you that because you have Win2000 Service Pack 1 or above, it can't swap vertex buffers the regular way and needs to do a workaround. I'm not sure what it means by swapping, but I would definitely test your program on a non-Win2K system.

~CGameProgrammer( );

Fidelio_    122
quote:
Case 2:
i = 0
DynamicVB->Lock(space for 1089 vertices, &pVertex)
for(i = 0; i < 1089; i++) {
pVertex[ i ].x = 1.0f;
}
DynamicVB->Unlock()
Frame Rate: 100fps

Anyone have any ideas what's going on here? I'm correctly locking the VB using NOOVERWRITE or DISCARD depending on if there is room or not. It seems that filling the memory that Lock returns is killing my performance.



When you fill a vertex buffer and unlock it, it is sent to the card. If you modify it, it's possible that it has to be read back from the card, and that's slow. It might be better to treat your VB as a write-only buffer and create it with the WRITEONLY flag. Keep an array of vertices on the side, modify that, and copy everything into the vertex buffer between Lock and Unlock.
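
[Editor's sketch] The "array on the side" suggestion above amounts to building the vertices in ordinary system memory and then doing one straight copy into the locked range. The snippet below is a rough illustration under stated assumptions: the `Vertex` layout is hypothetical (the thread only says the vertex is 40 bytes), `uploadVertices` is a made-up helper, and `locked` stands in for whatever pointer Lock() returns, so no Direct3D calls appear here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 40-byte layout: position, normal, two texture coordinate sets.
struct Vertex {
    float x, y, z;
    float nx, ny, nz;
    float u0, v0;
    float u1, v1;
};
static_assert(sizeof(Vertex) == 40, "sketch assumes a tightly packed 40-byte vertex");

// Copy the CPU-side array into the locked range in one sequential pass.
// Nothing is ever read back from 'locked'; it is written front to back.
std::size_t uploadVertices(const std::vector<Vertex>& shadow,
                           void* locked,
                           std::size_t lockedBytes) {
    const std::size_t bytes = shadow.size() * sizeof(Vertex);
    assert(bytes <= lockedBytes);
    if (bytes == 0) {
        return 0;  // nothing to copy
    }
    std::memcpy(locked, shadow.data(), bytes);
    return bytes;
}
```

In a real renderer the call would sit between Lock() and Unlock(), with the locked pointer and size coming from the lock itself. All per-vertex computation happens on the side array beforehand, so the locked memory only ever sees a single linear write.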

Kikuchiyo    122
When you call Lock() with the NOOVERWRITE flag, are you locking the whole buffer, or just the precise area you are about to write to? If you use this flag to lock an area that the GPU is trying to use, you'll cause a stall.

You should also avoid that 40-byte vertex format if you can. Vertices of that size don't fit nicely onto the GPU's cache lines, which can have a sizeable impact on your performance. See http://developer.nvidia.com/view.asp?IO=Vertex_Buffer_Statistics for more details.

Finally, are you filling your vertex components in the order they are stored in memory? You said that you created the vertex buffer with the WRITEONLY flag, which should ensure that it is placed in AGP memory (assuming you didn't also specify the SYSTEMMEMORY flag (ick)). Non-sequential writes to AGP memory can be pretty slow.
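
[Editor's sketch] To make the cache-line remark about the 40-byte vertex concrete, the snippet below counts how many cache lines each vertex spans in a tightly packed buffer. The line size is a parameter because the thread never states it; the 32-byte figure used in the note afterwards is purely an assumption for illustration, and the function names are invented.

```cpp
#include <cassert>
#include <cstddef>

// Number of cache lines spanned by vertex 'index' in a tightly packed buffer
// that starts on a cache-line boundary.
std::size_t linesTouched(std::size_t index, std::size_t stride, std::size_t lineSize) {
    assert(stride > 0 && lineSize > 0);
    const std::size_t firstByte = index * stride;
    const std::size_t lastByte = firstByte + stride - 1;
    return lastByte / lineSize - firstByte / lineSize + 1;
}

// Sum of lines spanned when each of 'count' vertices is considered on its own.
std::size_t totalLinesTouched(std::size_t count, std::size_t stride, std::size_t lineSize) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += linesTouched(i, stride, lineSize);
    }
    return total;
}
```

Assuming 32-byte lines, every 40-byte vertex spans two lines, while a 32-byte vertex spans exactly one. For a 1089-vertex leaf that is 2178 line spans against 1089, which is the kind of difference the post is warning about.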

HoozitWhatzit    122
Okay, I'm going to ask a stupid question, but I'm just trying to be thorough: you mentioned that you created your VB with the D3DUSAGE_WRITEONLY flag. Did you OR this flag with the D3DUSAGE_DYNAMIC flag?

--Hoozit.

Draigan    130
Yeah, I OR'd the WRITEONLY and DYNAMIC flags. And I use the dynamic vertex buffer headers from NVIDIA to test. I don't read from the VB, and I use NOOVERWRITE and DISCARD in the correct way.

If I change it to use a static vertex buffer, I get a major speed increase. I get nowhere near the 70% performance that dynamic VBs are supposed to give. In fact, it seems that filling the vertex buffer is my slow point. I can't seem to get much more than 100 megs/s of bandwidth even though my card is AGP 4x, so it should get 1000 megs/s.

I get roughly the same speeds on a GeForce2 MX and a GeForce4 Ti, which means it's not fillrate or triangle rate.

And it's not the processor, because the GeForce2 MX is running on a Celeron 850 and the Ti is on a P4 2.4GHz Northwood with a 533MHz bus.

I'm using Win2K with SP3 on both machines. Think that causes any problems?

As I said, you are getting a warning that a Win2000-specific workaround is being used. So it is important to test on a non-Win2000 system. If it runs well, then you know your problem is Win2000-specific.

~CGameProgrammer( );
