juhaszt

Performance problem with dynamic vertex buffer

I have created a dynamic vertex buffer with a size of 1MB:

dev->CreateVertexBuffer(sizeVB, D3DUSAGE_DYNAMIC, D3DFVF_MYVERTEX, D3DPOOL_DEFAULT, &pVBuffer);

I fill it with data using memcpy(). Filling it takes about 7ms on average, which means a transfer rate of only about 140MB/sec. I think that is far too slow; my computer runs at 2X AGP speed. How can memcpy() be so slow when filling a dynamic vertex buffer? (My configuration: 360MHz Celeron, GeForce2 MX.)
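
The rate quoted above is simply bytes divided by time. A small C++ sketch of that arithmetic, assuming the buffer is exactly 1MB = 1,048,576 bytes and the fill takes 7ms; the two function names are hypothetical helpers, not part of Direct3D:

```cpp
#include <cassert>
#include <cstddef>

// Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes).
double throughputMBps(std::size_t bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    return (static_cast<double>(bytes) / 1.0e6) / (milliseconds / 1000.0);
}

// Throughput in binary megabytes per second (1 MiB = 1,048,576 bytes).
double throughputMiBps(std::size_t bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    return (static_cast<double>(bytes) / 1048576.0) / (milliseconds / 1000.0);
}
```

For 1,048,576 bytes in 7ms these give roughly 150 MB/s and 143 MiB/s respectively, so "about 140MB/sec" matches the binary-megabyte reading.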

I haven't done any tests of the actual performance gain, BUT IIRC the debug output will probably recommend that you also make the VB write-only (D3DUSAGE_WRITEONLY).

That may increase performance.

Neil
Don't copy into the vertex buffer unless absolutely necessary. Write the data directly into the buffer instead.

And make ABSOLUTELY SURE that you use D3DUSAGE_WRITEONLY when creating the buffer and D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE when locking it. If not, you'll have severe performance problems.

(I can see now that D3DUSAGE_WRITEONLY is missing... yikes.)
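
The difference between copying and writing directly can be sketched in plain C++. This is a minimal illustration, not Direct3D code: an ordinary array stands in for the pointer that Lock() would return, and MyVertex (position plus diffuse colour) and both fill functions are hypothetical, since the real D3DFVF_MYVERTEX layout isn't shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical vertex layout for illustration only.
struct MyVertex {
    float x, y, z;
    std::uint32_t diffuse;
};

// Two-step approach: build vertices in a temporary array, then memcpy them
// into the destination. Every vertex is written twice and read once.
void fillViaCopy(MyVertex* dst, std::size_t count) {
    std::vector<MyVertex> temp(count);
    for (std::size_t i = 0; i < count; ++i) {
        temp[i] = MyVertex{static_cast<float>(i), 0.0f, 0.0f, 0xFFFFFFFFu};
    }
    std::memcpy(dst, temp.data(), count * sizeof(MyVertex));
}

// Direct approach: generate each vertex straight into the destination,
// writing it exactly once, in order, without reading the destination back.
void fillDirect(MyVertex* dst, std::size_t count) {
    assert(dst != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = MyVertex{static_cast<float>(i), 0.0f, 0.0f, 0xFFFFFFFFu};
    }
}
```

Both produce the same bytes; the direct version skips the temporary array entirely. Writing sequentially and never reading the destination matters because AGP and video memory are typically very slow for the CPU to read.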

Guest Anonymous Poster
Out of curiosity, what kind of throughput do you get writing to a sysmem VB?

Ditto on what CamelFly said: Write your vertices directly to the VB if at all possible.

FWIW, I've noticed no difference in write performance using D3DUSAGE_WRITEONLY, but that's not to say it couldn't make a difference on other hardware/drivers. (If anything, it might motivate the driver to put the VB in AGP memory rather than local video memory; AGP memory is where you would get faster writes. But with D3DUSAGE_DYNAMIC I doubt the driver would put the VB in local memory in the first place.)

You should still definitely be using it.
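
To answer the throughput question for plain system memory, a rough timing sketch like the one below could be used. It only measures memcpy() between two ordinary heap buffers (a stand-in for a system-memory VB), not writes to AGP or video memory; the function name and parameters are hypothetical, and results vary widely with CPU, RAM and cache size.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Average memcpy() throughput in MB/s (1 MB = 1,000,000 bytes) between two
// ordinary heap buffers, over `repeats` copies of `bytes` bytes each.
double measureMemcpyMBps(std::size_t bytes, int repeats) {
    assert(bytes > 0 && repeats > 0);
    std::vector<unsigned char> src(bytes, 0xAB);
    std::vector<unsigned char> dst(bytes, 0);

    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        src[0] = static_cast<unsigned char>(r);  // vary the data between copies
        std::memcpy(dst.data(), src.data(), bytes);
    }
    auto stop = std::chrono::steady_clock::now();

    // Use the destination afterwards so the copy is not discarded as dead code.
    volatile unsigned char sink = dst[0];
    (void)sink;

    double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds <= 0.0) seconds = 1e-9;  // guard against a coarse clock
    return static_cast<double>(bytes) * repeats / 1.0e6 / seconds;
}
```

Calling measureMemcpyMBps(1 << 20, 100) copies a 1MB buffer 100 times and reports the average rate, which can then be compared with the 7ms fill time measured on the dynamic VB.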

I think you are getting a little confused about AGP memory vs. the AGP bus. AGP memory IS system memory (the GART, or Graphics Address Remapping Table, maps scattered pages of it into one large contiguous block); the graphics card accesses this memory VIA the AGP bus (at fairly high speed).

You are using a Celeron @ 360MHz, which has a 66MHz FSB (and, I would assume, PC100 RAM or slower?), so your memory-to-memory copies are not going to be super fast (sorry). Roughly 140-150MB/s is fairly respectable, given that you are doing a copy (one read and one write) plus all of the OS overhead. memcpy() can be optimized a lot (by removing safeguards and making assumptions), but that is another topic; it is the fastest non-asm option you have (short of using someone else's library).
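
The bus ceiling behind that argument can be worked out directly. A rough sketch, assuming the Celeron's front-side bus is 64 bits (8 bytes) wide and ignoring DRAM latency, bus turnaround and OS overhead, all of which lower the real figure; the helper names are hypothetical:

```cpp
#include <cassert>

// Theoretical peak front-side-bus bandwidth in MB/s:
// clock (MHz) x bus width (bytes) = millions of bytes per second.
double fsbPeakMBps(double clockMHz, int busWidthBytes) {
    assert(clockMHz > 0.0 && busWidthBytes > 0);
    return clockMHz * busWidthBytes;
}

// A memory-to-memory copy moves every byte across the bus at least twice
// (one read, one write), so its ceiling is at most half the peak.
double copyCeilingMBps(double clockMHz, int busWidthBytes) {
    return fsbPeakMBps(clockMHz, busWidthBytes) / 2.0;
}
```

With 66MHz and 8 bytes this gives a 528MB/s theoretical peak and a 264MB/s ceiling for a copy, so a measured 140-150MB/s is already a bit over half of what the bus could ever deliver.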

As someone mentioned, do not copy into the vertex buffer unless needed (and lock only the part of the VB you need, not 0 to maxSize). Make sure you use D3DUSAGE_WRITEONLY and lock with D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE so that you do not have to wait on Lock/Unlock.
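
A common way to combine the two lock flags is to treat the dynamic VB as a ring: append each batch with NOOVERWRITE, and only when a batch no longer fits, wrap to the start with DISCARD. The sketch below models just that bookkeeping in plain C++. LockMode and DynamicVBCursor are hypothetical stand-ins; in real code the offset, size and flag would be passed to IDirect3DVertexBuffer8::Lock() as OffsetToLock, SizeToLock and D3DLOCK_NOOVERWRITE or D3DLOCK_DISCARD.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the two Direct3D lock flags discussed above
// (D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD); not the real D3D constants.
enum class LockMode { NoOverwrite, Discard };

// What would be passed to Lock(): offset, size and flag.
struct LockRequest {
    std::size_t offsetBytes;  // OffsetToLock
    std::size_t sizeBytes;    // SizeToLock: only what this batch needs
    LockMode mode;
};

// Hypothetical helper that tracks the write position in a dynamic VB
// used as a ring buffer.
class DynamicVBCursor {
public:
    explicit DynamicVBCursor(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // Choose where and how to lock `bytes` of new vertex data.
    LockRequest next(std::size_t bytes) {
        assert(bytes > 0 && bytes <= capacity_);
        LockMode mode = LockMode::NoOverwrite;  // append after data the GPU may still be using
        if (cursor_ + bytes > capacity_) {
            // Batch no longer fits: discard the old contents and restart at 0,
            // letting the driver supply fresh memory instead of waiting for the GPU.
            cursor_ = 0;
            mode = LockMode::Discard;
        }
        LockRequest req{cursor_, bytes, mode};
        cursor_ += bytes;
        return req;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

Each call locks only the bytes the batch needs, never 0 to maxSize, and DISCARD is used only once per wrap-around.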


Yes, I forgot to say that I use D3DLOCK_DISCARD when locking the vertex buffer.

Another question:

The dynamic vertex buffer will not be created in the video card's memory, but in AGP memory? Then rendering from a dynamic vertex buffer will be slower than rendering from a static vertex buffer, which will be created in the video card's memory. Right?

Thanks for your answers anyway!!
Correct: rendering from a vertex buffer stored in video memory will be substantially faster, but remember that such VBs cannot be written to or read from without huge performance penalties (as the entire VB would have to be copied to and from AGP memory).

One thing I forgot to mention is that 1MB is a HUGE amount of data (almost 28,000 vertices with my FVF). So again, 7ms is actually pretty good. The optimal dynamic vertex buffer size is around 1,000-2,000 vertices, depending on your card.
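
The vertex count depends only on the vertex size (stride). The FVF behind "my FVF" isn't given, so the sketch below assumes a hypothetical 36-byte vertex (position, normal, diffuse colour and one 2D texture coordinate):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 36-byte vertex: XYZ + normal + diffuse + one UV pair.
struct AssumedVertex {
    float x, y, z;
    float nx, ny, nz;
    std::uint32_t diffuse;
    float u, v;
};
static_assert(sizeof(AssumedVertex) == 36, "expected a tightly packed 36-byte vertex");

// Whole vertices of `strideBytes` that fit in a buffer of `bufferBytes`.
std::size_t verticesThatFit(std::size_t bufferBytes, std::size_t strideBytes) {
    assert(strideBytes > 0);
    return bufferBytes / strideBytes;
}
```

verticesThatFit(1000000, 36) gives 27,777 (the "almost 28,000" above), while a full 1,048,576-byte buffer holds 29,127. A 2,000-vertex buffer of this format would be only 72,000 bytes.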

Again, with a VB of that size, do not lock the entire buffer; it will slow you down a lot. Do you need that many vertices? (Just curious.)

[edited by - Entz on January 21, 2003 9:34:19 PM]

Yes, I need even more vertices. I created this vertex buffer because my vertices could not all be addressed from one huge vertex buffer anyway, since the indices are only 16 bits long... That is why I have to lock and fill the buffer several times for each rendered frame...
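
With 16-bit indices one draw can address at most 65,536 distinct vertices, and the number of refills per frame also depends on how many vertices the dynamic VB holds. A simplified sketch, assuming batches do not share vertices (a real mesh split across batches would duplicate some at the seams); the function name is hypothetical, and the device's actual limit is reported in D3DCAPS8::MaxVertexIndex, which may be lower:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Largest number of distinct vertices a 16-bit index can address in one draw.
constexpr std::size_t kMaxVertsPer16BitBatch =
    static_cast<std::size_t>(std::numeric_limits<std::uint16_t>::max()) + 1;  // 65536

// How many times the dynamic VB must be locked and refilled per frame,
// given the total vertex count and how many vertices fit in the VB.
std::size_t fillsPerFrame(std::size_t totalVertices, std::size_t vbCapacityVertices) {
    assert(vbCapacityVertices > 0);
    std::size_t perFill = vbCapacityVertices < kMaxVertsPer16BitBatch
                              ? vbCapacityVertices
                              : kMaxVertsPer16BitBatch;
    return (totalVertices + perFill - 1) / perFill;  // ceiling division
}
```

For example, 100,000 vertices streamed through a VB holding 27,777 vertices need fillsPerFrame(100000, 27777) == 4 lock/fill cycles per frame.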

Another thing:

I have a lot of vertices, but they are static, so I could use static vertex buffer(s), but that would eat up my video memory. That is why I decided to use a 'small' dynamic vertex buffer. I hope that is the right way to render a lot of triangles while leaving some free space for textures in video memory...

Don't use dynamic vertex buffers for the whole scene...

I made the same mistake at first when building my engine. This was a scene with around 250,000 polygons. With a dynamic vertex buffer, my Radeon could only do about 20 FPS when rendering all polygons at once (with texture switching). That's just 5 million polygons per second, very far from what the Radeon can do... And this was using all the dynamic-buffer tricks I had read about, with the flags set correctly.

Well, I switched over to dynamic index buffers with static vertex buffers, and the framerate increased dramatically: it went up to about 100fps, or about 25 million polygons per second.
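
The static-VB/dynamic-IB approach can be sketched as follows: vertices are uploaded once, and each frame only the 16-bit indices of visible chunks are rewritten. Per triangle that is 6 bytes of index data, versus 108 bytes if three unshared vertices of, say, 36 bytes each were streamed instead. The Chunk structure and buildVisibleIndices() are hypothetical, and the returned vector stands in for the locked dynamic index buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk of the static mesh: a range in the full index list.
struct Chunk {
    std::size_t firstIndex;
    std::size_t indexCount;
};

// Gather the indices of visible chunks into the per-frame index data.
// The static vertex data is never rewritten; only 16-bit indices are.
std::vector<std::uint16_t> buildVisibleIndices(const std::vector<std::uint16_t>& allIndices,
                                               const std::vector<Chunk>& chunks,
                                               const std::vector<bool>& visible) {
    assert(chunks.size() == visible.size());
    std::vector<std::uint16_t> out;
    for (std::size_t c = 0; c < chunks.size(); ++c) {
        if (!visible[c]) continue;  // culled: contributes nothing this frame
        const Chunk& ch = chunks[c];
        assert(ch.firstIndex + ch.indexCount <= allIndices.size());
        auto first = allIndices.begin() + static_cast<std::ptrdiff_t>(ch.firstIndex);
        auto last = first + static_cast<std::ptrdiff_t>(ch.indexCount);
        out.insert(out.end(), first, last);
    }
    return out;
}
```

Because culling only shrinks the index list, better culling directly reduces the per-frame upload instead of adding CPU work on top of re-streaming vertices.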

But what was worse was that dynamic vertex buffers also take CPU power, so the CPU can't do anything other than render the scene. In fact, when I added culling, the framerate dropped; it didn't increase as one would expect. OK, it was a primitive culling method, but still.

Since then I have added better culling to my engine, and it now runs at about 500 fps in the same window (a fairly small window on my screen). Maximizing the app at 1600x1200 gives about 200 fps. I don't support fullscreen.

And finally, if you still didn't get my point: whenever possible, avoid dynamic vertex buffers; they are slow and they stall both the GPU and the CPU. Dynamic index buffers seem to be OK, though.

BTW, I wonder where ATI got their figure of 100M polygons per second... Well, there's a test app on their site that produces figures like that on my computer, so it's not impossible...
