Archived

This topic is now archived and is closed to further replies.

Lord_Omega

Vertex Array Performance????

hi, I'm stuck with kind of a lame problem here... I am trying to render a "high"-polycount scene using glDrawElements(GL_TRIANGLES,...) with about 350000-400000 polys. I disabled multitexturing and other fancy stuff; only single texturing and lighting are enabled, but my framerate is still only about 8 FPS. Note that I have a GeForce 2 Ti, which should actually perform *MUCH* better. Now, I ran a profiler on my app and found that 99.99% of the time is spent in glDrawElements... *WHY* is that so? Isn't the GPU supposed to render independently of the CPU? Any suggestions? Of course there are compiled vertex arrays, tri-stripping, and NVIDIA VAR (vertex array range)... but do these make *so much* of a difference? thx in advance for your help! PS: Don't tell me a GF2 sucks... I know that already :-) I'm about to get a new ATI actually... PPS: btw, DevPartner Profiler really sucks...

I had the same problem as yours but not with this kind of scene and with a worse card than yours.

The only way you can achieve CPU-GPU parallel processing is by using VAR or VBO. With simple vertex arrays, glDrawElements must finish before returning to your app. With VAR the function call is faster (much faster), so you can calculate something else on the CPU while the GPU is rendering your triangles. Optimized indexed triangle lists or strips with VAR are the best thing you can have. At least I think so; I haven't found anything faster yet.

Look at my post a little further down the page, "VAR + SwapBuffers", for code (by Yann L) on how to set up the arrays. You may run into a small problem with SwapBuffers afterwards: the vertex buffers have to be flushed at some point, so you will see most of the time being spent in that call, but there is nothing you can do about it.

Hope this helps

HellRaiZer

as a (probably not so correct) way to look at it:

place your data in AGP or video memory and the card can fetch it by itself. that's only possible via VAR/VBO

if your stuff is in system memory the CPU has to hand-feed your card, because the card can't get there on its own.

and last: make sure glDrawElements renders a "good" amount of geometry. a "perfect" value might depend on your card and even the driver version, so just make sure it's more than a handful (500+)
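To make the batching advice concrete, here is a minimal C++ sketch that plans how a big triangle list could be cut into draw calls. Everything in it (`DrawRange`, `planBatches`, the min/max parameters) is a hypothetical helper invented for illustration, not something from the thread or from OpenGL. Each planned range would map to one `glDrawElements(GL_TRIANGLES, ...)` call over that slice of the index array; the right batch sizes still have to be found by measuring on the actual card and driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One planned draw call: a contiguous slice of the index array.
// (Hypothetical type, for illustration only.)
struct DrawRange {
    std::size_t firstIndex;  // offset into the index array
    std::size_t indexCount;  // number of indices (3 per triangle)
};

// Split totalTriangles into batches of at most maxTrisPerCall triangles.
// If the final batch would hold fewer than minTrisPerCall triangles, it is
// folded into the previous batch (which may then exceed the maximum slightly)
// so that no call is left with only a handful of triangles.
std::vector<DrawRange> planBatches(std::size_t totalTriangles,
                                   std::size_t maxTrisPerCall,
                                   std::size_t minTrisPerCall) {
    assert(maxTrisPerCall > 0 && minTrisPerCall <= maxTrisPerCall);
    std::vector<DrawRange> ranges;
    std::size_t done = 0;
    while (done < totalTriangles) {
        const std::size_t remaining = totalTriangles - done;
        const std::size_t tris =
            remaining < maxTrisPerCall ? remaining : maxTrisPerCall;
        if (tris < minTrisPerCall && !ranges.empty()) {
            ranges.back().indexCount += tris * 3;  // fold the tiny tail
        } else {
            ranges.push_back({done * 3, tris * 3});
        }
        done += tris;
    }
    return ranges;
}
```

For example, `planBatches(400000, 30000, 500)` yields 14 ranges, and a 100-triangle leftover would be merged into the preceding range instead of becoming its own call.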

quote:

With simple vertex arrays, glDrawElements must finish before returning to your app.



Not entirely true. Since, according to the spec, you are free to modify your data after the glDrawElements call, the driver has to copy the data into a "safe" zone; but it's free to return before the rendering is finished (and that's what it will generally do).

What's true, however, is that if your data is already in video memory, it doesn't have to be uploaded, hence the glDrawElements call is faster.

Y.

thx for your quick response :-)

quote:

and last: make sure glDrawElements renders a "good" amount of geometry. a "perfect" value might depend on your card and even the driver version, so just make sure it's more than a handful (500+)


Currently, I'm calling the function a couple of times with about 30000 polys each (so that's quite a value).

Concerning VAR:

If I got that right, you're saying that VAR lets me keep the vertex data directly on the hardware. But isn't that exactly what compiled vertex arrays do? Or am I missing something? Maybe this is a really stupid question :-)... but I always thought that VAR only helps with synchronizing complex rendering (through fences).

greetz OmegaSquad

What CVAs are supposed to do is not the same as what VAR does.

If I'm correct, when you lock an array you tell the driver that the data the arrays point to will not change until you unlock it again. You also have to unlock the arrays for rendering to complete. Or is that because you can only have one locked array at a time? I don't remember exactly.

When you use VAR, you are free to change the data in the middle of rendering. Effect will take place when your next call uses those data. What you gain from VAR (as Ysaneya said) is that you get rid of the copy-to-safe-place (uploading) step, hence VAR is faster than simple VAs. The data sits in a place the driver/card can access faster than system memory, so it is easier for the GPU to fetch it. That's why you get "parallel processing". The problem with simple VAs is that, although the GPU has already begun to process vertices while the copy-to-safe-place is going on between the driver and your app, all the time is spent on the copy process.

Finally, as NVIDIA says in one of their papers on VAR, fences are only needed in memory-limited situations. I don't know what that means!

I'm not really sure about the above. It's just experience. But I think we all keep saying the same things again and again. I quit!

HellRaiZer

quote:
Original post by HellRaiZer
When you use VAR, you are free to change the data in the middle of rendering. Effect will take place when your next call uses those data.


Nah, that's the behaviour of standard VAs. They synchronize for you, but require the annoying and CPU-hogging copy. With VAR, you have to do the sync yourself. Simply put:

With standard VAs, glDrawElements() will copy the supplied data to an internal memory cache. It will initiate the rendering of that internal data copy, and return from the call. Since the data you provided will not be accessed anymore by the hardware, you are free to change it at will. The hardware will still render, but from the internal copy. The problem with that approach is the copy operation, which costs a lot of performance.

With VAR, the hardware will operate directly on the data you supplied. glDrawElements() will simply initiate a DMA fetch from your supplied data, and return from the call. No copy is made. That's a lot faster, of course. But now your data isn't safe anymore, as the GPU is accessing it. If you modify the data after a glDrawElements() call, you'll get undefined results. The GPU might be reading it at that very moment; after all, it's working in parallel.

If you want both, i.e. you want the GPU to fetch your data via DMA (and avoid the copy overhead) and still want to modify the data on the fly in the same frame, then you need fences to synchronize your updates with the GPU. Fences are used by the GPU as a way to tell you: "OK, I'm done with that part of the data, now you can safely modify it". This scenario is called "vertex streaming". Newer extensions, such as VBO, will hide the synchronization in the driver.
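The three cases described above can be mimicked with a small toy model in plain C++. This is only a sketch under loud assumptions: `ToyGpu` and all of its methods are made up for illustration, it does not call OpenGL at all, and it is deterministic, whereas a real unsynchronized VAR update gives undefined results. The "GPU" here simply runs its queued draws later, which is enough to show why a copy is safe, why a bare reference is not, and what a fence buys you (in the real NV_fence extension the corresponding calls are `glSetFenceNV` and `glFinishFenceNV`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a GPU that executes queued draws later, not at call time.
// (Hypothetical class, for illustration only.)
class ToyGpu {
public:
    // Standard vertex arrays: snapshot the data when the draw is issued.
    void drawCopied(const std::vector<float>& verts) {
        Job job;
        job.snapshot = verts;  // the CPU-side copy that costs performance
        job.source = nullptr;
        queue_.push_back(job);
    }

    // VAR-like: only remember where the data lives; nothing is copied.
    void drawReferenced(const std::vector<float>* verts) {
        Job job;
        job.source = verts;
        queue_.push_back(job);
    }

    // Place a fence after everything submitted so far.
    std::size_t setFence() const { return queue_.size(); }

    // Block until every draw submitted before the fence has executed.
    void finishFence(std::size_t fence) {
        while (executed_ < fence) executeNext();
    }

    // Execute everything that is still pending.
    void flush() { finishFence(queue_.size()); }

    // All vertex data the toy GPU has consumed so far, in order.
    const std::vector<float>& rendered() const { return rendered_; }

private:
    struct Job {
        std::vector<float> snapshot;         // used by drawCopied
        const std::vector<float>* source;    // used by drawReferenced
    };

    void executeNext() {
        const Job& job = queue_[executed_++];
        const std::vector<float>& data = job.source ? *job.source : job.snapshot;
        rendered_.insert(rendered_.end(), data.begin(), data.end());
    }

    std::vector<Job> queue_;
    std::size_t executed_ = 0;
    std::vector<float> rendered_;
};
```

Issuing `drawCopied(v)` and then changing `v` before `flush()` still renders the old values. Doing the same with `drawReferenced(&v)` renders the changed ones, unless `finishFence(setFence())` is called before the modification.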

Thx a lot for your help.
I've looked at your other thread @ HellRaiZer... and at your benchmarks (the sphere thingy), and figured that mine (using standard VAs) weren't close to 3M tris/sec at the same resolution... but I have a GeForce2! So I went through all my initialization code again (Rule Nr. 1: it's not the card that's a moron, it's the programmer!) and came across

glEnable(GL_NORMALIZE);

Well... that's it. Disabling that gives a huge jump in framerate :-) I seriously hate having a single line of code mess up everything.
But now I'm definitely going to change to VAR/VBO.
Thx a lot again
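A note on why GL_NORMALIZE can be left off: it asks OpenGL to rescale every normal to unit length after transformation, which is only needed when the normals are not unit length or the modelview matrix contains scaling. A minimal sketch of the alternative, assuming the modelview matrix has no scaling, is to normalize the normals once at load time. `normalizeNormals` and the packed x, y, z layout are assumptions made for this example, not code from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize a packed array of normals (x, y, z, x, y, z, ...) in place.
// Zero-length normals are left untouched to avoid dividing by zero.
// (Hypothetical helper, for illustration only.)
void normalizeNormals(std::vector<float>& normals) {
    assert(normals.size() % 3 == 0);
    for (std::size_t i = 0; i + 2 < normals.size(); i += 3) {
        const float x = normals[i];
        const float y = normals[i + 1];
        const float z = normals[i + 2];
        const float len = std::sqrt(x * x + y * y + z * z);
        if (len > 0.0f) {
            normals[i]     = x / len;
            normals[i + 1] = y / len;
            normals[i + 2] = z / len;
        }
    }
}
```

Run once after loading the mesh, this moves the per-vertex work out of the render loop; whether it is enough depends on the transforms the scene actually uses.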

As you can see at the top of the benchmark log, I said that no lighting was used in any of them, so GL_NORMALIZE wasn't something I would have had enabled.

To tell the truth, I've never used hardware lights seriously in my code. The only experience I have with hw lights is from when I was starting to learn OpenGL and reading the Red Book! So GL_NORMALIZE is an "unknown" cap for me.

HellRaiZer
