CPU bound specifying geometry...ideas?

This topic is 2841 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi guys,

I just finished a first crack at using VBOs in my vector map rendering engine, replacing the old-style immediate mode rendering I first implemented. The results were impressive: a test map that used to take ~45ms to render now takes ~1ms.

However, I am dealing with some MASSIVE maps, and even with VBOs the larger sets are slow to render. I profiled for bottlenecks, and it seems I am still CPU bound specifying vertex ranges for variable-sized geometry (i.e. GL_LINE_STRIPs, each with a variable segment count). My filled polys are fine, because I use raw triangle lists and can make one glDrawArrays call for the entire set no matter how large. The line strips have variable lengths, though, so I issue one glDrawArrays call per GL_LINE_STRIP, resulting in thousands of calls per render.

I am looking for suggestions to get this loop off the CPU and onto the GPU. I have read that most of the batch render functions (the ones accepting arrays of vertex ranges) just loop in the driver, doing what I am doing now. Is this true? If so, I doubt they will help me at all.

I'm starting to wonder if this is a case where display lists are genuinely better than VBOs, as I could essentially compile these variable-size line render loops into a display list at run-time, then just execute it afterwards.

Any thoughts? Thanks!

Just as a follow-up to this, how evil would it be to create a display list containing the VBO render calls themselves? I vaguely recall reading this can be problematic, but since I am just specifying VBO vertex ranges in the display list loop, it seems like a potentially neat solution...

This is more of a follow up for anyone else working with this issue. I did go ahead and try the display list of glDrawArrays(GL_LINE_STRIP) calls, but it was definitely not a good solution.

I assume this is because the display list is not getting cached on the card?

Either way it took significantly longer (~50%) to render the line strips from my display list of VBO based glDrawArrays() calls than just manually looping through the glDrawArrays myself. I'm still looking for a good way to speed this process up as a result.

Thanks for the suggestions, guys.

I went ahead and changed my data organization so that I am still using a raw vertex VBO on the card (dups and all), but instead of looping with glDrawArrays for each line strip, I am now making a single call to glMultiDrawArrays. This has netted a performance boost, but I am STILL spending too much time on the CPU/bus passing these arrays of vertex start indices and vertex counts for the line strips. For scale, one layer of a map may contain thousands of individual line strips, each containing thousands of vertices, and an individual map may contain hundreds of layers.
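For reference, here is a minimal sketch of how the per-strip ranges for a single glMultiDrawArrays call might be precomputed once at load time, so the render path only hands two ready-made arrays to the driver. The names StripRanges and buildStripRanges are hypothetical (they are not from the engine discussed here), and plain int stands in for GLint/GLsizei so the sketch compiles without GL headers. The GL call itself appears only as a comment because it needs a live context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for the two parallel arrays glMultiDrawArrays expects.
struct StripRanges {
    std::vector<int> first;  // start vertex of each strip within the VBO
    std::vector<int> count;  // number of vertices in each strip
};

// Build the ranges from the strip lengths, assuming the strips are stored
// back to back in the vertex buffer in the same order.
StripRanges buildStripRanges(const std::vector<int>& stripLengths) {
    StripRanges r;
    r.first.reserve(stripLengths.size());
    r.count.reserve(stripLengths.size());
    int next = 0;
    for (int len : stripLengths) {
        assert(len >= 2 && "a line strip needs at least two vertices");
        r.first.push_back(next);
        r.count.push_back(len);
        next += len;
    }
    return r;
}

// At draw time, with the vertex VBO bound and a GL context current:
//   glMultiDrawArrays(GL_LINE_STRIP, r.first.data(), r.count.data(),
//                     static_cast<GLsizei>(r.first.size()));
```

Building the arrays once and reusing them every frame removes the per-frame bookkeeping on the application side, but the arrays still cross into the driver on every call, which matches the remaining cost described above.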

I'm currently looking into the 'indexed line lists' Promit suggested. I assume you mean combining duplicate vertices and creating lists of indices for the lines? I'm not sure how much faster this would be though if I still have to specify a start index/vertex count for each line strip within an index VBO. It seems it is the passing of these vertex ranges that is ultimately killing performance.
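One possible reading of the 'indexed line lists' suggestion, sketched under assumptions: expand every strip into GL_LINES index pairs in one index buffer. Each segment is then self-describing, so no per-strip start/count ranges are needed at draw time and the whole layer can go out in a single glDrawElements call. The function name stripsToLineIndices is hypothetical, and 32-bit indices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand line strips (given as first/count ranges into the vertex array)
// into one flat index list suitable for GL_LINES.
// A strip of n vertices becomes n - 1 segments, i.e. 2 * (n - 1) indices.
std::vector<std::uint32_t> stripsToLineIndices(const std::vector<int>& first,
                                               const std::vector<int>& count) {
    assert(first.size() == count.size());
    std::vector<std::uint32_t> indices;
    for (std::size_t s = 0; s < first.size(); ++s) {
        for (int i = 0; i + 1 < count[s]; ++i) {
            indices.push_back(static_cast<std::uint32_t>(first[s] + i));
            indices.push_back(static_cast<std::uint32_t>(first[s] + i + 1));
        }
    }
    return indices;
}

// At draw time, with the vertex VBO and the index buffer bound:
//   glDrawElements(GL_LINES, static_cast<GLsizei>(indices.size()),
//                  GL_UNSIGNED_INT, nullptr);
```

The trade-off is a larger index buffer (roughly two indices per segment) in exchange for one draw call with no range arrays. The index buffer is static, so it can be uploaded once alongside the vertices.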

I know I should combine duplicate vertices in my vertex VBO and create index VBOs. My testing indicates I could save ~30-40% of memory usage by doing this, and other posts suggest that the hardware is better optimized for rendering from index VBOs. My problem is the sheer amount of time it takes to do this processing on millions of vertices. My users might freak out if they have to leave their app running for hours to pre-process a map's vertices before it can be rendered. Anyone have any suggestions for quickly creating a list of unique vertices from a large list containing dups? =)

Cheers!

Quote:
Original post by Grumple
Anyone have any suggestions for quickly creating a list of unique vertices from a large list containing dups? =)

The simplest approach happens to have decent performance.

Store each vertex in turn in a hash table (such as boost::unordered_map), replacing with a reference to the previous matching vertex whenever you hit a duplicate. A hash table has amortised O(1) lookup and insertion, so the duplicate testing is fast, and your whole operation will be O(N) with respect to the total number of vertices.
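A minimal sketch of the hash-table approach described above, under a few stated assumptions: vertices are 2D floats, duplicates are exact bit-for-bit matches (so 0.0 and -0.0 count as different), and std::unordered_map is swapped in for boost::unordered_map. The names Vec2, keyOf, Deduped and dedupVertices are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <unordered_map>
#include <vector>

struct Vec2 { float x, y; };

// Pack the bit patterns of both floats into one 64-bit key, so the standard
// integer hash can be used. This compares vertices exactly, with no tolerance.
static std::uint64_t keyOf(const Vec2& v) {
    std::uint32_t xb, yb;
    std::memcpy(&xb, &v.x, sizeof xb);
    std::memcpy(&yb, &v.y, sizeof yb);
    return (static_cast<std::uint64_t>(xb) << 32) | yb;
}

struct Deduped {
    std::vector<Vec2> unique;            // each distinct vertex, first-seen order
    std::vector<std::uint32_t> indices;  // one index into 'unique' per input vertex
};

Deduped dedupVertices(const std::vector<Vec2>& in) {
    // Indices are 32-bit, so the input must fit in that range.
    assert(in.size() <= std::numeric_limits<std::uint32_t>::max());
    Deduped out;
    std::unordered_map<std::uint64_t, std::uint32_t> seen;
    seen.reserve(in.size());
    out.indices.reserve(in.size());
    for (const Vec2& v : in) {
        const auto next = static_cast<std::uint32_t>(out.unique.size());
        // emplace inserts only if the key is new; otherwise it returns the
        // existing entry, which holds the index of the earlier matching vertex.
        const auto result = seen.emplace(keyOf(v), next);
        if (result.second) out.unique.push_back(v);
        out.indices.push_back(result.first->second);
    }
    return out;
}
```

The resulting `unique` array would go into the vertex buffer and `indices` into the index buffer. Reserving the table up front avoids rehashing while millions of vertices are inserted.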

Doing better than that is going to require considerable effort (parallelisation, most likely), but it should be fast enough for most uses. How many vertices (approximately) are we talking about here?

Well, my current 'worst test case' ends up being approximately 400MB spread over 100 VBOs. I'm using 2D vertices because large groups of the geometry share the same 'depth' into the screen, so I just maintain a secondary mechanism to translate to screen depth for a given swath, then render from my VBO in 2D. That still leaves me with as many as 50,000,000 2D vertices. I would estimate 1/4 of those are used for line rendering (versus triangles, points, etc.), so maybe 12,500,000 vertices of line rendering. I should mention I would rarely be rendering this entire set, as there is some rudimentary viewport clipping happening as well.

I should also mention I don't use lighting, texturing, etc, etc for any of this. It is a very basic ortho render of raw primitives.

I am testing on a relatively old card (GeForce 7300 LE 512MB), so I am hoping this will all work noticeably better on newer hardware. That's also a big part of why I want to get this line rendering GPU-bound instead of CPU/bus-bound on my machine. =)

Quote:
Original post by Grumple
Just as a follow-up to this, how evil would it be to create a display list containing the VBO render calls themselves? I vaguely recall reading this can be problematic but where I am just specifying VBO vertex ranges in the display list loop it seems like a potentially neat solution...

While this is possible, I don't think it is gonna be of much use to anyone:

Quote:
man pages
glDrawElements is included in display lists. If glDrawElements is
entered into a display list, the necessary array data (determined by
the array pointers and enables) is also entered into the display list.
Because the array pointers and enables are client-side state, their
values affect display lists when the lists are created, not when the
lists are executed.

Things are looking up thanks to all the suggestions!

Just to wrap up this thread for anyone interested in the 'solution', I went ahead on swiftcoder's suggestion and created a vertex packing algorithm that first sorted my vertices by X (using std::sort), then eliminated duplicates based on the re-ordered list. This was WAY faster than I expected, so I was able to include this as a 'preprocessing step' for my data.
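The exact packing code is not shown in this thread, so the following is only a sketch of one way a sort-then-eliminate pass could look. It assumes 2D float vertices with no NaNs, sorts by X with Y as a tie-breaker so that exact duplicates end up adjacent, and records a remap table for building index buffers. The names Vertex2D, Packed and packBySorting are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

struct Vertex2D { float x, y; };

struct Packed {
    std::vector<Vertex2D> unique;      // distinct vertices in sorted order
    std::vector<std::uint32_t> remap;  // remap[i] = index in 'unique' of input vertex i
};

Packed packBySorting(const std::vector<Vertex2D>& in) {
    // Indices are 32-bit, so the input must fit in that range.
    assert(in.size() <= std::numeric_limits<std::uint32_t>::max());

    // Sort a permutation rather than the vertices themselves, so each
    // vertex's original position is still known afterwards.
    std::vector<std::uint32_t> order(in.size());
    std::iota(order.begin(), order.end(), 0u);
    std::sort(order.begin(), order.end(),
              [&in](std::uint32_t a, std::uint32_t b) {
                  if (in[a].x != in[b].x) return in[a].x < in[b].x;
                  return in[a].y < in[b].y;
              });

    Packed out;
    out.remap.resize(in.size());
    for (std::size_t k = 0; k < order.size(); ++k) {
        const Vertex2D& v = in[order[k]];
        // Duplicates are adjacent after sorting, so comparing against the
        // last emitted vertex is enough to detect them.
        if (out.unique.empty() ||
            out.unique.back().x != v.x || out.unique.back().y != v.y) {
            out.unique.push_back(v);
        }
        out.remap[order[k]] = static_cast<std::uint32_t>(out.unique.size() - 1);
    }
    return out;
}
```

This costs O(N log N) rather than the hash table's expected O(N), but it works on contiguous arrays, which may be part of why it performed well in practice here.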

Having switched to using packed vertex buffers and IBOs, I noticed a surprising performance increase without changing much else. I'm now rendering my largest test data set in under 10ms most of the time, and all I changed beyond using index buffers was the corresponding switch from glDrawArrays to glDrawElements.

All this is making me wonder if the line strip rendering loop was as much the bottleneck as it seemed to be from the profile...

Cheers!
