Need Rendering Algorithm Advice

Started by ClDumbass October 20, 2010 01:45 AM

8 comments, last by SuperVGA 13 years, 6 months ago

122

Author

October 20, 2010 01:45 AM

So here's the deal. I'm attempting to make an adventure-focused game with terrain similar to Minecraft's* (lots and lots of discrete blocks).
Currently I'm storing these in a Region object that in turn points to Chunk objects. The Region holds a 64x1x64 grid of chunks, which are themselves 16x256x16 arrays of unsigned shorts.
No matter what rendering/culling/etc. algorithm I've tried, I haven't been able to get satisfactory frame rates, even with a single type of block (so no texture changes).
The "best" one I've had so far works like this: find the smallest sphere that includes the entire view, use its center as the center of a cube whose sides are 2*radius long, and work out which x,y,z ranges fall inside that cube. Then check whether each block in those ranges is inside the view sphere and render it if it is. This amounts to setting up some ranges, looping through a 3D array, and rendering each element it lands on that's visible (and actually a block, as values of "0" are nothing).

So, in short: I know I'm doing it wrong. How do I do it right?

*No, I do not intend to clone the game. I wanted to have procedurally generated environments and an *adventuring* experience system so I could travel the land beating up baddies. I kinda like how the look of Minecraft works, so I thought I'd clone its graphical style :P

Technologies: SDL + glew, 1280x720, view distance ~= 30 blocks (most I can get without loads of lag)
Computer: Core i7-950, 12GB DDR3, GTX470 (in other words, should be *way* overkill for this)
Reference point: I get (at minimum) 70ish FPS in Minecraft with view distance set to the farthest (which I estimate is about 200 blocks) and a viewport of ~1920x1100 (my screen is 1920x1200)

Note: I'm not too attached to SDL/glew/etc. The only requirement is cross-platform compatibility and the ability to handle tens of thousands of probably-occluded objects.

Ideas I've had: make a second array overlay that just holds flags and use one flag for "surrounded." In other words, pre-calculate whether the block can be visible at all by checking its 6 face-sharing neighbors for solidity. I could set the "air" blocks to this value by default to avoid a second check.
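
The "surrounded" flag idea above can be sketched roughly as follows. This is only an illustration under assumptions: the `Chunk` struct, the `hidden` array and `rebuildHidden` are hypothetical names, the 16x256x16 size is taken from the post, and neighbors outside the chunk are treated as air (a real version would look into the adjacent chunk instead).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Chunk size quoted in the post: 16 x 256 x 16 blocks.
constexpr int SX = 16, SY = 256, SZ = 16;

// Hypothetical chunk layout; 0 means "air", as in the post.
struct Chunk {
    std::vector<std::uint16_t> blocks;  // block type per cell
    std::vector<std::uint8_t> hidden;   // 1 = never needs drawing

    Chunk() : blocks(SX * SY * SZ, 0), hidden(SX * SY * SZ, 1) {}

    static int index(int x, int y, int z) {
        assert(x >= 0 && x < SX && y >= 0 && y < SY && z >= 0 && z < SZ);
        return (y * SZ + z) * SX + x;
    }

    // Assumption: cells outside this chunk count as air, so border
    // blocks are always considered visible.
    bool solid(int x, int y, int z) const {
        if (x < 0 || x >= SX || y < 0 || y >= SY || z < 0 || z >= SZ) return false;
        return blocks[index(x, y, z)] != 0;
    }

    // Marks air cells and fully enclosed solid cells as hidden in one pass.
    void rebuildHidden() {
        for (int y = 0; y < SY; ++y)
            for (int z = 0; z < SZ; ++z)
                for (int x = 0; x < SX; ++x) {
                    const bool air = !solid(x, y, z);
                    const bool surrounded =
                        solid(x - 1, y, z) && solid(x + 1, y, z) &&
                        solid(x, y - 1, z) && solid(x, y + 1, z) &&
                        solid(x, y, z - 1) && solid(x, y, z + 1);
                    hidden[index(x, y, z)] = (air || surrounded) ? 1 : 0;
                }
    }
};
```

The render loop then only has to test `hidden` instead of looking at the block type and its neighbors every frame. After a block is edited, only that cell and its six neighbors need their flags recomputed.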

Hodgman

52,717

October 20, 2010 02:09 AM

Ignoring culling for a moment, how are you rendering the cubes? Is each one drawn individually, or do you pack multiple cubes into the same draw-call somehow?

On modern cards, the number of draw-calls can be more important than the number of triangles, and having each draw-call only draw a small number of triangles can be very inefficient.

. 22 Racing Series .

ClDumbass

122

Author

October 20, 2010 02:19 AM

At the moment each block is drawn as a CallList, but I have no good ideas on how to do a larger call list (other than making, say, each 16x16x16 chunk a call list that's static until the chunk is modified, but then I'd have to cycle call list IDs and I'm not sure how good that is).
This means it's 4 calls to draw the block (push, translate, call, pop).
If there is a large cap on how many call lists can be active at once then I may have to investigate that route.

Fun fact: I added the "visibility" flag idea I noted in my previous post and was able to go from ~20fps on 30 block distance to ~30 on 80 block distance. *MAJOR* improvement, though still not what I'd like. My goal is 45 FPS on 300 block view distance.

SuperVGA

1,303

October 20, 2010 02:27 AM

Hello ClDumbass,

As Hodgman said, the draw calls are sometimes the bottleneck, and could
prove to be your issue here.
I suggest you perform a chunk-to-primitives algorithm on the chunks close by,
sending them into a list of vertices in a Vertex Buffer Object, storing the
geometry on the graphics card. Then it takes one call to render the contents
of that chunk, but you should choose a chunk size that makes sense in terms
of updating as well as number of draw calls.

I've extended my voxel framework to support texturing too, though I raycast
as I find it somewhat faster on more complex blocks, e.g. checkerboard-alternating solid and non-solid areas (which would turn into many polygons and texture changes).

Just rendering a heightmap is also very easy this way, but I suppose
you need multiple layers (solid over empty over solid).

EDIT: Oh, you posted back before me. Yes, I suppose you could use DisplayLists also, although VBOs should be faster.
Try adjusting the chunk size, and do check the neighbors; you only need polygons for the surface, not in between cells.
A fast way to do this is to traverse a chunk along one axis, check for
a transition from empty to solid, and add a quad. Then check for a transition from solid to empty, and so forth.
Repeat on the two remaining axes.
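
A minimal sketch of that transition scan, assuming a small dense grid where 0 means empty. `Grid`, `Face` and `surfaceFaces` are made-up names for illustration, and cells outside the grid are treated as empty, so the outer boundary also produces faces.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One quad of the surface, described by the solid cell it belongs to.
struct Face {
    int x, y, z;  // solid cell that owns the face
    int axis;     // 0 = x, 1 = y, 2 = z
    int dir;      // -1 or +1: which way the face points along the axis
};

// Hypothetical dense voxel grid; 0 means empty.
struct Grid {
    int sx, sy, sz;
    std::vector<std::uint16_t> cells;

    Grid(int x, int y, int z) : sx(x), sy(y), sz(z), cells(x * y * z, 0) {}

    bool inside(int x, int y, int z) const {
        return x >= 0 && x < sx && y >= 0 && y < sy && z >= 0 && z < sz;
    }
    std::uint16_t& at(int x, int y, int z) {
        assert(inside(x, y, z));
        return cells[(y * sz + z) * sx + x];
    }
    // Cells outside the grid count as empty.
    bool solid(int x, int y, int z) const {
        return inside(x, y, z) && cells[(y * sz + z) * sx + x] != 0;
    }
};

// Walks every row along each axis and emits one face per
// empty->solid or solid->empty transition.
std::vector<Face> surfaceFaces(const Grid& g) {
    std::vector<Face> faces;
    const int size[3] = {g.sx, g.sy, g.sz};
    for (int axis = 0; axis < 3; ++axis) {
        const int u = (axis + 1) % 3, v = (axis + 2) % 3;
        int p[3];
        for (p[u] = 0; p[u] < size[u]; ++p[u])
            for (p[v] = 0; p[v] < size[v]; ++p[v]) {
                bool prev = false;  // start outside the grid: empty
                // Run one step past the end so a solid last cell closes.
                for (p[axis] = 0; p[axis] <= size[axis]; ++p[axis]) {
                    const bool cur = g.solid(p[0], p[1], p[2]);
                    if (cur && !prev)
                        faces.push_back({p[0], p[1], p[2], axis, -1});
                    if (!cur && prev) {
                        int q[3] = {p[0], p[1], p[2]};
                        q[axis] -= 1;  // the solid cell just left
                        faces.push_back({q[0], q[1], q[2], axis, +1});
                    }
                    prev = cur;
                }
            }
    }
    return faces;
}
```

Each `Face` would then be turned into four vertices (or two triangles) and appended to the chunk's vertex buffer. Two touching solid cells share no face in the output, which is where the savings come from.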

ClDumbass

122

Author

October 20, 2010 02:51 AM

Okay, that *sounds* good. Do you know any good places for guides on how to set that up? (I got the check-for-transitions part, but not the VBO part).
But yeah, the check-for-transitions thing should remove an average of 2/3 of the triangles (or more) on rolling-plains style terrain. This is *way* better than just removing the 100% obviously not visible blocks. Well, assuming it works :P

Hodgman

52,717

October 20, 2010 02:58 AM

Yeah, if each block is rendered with its own draw-call, then that's pretty inefficient. Each draw call should have at least dozens of triangles in it, and preferably thousands.
You can picture the GPU like a mini-gun -- your draw call loads it full of ammo (triangles), then spins up the gun, blasts through all that ammo, and then spins down again. If you're not putting enough ammo in each time, then you'll spend most of your time spinning-up instead of firing.

Instead of using display lists, you can create a vertex buffer (VBO), or several of them (e.g. one VBO per chunk). You can dispose of the push/translate/pop calls by translating the vertices yourself before adding them to the VBO.
Once you've added several blocks to the VBO, you can draw all those blocks at once with a single draw-call.
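
As a rough sketch of "translating the vertices yourself", the CPU side could look like the following. `Vec3`, `ChunkMesh` and `appendCube` are hypothetical names, and indexed geometry is an assumption made here to keep the buffer small. The actual upload (`glGenBuffers`, `glBindBuffer`, `glBufferData`) and the single `glDrawElements` call per chunk are left out so the example runs without a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// CPU-side copy of what would be uploaded into one VBO and one index buffer.
struct ChunkMesh {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

// Appends a unit cube whose minimum corner sits at block position
// (bx, by, bz). The translation is baked into the vertices, so no
// push/translate/pop is needed at draw time.
void appendCube(ChunkMesh& mesh, int bx, int by, int bz) {
    // Corner i has offset (i & 1, (i >> 1) & 1, (i >> 2) & 1).
    // Two triangles per face, counter-clockwise seen from outside.
    static const std::uint32_t kCubeIndices[36] = {
        0, 4, 6, 0, 6, 2,   // -X
        1, 3, 7, 1, 7, 5,   // +X
        0, 1, 5, 0, 5, 4,   // -Y
        2, 6, 7, 2, 7, 3,   // +Y
        0, 2, 3, 0, 3, 1,   // -Z
        4, 5, 7, 4, 7, 6};  // +Z

    const std::uint32_t base = static_cast<std::uint32_t>(mesh.vertices.size());
    for (int i = 0; i < 8; ++i) {
        mesh.vertices.push_back({float(bx + (i & 1)),
                                 float(by + ((i >> 1) & 1)),
                                 float(bz + ((i >> 2) & 1))});
    }
    for (std::uint32_t idx : kCubeIndices) {
        assert(idx < 8);
        mesh.indices.push_back(base + idx);  // offset into this chunk's buffer
    }
}
```

Calling `appendCube` for every visible block of a chunk fills one `ChunkMesh`. That mesh is uploaded once and redrawn every frame with a single draw call until the chunk changes.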

I don't have any VBO tutorials that I can recommend, but hopefully google will be of help.

. 22 Racing Series .

ClDumbass

122

Author

October 20, 2010 03:36 AM

So are you saying that the most efficient size for the vertex buffer is essentially however big I can make it?
For example, if I made it the entire 16x256x16 chunk (which would give me 64*64 such buffers per region) would that be too much? (Keep in mind that each area has hyper-simplistic information.)
Where would the trade-off be between taking too long to update the buffer, drawing unnecessary stuff, and VRAM use? (Though I suppose it's safe to assume that on most systems I'm not going to overuse VRAM with this kind of polygon information.)
Or should I just start experimenting?
Hmm... all sorts of questions...
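
To put rough numbers on these questions, the sizes quoted in the thread can be plugged into a few lines of arithmetic. This is only a back-of-the-envelope sketch: `buffersInView` is a made-up helper that counts the buffers overlapping a square of side 2 * view distance, ignoring chunk alignment and any frustum or occlusion culling.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the thread.
constexpr std::uint64_t kChunkX = 16, kChunkY = 256, kChunkZ = 16;
constexpr std::uint64_t kRegionChunksX = 64, kRegionChunksZ = 64;

constexpr std::uint64_t blocksPerChunk() { return kChunkX * kChunkY * kChunkZ; }

// Raw block storage only (unsigned shorts), not vertex data.
constexpr std::uint64_t bytesPerChunk() {
    return blocksPerChunk() * sizeof(std::uint16_t);
}
constexpr std::uint64_t bytesPerRegion() {
    return bytesPerChunk() * kRegionChunksX * kRegionChunksZ;
}

// Rough number of buffers (and so draw calls) around the viewer when each
// chunk column is split into `verticalSlices` separate buffers.
std::uint64_t buffersInView(std::uint64_t viewDistance,
                            std::uint64_t chunkSide,
                            std::uint64_t verticalSlices) {
    assert(chunkSide > 0 && verticalSlices > 0);
    const std::uint64_t perAxis = (2 * viewDistance + chunkSide - 1) / chunkSide;
    return perAxis * perAxis * verticalSlices;
}
```

With these figures a chunk holds 65,536 blocks (128 KiB of raw block data), and a full 64x64 region comes to 512 MiB before any geometry is built. At a 300-block view distance, `buffersInView(300, 16, 1)` gives 1,444 full-height column buffers, while splitting each column into sixteen 16x16x16 pieces (`buffersInView(300, 16, 16)`) gives 23,104. The larger buffers mean fewer draw calls, but more work each time one of them has to be rebuilt.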

Either way you guys are getting me much closer to on track, so thanks a bunch for that! I won't be able to do any solid coding tomorrow as I have work followed by a tabletop RPG session (Pathfinder RPG for those familiar). I'll probably get something done Thursday and Saturday nights my time.

If I can reach my goal of playable FPS at 300 blocks distant I'll be mega-happy :P

SuperVGA

1,303

October 20, 2010 04:13 AM

In some sense, yes; the more ammo the better. But some shots
will miss anyway, and you can precompute that to speed things up.
Furthermore, you need to switch between anti-personnel and anti-panzer
ammo sometimes. ...

Ok I'm done with the metaphors as I suck at them. :/

The thing is, the bigger the VBO, the longer it will take to recompute it
when it's changing (if your terrain isn't static) - You will probably
need to compute it anyways when it needs to be loaded in. Unless
of course you have a VBO cache in which case that would need to be loaded in,
again costing you precious time.

Also, even though the sides of a cell facing away from the viewer are culled,
on a larger scale with large VBOs, many things are left unculled.
For instance: if the viewer is standing in the middle of a block, the
invisible primitives will still be considered, and there may be solid cells close
by that would occlude entire blocks further away.

So what I'm saying is that even though the graphics card likes lots of data in one place,
you may be better off finding a balance between (re)loading times, your own occlusion algorithms, and the size of the blocks pushed to the card. :)
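
One common way to keep the rebuild cost under control, sketched here as an assumption rather than as anything described in the thread, is to mark a chunk dirty when it is edited and rebuild only a limited number of dirty chunks per frame. `MeshedChunk`, `rebuildMesh` and `rebuildDirty` are hypothetical names, and the mesh rebuild itself is only a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk that remembers whether its vertex buffer is stale.
struct MeshedChunk {
    std::vector<std::uint16_t> blocks;
    bool dirty = true;  // new chunks need a first mesh build
    int rebuilds = 0;   // only here to make the effect observable

    explicit MeshedChunk(std::size_t blockCount) : blocks(blockCount, 0) {}

    void setBlock(std::size_t i, std::uint16_t value) {
        assert(i < blocks.size());
        if (blocks[i] != value) {
            blocks[i] = value;
            dirty = true;  // the buffer no longer matches the blocks
        }
    }

    // Placeholder: a real version would regenerate the surface quads
    // and re-upload the vertex buffer here.
    void rebuildMesh() {
        ++rebuilds;
        dirty = false;
    }
};

// Rebuilds at most `budget` dirty chunks; returns how many were rebuilt.
int rebuildDirty(std::vector<MeshedChunk>& chunks, int budget) {
    int done = 0;
    for (MeshedChunk& c : chunks) {
        if (done >= budget) break;
        if (c.dirty) {
            c.rebuildMesh();
            ++done;
        }
    }
    return done;
}
```

Calling `rebuildDirty` once per frame with a small budget spreads the cost of loading or editing terrain over several frames. The smaller the chunks, the cheaper each rebuild, at the price of more draw calls.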

When viewing across my 128^3 cube with a huge sphere (r=64) carved out of the center, I get approx. 300 fps with my optimizations on my 275 GTX.
But if I place a smaller sphere (r=16) in the middle of this, the rays slow down around the edges, which results in ~100 fps.
For terrain on a larger, flatter area I expect around 400, as terrain close to the viewer occludes valleys further away. But in optimal situations,
like rooms of around size 64^3 with nearly straight walls, I get around 1500 fps.
So it all depends... :(

[Edited by - SuperVGA on October 20, 2010 5:13:02 AM]

Heaven

601

October 23, 2010 12:07 AM

So raytracing...but tracing BLOCKS instead of pixels...wouldn't that be one affordable alternative?

Florida, USA
Current Project
Jesus is LORD!

SuperVGA

1,303

October 23, 2010 04:02 AM

Quote:Original post by Heaven
So raytracing...but tracing BLOCKS instead of pixels...wouldn't that be one affordable alternative?

Well, raycasting to start with. ;)
If you mean blocks (of voxels) instead of single voxels, you're right, that should be affordable.
But when it comes to cases where
the blocks are not completely empty or completely solid,
it gets difficult (for me at least :/).
Otherwise it would just be the same technique as with voxels.


This topic is closed to new replies.
