Sorting/structuring renderables and cache locality

Lemmi · 2014-09-12T08:51:26

Hi, so I'm building a graphics engine for fun, and I've been thinking about how to approach renderable sorting for the different passes (I'm doing deferred rendering). I've heard that you can make huge gains by sorting everything so that memory access is linear for each pass. The problem comes when I want to reuse the same renderables for several different passes during the same frame.

First of all, my knowledge of how a modern CPU cache actually works is very rudimentary, so I'm mostly going off assumptions here. Please do correct me if I'm wrong at any point, and don't hesitate to ask for clarification if I'm making no sense.

My current idea is to keep a large, preallocated buffer where I store all the renderables (transforms and meshes, bundled with material and texture handles, flyweight-pattern style) that got through culling each frame update. Then I would keep a separate index/handle "list" (not necessarily an actual list) per render pass, holding handles or direct indices into the renderable array. This way I can access the same renderable from several different passes without copying or moving renderables around: I'd just pass in a pointer to the renderable array and, for each pass, access the relevant renderables through that pass's index list. This essentially means that I never sort the actual renderable array; I only sort the index lists by things like depth or translucency, depending on the pass.

Now comes my question: would this be inefficient because I'd essentially be accessing random indices in the big renderable array? The cache would have no good way to predict where I'd access next, so I'd probably get tons of cache misses. Despite this, it feels like a flexible and hopefully workable approach.

How do real, good engines deal with this sort of thing? Should I just not bother thinking about how the cache handles it?
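A minimal C++ sketch of the index-list idea described above, under stated assumptions: the Renderable fields and both function names are hypothetical, and the gather step is just one possible way to get linear reads back in the draw loop, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical renderable record; the fields are illustrative only.
struct Renderable {
    float         depth;       // some distance metric from the camera
    std::uint32_t materialId;  // handle into a material table
};

// Build a per-pass index list ordered front to back.
// The renderable array itself is never reordered.
std::vector<std::uint32_t> buildFrontToBackList(const std::vector<Renderable>& renderables) {
    std::vector<std::uint32_t> indices(renderables.size());
    std::iota(indices.begin(), indices.end(), 0u);  // 0, 1, 2, ...
    std::sort(indices.begin(), indices.end(),
              [&renderables](std::uint32_t a, std::uint32_t b) {
                  return renderables[a].depth < renderables[b].depth;
              });
    return indices;
}

// Optional step: copy the small per-draw records into pass order so the
// draw loop walks memory linearly instead of jumping through the indices.
std::vector<Renderable> gatherInOrder(const std::vector<Renderable>& renderables,
                                      const std::vector<std::uint32_t>& order) {
    std::vector<Renderable> out;
    out.reserve(order.size());
    for (std::uint32_t i : order) out.push_back(renderables[i]);
    return out;
}
```

Whether the gather copy pays for itself depends on how large each record is and how many passes reuse it, so it is worth profiling both variants.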

Started by Lemmi September 06, 2014 03:52 PM

13 comments, last by Lemmi 9 years, 7 months ago

_the_phantom_

11,263

September 09, 2014 04:46 PM

That assumes one draw call per object. Also, by the time you get to the 'sort draw calls' stage you'll already have done a lot of dead-object removal, so you should never see a 'dead' object in the draw call lists you are sorting.

At the highest 'game' level you'd be tracking the game entity with which any attached renderables (1 or more draw calls each) are associated; when the entity dies, the renderer never sees its renderables.

Vis-culling per "camera", again above renderer submission, takes care of visible objects for a given scene.

Only once you get beyond vis-culling do you start breaking renderables down into their draw-call components and start sorting them and routing them to the correct passes for a scene.
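The last stage described above could look roughly like this in C++. It is a sketch only: every type and function name is hypothetical, and the visibility flag is assumed to have been set by an earlier per-camera culling step.

```cpp
#include <vector>

// All names below are hypothetical, for illustration only.
enum class Pass { Opaque, Translucent };

struct DrawCall {
    int  meshPart;  // which sub-mesh this call draws
    Pass pass;      // which pass it belongs to
};

struct CulledRenderable {
    bool                  visible;  // result of vis-culling, computed earlier
    std::vector<DrawCall> draws;    // one or more draw calls per renderable
};

struct PassQueues {
    std::vector<DrawCall> opaque;
    std::vector<DrawCall> translucent;
};

// Break the visible renderables down into draw calls and route each
// call to the queue of its pass. Sorting happens afterwards, per queue.
PassQueues routeVisible(const std::vector<CulledRenderable>& renderables) {
    PassQueues queues;
    for (const CulledRenderable& r : renderables) {
        if (!r.visible) continue;  // culled objects never reach the queues
        for (const DrawCall& d : r.draws) {
            if (d.pass == Pass::Opaque) queues.opaque.push_back(d);
            else                        queues.translucent.push_back(d);
        }
    }
    return queues;
}
```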

Norman Barrows

7,180

September 09, 2014 09:40 PM

note that sort order ought to be based on binding times.

slowest to fastest these appear to be (someone correct me if i'm wrong):

1. texture

2. mesh

3. material

4. constants

5. transforms

6. other flags

so for no alpha blend, sort on tex, then mesh, then material, then constants (perhaps?), then near to far, then draw your instances with your various transforms, setting other flags on the fly as needed (using a state manager, of course).

for alpha blend it's the same, but sort far to near.

i personally don't sort on constants, as i'm not using shader code in my current project. one of the shader coders here can tell you if it's worth it or not. it may depend on the shader model used; as i recall, constant bind times were slow in some early shader models. but i defer to those here with more shader experience who may be able to elaborate on that point...
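The non-blended ordering above (texture, then mesh, then material, then near to far) can be sketched in C++ as a multi-field comparison. The struct and its fields are hypothetical, constants are left out of the key, and the blended case is deliberately not covered because correct blending usually needs depth to be the dominant key.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record for the opaque pass.
struct OpaqueDraw {
    std::uint16_t textureId;
    std::uint16_t meshId;
    std::uint16_t materialId;
    float         depth;  // smaller means nearer to the camera
};

// Order by the most expensive state change first, then near to far.
void sortOpaque(std::vector<OpaqueDraw>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const OpaqueDraw& a, const OpaqueDraw& b) {
                  // std::tie compares field by field, left to right.
                  return std::tie(a.textureId, a.meshId, a.materialId, a.depth) <
                         std::tie(b.textureId, b.meshId, b.materialId, b.depth);
              });
}
```

The relative cost of each kind of state change varies by API and hardware, so the field order in the comparison is the part to tune after profiling.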

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Lemmi

126

Author

September 11, 2014 06:15 PM

Okay! I had no internet access for about a week (feels like a year). This is all super good advice and I will definitely take all of it into consideration when moving forward. However, it just occurred to me that I'm unsure how I want to produce and store scene depth during the pre-render transform pass.

I'm quite sure I can't just use the normal transforms as they are, because they're probably in world space or object (local) space.

Should I, as an extra step, transform everything by the camera's view matrix to get the depth from the camera's point of view, then store that and use it for sorting?

My intent is to store my depth buffer linearly, the way MJP describes in his excellent tutorials.

See, last time I did something like this, I did no sorting at all: I did all transforms in the vertex shader, multiplying every vertex by WVP or whatever was needed, and just let the depth buffer resolve visibility.

To roughly sketch out what I'm imagining here:

Game's pre-render update pass:

for (each renderable)
{
    // EITHER THIS:
    // transform the renderable into camera view space and take the depth from the camera's POV
    float renderableDepth = (renderable.transform * camera.viewMatrix).z;

    // OR THIS:
    // simply calculate a rough distance between camera and renderable. When we've reached this point,
    // we've already culled away all the objects outside the camera frustum, so it should be pretty OK?
    float renderableDepth = vector3Distance(renderable.position, camera.position); // returns a length as a single float

    // No matter the method, we'd finally do this:
    // encode the depth somewhere within the flags variable. Assumed here: flags is 64 bits wide and
    // the depth has already been scaled to 0..65535, so it fits in the low 16 bits.
    renderable.flags = (renderable.flags & ~0xffffULL) | (uint64_t(renderableDepth) & 0xffff); // Don't pay too much attention to how I pack it. I can never remember bitshifting syntax without looking it up.
}

Then later on:

void Sort(all the renderables)
{
    for (each renderable)
    {
        // Sort based on... the flags? Just straight up sort on which value is lowest,
        // with the lowest value also indicating the first textures/materials/meshes?
        // Perhaps compare on textures first, then meshes etc., as was suggested above?
    }

    for (each renderable)
    {
        // And then, after it's been sorted once by the first 32 bits of the flags
        // (where we'd possibly store all those things), we sort it again by depth?
    }
}

I'm sorry for being so dense. ;)

I am also of course aware that I'll have to profile this to see if I even gain anything at all by sorting, but I sort of just want to try making a system of this sort either way, as I think it could be very useful to understand the techniques.
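On the "sort again by depth?" question, one option is to pack the state IDs into the high bits of a single integer and the quantized depth into the low bits, so that one sort on that integer orders by state first and depth second. The C++ below is a sketch under assumptions: the 16-bit field widths, the bit layout, and both function names are hypothetical, and nothing in this thread confirms this exact packing.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical 64-bit layout, most significant field first:
// [ texture:16 | mesh:16 | material:16 | depth:16 ]

// Map a depth in [0, maxDepth] onto 0..65535; out-of-range values are clamped.
std::uint16_t quantizeDepth(float depth, float maxDepth) {
    float t = std::max(0.0f, std::min(depth / maxDepth, 1.0f));
    return static_cast<std::uint16_t>(t * 65535.0f);
}

// Fields placed in higher bits dominate the comparison, so comparing two
// keys as plain integers compares texture, then mesh, then material, then depth.
std::uint64_t makeSortKey(std::uint16_t texture, std::uint16_t mesh,
                          std::uint16_t material, std::uint16_t depthBits) {
    return (static_cast<std::uint64_t>(texture)  << 48) |
           (static_cast<std::uint64_t>(mesh)     << 32) |
           (static_cast<std::uint64_t>(material) << 16) |
            static_cast<std::uint64_t>(depthBits);
}
```

With keys built this way, a plain std::sort over the keys (or over key/index pairs) replaces the two sorting loops sketched above with a single pass.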

_the_phantom_

11,263

September 11, 2014 07:56 PM

For distance you'll want the second version. However, instead of working out the actual distance, stick with the squared distance: it is cheaper to calculate because it doesn't need the square root, and it does the same job for sorting because it preserves the ordering.
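A small C++ sketch of the squared-distance metric; Vec3 and distanceSquared are hypothetical names, not taken from any particular math library.

```cpp
// Hypothetical minimal vector type.
struct Vec3 {
    float x, y, z;
};

// Squared Euclidean distance: no square root needed. For non-negative
// distances, a < b exactly when a*a < b*b, so sorting by this value
// gives the same order as sorting by the true distance.
float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}
```

One caveat if the value is later quantized into a few bits: squared distance grows faster than distance, so the buckets are spread differently than they would be for the true distance.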

Lemmi

126

Author

September 12, 2014 08:51 AM

_the_phantom_ said: "For distance you'll want the second version, however instead of working out the distance stick with the squared distance as it is cheaper to calculate as it doesn't need the square root operation, and does the same job."

You're right and I agree! I'll also go ahead and assume that my thoughts surrounding the sorting approach are at least somewhat on the right track. I'll carefully re-read all the posts before acting.

This topic is closed to new replies.
