# Lemmi

1. ## Sorting/structuring renderables and cache locality

You're right and I agree! I'll also go ahead and assume that my thoughts surrounding the sorting approach are at least somewhat on the right track. I'll carefully re-read all the posts before acting.
2. ## Sorting/structuring renderables and cache locality

Okay! I had no internet access for about a week (feels like a year). This is all super good advice and I will definitely take all of it into consideration when moving forward. However, it just occurred to me that I'm unsure how I want to produce and store scene depth during the pre-render transform pass.

I'm quite sure I can't just use the normal transforms as they are, because they're probably in world-space or own-space. Should I, as an extra step, transform everything by the camera's view matrix to produce the depth from the camera's point of view, and then store that and use it for sorting?

My intent is to store my depth buffer linearly, the way MJP describes in his excellent tutorials.

See, last time I did something like this, I did no sorting at all. I just did all transforms in the vertex shader, where I'd multiply every vertex by WVP or whatever was needed, and I let the shader sort out depth through painter's algorithm.

To roughly sketch out what I'm imagining here:

Game's pre-render update pass:

    for(each renderable)
    {
        //EITHER THIS VVVVVV
        //transform renderable to camera viewspace and get the depth from camera POV
        float renderableDepth = (renderable.transform * camera.viewMatrix).z

        //OR THIS VVVVV
        //simply calculate a rough distance between camera and renderable. when we've reached this point,
        //we've already culled away all the objects that are outside the camera frustum,
        //so it should be pretty OK?
        float renderableDepth = vector3Distance(renderable.position, camera.position) //returns a length as a single float

        //no matter the method, we'd finally do this
        //encode the depth somewhere within the flags variable
        (renderable.flags & 0x0000ffff) |= renderableDepth; //Don't pay too much attention to how I pack it. I can never remember bitshifting syntax without looking it up.
    }

Then later on:

    void Sort(all the renderables)
    {
        for(each renderable)
        {
            Sort based on... flag? just straight up sort on which value is lowest?
            and that the lowest value also indicates the first textures/materials/meshes..
            Perhaps first compare it on textures, then meshes etc, as was suggested above?
        }

        for(each renderable)
        {
            and then after it's been sorted once by the first 32bits of the flag (where we'd possibly store all those things)
            we sort it again by depth?
        }
    }

I'm sorry for being so dense. ;)

I am also of course aware that I'll have to profile this to see if I even gain anything at all by sorting, but I sort of just want to try making a system of this sort either way, as I think it could be very useful to understand the techniques.
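For readers following along, here is a minimal C++17 sketch of the depth step asked about in this post. It is not taken from the thread. `Vec3`, `Camera`, `Renderable`, `viewDepth`, `quantizeDepth`, `packDepth` and `sortFrontToBack` are all hypothetical names, and the 16-bit depth slot in the low bits of the key is an assumed layout. The sketch takes the "view-space depth" option by projecting the camera-to-object vector onto the camera's forward direction, which should match the z of the view-space position up to sign convention without a full matrix multiply. Plain distance would probably also give a usable sort order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical camera: position, unit-length forward vector, clip distances.
struct Camera {
    Vec3 position;
    Vec3 forward;  // assumed to be normalized
    float nearZ;
    float farZ;
};

struct Renderable {
    Vec3 position;
    std::uint64_t key = 0;  // assumed layout: low 16 bits hold quantized depth
};

// Linear depth along the view direction (dot product with the forward vector).
inline float viewDepth(const Camera& cam, const Vec3& p) {
    return (p.x - cam.position.x) * cam.forward.x +
           (p.y - cam.position.y) * cam.forward.y +
           (p.z - cam.position.z) * cam.forward.z;
}

// Map a depth in [nearZ, farZ] to 0..65535, clamping values outside the range.
inline std::uint16_t quantizeDepth(float depth, float nearZ, float farZ) {
    assert(nearZ < farZ);
    float t = std::clamp((depth - nearZ) / (farZ - nearZ), 0.0f, 1.0f);
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

// Clear the low 16 bits of the key, then OR the quantized depth in.
inline void packDepth(Renderable& r, std::uint16_t depth) {
    r.key = (r.key & ~std::uint64_t{0xFFFF}) | depth;
}

// Write the depth into every key, then sort ascending (nearest first
// when the upper key bits are equal).
inline void sortFrontToBack(std::vector<Renderable>& rs, const Camera& cam) {
    for (auto& r : rs) {
        packDepth(r, quantizeDepth(viewDepth(cam, r.position), cam.nearZ, cam.farZ));
    }
    std::sort(rs.begin(), rs.end(),
              [](const Renderable& a, const Renderable& b) { return a.key < b.key; });
}
```

Because the depth is quantized to an integer, the float never has to be reinterpreted as bits, which sidesteps the packing problem mentioned in the pseudocode above.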
3. ## Sorting/structuring renderables and cache locality

I like the idea about packing different (small) indices into a 64bit structure. Regarding all the shaders, textures and such, I was currently thinking about using the flyweight pattern, which is pretty much what you're describing, I think. It's also what I used last time and it worked fine.

Am I to understand that you suggest I'd do something like this?:

    struct Renderable
    {
        InstanceData transform; //either pos+quat or a 4x4 matrix, I guess. possibly other things? Locally stored copy, right? because we want optimal data locality
        long key; //or potentially even a 128bit structure
    }

    struct RenderPass
    {
        vector<Renderable> renderables;
        vector<shaderIndices> shaders; //or shader pointers
    }

And then each render pass would:

1) Sort each renderable based on, for example, their transform's camera depth
2) Apply each shader to each renderable, where you'd fetch from localized and sorted mesh/texture/material arrays somewhere else, using the handles that you'd extract from the renderable 'key' variable.

Or you can sort all renderables once per frame based on their transform camera depth and THEN insert them by going from the back to the front and pushing them back into every render pass that they've been flagged for. That's probably better.

I guess that'd mean that the actual entity would also need another set of flags then, so that the renderer knows which passes I want to insert it into.

Why would I sort by the key and not the transform? To optimize texture/resource usage and bundling render calls? How would I go about doing that on the fly? Rebuilding and merging vertex buffers every frame and doing some sort of semi-instancing? Or do you mean just sorting them by what textures/materials they use so that I can send those resources once and then render several meshes without changing anything? I can't remember if that was possible to do in DirectX. I'm going to be using OpenGL, btw, if that is in any way relevant and helps the discussion.

Edit: Oh, yeah. Of course. You want to sort by the keys because then you know that you'd constantly be accessing the closest meshes/textures/materials every time you move to the next renderable.

Btw are you implying that I should sort by both transform and key? Say, I first sort by keys, and then again, within all renderables that have the identical key (unlikely), sort again by transforms?
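One way to picture the "sort by both" question is a single packed key, sketched below in C++. The field order (shader, texture, mesh, depth, 16 bits each) and the names `makeKey`, `shaderOf`, `depthOf`, `DrawItem`, `sortByKey` and `countShaderSwitches` are assumptions for illustration, not something specified in the thread. With depth in the lowest bits, one ascending sort groups items by state first and orders items with identical state by depth, so a second sort would not be needed under this layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout, most significant first: shader | texture | mesh | depth.
constexpr std::uint64_t makeKey(std::uint16_t shader, std::uint16_t texture,
                                std::uint16_t mesh, std::uint16_t depth) {
    return (std::uint64_t{shader} << 48) | (std::uint64_t{texture} << 32) |
           (std::uint64_t{mesh} << 16) | std::uint64_t{depth};
}

constexpr std::uint16_t shaderOf(std::uint64_t key) {
    return static_cast<std::uint16_t>(key >> 48);
}

constexpr std::uint16_t depthOf(std::uint64_t key) {
    return static_cast<std::uint16_t>(key & 0xFFFF);
}

// Small sortable record: the key plus an index into per-instance data
// (transforms and so on) stored elsewhere.
struct DrawItem {
    std::uint64_t key;
    std::uint32_t instance;
};

// One ascending sort: state fields dominate, depth breaks ties.
inline void sortByKey(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.key < b.key; });
}

// Counts how often the shader field changes between neighbouring items,
// as a rough stand-in for the number of shader binds a pass would issue.
inline std::size_t countShaderSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (shaderOf(items[i].key) != shaderOf(items[i - 1].key)) ++switches;
    }
    return switches;
}
```

Moving the depth field to the top of the key would instead give a strict depth order with state as the tie-breaker, which is the kind of trade-off a translucent pass might want.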
4. ## Sorting/structuring renderables and cache locality

Yes! Okay. Thank you for this advice. I'm fully aware that premature optimization is wrong, but last time I wrote a "graphics engine", I spent 1½ years regretting that I had made a bunch of stupid design mistakes that would be really hard to fix, so this time around I'd rather overthink than underthink! :) Also, I'm a dummy. By data redundancy, do you mean that you actually have several instances of the same object in different places for better data locality?
5. ## Sorting/structuring renderables and cache locality

Hi, so I'm building a graphics engine for fun, and I've been thinking about how to approach renderable sorting for the different passes (I'm doing deferred rendering). I'd heard about how you can make huge gains by sorting everything so that access is linear for each pass. The problem for me comes when I want to re-use the same renderables for several different passes during the same frame.

First of all, I want to start off by saying that my knowledge of how the modern CPU cache actually works is very rudimentary, so I'm mostly going off assumptions here. Please do correct me if I am wrong at any point. Also don't hesitate to ask for clarifications if I'm making no sense.

My current idea would be to keep a large, preallocated buffer where I store all the renderables (transforms, meshes, bundled with material and texture handles, flyweight pattern style) that got through culling each frame update. Then I would keep different index/handle "lists" (not necessarily actual lists), one per render pass, with handles or direct indices into the renderable array.

This way I can access the same renderable from several different passes. I don't have to copy or move the renderables around. I'd just send in a pointer to the renderables array and then, for each pass, access all the relevant renderables through the index lists. This would essentially mean that I never sort the actual renderables array; I'd only sort the index lists by things like depth or translucency (depending on the pass).

Now comes my question: would this be inefficient because I'd essentially be randomly accessing different indices in the big renderable array? The cache would have no real good way to predict where I'd be accessing next, so I'd probably be getting tons of cache misses. I just feel that despite this, it's a flexible and hopefully workable approach. How do real, good engines deal with this sort of thing? Should I just not bother thinking about how the cache handles it?
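The index-list idea described in this post could look roughly like the C++ sketch below. It is only an illustration under assumptions: `Renderable` is cut down to a depth and a translucency flag, and `IndexList`, `buildPass` and `sortPass` are hypothetical names. The shared array is never reordered; each pass sorts only its own list of 32-bit indices.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for the real renderable (transform, mesh and material handles
// would live here too). Only the fields the sketch needs are shown.
struct Renderable {
    float depth;
    bool translucent;
};

using IndexList = std::vector<std::uint32_t>;

// Collect the indices of all renderables that belong in one pass.
inline IndexList buildPass(const std::vector<Renderable>& all, bool wantTranslucent) {
    IndexList out;
    for (std::uint32_t i = 0; i < all.size(); ++i) {
        if (all[i].translucent == wantTranslucent) out.push_back(i);
    }
    return out;
}

// Sort only the index list. The comparator reads depth through the index,
// which is the indirect (potentially cache-unfriendly) access the post asks about.
inline void sortPass(IndexList& pass, const std::vector<Renderable>& all, bool backToFront) {
    std::sort(pass.begin(), pass.end(),
              [&](std::uint32_t a, std::uint32_t b) {
                  return backToFront ? all[a].depth > all[b].depth
                                     : all[a].depth < all[b].depth;
              });
}
```

If profiling shows the indirect reads to be a problem, one commonly suggested variation is to store a small sort key next to each index so the sort itself only touches contiguous memory. Whether that matters at a given scene size is something only measurement can answer.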

7. ## Deferred rendering - Point light attenuation problem

Yeah, I changed it to position.xy = texCoord.xy; a while back. It didn't change anything. The reason I had it that way was because I had seen it done like that in some other samples. Pretty much just changing values around and crossing my fingers at this point.
8. ## Deferred rendering - Point light attenuation problem

I have to admit, I don't really understand all of the math behind this, so please do assume that the math is wrong; I think that's for the best. I tried what you suggested and it resulted in virtually nothing being rendered whatsoever, but hey, that solved my first problem! ;) It's, uh, hard to explain, but it did render if I had my camera at a very specific angle and was looking at it with the edge of my screen. You are of course right about the division by zero thing; that was silly of me.