Member Since 06 Mar 2010
Offline Last Active Sep 12 2014 02:49 AM

Posts I've Made

In Topic: Sorting/structuring renderables and cache locality

12 September 2014 - 02:51 AM

For distance you'll want the second version. However, instead of working out the actual distance, stick with the squared distance: it is cheaper to calculate because it doesn't need the square root operation, and it sorts in the same order.

You're right and I agree! I'll also go ahead and assume that my thoughts surrounding the sorting approach are at least somewhat on the right track. I'll carefully re-read all the posts before acting.
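To make the squared-distance suggestion concrete, here is a minimal sketch. The `Vec3` and `Renderable` types and the `sortDistSq` field are made up for illustration and are not from any particular engine. Because the square root is monotonic, comparing squared distances gives the same ordering as comparing real distances.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

// Squared Euclidean distance: three subtractions, three multiplies, no sqrt.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Hypothetical renderable holding only what the sort needs.
struct Renderable {
    Vec3 position;
    float sortDistSq = 0.0f;  // cached so the comparator stays cheap
};

// Sorts nearest-first (front to back) relative to the camera position.
inline void sortFrontToBack(std::vector<Renderable>& items, const Vec3& cameraPos) {
    for (Renderable& r : items) {
        r.sortDistSq = distanceSquared(r.position, cameraPos);
    }
    std::sort(items.begin(), items.end(),
              [](const Renderable& a, const Renderable& b) {
                  return a.sortDistSq < b.sortDistSq;
              });
}
```

Flipping the comparison to `>` would give back-to-front order, which is what transparent objects usually want.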

In Topic: Sorting/structuring renderables and cache locality

11 September 2014 - 12:15 PM

Okay! I had no internet access for about a week (feels like a year). This is all super good advice and I will definitely take all of it into consideration when moving forward. However, it just occurred to me that I'm unsure how to produce and store scene depth during the pre-render transform pass.


I'm quite sure I can't just use the normal transforms as they are, because they're probably in world-space or own-space.


Should I just as an extra step transform everything by the camera's view matrix to produce the depth from the camera's point of view and then store that and use that for sorting? 


My intent is to store my depth buffer linearly, the way MJP describes in his excellent tutorials.


See, last time I did something like this, I did no sorting at all. I did all transforms in the vertex shader, where I'd multiply every vertex by WVP or whatever was needed, and just let the depth buffer sort out visibility instead of ordering the draws painter's-algorithm style.


To roughly sketch out what I'm imagining here:


Game's pre-render update pass:


for (each renderable)
{
  // Option 1: transform the renderable to camera view space and get the depth from the camera's POV
  float renderableDepth = (renderable.transform * camera.viewMatrix).z;

  // Option 2: simply calculate a rough distance between camera and renderable. When we've reached this point,
  // we've already culled away all the objects that are outside the camera frustum, so it should be pretty OK?
  float renderableDepth = vector3Distance(renderable.position, camera.position); // returns a length as a single float

  // No matter the method, we'd finally encode the depth somewhere within the flags variable.
  // Don't pay too much attention to how I pack it. I can never remember bitshifting syntax without looking it up.
  renderable.flags = (renderable.flags & 0xffff0000) | (uint16_t)renderableDepth;
}
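A hedged sketch of how the depth could be produced and packed. Everything here is assumed for illustration: the `Camera` struct with a normalized `forward` vector, the 16-bit depth budget, and the choice of the low 16 bits of `flags`. The view-space depth of a point is its offset from the camera projected onto the camera's forward direction, which matches the view-space z component up to the sign convention (OpenGL-style view matrices look down -Z).

```cpp
#include <algorithm>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical camera: a position plus a normalized forward direction.
struct Camera {
    Vec3 position;
    Vec3 forward;
};

// Depth along the camera's view direction: dot(point - cameraPos, forward).
inline float viewDepth(const Camera& cam, const Vec3& p) {
    return (p.x - cam.position.x) * cam.forward.x +
           (p.y - cam.position.y) * cam.forward.y +
           (p.z - cam.position.z) * cam.forward.z;
}

// Maps a depth in [nearZ, farZ] linearly onto 0..65535, clamping outliers.
inline std::uint16_t quantizeDepth(float depth, float nearZ, float farZ) {
    float t = (depth - nearZ) / (farZ - nearZ);
    t = std::min(std::max(t, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(t * 65535.0f + 0.5f);
}

// Writes the quantized depth into the low 16 bits and keeps the upper 16 bits.
inline std::uint32_t packDepth(std::uint32_t flags, std::uint16_t depthBits) {
    return (flags & 0xFFFF0000u) | depthBits;
}
```

Quantizing to an integer keeps the ordering of the float depth (at reduced precision), so the packed value can be compared as a plain integer when sorting.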



Then later on:


void Sort(all the renderables)
{
  for (each renderable)
    // Sort based on... the flags? Just straight up sort on which value is lowest,
    // where the lowest value also indicates the first textures/materials/meshes?
    // Perhaps first compare on textures, then meshes etc., as was suggested above?

  for (each renderable)
    // And then, after it's been sorted once by the first 32 bits of the flags
    // (where we'd possibly store all those things), we sort it again by depth?
}
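One way to avoid two separate sorting passes is a single sort with a lexicographic comparator: compare on texture first, then mesh, and only fall back to depth when those are equal. This is just a sketch; the `DrawItem` fields are hypothetical stand-ins for whatever the flags would encode.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record; the field names are illustrative.
struct DrawItem {
    std::uint16_t textureId;
    std::uint16_t meshId;
    float depth;
};

// Single pass: texture is the most significant criterion, then mesh,
// and depth only breaks ties between items sharing both.
inline void sortDrawItems(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.textureId, a.meshId, a.depth) <
                         std::tie(b.textureId, b.meshId, b.depth);
              });
}
```

Whether state (texture/mesh) or depth should be the most significant criterion depends on the pass, so treat the ordering above as one possible choice rather than a rule.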




I'm sorry for being so dense. ;)


I am also of course aware that I'll have to profile this to see if I even gain anything at all by sorting, but I sort of just want to try making a system of this sort either way, as I think it could be very useful to understand the techniques.

In Topic: Sorting/structuring renderables and cache locality

07 September 2014 - 09:03 AM

I like the idea of packing different (small) indices into a 64-bit structure. Regarding all the shaders, textures and such, I was thinking about using the flyweight pattern, which is pretty much what you're describing, I think. It's also what I used last time and it worked fine.



Am I to understand that you suggest I'd do something like this?:



struct Renderable
{
  InstanceData transform; // either pos+quat or a 4x4 matrix, I guess. Possibly other things? Locally stored copy, right? Because we want optimal data locality
  uint64_t key;           // 64 bits, or potentially even a 128-bit structure
};

struct RenderPass
{
  vector<shaderIndices> shaders; // or shader pointers
};




And then each render pass would:

1) Sort the renderables based on, for example, their transform's camera depth

2) Apply each shader to each renderable, fetching from localized and sorted mesh/texture/material arrays somewhere else, using the handles extracted from the renderable's 'key' variable.
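For the "handles extracted from the key" part, a rough sketch of packing and unpacking might look like this. The field widths (16 bits each) and their order are assumptions picked for illustration; a real layout would depend on how many shaders, textures and meshes exist and on which criterion should dominate the sort.

```cpp
#include <cstdint>

// Hypothetical key layout, most significant field first:
// [ shader : 16 | texture : 16 | mesh : 16 | depth : 16 ]
struct KeyFields {
    std::uint16_t shader;
    std::uint16_t texture;
    std::uint16_t mesh;
    std::uint16_t depth;
};

// Packs the small indices into one 64-bit integer.
inline std::uint64_t packKey(const KeyFields& f) {
    return (static_cast<std::uint64_t>(f.shader)  << 48) |
           (static_cast<std::uint64_t>(f.texture) << 32) |
           (static_cast<std::uint64_t>(f.mesh)    << 16) |
            static_cast<std::uint64_t>(f.depth);
}

// Recovers the indices; each cast keeps only the low 16 bits after the shift.
inline KeyFields unpackKey(std::uint64_t key) {
    return KeyFields{
        static_cast<std::uint16_t>(key >> 48),
        static_cast<std::uint16_t>(key >> 32),
        static_cast<std::uint16_t>(key >> 16),
        static_cast<std::uint16_t>(key)
    };
}
```

With the most significant field at the top, a plain integer comparison of two keys orders by shader first, then texture, then mesh, then depth.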



Or you can sort all renderables once per frame based on their transform camera depth and THEN insert them by going from the back to the front and pushing them back into every render pass that they've been flagged for. That's probably better.



I guess that'd mean that the actual entity would also need another set of flags then, so that the renderer knows which passes I want to insert it into.
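A possible shape for those per-pass flags is a bitmask on each renderable, with one bit per pass. The sketch below assumes three made-up passes and a list that has already been sorted once for the frame; it hands out indices to each pass while preserving the sorted order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical passes; one bit each in the renderable's pass mask.
enum PassBit : std::uint32_t {
    PassShadow      = 1u << 0,
    PassOpaque      = 1u << 1,
    PassTransparent = 1u << 2
};
constexpr int kPassCount = 3;

// Hypothetical renderable: a sort key plus the passes it wants to appear in.
struct Renderable {
    std::uint64_t key;
    std::uint32_t passMask;
};

// Takes renderables already sorted for this frame and returns, per pass,
// the indices of the renderables flagged for it, in the same sorted order.
inline std::vector<std::vector<int>> distributeToPasses(
        const std::vector<Renderable>& sorted) {
    std::vector<std::vector<int>> passes(kPassCount);
    for (int i = 0; i < static_cast<int>(sorted.size()); ++i) {
        for (int p = 0; p < kPassCount; ++p) {
            if (sorted[i].passMask & (1u << p)) {
                passes[p].push_back(i);
            }
        }
    }
    return passes;
}
```

Storing indices rather than copies keeps the per-pass lists small; whether copying the data into each pass would be better for locality is something profiling would have to answer.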



Why would I sort by the key and not the transform? To optimize texture/resource usage and bundle render calls? How would I go about doing that on the fly? Rebuilding and merging vertex buffers every frame and doing some sort of semi-instancing? Or do you mean just sorting them by what textures/materials they use, so that I can bind those resources once and then render several meshes without changing anything? I can't remember if that was possible to do in DirectX. I'm going to be using OpenGL, btw, if that is in any way relevant and helps the discussion.


Edit: Oh, yeah. Of course. You want to sort by the keys because then you know that you'd constantly be accessing the closest meshes/textures/materials every time you move to the next renderable.


Btw, are you implying that I should sort by both transform and key? Say, I first sort by keys, and then, within all renderables that have an identical key (unlikely), sort again by transforms?

In Topic: Sorting/structuring renderables and cache locality

06 September 2014 - 11:47 AM

smacks of premature optimization.

Yes! Okay. Thank you for this advice. I'm fully aware that premature optimization is wrong, but last time I wrote a "graphics engine", I spent 1½ years regretting a bunch of stupid design mistakes that would have been really hard to fix, so this time around I'd rather overthink than underthink! :)

Also, I'm a dummy. By data redundancy, do you mean that you actually have several instances of the same object in different places for better data locality?

In Topic: Deferred rendering - Point light attenuation problem

09 December 2012 - 11:27 AM

I don't have an answer to your problem, but your code seems complicated in some parts:

output.Position = mul(float4(input.Position, 1.0f), World);

could be written as:

output.Position = mul(input.Position, World);

That works if you define input.Position as float4; then it isn't necessary to supply the 4th component from the program.

float2 texCoord = postProjToScreen(input.LightPosition);
float4 baseColor = textures[0].Sample(pointSampler, texCoord);

Since you are using D3D 10 or 11, that part of the code could be replaced with:

int3 Index = int3(input.Position.x, input.Position.y, 0);
float4 baseColor = textures[0].Load(Index);
float4 normalData = textures[1].Load(Index);


Hi! The input position needs to be cast to a float4 with 1.0f added as the last component, else you get some really weird undefined behaviour, unless you rewrite the model class's vertex struct, which I see no reason to do.

The .Load function thing was neat! Fun to learn about new things. Question: Do you know if this is faster than using the sampler, or if it brings any other advantage?

Going back to the subject, I'm starting to suspect that the attenuation actually isn't the problem, because I've scoured the entire net and tried so many different attenuation methods, and they all have the same problem. It's as if the depth value gets screwed up by my InvertedViewProjection.

This is what I do:

viewProjection = viewMatrix*projectionMatrix;
D3DXMatrixInverse(&invertedViewProjection, NULL, &viewProjection);

Then I transpose it when I send it into the shader. I'm honestly not sure what transposition does, so I'm not sure if it can cause this kind of problem, where my position's Z axis gets screwed up when multiplied with it.
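As background on what transposition does: it swaps rows and columns, and multiplying a row vector by a matrix (`v * M`, the D3DX convention) gives the same result as multiplying the transposed matrix by a column vector (`M^T * v`). D3DX matrices are stored row-major while HLSL packs constant-buffer matrices column-major by default, which is why transposing before upload is a common step. The small CPU-side sketch below only illustrates that identity with made-up helper names; whether the transpose is related to the depth problem described here is not something it can show.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][column]

// Swaps rows and columns.
inline Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}

// Row-vector convention: result = v * M.
inline Vec4 mulRowVector(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c] += v[r] * m[r][c];
    return out;
}

// Column-vector convention: result = M * v.
inline Vec4 mulColumnVector(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}
```

For any matrix `m` and vector `v`, `mulRowVector(v, m)` and `mulColumnVector(transpose(m), v)` produce the same components, so a missing or doubled transpose effectively feeds the shader a different matrix.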

Another oddity I found was that when I stepped through the code in PIX I'd get negative depth values, such as -1.001f. This doesn't seem right?

The depth is stored the normal way in the gbuffer pixel shader:
output.Depth = input.Position.z / input.Position.w;