hellraizer

Member Since 10 Nov 2002
Offline Last Active Today, 10:03 AM

#5086793 Shader "plugin" system?

Posted by hellraizer on 17 August 2013 - 11:05 AM

It has been a long time since I touched any rendering related code, but I'll try to describe what I remember from my implementation.

 

Each surface shader (e.g. a shader that will be applied to the surface of a 3D model) can use one or more different plugins. 

Shader plugins were implemented using Cg interfaces. 

 

So, your example above would look something like this (pseudocode since I haven't written a line of Cg for a couple of years now).

IAmbientLighting g_AmbientLighting;
sampler2D DiffuseTex;

float4 mainPS(VS_OUTPUT i)
{
  float4 diffuseColor;
  diffuseColor.rgb = tex2D(DiffuseTex, i.UV.xy).rgb;
  diffuseColor.rgb *= g_AmbientLighting.CalcAmbientLighting(i.PosWS.xyz);
  diffuseColor.a = 1.0;

  return diffuseColor;
}

The IAmbientLighting interface would look like this: 

interface IAmbientLighting
{
  float3 CalcAmbientLighting(float3 posWS);
}

Your current shader would have used a constant ambient color implementation. Something like: 

class ConstAmbientLight : IAmbientLighting
{
  float4 AmbientColor;

  float3 CalcAmbientLighting(float3 posWS)
  {
    return AmbientColor.rgb * AmbientColor.a;
  }
}

If you would like to change to an SSAO implementation, instead of using this class you would use:

class SSAO : IAmbientLighting
{
  sampler2D SSAOTex;
  float4 AmbientColor;
  float4x4 WSToSSMatrix;

  float3 CalcAmbientLighting(float3 posWS)
  {
    float2 screenSpacePos = TransformToSS(posWS, WSToSSMatrix);
    float ssao = tex2D(SSAOTex, screenSpacePos).r;
    return AmbientColor.rgb * AmbientColor.a * ssao;
  }
}

With those two interface implementations available, the renderer is responsible for selecting the correct one at run-time, based on some criteria (user prefs, GPU caps, etc.) and linking it to all the surface shaders which use an IAmbientLighting object.
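To make the run-time selection step a bit more concrete, here is a minimal sketch in plain C++. It is only an analogy of the idea, not the Cg runtime API and not my original renderer code: `RendererCaps`, `SelectAmbientLighting`, `Name()` and the `occlusion` stand-in for the SSAO texture lookup are all made-up names for illustration.

```cpp
#include <memory>
#include <string>

struct Vec3 { float x, y, z; };

// Host-side mirror of the shader interface (hypothetical, for illustration).
struct IAmbientLighting {
    virtual ~IAmbientLighting() = default;
    virtual std::string Name() const = 0;
    virtual Vec3 CalcAmbientLighting(const Vec3& posWS) const = 0;
};

// Constant ambient term: color * intensity, independent of position.
struct ConstAmbientLight : IAmbientLighting {
    Vec3 color{1.0f, 1.0f, 1.0f};
    float intensity = 0.5f;
    std::string Name() const override { return "ConstAmbientLight"; }
    Vec3 CalcAmbientLighting(const Vec3&) const override {
        return {color.x * intensity, color.y * intensity, color.z * intensity};
    }
};

// SSAO variant: same ambient term scaled by an occlusion factor.
// 'occlusion' stands in for the value a shader would read from the SSAO texture.
struct SSAOAmbientLight : IAmbientLighting {
    Vec3 color{1.0f, 1.0f, 1.0f};
    float intensity = 0.5f;
    float occlusion = 1.0f;
    std::string Name() const override { return "SSAOAmbientLight"; }
    Vec3 CalcAmbientLighting(const Vec3&) const override {
        const float k = intensity * occlusion;
        return {color.x * k, color.y * k, color.z * k};
    }
};

// Assumed selection criteria: GPU capability and a user preference.
struct RendererCaps {
    bool supportsSSAO;
    bool userWantsSSAO;
};

// The renderer picks one implementation; every surface shader that declares
// an IAmbientLighting would then be linked against the returned object.
std::unique_ptr<IAmbientLighting> SelectAmbientLighting(const RendererCaps& caps) {
    if (caps.supportsSSAO && caps.userWantsSSAO) {
        return std::make_unique<SSAOAmbientLight>();
    }
    return std::make_unique<ConstAmbientLight>();
}
```

The surface shader code never changes; only the object bound to the interface does.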

 

The idea can be extended to other things. E.g. different kinds of lights (omni, point, directional) can be implemented as implementations of one common ILight interface. 

 

This way you can create (e.g.) a Phong shader with or without SSAO, using one or more lights of any type. 

 

That's the basic idea. Hope it makes some sense. If not, just say so and I'll do my best to describe it better.




#5070497 Simple question regarding Android ImageButtons

Posted by hellraizer on 17 June 2013 - 12:10 PM

Try creating one View.OnClickListener object and use View.getId() on the passed view object.

 

Something like this:

View.OnClickListener listener = new View.OnClickListener()
{
  @Override
  public void onClick(View v)
  {
    // The same listener serves every button; the view ID tells them apart.
    int id = v.getId();
    switch(id)
    {
    case 1000:
      // button0 was clicked...
      break;
    case 1001:
      // button1 was clicked...
      break;
    }
  }
};

button0.setOnClickListener(listener);
button1.setOnClickListener(listener);
...

button0.setId(1000);
button1.setId(1001);
...

Hope that helps.




#5048836 Is my frustum culling slow ?

Posted by hellraizer on 01 April 2013 - 04:34 AM

If the AABBs correspond to static geometry, translating them to world space every frame is overkill. You should do it once at start-up.

 

If it's about dynamic geometry, then it isn't that simple when rotation is involved. If your objects rotate and you want to use the same code for all your objects, you should calculate a new AABB from the OBB defined by the original AABB and the object's transformation. Otherwise you can find/write another function which culls OBBs against the frustum. 
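Recalculating the AABB from the rotated box could look roughly like the sketch below. It assumes a center/extent (half-size) AABB and a transform made of a 3x3 rotation plus a translation; `Transform` and `TransformAABB` are hypothetical names, not something from the original code.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Center/extent form; 'extent' holds the half-sizes along each axis.
struct AABB {
    Vec3 center;
    Vec3 extent;
};

// Assumed transform layout: row-major rotation r, applied before translation t.
struct Transform {
    float r[3][3];
    Vec3 t;
};

// Returns the world-space AABB that encloses the rotated (oriented) box.
// The new center is the transformed center; the new extent is |R| * extent,
// i.e. the projection of the rotated half-axes onto each world axis.
AABB TransformAABB(const AABB& box, const Transform& m) {
    const float c[3] = {box.center.x, box.center.y, box.center.z};
    const float e[3] = {box.extent.x, box.extent.y, box.extent.z};
    float nc[3] = {m.t.x, m.t.y, m.t.z};
    float ne[3] = {0.0f, 0.0f, 0.0f};

    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            nc[i] += m.r[i][j] * c[j];
            ne[i] += std::fabs(m.r[i][j]) * e[j];
        }
    }
    return AABB{{nc[0], nc[1], nc[2]}, {ne[0], ne[1], ne[2]}};
}
```

The result is never smaller than the box it encloses, so culling with it stays conservative.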

 

In case you go the OBB route, it might be faster to just check the bounding sphere (which isn't affected by rotation) against the frustum, at the expense of rendering a few more objects (bounding spheres tend to be larger than AABBs, depending on the object they enclose). 
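A bounding sphere check is only a few lines. The sketch below assumes six normalized frustum planes with normals pointing into the frustum, stored as (n, d) so that n·p + d >= 0 for points inside; `Plane`, `Sphere` and `SphereInFrustum` are illustrative names.

```cpp
// Plane equation: nx*x + ny*y + nz*z + d = 0, normal assumed normalized
// and pointing towards the inside of the frustum.
struct Plane {
    float nx, ny, nz, d;
};

struct Sphere {
    float cx, cy, cz, radius;
};

// Returns false only when the sphere lies completely behind one plane.
// Spheres near frustum corners may pass although they are outside, which
// is the "rendering a few more objects" cost mentioned above.
bool SphereInFrustum(const Sphere& s, const Plane (&planes)[6]) {
    for (const Plane& p : planes) {
        const float dist = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
        if (dist < -s.radius) {
            return false;
        }
    }
    return true;
}
```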




#5048817 Is my frustum culling slow ?

Posted by hellraizer on 01 April 2013 - 02:27 AM

@lipsryme

If you get an access violation as soon as you add a 4th AABB to the list, it means that your aabbList array isn't 16-byte aligned. Two choices here:

1) Explicitly allocate the aabbList array to be 16-byte aligned (e.g. using _aligned_malloc()) or

2) Replace the 6 _mm_load_ps() calls with _mm_loadu_ps(), which allows reading from unaligned addresses.

 

Hope that helps.

 

PS. To test if an array address is 16-byte aligned you can use one of the functions found here: http://stackoverflow.com/a/1898487/2136504

E.g. check &aabbList[(iIter << 2) + 0].center.x, and if it returns true but the _mm_load_ps() fails, then something else is wrong with your array.
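In case the link goes away, an alignment check can be sketched like this. The `AABB` layout (two 3-float vectors, 24 bytes) is my assumption about your struct, and `IsAligned` / `CanUseAlignedLoads` are names I made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// True when 'ptr' is a multiple of 'alignment' bytes (16 for _mm_load_ps).
inline bool IsAligned(const void* ptr, std::size_t alignment = 16) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

struct Vector3f { float x, y, z; };

// Assumed layout: center followed by extent, 6 floats with no padding.
struct AABB {
    Vector3f center;
    Vector3f extent;
};
static_assert(sizeof(AABB) == 24, "layout assumption: 6 packed floats");

// The SSE loop reads groups of 4 boxes (96 bytes), so if the first box of
// the list is 16-byte aligned, the first box of every group is too.
inline bool CanUseAlignedLoads(const AABB* list) {
    return IsAligned(&list[0].center.x);
}
```

Note that with 24-byte boxes only every second element starts on a 16-byte boundary, which is why the loads in the culling loop use those specific member offsets.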




#5048626 Is my frustum culling slow ?

Posted by hellraizer on 31 March 2013 - 11:53 AM

I believe the changes you made in the code are the problem. To be more exact:

 

The original code read the centers and extents of the 4 AABBs from an array with the following layout:

c0.x, c0.y, c0.z, e0.x, e0.y, e0.z, c1.x, c1.y, c1.z, e1.x, e1.y, e1.z, c2.x, c2.y, c2.z, e2.x, e2.y, e2.z, c3.x, c3.y, c3.z, e3.x, e3.y, e3.z, ...

 

When the following instructions are executed, the XMM registers hold the values mentioned in their name:

 

// NOTE: Since the aabbList is 16-byte aligned, we can use aligned moves.
// Load the 4 Center/Extents pairs for the 4 AABBs.
__m128 xmm_cx0_cy0_cz0_ex0 = _mm_load_ps(&aabbList[(iIter << 2) + 0].m_Center.x);
__m128 xmm_ey0_ez0_cx1_cy1 = _mm_load_ps(&aabbList[(iIter << 2) + 0].m_Extent.y);
__m128 xmm_cz1_ex1_ey1_ez1 = _mm_load_ps(&aabbList[(iIter << 2) + 1].m_Center.z);
__m128 xmm_cx2_cy2_cz2_ex2 = _mm_load_ps(&aabbList[(iIter << 2) + 2].m_Center.x);
__m128 xmm_ey2_ez2_cx3_cy3 = _mm_load_ps(&aabbList[(iIter << 2) + 2].m_Extent.y);
__m128 xmm_cz3_ex3_ey3_ez3 = _mm_load_ps(&aabbList[(iIter << 2) + 3].m_Center.z); 

 

If we assume that the initial aabbList array is 16-byte aligned, all loads are 16-byte aligned and the instructions are executed correctly. This is the reason we are loading the XMM regs with those specific array offsets.

 

On the other hand, your code doesn't do the same thing. It just stores the AABBs on the stack and the layout isn't the one expected by the code. The best case scenario is that your layout is: 

 

 

c0.x, c0.y, c0.z, c1.x, c1.y, c1.z, c2.x, c2.y, c2.z, c3.x, c3.y, c3.z, e0.x, e0.y, e0.z, e1.x, e1.y, e1.z, e2.x, e2.y, e2.z, e3.x, e3.y, e3.z

 

but:

1) I think you can't be 100% sure about that (e.g. that the compiler will place the centers before the extents)

2) It's not guaranteed to be 16-byte aligned.

3) Most importantly, it's not what the code expects.

 

If you have to read the AABBs the way you did (one element at a time) I would suggest something like this:

 

__declspec(align(16)) _Vector3f aabbData[8];

aabbData[0].x = ... // center0.x
aabbData[0].y = ... // center0.y
aabbData[0].z = ... // center0.z
aabbData[1].x = ... // extent0.x
...

And then use this array to load the XMM regs as in the original code snippet.
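A sketch of that packing step is below. It uses a flat `float[24]`, which has the same memory layout as the `_Vector3f aabbData[8]` array above, and `alignas(16)` instead of `__declspec(align(16))` to stay portable. `PackedAABB4`, `PackAABB4` and `IsLoadAligned` are hypothetical names.

```cpp
#include <cstdint>

struct Vector3f { float x, y, z; };

// Four AABBs interleaved as c0.xyz, e0.xyz, c1.xyz, e1.xyz, ... (24 floats).
struct PackedAABB4 {
    alignas(16) float data[24];
};

// Copies centers/extents that were read one element at a time into the
// interleaved layout the 4-boxes-at-a-time loop expects.
void PackAABB4(const Vector3f centers[4], const Vector3f extents[4], PackedAABB4& out) {
    for (int i = 0; i < 4; ++i) {
        float* dst = out.data + i * 6;
        dst[0] = centers[i].x;
        dst[1] = centers[i].y;
        dst[2] = centers[i].z;
        dst[3] = extents[i].x;
        dst[4] = extents[i].y;
        dst[5] = extents[i].z;
    }
}

// The six aligned loads start at float offsets 0, 4, 8, 12, 16 and 20.
// Returns true when load number k (0..5) starts on a 16-byte boundary.
bool IsLoadAligned(const PackedAABB4& p, int k) {
    return reinterpret_cast<std::uintptr_t>(p.data + 4 * k) % 16 == 0;
}
```

Each of the six `_mm_load_ps()` calls from the original snippet would then read from `data + 4 * k`.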

 

PS. If you try to understand what the code does with those SSE instructions, you might be able to "optimize" it and get rid of the loads and shuffles completely. This is in case you continue to read the AABB data the way you do it.




#5048590 Is my frustum culling slow ?

Posted by hellraizer on 31 March 2013 - 09:15 AM

@Hellraizer

On the plus side, the last piece of code in the post (4 boxes at a time) does it correctly

 

Now i am confused as in 4box version says:

 

// NOTE: This loop is identical to the CullAABBList_SSE_1() loop. Not shown in order to keep this snippet small.

 

where that part of code is that you mentioned.

 

What I meant is that the calculations of (d+r) and (d-r) in the 4-box-at-a-time loop are correct.

 

When you replace the comment you mentioned with the loop from CullAABBList_SSE_1(), you have to fix the typo I mentioned in order for it to be correct.

 

Hope that makes sense.




#5048485 Is my frustum culling slow ?

Posted by hellraizer on 31 March 2013 - 12:13 AM

It's been a long time since my last reply here on gamedev.net.

 

@lipsryme: Happy to know that my blog post actually helped someone :) Unfortunately, there seems to be an error in the code. The culling should be incorrect. Since you haven't seen it yet I'd assume that you are just rendering more objects than needed.

 

The error is in:

__m128 xmm_d_p_r = _mm_add_ss(_mm_add_ss(xmm_d, xmm_r), xmm_frustumPlane_d);
__m128 xmm_d_m_r = _mm_add_ss(_mm_add_ss(xmm_d, xmm_r), xmm_frustumPlane_d);

 

Can you spot it?

xmm_d_m_r should subtract r from d, not add it! It should be:

 

__m128 xmm_d_m_r = _mm_add_ss(_mm_sub_ss(xmm_d, xmm_r), xmm_frustumPlane_d);

 

I don't have the project anymore so I'd assume it's just a blog post typo and it didn't affect the timings.
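For reference, a scalar version of the same test makes the role of the two values easier to see. This is a sketch under my own assumptions (center/extent AABBs, plane normals pointing into the frustum); the type and function names are made up and not taken from the blog post.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Plane equation: dot(n, p) + d = 0, with n pointing to the inside.
struct Plane {
    Vec3 n;
    float d;
};

struct AABB {
    Vec3 center;
    Vec3 extent;  // half-sizes
};

enum class CullResult { Outside, Intersecting, Inside };

// d is the projected center, r the projected radius of the box on the
// plane normal. (d + r) tests the corner furthest along the normal,
// (d - r) the corner furthest against it.
CullResult ClassifyAABB(const AABB& box, const Plane& plane) {
    const float d = plane.n.x * box.center.x +
                    plane.n.y * box.center.y +
                    plane.n.z * box.center.z;
    const float r = std::fabs(plane.n.x) * box.extent.x +
                    std::fabs(plane.n.y) * box.extent.y +
                    std::fabs(plane.n.z) * box.extent.z;

    const float d_p_r = d + r + plane.d;
    const float d_m_r = d - r + plane.d;

    if (d_p_r < 0.0f) return CullResult::Outside;       // whole box behind the plane
    if (d_m_r < 0.0f) return CullResult::Intersecting;  // box straddles the plane
    return CullResult::Inside;
}
```

With the typo, d_m_r equals d_p_r, so in this scalar version the Intersecting case could never be reported.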

 

On the plus side, the last piece of code in the post (4 boxes at a time) does it correctly :)

 

Hope this doesn't ruin your benchmarks.




#3807148 manual alternative to glulookat ?

Posted by hellraizer on 29 October 2006 - 12:05 AM

I think this will help you understand what gluLookAt() does inside.


void LookAt(const CVector3& pos, const CVector3& dir, const CVector3& up)
{
    CVector3 dirN;
    CVector3 upN;
    CVector3 rightN;

    dirN = dir;
    dirN.Normalize();

    upN = up;
    upN.Normalize();

    rightN = dirN.Cross(upN);
    rightN.Normalize();

    upN = rightN.Cross(dirN);
    upN.Normalize();

    float mat[16];
    mat[ 0] = rightN.x;
    mat[ 1] = upN.x;
    mat[ 2] = -dirN.x;
    mat[ 3] = 0.0;

    mat[ 4] = rightN.y;
    mat[ 5] = upN.y;
    mat[ 6] = -dirN.y;
    mat[ 7] = 0.0;

    mat[ 8] = rightN.z;
    mat[ 9] = upN.z;
    mat[10] = -dirN.z;
    mat[11] = 0.0;

    mat[12] = -(rightN.Dot(pos));
    mat[13] = -(upN.Dot(pos));
    mat[14] = (dirN.Dot(pos));
    mat[15] = 1.0;

    glMultMatrixf(&mat[0]);
}





I hope I haven't made any mistakes :) If somebody sees something wrong with the above code, say so. I think there is an old thread talking about how gluLookAt works and an alternative, so you may find something more useful in there.
I think the above code (with the exception of using 3D vectors) is exactly what gluLookAt does (glu32.dll).

Hope it helps anyway.

HellRaiZer

EDIT : Apparently gluLookAt() does a glMultMatrix and a glTranslate in order to build the final matrix. The code above should work, because the translation is inside mat.
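If you want to check the matrix without an OpenGL context, the same construction can be written as a standalone function. This is only a sketch: `Vec3`, `BuildLookAt` and `TransformPoint` are stand-ins I made up instead of `CVector3` and `glMultMatrixf()`. Note that gluLookAt() takes a target point rather than a direction, so you would pass `center - eye` as `dir`.

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 Cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline Vec3 Normalize(const Vec3& v) {
    const float len = std::sqrt(Dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Builds the view matrix in OpenGL's column-major order. The rotation rows
// are (right, up, -dir) and the translation is folded into elements 12..14.
std::array<float, 16> BuildLookAt(const Vec3& pos, const Vec3& dir, const Vec3& up) {
    const Vec3 f = Normalize(dir);
    const Vec3 r = Normalize(Cross(f, up));
    const Vec3 u = Cross(r, f);  // unit length already: r and f are orthonormal

    return {r.x, u.x, -f.x, 0.0f,
            r.y, u.y, -f.y, 0.0f,
            r.z, u.z, -f.z, 0.0f,
            -Dot(r, pos), -Dot(u, pos), Dot(f, pos), 1.0f};
}

// Applies a column-major 4x4 matrix to a point (w = 1).
Vec3 TransformPoint(const std::array<float, 16>& m, const Vec3& p) {
    return {m[0] * p.x + m[4] * p.y + m[8] * p.z + m[12],
            m[1] * p.x + m[5] * p.y + m[9] * p.z + m[13],
            m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14]};
}
```

A quick sanity check: the camera position should map to the eye-space origin, and a point in front of the camera should end up with a negative Z.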

