Member Since 10 Nov 2002
Offline Last Active Jul 12 2016 11:54 PM

Posts I've Made

In Topic: Shader "plugin" system?

17 August 2013 - 11:05 AM

It has been a long time since I touched any rendering related code, but I'll try to describe what I remember from my implementation.


Each surface shader (i.e. a shader that will be applied to the surface of a 3D model) can use one or more different plugins.

Shader plugins were implemented using Cg interfaces. 


So, your example above would look something like this (pseudocode since I haven't written a line of Cg for a couple of years now).

IAmbientLighting g_AmbientLighting;
sampler2D DiffuseTex;

float4 mainPS(VS_OUTPUT i) : COLOR
{
  float4 diffuseColor;
  diffuseColor.rgb = tex2D(DiffuseTex, i.UV.xy).rgb;
  // Delegate the ambient term to whichever implementation is bound.
  diffuseColor.rgb *= g_AmbientLighting.CalcAmbientLighting(i.PosWS.xyz);
  diffuseColor.a = 1.0;

  return diffuseColor;
}

The IAmbientLighting interface would look like this: 

interface IAmbientLighting
{
  float3 CalcAmbientLighting(float3 posWS);
};

Your current shader would use a constant ambient color implementation. Something like:

// Cg implements interfaces with structs.
struct ConstAmbientLight : IAmbientLighting
{
  float4 AmbientColor; // rgb = color, a = intensity

  float3 CalcAmbientLighting(float3 posWS)
  {
    return AmbientColor.rgb * AmbientColor.a;
  }
};

If you would like to change to an SSAO implementation, instead of using this class you would use:

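// Cg implements interfaces with structs.
struct SSAO : IAmbientLighting
{
  sampler2D SSAOTex;
  float4 AmbientColor;   // rgb = color, a = intensity
  float4x4 WSToSSMatrix; // world space to screen space

  float3 CalcAmbientLighting(float3 posWS)
  {
    float2 screenSpacePos = TransformToSS(posWS, WSToSSMatrix);
    float ssao = tex2D(SSAOTex, screenSpacePos).r;
    // Scale by the intensity in alpha, as in the constant implementation.
    return AmbientColor.rgb * AmbientColor.a * ssao;
  }
};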

With those two interface implementations available, the renderer is responsible for selecting the correct one at run-time, based on some criteria (user prefs, GPU caps, etc.) and linking it to all the surface shaders which use an IAmbientLighting object.


The idea can be extended to other things. E.g. different kinds of lights (omni, point, directional) can be written as implementations of one common ILight interface.


This way you can create (e.g.) a Phong shader with or without SSAO, using one or more lights of any type. 


That's the basic idea. Hope it makes some sense. If not, just say it and I'll do my best to describe it better.

In Topic: Simple question regarding Android ImageButtons

17 June 2013 - 12:10 PM

Try creating a single View.OnClickListener object and calling View.getId() on the view that is passed to onClick().


Something like this:

View.OnClickListener listener = new View.OnClickListener() {
  @Override
  public void onClick(View v) {
    switch (v.getId()) {
      case 1000:
        // button0 was clicked...
        break;
      case 1001:
        // button1 was clicked...
        break;
    }
  }
};



Hope that helps.

In Topic: Is my frustum culling slow ?

01 April 2013 - 04:34 AM

If the AABBs correspond to static geometry, translating them to world space every frame is overkill. You should do it once at startup.


If it's about dynamic geometry, then it isn't that simple once rotation is involved. If your objects rotate and you want to use the same code for all of them, you should calculate a new AABB from the OBB defined by the original AABB and the object's transformation. Otherwise you can find/write another function which culls OBBs against the frustum.
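Calculating the AABB from the OBB can be sketched as below. This is the usual center/extent trick (transform the center, and push the extents through the absolute values of the rotation matrix), not code from the original thread. Vec3, AABB and Transform are made-up types, and the matrix is assumed to be row-major and applied to column vectors.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

struct AABB {
    Vec3 center;
    Vec3 extent;  // half-sizes along each axis
};

// 3x3 rotation (optionally with scale) plus a translation.
struct Transform {
    float m[3][3];
    Vec3  t;
};

// Returns the world-space AABB enclosing the OBB obtained by applying
// `xf` to the local-space box `box`.
inline AABB TransformAABB(const AABB& box, const Transform& xf) {
    assert(box.extent.x >= 0.0f && box.extent.y >= 0.0f && box.extent.z >= 0.0f);

    const float c[3] = { box.center.x, box.center.y, box.center.z };
    const float e[3] = { box.extent.x, box.extent.y, box.extent.z };
    float nc[3];
    float ne[3];

    for (int i = 0; i < 3; ++i) {
        nc[i] = 0.0f;
        ne[i] = 0.0f;
        for (int j = 0; j < 3; ++j) {
            nc[i] += xf.m[i][j] * c[j];             // rotate the center
            ne[i] += std::fabs(xf.m[i][j]) * e[j];  // grow extents to cover the rotated box
        }
    }

    AABB out;
    out.center = Vec3{ nc[0] + xf.t.x, nc[1] + xf.t.y, nc[2] + xf.t.z };
    out.extent = Vec3{ ne[0], ne[1], ne[2] };
    return out;
}
```

The result is conservative: it always contains the rotated box, but it can be noticeably larger than a tight fit around the actual geometry.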


If you go the OBB route, it might be faster to just check the bounding sphere (which isn't affected by rotation) against the frustum, at the expense of rendering a few more objects (bounding spheres tend to be larger than AABBs, depending on the object they enclose).
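A scalar sketch of the sphere test, for reference. The Plane and Sphere types are made up for the example, and the planes are assumed to be normalized with their normals pointing into the frustum.

```cpp
#include <cassert>

// Plane in the form n.p + d = 0, with a unit-length normal pointing inside.
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float cx, cy, cz, radius; };

// Returns false only when the sphere lies completely behind one of the
// six planes. Spheres near a frustum corner may be reported as visible
// even though they are outside, which is harmless for culling.
inline bool SphereIntersectsFrustum(const Sphere& s, const Plane planes[6]) {
    assert(s.radius >= 0.0f);
    for (int i = 0; i < 6; ++i) {
        const Plane& p = planes[i];
        const float dist = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
        if (dist < -s.radius) {
            return false;  // fully outside this plane
        }
    }
    return true;
}
```

The sphere only has to be translated with the object (and its radius scaled if the object scales), so there is no per-frame box recomputation.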

In Topic: Is my frustum culling slow ?

01 April 2013 - 02:27 AM


If you get an access violation as soon as you add a 4th AABB to the list, it most likely means that your aabbList array isn't 16-byte aligned. Two choices here:

1) Explicitly allocate the aabbList array to be 16-byte aligned (e.g. using _aligned_malloc()) or

2) Replace the 6 _mm_load_ps() calls with _mm_loadu_ps(), which allows reading from unaligned addresses.


Hope that helps.


PS. To test if an array address is 16-byte aligned you can use one of the functions found here: http://stackoverflow.com/a/1898487/2136504

E.g. check &aabbList[(iIter << 2) + 0].center.x. If the check returns true but _mm_load_ps() still fails, then something else is wrong with your array.
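In case the link goes away, an alignment check along those lines can be as small as this. IsAligned is a name I made up for the sketch; it is not necessarily identical to the functions in the linked answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `ptr` is a multiple of `alignment` bytes.
// `alignment` must be a power of two (16 for _mm_load_ps()).
inline bool IsAligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

Calling IsAligned(&aabbList[(iIter << 2) + 0].center.x, 16) just before the first load tells you whether alignment is really the culprit.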

In Topic: Is my frustum culling slow ?

31 March 2013 - 11:53 AM

I believe the changes you made to the code are the problem. To be more exact:


The original code read the centers and extents of the 4 AABBs from an array with the following layout:

c0.x, c0.y, c0.z, e0.x, e0.y, e0.z, c1.x, c1.y, c1.z, e1.x, e1.y, e1.z, c2.x, c2.y, c2.z, e2.x, e2.y, e2.z, c3.x, c3.y, c3.z, e3.x, e3.y, e3.z, ...


When the following instructions are executed, the XMM registers hold the values mentioned in their names:


// NOTE: Since the aabbList is 16-byte aligned, we can use aligned moves.
// Load the 4 Center/Extents pairs for the 4 AABBs.
__m128 xmm_cx0_cy0_cz0_ex0 = _mm_load_ps(&aabbList[(iIter << 2) + 0].m_Center.x);
__m128 xmm_ey0_ez0_cx1_cy1 = _mm_load_ps(&aabbList[(iIter << 2) + 0].m_Extent.y);
__m128 xmm_cz1_ex1_ey1_ez1 = _mm_load_ps(&aabbList[(iIter << 2) + 1].m_Center.z);
__m128 xmm_cx2_cy2_cz2_ex2 = _mm_load_ps(&aabbList[(iIter << 2) + 2].m_Center.x);
__m128 xmm_ey2_ez2_cx3_cy3 = _mm_load_ps(&aabbList[(iIter << 2) + 2].m_Extent.y);
__m128 xmm_cz3_ex3_ey3_ez3 = _mm_load_ps(&aabbList[(iIter << 2) + 3].m_Center.z); 


If we assume that the initial aabbList array is 16-byte aligned, all loads are 16-byte aligned and the instructions are executed correctly. This is the reason we are loading the XMM regs with those specific array offsets.
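To see why those specific offsets work, here is the arithmetic as compile-time checks. The two structs are my guess at the layout (two tightly packed 3-float vectors, 24 bytes per AABB), and LoadOffset is a helper invented for the example.

```cpp
#include <cstddef>

struct Vector3f { float x, y, z; };
struct AABB     { Vector3f m_Center; Vector3f m_Extent; };

static_assert(sizeof(AABB) == 6 * sizeof(float), "AABB must be 6 packed floats");

// Byte offset, from the start of a group of four AABBs, of float number
// `floatInBox` (0..2 = center.xyz, 3..5 = extent.xyz) of box `boxIndex`.
constexpr std::size_t LoadOffset(std::size_t boxIndex, std::size_t floatInBox) {
    return boxIndex * sizeof(AABB) + floatInBox * sizeof(float);
}

// The six loads in the snippet start at these members:
static_assert(LoadOffset(0, 0) ==  0, "aabb[0].m_Center.x");
static_assert(LoadOffset(0, 4) == 16, "aabb[0].m_Extent.y");
static_assert(LoadOffset(1, 2) == 32, "aabb[1].m_Center.z");
static_assert(LoadOffset(2, 0) == 48, "aabb[2].m_Center.x");
static_assert(LoadOffset(2, 4) == 64, "aabb[2].m_Extent.y");
static_assert(LoadOffset(3, 2) == 80, "aabb[3].m_Center.z");

// Four AABBs occupy 96 bytes, so every group of four starts on a
// 16-byte boundary as long as the array itself does.
static_assert((4 * sizeof(AABB)) % 16 == 0, "group stride keeps alignment");
```

Every load lands on a multiple of 16 bytes, which is all _mm_load_ps() needs once the base address is aligned.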


On the other hand, your code doesn't do the same thing. It just stores the AABBs on the stack and the layout isn't the one expected by the code. The best case scenario is that your layout is: 



c0.x, c0.y, c0.z, c1.x, c1.y, c1.z, c2.x, c2.y, c2.z, c3.x, c3.y, c3.z, e0.x, e0.y, e0.z, e1.x, e1.y, e1.z, e2.x, e2.y, e2.z, e3.x, e3.y, e3.z



1) I think you can't be 100% sure about that (e.g. that the compiler will place the centers before the extents)

2) It's not guaranteed to be 16-byte aligned.

3) Most importantly, it's not what the code expects.


If you have to read the AABBs the way you did (one element at a time) I would suggest something like this:


__declspec(align(16)) _Vector3f aabbData[8];

aabbData[0].x = ... // center0.x
aabbData[0].y = ... // center0.y
aabbData[0].z = ... // center0.z
aabbData[1].x = ... // extent0.x

And then use this array to load the XMM regs as in the original code snippet.
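Spelled out as a helper, the repacking could look like this. PackAABBs and its parameters are hypothetical, and I'm assuming you have the four centers and four extents available as separate values. The caller has to provide the 16-byte aligned buffer (alignas(16) in C++11, or __declspec(align(16)) on older MSVC).

```cpp
#include <cassert>
#include <cstddef>

struct Vector3f { float x, y, z; };

// Interleaves four centers and four extents into the layout the SSE code
// expects: c0, e0, c1, e1, c2, e2, c3, e3 (24 consecutive floats).
inline void PackAABBs(const Vector3f centers[4],
                      const Vector3f extents[4],
                      Vector3f out[8]) {
    assert(centers != nullptr && extents != nullptr && out != nullptr);
    for (std::size_t i = 0; i < 4; ++i) {
        out[2 * i + 0] = centers[i];
        out[2 * i + 1] = extents[i];
    }
}
```

Typical use would be to declare alignas(16) Vector3f aabbData[8];, call PackAABBs(centers, extents, aabbData), and then run the six loads starting at &aabbData[0].x.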


PS. If you try to understand what the code does with those SSE instructions, you might be able to "optimize" it and get rid of the loads and shuffles completely. That only applies if you keep reading the AABB data one element at a time, the way you do now.