The Jump to D3D11: Orthographic projection

15 comments, last by noodleBowl 10 years, 4 months ago

What would be the motivation to do this in shaders?

1) To reduce the number of constants that need to be sent to the shader (16 floats for the matrix versus only the 6 floats that can be sent to create the matrix).

2) To reduce the number of calculations in the shader - most of the values (10 of them) in the matrix are constant 1s and 0s, and the HLSL compiler will not generate code for multiplying vector or matrix components by those values. It will also fold them together with any other constant values that are part of the same calculations into a single constant that is applied only once... this and other optimisations that the compiler may do that I cannot possibly imagine.


Even if your camera moves a lot, you only need to build the projection matrix once per frame. If you move that code into a vertex shader, you will be building the matrix once per VERTEX - once for every single vertex in your scene. Or you could also put it in a pixel shader... ;)

Hmmm. I didn't think of it this way. Good point. Please ignore my last comment.

(Though, in my defense, I was thinking of building the projection matrix in a Geometry shader, so only once per primitive...).


You don't need to rebuild a projection matrix at all: just create one, pass it along to the shader by setting the per-frame constant buffer once a frame, and that is all. If you start recreating your projection matrix every frame, you are just wasting CPU time.

Generally you build all the projection matrices you need once at application or level initialise and then set them when needed. You may have to set a projection matrix multiple times per frame, for example when you have a cubemap renderer or other post effects in your render pipeline, but generally you use a different projection for those than for the normal game camera.

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion



You don't need to rebuild a projection matrix at all: just create one, pass it along to the shader by setting the per-frame constant buffer once a frame, and that is all. If you start recreating your projection matrix every frame, you are just wasting CPU time.

Generally you build all the projection matrices you need once at application or level initialise and then set them when needed. You may have to set a projection matrix multiple times per frame, for example when you have a cubemap renderer or other post effects in your render pipeline, but generally you use a different projection for those than for the normal game camera.

Yep, you don't. Unless you use it to simulate zoom for example.

I originally wrote my post about the view matrix (I forgot the thread was about projection), and when I noticed the mistake I only edited "view" to "projection" everywhere in my post, without thinking about it more in depth. But my point still applies, and the fact that it's about projection and not view only makes it stronger - because you usually don't need to rebuild this matrix at all.

Feel free to correct me, but I think this is what I need to do:

I need to build my projection matrix once, and for this I can use the XNA Matrix Lib or the D3D Matrix Lib with the OrthographicOffCenterLH functions. Then I need to, I'm assuming, store this information in a constant buffer (cbuffer), which will then somehow get into my vertex shader. Right?

@Tom KQT, I'd like to mention that I was actually correct:

1) To reduce the number of constants that need to be sent to the shader (16 floats for the matrix versus only the 6 floats that can be sent to create the matrix).
2) To reduce the number of calculations in the shader - most of the values (10 of them) in the matrix are constant 1s and 0s, and the HLSL compiler will not generate code for multiplying vector or matrix components by those values. It will also fold them together with any other constant values that are part of the same calculations into a single constant that is applied only once... this and other optimisations that the compiler may do that I cannot possibly imagine.

Explanation:

The shader will contain fewer instructions if you just pass the 6 float parameters into the shader and "build" the matrix there... Take a simple mul(matrix, vector) calculation for example... In the case of an orthographic matrix, only 7 values from the matrix will be != 0, and one of them is always 1, so there are actually only 6 variables. In this case, the compiler will optimize the mul(matrix, vector) intrinsic function down to just 9 operations: 6 multiplies, 3 additions. The value from the matrix that is 1 has the effect that the vertex's w value is preserved, so no operation is required for this. To this, you have to add the number of instructions that the compiler will generate for building the 6 variable matrix values from the 6 passed-in values: D3DXMatrixOrthoOffCenterLH does this with 2 additions, 6 subtractions and 6 divisions. Add all these up, and you get a total of 23 operations, though I'm pretty sure there's still room for the HLSL compiler to optimize the D3DXMatrixOrthoOffCenterLH formula.
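
To make that count concrete, this is roughly the arithmetic the compiler is left with for the ortho transform (the variable names here are illustrative, not taken from any particular shader):

// Only the per-axis scale terms and the translation terms are real variables;
// everything multiplied by a constant 0 disappears, and the constant 1 in the
// last element just passes w straight through.
out_pos.x = in_pos.x * scaleX + in_pos.w * transX; // 2 multiplies, 1 addition
out_pos.y = in_pos.y * scaleY + in_pos.w * transY; // 2 multiplies, 1 addition
out_pos.z = in_pos.z * scaleZ + in_pos.w * transZ; // 2 multiplies, 1 addition
out_pos.w = in_pos.w;                              // no operation needed
// Total: 6 multiplies + 3 additions = the 9 operations mentioned above.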

Now, if you pass the ortho matrix as a whole matrix of 4*4 = 16 floats to the shader, then the mul(matrix, vector) calculation has to treat all of those 16 values as variables, and so the compiler translates it to 16 multiplies and 12 additions - a total of 28 operations, which is more than the 23 operations required for the method I proposed.

Keep in mind that you need to do a mul(matrix, vector) at least once per vertex in your vertex shader.


I need to build my projection matrix once, and for this I can use the XNA Matrix Lib or the D3D Matrix Lib with the OrthographicOffCenterLH functions. Then I need to, I'm assuming, store this information in a constant buffer (cbuffer), which will then somehow get into my vertex shader. Right?

Yes, this is what you need to do, but instead of D3DXMatrixOrthoOffCenterLH, you can also use this HLSL function directly in your vertex shader, and instead of passing the whole matrix to the shader, you can just pass the values for the function's parameters:

float4x4 ortho_mat(float l, float r, float b, float t, float zn, float zf)
{
    return float4x4(
        2.0 / (r - l),     0.0,               0.0,             0.0,
        0.0,               2.0 / (t - b),     0.0,             0.0,
        0.0,               0.0,               1.0 / (zf - zn), 0.0,
        (l + r) / (l - r), (t + b) / (b - t), zn / (zn - zf),  1.0);
}
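
For example, the six values could come from a per-frame constant buffer and the vertex shader could build and apply the matrix itself. A minimal sketch (the cbuffer layout and the names in it are only an illustration, not part of the original post):

cbuffer OrthoParams : register(b0)
{
    float4 lrbt; // x = left, y = right, z = bottom, w = top
    float2 znzf; // x = near plane, y = far plane
};

float4 main(float4 pos : POSITION) : SV_POSITION
{
    float4x4 proj = ortho_mat(lrbt.x, lrbt.y, lrbt.z, lrbt.w, znzf.x, znzf.y);
    // Row-vector convention, matching the matrix layout above.
    return mul(pos, proj);
}

On the CPU side those six floats are written into the constant buffer with UpdateSubresource (or Map/Unmap) and bound with VSSetConstantBuffers, just like any other cbuffer.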

The shader will contain fewer instructions if you just pass the 6 float parameters into the shader and "build" the matrix there...

The shader will contain even fewer instructions if you don't send any matrices at all and don't do any matrix math.

The whole purpose of the projection matrix is to normalize vertices between -1 and 1 on both axes.
If you simply construct a vertex buffer with vertices already normalized in this way you don’t need to send any matrices at all nor perform any math on them inside the vertex shader.

This works exceptionally well for static 2D objects such as the HUD, and it is independent of resolution (meaning your HUD items consume the same amount of screen space as you increase or decrease the screen resolution). You can regenerate the vertex buffer if you want objects not to physically scale up or down with various resolutions upon resizing, which is reasonable performance-wise (updating a few vertex buffers once only when resizing).
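
For example, a HUD element covering the top-left quarter of the screen could be baked into the vertex buffer directly in clip space (made-up numbers, just to illustrate the idea):

// Clip-space positions (x, y, z, w) for a quad covering the top-left quarter
// of the screen - no matrix and no math needed in the vertex shader.
float quadTopLeft[4][4] =
{
    { -1.0f, 1.0f, 0.0f, 1.0f }, // top-left corner of the screen
    {  0.0f, 1.0f, 0.0f, 1.0f }, // top-centre
    { -1.0f, 0.0f, 0.0f, 1.0f }, // centre-left
    {  0.0f, 0.0f, 0.0f, 1.0f }, // centre of the screen
};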

For sprites that move, rotate, and scale, you need only send normalized screen offset positions, a rotation in radians, and a single scale value. This lets you fit the entire transform of the sprite into a single float4 value, decreasing demands on bandwidth substantially. XY = Normalized Translation, Z = Rotation, W = Scale.


Vertex shader:

float2 fXY = IN.xy * IN_POS_ROT_SCALE.w;   // scale
float fCos = cos( IN_POS_ROT_SCALE.z );
float fSin = sin( IN_POS_ROT_SCALE.z );
OUT.x = fXY.x * fCos - fXY.y * fSin;       // rotate
OUT.y = fXY.y * fCos + fXY.x * fSin;
OUT.xy += IN_POS_ROT_SCALE.xy;             // translate
OUT.zw = IN.zw;
The sin/cos calls could be done on the CPU and sent instead of scale (if scale is 1.0) or as a second float4, but since it will only be applied to 4 vertices it will likely be much faster done on the GPU.

The above shows the worst-case scenario but you can make different shaders or shader branches when rotation is 0 or scale is 1 and avoid the sin/cos, an extra multiply, or both.
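
For instance, a sketch of the "no rotation" case: when the rotation is known to be zero, the sin/cos and the rotate step disappear and the whole transform collapses to a multiply-add:

// Scale and translate only - no trig needed when rotation == 0.
OUT.xy = IN.xy * IN_POS_ROT_SCALE.w + IN_POS_ROT_SCALE.xy;
OUT.zw = IN.zw;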



Not only is this the fastest way to handle 2D (for any platform), it has the advantage that a vertex coordinate of X = -1 is always the left edge of the screen and X = 1 is always the right edge, so keeping things proportional across various resolutions is only a matter of a simple normalization multiplier. If stretching is not desired (and why would it be?) then it only needs to account for the aspect ratio, but it still boils down to a simple X/Y normalization multiplier that can be precomputed once when the resolution changes.
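
As a sketch of that precomputation (the helper function and names are made up for illustration), the multiplier only needs to be rebuilt when the resolution changes:

#include <DirectXMath.h>

// Recomputed only on resolution changes. A quad meant to cover
// widthPx x heightPx pixels then has clip-space extents of
// widthPx * mul.x by heightPx * mul.y.
DirectX::XMFLOAT2 PixelToClipMultiplier(float backBufferWidth, float backBufferHeight)
{
    return DirectX::XMFLOAT2(2.0f / backBufferWidth, 2.0f / backBufferHeight);
}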


And again, for static HUD images, the vertex shader is this:
OUT = IN;


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

If my end goal is to create a sprite batcher, and since I'm only focusing on 2D, is this the best approach? I would have to update my vertex/index buffer every frame, and I am worried that this will take up a lot of CPU time. Whereas, I believe, sending all the data to the vertex shader would let the GPU do all the calculations.

I'm also not 100% sure how variables get passed into the vertex shader. I think this is done through the Input Layout, right? So for this I would have to set up an Input Layout that looks something like this:


D3D11_INPUT_ELEMENT_DESC inputDesc[] =
{
    //Pos 0: XY
    //Pos 1: Z
    //Pos 2: W
    {"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 0,                            D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"POSITION", 1, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"POSITION", 2, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0},
    {"COLOR",    0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0}
};
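
For what it's worth, the input layout elements are matched to the vertex shader's input signature by semantic name and index, so a layout like the one above would line up with a vertex shader input struct along these lines (only a sketch of the matching; whether three separate POSITION elements is the right way to split the data is a separate question):

struct VSInput
{
    float3 pos0  : POSITION0; // matches the "POSITION", index 0 element
    float4 pos1  : POSITION1; // matches the "POSITION", index 1 element
    float4 pos2  : POSITION2; // matches the "POSITION", index 2 element
    float4 color : COLOR0;    // matches the "COLOR", index 0 element
};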

This topic is closed to new replies.
