2nd order derivative in HLSL

7 comments, last by fang 14 years, 8 months ago
Does anybody know a way to calculate the second-order partial derivative of a variable with respect to the x or y screen coordinates (just like ddx() and ddy() calculate first-order partial derivatives)? I tried ddx(ddx(var)) but the result is always 0. Any ideas?
You need to render the 1st-order derivatives to a texture, and use a separate pass to take their derivatives.

The hardware only calculates first-order derivatives. The original reason for the hardware to calculate them is to determine the mip level of a texture to use by observing the rate of change of texture coordinates across a pixel.

The design intent of the ddx and ddy instructions is that you may need to access the derivatives manually to implement custom sampling (for example, antialiasing procedural textures).
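To make the mip-selection point concrete, here is a small Python sketch (illustrative only; the function name and the exact formula are my assumptions, loosely following the common lod = log2(maximum rate of change in texels) formulation, not any particular hardware's implementation):

```python
import math

# Illustrative mip-level selection from texture-coordinate derivatives.
# ddx_uv / ddy_uv are per-pixel uv deltas, tex_size is the texture dimension.

def mip_level(ddx_uv, ddy_uv, tex_size):
    # Scale the per-pixel uv deltas into texel units.
    rx = math.hypot(ddx_uv[0] * tex_size, ddx_uv[1] * tex_size)
    ry = math.hypot(ddy_uv[0] * tex_size, ddy_uv[1] * tex_size)
    # The faster uv changes across the screen, the coarser the mip level.
    return max(0.0, math.log2(max(rx, ry)))

# uv changes by one texel per pixel on a 256x256 texture -> mip 0 (full res)
print(mip_level((1/256, 0), (0, 1/256), 256))   # 0.0
# uv changes by four texels per pixel -> mip 2
print(mip_level((4/256, 0), (0, 4/256), 256))   # 2.0
```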

Niko Suni

Quote:Original post by Nik02
You need to render the 1st-order derivatives to a texture, and use a separate pass to take their derivatives.

The hardware only calculates first-order derivatives. The original reason for the hardware to calculate them is to determine the mip level of a texture to use by observing the rate of change of texture coordinates across a pixel.

The design intent of the ddx and ddy instructions is that you may need to access the derivatives manually to implement custom sampling (for example, antialiasing procedural textures).


Thanks for the suggestion. I'm trying to implement the render to texture approach, but it would be nice to be able to do it in one pass.

I know the hardware doesn't compute 2nd derivatives directly, but since it provides the generic instruction dsx to calculate the change in any register between shader executions, it should be easy to calculate the change of the changes. Here's my code:

    float d = ddx(In.vPos.z);
    Out.vColor = ddx(d);


and its assembly is:

    // Generated by Microsoft (R) HLSL Shader Compiler 9.22.949.2248
    ps_3_0
    dcl_texcoord v0.z
    dsx r0.x, v0.z
    dsx oC0, r0.x
    // approximately 4 instruction slots used


The last instruction calculates the change in r0.x, which in turn is the change in v0.z. That should give the 2nd derivative of v0.z. I just don't see why this doesn't work...
The delta represented by oC0 would stay a constant zero across the pixel, since the screen-space derivatives are evaluated only once for a given pixel shader invocation. The rate of change of the first derivatives (that is, second derivatives) across the pixel, therefore, is zero.

In effect, the machine code works as designed, but the end result doesn't happen to be what you want.

When you do write the first derivatives to a texture and sample that in a subsequent pass, the hardware has more data to work with; in this case, you actually have access to the first derivatives of the adjacent pixels, which are required to find the second derivatives.
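The two-pass idea above can be sketched in plain Python (illustrative only; a 1-D scanline of values stands in for the render target, and ddx_pass is a hypothetical name standing in for "render ddx to a texture"):

```python
# Two-pass second derivative on a 1-D scanline (illustrative; stands in for
# "render ddx to a texture, then take ddx of that texture in a second pass").

def ddx_pass(texture):
    # Forward difference per pixel; this is what a pass would write out.
    return [texture[i + 1] - texture[i] for i in range(len(texture) - 1)]

depth = [x * x * 0.5 for x in range(6)]   # a quadratic, so d2/dx2 = 1 everywhere

first = ddx_pass(depth)    # pass 1: first derivatives, stored in a texture
second = ddx_pass(first)   # pass 2: sample that texture, differentiate again
print(second)              # [1.0, 1.0, 1.0, 1.0]
```

Because pass 2 reads the first-derivative values of neighboring pixels from the texture, it has the cross-pixel information that a single shader invocation never sees.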

The instructions aren't as generic as you seem to think. This stems from the simple usage of mip level selection for the current pixel, which is the historical reason why the hardware has the ability to calculate the first-order derivatives.

Niko Suni

This is just the way the hardware works. The derivatives are calculated using the finite difference between adjacent pixels in a quad: the hardware rasterizes groups of 4 pixels at a time partly for this reason. Since the two adjacent pixels in a quad necessarily end up with the same partial-derivative value, the second-order derivative comes out as zero. The finite-difference method is an adequate approximation of the first-order derivative for most purposes, but it can't be used to calculate a second-order derivative.
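A toy Python model of the quad-based scheme makes the zero result visible (illustrative only; quad_ddx and the layout are my assumptions, not real hardware behavior):

```python
# Toy model of quad-based derivative evaluation (illustrative, not real hardware).
# Within a 2x2 quad, ddx is the finite difference between the right and left
# pixel of each row, and both pixels in that row receive the same value.

def quad_ddx(values):
    """values: 2x2 quad as [[top_left, top_right], [bottom_left, bottom_right]]."""
    out = []
    for row in values:
        d = row[1] - row[0]        # one finite difference per row...
        out.append([d, d])         # ...shared by both pixels in the row
    return out

# A varying quantity, e.g. interpolated depth across the quad:
quad = [[1.0, 3.0], [1.5, 4.0]]

first = quad_ddx(quad)             # [[2.0, 2.0], [2.5, 2.5]]
second = quad_ddx(first)           # the value is constant across each row...
print(second)                      # [[0.0, 0.0], [0.0, 0.0]]
```

Since the first derivative is constant within the quad, and the quad is all the hardware ever looks at, differentiating it again can only ever produce zero.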

Game Programming Blog: www.mattnewport.com/blog

Quote:Original post by Nik02
The delta represented by oC0 would stay a constant zero across the pixel, since the screen-space derivatives are evaluated only once for a given pixel shader invocation. The rate of change of the first derivatives (that is, second derivatives) across the pixel, therefore, is zero.

In effect, the machine code works as designed, but the end result doesn't happen to be what you want.

When you do write the first derivatives to a texture and sample that in a subsequent pass, the hardware has more data to work with; in this case, you actually have access to the first derivatives of the adjacent pixels, which are required to find the second derivatives.

The instructions aren't as generic as you seem to think. This stems from the simple usage of mip level selection for the current pixel, which is the historical reason why the hardware has the ability to calculate the first-order derivatives.


Thanks, I believe you are right that the hardware won't keep the derivative value between pixel (n-2) and (n-1) when it moves on to evaluate the derivative between pixel (n-1) and n, so there won't be enough information for the hardware to calculate the 2nd order derivative directly. But my point is, I SAVED the information in a temporary register r0.x, so the hardware only needs to evaluate the 1st order derivative of that register between pixel (n-1) and n.

So the question is whether the content of a temporary register from the previous pixel shader invocation is kept around until the hardware calculates the derivative for the next pixel. I don't know much about what's happening in the hardware, so please correct me, or if I'm too far off, please point me to some readings...
Quote:Original post by mattnewport
This is just the way the hardware works. The derivatives are calculated using the finite difference between adjacent pixels in a quad: the hardware rasterizes groups of 4 pixels at a time partly for this reason. Since the two adjacent pixels necessarily have a partial derivative equal in magnitude but opposite in sign the second order derivative will be zero. The finite difference method used is an adequate approximation of the first order derivative for most purposes but can't be used to calculate a second order derivative.


Thanks. I might not have gotten the picture right, but are you saying that for each pixel, the hardware grabs three neighboring pixels to form a quad with it, and runs the pixel shader for each of them? Does that mean that for a 10x10 frame buffer the pixel shader is actually executed 400 times instead of 100?

Only the interpolant derivatives are calculated on a 2x2 grid for each pixel, and the system may even skip the calculations if you don't use the values. The rest of the pixel shader logic is run once per pixel, with the exception of D3D10.1 and up where you can specify that it runs per sample.

In practice, the hardware usually calculates approximate derivatives for a 2x2 group of pixels simultaneously, since adjacent pixels inside such a group share the same derivative value. Some professional cards, and/or consumer cards at maximum mip-quality settings, may calculate more accurate derivatives by always observing the values one pixel to the right and one pixel down at each pixel.

The hardware and/or driver generally reorders the shader instructions, so the machine code that actually executes doesn't usually follow the intermediate assembly that D3D generates literally. The same applies to modern CPUs.

Niko Suni

Ok... the hardware is really different from how I imagined it...

Thanks for your patient explanation!

