matthew007101

DirectX texture fetch optimization

I have a pixel shader with a loop which runs for a fixed number of times for all pixels. Inside this loop I perform two consecutive texture sampling operations.

Together, these two texture sampling operations cause performance to drop to about 10 fps for 1024 iterations of the for loop.

If I comment out either of these two texture fetches, the frame rate goes up to 650 fps!

Is there a way to improve performance?

I pasted a code snippet below from my pixel shader to give you a clearer idea:


[loop]
for (uint i = 0; i < N_THETA * N_PHI; ++i)
{
    const float3 vEnvSample = g_SampleBuffer.Load(i).rgb;

    float3 vEnvSampleLocal, vEnvSampleWorld;
    TransformUniformSample(vEnvSample, TBNInv_G2L, TBN_L2G, vEnvSampleLocal, vEnvSampleWorld, vNormalWorld);

    float3 envmap = g_EnvMapTexCube.SampleLevel(PointSamplerClamp, vEnvSampleWorld, 0).rgb; // Texture sampling operation 1

    float theta_half, phi_half, theta_diff, phi_diff;
    convert_to_half_diff_coords(vCameraLocal, vEnvSampleLocal, theta_half, phi_half, theta_diff, phi_diff);

    int3 fIndex;
    getMERL_textureIndex(theta_half, theta_diff, phi_diff, fIndex);

    float3 brdf = g_MERL_BRDFTex3D.Load(int4(fIndex.xyz, 0)).rgb; // Texture sampling operation 2

    output.rgb += brdf * envmap * dot(vEnvSampleWorld, vNormalWorld);
}

My first suggestion would be to compile all three versions of the shader with fxc.exe, and look at the disassembly for each. You'll probably find it's hard to take out one texture read without affecting the rest of the code that gets generated for that loop.
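
As a rough illustration of why removing a fetch can change more than the fetch itself, here is a minimal sketch. Every name in it (g_Tex3D, ExpensiveIndex, the three variants) is hypothetical and stands in for the coordinate math in the loop above; it is not taken from the original shader. Once nothing consumes the index, the compiler is free to drop the math that produced it, which is the kind of change the disassembly would show.

```hlsl
// Hypothetical resource, for illustration only.
Texture3D<float4> g_Tex3D;

// Hypothetical stand-in for the trigonometric index math in the loop.
int3 ExpensiveIndex(float3 v)
{
    float theta = acos(saturate(v.z));
    float phi   = atan2(v.y, v.x);
    return int3(theta * 10.0f, phi * 10.0f, 0);
}

// Full version: the index math feeds the fetch, so both must be executed.
float3 WithFetch(float3 v)
{
    return g_Tex3D.Load(int4(ExpensiveIndex(v), 0)).rgb;
}

// Fetch commented out: idx is now unused, so the compiler may eliminate
// the call to ExpensiveIndex along with the fetch.
float3 FetchRemoved(float3 v)
{
    int3 idx = ExpensiveIndex(v);
    return float3(1.0f, 1.0f, 1.0f);
}

// Timing variant: the fetch is gone, but idx still contributes to the
// result, so the index math has to stay in the generated code.
float3 FetchRemovedMathKept(float3 v)
{
    int3 idx = ExpensiveIndex(v);
    return float3(idx) * 1e-6f;
}
```

Comparing the listings of variants like these should show whether a large speedup comes from the missing fetch or from the math that disappeared with it.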

Also note that reading one texture based on another one can be significantly slower than two independent texture reads.
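
A minimal sketch of the difference, assuming two hypothetical textures (g_IndexTex, g_DataTex) and a hypothetical sampler (g_PointClamp) that are not part of the original shader:

```hlsl
// Hypothetical resources, for illustration only.
Texture2D    g_IndexTex;
Texture2D    g_DataTex;
SamplerState g_PointClamp;

// Independent reads: both coordinates are known up front, so the two
// fetches do not have to wait on each other.
float4 IndependentReads(float2 uv)
{
    float4 a = g_IndexTex.SampleLevel(g_PointClamp, uv, 0);
    float4 b = g_DataTex.SampleLevel(g_PointClamp, uv, 0);
    return a + b;
}

// Dependent read: the second coordinate is computed from the result of
// the first fetch, so the second fetch cannot start until the first returns.
float4 DependentRead(float2 uv)
{
    float2 uv2 = g_IndexTex.SampleLevel(g_PointClamp, uv, 0).xy;
    return g_DataTex.SampleLevel(g_PointClamp, uv2, 0);
}
```

In the loop from the first post, the coordinates of both texture fetches are derived from g_SampleBuffer.Load(i), so each of them sits behind that buffer read.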

Quote:
Original post by MJP
I don't think you're ever going to get good performance with 2048 texture samples in a pixel shader.
QFT

Commenting out one read might leave the other texture fully in the cache, which would explain the vast difference between your one-sample and two-sample versions.

tbh I'm amazed you're seeing 650 fps with just the one of them. Nice job! :)

I render my geometry, a simple geo-sphere, only once, and the rest is deferred shading, so it's fast!

More interesting is the fact that when I reduce my textures to a very small size, such as 16x16, and set the bit depth to the lowest possible, the frame rate is still around 10 fps.

I even replaced the 3D and cube textures with two simple 2D textures, but nothing changed. :(
