DX10/11: Generating MipMaps for 3D Texture?

Started by XBTC December 21, 2011 12:14 AM

6 comments, last by XBTC 12 years, 4 months ago

122

Author

December 21, 2011 12:14 AM

Hi Guys,

I have a 3D Volume Texture and I want to generate MipMaps for it. I looked through the DX Documentation and the Web without finding something useful.

My questions:

1. Is there a way to let DX generate the MipMaps automatically? It seems OpenGL is able to do this...I guess no...

2. What would be the best way to generate them myself?

My idea so far: Bind the mip-levels of the 3D-Texture as Render Targets and fill them with averaged values from the higher levels via vertex/geometry/pixel shaders. But how can I bind the different MIP-Levels of a 3D Texture as RenderTargets?

Thanks in advance for any Input/Pointers,
XBTC

Tsus

1,191

December 21, 2011 12:39 AM

The device context has a method called GenerateMips. I just tested it and it works on 3D textures as well.
Note the Remarks on the site I linked.

MJP

20,295

December 21, 2011 01:08 AM

If the texture is being generated by the GPU every frame, then GenerateMips will do what you want. If the texture is something that you load once, then you can use the D3DX helper functions (such as D3DX11FilterTexture) to do it on the CPU. You don't really want to use GenerateMips if you don't have to, since it requires you to create the texture with USAGE_DEFAULT instead of USAGE_IMMUTABLE, and also requires you to make it a render target.

XBTC

122

Author

December 21, 2011 07:09 PM

Thank you very much! I will try GenerateMips then...if I have to do max or min MipMaps I guess I will have to generate them myself?

Tsus

1,191

December 21, 2011 09:49 PM

if I have to do max or min MipMaps I guess I will have to generate them myself?

You mean finding the minimum or maximum value in a volume by some sort of reduction algorithm? Then yes, you’d have to do this yourself.

XBTC

122

Author

December 21, 2011 10:24 PM

Yeah! I need to build a mip-map where a texel of the higher level contains the maximum of the 8 corresponding texels in the lower level.

How would I build these mipmaps in a high performance way?

Bind the mip levels of the 3D texture as render targets and fill them with the max values from the lower levels via vertex/geometry/pixel shaders?

But how can I bind the different MIP-Levels of a 3D Texture as RenderTargets?

Tsus

1,191

December 22, 2011 01:11 AM

When creating a render target view on a 3D texture you have to specify the mip slice you are rendering into (the MipSlice member of the Texture3D part of the view description). Thus, you’d have to create one view per mip slice. The rendering itself would be pretty straightforward. In total you will end up having O(N³ log N) texture memory accesses.
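
As a CPU-side reference for what each of those passes has to compute, here is a minimal C++ sketch of a max downsample. It is not D3D code: the `Volume` struct and the names `downsampleMax` and `buildMaxMipChain` are made up for illustration, and I assume a dense float volume stored x-fastest. It can be handy for checking the GPU result on a small volume.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type: a dense volume stored x-fastest, then y, then z.
struct Volume {
    std::size_t w, h, d;
    std::vector<float> texels;
    float at(std::size_t x, std::size_t y, std::size_t z) const {
        return texels[(z * h + y) * w + x];
    }
};

// Builds the next coarser level: each output texel is the maximum of the
// 2x2x2 block it covers in the finer level. Sizes are halved and clamped to 1.
// Reads are clamped to the volume so an axis that already has size 1 is safe.
// Assumption: sizes are powers of two; with odd sizes the last texel on that
// axis would not be covered by any block.
Volume downsampleMax(const Volume& src) {
    const std::size_t dw = std::max<std::size_t>(1, src.w / 2);
    const std::size_t dh = std::max<std::size_t>(1, src.h / 2);
    const std::size_t dd = std::max<std::size_t>(1, src.d / 2);
    Volume dst{dw, dh, dd, std::vector<float>(dw * dh * dd)};
    for (std::size_t z = 0; z < dd; ++z)
        for (std::size_t y = 0; y < dh; ++y)
            for (std::size_t x = 0; x < dw; ++x) {
                float m = src.at(std::min(2 * x, src.w - 1),
                                 std::min(2 * y, src.h - 1),
                                 std::min(2 * z, src.d - 1));
                for (std::size_t dz = 0; dz < 2; ++dz)
                    for (std::size_t dy = 0; dy < 2; ++dy)
                        for (std::size_t dx = 0; dx < 2; ++dx) {
                            const std::size_t sx = std::min(2 * x + dx, src.w - 1);
                            const std::size_t sy = std::min(2 * y + dy, src.h - 1);
                            const std::size_t sz = std::min(2 * z + dz, src.d - 1);
                            m = std::max(m, src.at(sx, sy, sz));
                        }
                dst.texels[(z * dh + y) * dw + x] = m;
            }
    return dst;
}

// Repeats the downsample until a 1x1x1 level is reached.
// chain[0] is the original volume, chain.back() holds the global maximum.
std::vector<Volume> buildMaxMipChain(Volume level0) {
    std::vector<Volume> chain;
    chain.push_back(std::move(level0));
    while (chain.back().w > 1 || chain.back().h > 1 || chain.back().d > 1) {
        Volume next = downsampleMax(chain.back());
        chain.push_back(std::move(next));
    }
    return chain;
}
```

On the GPU, one call to `downsampleMax` corresponds to one pass that reads the finer mip slice and writes the coarser one.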

However, if you only need to find the maximum (and do not need the intermediate mip maps) you can be faster by using compute shaders for the reduction. The idea is to do only O(N³ log K) accesses to texture memory, where K < N (depending on the number of threads you’re using). If your total number of voxels is smaller than 1024, the factor K will be 1. (1024 is the maximum number of threads available per group.) Besides, you avoid the rasterizer overhead.

[subheading]The idea.[/subheading]
For simplicity I’ll explain the idea with a 1D example (a 1D texture with N texels). The idea extends to 3D as well. To make things easier, we will for now assume that N is smaller than 1024. This means we can dispatch a compute shader with a single group containing N threads.
First, each thread reads its respective voxel and stores it in group shared memory. Afterwards you synchronize all threads to be sure that from now on the data is completely available. You can now operate on the much, much faster group shared memory. (It is more than 100 times faster than texture or global memory if the data is not present in the cache.) But now your memory access pattern matters a lot. You get the most performance out of it if you avoid bank conflicts.

[subheading]Bank conflicts.[/subheading]
Shared memory is organized in interleaved banks. The number of banks is vendor specific and depends on the generation of your graphics card. 16 or 32 banks are usual today. Let’s assume we have 16 banks; then the shared memory would be organized like this:

shared[0]  -> bank[0]   (32 bit = 1 word)
shared[1]  -> bank[1]
...
shared[15] -> bank[15]
shared[16] -> bank[0]
shared[17] -> bank[1]   and so on.

A bank conflict occurs if multiple threads access different words within the same bank at the same time. (The “different” restriction only applies since Fermi.) If they access the same word, the access will be very fast as well; that’s a special case called broadcast.
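
To make the mapping concrete, here is a small C++ sketch of that model. It is only a simplification for reasoning about access patterns, not a description of any particular GPU: the names `bankOf` and `conflictDegree` are made up, and the default of 16 banks is just the example value from above.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Bank of a 32-bit word under the interleaved layout described above.
// numBanks is hardware specific; 16 is only the example value.
std::size_t bankOf(std::size_t wordIndex, std::size_t numBanks = 16) {
    return wordIndex % numBanks;
}

// For one batch of simultaneous accesses (one word index per thread), returns
// the largest number of DISTINCT words that fall into the same bank.
// 1 means conflict-free; k means the accesses to that bank take k turns in
// this simplified model. Several threads reading the same word count once
// (the broadcast case). Returns 0 for an empty batch.
std::size_t conflictDegree(const std::vector<std::size_t>& wordIndices,
                           std::size_t numBanks = 16) {
    std::map<std::size_t, std::set<std::size_t>> wordsPerBank;
    for (std::size_t w : wordIndices)
        wordsPerBank[bankOf(w, numBanks)].insert(w);
    std::size_t worst = 0;
    for (const auto& entry : wordsPerBank)
        worst = std::max(worst, entry.second.size());
    return worst;
}
```

With 16 threads, reading `shared[t]` gives a degree of 1, reading `shared[2 * t]` gives 2, and reading `shared[16 * t]` puts every access into bank 0.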

[subheading]The idea. (continued)[/subheading]
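The following snippet avoids bank conflicts (if possible).

// Tree reduction in group shared memory; assumes N is a power of two.
int i = N / 2;
while (i != 0) {
  // Each thread in the lower half takes the max with its partner in the upper half.
  if (sharedIndex < i)
    shared[sharedIndex] = max(shared[sharedIndex], shared[sharedIndex + i]);
  // Wait until every thread has finished this step before halving the range.
  GroupMemoryBarrierWithGroupSync();
  i /= 2;
}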

The variable sharedIndex is a linear index of your threads (threads can be organized in 1, 2 or 3 dimensions). You can use the SV_GroupIndex for that. (Make sure you have the latest driver, there used to be a driver bug on Fermi some time ago.)
You see that with each iteration another half of the threads starts idling (which is not ideal). We will need log2(N) iterations in total. In the last iteration the last remaining thread writes the maximum of the whole sequence into shared[0]. Finally you have to store that value in a texture or constant buffer (depends on how you use it later). Only one of your threads is needed for that last step, so put an if (threadId.x == 0) around it. In total we needed only O(N³) texture memory accesses. (Every voxel was read just once.)
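
If you want to convince yourself of the loop on the CPU first, here is a minimal C++ simulation of a single group. The function name `reduceMaxInGroup` is made up, and I assume the group size is a power of two and at least 1. The inner loop plays the role of the active threads, and the end of each inner loop corresponds to the barrier.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simulates the shared-memory tree reduction for one thread group.
// 'shared' holds one value per thread (what each thread loaded from the texture).
// Assumption: shared.size() is a power of two and not zero.
float reduceMaxInGroup(std::vector<float> shared) {
    for (std::size_t i = shared.size() / 2; i != 0; i /= 2) {
        // Threads 0..i-1 are active. Each reads its partner in the upper half,
        // so no thread reads a slot that another thread writes in this step.
        for (std::size_t thread = 0; thread < i; ++thread)
            shared[thread] = std::max(shared[thread], shared[thread + i]);
        // Reaching this point corresponds to GroupMemoryBarrierWithGroupSync().
    }
    return shared[0];
}
```

Running the threads one after another gives the same result as running them in parallel here, because within one step the written range (below i) and the partner range (i and above) do not overlap.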

[subheading]More voxels than threads.[/subheading]
So, how do you extend this to cases where you have more than 1024 voxels (N > 1024)? Then you have to serialize your computation. Distribute all the voxels of the respective mip slice among multiple groups (so far we used only one) and then dispatch your compute shader multiple times. Each dispatch computes a new mip slice, and we actually skip several slices (the ones we stored intermediately in group shared memory). With each iteration you’ll need fewer groups. Eventually you end up needing only one group, and then you’re done.
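
The multi-dispatch scheme can be sketched on the CPU like this. Again this is only an illustration in C++, not compute shader code: `reduceMaxMultiPass` and its parameters are made-up names, each loop iteration stands in for one dispatch, and `std::max_element` stands in for the per-group reduction in shared memory. I assume a non-empty input and a group size of at least 2.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Each pass splits the data into groups of up to groupSize values, and every
// group writes a single maximum. The output of one pass is the input of the
// next, until only one value is left.
// Assumptions: data is not empty and groupSize >= 2.
// If 'passes' is not null it receives the number of passes (dispatches) used.
float reduceMaxMultiPass(std::vector<float> data, std::size_t groupSize,
                         std::size_t* passes = nullptr) {
    std::size_t count = 0;
    while (data.size() > 1) {
        std::vector<float> next;
        for (std::size_t begin = 0; begin < data.size(); begin += groupSize) {
            const std::size_t end = std::min(begin + groupSize, data.size());
            // Stand-in for the in-group shared memory reduction.
            next.push_back(*std::max_element(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end)));
        }
        data = std::move(next);
        ++count;
    }
    if (passes) *passes = count;
    return data[0];
}
```

For example, 4096 values with groups of 1024 need two passes: the first leaves 4 partial maxima, the second reduces those to one.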

Setting this up is a perfect way to familiarize yourself with compute shaders.
Hope this helps you a little.

XBTC

122

Author

December 22, 2011 04:22 PM

Awesome + detailed post man!!!! Thank you very very much!!!!

Unfortunately I need the intermediate mip maps, so I cannot make use of this approach right now, but I might soon run into a situation where I don't need the intermediate levels.

Your post was a great read anyway as I never used compute shaders before. Now I can see their potential uses...

This topic is closed to new replies.
