Texture arrays vs. multiple (single) textures?

This topic is 824 days old, which is more than the 365-day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

I have been thinking about this, but seem unable to find much information about it.

Are texture arrays really just a convenience thing, or are there real, tangible advantages to using them as opposed to multiple single textures in certain situations?

 

From what I understand, a texture array requires that all array slices have the same dimensions and colour format. It can be bound as a single resource to a shader, and that shader can then access it as an array through whatever resource slot it was bound to. However, if I create multiple single textures (i.e. with ArraySize = 1) and bind these to registers t0, t1 and t2, I can still declare and reference them as a 3-element array beginning at register t0 in my shader code. There also doesn't seem to be any particular difference in access speed, and I can't imagine texture arrays would be faster to write to when rendering to multiple targets either, as you'd still have to create a render target view for each array slice in that case.
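To make the comparison concrete, here is a minimal HLSL sketch of the two declarations being discussed (resource names and register slots are illustrative, not from any particular codebase):

```hlsl
// Option A: a texture array -- one resource, one descriptor,
// with the slice selected by the third texture coordinate.
Texture2DArray gTexArray : register(t0);

// Option B: an "array of textures" -- three separate resources
// occupying consecutive registers t1..t3.
Texture2D gTextures[3] : register(t1);

SamplerState gSampler : register(s0);

float4 SampleBoth(float2 uv, uint slice)
{
    // Texture array: the slice index is just part of the coordinate.
    float4 a = gTexArray.Sample(gSampler, float3(uv, slice));

    // Array of textures: indexing must effectively select a resource,
    // which is where the two approaches diverge under the hood.
    float4 b = gTextures[slice].Sample(gSampler, uv);

    return a + b;
}
```

Both compile and look nearly identical at the HLSL level, which is exactly why the question of what happens underneath is worth asking.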

The sole thing I can think of that may be an advantage with texture arrays is that you might get away with fewer texture bind calls, but maybe it doesn't work that way under the hood anyway?

 

I'm basically trying to decide whether it'd be worth extending my current Texture classes to support texture arrays for cascaded shadow maps, or if I might as well just use an array of said class on the CPU side and be done with it.
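For reference, a hypothetical cascaded-shadow-map lookup with a texture array could look like the sketch below (names and the cascade-selection scheme are assumptions for illustration):

```hlsl
// All cascades live in one Texture2DArray; the cascade index simply
// selects a slice, so no per-cascade resource binding is needed.
Texture2DArray gShadowCascades : register(t0);
SamplerComparisonState gShadowSampler : register(s0);

float SampleShadow(float2 shadowUV, uint cascade, float receiverDepth)
{
    // SampleCmpLevelZero on a Texture2DArray takes (u, v, slice)
    // plus the depth value to compare against.
    return gShadowCascades.SampleCmpLevelZero(
        gShadowSampler, float3(shadowUV, cascade), receiverDepth);
}
```

With separate per-cascade textures, the same lookup would instead need a dynamic resource selection, which is the problem discussed in the replies below.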

Texture arrays are very different from allocating a contiguous block of 't' registers.

All Dx10 GPUs can do array indexing into texture arrays almost for free (similar cost to choosing a mipmap level), but they have no support at all for indexing into an array of 't' regs. The shader compiler will either give an error or produce horrible code (e.g. [tt]if(idx==0)result=t0.sample(uv) else if(idx==1)result=t1.sample(uv) else...[/tt]).
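Spelled out in full, the "horrible code" the compiler has to emit for a dynamic index into separate registers looks roughly like this (a sketch; names are illustrative):

```hlsl
Texture2D tex0 : register(t0);
Texture2D tex1 : register(t1);
Texture2D tex2 : register(t2);
SamplerState samp : register(s0);

float4 SampleDynamic(uint idx, float2 uv)
{
    // Every possible index becomes an explicit branch with its own
    // sample instruction, because the hardware cannot index into
    // a block of 't' registers directly.
    if      (idx == 0) return tex0.Sample(samp, uv);
    else if (idx == 1) return tex1.Sample(samp, uv);
    else               return tex2.Sample(samp, uv);
}
```

With a Texture2DArray the same lookup is a single sample instruction, with the index folded into the texture coordinate.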

On DX12 GPUs both can probably be implemented fairly sensibly, but texture arrays are still simpler -- they have a single texture descriptor (uniform/const/shared between every pixel, so stored in SGPRs) and use normal sampling instructions.
Arrays-of-textures have to load an array of texture descriptors into SGPRs, then somehow extract one (potentially different per pixel) into VGPRs, and then issue a texture fetch instruction that reads a descriptor from a VGPR address (varying per pixel). I'm not sure if that's even supported, so it might have to serialize that operation and do each pixel's texture fetch one after the other, instead of fetching data for 64 pixels in parallel.

 

[edit] Yeah, on AMD at least, the texture descriptor that's being used for the sample/load instruction has to be stored in SGPRs (aka constant/uniform memory), so it's not possible for those GPUs, which normally shade 64 pixels in parallel, to fetch from a texture descriptor that varies per pixel. If it does work, it will be at 1/64th throughput due to the serialization. See 8.2.1 Image Instructions

Edited by Hodgman


All Dx10 GPUs can do array indexing into texture arrays almost for free (similar cost to choosing a mipmap level), but they have no support at all for indexing into an array of 't' regs. The shader compiler will either give an error or produce horrible code (e.g. if(idx==0)result=t0.sample(uv) else if(idx==1)result=t1.sample(uv) else...).

Oh really? I wasn't aware of that. I didn't notice any particular issues doing it that way, but then I only used literal array indices, so I guess it would be worse with dynamic ones.

 

Well, I suppose I should look into texture arrays further then. Thanks for enlightening me :-)

[edit] Yeah, on AMD at least, the texture descriptor that's being used for the sample/load instruction has to be stored in SGPRs (aka constant/uniform memory), so it's not possible for those GPUs, which normally shade 64 pixels in parallel, to fetch from a texture descriptor that varies per pixel. If it does work, it will be at 1/64th throughput due to the serialization. See 8.2.1 Image Instructions


They can do it. They basically have to construct a while loop that handles each divergent case by putting the descriptor into SGPRs, and terminates once all possible cases have been handled. So if you have a wavefront that uses 4 different descriptors, the loop will execute 4 times and you'll get at most 1/4 throughput.
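The loop described above can be sketched in HLSL using Shader Model 6 wave intrinsics purely for illustration (the real driver-generated code lives at the ISA level, and all names here are made up):

```hlsl
Texture2D gTextures[8] : register(t0);
SamplerState gSamp : register(s0);

float4 SampleWaterfall(uint idx, float2 uv)
{
    float4 result = 0;
    for (;;)
    {
        // Broadcast the first still-active lane's index so it becomes
        // wave-uniform (i.e. safe to hold in scalar registers / SGPRs).
        uint uniformIdx = WaveReadLaneFirst(idx);
        if (idx == uniformIdx)
        {
            // Every lane that wanted this descriptor samples together.
            result = gTextures[uniformIdx].Sample(gSamp, uv);
            break; // this lane is done; the rest loop again
        }
    }
    return result;
}
```

Each iteration retires all lanes sharing one descriptor, which is why a wavefront touching N distinct descriptors runs the body N times.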
