[DirectX 10] Cost of many shaders

7 comments, last by GoodFun 15 years, 4 months ago
Phantom stated the following in one of my earlier posts: 'Changing shader is more expensive than changing textures'

Can anyone tell me what makes it more expensive? I don't quite understand how DirectX 10 handles textures and shaders in terms of swapping them in and out of context. Also, is this a limit per device? I.e. would many devices with fewer shaders be more efficient than one device with many shaders?

Basically I currently have 46 shaders with 1 - 35 animation slots rendering on one device. I am wondering which one of the solutions below is the best and for what reason. Also, the textures are about 8 GB in size all together and they run on a system with a 1 GB graphics card and 16 GB of system RAM.

Solution 1: 1 device, 46 shaders, swapping textures in and out on every request (unless the same animation slot is requested again)
Solution 2: 1 device, 1026 shaders, one set of textures, updating textures on 46 shaders every 5 minutes
Solution 3: 46 devices, 1 - 15 shaders each, one set of textures per shader, updating textures every 5 minutes

Maybe someone can shed some light on what's happening inside of DX10 when shaders are created and textures are assigned / changed...

Thanks
Marcel
hmmm, no one with any insight on what happens when shaders and textures are created / assigned in DirectX 10???
Quote:Original post by GoodFun
hmmm, no one with any insight on what happens when shaders and textures are created / assigned in DirectX 10???

You, once you give it a try. We don't have any magic numbers up our sleeves, especially not for machines with 16 GB of RAM loading 8 GB of textures.

This is the type of thing you'll need to figure out for yourself. If you can't afford to invest the time to check this out completely, try writing short, simple benchmark applications to compare the techniques.
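As a rough illustration of such a benchmark, here is a minimal timing harness, written in Python for brevity rather than as real Direct3D code. The names `time_strategy` and `render_once` are hypothetical; `render_once` stands for whatever callable performs one complete render request under the strategy being measured. Because the GPU queues commands asynchronously, that callable would need to include the read-back of the render target, otherwise the timer mostly measures how fast commands are queued.

```python
import time


def time_strategy(render_once, requests, repeats=5):
    """Return the best wall-clock time (seconds) to process all requests.

    render_once is a hypothetical callable that performs one full render
    request, including the read-back of the result.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for request in requests:
            render_once(request)
        elapsed = time.perf_counter() - start
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, elapsed)
    return best
```

Running it once per candidate strategy with the same recorded request list would give numbers that can be compared on the actual hardware and driver.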

Heck, you might even learn some new stuff during the process.
Sirob Yes.» - status: Work-O-Rama.
Well, I was mostly after some insight into what is happening behind the scenes in DirectX 10 when shaders and / or textures are switched. I don't necessarily believe in trial and error; I'd rather understand what's going on, but so far I haven't found much material that goes into that much detail.

If you do know a good book about DirectX 10 that really dives into how things work (instead of yet another 'how to write games' book with a pinch of DirectX code in it, which I can find in masses), then I'll gladly bury myself in it to see what's going on.

I understand that my application is very different from what most people here are doing, hence I was looking for some pointers on how DirectX 10 works behind the scenes so I can come to my own conclusions on what should work best for my case.
The problem is that your question is not so much a DirectX detail (although I could be wrong) but is more of a driver/hardware level question. Which means it could vary from vendor to vendor, and from chipset to chipset.

This is why trial and error is indeed a useful mechanic for such determinations.
Well, I don't have any hard numbers for you, but, it seems to me that if you have 8gb of textures and 1gb of video memory, then you are going to have to start swapping textures in and out of video memory, which seems like it would be expensive to me. A shader on the other hand, is a relatively small piece of data.

Changing either a shader or a texture is bound to cause a pipeline stall (particularly the shader), though it's remotely possible that binding a texture to a sampler unused by the current shader might get around that stall.

The 46 devices solution seems totally insane, since you still have the same amount of textures/shaders and so it doesn't gain you anything but extra complication and overhead.

I'm curious as to what you are doing that needs this many textures/shaders.

I also don't get how solutions 1 and 2 would really differ in performance. Since you clearly have 46 distinct tasks (the shaders), what do the rest of the 1026 shaders in solution 2 do? If they just take more textures as inputs, then it's not really any different from solution 1. If they do some calculations in lieu of textures, then it would likely be better, but that doesn't seem to be the case.
Thanks for the responses...

Andur,
yes, I definitely have to swap textures in and out of the graphics card memory, though some textures are more likely to be used than others, so I might get more 'cache' hits in certain cases.

A little bit about what I'm doing. I'm processing scientific data and rendering it in real time on a request basis. Basically, clients request images of the data at different zoom levels and different locations inside of the textures. This is being rendered on a dual quad-core Xeon server with 16 GB of RAM and currently an 8800GT with 1 GB of video RAM.

Approximately every 5 minutes I get a new texture for each of the 46 layers. For most of them I keep 36 textures available for rendering (3 hours' worth); for some I keep only the most recent one (hence the total of 1026 textures, which isn't a multiple of any of the numbers mentioned here). When I get a new texture, all the textures fall back one slot and the oldest one gets removed from the system.
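The "fall back one slot" scheme described above can be sketched as a fixed-length queue per layer. This is only an illustration in Python, not the poster's actual code; `LayerHistory` and its methods are hypothetical names, and the strings used as textures stand in for real GPU resources.

```python
from collections import deque


class LayerHistory:
    """Keeps the newest `slots` textures for one data layer (hypothetical helper)."""

    def __init__(self, slots):
        self.slots = slots
        # Slot 0 is the most recent texture. A deque with maxlen drops the
        # oldest entry from the far end when a new one is pushed on the left.
        self._textures = deque(maxlen=slots)

    def push(self, texture):
        """Insert a new texture at slot 0 and return the evicted one, if any."""
        full = len(self._textures) == self.slots
        evicted = self._textures[-1] if full else None
        self._textures.appendleft(texture)
        return evicted

    def get(self, slot):
        """Return the texture in the given animation slot (0 = newest)."""
        return self._textures[slot]

    def __len__(self):
        return len(self._textures)
```

A layer that keeps 3 hours of data at one texture every 5 minutes would use `LayerHistory(36)`, and a layer that keeps only the latest texture would use `LayerHistory(1)`. Returning the evicted texture gives the caller a chance to release the underlying resource.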

As for solution 2, the shaders aren't 1026 different shader codes, they would be different instances of a total of so far 5 different shader codes. Each of the 1026 shader instances would have the parameters set for this set of data and would have the texture for that time slot assigned to it. This way, I would set a different shader before I render an image instead of having to set the shader and then set the texture in the shader.
In the first solution (which is what I'm currently using), I'm changing textures for every image I render unless it happens to be exactly the same texture as the most recent request.
All of this is happening in a multithreaded environment, with only the actual render call and shader setup being serialized inside a lock.

The 46 devices was just thrown in there in case someone felt that this was a good idea...

You say that changing the shader causes a pipeline stall, can you explain this a bit more? Somehow I think in my case this will happen on every render call anyways. It will never happen inside of a render call though as each image that gets rendered is using just one shader with one set of textures.

Hope this makes more sense... I'll look around for more information on render pipeline stalling and see if I can find some more stuff there...

again, thanks for your continued patience and help

Marcel

Well, the pipeline stall from changing a shader isn't a huge deal. There's almost certainly one caused by changing a texture or any parameter for that matter. But, since you are loading in totally different code for the shader units, I imagine that does horrible things to the pipeline.

Since you have only 5 actual different shaders, it sounds like you are best off creating only 5 of them and changing the textures that you pass into them.
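To illustrate why keying on shader code rather than on data layer can help, here is a small Python model that counts how often the bound shader actually changes over a stream of requests. It is a sketch under assumptions: `ShaderBinder` and `count_switches` are hypothetical names, and the layer-to-shader mappings are made up for the example, since the thread does not say which layers use which of the 5 shader codes.

```python
class ShaderBinder:
    """Tracks the bound shader and skips redundant binds (hypothetical helper)."""

    def __init__(self):
        self.current = None
        self.switches = 0

    def bind(self, shader_id):
        # Only count (and, in a real renderer, only issue) a bind when the
        # requested shader differs from the one already bound.
        if shader_id != self.current:
            self.current = shader_id
            self.switches += 1


def count_switches(requests, layer_to_shader):
    """Count shader changes for a sequence of requested layer indices."""
    binder = ShaderBinder()
    for layer in requests:
        binder.bind(layer_to_shader[layer])
    return binder.switches
```

With one shader instance per layer, every change of layer is also a change of shader. With a shared mapping onto 5 shader codes, consecutive requests for different layers that happen to use the same code need no shader change at all, only a texture change.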

I don't quite get what you mean by this:
"Each of the 1026 shader instances would have the parameters set for this set of data and would have the texture for that time slot assigned to it. This way, I would set a different shader before I render an image instead of having to set the shader and then set the texture in the shader."

Since you'll have to set the shader and then set the textures no matter what.

Binding a new texture or shader really isn't all that expensive; you can easily get away with setting hundreds of textures per frame with no problems. The massive texture memory requirements that you have make this more difficult, but since you only need the images that are requested, a good caching system should take care of this. It's not like you need all 8 GB per frame.
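One possible shape for such a caching system is a least-recently-used cache with a byte budget. The following is a minimal sketch in Python, not Direct3D code; `TextureCache` and the `load` callback are hypothetical, and the budget would have to be chosen below the card's 1 GB to leave room for render targets and other resources.

```python
from collections import OrderedDict


class TextureCache:
    """LRU cache bounded by a byte budget (hypothetical sketch)."""

    def __init__(self, budget_bytes, load):
        self.budget = budget_bytes
        self.load = load  # assumed callable: key -> (texture, size_in_bytes)
        self.used = 0
        self.hits = 0
        self.misses = 0
        self._entries = OrderedDict()  # key -> (texture, size), oldest first

    def get(self, key):
        if key in self._entries:
            # Mark as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key][0]
        self.misses += 1
        texture, size = self.load(key)
        # Evict least recently used entries until the new texture fits.
        # A texture larger than the whole budget is still stored on its own.
        while self._entries and self.used + size > self.budget:
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used -= old_size
        self._entries[key] = (texture, size)
        self.used += size
        return texture

    def __contains__(self, key):
        return key in self._entries
```

In a real renderer the eviction step would also release the GPU resource, and the hit and miss counters give a quick way to see whether the request pattern actually favors some textures.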

I assume you are using those CPUs for something else at the same time, or that would be total overkill :) (though you might be able to offload some of the calculations onto them)
You are correct that there are only 5 different shaders in terms of shader code. Right now, though, the data structures are organized by input data type and not by shader code, hence the 46 shader instances (built from the 5 different shader codes that I have).

What I mean with the 1026 shader instances is that I would keep one shader instance per type of data per animation slot. I would assign one set of textures (one or two textures) to each of those shaders and they would keep that texture during their lifetime. That way I would only specify which shader I'm using in a render request and wouldn't have to set parameters or textures as they would already have been set.

The CPUs are used to take the 8 bit output array from the off screen 8 bit unsigned render target and make PNGs out of it and then send it back to the client over a netTCP WCF endpoint. When I load the system, I do get about 80% CPU load on all 8 cores while rendering at full speed.

This topic is closed to new replies.
