L. Spiro

Reducing Usage of Temporary Registers in HLSL


As I expanded my shader I began getting compiler errors regarding the lack of free temporary registers.

What types of things can I do to eliminate this error? What types of things cause my temporary registers to disappear? Why isn’t the compiler able to re-use them?


Thank you,
Yogurt Emperor

Internally all the registers are float4, so every time you use a float, you're wasting 3/4ths of a register. If you do use lots of floats, try packing them into float4 variables.
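
As a rough illustration of the packing idea, here is a minimal sketch. All function and parameter names are hypothetical and do not come from the original shader. Whether the packed form actually saves registers depends on the compiler, which may already merge scalars on its own.

```hlsl
// Hypothetical names throughout; none of these come from the original shader.

// Unpacked: four scalar temporaries, each of which may end up in its own
// register if the compiler does not merge them.
float4 ShadeUnpacked(float3 n, float3 l0, float3 l1, float3 l2, float3 l3)
{
    float d0 = saturate(dot(n, l0));
    float d1 = saturate(dot(n, l1));
    float d2 = saturate(dot(n, l2));
    float d3 = saturate(dot(n, l3));
    return float4(d0, d1, d2, d3);
}

// Packed: the four results live in the components of a single float4.
float4 ShadePacked(float3 n, float3 l0, float3 l1, float3 l2, float3 l3)
{
    float4 d;
    d.x = dot(n, l0);
    d.y = dot(n, l1);
    d.z = dot(n, l2);
    d.w = dot(n, l3);
    return saturate(d); // one vector saturate instead of four scalar ones
}
```

Comparing the compiled assembly of both versions is the only reliable way to see whether the temporary count actually changes.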


One reason you need to reduce the number of registers used (and a case where the compiler can't help) is that the GPU tries to process as many pixels at once as possible.

For example, on the Xbox 360 (because these numbers are public information), pixels are processed in 2x2 blocks (always 4 pixels at once). These 2x2 blocks are grouped into 'vectors' of 16 blocks. There are 3 ALUs, and each one tries to process 2 vectors at once.

If we multiply that out (4 pixels per block * 16 blocks per vector * 2 vectors per ALU * 3 ALUs), we can see the GPU tries to process 384 pixels at the same time. Each one of those pixels consumes temporary registers. If your pixel shader uses too many temporary registers, then the GPU won't have enough to share around.
For example, if your shader uses 20 temporary registers, then the GPU actually needs to allocate 7680 (20 * 384) temporary registers in order to keep the pipeline flowing.

I see.

Thank you for that useful information; that helped me drop my problem into the “solved” bucket.
I did not know that every local variable was a register; I was thinking in standard programming terms and wondering why it could not re-use EAX (for example).


Thank you,
Yogurt Emperor

Actually, the compiler is written to make judicious use of temporary registers, especially with regard to packing. So using a float does not waste three other floats' worth of space. The compiler will make use of that leftover space if there is something it can put there. It's also fairly good at reusing temp registers.

You can control stream packing to some degree.

The best way to reduce temp usage is to do less work, or provide more data in a pre-calculated/cached form. The compiler does try to optimize repetitive calculation by using more temp registers to store them.
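
One way to picture the "pre-calculated/cached" option is to replace per-pixel arithmetic with a lookup texture. The sketch below assumes a D3D9-style shader and a hypothetical 1D texture, `FalloffLut`, filled on the CPU with the same curve; the curve itself is an arbitrary example, not anything from the original shader.

```hlsl
// Hypothetical names; a sketch of moving work out of the pixel shader.

// Assumed: a 1D texture pre-filled on the CPU with the falloff curve.
sampler1D FalloffLut;

// Before: the curve is evaluated per pixel, and each intermediate step
// may need a temporary.
float FalloffComputed(float ndotl)
{
    float x  = saturate(ndotl);
    float x2 = x * x;
    float x4 = x2 * x2;
    return x4 * x4 * (3.0 - 2.0 * x); // arbitrary example curve
}

// After: a single texture fetch replaces the arithmetic.
float FalloffCached(float ndotl)
{
    return tex1D(FalloffLut, saturate(ndotl)).r;
}
```

The trade-off is an extra texture fetch and some loss of precision, so this tends to pay off only when the replaced calculation is considerably more expensive than the toy curve shown here.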

Occasionally there are compiler bugs that pop up where temp registers aren't getting reused. Should you have such an example, posting it here would help improve the compiler.

Using too many temp registers can reduce the number of threads that can run concurrently and therefore slow down your shaders. You could always use multipass techniques to keep the throughput high.

In the end, the code that the compiler generates is not what the hardware will execute. There is another step, performed by the driver, that converts the Direct3D shader bytecode into the real GPU code. During this step the number of temporary registers can change. If you want to know what the hardware does with your shader, you need to download the tools from the hardware vendors.
