Performance between half and float under SM3.0

Graphics and GPU Programming Programming

Started by HolyOdin April 28, 2013 08:58 AM

4 comments, last by MJP 10 years, 12 months ago

130

Author

April 28, 2013 08:58 AM

Is there a performance gain if I change some variables from fp32 to fp16?
For example:
void PSMain(VS_OUTPUT psIn, out float4 oCol : COLOR0, out float4 oCol1 : COLOR1)
{
    float4 wNormal = mul(g_Instant_Constant.World, float4(psIn.oNormal, 0.0f));
    float3 normal = wNormal.xyz;
    normal = normalize(normal);
    float4 color;
    color = tex2D(g_DiffuseTex, psIn.UV0.xy);
    color.a *= psIn.DifLight.a;
    ...
}
and
void PSMain(VS_OUTPUT psIn, out half4 oCol : COLOR0, out half4 oCol1 : COLOR1)
{
    half4 wNormal = mul(g_Instant_Constant.World, half4(psIn.oNormal, 0.0f));
    half3 normal = wNormal.xyz;
    normal = normalize(normal);
    half4 color;
    color = tex2D(g_DiffuseTex, psIn.UV0.xy);
    color.a *= psIn.DifLight.a;
    ...
}
Thanks for your help.

kunos

2,256

April 28, 2013 09:09 AM

In theory yes, there must be. In practice, I haven't seen one.

Stefano Casillo
TWITTER: [twitter]KunosStefano[/twitter]
AssettoCorsa - netKar PRO - Kunos Simulazioni

belfegor

2,836

April 28, 2013 09:18 AM

I read a recent article (can't find the link now) that mentioned half is slower. If I remember correctly, it was used on the old Nvidia GeForce FX 5xxx series cards because it performed better there.

Hodgman

52,717

April 28, 2013 09:18 AM

On some old cards, from around '04/'05/'06 maybe, the half type did actually make your shaders run faster. Most GPUs, though, ignore it and treat it the same as float.

. 22 Racing Series .

21st Century Moose

13,459

April 28, 2013 10:17 AM

D3D10+ specifies full float (source), so even if running SM3 code on such hardware, you're more likely to get half mapped to float.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

MJP

20,295

April 29, 2013 07:24 AM

The only GPUs that ever supported half precision in shaders were Nvidia's FX series, 6000 series, and 7000 series GPUs. On the FX series, using half precision was actually critical for achieving good performance, since full precision came with a significant performance penalty. ATI's early DX9 hardware used a weird 24-bit precision internally for everything, since the spec for SM2.0 was somewhat loose in how it defined the precision and format of floating-point operations. Later ATI DX9 hardware used full 32-bit precision for everything, since SM3.0 required IEEE compliance (or at least something much closer to it).
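
For reference, under SM2.x/SM3.0 the half type is only a hint: the compiler tags the resulting instructions with the partial-precision modifier (_pp), and the hardware is free to run them at full precision anyway. Here is a minimal sketch of a ps_3_0 shader you could compile to see this. The sampler name follows the original post; the simplified entry-point signature is a hypothetical stand-in for the poster's VS_OUTPUT.

```hlsl
// Sketch only: simplified inputs, not the original poster's VS_OUTPUT struct.
sampler2D g_DiffuseTex : register(s0);

half4 PSMain(float2 uv : TEXCOORD0, half4 difLight : COLOR0) : COLOR0
{
    // Every value declared as half is eligible for partial precision (_pp).
    half4 color = tex2D(g_DiffuseTex, uv);
    color.a *= difLight.a;
    return color;
}
```

Compiling with something like fxc /T ps_3_0 /E PSMain /Fc out.asm shader.hlsl (the file names are placeholders) and looking for _pp suffixes in the listing should show where the hint was applied. Whether it changes performance depends entirely on the GPU.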

For SM4.0, the half-precision instructions and registers were completely removed from the specification. Using the "half" type in HLSL will cause the compiler to use full-precision instructions, and in practice no DX10 or DX11 GPUs support half-precision arithmetic internally. Weirdly enough, lower-precision instructions have made a comeback in D3D11.1, primarily for mobile hardware. However, in 11.1 the syntax for using them is different: you have to use types like "min16float" and "min16uint".
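
As a rough illustration of the 11.1 syntax, here is a minimal sketch of a pixel shader using the minimum-precision types. The resource names, the input struct, and the fixed light direction are all hypothetical and only loosely mirror the shader in the original post. It assumes a compiler recent enough to know the min16 types (the Windows 8 SDK fxc or later).

```hlsl
// Sketch only: names and layout are assumptions, not the original poster's code.
Texture2D    g_DiffuseTex     : register(t0);
SamplerState g_DiffuseSampler : register(s0);

struct PSInput
{
    float4 pos      : SV_Position;
    float3 normal   : NORMAL;
    float2 uv       : TEXCOORD0;
    float4 difLight : COLOR0;
};

float4 PSMain(PSInput psIn) : SV_Target
{
    // min16float requests *at least* 16 bits; a driver may still use 32.
    min16float3 normal = (min16float3)normalize(psIn.normal);
    min16float4 color  = (min16float4)g_DiffuseTex.Sample(g_DiffuseSampler, psIn.uv);
    color.a *= (min16float)psIn.difLight.a;

    // Hypothetical fixed light direction, just to give the math something to do.
    min16float ndotl = saturate(dot(normal, min16float3(0, 0, 1)));
    color.rgb *= ndotl;

    return color; // implicitly widened back to float4
}
```

On hardware without minimum-precision support, these types simply run at full 32-bit precision, so the shader stays correct either way.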

The Blog | The Book

This topic is closed to new replies.