Performance between half and float under SM3.0

4 comments, last by MJP 10 years, 12 months ago
Is there a performance gain if I change some variables from fp32 to fp16?
for example:
void PSMain(VS_OUTPUT psIn, out float4 oCol : COLOR0, out float4 oCol1 : COLOR1)
{
    float4 wNormal = mul(g_Instant_Constant.World, float4(psIn.oNormal, 0.0f));
    float3 normal = wNormal.xyz;
    normal = normalize(normal);
    float4 color;
    color = tex2D(g_DiffuseTex, psIn.UV0.xy);
    color.a *= psIn.DifLight.a;
?
and
void PSMain(VS_OUTPUT psIn, out half4 oCol : COLOR0, out half4 oCol1 : COLOR1)
{
    half4 wNormal = mul(g_Instant_Constant.World, half4(psIn.oNormal, 0.0f));
    half3 normal = wNormal.xyz;
    normal = normalize(normal);
    half4 color;
    color = tex2D(g_DiffuseTex, psIn.UV0.xy);
    color.a *= psIn.DifLight.a;
?
thanks for your help

In theory yes, there should be. In practice, I haven't seen one.

Stefano Casillo
TWITTER: [twitter]KunosStefano[/twitter]
AssettoCorsa - netKar PRO - Kunos Simulazioni

I read a recent article (can't find the link now) that mentioned half is slower. If I remember correctly, it was used on the old NVIDIA FX 5xxx series cards because it performed better there.

On some old cards, from around '04 to '06 maybe, the half type actually did make your shaders run faster. Most GPUs, though, ignore it and treat it the same as float.

D3D10+ specifies full float (source), so even if running SM3 code on such hardware, you're more likely to get half mapped to float.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

The only GPUs that ever supported half precision in shaders were Nvidia's FX series, 6000 series, and 7000 series GPUs. On the FX series, using half precision was actually critical for achieving good performance, since full precision came with a significant performance penalty. ATI's early DX9 hardware used a weird 24-bit precision internally for everything, since the spec for SM2.0 was somewhat loose in how it defined the precision and format of floating-point operations. Later ATI DX9 hardware used full 32-bit precision for everything, since SM3.0 required IEEE compliance (or at least something much closer to it).
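For reference, here is a rough, self-contained sketch of what a complete SM3.0 (ps_3_0) pixel shader using half might look like. The declarations (g_World, the VS_OUTPUT layout, the register bindings) and the way the normal is written to the second render target are my own assumptions for illustration, not the original poster's actual code. When compiled for ps_2_x/ps_3_0, operations on half values are typically emitted with the _pp (partial precision) modifier, which hardware is free to ignore.

```hlsl
// Hypothetical declarations: names, registers and layout are assumed.
sampler2D g_DiffuseTex : register(s0);
float4x4  g_World      : register(c0);

struct VS_OUTPUT
{
    float3 oNormal  : TEXCOORD0;
    float4 UV0      : TEXCOORD1;
    float4 DifLight : COLOR0;
};

void PSMain(VS_OUTPUT psIn,
            out half4 oCol  : COLOR0,
            out half4 oCol1 : COLOR1)
{
    // Rotate the normal into world space, then drop to half precision.
    half3 normal = (half3)normalize(mul((float3x3)g_World, psIn.oNormal));

    // Sample the diffuse texture and modulate alpha by the light alpha.
    half4 color = (half4)tex2D(g_DiffuseTex, psIn.UV0.xy);
    color.a *= (half)psIn.DifLight.a;

    oCol = color;

    // Assumed usage: pack the normal into [0, 1] for the second target.
    oCol1 = half4(normal * 0.5 + 0.5, 1.0);
}
```

Whether this runs any faster than the float version depends entirely on the GPU, as described above.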

For SM4.0, the half-precision instructions and registers were completely removed from the specification. Using the "half" type in HLSL will cause the compiler to use full-precision instructions, and in practice no DX10 or DX11 GPUs support half-precision arithmetic internally. Weirdly enough, lower-precision instructions have made a comeback in D3D11.1, primarily for mobile hardware. However, in 11.1 the syntax for using them is different: you have to use types like "min16float" and "min16uint".
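As a minimal sketch of the 11.1-style syntax, assuming a compiler and runtime that accept the minimum-precision types: the resource names (DiffuseTex, LinearSampler), the PSInput layout, and the fixed light direction below are all hypothetical. Note that min16float only requests at least 16 bits of precision, so a driver is free to run everything at full 32-bit precision.

```hlsl
// Hypothetical resources and input layout, for illustration only.
Texture2D    DiffuseTex    : register(t0);
SamplerState LinearSampler : register(s0);

struct PSInput
{
    float4 position : SV_Position;
    float3 normal   : NORMAL;
    float2 uv       : TEXCOORD0;
    float4 difLight : COLOR0;
};

float4 PSMain(PSInput input) : SV_Target
{
    // Normalize at full precision, then store at (at least) 16 bits.
    min16float3 normal = (min16float3)normalize(input.normal);

    min16float4 color = (min16float4)DiffuseTex.Sample(LinearSampler, input.uv);
    color.a *= (min16float)input.difLight.a;

    // Assumed fixed light direction, just to give the normal a use.
    min16float ndotl = saturate(dot(normal, min16float3(0.0, 0.0, 1.0)));
    color.rgb *= ndotl;

    // Convert back to full precision for the render target output.
    return (float4)color;
}
```

On hardware without native 16-bit support this should behave the same as the float version, so it is safe to try, but any speedup has to be measured on the target device.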

