How will fp16 affect games

7 replies to this topic

#1 JoeJ   Members   


Posted 19 May 2017 - 11:54 AM

Personally, I'm excited about Vega's upcoming double-rate fp16 capability.
In my realtime GI project I can probably use it for absolutely everything, including positions.
I don't expect wonders. We already store compressed data, so there is no bandwidth to gain, but it should be a big speedup.

I guess the same is true for any renderer as well, but I'm no expert here.
What do you think? Do you plan to use it, or do you already have experience you can share?
What expectations do you have, and where are the limitations?


I wonder why the feature is not in Scorpio, while PS4 Pro already has it.
(Assuming a game uses equal amounts of fp32 and fp16 calculations, both have the same GFLOPS. If it uses mainly fp16, PS4 Pro wins!)
I'm not sure if consumer Volta will have faster fp16 than fp32.
We also read a lot that fp16 is an AI thing and not for games, which IMHO is total nonsense. Isn't it?
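
To put rough numbers on the parenthetical above, here is a small sketch of the arithmetic in Python. The peak figures (about 4.2 TFLOPS fp32 for PS4 Pro, about 6.0 TFLOPS for Scorpio) are commonly quoted values and not from this thread, and `effective_tflops` is a hypothetical helper. It assumes purely ALU-bound work, fp16 at exactly twice the fp32 rate on PS4 Pro, and fp16 at the fp32 rate on Scorpio.

```python
def effective_tflops(fp32_peak: float, fp16_speedup: float, fp16_fraction: float) -> float:
    """Effective throughput when `fp16_fraction` of all ops run at the fp16 rate.

    The time per op is the weighted sum of the time spent at each rate.
    """
    fp16_peak = fp32_peak * fp16_speedup
    time_per_op = (1.0 - fp16_fraction) / fp32_peak + fp16_fraction / fp16_peak
    return 1.0 / time_per_op


# Assumed peak figures (not from the thread).
PS4_PRO_FP32_TFLOPS = 4.2   # double-rate fp16 -> speedup 2.0
SCORPIO_FP32_TFLOPS = 6.0   # no double-rate fp16 -> speedup 1.0

for fraction in (0.0, 0.5, 1.0):
    ps4 = effective_tflops(PS4_PRO_FP32_TFLOPS, 2.0, fraction)
    scorpio = effective_tflops(SCORPIO_FP32_TFLOPS, 1.0, fraction)
    print(f"fp16 share {fraction:.0%}: PS4 Pro {ps4:.1f}, Scorpio {scorpio:.1f} TFLOPS")
```

Under these assumptions an even fp32/fp16 mix lands at about 5.6 versus 6.0 TFLOPS, and the break-even point is at roughly 60% fp16 work. Real shaders are rarely purely ALU-bound, so this is only a ceiling estimate.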

#2 ErnieDingo   Members   


Posted 19 May 2017 - 04:38 PM

Having read a little, this comment on Reddit sort of sums up a lot about FP16 simply:

 

 

FP16 is less accurate, with just 5 bits for the exponent and 10 bits for the fraction. So there are fewer small numbers and a greater distance between large numbers. The maximum number is also far smaller than it is for FP32.
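
The quoted limits can be checked with a minimal Python sketch. It assumes IEEE 754 half precision (1 sign, 5 exponent, 10 fraction bits), which is what the standard library's `struct` format `"e"` implements. The helper names `to_fp16` and `fp16_spacing` are made up for illustration.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_spacing(x: float) -> float:
    """Gap between adjacent fp16 values near x (normal range only)."""
    _, exponent = math.frexp(abs(x))   # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 11)      # 10 fraction bits -> 2**(floor(log2|x|) - 10)


# Largest finite fp16 value: all fraction bits set, largest normal exponent.
FP16_MAX = (2 - 2**-10) * 2.0**15

print(FP16_MAX)              # 65504.0
print(fp16_spacing(1.0))     # 0.0009765625
print(fp16_spacing(1024.0))  # 1.0
print(to_fp16(1024.4))       # 1024.0
print(to_fp16(0.1))          # 0.0999755859375
```

So above 1024 the format can no longer represent fractions at all, which is the "greater distance between large numbers" in practice.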

 

For anything around geometry, I'm not sure it has a real practical use, though I'm sure Hodge might have some nefarious way of using it. But I could see some benefits in local coordinate systems. I know some people working on long-running popular commercial games where there are vast distances, and this does play a part. So maybe there is performance to gain here.

Maybe it's useful for texture UVs as well (one I can see being very useful). Small numbers most of the time.

I would like to know, though, how well FP16 and FP32 play together, because it might be a similar story to single versus double precision, where there is a hit for the cast. So it might be limited in what real applications you can use FP16 for.


Indie game developer - Game WIP 

 

Strafe (Working Title) - Currently in need of another developer and modeler/graphic artist (professional & amateur's artists welcome)

 

Insane Software Facebook


#3 Hodgman   Moderators   


Posted 19 May 2017 - 11:00 PM


It's useful for colour/lighting calculations, but yeah, not so great for geometry calculations.

This used to be a thing back at the dawn of programmable shaders. Anyone who writes shaders for Unity will be aware of the choice between float, half and fixed data types, with float being float32, half being float16, and fixed being 10, 11 or 12 bit signed fixed point...

On recent PCs these all just get compiled as float32... but back on the GeForce FX, it actually did have float16 computation hardware too. These new GPUs are just bringing it back into style again :D

Old desktop PCs also used to have ~8 FP32 interpolators (VS->PS varying slots) and ~2 10/11/12-bit fixed-point interpolators, which were usually used for passing non-HDR colour information from the VS to the PS, so Unity's fixed keyword actually would do something when used for this kind of variable.

The real reason that Unity still has these three data types, though, is that mobile GPUs didn't follow the same evolution as desktop ones :D



#4 TheChubu   Members   


Posted 20 May 2017 - 01:30 PM

GeForce FX, it actually did have float16 computation hardware too. These new GPU's are just bringing it back into style again

This is what I was thinking; I read the thread title and was like "FP16? Again?". It's also quite important in mobile GPUs. I haven't kept up with the GPU news lately...


Edited by TheChubu, 20 May 2017 - 01:31 PM.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#5 pcmaster   Members   


Posted 20 May 2017 - 02:05 PM

It can be quite useful on PS4 Neo, I expect the same on new PC HW. You just have to know what you're doing and not mix it with fp32 (too much). And profile to see if it's better than with fp32.

The yield can be savings in interpolators (VS->PS param cache), double rate of (some) ALU, maybe even lower register pressure? It's totally good for games, e.g. HDR colour computations where you don't need that much precision but also anything else.



#6 JoeJ   Members   


Posted 20 May 2017 - 03:23 PM

I hope the register count doubles, and LDS too, if we use it for fp16 data (we would get 100% occupancy for almost everything).
Conversion to and from fp32 should be no issue. There might be an instruction that does this in one cycle, or even adds fp16+fp32 with free conversion.

I guess something like a deferred lighting pass could be done entirely in fp16.
But skinning a detailed character in fp16 and adding the result to an fp32 offset? The artefacts would probably be too bad.
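
The position question can be illustrated with a minimal Python sketch. It compares storing an absolute coordinate in fp16 against storing a small fp16 offset from an fp32 origin. The coordinate values are made up, `to_fp16` is a hypothetical helper built on the standard library's half-precision `struct` format, and this only shows storage rounding, not the error accumulated by a full skinning computation.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


world_x = 1500.37    # made-up absolute coordinate
origin_x = 1500.0    # made-up origin, kept in full precision

# Case 1: the absolute coordinate is stored directly in fp16.
absolute_error = abs(to_fp16(world_x) - world_x)

# Case 2: only the offset from the origin is stored in fp16.
local_x = world_x - origin_x
local_error = abs((origin_x + to_fp16(local_x)) - world_x)

print(f"absolute fp16 error: {absolute_error:.6f}")
print(f"local fp16 error:    {local_error:.6f}")
```

The local variant is several orders of magnitude more accurate here, but a model whose local coordinates reach 1 to 2 units still has a step of about 0.001 units, which may or may not be visible depending on scale.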

#7 Hodgman   Moderators   


Posted 20 May 2017 - 06:52 PM

I hope the register count doubles but also LDS if we use it for fp16 data

It's already common to compress data going in/out of LDS in cases where it avoids a spill to memory - the f32tof16 and f16tof32 intrinsics let you do the conversion and pack 2 floats into a 32-bit LDS variable  :wink:
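
The packing itself can be sketched on the CPU. This is a Python emulation, not shader code: the functions below only mimic what the HLSL intrinsics of the same names do to the bits (half value in the low 16 bits of an unsigned integer), and `pack_half2` / `unpack_half2` are hypothetical helper names.

```python
import struct


def f32tof16(x: float) -> int:
    """Return the half-precision bit pattern of x in the low 16 bits of an int."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def f16tof32(bits: int) -> float:
    """Interpret the low 16 bits of `bits` as a half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def pack_half2(a: float, b: float) -> int:
    """Pack two values into one 32-bit word: a in the low half, b in the high half."""
    return f32tof16(a) | (f32tof16(b) << 16)


def unpack_half2(word: int) -> tuple:
    """Inverse of pack_half2."""
    return f16tof32(word), f16tof32(word >> 16)


word = pack_half2(1.0, -2.0)
print(hex(word))           # 0xc0003c00
print(unpack_half2(word))  # (1.0, -2.0)
```

Storing one such word per pair of values halves the LDS footprint, at the cost of the conversions on each access.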


Edited by Hodgman, 21 May 2017 - 12:38 AM.


#8 JoeJ   Members   


Posted 21 May 2017 - 02:15 AM

I hope the register count doubles but also LDS if we use it for fp16 data

It's already common to compress data going in/out of LDS in cases where it avoids a spill to memory - the f32tof16 and f16tof32 intrinsics let you do the conversion and pack 2 floats into a 32-bit LDS variable :wink:


I've done this a lot, and since then I think fp16 would do for me even for positions.
OpenCL is more flexible here: it allows you to un/pack a single value to the high or low word. We often use 1 or 3 numbers and want the remaining 16 bits for other stuff, so that's very nice to have.
To get this in GLSL I made my own conversion routines, but I can't remember if this was slower than using the built-in functions with one zero and masking.
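
The "3 numbers plus 16 spare bits" layout could look roughly like this. It is a Python sketch of the bit manipulation only, not OpenCL or GLSL code, and every function name in it is hypothetical. It stores three half-precision components and a 16-bit integer payload in two 32-bit words.

```python
import struct


def half_bits(x: float) -> int:
    """Half-precision bit pattern of x as a 16-bit integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits: int) -> float:
    """Half-precision value stored in the low 16 bits of `bits`."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def set_lo(word: int, bits16: int) -> int:
    """Replace the low 16 bits of a 32-bit word, keeping the high 16 bits."""
    return (word & 0xFFFF0000) | (bits16 & 0xFFFF)


def set_hi(word: int, bits16: int) -> int:
    """Replace the high 16 bits of a 32-bit word, keeping the low 16 bits."""
    return (word & 0x0000FFFF) | ((bits16 & 0xFFFF) << 16)


def pack_vec3_id(v, payload: int):
    """Pack (x, y, z) as halves plus a 16-bit payload into two 32-bit words."""
    w0 = set_hi(set_lo(0, half_bits(v[0])), half_bits(v[1]))
    w1 = set_hi(set_lo(0, half_bits(v[2])), payload)
    return w0, w1


def unpack_vec3_id(w0: int, w1: int):
    """Inverse of pack_vec3_id."""
    v = (half_from_bits(w0), half_from_bits(w0 >> 16), half_from_bits(w1))
    return v, (w1 >> 16) & 0xFFFF


w0, w1 = pack_vec3_id((0.5, -1.25, 3.0), 0xBEEF)
print(hex(w0), hex(w1))        # 0xbd003800 0xbeef4200
print(unpack_vec3_id(w0, w1))  # ((0.5, -1.25, 3.0), 48879)
```

Because `set_lo` and `set_hi` leave the other half of the word untouched, a single component can be updated without repacking its neighbour.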

This reminds me of how much better OpenCL C is than shading languages.
OpenCL has pointers, so you can use the same LDS memory for float4* and later for int2* or half*.
With native fp16, the need for this in shading languages will only become more obvious.

Khronos plans to merge CL & VK, but I hope someone builds an open source GLSL replacement on top of SPIR-V before that happens... :)