Those optimisations are really nice. I implemented them, but I got some interesting results with the assembler instruction optimisation.
fistp uses whatever rounding mode the FPU is currently in, and the default is round-to-nearest rather than truncation towards zero. With round-to-nearest, the fractional part (the value minus the rounded integer) can come out negative, which nevertheless gives some nice results.
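
To make the rounding difference concrete, here is a small sketch in Python rather than assembler. It is only an illustration under assumptions: Python's built-in round() (nearest, ties to even) stands in for fistp in the default FPU rounding mode, and the helper names frac_floor, frac_truncate and frac_nearest are made up for this example, not taken from the article.

```python
import math

def frac_floor(x):
    # What a noise function usually wants: fraction always in [0, 1).
    i = math.floor(x)
    return i, x - i

def frac_truncate(x):
    # int() truncates towards zero, like a C float-to-int cast.
    i = int(x)
    return i, x - i

def frac_nearest(x):
    # Stand-in for fistp in the default mode: round to nearest, ties to even.
    # The fraction now lies in [-0.5, 0.5] and can be negative.
    i = round(x)
    return i, x - i

if __name__ == "__main__":
    for x in (2.25, 2.75, -1.25):
        print(x, frac_floor(x), frac_truncate(x), frac_nearest(x))
```

For 2.75 the floor version gives cell 2 with fraction 0.75, while the round-to-nearest version gives cell 3 with fraction -0.25. If that fraction is used as the interpolation weight in the noise function, it falls outside the expected 0..1 range, which is probably where the crack-like patterns come from.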
Maybe a procedural texture method of generating "modern metal with cracks" is born ;)
This article describes it way better than I can: [stereopsis on fistp]
Here are the results:
Fast fBm noise 1
Fast fBm noise 2
Fast fBm noise 3 - this one is my favorite