The starting point is Ken Perlin's Improved noise function.
Of course, a 7x speed increase means we're going to lose some of the accuracy and nice properties of the original noise, but everything is a trade-off in life, isn't it? Whether you want to use the fast version instead of the original one is up to you, and depends on whether you need performance more than quality.
On a Pentium 4 @ 3 GHz, Improved noise as implemented directly in C++ from the code linked above takes 7270 milliseconds to generate 1024x1024 samples of fBm noise (the basis function being Improved noise) with 12 octaves. In other words, the Improved noise function is called 12 times per sample. I'm using the 3D version of the algorithm. I ran the test 8 times and averaged the results to get something meaningful.
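For reference, the benchmark's fBm loop can be sketched like this. This is a minimal sketch, not the article's actual code: `noise3` is a constant stub standing in for the Improved noise basis, and the gain/lacunarity values (0.5 and 2.0) are assumptions, since the article doesn't state them.

```cpp
#include <cassert>

typedef int TInt;
typedef float TFloat;

// Constant stub standing in for the Improved noise basis function;
// a real implementation hashes lattice points and blends gradients.
static TFloat noise3(TFloat /*x*/, TFloat /*y*/, TFloat /*z*/)
{
    return 1.0f;
}

// fBm: sum `octaves` octaves of the basis function, doubling the
// frequency and halving the amplitude each octave (assumed gain and
// lacunarity; the article does not give its exact values).
static TFloat fbm(TFloat x, TFloat y, TFloat z, TInt octaves)
{
    TFloat sum = 0.0f, amp = 1.0f, freq = 1.0f;
    for (TInt i = 0; i < octaves; i++)
    {
        sum += amp * noise3(x * freq, y * freq, z * freq);
        freq *= 2.0f;
        amp *= 0.5f;
    }
    return sum;
}
```

With 12 octaves, the basis function is indeed evaluated 12 times per sample, which is why any speedup of the basis translates almost directly into the fBm timings below.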
The first thing that can be improved in the Improved noise are the casts from float-to-int:
const TInt xtr = floorf(xyz.x);
const TInt ytr = floorf(xyz.y);
const TInt ztr = floorf(xyz.z);
Those are really, really... REALLY bad. Instead, you can use a bit of assembly:
__forceinline TInt __stdcall MFloatToInt(const TFloat x)
{
    TInt t;
    __asm fld x    // push x onto the FPU stack
    __asm fistp t  // pop it as an integer, using the current rounding mode
    return (t);
}
and replace the floor calls by:
const TInt xtr = MFloatToInt(xyz.x - 0.5f);
const TInt ytr = MFloatToInt(xyz.y - 0.5f);
const TInt ztr = MFloatToInt(xyz.z - 0.5f);
The same performance test ( 1024x1024 @ 12 octaves ) now takes 3324 milliseconds, a 118% improvement.
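The `fld`/`fistp` pair converts using the FPU's current rounding mode (round-to-nearest by default), which is why the calls above subtract 0.5 first to emulate `floor`. Where x87 inline assembly isn't available, a cast-and-correct floor avoids the slow `floorf`-then-cast path too. This is a sketch of that alternative, not code from the original article:

```cpp
#include <cassert>

typedef int TInt;
typedef float TFloat;

// Portable fast floor (not from the article): truncate toward zero,
// then subtract 1 when truncation rounded up, i.e. for negative
// non-integer inputs.
inline TInt FastFloor(const TFloat x)
{
    const TInt i = (TInt)x;      // truncation toward zero
    return i - (x < (TFloat)i);  // correct negative non-integers
}
```

The comparison result converts to 0 or 1, so the correction compiles without an explicit branch on most compilers.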
The next trick is an idea from a co-worker, Inigo Quilez, who is heavily involved in the 64 KB demo-coding scene. His idea is simple: instead of computing a gradient, just replace it with a lookup table.
The original code looked like this:
return(_lerp(w, _lerp(v, _lerp(u, _grad(ms_p[AA], x, y, z),
_grad(ms_p[BA], x - 1, y, z)),
_lerp(u, _grad(ms_p[AB], x, y - 1, z),
_grad(ms_p[BB], x - 1, y - 1, z))),
_lerp(v, _lerp(u, _grad(ms_p[AA + 1], x, y, z - 1),
_grad(ms_p[BA + 1], x - 1, y, z - 1)),
_lerp(u, _grad(ms_p[AB + 1], x, y - 1, z - 1),
_grad(ms_p[BB + 1], x - 1, y - 1, z - 1)))));
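For context, the indices AA, AB, BA, BB are the standard Improved-noise corner hashes built from the permutation table `ms_p`. The sketch below shows how they are derived; the identity permutation used here is purely a stand-in to make the snippet runnable, not the real shuffled table.

```cpp
#include <cassert>

typedef int TInt;

// Hypothetical permutation table (the article's ms_p), 512 entries so
// that indices like A + 1 stay in range.
static TInt ms_p[512];

// Identity stand-in, NOT the real shuffled permutation.
static void InitDemoPerm()
{
    for (TInt i = 0; i < 512; i++)
        ms_p[i] = i & 255;
}

// Standard Improved-noise corner hashing: X, Y, Z are the unit-cube
// lattice coordinates, already masked to [0; 255].
static void HashCorners(TInt X, TInt Y, TInt Z,
                        TInt &AA, TInt &AB, TInt &BA, TInt &BB)
{
    const TInt A = ms_p[X] + Y;
    const TInt B = ms_p[X + 1] + Y;
    AA = ms_p[A] + Z;
    AB = ms_p[A + 1] + Z;
    BA = ms_p[B] + Z;
    BB = ms_p[B + 1] + Z;
}
```

Each of the eight cube corners gets one hash (AA, BA, AB, BB and their +1 variants for the far z face), which is exactly what both the original and the fast version index with.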
The "fast" version looks like this:
return(_lerp(w, _lerp(v, _lerp(u, ms_grad4[AA],
                                  ms_grad4[BA]),
                         _lerp(u, ms_grad4[AB],
                                  ms_grad4[BB])),
                _lerp(v, _lerp(u, ms_grad4[AA + 1],
                                  ms_grad4[BA + 1]),
                         _lerp(u, ms_grad4[AB + 1],
                                  ms_grad4[BB + 1]))));
Here, "ms_grad4" is a 512-entry lookup table containing random float values in the [-0.7; +0.7] range.
It is initialized like this:
static TFloat kkf[256];
for (TInt i = 0; i < 256; i++)
    kkf[i] = -1.0f + 2.0f * ((TFloat)i / 255.0f);

static TFloat ms_grad4[512];  // 512 entries so indices like AA + 1 stay in range
for (TInt i = 0; i < 512; i++)
    ms_grad4[i] = kkf[ms_p2[i]] * 0.7f;
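A self-contained version of that initialization might look like this. The permutation `ms_p2` is filled with a stand-in pattern purely for illustration (the article uses its own shuffled table), and the table is sized at 512 so indices like AA + 1 stay in range.

```cpp
#include <cassert>

typedef int TInt;
typedef float TFloat;

static TFloat ms_grad4[512];
static TInt   ms_p2[512];

static void InitFastTables()
{
    // Remap [0; 255] linearly onto [-1; +1].
    TFloat kkf[256];
    for (TInt i = 0; i < 256; i++)
        kkf[i] = -1.0f + 2.0f * ((TFloat)i / 255.0f);

    // Fill the gradient table through the permutation, scaled by 0.7.
    for (TInt i = 0; i < 512; i++)
    {
        ms_p2[i] = (i * 97) & 255;  // stand-in permutation (assumption)
        ms_grad4[i] = kkf[ms_p2[i]] * 0.7f;
    }
}
```

Every entry ends up in [-0.7; +0.7], which is the range the text below discusses.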
At this point you're maybe wondering: why 0.7? This is not clear to us yet; we suspect it has something to do with the average value you can get from a normal Perlin noise basis. We originally tried a range of [-1; +1], but found that the fBm saturated to black or white too often, hurting quality a lot.
This "fast" version of noise runs at 1027 milliseconds, a 223% improvement compared to the previous tweak, or a 607% improvement compared to the original Improved noise version.
As for quality, well, everything has a price. There are no visible artifacts in the fast noise, but the features don't seem to be as regular and as well distributed as in the Improved noise. The following image was generated with the same frequency/scaling values. The features look different because the lookup tables hold different values in the two versions, which is expected.