ASEToBin 0.7, SSE noise, Art of Infinity..

Published November 07, 2007
ASEToBin 0.7

The latest version of ASEToBin has been released:

http://www.fl-tw.com/Infinity/Docs/SDK/ASEToBin/ASEToBin_v0.7.zip

For more information, please refer to this thread:
http://www.fl-tw.com/InfinityForums/viewtopic.php?p=125647#125647

SSE2 noise

I spent a bit more than a week learning SSE2 instructions so I could use them in my Perlin noise classes / fractal sums to improve performance.

Learning SSE2 instructions was surprisingly easy. I credit that to the fact that I code shaders daily, and the two are quite similar ( both operate on registers of N values at the same time ).
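To make the "registers of N values" idea concrete, here is a minimal sketch of the same multiply-add written once as a scalar loop and once with SSE intrinsics. The function names ( madd_scalar, madd_sse ) are made up for this illustration and are not taken from the engine's code.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also brings in the SSE float operations)

// Scalar version: one multiply-add per loop iteration.
void madd_scalar(const float* a, const float* b, const float* c, float* out) {
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i] + c[i];
}

// SSE version: each intrinsic works on all four float lanes of a register at once,
// much like a shader operating on a float4.
void madd_sse(const float* a, const float* b, const float* c, float* out) {
    __m128 va = _mm_loadu_ps(a);   // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    _mm_storeu_ps(out, _mm_add_ps(_mm_mul_ps(va, vb), vc));
}
```

Both functions take pointers to four floats; the SSE one replaces the loop with three loads, one multiply, one add and one store.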

Translating my fast Perlin noise code to SSE2 took an hour. In theory, SSE2 code can be up to 4 times faster than standard scalar code, since each instruction operates on four floats at once, but of course in practice there's some overhead, so I was expecting it to be 2 or 3 times faster.. so I was pretty excited to benchmark my new code. Tadaaaa!

2 times slower !

Huh ?

Aaargh. After investigating, it appeared that there were two kinds of problems:

- the overhead of setting up the parameters and return value, converting from standard structures to SSE vectors and vice versa. Once I understood the problem, it was quickly fixed with a few swizzles, but I didn't really gain much performance.

- the lookup tables ( memory accesses ) in the noise code. That one is tricky. The permutation/gradient tables are an integral part of the noise algorithm.
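The lookup problem can be sketched as follows. SSE2 has no instruction that uses the four lanes of a register as four independent table indices, so a permutation lookup has to leave the vector registers, do four scalar loads, and come back. The table g_perm, its filler contents and the helper names below are hypothetical, chosen only for the illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical 256-entry permutation table. A real noise implementation would
// fill it with a shuffled permutation; this filler is only a simple bijection.
static std::uint8_t g_perm[256];

void init_perm() {
    for (int i = 0; i < 256; ++i)
        g_perm[i] = static_cast<std::uint8_t>((i * 167 + 13) & 255);  // 167 is odd, so no duplicates
}

// Returns g_perm[idx & 255] for each of the four 32-bit lanes of idx.
__m128i perm_lookup4(__m128i idx) {
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), idx);  // spill the register to memory
    for (int i = 0; i < 4; ++i)
        lanes[i] = g_perm[lanes[i] & 255];                     // four separate scalar table reads
    return _mm_load_si128(reinterpret_cast<const __m128i*>(lanes));  // reload as a vector
}
```

Every lookup in the noise function pays this round trip, which is the kind of cost that can eat the gain from doing the arithmetic four lanes at a time.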

I brought the problem to my coworker, I. Quilez, and we brainstormed a bit until we found a potential solution: rewriting the noise algorithm without lookup tables!

The thing is, the lookup tables are a CPU optimization. Removing them doesn't sound like a good idea at first, since the scalar code ends up being slower. But by replacing each table lookup with a math formula, we suddenly made the whole algorithm fit the SSE instructions / workflow a lot better ( read: near perfectly! ). No more memory accesses, just pure arithmetic.
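As an illustration of what "replacing a lookup with a formula" can look like, here is a sketch that swaps the permutation table for an integer hash built only from adds, xors and shifts, all of which exist as SSE2 instructions. This is an assumption made for the example, not the actual formula used in "computed noise" ( which isn't given here ); the hash is in the style of Bob Jenkins' add/xor/shift integer hashes, and the names hash_scalar and hash_sse2 are made up.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Scalar reference: mixes the bits of one 32-bit value with adds, xors and shifts only.
std::uint32_t hash_scalar(std::uint32_t a) {
    a = (a + 0x7ed55d16u) + (a << 12);
    a = (a ^ 0xc761c23cu) ^ (a >> 19);
    a = (a + 0x165667b1u) + (a << 5);
    a = (a + 0xd3a2646cu) ^ (a << 9);
    a = (a + 0xfd7046c5u) + (a << 3);
    a = (a ^ 0xb55a4f09u) ^ (a >> 16);
    return a;
}

// SSE2 version: the same six steps applied to four 32-bit lanes at once.
// No memory access besides the constants, so nothing has to leave the registers.
__m128i hash_sse2(__m128i a) {
    // Broadcasts one 32-bit constant to all four lanes.
    auto k = [](std::uint32_t c) { return _mm_set1_epi32(static_cast<int>(c)); };
    a = _mm_add_epi32(_mm_add_epi32(a, k(0x7ed55d16u)), _mm_slli_epi32(a, 12));
    a = _mm_xor_si128(_mm_xor_si128(a, k(0xc761c23cu)), _mm_srli_epi32(a, 19));
    a = _mm_add_epi32(_mm_add_epi32(a, k(0x165667b1u)), _mm_slli_epi32(a, 5));
    a = _mm_xor_si128(_mm_add_epi32(a, k(0xd3a2646cu)), _mm_slli_epi32(a, 9));
    a = _mm_add_epi32(_mm_add_epi32(a, k(0xfd7046c5u)), _mm_slli_epi32(a, 3));
    a = _mm_xor_si128(_mm_xor_si128(a, k(0xb55a4f09u)), _mm_srli_epi32(a, 16));
    return a;
}
```

The scalar version does more work than a single table read, which matches the point above that the non-SSE code gets slower; the vector version processes four lattice coordinates per call without touching a table.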

We called this new type of noise "computed noise". The standard ( non-SSE ) implementation is, as expected, 30% slower than fast Perlin noise ( while giving similar noise quality ). However, once ported to SSE, our "computed noise" ends up being twice as fast as the fast noise.

Remember that the "fast noise" was already an optimization we made on top of Perlin's "improved noise", and was itself already twice as fast. In summary, our SSE "computed noise" is 4 times faster than Perlin's improved noise.

We are happy..

The Art of Infinity

Last week I had 4 days of vacation ( Halloween Thursday + Friday, plus the weekend ). I spent that time working on the "Art of Infinity", a suggestion by Koshime to help organize our concept art.

Initially, it was only supposed to be a gallery for concept art and maybe renders, but I quickly had the idea of using it to manage contributions too.

After Singapore was installed, we realized it had a few limitations. The main one is that images and sub-galleries cannot be mixed: if you create a sub-gallery, you cannot have raw images at the same level.

Otherwise, Singapore is pretty nice. It can work on ASCII files or on a MySQL database. It's possible to upload images through FTP, and update the whole gallery with a button. And of course it has an admin mode in which you can organize galleries and upload images from your hard drive. It's simple and works quite well.

Still, it lacks a few features, like a color status ( making each gallery appear in a color that depends on its status ), or private, password-protected sections. I implemented both in PHP and they seem to work well ( except for a few holes in the password protection that a few people discovered.. heh.. but those were quickly fixed ).

I had to customize the template a bit to make it look less invasive, and as usual with CSS, I had to spend hours making it look "okay" in both Firefox and IE.

After that, I spent around 20 hours over 2 days uploading some contributions to the new gallery, checking their status and reviewing the existing models.. so far I think we've got nearly a hundred contribs in the gallery, but I haven't finished adding them all, so it'll continue to grow over time..

Comments

Ysaneya
Quote:Original post by Ysaneya
November 07, 2007 07:49 AM
dgreen02
*in confused voice* wh-wh-where are the screenshots?!?

I clicked the link...but...there are no screenshots!

*curls up on floor and starts shaking*

Congrats on the noise optimization...interesting story too.

Keep up the good work man.
November 08, 2007 03:13 AM