Archived

This topic is now archived and is closed to further replies.

HellRaiZer

Cloud Rendering on GPU withOUT pixel or texture shaders...


I know that cloud rendering has become a trivial topic these days, and forgive me for posting one more thread on it, but I have a little problem. I'm working on a GeForce 256 DDR, with no pixel shaders (and low-performance vertex shaders). I'm trying to do as much of the cloud rendering as I can on the GPU. I'm using render-to-texture for blending the octaves together (4 of them, because of performance problems). The way I blend those 4 octaves is:

Equation 1:
Result = 0.5 * (Oct[0] + 0.5 * (Oct[1] + 0.5 * (Oct[2] + 0.5 * Oct[3])))
       = 0.5 * Oct[0] + 0.25 * Oct[1] + 0.125 * Oct[2] + 0.0625 * Oct[3]   (1)
// Equation (1), built back-to-front in the p-buffer:

// Res1 = 0.0625 * Oct[3] + 0.0 * Background;

float coef = 1.0f / 16.0f;
glBlendColorEXT(coef, coef, coef, coef);
glBlendFunc(GL_CONSTANT_COLOR_EXT, GL_ZERO); // first pass overwrites the buffer
Clouds.Texture[3].Use();
RenderQuad();

// Res2 = 0.125 * Oct[2] + Res1;

coef = 1.0f / 8.0f;
glBlendFunc(GL_CONSTANT_COLOR_EXT, GL_ONE);  // accumulate from here on
glBlendColorEXT(coef, coef, coef, coef);
Clouds.Texture[2].Use();
RenderQuad();

// Res3 = 0.25 * Oct[1] + Res2;  (blend func is still CONSTANT_COLOR, ONE)

coef = 1.0f / 4.0f;
glBlendColorEXT(coef, coef, coef, coef);
Clouds.Texture[1].Use();
RenderQuad();

// Res4 = 0.5 * Oct[0] + Res3;

coef = 1.0f / 2.0f;
glBlendColorEXT(coef, coef, coef, coef);
Clouds.Texture[0].Use();
RenderQuad();
RenderQuad() renders the currently bound texture using an ortho view, so it covers the whole p-buffer. After that, I have a texture (in the dimensions of the p-buffer), with no exponential pass. The first thing I have to do is subtract the cutoff (CloudDensity * White) texture from the above result. I do this in an extra pass:

Equation 2:
FinalResult = NoiseTexture * 1.0 - WhiteTexture * CloudDensity
coef = Clouds.CloudDensity / 255.0f;
glBlendEquationEXT(GL_FUNC_REVERSE_SUBTRACT_EXT); // dest - src * srcFactor
glBlendColorEXT(coef, coef, coef, coef);
glBlendFunc(GL_CONSTANT_COLOR_EXT, GL_ONE);
Clouds.CutOff->Use();
RenderQuad();

glBlendEquationEXT(GL_FUNC_ADD_EXT);              // restore the default equation
The above procedure is similar to Kim Pallister's article on cloud rendering, converted to the OpenGL API. My problem now is that the final texture is very dark, and you can hardly see the cloud formation on it. Kim Pallister suggests a modulate 2x for making it brighter, but how can I do it? Is there a way of executing the exponential pass on the GPU without pixel and texture shaders? I remember Yann suggested keeping the exponential values in a 256x1 1D texture and using that as a lookup table. But that needs texture shaders, doesn't it? Forgive my ignorance on this kind of stuff; this is the first "project" of mine that uses so much of the GPU, and I'm starting to confuse things. If it isn't possible (the exponentiation pass on the GPU), what can I do to make the clouds look a little brighter? Calculating everything on the CPU, and having the clouds animate, is a little hard to optimize, so I need your opinion on that. Thanks in advance. HellRaiZer
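For reference, the CPU fallback that the 256x1 lookup-table idea amounts to could look roughly like this. This is a hypothetical sketch: the names, the Elias-style formula out = 255 * (1 - sharpness^v), and the sharpness constant are my assumptions, not code from the thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: build a 256-entry lookup table for the exponential
// pass (the CPU-side equivalent of a 256x1 1D texture), then apply it per
// texel. 'sharpness' is assumed to be in (0, 1); input values after the
// cutoff subtraction are assumed to be in [0, 255].
struct ExpLUT {
    unsigned char table[256];

    void Build(float sharpness) {
        for (int v = 0; v < 256; ++v) {
            // out = 255 * (1 - sharpness^v), Elias-style exponentiation
            float e = 255.0f * (1.0f - std::pow(sharpness, (float)v));
            table[v] = (unsigned char)(e + 0.5f);
        }
    }
    unsigned char Apply(unsigned char v) const { return table[v]; }
};
```

Building the table once per sharpness/density change and indexing it per texel avoids calling pow() for every pixel of the cloud texture.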

Thanks for replying. A second pass with additive blending may do the trick, if you don't shade your clouds! But because you must use alpha blending to make them transparent, I don't know if this would help. Thanks for it anyway. I can't test it right now, because I don't shade my clouds; I only use the opacity map as the clouds.

My fault for posting such a question! I know that if it were possible, Yann would probably have already addressed it in the "Sky Rendering Techniques..." thread.

Some other questions.

Do you tile the smaller octaves on the biggest one, or do you stretch them? I know stretching looks more physical, if you think about what the lower-frequency octaves are supposed to represent (large/main cloud formations), but Elias in his tutorial (if I remember correctly) suggests tiling. Is this true? I ask because tiling is more CPU friendly: no filtering is needed, just a mod with the smaller octave's dimension, in contrast with stretching, which needs something like bilinear filtering to look good. But I couldn't make tiling look good. I mean, if I tile smaller octaves on top of the largest one, you can see their boundaries! Maybe a mirror-tile trick will do the job.
In the procedure I described above (p-buffer rendering of the octaves), OpenGL automatically handles the bilinear filtering, and it looks good. I tried to read the texture back from the p-buffer, apply the exponential pass to it, and then upload it again. Despite it crawling (around 250 msec for reading back the texture), it looks good. As good as it can look without shading, I mean!

What are your primary octave's dimensions, and how many octaves do you use for good-looking clouds? I remember Yann mentioned something about 8-12 octaves, with the base octave's dimensions being 16x16, in his pre-GPU approach. This is huge!!! The biggest octave would be 2048x2048, and if I follow Elias's or Pallister's tutorial, then this is the one that must be updated the most. What am I missing here? It couldn't be possible to generate new octaves at these dimensions, not even in a preprocessing step!

On Pallister's tutorial: his clouds look good, and he doesn't even use texture shaders. They're not like Yann's clouds, but they're better than mine. I'm not familiar with DirectX, so I can't follow his code. I followed him as far as I could; I did what he described in the paper, or at least I tried. Can someone who has implemented his paper in OpenGL explain how he did it? What's the next step after subtracting the cloud-density texture from the final-noise texture?

Thanks in advance, and sorry if all the above looks messed up. I'm a little confused, so...

HellRaiZer

[EDIT]
One thing I want to add is that despite the fact that the clouds aren't shaded correctly, and not exponentiated (??), they are nearly completely animated. I say nearly because, instead of generating 2D octaves, I generate 3D octaves and interpolate between them. Also, adding a different (u, v) offset to every octave every frame results in (nearly) different clouds every frame. An extra parameter could be cloud opacity. Irrelevant, but...
[/EDIT]


[edited by - HellRaiZer on October 24, 2003 7:36:14 AM]

quote:

Is there a way of executing the exponential pass on the GPU withOUT pixel and texture shaders?


No. It requires a dependent texture lookup, which is not supported below a GF3. You could, however, try to abuse the paletted texture functionality for that, which basically is a form of dependent lookup. Although it will probably require a copyback over the CPU, and that's very slow. Your best bet is to do the exponentiation on the CPU.

And while you're at it, check whether rendering the entire clouds on the CPU is not going to be faster than rendering them on a slow GF1 GPU and reading them back (chances are that it will be faster on the CPU). Delegating a process to the GPU is generally only worth it if you can keep everything on the 3D card, without needing to read something back. In all other cases, consider the CPU directly.

quote:

Do you tile the smaller octaves on the biggest one, or you stretch them?


Both. The first (lowest) octave is tiled by factor x, the next higher octave by x*2, the next one by x*4, and so on. Choose x by trial and error; just use something that looks nice on your particular skyplane implementation. If you want absolutely no repetition in your sky, then x should be one. But keep in mind that you'll need lots of octaves if you want good detail at the higher frequencies. Increasing x gives you a tradeoff between the number of required octaves (i.e. performance) and repetitive patterns. Put it as high as you can visually tolerate.

quote:

What's your primary octave's dimensions, and how many octaves do you use for good looking clouds? I remember Yann mention something about 8-12 octaves, with base octave's dimensions be 16x16 in his pre-GPU approach. This is huge!!! The biggest octave will be 2048x2048, and if i follow Elias's or Pallister's tut, then this is the one that must be updated the most.


When doing the octave overlay on the GPU, every single octave is exactly 16x16 in size. You could go up to 100 octaves if you want (and if you have a very precise floating-point pipeline...); every octave texture would still be 16x16. The higher octaves are tiled on the GPU to achieve the (faked) larger resolution, but in memory they are just simple 16x16 textures. It's just as described on Hugo Elias' page, only the GPU handles the tiling and the octave additions.

quote:

What's the next step, after subtracting the cloud-density texture from the final-noise texture?


I don't remember his tutorial, but you mentioned something about a modulate 2x. That's simply an operation where you multiply the result by 2, clamping it afterwards. You can try several variations of that algorithm: multiply the result by 4 to get even higher contrast, or multiply it by itself to simulate a basic exponential. But while trying all those things, keep in mind that the GF1 has very limited precision in the fragment combiners - only 9 bits. Each time you apply a complex operation to a fragment, you lose precision. You'll quickly run into serious visual problems if you do too much. They will manifest themselves as "banding", i.e. irritating coloured lines and blocks instead of smooth gradients, and sometimes flickering and colour noise.
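As a rough numeric illustration of those operations applied per 8-bit texel on the CPU (function names are mine, not from any of the posts; on the GPU the same thing would be a combiner scale):

```cpp
#include <cassert>
#include <algorithm>

// Illustrative sketches of the contrast tricks described above,
// operating on one 8-bit texel at a time.
inline unsigned char Modulate2x(unsigned char v) {
    return (unsigned char)std::min(2 * (int)v, 255);  // multiply by 2, clamp
}
inline unsigned char Modulate4x(unsigned char v) {
    return (unsigned char)std::min(4 * (int)v, 255);  // multiply by 4, clamp
}
inline unsigned char SquareApprox(unsigned char v) {
    return (unsigned char)((int)v * (int)v / 255);    // v*v, a basic "exponential"
}
```

Note how SquareApprox darkens mid-tones while keeping full white at 255, which is why it behaves like a crude exponentiation.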

Seriously, if you want good-quality clouds on a GF1, you should consider doing everything on the CPU. What type of CPU do you have?


[edited by - Yann L on October 24, 2003 12:03:19 AM]

Yann, thanks for replying.

quote:

And while you're at it, check whether rendering the entire clouds on the CPU is not going to be faster than rendering them on a slow GF1 GPU and reading them back (chances are that it will be faster on the CPU). Delegating a process to the GPU is generally only worth it if you can keep everything on the 3D card, without needing to read something back. In all other cases, consider the CPU directly.



I'm not so used to the 3D pipeline, and the fact that I managed to get to this point (summing weighted octaves on the GPU) is a start, I think. But I'll stop here. I'm returning to the CPU computation approach: make it completely static for a start, then add some "alive" parts to it until it starts crawling, and then try to optimize it. I'm not very familiar with MMX or 3DNow! instructions, but I'll start with common x86 assembly. Then I'll dive into 3DNow!.

quote:

Both. The first (lowest) octave is tiled by factor x, the next higher octave by x*2, the next one by x*4, and so on. Choose x by trial and error; just use something that looks nice on your particular skyplane implementation. If you want absolutely no repetition in your sky, then x should be one. But keep in mind that you'll need lots of octaves if you want good detail at the higher frequencies. Increasing x gives you a tradeoff between the number of required octaves (i.e. performance) and repetitive patterns. Put it as high as you can visually tolerate.



Now I'm completely confused. First of all, what do you define as the lowest octave, and what as the highest? By lowest octave I mean lowest resolution (octave level 0 = base resolution^2). How can you repeat/tile the smaller octave x times and the bigger octaves x*n times???? I'm missing something here! Which octave resolution has the highest frequency, the smaller octaves (e.g. 16x16) or the biggest octaves (e.g. 512x512)? If I tile them, then I think having the smaller ones for the little (continuously changing) details (higher frequency) is sufficient, and the opposite stands for the bigger octaves (slowly changing general cloud formations). Is this correct? And what if I'm not going to tile them, but want to stretch them all instead? Do I have to invert the connections (bigger res - bigger freq, smaller res - smaller freq), or is the procedure fixed?

quote:

When doing the octave overlay on the GPU, every single octave is exactly 16x16 in size. You could go up to 100 octaves if you want (and if you have a very precise floating-point pipeline...); every octave texture would still be 16x16. The higher octaves are tiled on the GPU to achieve the (faked) larger resolution, but in memory they are just simple 16x16 textures. It's just as described on Hugo Elias' page, only the GPU handles the tiling and the octave additions.



I must admit I misunderstood Elias' tutorial. Because I didn't understand what the "resampling" step was supposed to do, I treated every octave as a different resolution, instead of just a different frequency and amplitude. Does the above procedure stand for the CPU approach too, or must I make the textures bigger as the octave increases for better visual results?

quote:

Seriously, if you want good quality clouds on a GF1, you should consider doing everything on the CPU. What type of CPU do you have ?



I'm into it right now. I may have learned something from my GPU approach so far (p-buffers, dynamic textures, blending equations etc.), but I have to leave it for now. I'm working on getting everything done on the CPU, as you said. The CPUs are a Duron 700 MHz (my development PC) and an AthlonXP 2000+ (my brother's PC). I'm familiar with common x86 assembly, but I haven't written anything with 3DNow! or MMX. I must dive into them, but that's something I'll do after I finish the whole cloud generation and rendering; it'll be an optimization step.

Thanks again.

HellRaiZer

I once did the cloud generation with something similar to what you want to do with textures: first generate some noise, then smooth it, then resample it to different sizes and add them up.

quote:

Now I'm completely confused. First of all, what do you define as the lowest octave, and what as the highest? By lowest octave I mean lowest resolution (octave level 0 = base resolution^2). How can you repeat/tile the smaller octave x times and the bigger octaves x*n times???? I'm missing something here! Which octave resolution has the highest frequency, the smaller octaves (e.g. 16x16) or the biggest octaves (e.g. 512x512)? If I tile them, then I think having the smaller ones for the little (continuously changing) details (higher frequency) is sufficient, and the opposite stands for the bigger octaves (slowly changing general cloud formations). Is this correct? And what if I'm not going to tile them, but want to stretch them all instead? Do I have to invert the connections (bigger res - bigger freq, smaller res - smaller freq), or is the procedure fixed?



Let's say octave 0 is the 512x512 one. Take a 16x16 texture and stretch it to 512x512. Then stretch octave 1 to 256x256 (render the 512x512 quad with it) and use texcoords (0, 0) to (2, 2). Then octave 2 to 128x128 with texcoords (0, 0) to (4, 4), and so on, until octave MaxOctaves is not stretched at all, just tiled to fit the whole quad.

There was an article about this (by Intel), but I don't remember its name.




/*ilici*/

quote:

Let's say octave 0 is the 512x512 one. Take a 16x16 texture and stretch it to 512x512. Then stretch octave 1 to 256x256 (render the 512x512 quad with it) and use texcoords (0, 0) to (2, 2). Then octave 2 to 128x128 with texcoords (0, 0) to (4, 4), and so on, until octave MaxOctaves is not stretched at all, just tiled to fit the whole quad.



That's what I was doing with p-buffer rendering. But now I've abandoned the p-buffer idea, and I'm trying to do it on the CPU. I have some problems with tiling: edges appear in the middle of the final noise, and the whole texture is completely repeated, so I don't like it. The previous noise I had, with GPU stretching and MIRRORED_REPEAT tiling, was a thousand times better. I'm trying to fix this for now.

Thanks.

HellRaiZer

I think I understand the procedure now. I made some experiments with tiling, stretching (really bad stretching, something like a NEAREST, NEAREST filter), different-resolution octaves etc.

I want your opinion on the steps. Especially on the frequency series...

1) Generate n octaves of noise, using the same dimensions for all of them (e.g. 32x32), and with (Frequency * 2^i, Persistence^i) for every one of them.

2) Having those n "textures" at (e.g.) 16x16 resolution, start stretching them in this order (final texture 256x256, n = 5):


# octave   original dims   final dims   tile (u, v)   weight   frequency
------------------------------------------------------------------------
    0        16 x 16        256 x 256     1 x 1       1.0      Base
    1        16 x 16        128 x 128     2 x 2       0.5      Base * 2
    2        16 x 16         64 x 64      4 x 4       0.25     Base * 4
    3        16 x 16         32 x 32      8 x 8       0.125    Base * 8
    4        16 x 16         16 x 16     16 x 16      0.0625   Base * 16


3) Add those octaves together by tiling them accordingly, using a weighted average.

4) Do the exponential pass and shading (???)

Requoting Yann's post:
quote:

Both. The first (lowest) octave is tiled by factor x, the next higher octave by x*2, the next one by x*4, and so on. Choose x by trial and error; just use something that looks nice on your particular skyplane implementation. If you want absolutely no repetition in your sky, then x should be one. But keep in mind that you'll need lots of octaves if you want good detail at the higher frequencies. Increasing x gives you a tradeoff between the number of required octaves (i.e. performance) and repetitive patterns. Put it as high as you can visually tolerate.



I think I understand the above paragraph now. I was confused by the terminology. x is the tile coefficient used for octave 0 of the above table. So if x = 2 then Oct[4] must be tiled 32x32 times instead of 16x16. Is this correct?

I think the thread's subject is out of date by now, but what the hell!

On to some more questions.

Is using an octave resolution of 16x16 when creating the noise on the GPU just for that? I mean, do you keep it low because it's calculated on the GPU, or does this resolution give good results either way (CPU or GPU)? Should I keep it small for faster regeneration, or are there other tricks for that?

Till now my stretching, as I said earlier, is like a NEAREST, NEAREST filter, and it looks really bad. Should I implement something like bilinear filtering for stretching the originally small octaves? The way I did it is something like this:


const int srcSize = 16;                                  // source octaves are stored at 16x16
int octaveSize = 1 << (curOctaveIndex + BaseResolution); // effective (tiled) resolution
int pixelSize  = TextureSize / octaveSize;               // destination texels per octave sample

for (int i = 0; i < TextureSize; i++)                    // walk the destination texture
{
    int x = (i / pixelSize) % srcSize;                   // stretch (NEAREST), wrap for tiling

    for (int j = 0; j < TextureSize; j++)
    {
        int y = (j / pixelSize) % srcSize;

        curPixel = noise[curOctaveIndex][x + y * srcSize];
    }
}


Is there a similarly easy way to get a better stretching "filter"? Can I perform tiling and stretching in one pass, or do I have to stretch every octave to its final dimensions and then add them together by tiling? (Does that make any sense?)

Until the summation, should I keep the noise data in the [-1, 1] range, or can I convert it to the [0, 255] range? The latter saves 3 bytes per value, but I have to transform the values back to [-1, 1] for weighting and summing. Is there anything in between I can do, keeping memory requirements low (the [0, 255] range) while making the calculations fast (the [-1, 1] range)?

Also, all the questions from my previous post are still alive!!! Especially the tiling issue.

HellRaiZer

[EDIT]
Tiling problem solved, by "simulating" the MIRRORED_REPEAT behavior. Everything seems to work fine right now. I must make it animate now. What will give better animated results: 3D noise for every octave, interpolating over time and reconstructing the final texture as necessary, or generating some frames of the final texture and interpolating over them (looping)?

I resample octaves using DevIL, but this is slow!
[/EDIT]
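For illustration, the mirrored-repeat "simulation" mentioned in the edit above can be done with an index function like this (a hypothetical sketch, not the poster's actual code): even tiles read the source texture forward, odd tiles read it mirrored, which removes the seams plain tiling produces.

```cpp
#include <cassert>

// Map an unbounded texel coordinate 'i' onto [0, size) with
// GL_MIRRORED_REPEAT-style wrapping: forward on even tiles,
// reversed on odd tiles. Handles negative coordinates too.
inline int MirroredIndex(int i, int size) {
    int period = 2 * size;
    int m = ((i % period) + period) % period;  // wrap into [0, 2*size)
    return (m < size) ? m : (period - 1 - m);  // mirror the second half
}
```

Sampling a small octave with MirroredIndex(x, 16) / MirroredIndex(y, 16) instead of a plain mod is enough to hide the tile boundaries.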

[edited by - HellRaiZer on October 25, 2003 1:07:57 PM]

Hello again. It's starting to feel funny/crazy posting replies to myself (!!!), but I really want your suggestions.

I re-implemented the cloud generation and exponentiation without any use of the GPU, and I tried to make it animate. I made some experiments, and nothing was good enough.

First I tried to animate every octave independently, every frame, and reconstruct the final texture. This was slow!!!! No hard optimization was done, but it was unacceptably slow to fight for! Also, it didn't look good enough (from the animation point of view) to keep. Next I tried to build some "layers" (instead of individual octaves) without the exp pass, interpolate those over time, and exponentiate the result. Things were different now: performance was better (with some problems when the final texture was huge), but the animation was awful! I don't know what to do, and more importantly, what I should expect from such a system.

I have to admit that the p-buffer approach was better, if you don't want the extreme cloud look the exponentiation gives you.

The reason I'm still thinking of the p-buffer approach is that, despite the fact that it wasn't really good-looking, the general cloud pattern was nearly 100% unpredictable and non-repetitive in every frame, without having to regenerate noise textures when the old ones became outdated. A few different frames for every octave, different tiling, a different alpha value and different (u, v) coords for every one of them, plus linear interpolation between frames, made it look completely new every time it was updated.

I'll try to see what I can get by abusing paletted textures as Yann suggested. I have never worked with paletted textures before, so I may have a problem here.

quote:

No. It requires a dependent texture lookup, which is not supported below a GF3. You could, however, try to abuse the paletted texture functionality for that, which basically is a form of dependent lookup. Although it will probably require a copyback over the CPU, and that's very slow. Your best bet is to do the exponentiation on the CPU.



I can't understand why this would require a copyback. Having each color as an index into the palette, in the range [0, 255], is exactly the same as having the clouds as an alpha (single-channel) texture. The only thing you have to do is give OpenGL the palette, which will be the exponent table. What you actually have to do is supply a different palette every time you change cloud sharpness and density. What am I missing here?

Can I have a single-channel (8-bit) paletted texture?
Is it possible to (register) combine an A texture with an RGB one, and get as a result an RGBA texture, which will be used as the cloud texture?
If so, is it possible for one of these textures to be paletted, and the other true color?

Yann suggested checking whether the whole procedure would be faster on the CPU instead of a GPU-CPU mix. So far, I can say that having animated noise combined on the GPU is faster. This may of course be my fault. In fact, thinking of Elias' cloud demo, I can say it is definitely my fault!

HellRaiZer

PS. Sorry for continually replying to myself. If nobody answers, I'll not do it again. Thanks for your attention!!!

OK, here are some answers. Took a while, but I was busy with Real Life™ this weekend.

quote:
Original post by HellRaiZer
I want your opinion on the steps. Especially on the frequency series...

1) Generate n octaves of noise, using the same dimensions for all of them (e.g. 32x32), and with (Frequency * 2^i, Persistence^i) for every one of them.

2) Having those n "textures" at (e.g.) 16x16 resolution, start stretching them in this order (final texture 256x256, n = 5):

3) Add those octaves together by tiling them accordingly, using a weighted average.

4) Do the exponential pass and shading (???)


Yes, that is correct.

quote:

I think I understand the above paragraph now. I was confused by the terminology. x is the tile coefficient used for octave 0 of the above table. So if x = 2 then Oct[4] must be tiled 32x32 times instead of 16x16. Is this correct?


Yes. I now realize that the variable name I picked ('x') must have been a little confusing. Just call it 'tc' for tile coefficient, if you want.

quote:

Is using an octave resolution of 16x16 when creating the noise on the GPU just for that? I mean, do you keep it low because it's calculated on the GPU, or does this resolution give good results either way (CPU or GPU)? Should I keep it small for faster regeneration, or are there other tricks for that?


16x16 is just a size. I use 64x64 in my current implementation, but it doesn't really matter that much. Don't make it too large, as you'll have to regenerate it on the CPU to animate the clouds, and larger pieces take longer to generate and upload. Don't make it too small, or the noise will show visual artifacts. Typically, 16², 32² or 64² are good candidates.

quote:

Till now my stretching, as I said earlier, is like a NEAREST, NEAREST filter, and it looks really bad. Should I implement something like bilinear filtering for stretching the originally small octaves?


Well, if you want good quality, then the answer is definitely yes. The advantage of adding the octaves on the GPU is that you get the filtering automatically. If you want to render the noise on the CPU, you'll also need to implement some form of interpolation. Back in the old days, this was quite hard and very slow, but nowadays, with SIMD, it's not that bad anymore.

quote:

Is there a similarly easy way to get a better stretching "filter"? Can I perform tiling and stretching in one pass, or do I have to stretch every octave to its final dimensions and then add them together by tiling? (Does that make any sense?)


There are tons of different scaling techniques for 2D images available, and many can combine tiling and interpolation. Tiling is nothing but a simple wrap-around, at least if the textures are powers of two. Search the net; there are plenty of resources out there. For good performance, I would definitely recommend SIMD ASM, though.
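As an illustration of combining the interpolation and the wrap-around in a single lookup, here is a sketch under the power-of-two assumption (function and variable names are mine, and non-negative coordinates are assumed):

```cpp
#include <cassert>
#include <cmath>

// Bilinear sampling with power-of-two wrap-around, so the stretch filter
// and the tiling happen in one lookup. 'u' and 'v' are in source-texel
// units and assumed >= 0; 'size' must be a power of two so the wrap is a
// cheap bitwise AND.
float SampleBilinearWrap(const float* tex, int size, float u, float v) {
    int x0 = (int)std::floor(u), y0 = (int)std::floor(v);
    float fx = u - x0, fy = v - y0;            // fractional parts
    int mask = size - 1;                       // pow2 wrap-around
    int x1 = (x0 + 1) & mask, y1 = (y0 + 1) & mask;
    x0 &= mask; y0 &= mask;
    float a = tex[x0 + y0 * size], b = tex[x1 + y0 * size];
    float c = tex[x0 + y1 * size], d = tex[x1 + y1 * size];
    return (a * (1 - fx) + b * fx) * (1 - fy)  // lerp in x, then in y
         + (c * (1 - fx) + d * fx) * fy;
}
```

Tiling an octave n times is then just sampling with u, v scaled past the texture size; the AND does the wrap for free.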

quote:

Until summation, should i keep noise data in the [-1, 1] field, or i can convert them to [0, 255] field? It saves 3 bytes, but i have to transform the values back to [-1, 1] for weighting them and summing.


Depends. You can keep them biased from -128 to 127, in one signed byte per texel. But you need to do the octave combining in a higher-precision format, otherwise you'll run into banding artifacts. Floating point is OK, but requires expensive integer-float conversions. Fixed point is more appropriate here; 16.16 (32-bit) or 8.16 (24-bit) usually works pretty well. You could also experiment with byte-to-float conversion tables; these might be faster than a direct conversion. Depends on the CPU.
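A minimal sketch of what 16.16 fixed-point accumulation could look like (illustrative helpers, not from the thread): the 8-bit texels are widened before weighting and only packed back, with clamping, at the very end.

```cpp
#include <cassert>
#include <cstdint>

// 16.16 fixed point: 1.0 == 65536. Accumulate the weighted octave sum in
// this format so the 8-bit texels lose no precision until the final pack.
inline int32_t ToFixed(uint8_t texel) { return (int32_t)texel << 16; }

inline uint8_t FromFixed(int32_t fx) {
    int32_t v = fx >> 16;                     // drop the fraction
    if (v < 0)   v = 0;                       // clamp to [0, 255]
    if (v > 255) v = 255;
    return (uint8_t)v;
}

inline int32_t MulFixed(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> 16); // 16.16 * 16.16 -> 16.16
}
```

A weight of 0.5 is simply the fixed-point constant 32768; the whole octave sum then needs no float conversions at all.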

quote:

3D noise for every octave, interpolate over time, and reconstruct the final texture as neccessary, or generate some frames of the final texture, and interpolate over them (looping)?


Compute a new noise octave every 10 or 20 frames, and interpolate the other ones. That way, you can gradually compute the noise tiles, distributed over several frames. Divide and conquer. That's a nice trick to reduce the per-frame noise generation overhead.
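A toy sketch of that scheduling idea, with single floats standing in for whole noise tiles (purely illustrative structure):

```cpp
#include <cassert>

// Keep two noise keyframes per octave, lerp between them every frame, and
// generate a fresh keyframe only every 'interval' frames. In a real
// implementation key0/key1 would be whole 16x16 tiles, lerped per texel.
struct OctaveAnim {
    float key0, key1;     // stand-ins for whole noise tiles
    int   frame, interval;

    float Update() {
        float t = (float)frame / interval;        // 0..1 between keyframes
        float value = key0 * (1 - t) + key1 * t;  // cheap lerp per "texel"
        if (++frame >= interval) {                // time for a new keyframe
            key0 = key1;
            key1 = 0.0f;  // here you would generate a fresh noise tile
            frame = 0;
        }
        return value;
    }
};
```

Staggering the intervals across octaves spreads the expensive regeneration over many frames instead of paying for all octaves at once.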

quote:

First I tried to animate every octave independently, every frame, and reconstruct the final texture. This was slow!!!! No hard optimization was done, but it was unacceptably slow to fight for! Also, it didn't look good enough (from the animation point of view) to keep. Next I tried to build some "layers" (instead of individual octaves) without the exp pass, interpolate those over time, and exponentiate the result. Things were different now: performance was better (with some problems when the final texture was huge), but the animation was awful! I don't know what to do, and more importantly, what I should expect from such a system.


The speed is a question of optimization. This is where ASM and SIMD (MMX, SSE, 3DNow!) come in very handy. In practice, you can get basic Perlin noise tile generation to be extremely fast this way. In my case, it takes more time to upload the new noise textures to the GPU than to generate them... But don't worry about that right now; make it work and look good first.

About the animation. Well, animating realistic clouds by changing the individual octaves is very difficult to adjust. My advice: don't bother. In reality, clouds will never morph that way. The most realistic effects are sometimes the easiest. Animating the cloud density offset and base exponent factor gives very cool results (for example, simulating an approaching storm). General cloud movement is typically done by a linear shift of the tile texture coordinate offsets in the wind direction, perhaps in multiple layers. That will give you basic moving clouds. Don't let it wrap over at the edges, but generate new tiles. This way, you'll have an infinite amount of never-repeating clouds. If at the same time you slowly animate the base octave tiles, the result will be very nice.

quote:

I have to admit that the p-buffer approach was better, if you don't want the extreme cloud look the exponentiation gives you.


Of course the GPU generation is better, with and without the exponentiation. But in the case of a GF1, I don't know if it's that good from a performance point of view. You have to try it and decide.

quote:

I can't understand why this would require a copyback. Having each color as an index into the palette, in the range [0, 255], is exactly the same as having the clouds as an alpha (single-channel) texture. The only thing you have to do is give OpenGL the palette, which will be the exponent table. What you actually have to do is supply a different palette every time you change cloud sharpness and density. What am I missing here?


That OpenGL will not accept an RGB or alpha image (such as the one in your p-buffer) as valid paletted texture data. For that to work, you need to supply a texture with a GL_COLOR_INDEX format and a GL_COLOR_INDEX8_EXT internal format. So you need to read back your greyscale image and re-upload it with the above-mentioned format parameters. From the point of view of the GPU, it's completely unnecessary. This is an API limitation, as the paletted texture functionality was never intended to be (ab)used in such a way...

quote:

Can I have a single-channel (8-bit) paletted texture?
Is it possible to (register) combine an A texture with an RGB one, and get as a result an RGBA texture, which will be used as the cloud texture?
If so, is it possible for one of these textures to be paletted, and the other true color?


Yes, yes and yes.

quote:

Yann suggested checking whether the whole procedure would be faster on the CPU instead of a GPU-CPU mix. So far, I can say that having animated noise combined on the GPU is faster. This may of course be my fault.


Optimization

quote:

OK, here are some answers. Took a while, but I was busy with Real Life™ this weekend.



I tend to forget about Real Life sometimes. Yesterday was my name day (how do you call it?), and I was fighting with clouds... The phone was ringing all the time, and I was thinking of octaves and how to combine them!!!! Am I becoming a freak???

quote:

Depends. You can keep them biased from -128 to 127, in one signed byte per texel. But you need to do the octave combining in a higher-precision format, otherwise you'll run into banding artifacts. Floating point is OK, but requires expensive integer-float conversions. Fixed point is more appropriate here; 16.16 (32-bit) or 8.16 (24-bit) usually works pretty well. You could also experiment with byte-to-float conversion tables; these might be faster than a direct conversion. Depends on the CPU.



I hadn't thought of the [-128, 127] signed byte range. This should work. And the byte-to-float conversion table is a great idea. I'll try it.
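A minimal sketch of the fixed-point variant, assuming signed-byte octaves and the halving weights from equation (1) (the function name and interface are mine):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Combine signed-byte noise octaves ([-128, 127] per texel) in a 16.16
// fixed-point accumulator, halving the weight per octave as in (1).
// Only the final result is converted back to a byte, so intermediate
// precision is preserved and banding is avoided.
std::vector<std::uint8_t> CombineOctaves(
    const std::vector<const std::int8_t*>& octaves, int texels)
{
    std::vector<std::uint8_t> out(texels);
    for (int i = 0; i < texels; ++i)
    {
        std::int32_t acc = 0;            // 16.16 accumulator
        std::int32_t weight = 1 << 15;   // 0.5 in 16.16
        for (const std::int8_t* oct : octaves)
        {
            acc += weight * oct[i];      // signed byte * 16.16 weight
            weight >>= 1;                // next octave weighs half
        }
        // Back to unsigned byte: drop the fraction, re-bias by +128.
        std::int32_t v = (acc >> 16) + 128;
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        out[i] = static_cast<std::uint8_t>(v);
    }
    return out;
}
```

The byte-to-float table Yann mentions would replace the `weight * oct[i]` multiply with a lookup; which is faster depends on the CPU.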

quote:

About the animation. Well, animating realistic clouds by changing the individual octaves is very difficult to adjust. My advice: don't bother. In reality, clouds will never morph that way. The most realistic effects are sometimes the easiest. Animating the cloud density offset and base exponent factor gives very cool results (for example, simulating an approaching storm).



When I mentioned
quote:

animate every octave indepentently, every frame, and reconstruct the final texture


I meant "interpolate over a 3D noise field for every octave over time...", not regenerate the octaves every frame! That would be even worse, of course. But I can't understand what you are suggesting. Doesn't animating the cloud density and base exponent require regenerating and re-uploading the texture? If I have a 1024^2 cloud texture, wouldn't it be just as slow to reconstruct, even without interpolating octaves? In fact, the uploading alone may be slow!

What's a good resolution for the cloud texture? I know you will say that you texture it procedurally and there is no fixed size, but I can't understand that! Procedurally texturing an infinite plane must require a huge amount of calculation to make it look good. I can't think of a way to implement this, so I must have a fixed-size cloud texture.

quote:

General cloud movement is typically done by a linear shift of the tile texture coordinate offsets in the wind direction, perhaps in multiple layers. That will give you basic moving clouds. Don't let it wrap over at the edges, but generate new tiles. This way, you'll have an infinite amount of never-repeating clouds. If at the same time you slowly animate the base octave tiles, the result will be very nice.



I read this in the "Sky Rendering Techniques..." thread, and you describe it so "easily" that it makes me think I'm really missing something here! Moving the cloud data in memory in arbitrary directions (the wind direction) makes me sick just thinking about it!!! Generating new noise for the edges, though, seems easy enough.
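Note that in Yann's scheme no cloud data moves in memory at all: only the texture-coordinate offset advances. A small sketch of that bookkeeping (the struct and function names are my invention):

```cpp
#include <cassert>
#include <cmath>

// Scroll state for a tiled cloud layer: the texture itself never moves
// in memory; only the texture-coordinate offset advances with the wind.
struct CloudScroll
{
    float u = 0.0f, v = 0.0f;   // accumulated offset, in tile units
};

// Advance the offset by wind (tiles/second) over dt seconds and report
// how many whole tiles were crossed on each axis.  The caller generates
// that many fresh tile rows/columns at the leading edge instead of
// letting the texture wrap.
void AdvanceScroll(CloudScroll& s, float windU, float windV, float dt,
                   int& newTilesU, int& newTilesV)
{
    float prevU = std::floor(s.u);
    float prevV = std::floor(s.v);
    s.u += windU * dt;
    s.v += windV * dt;
    newTilesU = static_cast<int>(std::floor(s.u) - prevU);
    newTilesV = static_cast<int>(std::floor(s.v) - prevV);
}
```

The fractional part of `u`/`v` goes straight into glTexCoord offsets, which is what makes the motion smooth and frame-rate independent.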

quote:

That OpenGL will not accept an RGB or alpha image (such as the one in your p-buffer) as valid paletted texture data. For that to work, you need to supply a texture with a GL_COLOR_INDEX format and a GL_COLOR_INDEX8_EXT internal format. So you need to read back your greyscale image and re-upload it with the above-mentioned format parameters. From the GPU's point of view, it's completely unnecessary. This is an API limitation, as the paletted texture functionality was never intended to be (ab)used in such a way...



I forgot about the p-buffer's accepted formats. Only RGB_ARB and RGBA_ARB are accepted; no COLOR_INDEXx_ARB!

Now I have to search for a bilinear filter using MMX/3DNow!, and a way to do stretching and tiling in one fast pass!
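For reference, here is the scalar arithmetic such a filter has to implement; the function name is mine, and an MMX/3DNow! version would process several texels per iteration but follow the same math:

```cpp
#include <cassert>
#include <cstdint>

// Bilinearly sample a tiling 8-bit texture at (u, v) in texel units
// (u, v assumed non-negative).  Wrapping uses a power-of-two mask, so
// tiling and stretching (sampling at a different rate than the source
// resolution) happen in the same pass.
std::uint8_t SampleBilinearWrap(const std::uint8_t* tex, int size,
                                float u, float v)
{
    int mask = size - 1;                  // size must be a power of two
    int x0 = static_cast<int>(u) & mask;
    int y0 = static_cast<int>(v) & mask;
    int x1 = (x0 + 1) & mask;             // neighbors wrap at the edge
    int y1 = (y0 + 1) & mask;
    float fx = u - static_cast<int>(u);   // fractional position in texel
    float fy = v - static_cast<int>(v);

    float top = tex[y0 * size + x0] * (1 - fx) + tex[y0 * size + x1] * fx;
    float bot = tex[y1 * size + x0] * (1 - fx) + tex[y1 * size + x1] * fx;
    return static_cast<std::uint8_t>(top * (1 - fy) + bot * fy + 0.5f);
}
```

Stretching a small octave over a large target is then just sampling with a `u`/`v` step smaller than one texel.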

HellRaiZer

I've managed to get some kind of basic animation/motion. The problem is that it moves in only one direction: the V texture coordinate. That's because it's easiest to move a bitmap's memory in that direction; it's just a simple memmove of the texture data one line at a time, plus generating new data for the exposed line. Now I have to make it move in arbitrary directions and, of course, generate the appropriate data. Any ideas to make my life easier?
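One way to generalize the per-line memmove to any direction is to fold the horizontal shift into the same per-row copy, choosing the row iteration order so sources are read before they are overwritten. A sketch (my own helper; the `fill` byte stands in for freshly generated noise at the exposed edge, and |dx| < w, |dy| < h is assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Shift a w*h byte image by (dx, dy) texels in one pass.  Rows are
// moved with memmove (regions may overlap); exposed rows/columns at
// the trailing edge are filled with `fill`.
void ShiftImage(std::uint8_t* img, int w, int h, int dx, int dy,
                std::uint8_t fill)
{
    // When shifting down (dy > 0), walk rows bottom-up so each source
    // row is copied before it is overwritten; otherwise top-down.
    int yStart = (dy > 0) ? h - 1 : 0;
    int yEnd   = (dy > 0) ? -1    : h;
    int yStep  = (dy > 0) ? -1    : 1;
    for (int y = yStart; y != yEnd; y += yStep)
    {
        std::uint8_t* dst = img + y * w;
        int ys = y - dy;                       // source row for this row
        if (ys < 0 || ys >= h)                 // row scrolled in: all new
        {
            std::memset(dst, fill, w);
            continue;
        }
        const std::uint8_t* src = img + ys * w;
        if (dx >= 0)
        {
            std::memmove(dst + dx, src, w - dx);
            std::memset(dst, fill, dx);        // new column(s) on the left
        }
        else
        {
            std::memmove(dst, src - dx, w + dx);
            std::memset(dst + w + dx, fill, -dx); // new column(s) on the right
        }
    }
}
```

For FPS-independent speed, accumulate a floating-point offset per axis each frame and call this only with the integer part, keeping the fraction for the next frame (or for texcoord interpolation between shifts).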

If I keep the final texture small, then updating every frame, even with 8 octaves of noise, is fast. The problem is when the texture is big (I have to split the process across multiple frames), or when I don't want the update to happen every frame because the clouds would move too fast. If either of these is true, the motion looks blocky! And then I can't have a speed parameter to control the shift so that it looks smooth and stable (FPS-independent).

Also, I'm trying to find a way to apply cloud deformation to the motion. My problem is that after some time, the data will no longer have any "connection" to the originally generated data, and I have to generate data for at least two frames of animation each time I shift the texcoords, so that between two shifts (N frames) I can interpolate over time. Any suggestions on that?

HellRaiZer

What do you think of these?

Octaves = 5, AttenFactor = 32.0f

Octaves = 6, AttenFactor = 16.0f

Octaves = 8, AttenFactor = 12.0f


I've just finished writing a basic DDA based on the "Grid Tracing" algorithm for heightfield raytracing, and I hacked the light attenuation formula to see what I could get. The shots above use different parameters.

Can somebody explain Harris' method for cloud shading in simple words? I know this is not a sane question, but I can't understand anything from it. Is there another (simpler) way of calculating/simulating cloud shading? I remember Kill mentioned something in the huge thread about using Perlin noise on the cloud's heightfield to calculate shading. Can somebody explain this a little? Does it give the same (or at least similar) results as Harris' method?

The way I calculate shading is this:


for every voxel v1
{
    calc ray from sun to voxel v1
    calc intersection point with the cloud bbox
    find the intersection voxel's x, z coords in the voxel field (curVox = entry)

    set voxel v1's color to 255 (white)
    set attenuation to 0.0f

    traverse the field
    {
        if the ray passes through the current voxel (curVox)
        {
            calculate how far the ray travels inside the voxel (len)
            calculate the maximum possible ray path length (maxlen)

            increase total attenuation by (attenFactor * len / maxlen)
        }

        advance to the next voxel the ray will hit (curVox = nextVoxel)
    }
    until the ray exits the cloud's bbox

    voxel v1's color -= attenuation
}
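For what it's worth, the accumulation step above can be boiled down to a tiny testable core. This is a deliberately simplified, axis-aligned stand-in for the DDA (unit path length per voxel, so len/maxlen = 1); it is not Harris' method, and the function name is mine:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Shade one voxel: walk from `start` along `step` toward the sun,
// adding attenFactor for every occupied voxel crossed, then subtract
// the total attenuation from white (clamped at black).
std::uint8_t ShadeVoxel(const std::vector<std::uint8_t>& density,
                        int start, int step, float attenFactor)
{
    float atten = 0.0f;
    for (int i = start; i >= 0 && i < static_cast<int>(density.size());
         i += step)
    {
        if (density[i] > 0)
            atten += attenFactor;   // len/maxlen == 1 in this sketch
    }
    float c = 255.0f - atten;
    if (c < 0.0f) c = 0.0f;
    return static_cast<std::uint8_t>(c);
}
```

The real version replaces the unit step with the grid-tracing DDA and the exact in-voxel path length.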


What color should I use for the sun? Is the skydome's color at the sun's position enough?

I know there are problems with the pics; any suggestions appreciated. Note that everything is static for now.
How can I combine movement, as I described it in the previous post, with shading? Do I have to re-shade all the voxels, or just the new ones?

HellRaiZer

PS. Davepermen, if you have any problem uploading stuff to your space, please say so. If you want to delete those, be my guest; I don't have any problem with it. Anyway, thanks
