Cloud Rendering on GPU withOUT pixel or texture shaders...

12 comments, last by HellRaiZer 20 years, 5 months ago
I know that cloud rendering has become a common topic these days, so forgive me for posting one more thread on it, but I have a small problem. I'm working on a GeForce 256 DDR, with no pixel shaders (and low-performance vertex shaders). I'm trying to do as much of the cloud rendering as I can on the GPU. I'm using render-to-texture to blend the octaves together (only 4 of them, because of performance problems). The way I blend those 4 octaves is:

Equation 1:
Result = 0.5 * (Oct[0] + 0.5 * (Oct[1] + 0.5 * (Oct[2] + 0.5 * Oct[3])))
       = 0.5 * Oct[0] + 0.25 * Oct[1] + 0.125 * Oct[2] + 0.0625 * Oct[3]    (1)

// Equation (1)
// Res1 = 0.0625 * Oct[3] + 0.0f * BackGround;

float coef = 1.0f / 16.0f;
glBlendColorEXT(coef,coef,coef,coef);
glBlendFunc(GL_CONSTANT_COLOR_EXT, GL_ZERO);
Clouds.Texture[3].Use();
RenderQuad();

// Res2 = 0.125 * Oct[2] + Res1;

coef = 1.0f / 8.0f;
glBlendFunc(GL_CONSTANT_COLOR_EXT, GL_ONE);
glBlendColorEXT(coef,coef,coef,coef);
Clouds.Texture[2].Use();
RenderQuad();

// Res3 = 0.25 * Oct[1] + Res2;

coef = 1.0f / 4.0f;
glBlendColorEXT(coef,coef,coef,coef);
Clouds.Texture[1].Use();
RenderQuad();

// Res4 = 0.5 * Oct[0] + Res3;

coef = 1.0f / 2.0f;
glBlendColorEXT(coef,coef,coef,coef);
Clouds.Texture[0].Use();
RenderQuad();
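For reference, the blend chain above can be mirrored on the CPU for a single texel. This is a minimal sketch, not part of the original code: `BlendOctaves` is a hypothetical helper, octave values are assumed to be in [0, 1], and only the clamp of a fixed-point colour buffer is modelled (the 8-bit rounding of a real GeForce 256 framebuffer is ignored).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical CPU mirror of the four blend passes that implement Equation (1).
// o0..o3 are the values of Oct[0]..Oct[3] at one texel, each in [0, 1].
float BlendOctaves(float o0, float o1, float o2, float o3)
{
    // Same draw order as the GL code: Oct[3] first, Oct[0] last.
    const float src[4]  = { o3, o2, o1, o0 };
    const float coef[4] = { 1.0f / 16.0f, 1.0f / 8.0f, 1.0f / 4.0f, 1.0f / 2.0f };

    float dst = 0.0f; // pass 1 uses GL_ZERO, so whatever was in the p-Buffer is ignored
    for (int pass = 0; pass < 4; ++pass)
    {
        assert(src[pass] >= 0.0f && src[pass] <= 1.0f);
        // GL_FUNC_ADD with (GL_CONSTANT_COLOR_EXT, GL_ONE): dst = coef * src + dst,
        // clamped to [0, 1] as in a fixed-point colour buffer.
        dst = std::min(1.0f, coef[pass] * src[pass] + dst);
    }
    return dst;
}
```

With every octave at 1.0 the result is 0.9375, not 1.0: the weights 1/2 + 1/4 + 1/8 + 1/16 add up to 15/16, so the summed texture can never reach full white.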
RenderQuad() renders the currently bound texture using an orthographic view, so that it covers the whole p-Buffer. After that, I have a texture (with the dimensions of the p-Buffer) without the exponential pass applied. The first thing I have to do is subtract the CutOff texture (CloudDensity * White) from the above result. I do it in an extra pass:

Equation 2:
FinalResult = NoiseTexture * 1.0 - WhiteTexture * CloudDensity    (2)

coef = Clouds.CloudDensity / 255.0f;
glBlendEquationEXT(GL_FUNC_REVERSE_SUBTRACT_EXT);
glBlendColorEXT(coef, coef, coef, coef);
glBlendFunc(GL_CONSTANT_COLOR_EXT, GL_ONE);
Clouds.CutOff->Use();
RenderQuad();

glBlendEquationEXT(GL_FUNC_ADD_EXT);
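The cut-off pass can be checked the same way. The sketch below assumes, as the code above implies, that `Clouds.CloudDensity` is in [0, 255] and that the CutOff texture is pure white; `ApplyCutOff` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical CPU mirror of the cut-off pass (Equation 2).
// noise:        value already in the p-Buffer, in [0, 1].
// cloudDensity: the density constant in [0, 255], as in Clouds.CloudDensity.
float ApplyCutOff(float noise, float cloudDensity)
{
    assert(cloudDensity >= 0.0f && cloudDensity <= 255.0f);
    const float coef  = cloudDensity / 255.0f; // constant blend colour
    const float white = 1.0f;                  // texel of the CutOff (white) texture
    // GL_FUNC_REVERSE_SUBTRACT_EXT computes dst * dfactor - src * sfactor;
    // here dfactor = GL_ONE and sfactor = GL_CONSTANT_COLOR_EXT.
    const float result = noise * 1.0f - white * coef;
    return std::max(0.0f, result); // the colour buffer clamps negatives to 0
}
```

Every texel whose noise value is below CloudDensity / 255 ends up at exactly 0, and every other texel is shifted down by that amount, so this pass on its own darkens the whole texture.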
The above procedure is similar to Kim Pallister's article on cloud rendering, converted to the OpenGL API. My problem now is that the final texture is very dark, and you can hardly see the cloud formation in it. Kim Pallister suggests a modulate 2x to make it brighter, but how can I do that? Is there a way of executing the exponential pass on the GPU withOUT pixel and texture shaders? I remember Yann suggested keeping the exponential values in a 256x1 1D texture and using it as a lookup table. But this needs texture shaders, doesn't it? Forgive my ignorance on this kind of stuff. This is the first "project" of mine that uses so much of the GPU, and I'm starting to mix things up.

If it isn't possible (the exponentiation pass on the GPU), what can I do to make the clouds look a little brighter? Calculating everything on the CPU and having it animate is a little hard to optimize, so I need your opinion on that.

Thanks in advance.
HellRaiZer
HellRaiZer
Can''t you do an extra pass and render the texture on top of itself using additive blending?
Thanks for replying. A second pass with additive blending may do the trick if you don't shade your clouds! But because you must use alpha blending to make them transparent, I don't know if it would help once they are shaded. Thanks anyway. I can't test that case right now, because I don't shade my clouds yet; I only use the opacity map as the clouds.

My fault for posting such a question! I know that if it were possible, Yann would probably have already addressed it in the "Sky Rendering Techniques..." thread.

Some other questions.

Do you tile the smaller octaves over the biggest one, or do you stretch them? I know stretching looks more physical, if you think about what the lower-frequency octaves are supposed to represent (large/main cloud formations), but Elias in his tutorial (if I remember correctly) suggests tiling. Is this true? I ask because tiling is more CPU friendly: no filtering is needed, just a mod by the smaller octave's dimension. Stretching, in contrast, needs something like bilinear filtering to look good. But I couldn't make tiling look good: if I tile smaller octaves on top of the largest, you can see their boundaries! Maybe a mirror-tile thing will do the trick.
In the procedure I described above (p-Buffer rendering of the octaves), OpenGL automatically handles bilinear filtering, and it looks good. I tried reading the texture back from the p-Buffer, applying the exponential pass to it, and uploading it again. Although it was crawling (around 250 ms just to read back the texture), it looked good. As good as it can look without shading, I mean!

What are your primary octave's dimensions, and how many octaves do you use for good-looking clouds? I remember Yann mentioning something about 8-12 octaves, with a base octave of 16x16 in his pre-GPU approach. This is huge! The biggest octave would be 2048x2048, and if I follow Elias's or Pallister's tutorial, this is the one that must be updated the most. What am I missing here? Generating new octaves at these dimensions couldn't be possible, not even in a preprocessing step!

On Pallister's tutorial: his clouds look good, and he doesn't even use texture shaders. They're not like Yann's clouds, but they're better than mine. I'm not familiar with DirectX, so I can't follow his code. I followed him as far as I could, and did what he described in the paper, or at least I tried. Can someone who has implemented his paper in OpenGL explain how they did it? What's the next step, after subtracting the cloud-density texture from the final-noise texture?

Thanks in advance, and sorry if all the above looks messy. I'm a little confused, so...

HellRaiZer

[EDIT]
One thing I want to add: despite the fact that the clouds aren't shaded correctly and not exponentiated (??), they are nearly completely animated. I say nearly because, instead of generating 2D octaves, I generate 3D octaves and interpolate between them. Also, adding a different u, v offset to every octave every frame results in (nearly) different clouds every frame. An extra parameter could be cloud opacity. Irrelevant, but...
[/EDIT]


[edited by - HellRaiZer on October 24, 2003 7:36:14 AM]
HellRaiZer
Have you tried using the TexEnv commands? They work on a GF2 and should do the trick.
quote:
Is there a way of executing the exponential pass on the GPU withOUT pixel and texture shaders?

No. It requires a dependent texture lookup, which is not supported below a GF3. You could, however, try to abuse the paletted texture functionality for that, which basically is a form of dependent lookup, although it will probably require a copyback over the CPU, and that's very slow. Your best bet is to do the exponentiation on the CPU.

And while you're at it, check whether rendering the entire clouds on the CPU is going to be faster than rendering them on a slow GF1 GPU and reading them back (chances are that it will be faster on the CPU). Delegating a process to the GPU is generally only worth it if you can keep everything on the 3D card, without needing to read anything back. In all other cases, consider the CPU directly.
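If the exponentiation does end up on the CPU, a 256-entry lookup table keeps it cheap: the `pow()` calls run once per parameter change instead of once per texel. The curve used below, `255 - 255 * sharpness^max(noise - cover, 0)` with a sharpness slightly below 1, is an assumed choice; substitute whatever exponential you use. `BuildExpTable` and `ApplyExpTable` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical 256-entry lookup table for the exponential pass.
// Assumed curve: out = 255 - 255 * sharpness^max(in - cover, 0).
std::array<std::uint8_t, 256> BuildExpTable(int cover, double sharpness)
{
    assert(cover >= 0 && cover <= 255);
    assert(sharpness > 0.0 && sharpness < 1.0);
    std::array<std::uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i)
    {
        int c = i - cover;
        if (c < 0) c = 0; // everything below the cover threshold becomes clear sky
        const double v = 255.0 - 255.0 * std::pow(sharpness, c);
        table[i] = static_cast<std::uint8_t>(std::lround(v));
    }
    return table;
}

// Applies the table to an 8-bit noise image in place: one lookup per texel.
void ApplyExpTable(std::vector<std::uint8_t>& image,
                   const std::array<std::uint8_t, 256>& table)
{
    for (std::uint8_t& texel : image)
        texel = table[texel];
}
```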

quote:
Do you tile the smaller octaves on the biggest one, or you stretch them?

Both. The first (lowest) octave is tiled by a factor x, the next higher octave by x*2, the next one by x*4, and so on. Choose x by trial and error; just use something that looks nice on your particular skyplane implementation. If you want absolutely no repetitions on your sky, then x should be one. But keep in mind that you'll need lots of octaves if you want to get good detail at the higher frequencies. Increasing x gives you a tradeoff between the number of required octaves (i.e. performance) and repetitive patterns. Put it as high as you can visually handle it.
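The tiling rule can be written down directly. In this sketch (hypothetical names), octave i is repeated x * 2^i times across the sky plane. The per-octave weight, persistence^i, is not part of the rule above and is included only as an assumed, typical amplitude falloff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-octave layout: how often the octave repeats across the
// sky plane, and the amplitude it gets when the octaves are summed.
struct OctaveLayout
{
    int   tiles;  // repetitions in u and in v (texcoords run from 0 to tiles)
    float weight; // assumed amplitude, persistence^i
};

std::vector<OctaveLayout> MakeOctaveLayout(int numOctaves, int x, float persistence)
{
    assert(numOctaves > 0 && x >= 1);
    std::vector<OctaveLayout> layout;
    int tiles = x;       // octave 0 is tiled x times
    float weight = 1.0f;
    for (int i = 0; i < numOctaves; ++i)
    {
        layout.push_back({ tiles, weight });
        tiles *= 2;            // next octave: twice the repetitions (frequency)
        weight *= persistence; // and a smaller amplitude
    }
    return layout;
}
```

With x = 1 and five octaves the repeat counts are 1, 2, 4, 8, 16; with x = 2 they become 2, 4, 8, 16, 32.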

quote:
What are your primary octave's dimensions, and how many octaves do you use for good-looking clouds? I remember Yann mentioning something about 8-12 octaves, with a base octave of 16x16 in his pre-GPU approach. This is huge! The biggest octave would be 2048x2048, and if I follow Elias's or Pallister's tutorial, this is the one that must be updated the most.

When doing the octave overlay on the GPU, every single octave is exactly 16x16 in size. You could go up to 100 octaves if you want (and if you have a very precise floating-point pipeline...); every octave texture would still be 16x16. The higher octaves are tiled on the GPU to achieve the (faked) larger resolution, but in memory they are just simple 16x16 textures. It's just as described on Hugo Elias' page, only that the GPU handles the tiling and the octave additions.

quote:
What's the next step, after subtracting the cloud-density texture from the final-noise texture?

I don't remember his tutorial, but you mentioned something about a modulate x2. That's simply an operation where you multiply the result by 2 and clamp it afterwards. You can try several variations of that algorithm: you can multiply the result by 4 to get even higher contrast, or you can multiply it with itself to simulate a basic exponential. But while trying all those things, keep in mind that the GF1 has very limited precision in the fragment combiners, only 9 bits. Each time you apply a complex operation to a fragment, you lose precision. You'll quickly run into serious visual problems if you do too much. They will manifest themselves as "banding", i.e. irritating coloured lines and blocks instead of smooth gradients, and sometimes as flickering and colour noise.
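The two contrast tricks from the previous paragraph, written out in 8-bit integer arithmetic as a rough stand-in for a fixed-point combiner. This is a sketch only: the GF1's combiners carry slightly more than 8 bits, as noted above, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Modulate 2x: multiply by two, then clamp to the representable range.
std::uint8_t Modulate2x(std::uint8_t v)
{
    return static_cast<std::uint8_t>(std::min(2 * v, 255));
}

// Multiply the value with itself (v * v in [0, 1] terms) as a cheap stand-in
// for an exponential curve. The integer division truncates, which is where
// precision is lost at the dark end.
std::uint8_t SelfMultiply(std::uint8_t v)
{
    return static_cast<std::uint8_t>((v * v) / 255);
}
```

The squaring version shows where banding comes from: every input from 0 to 15 maps to 0, so 16 distinct input levels collapse into a single output level.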

Seriously, if you want good-quality clouds on a GF1, you should consider doing everything on the CPU. What type of CPU do you have?


[edited by - Yann L on October 24, 2003 12:03:19 AM]
Yann, thanks for replying.

quote:
And while you're at it, check whether rendering the entire clouds on the CPU is going to be faster than rendering them on a slow GF1 GPU and reading them back (chances are that it will be faster on the CPU). Delegating a process to the GPU is generally only worth it if you can keep everything on the 3D card, without needing to read anything back. In all other cases, consider the CPU directly.


I'm not so used to the 3D pipeline, and the fact that I managed to get to this point (summing weighted octaves on the GPU) is a start, I think. But I'll stop here. I'm returning to the CPU computation approach, making it completely static to begin with, and adding some "alive" parts to it until it starts crawling. Then I'll try to optimize it. I'm not very familiar with MMX or 3DNow! instructions, but I'll start with common x86 assembly. Then I'll dive into 3DNow!

quote:
Both. The first (lowest) octave is tiled by a factor x, the next higher octave by x*2, the next one by x*4, and so on. Choose x by trial and error; just use something that looks nice on your particular skyplane implementation. If you want absolutely no repetitions on your sky, then x should be one. But keep in mind that you'll need lots of octaves if you want to get good detail at the higher frequencies. Increasing x gives you a tradeoff between the number of required octaves (i.e. performance) and repetitive patterns. Put it as high as you can visually handle it.


Now I'm completely confused. First of all, what do you define as the lowest octave, and what as the highest? By lowest octave I mean lowest resolution (octave level 0 = base resolution^2). How can you repeat/tile the smaller octave x times and the bigger octaves x*n times? I'm missing something here! Which octave resolution has the highest frequency: the smaller octaves (e.g. 16x16) or the biggest octaves (e.g. 512x512)? If I tile them, then I think using the smaller one for the small (continuously changing) details (higher frequency) is sufficient, and the opposite holds for the bigger octaves (slowly changing general cloud formations). Is this correct? What if, instead of tiling them, I want to stretch them all? Do I have to invert the correspondence (bigger res - bigger freq, smaller res - smaller freq), or is the procedure fixed?

quote:
When doing the octave overlay on the GPU, every single octave is exactly 16x16 in size. You could go up to 100 octaves if you want (and if you have a very precise floating-point pipeline...); every octave texture would still be 16x16. The higher octaves are tiled on the GPU to achieve the (faked) larger resolution, but in memory they are just simple 16x16 textures. It's just as described on Hugo Elias' page, only that the GPU handles the tiling and the octave additions.


I must admit I misunderstood Elias' tutorial. Because I didn't understand what the "resampling" step was supposed to do, I treated every octave as a different resolution, instead of just a different frequency and amplitude. Does the above procedure also apply to the CPU approach, or, for better visual results, must I make the octaves bigger as the octave index increases?

quote:
Seriously, if you want good-quality clouds on a GF1, you should consider doing everything on the CPU. What type of CPU do you have?


I'm into it right now. I may have learned something from my GPU approach so far (p-Buffers, dynamic textures, blending equations etc.), but I have to leave it for now. I'm working on getting everything done on the CPU, as you said. The CPU is a Duron 700 MHz (my development PC) and an Athlon XP 2000+ (my brother's PC). I'm familiar with common x86 assembly, but I haven't written anything with 3DNow! or MMX. I must dive into them, but this is something I'll do after I finish the whole cloud generation and rendering. It'll be an optimization step.

Thanks again.

HellRaiZer
HellRaiZer
I once did the cloud generation with something similar to what you want to do, using textures. First generate some noise, then smooth it, then resample it to different sizes and add them up.

quote:
Now I'm completely confused. First of all, what do you define as the lowest octave, and what as the highest? By lowest octave I mean lowest resolution (octave level 0 = base resolution^2). How can you repeat/tile the smaller octave x times and the bigger octaves x*n times? I'm missing something here! Which octave resolution has the highest frequency: the smaller octaves (e.g. 16x16) or the biggest octaves (e.g. 512x512)? If I tile them, then I think using the smaller one for the small (continuously changing) details (higher frequency) is sufficient, and the opposite holds for the bigger octaves (slowly changing general cloud formations). Is this correct? What if, instead of tiling them, I want to stretch them all? Do I have to invert the correspondence (bigger res - bigger freq, smaller res - smaller freq), or is the procedure fixed?


Let's say octave 0 is the 512x512 one. Take a 16x16 texture and stretch it to 512x512. Then stretch octave 1 to 256x256 (render the 512x512 quad with it) and use texcoords (0, 0) to (2, 2). Then stretch octave 2 to 128x128 with texcoords (0, 0) to (4, 4), and so on, until octave MaxOctaves is not stretched at all, just tiled to fit the whole quad.

There was an article about this (by Intel), but I don't remember its name.




/*ilici*/
quote:
Let's say octave 0 is the 512x512 one. Take a 16x16 texture and stretch it to 512x512. Then stretch octave 1 to 256x256 (render the 512x512 quad with it) and use texcoords (0, 0) to (2, 2). Then stretch octave 2 to 128x128 with texcoords (0, 0) to (4, 4), and so on, until octave MaxOctaves is not stretched at all, just tiled to fit the whole quad.


That's what I was doing with p-Buffer rendering. But now I've abandoned the p-Buffer idea, and I'm trying to do it on the CPU. I have some problems with tiling: edges appear in the middle of the final noise, and the whole texture visibly repeats, so I don't like it. The previous noise I had, with GPU stretching and MIRRORED_REPEAT tiling, was a thousand times better. I'm trying to fix this for now.

Thanks.

HellRaiZer
HellRaiZer
I think I understand the procedure now. I made some experiments with tiling, stretching (really bad stretching, something like a NEAREST, NEAREST filter), different resolution octaves, etc.

I want your opinion on the steps, especially on the frequency series...

1) Generate n octaves of noise, using the same dimensions for all of them (e.g. 32x32), with (Frequency * 2^i, Persistence^i) for each of them.

2) Having those n "textures" at (e.g.) 16x16 resolution, start stretching them in the following order (final texture 256x256, n = 5):

octave  original dims  final dims  tile (u, v)  weight   frequency
------  -------------  ----------  -----------  -------  ---------
0       16x16          256x256     1x1          1.0      Base
1       16x16          128x128     2x2          0.5      Base * 2
2       16x16          64x64       4x4          0.25     Base * 4
3       16x16          32x32       8x8          0.125    Base * 8
4       16x16          16x16       16x16        0.0625   Base * 16


3) Add those octaves together, tiling them accordingly, using a weighted average.

4) Do the exponential pass and shading (???)
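Steps 2 and 3 can be sketched on the CPU as follows, under assumptions the steps above leave open: all octaves have the same square size, octave i is stretched over `textureSize >> i` texels (the "final dims" column) and is therefore repeated 2^i times, sampling is nearest-neighbour with repeat wrapping, and the weights are persistence^i normalized by their sum (the "weighted average" of step 3). `SumOctaves` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU version of steps 2 and 3. Every octave is stored at
// octaveSize x octaveSize; octave i covers a footprint of textureSize >> i
// texels and repeats across the final textureSize x textureSize texture.
std::vector<float> SumOctaves(const std::vector<std::vector<float>>& octaves,
                              int octaveSize, int textureSize, float persistence)
{
    assert(!octaves.empty() && octaveSize > 0 && textureSize > 0);
    std::vector<float> result(textureSize * textureSize, 0.0f);
    float weight = 1.0f, weightSum = 0.0f;

    for (std::size_t i = 0; i < octaves.size(); ++i)
    {
        assert(static_cast<int>(octaves[i].size()) == octaveSize * octaveSize);
        const int footprint = textureSize >> i; // "final dims" column of the table
        assert(footprint >= octaveSize);        // never shrink an octave below its size
        for (int y = 0; y < textureSize; ++y)
            for (int x = 0; x < textureSize; ++x)
            {
                // position inside this octave's footprint, mapped to an octave texel
                const int u = (x % footprint) * octaveSize / footprint;
                const int v = (y % footprint) * octaveSize / footprint;
                result[y * textureSize + x] += weight * octaves[i][v * octaveSize + u];
            }
        weightSum += weight;
        weight *= persistence;
    }
    for (float& value : result)
        value /= weightSum; // weighted average, as in step 3
    return result;
}
```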

Requoting Yann's post.
quote:
Both. The first (lowest) octave is tiled by a factor x, the next higher octave by x*2, the next one by x*4, and so on. Choose x by trial and error; just use something that looks nice on your particular skyplane implementation. If you want absolutely no repetitions on your sky, then x should be one. But keep in mind that you'll need lots of octaves if you want to get good detail at the higher frequencies. Increasing x gives you a tradeoff between the number of required octaves (i.e. performance) and repetitive patterns. Put it as high as you can visually handle it.


I think I understand the above paragraph now; I was confused by the terminology. x is the tile coefficient used for octave 0 in the above table. So if x = 2, then oct[4] must be tiled 32x32 times instead of 16x16. Is this correct?

I think the thread's subject is out of date, but what the hell!

On to some more questions now.

Is the 16x16 octave resolution used when creating the noise on the GPU specific to that case? I mean, do you keep it low because it's calculated on the GPU, or does this resolution give good results either way (CPU or GPU)? Should I keep it small for faster regeneration, or are there other tricks for that?

Until now my stretching, as I said earlier, is like a NEAREST, NEAREST filter, and it looks really bad. Should I implement something like bilinear filtering for stretching the originally small octaves? The way I did it is something like this:

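int octaveSize = 1 << (curOctaveIndex + BaseResolution);
int pixelSize = TextureSize / octaveSize; // this is the tile (u, v) from the above table.
// Loop over the destination texture, not the octave, so that every
// destination texel picks its source texel from the octave.
for (int i = 0; i < TextureSize; i++)
{
//  int x = i % octaveSize; // this is used for tiling
    int x = i / pixelSize;  // stretching: each octave texel covers pixelSize texels
    for (int j = 0; j < TextureSize; j++)
    {
//      int y = j % octaveSize; // this is used for tiling
        int y = j / pixelSize;
        curPixel = noise[curOctaveIndex][x + y * octaveSize]; // curPixel stands for destination texel (i, j)
    }
}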


Is there a similarly easy way to get a better stretching "filter"? Can I perform tiling and stretching in one pass? Or do I have to stretch every octave to its final dimensions, and then add them together by tiling? (Does that make any sense?)
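One possible answer to both questions, sketched under the assumption that the octaves are stored as square float arrays: sample the small octave with bilinear filtering and wrap the texel indices inside the lookup. A destination texel then maps straight to a fractional position in the octave, so stretching and tiling happen in a single pass and no intermediate stretched copy is needed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical bilinear sampler with repeat wrapping. s, t are in octave texel
// units, with texel centres at integer + 0.5.
float SampleBilinearRepeat(const std::vector<float>& tex, int size, float s, float t)
{
    assert(size > 0 && static_cast<int>(tex.size()) == size * size);
    const float fs = s - 0.5f, ft = t - 0.5f;
    const int x0 = static_cast<int>(std::floor(fs));
    const int y0 = static_cast<int>(std::floor(ft));
    const float ax = fs - x0, ay = ft - y0; // blend weights in [0, 1)
    auto at = [&](int x, int y) {
        x = ((x % size) + size) % size; // repeat wrap, also for negative indices
        y = ((y % size) + size) % size;
        return tex[y * size + x];
    };
    const float top    = at(x0, y0)     * (1 - ax) + at(x0 + 1, y0)     * ax;
    const float bottom = at(x0, y0 + 1) * (1 - ax) + at(x0 + 1, y0 + 1) * ax;
    return top * (1 - ay) + bottom * ay;
}

// Fills one textureSize x textureSize layer from an octave repeated `tiles` times.
std::vector<float> StretchAndTile(const std::vector<float>& octave, int octaveSize,
                                  int textureSize, int tiles)
{
    assert(textureSize > 0 && tiles > 0);
    std::vector<float> out(textureSize * textureSize);
    const float scale = static_cast<float>(tiles * octaveSize) / textureSize;
    for (int y = 0; y < textureSize; ++y)
        for (int x = 0; x < textureSize; ++x)
            out[y * textureSize + x] = SampleBilinearRepeat(
                octave, octaveSize, (x + 0.5f) * scale, (y + 0.5f) * scale);
    return out;
}
```

This is the same idea as GL_LINEAR filtering with GL_REPEAT wrapping on the card; replacing the repeat wrap in `at` with a mirrored one gives MIRRORED_REPEAT behaviour instead.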

Until summation, should I keep the noise data in the [-1, 1] range, or can I convert it to the [0, 255] range? That saves 3 bytes per value, but I have to transform the values back to [-1, 1] for weighting and summing them. Is there anything in between I can do, keeping memory requirements low ([0, 255] range) while making the calculations fast ([-1, 1] range)?
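One possible middle ground, sketched here with hypothetical names: store each sample as a signed byte (one byte instead of four, with the sign kept), and do the weighting in integer fixed point. With the power-of-two weights 1, 1/2, 1/4, ... from the table above, weight 2^-i becomes the integer 256 >> i in 8.8 fixed point. Whether this is actually faster than float math on a given CPU is not guaranteed; the sure gain is memory.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Packs a noise value in [-1, 1] into a signed byte in [-127, 127].
std::int8_t PackNoise(float v)
{
    assert(v >= -1.0f && v <= 1.0f);
    return static_cast<std::int8_t>(std::lround(v * 127.0f));
}

// Inverse of PackNoise (up to the quantization error of 1/254).
float UnpackNoise(std::int8_t b)
{
    return b / 127.0f;
}

// Weighted sum with weights 1, 1/2, 1/4, ... in 8.8 fixed point:
// weight 2^-i is the integer 256 >> i, so no float multiply per octave.
int SumPackedOctaves(const std::int8_t* samples, int numOctaves)
{
    assert(numOctaves >= 0 && numOctaves <= 9);
    int sum = 0;
    for (int i = 0; i < numOctaves; ++i)
        sum += samples[i] * (256 >> i);
    return sum; // sum / (256.0f * 127.0f) is the weighted sum on the float scale
}
```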

Also, all the questions from my previous post are still open! Especially the tiling issue.

HellRaiZer

[EDIT]
Tiling problem solved by "simulating" MIRRORED_REPEAT behavior. Everything seems to work fine now. Next I must make it animate. What will give better animated results: 3D noise for every octave, interpolated over time, with the final texture reconstructed as necessary; or generating some frames of the final texture and interpolating between them (looping)?

I resample the octaves using DevIL, but this is slow!
[/EDIT]
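The mirrored tiling mentioned in the edit can be expressed as an index wrap. This sketch (hypothetical name) mimics GL_MIRRORED_REPEAT for integer texel indices: every second copy of the texture is flipped, so the texels on either side of a tile boundary are identical and no seam appears, although the mirror symmetry itself may still be noticeable.

```cpp
#include <cassert>

// Maps any integer texel index (including negative ones) into [0, size)
// the way mirrored repeat does: 0 1 2 3 3 2 1 0 0 1 2 3 ... for size 4.
int MirroredRepeat(int i, int size)
{
    assert(size > 0);
    const int period = 2 * size;
    const int m = ((i % period) + period) % period; // wrap into [0, 2 * size)
    return (m < size) ? m : period - 1 - m;         // second half runs backwards
}
```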

[edited by - HellRaiZer on October 25, 2003 1:07:57 PM]
HellRaiZer
Hello again. This is starting to get funny/crazy, posting replies to myself (!!!), but I really want your suggestions.

I re-implemented cloud generation and exponentiation without any use of the GPU, and I tried to make it animate. I made some experiments, and nothing was good enough.

First I tried to animate every octave independently, every frame, and reconstruct the final texture. This was slow! No serious optimization was done, but it was too unacceptably slow to be worth fighting for. Also, it didn't look good enough (from the animation point of view) to keep. Next I tried to build some "layers" (instead of individual octaves) without the exp pass, interpolate them over time, and exponentiate the result. Things were different now. Performance was better (with some problems when the final texture was huge), but the animation was awful! I don't know what to do, and more importantly, what I should expect from such a system.

I have to admit that the p-Buffer approach was better, if you don't want the extreme cloud look that exponentiation gives you.

The reason I'm still thinking about the p-Buffer approach is that, although it wasn't really good-looking, the general cloud pattern was nearly 100% unpredictable and non-repetitive from frame to frame, without having to regenerate the noise textures when the old ones were outdated. A few different frames for every octave, different tiling, a different alpha value and different u, v coords for each of them, and linear interpolation between frames made it look completely new every time it was updated.
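The frame cross-fade described above can be sketched like this, assuming every octave keeps a small set of equally sized noise frames; the last frame blends back into the first, so the sequence loops without a jump. `AnimateOctave` is a hypothetical name, and time is measured in frames.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical looping animation of one octave by linear interpolation
// between consecutive noise frames.
std::vector<float> AnimateOctave(const std::vector<std::vector<float>>& frames,
                                 float time) // in frames, any non-negative value
{
    assert(!frames.empty() && time >= 0.0f);
    const std::size_t count = frames.size();
    const float wrapped = std::fmod(time, static_cast<float>(count));
    const std::size_t a = static_cast<std::size_t>(wrapped);
    const std::size_t b = (a + 1) % count;           // wraps to frame 0 after the last one
    const float f = wrapped - static_cast<float>(a); // blend factor in [0, 1)
    std::vector<float> out(frames[a].size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = frames[a][i] * (1.0f - f) + frames[b][i] * f;
    return out;
}
```

Each octave can be given its own time rate and u, v offset before the octaves are summed, which is roughly the combination the post describes for the p-Buffer version.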

I'll try to see what I can get by abusing paletted textures, as Yann suggested. I have never worked with paletted textures before, so I may have a problem here.

quote:
No. It requires a dependent texture lookup, which is not supported below a GF3. You could, however, try to abuse the paletted texture functionality for that, which basically is a form of dependent lookup, although it will probably require a copyback over the CPU, and that's very slow. Your best bet is to do the exponentiation on the CPU.


I can't understand why this would require a copyback. Having each color as an index into the palette, in the range [0, 255], is exactly the same as having the clouds as an alpha (single-channel) texture. The only thing you have to do is give OpenGL the palette, which will be the exponent table. What you actually have to do is supply a different palette every time you change cloud sharpness and density. What am I missing here?
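A minimal sketch of the palette idea, assuming EXT_paletted_texture with an RGBA palette: every entry is white, and the alpha channel holds the exponentiated value for that index, so changing cover or sharpness only means rebuilding 256 entries. The curve, `255 - 255 * sharpness^max(index - cover, 0)`, is an assumption, `BuildCloudPalette` is a hypothetical name, and the upload call in the comment has not been tested here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical RGBA palette for a colour-index cloud texture.
// Rebuild it whenever cover or sharpness changes; with EXT_paletted_texture it
// would be uploaded with something like
//   glColorTableEXT(GL_TEXTURE_2D, GL_RGBA8, 256, GL_RGBA, GL_UNSIGNED_BYTE, palette.data());
std::array<std::uint8_t, 256 * 4> BuildCloudPalette(int cover, double sharpness)
{
    assert(cover >= 0 && cover <= 255 && sharpness > 0.0 && sharpness < 1.0);
    std::array<std::uint8_t, 256 * 4> palette{};
    for (int i = 0; i < 256; ++i)
    {
        const int c = (i > cover) ? i - cover : 0; // below cover: fully transparent
        const double alpha = 255.0 - 255.0 * std::pow(sharpness, c);
        palette[i * 4 + 0] = 255; // R
        palette[i * 4 + 1] = 255; // G
        palette[i * 4 + 2] = 255; // B
        palette[i * 4 + 3] = static_cast<std::uint8_t>(std::lround(alpha)); // A
    }
    return palette;
}
```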

Can I have a single-channel (8-bit) paletted texture?
Is it possible to combine (with register combiners) an A texture with an RGB one, and get an RGBA result to use as the cloud texture?
If it is, can one of these textures be paletted and the other true color?

Yann suggested checking whether the whole procedure would be faster on the CPU instead of a GPU-CPU mix. So far, I can say that having animated noise combined on the GPU is faster. Of course, this may be my fault. In fact, thinking of Elias' cloud demo, I can say this is definitely my fault!

HellRaiZer

PS. Sorry for continuing to post to myself. If nobody answers, I won't do it again. Thanks for your attention!
HellRaiZer
