Cloud Rendering on GPU withOUT pixel or texture shaders...

12 comments, last by HellRaiZer 20 years, 5 months ago
OK, here are some answers. It took a while, but I was busy with Real Life™ this weekend.

quote:Original post by HellRaiZer
I want your opinion on the steps. Esp. on the frequency series...

1) Generate n octaves of noise, using the same dimensions for all of them (e.g. 32x32), with (Frequency * 2^i, Persistence^i) for each of them.

2) Having those n "textures" at (e.g.) 16x16 resolution, start stretching them in order (final texture 256x256, n = 5).

3) Add those octaves together by tiling them accordingly, using a weighted average.

4) Do the exponential pass and shading (???)

Yes, that is correct.
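The four steps above can be sketched roughly like this. This is an illustrative Python sketch, not anyone's actual implementation: white noise stands in for real Perlin octaves, and the tile counts, weights, and the `cover`/`sharpness` parameters of the exponential pass are assumptions.

```python
import math
import random

def make_octave(size, seed):
    """One tile of white noise in [-1, 1]; stands in for a Perlin octave."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]

def combine_octaves(octaves, final_size, persistence=0.5):
    """Weighted sum: octave i is tiled 2^i times across the final texture."""
    out = [[0.0] * final_size for _ in range(final_size)]
    total_w = sum(persistence ** i for i in range(len(octaves)))
    for i, octave in enumerate(octaves):
        size = len(octave)
        tiles = 2 ** i                      # tile coefficient per octave
        w = persistence ** i / total_w      # normalized weight
        for y in range(final_size):
            for x in range(final_size):
                # nearest-neighbour lookup with wrap-around tiling
                sy = (y * tiles * size // final_size) % size
                sx = (x * tiles * size // final_size) % size
                out[y][x] += w * octave[sy][sx]
    return out

def exponential_pass(value, cover=0.1, sharpness=8.0):
    """Map summed noise to cloud density: values below 'cover' vanish."""
    return 1.0 - math.exp(-sharpness * max(value - cover, 0.0))
```

The nearest-neighbour lookup here is exactly the "NEAREST, NEAREST" stretching discussed below; swapping it for bilinear sampling is what removes the blockiness.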

quote:
I think I understand the above paragraph now. I was confused by the terminology. x is the tile coefficient used for octave 0 of the above table. So if x = 2, then oct[4] must be tiled 32x32 times instead of 16x16. Is this correct?

Yes. I now realize that the variable name I picked ('x') must have been a little confusing. Just call it 'tc' for tile coefficient, if you want.

quote:
Is using a 16x16 resolution for the octaves just for creating the noise on the GPU? I mean, do you keep it low because it's calculated on the GPU, or does this resolution give good results either way (CPU or GPU)? Should I keep it small for faster regeneration, or are there other tricks for that?

16x16 is just a size. I use 64x64 in my current implementation, but it doesn't really matter that much. Don't make it too large, as you'll have to regenerate this on the CPU to animate the clouds, and larger pieces will take longer to generate and upload. Don't make it too small, or the noise will show visual artifacts. Typically, 16², 32² or 64² are good candidates.

quote:
Till now my stretching, as I said earlier, is like a NEAREST, NEAREST filter, and it looks really bad. Should I implement something like bilinear filtering for stretching the originally small octaves?

Well, if you want good quality, then the answer is definitely yes. The advantage of adding the octaves on the GPU is that you'll get the filtering automatically. If you want to render the noise on the CPU, you'll also need to implement some form of interpolation. Back in the old days, this was quite hard and very slow. But nowadays, with SIMD, it's not that bad anymore.

quote:
Is there a similarly easy way to get a better stretching "filter"? Can I perform tiling and stretching in one pass? Or do I have to stretch every octave to its final dimensions, and then add them together by tiling? (Does that make any sense??)

There are tons of different scaling techniques for 2D images available. Many can combine tiling and interpolation. Tiling is nothing but a simple wrap-around, at least if the textures are powers of two. Search the net, there are plenty of resources out there. For good performance, I would definitely recommend SIMD ASM though.
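Combining wrap-around tiling with bilinear interpolation in a single lookup can be sketched like this. This is an illustrative Python sketch under the assumption of power-of-two tiles (which makes the wrap a simple bitwise AND); the function name and tile contents are my own, not from the thread.

```python
def sample_bilinear_tiled(tile, u, v):
    """Bilinearly sample a power-of-two square 'tile' at coords (u, v).
    Coordinates wrap, so tiling and filtering happen in the same lookup."""
    size = len(tile)
    mask = size - 1                     # valid because size is a power of two
    fx, fy = u * size, v * size
    x0, y0 = int(fx), int(fy)
    tx, ty = fx - x0, fy - y0           # fractional part drives interpolation
    x0 &= mask
    y0 &= mask
    x1, y1 = (x0 + 1) & mask, (y0 + 1) & mask   # wrap-around neighbours
    top = tile[y0][x0] * (1 - tx) + tile[y0][x1] * tx
    bot = tile[y1][x0] * (1 - tx) + tile[y1][x1] * tx
    return top * (1 - ty) + bot * ty
```

Because the neighbour indices wrap with the same mask, sampling past the right or bottom edge blends seamlessly into the opposite edge, so stretching and tiling really are one pass.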

quote:
Until summation, should I keep the noise data in the [-1, 1] range, or can I convert it to the [0, 255] range? It saves 3 bytes, but I have to transform the values back to [-1, 1] for weighting and summing them.

Depends. You can keep them biased from -128 to 127, in one signed byte per texel. But you need to do the octave combining in a higher precision format, otherwise you'll run into banding artifacts. Floating point is OK, but requires expensive integer-float conversions. Fixed point is more appropriate here; 16.16 (32 bit) or 8.16 (24 bit) usually work pretty well. You could also experiment with byte-to-float conversion tables, this might be faster than a direct conversion. Depends on the CPU.
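A minimal sketch of what 16.16 fixed-point combining looks like, assuming signed-byte texels as suggested. The weights and texel values are example numbers, not anything from the thread; the point is that the accumulation happens entirely in integers.

```python
FP_SHIFT = 16
FP_ONE = 1 << FP_SHIFT          # 1.0 in 16.16 fixed point

def to_fixed(f):
    """Convert a float weight to 16.16 fixed point once, up front."""
    return int(round(f * FP_ONE))

def combine_texel_fixed(texels, weights_fp):
    """texels: signed bytes in [-128, 127]; weights_fp: 16.16 weights.
    Accumulates in fixed point, then shifts back to the texel range."""
    acc = 0
    for t, w in zip(texels, weights_fp):
        acc += t * w            # byte * 16.16 product, no float conversion
    return acc >> FP_SHIFT      # drop the fractional bits at the very end

# Example: three octave samples weighted 0.5 / 0.3 / 0.2
texel = combine_texel_fixed([100, -40, 20],
                            [to_fixed(0.5), to_fixed(0.3), to_fixed(0.2)])
```

Since the intermediate sum keeps 16 fractional bits, the banding that a pure byte-precision sum would produce is avoided, and the only divisions are a final shift.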

quote:
3D noise for every octave, interpolate over time, and reconstruct the final texture as necessary, or generate some frames of the final texture and interpolate over them (looping)?

Compute a new noise octave every 10 or 20 frames, and interpolate the other ones. That way, you can gradually compute the noise tiles, distributed over several frames. Divide and conquer. That's a nice trick to reduce the per-frame noise generation overhead.
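A sketch of that "new key tile every N frames" idea: keep two key tiles per octave and lerp between them each frame, so the expensive generation only happens once per N frames. The class name, frame count, and tile contents are illustrative assumptions.

```python
def interpolate_tiles(tile_a, tile_b, t):
    """Per-texel linear interpolation between two key noise tiles, t in [0, 1]."""
    return [[a + (b - a) * t for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(tile_a, tile_b)]

class AnimatedOctave:
    def __init__(self, key_a, key_b, frames_per_key=20):
        self.key_a, self.key_b = key_a, key_b
        self.frames_per_key = frames_per_key
        self.frame = 0

    def tick(self, make_new_key):
        """Advance one frame. Only every 'frames_per_key' frames does
        make_new_key() run (the expensive noise generation); every other
        frame is just a cheap lerp between the two current keys."""
        self.frame += 1
        if self.frame >= self.frames_per_key:
            self.frame = 0
            self.key_a, self.key_b = self.key_b, make_new_key()
        t = self.frame / self.frames_per_key
        return interpolate_tiles(self.key_a, self.key_b, t)
```

Staggering the regeneration frame across octaves (octave 0 on frame 0, octave 1 on frame 5, ...) spreads the cost evenly, which is the divide-and-conquer part.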

quote:
First I tried to animate every octave independently, every frame, and reconstruct the final texture. This was slow!!!! No hard optimization was done, but it was unacceptably slow to fight for! Also, it didn't look too good (from the animation point of view) to keep it. Next I tried to build some "layers" (instead of individual octaves), without the exp pass, interpolate over time, and exp the result. Things were different now. Performance was better (with some problems when the final texture was huge), but the animation was awful! I don't know what to do, and more importantly, what I should expect from such a system.

The speed is a question of optimization. This is where ASM and SIMD (MMX, SSE, 3DNow) come in very handy. In practice, you can get basic Perlin noise tile generation to be extremely fast this way. In my case, it takes more time to upload the new noise textures to the GPU than to generate them... But don't worry about that right now; make it work and look good first.

About the animation. Well, animating realistic clouds by changing the individual octaves is very difficult to adjust. My advice: don't bother. In reality, clouds will never morph that way. The most realistic effects are sometimes the easiest. Animating the cloud density offset and base exponent factor gives very cool results (for example, simulating an approaching storm). General cloud movement is typically done by a linear shift of the tile texture coordinate offsets in the wind direction. Perhaps in multiple layers. That will give you basic moving clouds. Don't let it wrap over at the edges, but generate new tiles. This way, you'll have an infinite amount of never-repeating clouds. If at the same time you slowly animate the base octave tiles, the result will be very nice.
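The density-offset and exponent animation can be sketched with the common exponential cloud mapping. This is an illustrative sketch, not the poster's exact formula: the parameter names (`cover`, `sharpness`) and the storm ramp values are assumptions.

```python
import math

def cloud_density(noise_value, cover, sharpness):
    """Map a summed noise value in [0, 255] to cloud opacity in [0, 1].
    Raising 'cover' thins the clouds; raising 'sharpness' hardens edges."""
    c = max(noise_value - cover, 0)
    return 1.0 - math.exp(-sharpness * c)

def storm_params(t):
    """An 'approaching storm' faked purely by sliding parameters over time.
    t in [0, 1]: 0 = clear sky, 1 = overcast. Values are example choices."""
    cover = 120.0 - 100.0 * t       # lower cover -> more of the sky clouds over
    sharpness = 0.01 + 0.04 * t     # harder, denser cloud edges
    return cover, sharpness
```

No texture regeneration is needed for this part: only the mapping applied to the existing noise changes, which is why it is so cheap compared to re-animating the octaves.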

quote:
I have to admit that the p-Buffer approach was better, if you don't want the extreme cloud look that exponentiation gives you.

Of course the GPU generation is better, with and without the exponentiation. But in the case of a GF1, I don't know if it's that good from a performance point of view. You have to try and decide.

quote:
I can't understand why this would require a copyback. Having each color as an index in the palette, in the range [0, 255], is completely the same as having the clouds as an alpha (single channel) texture. The only thing you have to do is give OpenGL the palette, which will be the exponent table. What you actually have to do is supply a different palette every time you change cloud sharpness and density. What am I missing here?

That OpenGL will not accept an RGB or alpha image (such as the one in your p-buffer) as valid paletted texture data. For that to work, you need to supply a texture with a GL_COLOR_INDEX format, and a GL_COLOR_INDEX8_EXT internal format. So you need to read back your greyscale image, and re-upload it with the above-mentioned format parameters. From the GPU's point of view, it's completely unnecessary. This is an API limitation, as the paletted texture functionality was never intended to be (ab-)used in such a way...

quote:
Can i have a single-channel (8-bit) paletted texture?
Is it possible to (reg) combine an A texture with an RGB one, and take as a result a RGBA, which will be used as the cloud texture?
If it is, is it possible one of these textures be paletted, and the other true color?

Yes, yes and yes.

quote:
Yann suggested checking whether the whole procedure would be faster on the CPU instead of a GPU-CPU mix. So far, I can say that having animated noise combined on the GPU is faster. This, of course, may be my fault.

quote:
OK, here are some answers. It took a while, but I was busy with Real Life™ this weekend.


I tend to forget about Real Life sometimes. Yesterday was my name day (how do you call it?) and I was fighting with clouds... The phone was ringing all the time, and I was thinking of octaves and how to combine them!!!! Am I becoming a freak???

quote:
Depends. You can keep them biased from -128 to 127, in one signed byte per texel. But you need to do the octave combining in a higher precision format, otherwise you'll run into banding artifacts. Floating point is OK, but requires expensive integer-float conversions. Fixed point is more appropriate here; 16.16 (32 bit) or 8.16 (24 bit) usually work pretty well. You could also experiment with byte-to-float conversion tables, this might be faster than a direct conversion. Depends on the CPU.


I hadn't thought of the [-128, 127] signed byte range. This must work. And the byte-to-float conversion table is a great idea. I'll try it.

quote:
About the animation. Well, animating realistic clouds by changing the individual octaves is very difficult to adjust. My advice: don't bother. In reality, clouds will never morph that way. The most realistic effects are sometimes the easiest. Animating the cloud density offset and base exponent factor gives very cool results (for example, simulating an approaching storm).


When I mentioned
quote:
animate every octave independently, every frame, and reconstruct the final texture

I meant "interpolate over a 3D noise field for every octave over time...", not regenerate the octaves every frame! That would be worse, of course. But I can't understand what you are suggesting. Doesn't animating the cloud density and base exponent require texture regeneration and uploading? If I have a 1024^2 cloud texture, wouldn't it be just as slow to reconstruct, even without interpolating octaves? In fact, uploading may also be slow!

What's a good resolution for the cloud texture? I know you will say that you texture it procedurally and there is no fixed size, but I can't understand it! Procedurally texturing an infinite plane must require a huge amount of calculations to make it look good. I can't think of a way to implement this, so I must have a fixed-size cloud texture.

quote:
General cloud movement is typically done by a linear shift of the tile texture coordinate offsets in the wind direction. Perhaps in multiple layers. That will give you basic moving clouds. Don't let it wrap over at the edges, but generate new tiles. This way, you'll have an infinite amount of never-repeating clouds. If at the same time you slowly animate the base octave tiles, the result will be very nice.


I read this in the "Sky Rendering Techniques..." thread, and you describe it as so "easy" that it makes me think I'm really missing something here! Moving the cloud data around in memory in arbitrary directions (the wind direction) makes me sick just thinking about it!!! Generating new noise for the edges, though, seems easy enough.

quote:
That OpenGL will not accept an RGB or alpha image (such as the one in your p-buffer) as valid paletted texture data. For that to work, you need to supply a texture with a GL_COLOR_INDEX format, and a GL_COLOR_INDEX8_EXT internal format. So you need to read back your greyscale image, and re-upload it with the above-mentioned format parameters. From the GPU's point of view, it's completely unnecessary. This is an API limitation, as the paletted texture functionality was never intended to be (ab-)used in such a way...


I forgot about the p-Buffer's accepted formats. Only RGB_ARB and RGBA_ARB are accepted; no COLOR_INDEXx_ARB!

I now have to search for a bilinear filter using MMX/3DNow!, and a way to do stretching and tiling in one fast pass!

HellRaiZer
I've managed to get some kind of basic animation/motion. The problem is that it's moving in only one direction, along the V texture coordinate. That's because it's easier to move a bitmap's memory in that direction: it's just a simple memmove of the texture data, one line at a time, generating new data for that line. I now have to make it move in arbitrary directions and, of course, generate the appropriate data. Any ideas on that, to make my life easier?
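One way to sidestep the memmove entirely, sketched under the assumption that the cloud texture tiles: leave the noise data in place and shift a (u, v) read offset along the wind each frame, so "movement" is just a change of where sampling starts. The class and field names are illustrative, not from the thread.

```python
class ScrollingCloudLayer:
    def __init__(self, size):
        self.size = size
        self.offset_u = 0.0
        self.offset_v = 0.0

    def advance(self, wind_u, wind_v, dt):
        """Move the texture-coordinate origin along the wind direction.
        Scaling by elapsed time dt makes the speed frame-rate independent."""
        self.offset_u = (self.offset_u + wind_u * dt) % 1.0
        self.offset_v = (self.offset_v + wind_v * dt) % 1.0

    def sample(self, data, u, v):
        """Nearest lookup with the scroll offset applied; wraps around,
        so no texel ever has to be physically moved in memory."""
        x = int((u + self.offset_u) * self.size) % self.size
        y = int((v + self.offset_v) * self.size) % self.size
        return data[y][x]
```

Only the strip of texels that scrolls into view each frame needs fresh noise; everything else is handled by the offset, in any direction, with no per-line copies.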

If I keep the final texture at small dimensions, then updating every frame, even with 8 octaves of noise, is fast. The problem is when the texture is big (I have to split the process across multiple frames), or when I don't want the update to happen every frame, because that way the clouds are moving too fast. If either of these is true, the motion looks blocky! And that way I can't have a speed parameter to handle the shift, so that it looks smoother and stable (FPS independent).

Also, I'm trying to find a way to apply cloud deformation to the motion. My problem is that after some time, all the data will not have any "connection" to the originally generated data, and I have to generate data for at least 2 frames of animation each time I shift texcoords, so in the middle of 2 shifts (N frames) I can use this to interpolate over time. Any suggestions on that?

HellRaiZer
What do you think of these?

Octaves = 5, AttenFactor = 32.0f

Octaves = 6, AttenFactor = 16.0f

Octaves = 8, AttenFactor = 12.0f


I've just finished writing a basic DDA based on the "Grid Tracing" algorithm for heightfield raytracing, and I hacked the light attenuation formula to see what I could get. The results are the above, with different parameters.

Can somebody explain Harris' method of cloud shading in simple words? I know this is not a sane question, but I can't understand anything from it. Is there another (simpler) way of calculating/simulating cloud shading? I remember Kill mentioned something in the huge thread about using Perlin noise with the cloud's heightfield to calculate shading. Can somebody explain this a little? Does it give the same (or at least similar) results as Harris' method?

The way I calculate shading is this:

for every voxel
{
    calc ray from sun to voxel (v1)
    calc intersection point with the cloud bbox
    find intersection voxel's x, z coords in voxel field (entry to the field) (curVox = entry)

    set voxel's (v1) color to 255 (white)
    set attenuation to 0.0f

    start traversing the field
    {
        if the ray passes through current voxel (curVox)
        {
            calculate how much the ray travels in the voxel (len)
            calculate the maximum possible ray path length (maxlen)
            increase total attenuation by (attenFactor * len / maxlen)
        }
        advance to next voxel the ray will hit (curVox = nextVoxel)
    }
    end traversing the field if the ray exceeded cloud's bbox

    voxel's (v1) color -= attenuation
}
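The inner loop of that pseudocode can be reduced to a small runnable sketch. This is illustrative only, not the poster's implementation: the ray is simplified to a row of voxels with a fixed step length, and occupancy replaces a real DDA traversal.

```python
def shade_voxel(occupied, atten_factor, step_len, max_len):
    """March a sun ray across a list of voxels (1 = cloud, 0 = empty).
    Start white (255) and subtract attenuation proportional to the
    path length inside each occupied voxel, as in the pseudocode."""
    color = 255.0
    attenuation = 0.0
    for hit in occupied:
        if hit:                                     # ray passes through cloud
            attenuation += atten_factor * step_len / max_len
    return max(color - attenuation, 0.0)            # clamp so it never goes negative
```

With atten_factor = 32 and four voxels of which three are occupied, the attenuation is 3 * 32 / 4 = 24, so the voxel shades to 231; deeper or denser paths darken further until the clamp kicks in.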


What color should I use for the sun? Is the skydome's color at the sun's position enough?

I know there are problems with the pics; any suggestions appreciated. Something to note is that everything is static for now.
How can I combine movement, as I described it in the previous post, with shading? Do I have to re-shade all the voxels, or just the new ones?

HellRaiZer

PS. Davepermen, if you have any problem uploading stuff to your space, please tell me. If you want to delete those, be my guest. I don't have any problem with it. Anyway, thanks.
HellRaiZer

This topic is closed to new replies.
