Vilem Otte

Member Since 11 May 2006

#5003921 Projected grid water using a sphere instead of a plane?

Posted by Vilem Otte on 25 November 2012 - 05:15 AM

Basically, why do you need a paper? Projected grid approaches are a really simple thing. Just generate rays from your camera, test them against a plane, sphere, ... and compute the vertex locations for your grid*. To avoid seams at the screen edges, generate the rays with a camera that has a slightly larger FOV than your actual camera.

And because the vertices you get form a grid, computing indices for drawing is very simple (the same as indices for rendering a quad-based uniform grid). Texture coordinates (this also goes for normals and tangent vectors) can be computed from the vertex position (because you know it's a simple plane or sphere).
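The index generation mentioned above can be sketched as follows. This is only a minimal illustration, assuming the grid vertices are stored row by row; the name BuildGridIndices and the winding order are my own choices, not something taken from an actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index buffer for a grid of cols x rows vertices.
// Assumption: vertex (x, y) is stored at index y * cols + x.
std::vector<std::uint32_t> BuildGridIndices(std::uint32_t cols, std::uint32_t rows)
{
    std::vector<std::uint32_t> indices;
    if (cols < 2 || rows < 2)
        return indices; // a single row or column of vertices forms no quads

    const std::size_t expected = static_cast<std::size_t>(cols - 1) * (rows - 1) * 6;
    indices.reserve(expected);

    for (std::uint32_t y = 0; y + 1 < rows; ++y)
    {
        for (std::uint32_t x = 0; x + 1 < cols; ++x)
        {
            const std::uint32_t i0 = y * cols + x; // top-left corner of the quad
            const std::uint32_t i1 = i0 + 1;       // top-right
            const std::uint32_t i2 = i0 + cols;    // bottom-left
            const std::uint32_t i3 = i2 + 1;       // bottom-right

            // Two triangles per quad
            indices.push_back(i0); indices.push_back(i2); indices.push_back(i1);
            indices.push_back(i1); indices.push_back(i2); indices.push_back(i3);
        }
    }

    assert(indices.size() == expected);
    return indices;
}
```

The same buffer can be reused every frame, since only the vertex positions change when the camera moves.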

Then you just need some awesome FFT for displacement, good vertex & fragment shaders (maybe even a tessellation one), and you can render perfect water.

*Note: for a sphere you get 2 solutions. If you are above the water surface, pick the closer one; if you're under the water surface, pick the further one (basically, in both cases it will be the smallest non-negative one).
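A minimal sketch of the ray-sphere test with that root selection, assuming a ray given as origin + t * dir. Vec3, Dot, Sub and IntersectRaySphere are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Returns the smallest non-negative t with |origin + t * dir - center| == radius,
// or -1.0 when the ray misses the sphere (or the sphere lies behind the ray).
double IntersectRaySphere(const Vec3& origin, const Vec3& dir, const Vec3& center, double radius)
{
    assert(radius > 0.0);

    // Solve a*t^2 + b*t + c = 0
    const Vec3 oc = Sub(origin, center);
    const double a = Dot(dir, dir);
    const double b = 2.0 * Dot(oc, dir);
    const double c = Dot(oc, oc) - radius * radius;

    const double disc = b * b - 4.0 * a * c;
    if (a == 0.0 || disc < 0.0)
        return -1.0;

    const double s = std::sqrt(disc);
    const double t0 = (-b - s) / (2.0 * a); // closer solution
    const double t1 = (-b + s) / (2.0 * a); // further solution

    if (t0 >= 0.0) return t0; // camera outside the sphere (above the water)
    if (t1 >= 0.0) return t1; // camera inside the sphere (under the water)
    return -1.0;
}
```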


#5001029 Making an octree with OpenCL, thread-safe issues

Posted by Vilem Otte on 14 November 2012 - 04:30 PM

You actually answered yourself - you need to make the code thread safe (the same way you'd do it on the CPU). You *should* use atomics for this. I'm not sure, but I think every GPU supporting OpenCL 1.1 (where, I think, 32-bit atomics became core) also supports cl_khr_global_int32_base_atomics ... I don't know which GPU you have, but I'd try updating the drivers & SDK for it. Atomics will, as you stated, make the code slower (and there probably isn't any other solution that would be effective enough, either). The question is whether it will still be faster than doing it on the CPU (with heavily optimized algorithms).

Implementing e.g. a mutex (which might work for what you need) with atomics is then really straightforward and easy stuff ... you could actually use this construction (here the mutex holds 1 when free and 0 when taken):
while (atom_cmpxchg(mutex, 1, 0) == 0);  // Acquire mutex: swap 1 -> 0, spin while the old value was 0 (taken)
/** Modify data in critical section **/
atom_cmpxchg(mutex, 0, 1);               // Release mutex: swap 0 -> 1
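For comparison, the same construction on the CPU can be sketched with std::atomic. This is just an illustration, assuming the same convention as the snippet above (1 = free, 0 = taken); SpinMutex is a hypothetical class, not part of any library.

```cpp
#include <atomic>
#include <cassert>

// Spin lock built on compare-and-swap. Convention: 1 = free, 0 = taken.
class SpinMutex
{
public:
    // Tries to swap 1 -> 0 once; returns true when the lock was acquired.
    bool try_lock()
    {
        int expected = 1;
        return state_.compare_exchange_strong(expected, 0,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    // Spins until the swap 1 -> 0 succeeds.
    void lock()
    {
        while (!try_lock()) { /* busy wait */ }
    }

    // Marks the lock as free again; must only be called by the holder.
    void unlock()
    {
        assert(state_.load(std::memory_order_relaxed) == 0);
        state_.store(1, std::memory_order_release);
    }

private:
    std::atomic<int> state_{1};
};
```

On the CPU you would normally just take std::mutex; the point here is only that the compare-and-swap pattern is identical.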




#5000750 GPU Skinning problem (MD5 Doom)

Posted by Vilem Otte on 13 November 2012 - 08:04 PM

As I've implemented MD5 skinning on the GPU, I can give you a few tips (I don't have time right now to go through your code, as I'm under quite some time pressure till Thursday)...

1.) Start from a simple MD5 file that contains only a translation and a single bone. If that works, try only a rotation, then combine them together, then build a two-bone model - i.e. trace where the problems lie.
2.) Double check your math. I sat for a few hours on a problem with inverse matrices, then found that VS actually *removes* a few instructions from my SSE2 matrix inversion code (... forgot volatiles *huh* .. the problem hadn't appeared on my main dev platform - Linux & gcc).
3.) And triple check your quaternion <-> matrix conversions. Often the problem lies only in rotations, and building a row-major matrix out of a quaternion where a column-major one is expected is totally wrong (a 3x3 rotation matrix is orthogonal, so its transpose is its inverse - you will actually get the inverse rotation, and this can create the mess you see).
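Point 3 can be checked on the CPU with a small sketch like the one below. It assumes the usual math convention (column vectors, m[row][col]); Quat, Mat3, QuatToMat3, Transpose and Mul are hypothetical helpers written only for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct Quat { double w, x, y, z; };
struct Mat3 { double m[3][3]; }; // m[row][col], applied to column vectors

// Rotation matrix of a non-zero quaternion (normalization is folded into s).
Mat3 QuatToMat3(const Quat& q)
{
    const double n = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
    assert(n > 0.0);
    const double s = 2.0 / n;

    Mat3 r = {{
        { 1.0 - s * (q.y * q.y + q.z * q.z), s * (q.x * q.y - q.w * q.z),       s * (q.x * q.z + q.w * q.y) },
        { s * (q.x * q.y + q.w * q.z),       1.0 - s * (q.x * q.x + q.z * q.z), s * (q.y * q.z - q.w * q.x) },
        { s * (q.x * q.z - q.w * q.y),       s * (q.y * q.z + q.w * q.x),       1.0 - s * (q.x * q.x + q.y * q.y) }
    }};
    return r;
}

// Swapping rows and columns is exactly what a row-major/column-major mix-up does.
Mat3 Transpose(const Mat3& a)
{
    Mat3 t;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            t.m[r][c] = a.m[c][r];
    return t;
}

// out = a * v
void Mul(const Mat3& a, const double v[3], double out[3])
{
    for (int r = 0; r < 3; ++r)
        out[r] = a.m[r][0] * v[0] + a.m[r][1] * v[1] + a.m[r][2] * v[2];
}
```

For a quaternion describing +90 degrees around Z, the matrix rotates the X axis onto +Y, while its transpose rotates it onto -Y, i.e. the inverse rotation.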

And the last point - try your skinning (i.e. inverse bind pose and matrix transformation) on the CPU first; it's a lot easier to debug there than on the GPU.

If you still have trouble implementing it, I can post fragments of my code (with a little how-to) here, although not till Thursday evening.


#4897521 When is the next version of the CG Toolkit going to be released?

Posted by Vilem Otte on 26 December 2011 - 06:18 PM

Well, the next Cg toolkit will probably be released sooner or later (on the official website, the last version is 3 from Feb 2011). Still, I strongly recommend you move to GLSL.

Why? Well, it is quite simple: Cg support is good on NVidia GPUs, although it sucks a lot on AMD ones. On the other hand, GLSL support is "same good" (there are differences, though small ones) on both.




#4894227 Choose Path in Path Tracing

Posted by Vilem Otte on 15 December 2011 - 12:01 PM

In physics, you would use a material's index of refraction to tell you how likely a photon is to be reflected vs refracted.


Wrong.

In physics the photon actually reflects with some energy and refracts with some energy (see the Fresnel equations for that). The Fresnel equations are described as follows: "When light moves from one medium with refractive index n1 to another medium with refractive index n2, both reflection and refraction might occur. The relationship between the incident ray and reflected ray angles is the law of reflection, and the relationship between the incident ray and the refracted ray is Snell's law of refraction. In a simple model, the fraction of power that is reflected is known as reflectance and the fraction of power that is refracted is known as transmittance - the rest will be absorbance (if any). To keep the model simple, we assume both media aren't magnetic.
Both reflectance and transmittance depend on the polarisation of light." All the equations can be found in any good physics book.


That's the only correct way to do it; otherwise it is just an approximation.
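As a rough illustration of the Fresnel equations mentioned above, here is a minimal sketch of the reflectance for unpolarised light at an interface between two non-magnetic, non-absorbing media. Averaging the s- and p-polarised terms is an assumption for unpolarised light, and FresnelReflectance is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Fraction of incident power that is reflected when light goes from a medium
// with refractive index n1 into a medium with refractive index n2.
// cosI is the cosine of the angle between the incident ray and the surface normal.
double FresnelReflectance(double cosI, double n1, double n2)
{
    assert(cosI >= 0.0 && cosI <= 1.0);

    // Snell's law: n1 * sin(thetaI) = n2 * sin(thetaT)
    const double eta = n1 / n2;
    const double sinT2 = eta * eta * (1.0 - cosI * cosI);
    if (sinT2 >= 1.0)
        return 1.0; // total internal reflection, nothing is refracted

    const double cosT = std::sqrt(1.0 - sinT2);

    // Amplitude coefficients for s- and p-polarised light
    const double rs = (n1 * cosI - n2 * cosT) / (n1 * cosI + n2 * cosT);
    const double rp = (n2 * cosI - n1 * cosT) / (n2 * cosI + n1 * cosT);

    // Unpolarised light: average of both polarisations
    return 0.5 * (rs * rs + rp * rp);
}
```

With no absorption, the transmittance is simply 1 minus this value. For air to glass (n1 = 1, n2 = 1.5) at normal incidence it gives a reflectance of about 4 %.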




#4892686 Radiosity c++ tutorial

Posted by Vilem Otte on 10 December 2011 - 07:30 PM

Don't know if it is related, but I implemented the algorithm from Hugo Elias' website some 7 years ago (unwrapping static meshes to light maps is evil!!!) through render-to-texture, and even at that time an average ray tracer could beat it in speed. Or was it just that using the GPU for the actual implementation is that slow?

Ah... that was a long time ago. I bet that most modern games (mostly) don't use light maps - I personally would think twice before implementing them (unless it is for learning purposes, of course - then go for it), as you can do dynamic lighting on most PCs today (on some even dynamic GI ... eh ... good dynamic GI, not that SSAO trick that everyone overdoes so it looks very, very bad :wink: ... not that it couldn't be nice, but well, most people overdo SSAO a lot).




#4877393 Indoor rendering

Posted by Vilem Otte on 26 October 2011 - 06:02 PM

Well...

I've recently been working with high-performance BVHs in the scene management subsystem of my engine, and it actually works a bit better than BSP (because it can handle a fully dynamic environment). View frustum culling comes at a very low price (giving a bit of a performance boost), but occlusion culling is not giving that much of a boost (depends on the scene) - its price is a lot higher than that of a frustum culling test.

As far as I can tell - try BVHs. You won't get as good results as with BSPs (in terms of speed), but you'll be able to handle the scene fully dynamically (BVHs can be built/rebuilt in realtime quite quickly).




#4877113 Morph targets.

Posted by Vilem Otte on 26 October 2011 - 03:33 AM

Okay, so there are generally 2 (actually 3) basic ways to do it. They are:
1) Using the CPU to perform the morphing and passing the geometry to the GPU through a VBO every frame (quite resource hungry, but when you need the geometry not only on the GPU but also in RAM, it is the best way)
2) Using the GPU, in vertex shaders
3) Using the GPU, in OpenCL

I'll try to describe 1. and 2.

1.) On the CPU, you actually do this in your code:

Let CModel be a class containing mNumVertices (the number of vertices in the model) and mVertices (the actual vertices of the model - an array of CVector3, a class with x, y and z float members). Let g_mInterp be a value between 0.0 and 1.0 holding the morph phase between CModel1 and CModel2. Pseudo code:

// You have to load 2 models (they have to have the same number of vertices, and CModel1.mVertices[i] has to have its morph target in CModel2.mVertices[i])

// ... During initialization ...
assert(CModel1.mNumVertices == CModel2.mNumVertices); // Let us check if we have same number of vertices on both sides
CModel ModelResult;
ModelResult.mNumVertices = CModel1.mNumVertices;
ModelResult.mVertices = new CVector3[ModelResult.mNumVertices];

// ... During rendering loop ...
for(int i = 0; i < ModelResult.mNumVertices; ++i)
{
    ModelResult.mVertices[i] = CModel1.mVertices[i] * (1.0f - g_mInterp) + CModel2.mVertices[i] * g_mInterp;
}

// Now you have the morphed mesh stored in ModelResult, you just need to render it (you can create a VBO from its vertices and use draw arrays, for example)

Okay, but this is quite a waste of resources - because if you're just going to use it for rendering on the GPU, you can do most of the stuff in a vertex shader...


2.) Doing it all in a vertex shader is quite straightforward

Let all variables stay the same, except that our CModel class also contains mVbo (unsigned integer type), which is the VBO of our vertices (actually it is the "ID of the VBO on the GPU in VRAM" - but well....)

// .. During initialization ...
assert(CModel1.mNumVertices == CModel2.mNumVertices); // Let us check if we have same number of vertices on both sides

// Load shader and don't forget to set up these attributes for it (bind them before linking the program)
glBindAttribLocationARB(this->ShaderProgram, 0, "Model1_Vertex");
glBindAttribLocationARB(this->ShaderProgram, 1, "Model2_Vertex");

// ... During rendering ...
// Turn on your shader
glUniform1fARB(glGetUniformLocationARB(ShaderProgram, "Interp"), g_mInterp);

// Attribute 0 reads positions from the first model's VBO
glBindBufferARB(GL_ARRAY_BUFFER_ARB, CModel1.mVbo);
glVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArrayARB(0);

// Attribute 1 reads positions from the second model's VBO
glBindBufferARB(GL_ARRAY_BUFFER_ARB, CModel2.mVbo);
glVertexAttribPointerARB(1, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArrayARB(1);

// Render (e.g. using glDrawArrays)
glDrawArrays(GL_TRIANGLES, 0, CModel1.mNumVertices);

glDisableVertexAttribArrayARB(0);
glDisableVertexAttribArrayARB(1);

// Turn off your shader


But we're not done yet; we still need the shader source that actually performs the morphing (I'll post just the vertex shader).


// Don't forget the GLSL version and the other stuff for your shader (it won't be as short as mine)
in vec3 Model1_Vertex;
in vec3 Model2_Vertex;

uniform float Interp;

void main()
{
    // This could also be done with the built-in mix function (written out so the code looks like what was done on the CPU)
    vec3 Morphed_Vert = Model1_Vertex * (1.0 - Interp) + Model2_Vertex * Interp;

    gl_Position = gl_ModelViewProjectionMatrix * vec4(Morphed_Vert, 1.0);
}


Hope I made everything easy to understand ... if not, feel free to ask.

Btw. I wanted to write a morph-target library a long time ago and I'm really thinking about it again ... thanks :D


#4862274 Animation File Format Suggestion

Posted by Vilem Otte on 15 September 2011 - 06:19 PM

Actually, you can use Collada. It stores animation and it is easy to load. I think the same goes for FBX. Then you'll just make your own format out of it (Collada is XML text rather than binary, FBX comes in both ASCII and binary flavours, and both store a lot of information you won't need).

I'm also using my own format (though it is very similar to MD5 in the end - dunno how that happened :D).




#4862124 Texture compression DDS / S3TC

Posted by Vilem Otte on 15 September 2011 - 10:29 AM

Lower quality


Not necessarily. While using the same space as a standard texture, an S3TC (DXT1) compressed one can actually hold 8 times as many pixels as the uncompressed one (compared to 32-bit RGBA) while still keeping very good quality - so you can have a lot more micro detail in the texture, even with a few less colors (in most cases there is still not much visible difference between an S3TC compressed texture and an uncompressed one...)

Slower rendering?? Or is it actually faster (less bandwidth)??


Actually it is faster: less bandwidth, and you also use less memory.

About the performance. I've been tought, 100 years ago, that decompressing takes time. Not sure how the video-card does things, but does the decompression hit the performance?


Actually, S3TC decompression is implemented in hardware on the GPU, so it won't take any more time than using a standard uncompressed texture.

How to calculate the video memory usage of a compressed texture?


Memory usage (in bits) = Number of pixels * Bits per pixel (of the uncompressed format) / Compression ratio
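As a worked example of that formula, here is a small sketch for a single mip level. DXT1 stores every 4x4 pixel block in 8 bytes (4 bits per pixel); UncompressedBytes and Dxt1Bytes are hypothetical helper names, and the mipmap chain is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of one mip level of an uncompressed texture.
std::size_t UncompressedBytes(std::size_t width, std::size_t height, std::size_t bitsPerPixel)
{
    return width * height * bitsPerPixel / 8;
}

// Size in bytes of one mip level of a DXT1 texture: 8 bytes per 4x4 block.
// Dimensions that are not multiples of 4 are rounded up to whole blocks.
std::size_t Dxt1Bytes(std::size_t width, std::size_t height)
{
    assert(width > 0 && height > 0);
    const std::size_t blocksX = (width + 3) / 4;
    const std::size_t blocksY = (height + 3) / 4;
    return blocksX * blocksY * 8;
}
```

For a 1024x1024 texture this gives 4 MiB as 32-bit RGBA and 512 KiB as DXT1, i.e. a ratio of 8:1. A full mipmap chain adds roughly one third on top of either number.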

Or does OpenGL / GPU convert the data?


The good thing about S3TC compression is that the GPU keeps the textures compressed in VRAM - i.e. they're actually a lot smaller than standard textures.

Note that S3TC compression is a must for megatextures.

Btw. I think I still have some S3TC compression code lying around if you would like to try encoding, but I don't know whether it's patented and for how long (I wrote it just out of curiosity, and I even read the patent; my code is different from what they describe :P)





#4790159 scenegraph for triangle-data

Posted by Vilem Otte on 24 March 2011 - 04:47 PM

As far as I know, picking triangles is best solved by casting a ray into the scene and testing it against the triangles. One ray-triangle test is here (the Möller test, one of the most famous today) - http://www.cs.virginia.edu/~gfx/Courses/2003/ImageSynthesis/papers/Acceleration/Fast%20MinimumStorage%20RayTriangle%20Intersection.pdf - and here is the theory behind barycentric coordinates: http://mathworld.wolfram.com/BarycentricCoordinates.html (read this first - it might also help you with picking edges).
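The ray-triangle test from the linked paper (usually called Möller-Trumbore) can be sketched roughly like this. It is a minimal, unoptimized version without backface culling; Vec3, Hit and IntersectRayTriangle are hypothetical names, and the epsilon is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Hit { float t, u, v; }; // ray parameter and barycentric coordinates

static Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Returns true and fills *hit when the ray orig + t * dir (t >= 0)
// intersects the triangle v0, v1, v2.
bool IntersectRayTriangle(const Vec3& orig, const Vec3& dir,
                          const Vec3& v0, const Vec3& v1, const Vec3& v2, Hit* hit)
{
    assert(hit != nullptr);
    const float kEpsilon = 1e-7f;

    const Vec3 e1 = Sub(v1, v0);
    const Vec3 e2 = Sub(v2, v0);
    const Vec3 p = Cross(dir, e2);
    const float det = Dot(e1, p);
    if (std::fabs(det) < kEpsilon)
        return false; // ray is parallel to the triangle plane

    const float invDet = 1.0f / det;
    const Vec3 tvec = Sub(orig, v0);
    const float u = Dot(tvec, p) * invDet;
    if (u < 0.0f || u > 1.0f)
        return false;

    const Vec3 q = Cross(tvec, e1);
    const float v = Dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f)
        return false;

    const float t = Dot(e2, q) * invDet;
    if (t < 0.0f)
        return false; // triangle is behind the ray origin

    hit->t = t; hit->u = u; hit->v = v;
    return true;
}
```

The returned u and v are the barycentric coordinates of the hit point, which is what makes them handy for deciding whether the hit is close to an edge or a vertex.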

For point picking I would replace the points with virtual (invisible) spheres and ray-cast against them. The sphere radius would grow with depth from the camera (this applies just to perspective projections, not to orthographic ones), so your picking area would be the same for a distant and a close sphere. Writing a ray-sphere intersection takes just a minute (or you can work it out in your head) with a simple quadratic equation ... a little help for ray-sphere: quite inefficient, but working, and with a very good, easy-to-understand explanation (i.e. read this first): http://wiki.cgsociety.org/index.php/Ray_Sphere_Intersection; a much better one is here: http://www.dreamincode.net/forums/topic/124203-ray-sphere-intersection/ (read it afterwards, to see how it can be computed efficiently).
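How the sphere radius could grow with depth can be sketched like this, assuming a symmetric perspective projection with a known vertical field of view. PickSphereRadius and its parameters are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cmath>

// World-space radius that covers pixelRadius pixels on screen at the given
// view-space depth, for a perspective projection with vertical FOV fovY (radians).
double PickSphereRadius(double depth, double fovYRadians, double viewportHeightPx, double pixelRadius)
{
    assert(viewportHeightPx > 0.0);

    // Height of the visible slice of the world at this depth
    const double worldHeightAtDepth = 2.0 * depth * std::tan(0.5 * fovYRadians);

    // World units per pixel, times the wanted radius in pixels
    return pixelRadius * worldHeightAtDepth / viewportHeightPx;
}
```

With a 90 degree FOV and a 1000 pixel high viewport, a 5 pixel pick radius corresponds to a sphere radius of about 0.1 at depth 10 and about 0.2 at depth 20.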

I'm no expert in writing mesh editors (I have actually never written one), but picking can be done this way quite minimalistically and easily, and it is quite effective (and if you pack your meshes into Bounding Volume Hierarchies, then bloody effective).


#4790155 Still specular

Posted by Vilem Otte on 24 March 2011 - 04:32 PM

In this case, some trickery with matrices is needed - on the ancient fixed-function pipeline we generated the texture coordinates with glTexGen and the parameter GL_REFLECTION_MAP. How to compute the exact coordinates is described here - http://www.opengl.or...ics_of_glTexGen

Havin' a good day, so here's also an explanation:
for(i=0; i<total; i++)
{
   myEyeVertex = MatrixTimesVector(ModelviewMatrix, myVertex[i]);
   myEyeVertex = Normalize(myEyeVertex);
   myEyeNormal = VectorTimesMatrix(myNormal[i], InverseModelviewMatrix);
   // Reflection vector: R = I - 2 * dot(I, N) * N
   dotResult = 2.0 * dot3D(myEyeVertex, myEyeNormal);
   //I am emphasizing that we write to s and t and r. Used to sample a cubemap.
   myTexCoord[i].s = myEyeVertex.x - myEyeNormal.x * dotResult;
   myTexCoord[i].t = myEyeVertex.y - myEyeNormal.y * dotResult;
   myTexCoord[i].r = myEyeVertex.z - myEyeNormal.z * dotResult;
}

First compute the vector from the eye to the vertex (as described here - gl_ModelViewMatrix * gl_Vertex) and normalize it.
Then compute the eye normal, which would be gl_NormalMatrix * gl_Normal, and compute 2 times the dot product of the eye vertex vector and the eye normal vector. In the end, generate all three coordinates as described in the code. In GLSL it might look like:
vec3 eye_vert = normalize(vec3(gl_ModelViewMatrix * gl_Vertex));
vec3 eye_norm = normalize(gl_NormalMatrix * gl_Normal);
float dp = 2.0 * dot(eye_vert, eye_norm);
reflection_coord.x = eye_vert.x - eye_norm.x * dp;
reflection_coord.y = eye_vert.y - eye_norm.y * dp;
reflection_coord.z = eye_vert.z - eye_norm.z * dp;

Note that reflection_coord is a vec3 going out to the geometry and/or fragment (pixel) shader. I also don't remember whether you have to divide the transformed position's xyz by its w first, although according to the OpenGL.org wiki it should be correct without it. Also, I didn't have time to test it with the built-in matrices, as I build my own matrices that I pass to GLSL and don't use the built-in ones.

EDIT: Sorry, my mistake - I was thinking this was about cubemap reflection (my apologies, I actually posted in the wrong forum and realized it just today ... could a mod please delete this post when reading this? It's definitely not related, thank you ...). I wonder if I can report myself? Also note that there is no working function for deleting your own post on the forum...


#4788852 Libs to use graphic cards for calculation

Posted by Vilem Otte on 21 March 2011 - 04:48 PM

In fact, your OpenCL code should also work on NVidia directly. If there are some issues, they are due to differences between the OpenCL implementations of NVidia and AMD. I've mostly been working with AMD GPUs (even with some OpenCL) and I didn't have any difficulties running the applications on NVidia GPUs.

Still, you need to test it, and if it doesn't work, it should be just some little issues (this is a bit similar to the little issues with GLSL on AMD versus NVidia).


#4785960 Efficiently clipping lights in a deferred shader

Posted by Vilem Otte on 15 March 2011 - 03:21 AM

2) scissor optimization. I don't know how to improve this right off the bat - using GL's internal scissoring might be more efficient since the culling would occur earlier in the pipeline, but it would require me to restructore the entire shader from an internal loop (run on the GPU) to an external loop (run on the CPU), which IMO should be a bad idea. Again, comments are welcome.

Yes, definitely go this way. You may also try the stencil test to cull out light volumes (it also optimizes fairly well) ... with stencil culling you might not need to restructure the entire shader, but you will have to add some lines to the CPU code.


#4781309 GPU and system RAM

Posted by Vilem Otte on 03 March 2011 - 01:51 AM

When you have a 32-bit system, there is a lot less usable address space than the full 4 GiB = 4096 MiB. First of all, some resources are reserved right after the bootloader, when the hardware is being detected and paging is being set up (32 MiB, I think, for 4096 MiB of addressable space). Then comes the system loading (let's say some 128 MiB at least). Okay - now we can set up VGA. If we use VESA modes, we have to store the whole front buffer (and maybe also the back buffer) in memory ... this is simple, direct and great (though there is no GPU 3D rendering ... well, that may not be needed, depending on what you're doing; ofc you can do fancy graphics without hardware rendering).
If you use hardware rendering, you "don't" need to store the whole GPU VRAM, as was written above.
As for the kernel-mode reserved space - there is some, but I won't say an exact number ... Windows is not the only OS ;)



