Member Since 10 Nov 2002
Offline Last Active Jan 07 2015 08:00 AM

Topics I've Started

Lights : Multi-pass vs Multi-light and abstraction

29 January 2007 - 05:01 AM

Hi...

I'm currently experimenting with Cg interfaces in order to create shader permutations easily, at run-time and on demand. My main interest is the ability to abstract light-related calculations so that the material shader doesn't have to care about the light type. This includes light attenuation as well as shadow map calculations and projected textures.

When I finished the basic interface for setting up parameters for shader interfaces, I thought I should try to write a multi-light shader in order to see the difference in performance from multipass. I created 2 versions of a simple dot3 diffuse shader: one which renders only one light at a time (accumulating lights through additive blending) and one which renders 4 lights at once. Both shaders perform tangent-space transformations in the fragment program (the per-vertex tangent basis is passed as texcoords to the fragment program, and the light vector is transformed into tangent space using the interpolated matrices). The scene is just a box with its faces facing inwards (something like a room), with 4 colored spotlights with both angular and distance attenuation.

Based on FPS calculations, both versions performed the same, with minor differences: sometimes the multipass approach is faster and sometimes the multi-light one is, depending on the window size or, more correctly, on the number of shaded pixels when the camera is in the room. This is true even if the framebuffer is a 64-bit floating-point texture (16 bits per channel, RGBA). In fact, when using a floating-point render target, the multi-light shader performs the same as or worse than multipass, and the FPS isn't steady.

Now to the questions:

1) What is wrong with the above comparison? I expected the multi-light shader to perform better than the multipass approach, because:
a) there are fewer texture fetches (2 for multi-light instead of 8 for multipass)
b) there is no blending with the multi-light shader (no read-modify-write), and the difference should have been more visible when using a floating-point render target.
As a note, rendering was done on a 1280x1024 render target (either directly to the window or to the float buffer).

2) When dealing with multiple lights in a single shader, where do you perform tangent-space transformations? Inside the vertex program, passing light vectors to the fragment program as texcoords, or inside the fragment program? The first approach limits the number of lights a single shader can process to the number of free texcoords available for passing data around, while the second performs too many calculations in the fragment program.

I know that the scene is really simple, and things would be different if I had used a more complex model (more polygons). In that case I would expect the multi-light approach to be far better than multipass, because then I would be vertex bound. Now that vertex calculations are kept to a minimum, shouldn't the points mentioned above (1.a and 1.b) make a difference between the two versions?

Thanks for reading this. If you have anything to comment, I'd be glad to hear it.

HellRaiZer

EDIT: I forgot to mention that my gfx card is a 7800GTX and the compiled shader profiles were NV_vertex_program3 and NV_fragment_program2.
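To put rough numbers on points 1.a and 1.b, here is a back-of-envelope sketch. It only counts texture fetches and framebuffer traffic per pixel and ignores arithmetic cost entirely. It assumes every pixel is shaded exactly once per pass and that every multipass pass blends (the first pass may not need to); the function names are made up for this illustration.

```python
# Back-of-envelope model. All constants come from the post above:
# 4 lights, 2 texture fetches per shader invocation, 1280x1024 FP16 RGBA target.
WIDTH, HEIGHT = 1280, 1024
BYTES_PER_PIXEL = 8           # 4 channels x 16 bits
NUM_LIGHTS = 4
FETCHES_PER_INVOCATION = 2


def per_pixel_cost(passes, blended):
    """Return (texture fetches, framebuffer bytes touched) for one pixel."""
    fetches = passes * FETCHES_PER_INVOCATION
    # A blended pass reads the destination and writes it back;
    # an opaque pass only writes.
    bytes_per_pass = 2 * BYTES_PER_PIXEL if blended else BYTES_PER_PIXEL
    return fetches, passes * bytes_per_pass


def frame_traffic(passes, blended):
    """Framebuffer bytes touched per frame, assuming full-screen coverage."""
    _, fb_bytes = per_pixel_cost(passes, blended)
    return WIDTH * HEIGHT * fb_bytes


multipass = per_pixel_cost(NUM_LIGHTS, blended=True)    # (8, 64)
multilight = per_pixel_cost(1, blended=False)           # (2, 8)
```

Under these assumptions the multipass version touches 80 MiB of framebuffer per frame against 10 MiB for the multi-light version. The per-light arithmetic (tangent-space transform, attenuation) is the same in both versions, so if the measured frame rates match, the fragment arithmetic rather than fetches or blending may be the limiting factor.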

Analysis of memory allocations

13 October 2006 - 08:56 PM

Hi...

I have a memory problem with my engine and I can't find what is going wrong. According to Windows' Task Manager my application consumes 140MB of memory, and the problem is that there is no apparent reason for that. Let me explain.

When the loading phase finishes, Task Manager reports about 22MB, which seems logical. Up to that point, the only thing I can imagine taking most of the allocated memory is model geometry, because this is the only thing that is loaded and kept for later use. Other things include image filenames, material properties, etc.

After that, when the first frame is about to be rendered, I create all vertex buffers required by the visible portion of the scene, and load all the textures for the visible part. Because of that, I first checked whether there is a memory leak in the image loading part, but there isn't any. When an image is uploaded as a texture to GL, its data is freed from system memory (and I can see the total consumed memory go up and down in Task Manager when a big texture is loaded).

Apart from images/textures, the only other thing that happens at run-time is vertex/index buffer creation from the loaded geometry data. This seems to be the problem, but I can't understand why. GLExpert reports that all vertex/index buffers are allocated in video memory. And even if Task Manager counted video memory in the application's memory footprint, I wouldn't expect it to be more than 50MB (double the memory footprint from the loading phase).

To the question. Is there any program/library which can tell me where and when all the allocations happen, and the true allocated size of each one of them? Is there any other way I can get this info without writing custom memory allocators on my own? I will eventually (if nothing else helps), but I would like to try something simpler/faster first. Note that this is about unmanaged code (native C/C++).

Thanks in advance.

HellRaiZer

PS. Sorry if this has been asked before, but I couldn't find anything relevant in the forums.

Tetris packing with complex shapes

07 September 2006 - 04:59 AM

Hi...

I have a problem and I don't know how to search for a possible answer. I have an infinite number of things with this shape: Tetris Block and an empty 2D lattice. I want to fill this lattice with as many of these things as I can, thus minimizing the free space in the lattice. Any object can be rotated +-90 degrees. Because the shape is symmetric about both the x and y axes, only one rotation makes sense. Every colored square in the shape has exactly the same dimensions as the others.

This looks a lot like Tetris, but with more complex shapes. I tried to search for how this can be done, but I haven't found anything useful (a paper, an algorithm). Does somebody have any suggestions/papers on that? Any keywords for searching on Google? Anything?

As you can figure out, packing multiple objects like the above always leaves some empty space. This is because you can't have one object fit perfectly inside the other. There will always be some empty cells, left and right of the blue cells. Knowing this, is there any way I can calculate the minimum empty space that can be left inside an arbitrary lattice?

Thanks in advance. And if you have any questions, please ask.

HellRaiZer
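Shapes built from equal squares are usually called polyominoes, so "polyomino packing" is a reasonable search term. For small lattices, the maximum number of pieces (and so the minimum empty space) can be found by exhaustive backtracking. The sketch below is one way to do that, not a known best algorithm. The original image is not available here, so the `SHAPE` used is a hypothetical stand-in; all function names are invented for this example.

```python
# HYPOTHETICAL shape: an I-beam stand-in for the missing image. '#' = filled cell.
SHAPE = ["###",
         ".#.",
         "###"]


def cells(rows):
    """Sorted (row, col) coordinates of the filled cells."""
    return sorted((r, c) for r, row in enumerate(rows)
                  for c, ch in enumerate(row) if ch == "#")


def rotate90(pts):
    """Rotate by 90 degrees, then shift so the minimum row/column is 0."""
    rot = [(c, -r) for r, c in pts]
    min_r = min(r for r, _ in rot)
    min_c = min(c for _, c in rot)
    return sorted((r - min_r, c - min_c) for r, c in rot)


def orientations(rows):
    """The shape and its 90-degree rotation (a set removes duplicates)."""
    base = cells(rows)
    return sorted({tuple(base), tuple(rotate90(base))})


def pack(height, width, rows=SHAPE):
    """Return a maximum set of non-overlapping pieces as lists of cells."""
    shapes = []
    for o in orientations(rows):
        r0, c0 = o[0]                       # anchor = first cell in row-major order
        shapes.append([(r - r0, c - c0) for r, c in o])
    size = len(shapes[0])
    used = [[False] * width for _ in range(height)]
    placed, best = [], []

    def fits(r, c, shape):
        return all(0 <= r + dr < height and 0 <= c + dc < width
                   and not used[r + dr][c + dc] for dr, dc in shape)

    def mark(r, c, shape, value):
        for dr, dc in shape:
            used[r + dr][c + dc] = value

    def search(pos, free):
        # pos: current lattice cell in row-major order
        # free: number of unused cells at positions >= pos
        nonlocal best
        if len(placed) > len(best):
            best = [list(p) for p in placed]
        # Prune when even filling every remaining cell cannot beat the best.
        if pos == height * width or len(placed) + free // size <= len(best):
            return
        r, c = divmod(pos, width)
        if used[r][c]:
            search(pos + 1, free)
            return
        for shape in shapes:                # try a piece anchored at this cell
            if fits(r, c, shape):
                mark(r, c, shape, True)
                placed.append([(r + dr, c + dc) for dr, dc in shape])
                search(pos + 1, free - size)
                placed.pop()
                mark(r, c, shape, False)
        search(pos + 1, free - 1)           # or leave this cell empty

    search(0, height * width)
    return best


def empty_cells(height, width, pieces):
    """Cells of the lattice not covered by any piece."""
    return height * width - sum(len(p) for p in pieces)
```

For example, `empty_cells(3, 6, pack(3, 6))` gives the minimum free space for a 3x6 lattice with the stand-in shape. The search is exponential, so this only works for small lattices; larger ones would need a heuristic or an integer-programming formulation.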

Moving an animated character

29 July 2006 - 08:18 AM

Hi...

I'm in the process of adding character animation to my engine, and I thought that instead of trying to reinvent the wheel, I should try Cal3D. For test animations I'm using md5 models. I have a problem figuring out how I can make a character move around a level.

MD5 models have a bone named 'origin' which seems to be an indicator of the model's position relative to the origin. This particular bone affects all other bones (it has no parent) and, e.g., when the character walks, it moves forward (away from the origin). If I make the walk animation loop, the character jumps back to the origin and starts from there again. This is because Cal3D needs all animations to be played around the origin (the character must not move). What I'm trying to do is repeat the walk animation from the last position, but with no success.

What I have tried so far:

1) Delete all keyframes from the 'origin' bone, and make the skeleton walk without moving. To move him, I transform the scenegraph node that corresponds to the animated model with a constant velocity, so the character seems to move. The problem is that the whole motion isn't realistic. The character seems to slide instead of walking. This can be fixed with a little tweaking for the walking animation, but that doesn't solve another problem: the animations can't have a variable velocity. E.g. there is an animation where a character does a long-range attack with its hand. At the beginning he moves slowly and then jumps fast to the target. This can't be handled with one (constant) velocity. And I think this is why md5 models have this special bone.

2) I thought of accumulating the translation of the walking animation and setting it (again) as the scenegraph node's transformation. This means that until the first walking cycle is finished the node's transformation is identity, and when the new cycle begins it is set to the special bone's last frame (I hope this makes sense). The problem with this method is that I can't blend two animations together and get the correct overall movement (e.g. a walk cycle animation and a long-range attack initiated in the middle of a walk cycle).

How can I make this work without writing special-case code? I mean, without knowing that the walk animation just finished, or that an action was blended with it, or that the run animation started to blend with the walk animation.

How do you handle character movement? It doesn't have to be around Cal3D, but if you have a solution for Cal3D I'd be glad to hear it.

And one last thing. Is there any other free library for character animation with the same or more features than Cal3D? I'm still searching (nothing is permanent), so any suggestions are appreciated.

I hope the above is clear enough. Please, if you have questions, ask. I'll try to clear things up as much as I can :)

Thanks in advance.

HellRaiZer

PS. Feel free to comment even if you aren't familiar with Cal3D. That's why I posted here instead of the Alternative Game Libraries forum. If you think it should be there, please move it. Thnx
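One common way around both problems is to work with per-frame displacements of the 'origin' bone instead of absolute positions: each update, take how far the bone moved during that time step (adding one full cycle's travel whenever the animation wraps), blend those displacements with the animation weights, and add the result to the scenegraph node. The sketch below shows the idea on a single axis. It does not use the Cal3D API; `RootTrack` and `blended_root_delta` are hypothetical names, keyframes are assumed to start at time 0, and every animation is assumed to loop.

```python
from bisect import bisect_right


class RootTrack:
    """Keyframed positions of the root ('origin') bone along one axis."""

    def __init__(self, times, positions):
        self.times = times              # ascending, assumed to start at 0.0
        self.positions = positions
        self.duration = times[-1]

    def sample(self, t):
        """Linear interpolation between the two surrounding keyframes."""
        i = bisect_right(self.times, t) - 1
        if i >= len(self.times) - 1:
            return self.positions[-1]
        t0, t1 = self.times[i], self.times[i + 1]
        p0, p1 = self.positions[i], self.positions[i + 1]
        return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

    def delta(self, t_prev, dt):
        """Root displacement over [t_prev, t_prev + dt], looping included."""
        loops, t_wrapped = divmod(t_prev + dt, self.duration)
        full_cycle = self.positions[-1] - self.positions[0]
        # Each completed loop contributes one full cycle of travel, so the
        # character never snaps back when the animation restarts.
        return self.sample(t_wrapped) - self.sample(t_prev) + loops * full_cycle


def blended_root_delta(active, dt):
    """active: list of (track, local_time, weight); weights assumed to sum to 1."""
    return sum(weight * track.delta(t, dt) for track, t, weight in active)


# Constant-speed walk, and an attack that starts slowly and then lunges.
walk = RootTrack([0.0, 1.0], [0.0, 2.0])
attack = RootTrack([0.0, 0.5, 1.0], [0.0, 0.5, 4.0])

node_x = 0.0
node_x += blended_root_delta([(walk, 0.75, 1.0)], 0.5)                    # walk wraps here
node_x += blended_root_delta([(walk, 0.25, 0.5), (attack, 0.0, 0.5)], 0.5)
```

With this scheme the skeleton itself is posed with the 'origin' translation removed, and the game code never needs to know when a cycle ends or which animations are being blended, because the variable velocity is carried by the keyframes.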

Emitting assembly for calling member functions

04 July 2006 - 09:45 PM

Hi...

First of all, sorry for the lengthy post. I hope someone will read it all and have some suggestions to make.

I just found a bug in my scripting engine which seems really hard to remove. The code that emits assembly for a member function call is doing things wrong. Here is a little example. (Sorry for writing in scripting syntax; the whole thing could easily be translated to C, but I wanted to show that this happens in a scripting language.) I hope the code isn't too confusing.
struct Vector3
	var float x;
	var float y;
	var float z;

	function void Set(float xx, float yy, float zz);
	function Vector3 ScaleCopy(float scale);

function void Vector3::Set(float xx, float yy, float zz)
	x = xx;
	y = yy;
	z = zz;

function Vector3 Vector3::ScaleCopy(float scale)
	local Vector3 v;
	v.Set(x * scale, y * scale, z * scale);
	return v;

The function that has the problem is ScaleCopy(). I also included Set() above so that the example is complete. Here is the emitted assembly for ScaleCopy().
1 : function ?ScaleCopy$Vector3&float!1#Global
2 : {
3 : 	param float scale[1] : -3;
4 :	local Vector3 v[1] : -1;
5 :
6 : 	new _stack[v + 0], "Vector3";				// because Vector3 is a struct, we have to create a local instance in case to return it.
7 :	push _this;						// store _this register for using it.
8 :	mov _this, _stack[v + 0];				// set _this register to point to the object for which we are calling a function.
9 :	mov _reg_0, _this[?x&float!1@Vector3 + 0];		// _reg_0 = _this.x;
10:	fmul _reg_0, _stack[scale + 0];				// _reg_0 = _this.x * scale;
11:	push _reg_0;						// push _this.x * scale on stack for the function call that follows.
12:	mov _reg_2, _this[?y&float!1@Vector3 + 0];		// _reg_2 = _this.y;
13:	fmul _reg_2, _stack[scale + 0];				// _reg_2 = _this.y * scale;
14:	push _reg_2;						// push _this.y * scale on stack for the function call that follows.
15:	mov _reg_1, _this[?z&float!1@Vector3 + 0];		// _reg_1 = _this.z;
16:	fmul _reg_1, _stack[scale + 0];				// _reg_1 = _this.z * scale;
17:	push _reg_1;						// push _this.z * scale on stack for the function call that follows.
18:	call ?Set$void&float!1&float!1&float!1, 1, "Vector3";	// call Vector3::Set(float, float, float) on the object pointed by _this register;
19:	pop _this;						// restore _this pointer to point to the correct object.
20:	mov _ret_val, _stack[v + 0];				// set _ret_val register to point to the local copy of the scale vector.
21:	ret ;							// return from the function.
22: }

I hope the assembly isn't too confusing. It is very similar to x86 asm, so there should be no problem understanding what is going on if you know asm.

In the above code there is a mistake. It is the first mov instruction (line 8), which moves the local Vector3 into the _this register in order to call the corresponding function later. Once this happens, all the following references to _this members (lines 9, 12, 15) are incorrect, because _this points to v instead of the true "this" object.

The problem starts from the AST representation of the member function call. When emitting assembly code for a call, the first thing that is visible is the object on which we want to call a function. This means that I have to somehow set the _this register to point to the correct object before assembling the function argument expressions. This is correct from one point of view, because if I did it the other way around (first assembling the function arguments, and then setting the _this register to point to the correct object) I would push too many unnecessary things on the stack. To see what I mean, just move lines 7 and 8 to just before the call (line 18).

One could suggest leaving line 7 where it is (in order not to pollute the stack) and moving only line 8 to just before the call. But then there is a problem if I have statements like this:
Matrix3x3 mat;
mat.Row0.Set(x * scale, y * scale, z * scale);
In this case, I have to push the _this register on the stack twice in order to reach the correct member on which I want to call a function. But again, the function arguments reference members of the current object.

I hope the above is clear enough. I know this is a very specific problem, but I want to believe someone will have something to suggest. If you have anything to say, please say it. I'm desperate for a solution. If you need more info or anything else, please ask.

Thanks in advance.

HellRaiZer
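One possible way out, sketched below, is to stop using _this as the base while resolving the target object. The call emitter saves _this first (as on line 7), assembles the arguments while _this still refers to the current object, resolves the target (e.g. mat.Row0) into a scratch register, and only then switches _this right before the call. This is a toy emitter, not the actual engine code: the `Emitter` class, the tuple-based AST, the unmangled names and the assumption that a general register can be used as a base for member addressing (`_reg_6[Row0]`) are all hypothetical.

```python
from itertools import count


class Emitter:
    """Toy code generator producing instructions in the post's asm syntax."""

    def __init__(self):
        self.code = []
        self._regs = count()            # unlimited virtual registers (assumption)

    def new_reg(self):
        return f"_reg_{next(self._regs)}"

    def emit(self, text):
        self.code.append(text)

    def expr(self, node):
        """Emit code for an expression; return the register holding its value."""
        kind = node[0]
        if kind == "member":            # member of the *current* object
            reg = self.new_reg()
            self.emit(f"mov {reg}, _this[{node[1]}]")
            return reg
        if kind == "local":
            reg = self.new_reg()
            self.emit(f"mov {reg}, _stack[{node[1]}]")
            return reg
        if kind == "fmul":
            left = self.expr(node[1])
            right = self.expr(node[2])
            self.emit(f"fmul {left}, {right}")
            return left
        raise ValueError(f"unknown node kind: {kind}")

    def object_address(self, local, members):
        """Resolve local.member.member... into a scratch register.

        _this is never touched here, so nested members need no extra pushes.
        """
        reg = self.new_reg()
        self.emit(f"mov {reg}, _stack[{local}]")
        for member in members:
            self.emit(f"mov {reg}, {reg}[{member}]")
        return reg

    def member_call(self, local, members, method, args):
        self.emit("push _this")                 # saved once, before the arguments
        for arg in args:                        # arguments still see the caller's _this
            self.emit(f"push {self.expr(arg)}")
        target = self.object_address(local, members)
        self.emit(f"mov _this, {target}")       # switch objects only now
        self.emit(f"call {method}")
        self.emit("pop _this")                  # restore the caller's object


def scaled(member):
    """AST for `member * scale`, with `member` read from the current object."""
    return ("fmul", ("member", member), ("local", "scale"))


e = Emitter()
# mat.Row0.Set(x * scale, y * scale, z * scale);
e.member_call("mat", ["Row0"], "Set", [scaled("x"), scaled("y"), scaled("z")])
listing = "\n".join(e.code)
```

In the resulting listing every `_this[...]` load comes before `mov _this, _reg_6`, and _this is pushed exactly once regardless of how deep the member chain is. A real implementation would also have to make sure the scratch register is not clobbered by nested calls inside the arguments, which this sketch sidesteps by never reusing a register.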