Morph targets.


19 replies to this topic

#1 Wilhelm van Huyssteen   Members   -  Reputation: 903

Posted 25 October 2011 - 01:04 PM

Hi.

I want to add support for morph targets to my engine, and I'm looking for some pointers on how I should pack the data into my VBOs and how I should set up the vertex attributes, etc.

A link to a good resource or a quick high level explanation would be very much appreciated!

Thanks in advance!

#2 Vilem Otte   Crossbones+   -  Reputation: 1347

Posted 26 October 2011 - 03:33 AM

Okay, so there are generally three basic ways to do it:
1) Morph on the CPU and upload the resulting geometry to the GPU through a VBO every frame (quite resource hungry, but it is the best option when you need the morphed geometry in RAM as well as on the GPU)
2) Morph on the GPU in a vertex shader
3) Morph on the GPU using OpenCL

I'll try to describe 1 and 2.

1.) On the CPU you do the blending yourself in your code:

Let CModel be a class containing mNumVertices (the number of vertices in the model) and mVertices (the actual vertices of the model, an array of CVector3, a class with x, y and z float members). Let g_mInterp be a value between 0.0 and 1.0 holding the morph phase between CModel1 and CModel2. Pseudo code:

// You have to load 2 models. They must have the same number of vertices, and
// CModel1.mVertices[i] must have its morph target in CModel2.mVertices[i].

// ... During initialization ...
assert(CModel1.mNumVertices == CModel2.mNumVertices); // Check that both models have the same number of vertices
CModel ModelResult;
ModelResult.mNumVertices = CModel1.mNumVertices;
ModelResult.mVertices = new CVector3[ModelResult.mNumVertices]; // Remember to delete[] this when you are done with it

// ... During rendering loop ...
// Assumes CVector3 provides operator* (vector * float) and operator+ (vector + vector)
for(int i = 0; i < ModelResult.mNumVertices; ++i)
{
    ModelResult.mVertices[i] = CModel1.mVertices[i] * (1.0f - g_mInterp) + CModel2.mVertices[i] * g_mInterp;
}

// Now the morphed mesh is stored in ModelResult and you just need to render it
// (for example, create a VBO from its vertices and draw it with glDrawArrays).

Okay, but this is quite a waste of resources: if the result is only going to be used for rendering on the GPU, you can do most of the work in a vertex shader...


2.) Doing it all in a vertex shader is quite straightforward.

Let all variables stay the same, except that our CModel class now also contains mVbo (an unsigned integer), which is the VBO holding our vertices (strictly speaking it is the ID of a VBO that lives on the GPU in VRAM, but well...).

// ... During initialization ...
assert(CModel1.mNumVertices == CModel2.mNumVertices); // Check that both models have the same number of vertices

// Load the shader and don't forget to set up these attributes for it.
// Attribute locations have to be bound before the program is linked.
glBindAttribLocationARB(ShaderProgram, 0, "Model1_Vertex");
glBindAttribLocationARB(ShaderProgram, 1, "Model2_Vertex");

// ... During rendering ...
// Turn on your shader
glUseProgramObjectARB(ShaderProgram);
glUniform1fARB(glGetUniformLocationARB(ShaderProgram, "Interp"), g_mInterp);

// Attribute 0 reads positions from the first model's VBO
glBindBufferARB(GL_ARRAY_BUFFER_ARB, CModel1.mVbo);
glVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArrayARB(0);

// Attribute 1 reads positions from the second model's VBO
glBindBufferARB(GL_ARRAY_BUFFER_ARB, CModel2.mVbo);
glVertexAttribPointerARB(1, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArrayARB(1);

// Render (e.g. using glDrawArrays)
glDrawArrays(GL_TRIANGLES, 0, CModel1.mNumVertices);

glDisableVertexAttribArrayARB(0);
glDisableVertexAttribArrayARB(1);

// Turn off your shader
glUseProgramObjectARB(0);


But we're not done yet: we still need the shader source that actually performs the morphing (I'll post just the vertex shader).


// Don't forget the #version directive and the other things your shader needs (yours won't be as short as mine).
// The 'in' qualifier used below needs GLSL 1.30 or later.
in vec3 Model1_Vertex;
in vec3 Model2_Vertex;

uniform float Interp;

void main()
{
    // This could also be done with the built-in mix() function, but written out like this it matches what was done on the CPU
    vec3 Morphed_Vert = Model1_Vertex * (1.0 - Interp) + Model2_Vertex * Interp;

    gl_Position = gl_ModelViewProjectionMatrix * vec4(Morphed_Vert, 1.0);
}


I hope that's all easy to understand ... if not, feel free to ask.

By the way, I wanted to write a morph-target library a long time ago, and I'm seriously thinking about it again ... thanks :D

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com


#3 Wilhelm van Huyssteen   Members   -  Reputation: 903

Posted 26 October 2011 - 02:35 PM

Thanks.

I just have one concern. I see that you bind two sets of vertex coords and morph between them. I have a detailed character model with 31 facial expressions (each being a morph target). Should I create 31 VBOs plus vertex attributes? And what if I later have a model with more than 31? Also, the facial expression morph targets only affect a small fraction of the total model. Storing all the vertex coords for the detailed character model 31 times seems a bit bad.

I was thinking of just separating the character's face from the rest of its body so that the rest of the body doesn't get replicated unnecessarily, but I'm not sure if this is the best way to do it. I'd still be using 31 VBOs, though, imposing a hard limit of 31 morph targets per model...

#4 Vilem Otte   Crossbones+   -  Reputation: 1347

Posted 26 October 2011 - 03:59 PM

Actually, there is no game or engine that really blends between, e.g., 31 morph targets at once. They ignore the targets that have zero weight and blend only the ones that have some weight in the interpolation (I showed a simple example with just linear interpolation between two morph targets). So you might have 31 VBOs in RAM and 31 corresponding morph weights (i.e. how much each target affects the geometry); you select just the 4 with the highest effect (actually I think most games use just 2 morph targets at once) and ignore the others. You lose something, but I'm not sure you'd be able to bind 31 VBOs at once, and considering performance it is probably better to stick to just 4 active morph targets at a time.

If you really need high-precision morphing (i.e. you want to include even the morph targets that have little or almost no effect), it might be better to perform CPU-based morphing (although I presume you're developing a game or demo, so you probably won't need it).

To sum up: you optimize by using only the morph targets that have a significant effect (the fewer there are, the better for performance).
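
The selection step described in this post could be sketched on the CPU roughly as follows. This is only an illustration, not code from any particular engine: `ActiveMorph`, `selectStrongestMorphs` and the default limit of 4 are hypothetical names and values chosen to match the numbers mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pair of (morph target index, blend weight).
struct ActiveMorph {
    std::size_t target;
    float weight;
};

// Returns up to maxActive morph targets with the largest absolute weight,
// strongest first. Targets whose weight is exactly zero are skipped.
std::vector<ActiveMorph> selectStrongestMorphs(const std::vector<float>& weights,
                                               std::size_t maxActive = 4)
{
    std::vector<ActiveMorph> active;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] != 0.0f) {
            active.push_back({i, weights[i]});
        }
    }

    // Only the first `keep` entries need to be ordered.
    const std::size_t keep = std::min(maxActive, active.size());
    std::partial_sort(active.begin(), active.begin() + keep, active.end(),
                      [](const ActiveMorph& a, const ActiveMorph& b) {
                          return std::fabs(a.weight) > std::fabs(b.weight);
                      });
    active.resize(keep);
    return active;
}
```

The returned indices would then decide which morph target buffers get bound to the shader's attributes for the current frame.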





#5 dpadam450   Members   -  Reputation: 842

Posted 27 October 2011 - 01:04 AM

You can only bind one VBO at once. My suggestion is to have one massive VBO where you put the texture coords first, and then put each morph target's data consecutively after that, in the same VBO.

Then when you draw you just bind the one massive VBO, and when you are setting the attributes you just change the pointers to the start morph target's data and the end morph target's data.
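
A minimal sketch of the offset arithmetic for such a packed buffer, assuming 2 floats per texture coordinate and 3 floats per position with no padding. `PackedMorphLayout` and its members are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one buffer laid out as:
// [texcoords, 2 floats per vertex][target 0 positions][target 1 positions]...
struct PackedMorphLayout {
    std::size_t vertexCount;
    std::size_t targetCount;

    std::size_t texCoordBytes() const { return vertexCount * 2 * sizeof(float); }
    std::size_t targetBytes() const { return vertexCount * 3 * sizeof(float); }

    // Byte offset of the first position of the given morph target.
    std::size_t targetOffset(std::size_t target) const {
        assert(target < targetCount);
        return texCoordBytes() + target * targetBytes();
    }

    std::size_t totalBytes() const {
        return texCoordBytes() + targetCount * targetBytes();
    }
};
```

With the buffer bound, each offset would be passed as the last argument of glVertexAttribPointer, cast to a pointer, for example `reinterpret_cast<const void*>(layout.targetOffset(startTarget))`.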

#6 Wilhelm van Huyssteen   Members   -  Reputation: 903

Posted 27 October 2011 - 02:21 AM

OK, that makes sense. I'll use the 4 morphs that have the most significant impact on the model (the highest interpolation values). One last thing about storing all this data in VBOs: if the model I loaded has 31 morph targets, it would still mean I need to store the coords of all the vertices 31 times, even if only a small fraction of the vertices is affected by the morphs. Also, since I pack more than one model into a VBO, it would mean that I would replicate the coords of even a model that doesn't have any morphs, unless I always make sure that a big model with lots of morphs gets its own VBO. Or is there a better way? I guess I could even take all the vertices that have morph targets and place them in their own VBO to ensure that no coords get replicated unnecessarily. Does this make sense?

#7 RobTheBloke   Crossbones+   -  Reputation: 2295

Posted 27 October 2011 - 10:10 AM

Quoting Wilhelm van Huyssteen: "Also, since I pack more than one model into a VBO, it would mean that I would replicate the coords of even a model that doesn't have any morphs."

So don't do that......

Quoting Wilhelm van Huyssteen: "I guess I could even take all the vertices that have morph targets and place them in their own VBO to ensure that no coords get replicated unnecessarily. Does this make sense?"


Yes it does....
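
One way the split discussed here might be prepared at load time is sketched below. It is an assumption-laden illustration: `Vec3`, `MorphPartition` and `partitionByMorphUse` are hypothetical names, and positions are compared with exact equality, whereas a real exporter might prefer a small tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline bool samePosition(const Vec3& a, const Vec3& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Vertex indices split by whether any morph target moves them.
struct MorphPartition {
    std::vector<std::size_t> morphed;    // moved by at least one target
    std::vector<std::size_t> unmorphed;  // identical to the base in every target
};

MorphPartition partitionByMorphUse(const std::vector<Vec3>& base,
                                   const std::vector<std::vector<Vec3>>& targets)
{
    MorphPartition result;
    for (std::size_t v = 0; v < base.size(); ++v) {
        bool moved = false;
        for (const std::vector<Vec3>& target : targets) {
            assert(target.size() == base.size());
            if (!samePosition(base[v], target[v])) {
                moved = true;
                break;
            }
        }
        (moved ? result.morphed : result.unmorphed).push_back(v);
    }
    return result;
}
```

Only the vertices listed in `morphed` would need per-target copies; the rest could be stored once.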

#8 dpadam450   Members   -  Reputation: 842

Posted 27 October 2011 - 02:25 PM

I would probably say no to separating the morphed verts from the unused ones. It depends on how many verts we are talking about. Do you want to change your normal, vertex and tex coord pointers and your shader for the non-morphed portion of your model and send another draw call? That's 5 or more OpenGL calls you will have to make. Unless this model is a million verts and most are not morphed, I would probably say just run the extra morph and shader on the static vertices as well. If your static verts number 1,000-2,000, then you're very close to break-even, or better off just drawing them than making 5 more GL calls, most likely.

#9 Wilhelm van Huyssteen   Members   -  Reputation: 903

Posted 28 October 2011 - 01:59 AM

Thanks for all the input.

#10 RobTheBloke   Crossbones+   -  Reputation: 2295

Posted 28 October 2011 - 09:09 AM

Quoting dpadam450: "I would probably say no to separating the morphed verts from the unused ones. It depends on how many verts we are talking about. Do you want to change your normal, vertex and tex coord pointers and your shader for the non-morphed portion of your model and send another draw call? That's 5 or more OpenGL calls you will have to make. Unless this model is a million verts and most are not morphed, I would probably say just run the extra morph and shader on the static vertices as well. If your static verts number 1,000-2,000, then you're very close to break-even, or better off just drawing them than making 5 more GL calls, most likely."


That's 5 draw calls to make sure you aren't just filling your GPU memory with 50+ copies of exactly the same data.
That's 5 more draw calls to alleviate a potential memory bottleneck on low-end GPUs (e.g. laptops/netbooks) [in cases where they utilise system memory].
That's 5 draw calls that will give a significant improvement in the way you are utilising the GPU and its RAM.

Remember: Always optimise for memory before you optimise for computational performance. If there was always zero overhead for performing a memory read, you might have a point. Since that's not the case, it's best not to make that assumption.

#11 RobTheBloke   Crossbones+   -  Reputation: 2295

Posted 28 October 2011 - 09:26 AM

Quoting Vilem Otte: "By the way, I wanted to write a morph-target library a long time ago, and I'm seriously thinking about it again ... thanks :D"



Consider:

1. Base mesh.
2. Eye raise.
3. Jaw Open.


If you simply 'lerp' between 2 and 3, you'll have a half open Jaw, and a half raised eyebrow.
Let's say I want a "REALLY" open jaw; in most software I can extend the weight to 2.0 to achieve this. If you were simply using LERP, the head would scale up 1.5 times its original size.

Both of these kinda indicate that using LERP is a no-no. A better idea is to subtract the base mesh from each target to get per-vertex offsets, and then sum the weighted offsets onto the base mesh.
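
The offset-based scheme suggested here could look roughly like the following CPU sketch. `Vec3`, `Deltas`, `makeDeltas` and `blendAdditive` are hypothetical names introduced for illustration; the same sum could equally be evaluated in a vertex shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One morph target stored as per-vertex offsets from the base mesh.
using Deltas = std::vector<Vec3>;

// delta[i] = target[i] - base[i]
Deltas makeDeltas(const std::vector<Vec3>& base, const std::vector<Vec3>& target)
{
    assert(base.size() == target.size());
    Deltas d(base.size());
    for (std::size_t i = 0; i < base.size(); ++i) {
        d[i] = {target[i].x - base[i].x,
                target[i].y - base[i].y,
                target[i].z - base[i].z};
    }
    return d;
}

// result[i] = base[i] + sum over targets of weight[t] * delta[t][i]
std::vector<Vec3> blendAdditive(const std::vector<Vec3>& base,
                                const std::vector<Deltas>& deltas,
                                const std::vector<float>& weights)
{
    assert(deltas.size() == weights.size());
    std::vector<Vec3> out = base;
    for (std::size_t t = 0; t < deltas.size(); ++t) {
        assert(deltas[t].size() == base.size());
        if (weights[t] == 0.0f) continue;  // inactive target, nothing to add
        for (std::size_t v = 0; v < out.size(); ++v) {
            out[v].x += weights[t] * deltas[t][v].x;
            out[v].y += weights[t] * deltas[t][v].y;
            out[v].z += weights[t] * deltas[t][v].z;
        }
    }
    return out;
}
```

Because each target contributes only its own offsets, a weight of 2.0 on the jaw target exaggerates the jaw without touching vertices the target does not move, and the eye and jaw targets can be applied at full strength together.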

#12 swiftcoder   Senior Moderators   -  Reputation: 9587

Posted 28 October 2011 - 09:49 AM

Quoting dpadam450: "You can only bind one VBO at once."

Say what? You can bind exactly as many VBOs as the number of vertex attribute streams your card supports.

(hint: glBindBuffer() is purely a client-side construct. glVertexAttribPointer() is what actually binds VBOs to the server side)

Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#13 dpadam450   Members   -  Reputation: 842

Posted 28 October 2011 - 12:08 PM

I was reading a thread that went back and forth on this. It's better to just put it all in one buffer anyway: 1 glBindBuffer call.

Quoting RobTheBloke:
"That's 5 draw calls to make sure you aren't just filling your GPU memory with 50+ copies of exactly the same data.
That's 5 more draw calls to alleviate a potential memory bottleneck on low-end GPUs (e.g. laptops/netbooks) [in cases where they utilise system memory].
That's 5 draw calls that will give a significant improvement in the way you are utilising the GPU and its RAM.

Remember: Always optimise for memory before you optimise for computational performance. If there was always zero overhead for performing a memory read, you might have a point. Since that's not the case, it's best not to make that assumption."

1.) Who cares about netbooks? If he does, then your point might be valid.
2.) It might not be a significant improvement, because 5 calls take some time to get to the GPU. If you split your model into 500 static verts and 500 morphable ones, then you're going to be able to draw all 1,000 at once faster than drawing 500 separately: the GPU goes idle while waiting for your first command, is still idle while it waits for the second, and only when it finally gets the 5th command does it actually draw something. So you are idling your GPU for a small fraction of time.

Quoting RobTheBloke: "Always optimise for memory before you optimise for computational performance."

What? 50 copies of his model at 2,000 verts is probably less than 10 MB. And secondly, everything is about speed, so I don't know what you mean. If your card has 512 MB, it is not going to get slower the more RAM you fill up. Everything is about optimizing computation on GPUs. The only discussions that focus on memory are the ones about actually filling up the whole card because of megatexturing or something. He's only saving memory if his model is actually big; again, if it is small, then the memory and time costs are low enough that trying to split the model will be slower. I wouldn't optimize memory unless I was actually going to go over budget and then realized I needed to trade performance for more free memory.
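
The memory figure in this post can be checked with a little arithmetic. The sketch below assumes 4-byte floats and tightly packed attributes; `morphCopiesBytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to keep `copies` full copies of a mesh's vertex data.
constexpr std::size_t morphCopiesBytes(std::size_t copies,
                                       std::size_t vertexCount,
                                       std::size_t floatsPerVertex)
{
    return copies * vertexCount * floatsPerVertex * sizeof(float);
}

// 50 copies of a 2,000 vertex mesh, positions only (3 floats per vertex).
static_assert(morphCopiesBytes(50, 2000, 3) == 1200000, "about 1.2 MB");

// The same mesh with positions and normals (6 floats per vertex).
static_assert(morphCopiesBytes(50, 2000, 6) == 2400000, "about 2.4 MB");

// A 100,000 vertex mesh, positions only.
static_assert(morphCopiesBytes(50, 100000, 3) == 60000000, "about 60 MB");
```

Under these assumptions the 2,000 vertex case stays well below 10 MB, while the cost grows linearly with vertex count, which is where the disagreement in this thread comes from.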

#14 swiftcoder   Senior Moderators   -  Reputation: 9587

Posted 28 October 2011 - 12:35 PM

Quoting dpadam450: "I was reading a thread that went back and forth on this. It's better to just put it all in one buffer anyway: 1 glBindBuffer call."

Let me be very explicit about this: glBindBuffer() amounts to a pointer assignment in client memory - net performance impact: negligible.

If your model has (for example) 5 vertex attributes, then the server-side performance cost incurred by the necessary 5 calls to glVertexAttribPointer() will be the same regardless of whether those calls all source from the same VBO, or source from 5 different VBOs.

Where you can get a performance win by sourcing vertex attributes from a single VBO, is in cache coherency. If your vertex attributes are interleaved within a single VBO, and your vertex size is a multiple of a cache line, then you will see potentially significant performance gains. If you just cram vertex data into a VBO without carefully interleaving components and paying close attention to cache requirements, then you don't gain anything by using a single VBO.
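
As an illustration of the interleaving described here, the sketch below defines a 32-byte vertex and checks whether a vertex would cross a cache-line boundary. The struct layout, the names and the 64-byte line size used in the usage note are assumptions for the example; real cache line sizes are hardware specific.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved vertex: 8 floats, 32 bytes with 4-byte floats.
struct InterleavedVertex {
    float position[3];
    float normal[3];
    float texCoord[2];
};

static_assert(sizeof(InterleavedVertex) == 32, "expected a 32-byte vertex");

// Stride and byte offsets as they would be passed to glVertexAttribPointer.
constexpr std::size_t kStride = sizeof(InterleavedVertex);
constexpr std::size_t kPositionOffset = offsetof(InterleavedVertex, position);
constexpr std::size_t kNormalOffset = offsetof(InterleavedVertex, normal);
constexpr std::size_t kTexCoordOffset = offsetof(InterleavedVertex, texCoord);

// True if vertex number `index` would cross a cache-line boundary, assuming
// the buffer itself starts on a boundary.
constexpr bool straddlesCacheLine(std::size_t index,
                                  std::size_t vertexBytes,
                                  std::size_t lineBytes)
{
    return (index * vertexBytes) / lineBytes
        != (index * vertexBytes + vertexBytes - 1) / lineBytes;
}
```

With an assumed 64-byte line, 32-byte vertices never straddle a boundary, whereas a 24-byte vertex does so from the third vertex on (`straddlesCacheLine(2, 24, 64)` is true).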


#15 dpadam450   Members   -  Reputation: 842

Posted 28 October 2011 - 12:48 PM

How can glBindBuffer not go to the GPU? glBufferData() wouldn't know what to manipulate on the GPU.

#16 swiftcoder   Senior Moderators   -  Reputation: 9587

Posted 28 October 2011 - 01:19 PM

Quoting dpadam450: "How can glBindBuffer not go to the GPU? glBufferData() wouldn't know what to manipulate on the GPU."

The OpenGL API is a state machine (bind buffer, send buffer data, unbind buffer, etc.). But on the server-side (i.e. the driver/GPU end of things), it doesn't work like a state machine at all.

Roughly speaking, glBindBuffer() just sets a pointer in client memory telling the OpenGL API which buffer you want to work with. It isn't until you call glBufferData() that an actual driver transaction is started. At that time, the API will generate a transaction, which conceptually looks a little like this: {'setBufferData', <buffer-id>, <data>}, where the <buffer-id> is the value last set by glBindBuffer(). This transaction is all that is sent to the GPU - the state machine calls never leave the CPU side of things.
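
The description above can be mimicked with a toy model. To be clear, this is not how any real driver is implemented; `ToyContext` and `ToyCommand` are hypothetical types that only illustrate the idea that binding changes client-side state while the data call produces the queued transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One queued transaction: operation name, target buffer and payload size.
struct ToyCommand {
    std::string name;
    unsigned bufferId;
    std::size_t byteCount;
};

class ToyContext {
public:
    // Pure client-side state change: nothing is queued.
    void bindBuffer(unsigned id) { boundBuffer_ = id; }

    // Queues a command that carries the currently bound buffer id with it.
    void bufferData(std::size_t byteCount) {
        queue_.push_back({"setBufferData", boundBuffer_, byteCount});
    }

    const std::vector<ToyCommand>& queue() const { return queue_; }

private:
    unsigned boundBuffer_ = 0;
    std::vector<ToyCommand> queue_;
};
```

In this model, calling `bindBuffer` any number of times leaves the queue empty; only `bufferData` adds an entry, and that entry records whichever buffer was bound last.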

Only a handful of OpenGL calls actually result in data moving across the CPU <--> GPU connection. A (non-exhaustive) list of those would be something like this:
- VBO/PBO/Texture/Framebuffer upload/readback
- Shader/Program upload
- Draw calls
- glClear
All the other API calls are just providing extra data that is bundled up into the transactions generated by one of those calls.


#17 dpadam450   Members   -  Reputation: 842

Posted 28 October 2011 - 01:35 PM

Never read that anywhere; is there a specific place you got that info from? As for saying the GPU is not a state machine, it's gotta be: glEnable(GL_TEXTURE_2D), I know that isn't sent down by the driver every time I draw something. Or glColor3f(), glEnable(anything), etc.

#18 swiftcoder   Senior Moderators   -  Reputation: 9587

Posted 28 October 2011 - 02:12 PM

Quoting dpadam450: "Never read that anywhere; is there a specific place you got that info from?"

Any of the in-depth discussions of the DirectX/OpenGL graphics pipelines, or the discussions of bare-metal renderers written for consoles - I couldn't point to a single source off the top of my head. And keep in mind that the exact details vary by platform/vendor/driver/etc. None of this is set in stone.

Quoting dpadam450: "As for saying the GPU is not a state machine, it's gotta be: glEnable(GL_TEXTURE_2D), I know that isn't sent down by the driver every time I draw something."

First off, you have to remember that texture bind state is separate for each texture unit (i.e. glActiveTexture). The GPU has no concept of textures being enabled or not - it's just a matter of which texture units the current fragment shader chooses to read from (and the fixed-function pipeline is just a shader emulating the old pipeline).

Quoting dpadam450: "Or glColor3f()"

glColor3f() just sets the current vertex colour - that value isn't even used until the user calls glVertex() to submit the vertex, and that doesn't (generally) take effect until the user calls glEnd() to submit the entire primitive to the pipeline.

Quoting dpadam450: "glEnable(anything), etc."

The glEnable() calls mostly set client state directly (pixel transfer state, that sort of thing), or they store state bits to be sent with future draw calls (GL_NORMALIZE, etc.). Some of them probably do set so-called 'server-side' driver state (e.g. GL_DEPTH_TEST), but I doubt that much of that state is actually stored on the GPU itself.


#19 dpadam450   Members   -  Reputation: 842

Posted 28 October 2011 - 03:17 PM

Quoting swiftcoder: "The glEnable() calls mostly set client state directly (pixel transfer state, that sort of thing),"

Blending, multisampling, cull face, glBlendFunc, depth func, stencil: the GPU definitely has a state machine that is fairly big, or equal to the one on the CPU.

Quoting swiftcoder: "The GPU has no concept of textures being enabled or not"

So you're telling me that every time I call glTexImage2D, glCopyTexImage2D, GenerateMipMaps or glTexParameter, the client is also sending a handle to my texture? Why would they waste 4 bytes of overhead sending the int to the GPU every time instead of storing that integer on the GPU? If that were the case, it wouldn't make sense for them to even have a state machine. Why would I not just call the driver's functions as glBindTexture(handle, texture_unit) instead of glActiveTexture(texture_unit), glBindTexture(handle)? Maybe it does work that way, but it seems pretty stupid to send that integer for every single function I call, instead of doing what it looks like it does: setting the int one time and then just manipulating the current texture.

#20 swiftcoder   Senior Moderators   -  Reputation: 9587

Posted 28 October 2011 - 04:49 PM

Quoting dpadam450: "Blending, multisampling, cull face, glBlendFunc, depth func, stencil: the GPU definitely has a state machine that is fairly big, or equal to the one on the CPU."

It's not a state machine though - state machines assume a serial process, one operation after the other, which is fine for a single-threaded API like OpenGL. The GPU is an inherently parallel machine: there can be many threads, in many programs, even multiple simultaneous operating systems, performing operations at the same time.

Now, it's certainly true that the driver does maintain some per-context state in GPU memory, but it isn't nearly as simple as there being a single blending function. There is no guarantee that operations occur on the GPU in the same order you specify them - the driver is free to reorder operations however it likes, so long as the necessary dependencies between operations are met, so the blending function (among other state) must be attached to each operation.

Quoting dpadam450: "Why would they waste 4 bytes of overhead sending the int to the GPU every time instead of storing that integer on the GPU?"

You're uploading what, 100KB of texture data? 100MB? 4 bytes of overhead (and the command overhead is probably much more than that) is absolutely nothing - a PCI Express 2.0 x16 link can transfer roughly 8 GB/s in each direction.

Quoting dpadam450: "Why would I not just call the driver's functions as glBindTexture(handle, texture_unit) instead of glActiveTexture(texture_unit), glBindTexture(handle)?"

It would indeed be much more convenient to eliminate binding altogether, and that is one of the reasons that there are regular cries for an object-oriented interface to OpenGL...

In the early 90's (when the OpenGL API was developed), graphics cards may indeed have operated somewhat along the same lines as the OpenGL API, but the hardware has long since diverged. At this point, OpenGL is a (somewhat poor) high level abstraction - it has little or nothing to do with the way GPUs actually work.




