# Combining Deferred Rendering, Batching, Model Matrices, Skeletal Animations, and Shadow Maps

## Recommended Posts

Hello all,

I am currently working on a game engine for use with my game development, and I would like it to be as flexible as possible. As such, the exact requirements for how things should work can't be nailed down to a specific implementation, and for now I am looking for a good default design for the average case.

Here is what I have implemented:

• Deferred rendering using OpenGL
• An arbitrary number of lights with shadow mapping
• Each rendered object, defined by a set of geometry, textures, animation data, and a model matrix, is rendered with its own draw call
• Skeletal animation implemented on the GPU
• Model matrix transformation implemented on the GPU
• Frustum and octree culling for optimization

Here are my questions and concerns:

• Doing the skeletal animation on the GPU currently requires doing the skinning for each object multiple times per frame: once for the initial geometry rendering, and once more for the shadow map rendering of each light for which the object is not culled.  This seems very inefficient.  Is there a way to do the skinning on the GPU only once across these render calls?
• Without doing the model matrix transformation on the CPU, I fail to see how I can easily batch objects with the same textures and shaders into a single draw call without passing a ton of matrix data to the GPU (an array of model matrices, then an index for each vertex into that array for transformation purposes?)
• If I do the matrix transformations on the CPU, it seems I can't really do the skinning on the GPU, as the pre-transformed vertices will wreak havoc with the calculations, so this seems unviable unless I am missing something.
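The idea in the second bullet can be sketched CPU-side (all names here are mine, purely for illustration): every instance's model matrix goes into one array, each vertex or instance carries an index into it, and the shader-side lookup into a `mat4` array is equivalent to this reference function.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Minimal sketch of "array of model matrices plus an index per vertex/instance".
// All instances sharing a shader and texture go into one draw call; the shader
// would index a uniform/SSBO array of mat4s exactly like this reference does.
using Mat4 = std::array<float, 16>; // column-major, like a GLSL mat4
using Vec4 = std::array<float, 4>;

// Reference for what the vertex shader computes: out = matrices[index] * v.
Vec4 transformByIndexed(const std::vector<Mat4>& modelMatrices,
                        std::size_t instanceIndex, const Vec4& v) {
    const Mat4& m = modelMatrices[instanceIndex];
    Vec4 out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += m[col * 4 + row] * v[col];
    return out;
}
```

On the GPU the array would live in a uniform buffer or SSBO, and the index could come from a per-instance attribute (with instanced rendering) or `gl_InstanceID`, so the per-vertex cost is one array lookup rather than extra matrix uploads per object.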

Overall it seems like the simplest solution is to just do all of the vertex manipulation on the CPU and pass the pre-transformed data to the GPU, using vertex shaders that do basically nothing.  This doesn't seem like the most efficient use of the graphics hardware, but it could potentially reduce the number of draw calls needed.

Really, I am looking for some advice on how to proceed, and on how something like this is typically handled.  Are the multiple draw calls and skinning calculations not a huge deal?  I would LIKE to save as much CPU time per frame as possible so it can be tasked with other things, keeping CPU resources open for the engine's implementation.  However, that becomes a moot point if the GPU becomes the bottleneck.

##### Share on other sites
Posted (edited)
Quote

• Doing the skeletal animation on the GPU, currently, requires doing the skinning for each object multiple times per frame: once for the initial geometry rendering and once for the shadow map rendering for each light for which it is not culled.  This seems very inefficient.  Is there a way to do skeletal animation on the GPU only once across these render calls?

If you really want to reuse the results, you could store the skinned vertices in an SSBO (or a buffer texture, or similar) on your first pass, indexed by vertex, and fetch them on your second. However, I get the feeling that the memory writes and reads will be slower than a few matrix multiplications, and that's not to mention you would need one of these buffers per instance of your animated mesh.

This approach also introduces a dependency between shadow map passes and your general pipeline. If you don't do this, both shaders can be executing at the same time.

Typically, the majority of objects in a scene are not undergoing skeletal animation. For your general use case, I wouldn't worry about recalculating animations. Vertices are processed pretty fast.

Quote
• Without doing the model matrix transformation on the CPU, I fail to see how I can easily batch objects with the same textures and shaders in a single draw call without passing a ton of matrix data to the GPU

Don't worry about it. PCIe x16 transfers at a rate of roughly 4 GB/s, so ~67 MB/frame for a 60 fps target. A matrix is 64 bytes, and you're passing bone transforms. If we go ham and say you have 500 bones per model (YEESH!), that's 32 KB per skeleton, so you could still pass 1,000 full skeletons per frame and have more than half of your PCIe bandwidth left over.
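Those figures can be sanity-checked at compile time (the 4 GB/s and 500-bone numbers are just the assumptions above, not measurements): at 64 bytes per matrix, a 500-bone skeleton is 32 KB, so the ~67 MB frame budget holds roughly 2,000 such skeletons in total, and 1,000 of them use less than half of it.

```cpp
#include <cstddef>

// Compile-time sanity check of the bandwidth estimate above.
// All figures are the post's assumptions, not measurements.
constexpr std::size_t kBusBytesPerSec = 4'000'000'000ull;     // ~4 GB/s PCIe figure
constexpr std::size_t kFrameBudget    = kBusBytesPerSec / 60; // ~67 MB per frame
constexpr std::size_t kMatrixBytes    = 64;                   // 4x4 float matrix
constexpr std::size_t kBonesPerModel  = 500;                  // deliberately extreme
constexpr std::size_t kSkeletonBytes  = kBonesPerModel * kMatrixBytes; // 32 KB

// Whole-budget capacity: about 2,083 full 500-bone skeletons per frame.
constexpr std::size_t kMaxSkeletons = kFrameBudget / kSkeletonBytes;

// 1,000 skeletons still leave more than half of the frame budget free.
static_assert(1000 * kSkeletonBytes * 2 < kFrameBudget,
              "1,000 full skeletons use less than half the per-frame budget");
```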

I'm also a bit confused here. If you do the model transforms on the CPU, you have to pass not only a bunch of transformed verts, but every instance of a transformed mesh as a *separate mesh*, meaning you can't use instancing for that mesh anymore.

Quote

(an array of model matrices then an index for each vertex into that array for transformation purposes?)

Yes.

Edit: usually you have some float weights and integer bone indices (corresponding to the weights) per vertex. You can store these as vertex attributes (a vec4 plus an ivec4) or put them in a buffer and fetch them using a per-vertex index attribute.

I personally have used the second approach, to keep my mesh format consistent and to make attaching arbitrary vertex data less of a horror for future development, obviously at the cost of a bit of performance.
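One possible layout for the attribute approach (the struct and field names here are mine, not Ugly's actual format): four weights as a vec4 and four bone indices as an ivec4 per vertex.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical interleaved vertex format: weights bound as a float vec4,
// bone indices bound as an integer ivec4.
struct SkinnedVertex {
    float        position[3];
    float        normal[3];
    float        uv[2];
    float        boneWeights[4];  // bound as a vec4 via glVertexAttribPointer
    std::int32_t boneIndices[4];  // bound as an ivec4 via glVertexAttribIPointer
};

// The GL setup would then be along the lines of (attribute slots 3 and 4 are
// arbitrary choices here):
//   glVertexAttribPointer (3, 4, GL_FLOAT, GL_FALSE, sizeof(SkinnedVertex),
//                          (void*)offsetof(SkinnedVertex, boneWeights));
//   glVertexAttribIPointer(4, 4, GL_INT, sizeof(SkinnedVertex),
//                          (void*)offsetof(SkinnedVertex, boneIndices));
```

Note the `I` variant for the indices: plain `glVertexAttribPointer` would convert the integers to floats, which silently breaks the bone lookup.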

Edited by Ugly

##### Share on other sites
17 hours ago, Ugly said:

a bit confused here. If you do the model transforms on the CPU, you have to pass not only a bunch of transformed verts, but every instance of a transformed mesh as a *separate mesh*, meaning you can't use instancing for that mesh anymore.

Indeed, another reason I would not like to go that route.

Thanks for all the insights; they will be an immense help moving forward.  For my skeletal animations I am currently passing in the bones as a uniform (actually as dual quaternions, which I then convert to matrices in the shader) and the weights and indices as vertex attributes, but I will look into the indexed approach to see if it fits my needs.
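For reference, the dual quaternion → matrix conversion mentioned here can be sketched CPU-side like this (helper names are mine; it assumes a unit real part, with the convention that the dual part encodes translation as t = 2 · qd · conj(qr)):

```cpp
#include <array>

// Quaternion as (w, x, y, z); a dual quaternion is a real part qr (rotation)
// and a dual part qd, where qd = 0.5 * (0, t) * qr for a translation t.
using Quat = std::array<float, 4>;

Quat qmul(const Quat& a, const Quat& b) {
    return {
        a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3],
        a[0]*b[1] + a[1]*b[0] + a[2]*b[3] - a[3]*b[2],
        a[0]*b[2] - a[1]*b[3] + a[2]*b[0] + a[3]*b[1],
        a[0]*b[3] + a[1]*b[2] - a[2]*b[1] + a[3]*b[0],
    };
}

// Column-major 4x4 matrix equivalent to the unit dual quaternion (qr, qd).
std::array<float, 16> dualQuatToMatrix(const Quat& qr, const Quat& qd) {
    const float w = qr[0], x = qr[1], y = qr[2], z = qr[3];
    // Translation: vector part of 2 * qd * conjugate(qr).
    const Quat conj{w, -x, -y, -z};
    const Quat t2 = qmul(qd, conj);
    const float tx = 2*t2[1], ty = 2*t2[2], tz = 2*t2[3];
    return {
        1-2*(y*y+z*z), 2*(x*y+w*z),   2*(x*z-w*y),   0,  // column 0
        2*(x*y-w*z),   1-2*(x*x+z*z), 2*(y*z+w*x),   0,  // column 1
        2*(x*z+w*y),   2*(y*z-w*x),   1-2*(x*x+y*y), 0,  // column 2
        tx,            ty,            tz,            1,  // column 3
    };
}
```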

One related question cropped up as I was doing some more reading.  Something I came across mentioned not reusing buffers for write calls (e.g. a single fixed-size VAO reused for batches) because implicit synchronization kills performance, though some of what I have seen on batching does just that.

How would you perform view frustum culling of objects each frame if modifying the data in buffers can be a performance killer?  I can't imagine you would want to submit or maintain a bunch of data to/on the GPU that isn't needed for rendering.
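Frustum culling is usually done CPU-side against bounding volumes before anything is submitted, so the GPU never holds data for culled objects. A minimal sphere-vs-frustum test (the plane convention and names are my own, purely illustrative):

```cpp
#include <array>

// Plane as (a, b, c, d) with the normal (a, b, c) pointing into the frustum,
// so a point p is inside the plane when a*px + b*py + c*pz + d >= 0.
struct Plane  { float a, b, c, d; };
struct Sphere { float x, y, z, radius; };

// True if the sphere is at least partially inside all six planes.
bool sphereInFrustum(const std::array<Plane, 6>& planes, const Sphere& s) {
    for (const Plane& p : planes) {
        const float dist = p.a * s.x + p.b * s.y + p.c * s.z + p.d;
        if (dist < -s.radius)
            return false; // entirely behind one plane: culled
    }
    return true;
}
```

Only the objects that pass this test get their data written into the streaming buffer for the frame, which keeps the uploaded data proportional to what is actually rendered.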

##### Share on other sites

I think I figured part of this out myself.  Reuse between frames shouldn't be an issue, since all of those draw calls have to complete for the frame anyway.

Should you then not reuse a VAO for batching, and create a new VAO whenever you need more space than a batch can handle?  It seems like the number of buffers could grow significantly that way, though, if you are sending a lot of data.

##### Share on other sites
5 minutes ago, kanageddaamen said:

if you need more space than a batch can handle, create a new VAO?

I'm assuming you mean VBO, rather than VAO?

##### Share on other sites
Posted (edited)
24 minutes ago, swiftcoder said:

I'm assuming you mean VBO, rather than VAO?

Wouldn't you need to create an entirely new VAO?  Otherwise the other VBOs in the VAO you are rendering from will be passed along by the draw call, thereby increasing the batch size you are trying to keep constant.  I must admit I am no expert on the various draw call options and their capabilities.

EDIT: I suppose you would just bind different VBOs and make some glVertexAttribPointer calls for the next batch.

Edited by kanageddaamen

##### Share on other sites
10 minutes ago, kanageddaamen said:

Wouldn't you need to create an entirely new VAO?  Otherwise the other VBOs in the VAO you are rendering from will be passed along by the draw call, thereby increasing the batch size you are trying to keep constant.

VAOs are just captured vertex-input state. They work exactly the same as making the individual glBindBuffer/glEnableVertexAttribArray/glVertexAttribPointer calls yourself.

As such they don't affect batching at all. You still have one batch per glDraw* call, regardless of how you bound the vertex buffers.

##### Share on other sites
Just now, swiftcoder said:

VAOs are just captured vertex-input state. They work exactly the same as making the individual glBindBuffer/glEnableVertexAttribArray/glVertexAttribPointer calls yourself.

As such they don't affect batching at all. You still have one batch per glDraw* call, regardless of how you bound the vertex buffers.

Gotcha

##### Share on other sites

In my engine I do the skinning in compute shaders before rendering starts. This is very nice from a shader-management point of view: I have a single skinning shader, and every model can use a regular vertex shader and a regular input layout when rendering, so the number of vertex shader permutations is minimized. From a performance point of view it's a trickier question and may not always give the same answer; for example, I spawned a little conversation on Twitter one day about the performance implications on tile-based architectures. I also wrote a small blog post on the subject; take a look if interested.
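The shape of that approach can be sketched roughly like this (the binding points, buffer names, and layouts below are my guesses for illustration, not the engine's actual code): one compute dispatch skins every vertex once, writing positions that a regular, non-skinned vertex shader then reads like any static mesh.

```glsl
#version 430
layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly  buffer BindPose { vec4  inPositions[]; };
layout(std430, binding = 1) readonly  buffer Weights  { vec4  boneWeights[]; };
layout(std430, binding = 2) readonly  buffer Indices  { ivec4 boneIndices[]; };
layout(std430, binding = 3) readonly  buffer Bones    { mat4  boneMatrices[]; };
layout(std430, binding = 4) writeonly buffer Skinned  { vec4  outPositions[]; };

uniform uint vertexCount;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= vertexCount) return;

    // Blend the four bone matrices by their weights, then skin the position.
    mat4 skin = boneWeights[i].x * boneMatrices[boneIndices[i].x]
              + boneWeights[i].y * boneMatrices[boneIndices[i].y]
              + boneWeights[i].z * boneMatrices[boneIndices[i].z]
              + boneWeights[i].w * boneMatrices[boneIndices[i].w];
    outPositions[i] = skin * vec4(inPositions[i].xyz, 1.0);
}
```

The `Skinned` buffer is then bound as the position VBO for all subsequent passes (geometry and every shadow map), so the skinning cost is paid once per frame regardless of how many passes draw the mesh; a `glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT)` is needed between the dispatch and the draws.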

##### Share on other sites

Actually, rather than spinning up a new VBO for each batch with the same state when one fills up, would the following be a better approach?

For a batch size of N MB:

• Allocate a single VBO of N MB.
• For each chunk of up to N MB of data sharing the same state:
  • Fill the VBO with the chunk using glMapBufferRange with GL_MAP_INVALIDATE_BUFFER_BIT (orphaning the previous contents).
  • Make the draw call.

From what I have read, this should safely mitigate implicit synchronization while allowing a single VBO handle to be reused.
