# Dark Helmet

## Posts posted by Dark Helmet

1. ### reconstruct depth from z/w ?

...in the postprocessing pass I try to reconstruct it using this method: http://www.geeks3d.com/20091216/geexlab-how-to-visualize-the-depth-buffer-in-glsl/.

Re the GeeXLab page: yeah, I tried that years ago and it doesn't work.

Try this (for an arbitrary perspective frustum, with glDepthRange of 0..1):

```glsl
vec3 PositionFromDepth_DarkHelmet(in float depth)
{
    vec2 ndc;   // Reconstructed NDC-space position
    vec3 eye;   // Reconstructed EYE-space position

    eye.z = near * far / ((depth * (far - near)) - far);

    ndc.x = ((gl_FragCoord.x * widthInv)  - 0.5) * 2.0;
    ndc.y = ((gl_FragCoord.y * heightInv) - 0.5) * 2.0;

    eye.x = (-ndc.x * eye.z) * (right - left) / (2.0 * near)
            - eye.z * (right + left) / (2.0 * near);
    eye.y = (-ndc.y * eye.z) * (top - bottom) / (2.0 * near)
            - eye.z * (top + bottom) / (2.0 * near);

    return eye;
}
```

Note: "depth" is your 0..1 window-space depth. Of course, if you assume a "symmetric" perspective frustum (but not necessarily one that is 90 deg FOV), the eye.x/.y lines simplify down to:

```glsl
eye.x = (-ndc.x * eye.z) * right / near;
eye.y = (-ndc.y * eye.z) * top / near;
```

Now of course, for mere depth buffer visualization, all you really want from this is eye.z, which is the linear depth value. So nuke the rest. Just map this eye.z value from -near..-far to 0..1, use that as your fragment intensity, and you're done:

```glsl
intensity = (-eye.z - near) / (far - near);
```

2. ### 300 000 fps

glutSwapBuffers, on the other hand, is the real, true thing. It actually swaps buffers, so there is really a notion of "frame". It also blocks, but synchronized to the actual hardware update frequency, and in a somewhat less rigid way (usually drivers will let you pre-render 2 or 3 frames or will only block at the next draw command after swap, or something else).

When timing with just SwapBuffers though, be careful. The problem is that the driver typically queues up the request quickly on the CPU and returns immediately (i.e. the CPU does not block), after which it lets you start queuing up render commands for future frames. At some random point in the middle of queuing one of those frames, when the FIFO fills, *then* the CPU blocks, waiting on some VSYNC event in the middle of a frame. This causes really odd timing spikes and leaves you puzzled as to what's going on.

If you want reasonable full-frame timings, after SwapBuffers(), put a glFinish(), and then stop your timer.
3. ### Questions about mesh rendering performance

Exactly! Furthermore, drivers pack your data in the optimal way along with all relevant information for later access. ... If you are happy with DLs, continue to use them, but in the proper way, and they'll serve you well. If you want to switch to non-deprecated functionality, abandon them. VBOs are certainly the right way to do things, but you have to know your hw better.

"Right" depends on your goals. For most of us, "right" isn't defined by the core profile but by the most efficient use of the hardware (fastest performance).

So yes, agreed. If DLs work for you, use them. If you want more control over how your GPU memory is utilized, use static VBOs, but you'll take a performance hit if you use them alone, and you have to be smart about how you encode your data within them. If you happen to be running on NVidia and want VBOs with display list performance, use NV bindless to launch your batches with those VBOs. If not, then substitute VAOs in place of bindless -- it doesn't perform as well, but it's better than nothing.

Also, the more data you pack in your VBOs (the larger your batches), the less likely you are to be CPU bound launching batches (which is for the most part what bindless and VAOs strive to reduce).
4. ### Vertex Array Object + Direct State Access

There is no viable alternative to VAOs though, which is why we are all so confused.

Cross-platform, no. But on NVidia, bindless can easily exceed the performance you get from VAOs, and the reason is intuitive.

You ideally just "enable" your 5 attribs, and then proceed to render 300 VBOs. You can't. It sucks.

Instead I have to create 300 VAOs, with 5 "enabled" attribs each. Then render 300 VAOs.

No you don't. Having a bazillion little VAOs floating around with all the cache misses that go with them isn't necessarily the best approach. Best case, use bindless (does an end-run around the cache issues, but is NV-only) or a streaming VBO approach with reuse (which keeps the bind count down, and works cross-platform).

If you have a number of separate, preloaded VBOs, you absolutely can't (or refuse to) make them large enough to keep you from being CPU bound, and you can't/won't use bindless, then fall back on (in the order of the performance I've observed on NVidia): 1) VBOs with VAOs, one per batch-state combination, 2) client arrays, or 3) VBOs without VAOs, or with one VAO for all.

In case it's not obvious, I do what gives me the best performance. I'm not a core purist.

It's really unfortunate that AMD hasn't stepped up and supported bindless in their OpenGL drivers, at least for launching batches (vertex attribute and index list specification).
5. ### Should I Write a Book on Modern Real-Time Animations?

Originally posted by L. Spiro:

Should I Write a Book on Modern Real-Time Animations?

Definitely! I was just re-surveying the CG book scene to see if anything new had developed here with more "meat" than sources I already have. Alas, no such luck.

Here are some topics you might consider in addition to those mentioned above.

1. Skeletal Animation Math and Data Model used to pose a joint skeleton (clearly presented, in detail!)
-- there are very precious few sources that do this well, which is why I put it first
2. Basic Skinning: Survey/detail of the most useful skinning techniques (DQS, SBS, LBS, etc.), including shader code with extensive treatment of the math and pros/cons (candy wrapper, joint collapse/bulging, performance)
3. Advanced Skinning (optional): Cutting-edge skinning techniques such as joint-based deformers.
4. Action State Machines (aka Animation State Machines; ASMs): artist/animator-driven blending and state transition management; pros/cons and a survey of main ASM features such as in Granny3D, Morpheme, and Havok Behavior, discussing not just what each common feature (node/connection type) does but why and how they work. Also, provide tips and tricks for "rolling your own" ASM.
5. Attachments - Detailing the math, and any special considerations
6. Animation synchronization - Interaction with other characters (e.g. multi-skeleton animations) and the world (open, throw, catch, push, handshake). Emitting game events to kick off other actions (e.g. attach/detach, sound). Also things like sync between blended animations (e.g. limb-sync for walk-to-run and how to set that up) but you'll likely cover that under animation blending.
7. Animation Transitions - Pre-modeled versus blended; engine abstraction; tips and examples.
8. Root motion extraction/application techniques. AI interaction.
9. Behaviors - The AI linkage. Driving characters around. Managing character interactions. Making behavior look realistic and not "computed".
10. IK Details (not just solving, but authoring/publishing constraints, tricks for making them look natural and not mechanical, blend IK vs. solver IK, examples)
11. Performance - Tips/tricks for taking this and optimizing it for maximum performance (SSE, threading, offloading to GPUs, sharing animation data, optimizing transitions, crowds)
12. Animation Compression/Storage
13. Collision Detection - With/between skeletally animated models. Techniques.
14. Engine Integration - Layered design. Strategies for exposing skeletal/ASM capability to renderer and AI; abstracting animation details from the engine.
15. 3rd party SDKs and Toolkits - Just a jumping-off point for folks that don't need to rewrite everything from scratch, including open source and commercial sources. Bonus: a table and/or discussion comparing capabilities.

A good bit of this can be dug out of conference and journal papers, but that takes a lot of time, and you're still left with determining what's useful vs. what is merely "academic". Much of this I've never seen a good, consolidated reference for. For instance, I've never seen a great reference on #4 (ASM tech), save for vendor docs, and I am very interested in this.

Re #1, by far the best source I've seen out there anywhere is Gregory's Game Engine Architecture; it also touches on a few of the others, such as #4, a bit, but didn't have enough detail for my needs. I highly recommend reviewing this source first to give you ideas on how to add value to the character animation book scene. "Character Animation with Direct3D" is OK but just too light on details if you really want to get down and understand how everything works well enough to implement your own skeletal system, which was my goal. I have flipped through quite a few more books, including Computer Animation, Third Edition: Algorithms and Techniques, but most are just high-level surveys that don't have enough detail to make them worthwhile to purchase.
6. ### Choosing specific GPU for OpenGL context?

Too bad we're talking Windows here, not Linux. There you just set up one GPU per screen, create an X connection on the appropriate screen, and create a GL context on that connection (on NVidia at least). Pretty simple.
7. ### PBO to GL_BACK

GL_BACK_LEFT is a component buffer of the default (system) framebuffer. It is not the name of a buffer object.

I can easily see why you made this mistake though. Components of a framebuffer are historically called buffers (e.g. color buffers, depth buffer, stencil buffer; thus glDrawBuffer/glReadBuffer/etc.). These are different from buffer objects (arbitrary blocks of driver memory you can create).

With framebuffer objects, these component buffers are called "attachment points" to help disambiguate these concepts.
8. ### Skeletal Animation Using Dual Quaternions

ChildInverseDQ is a DualQuat created by negating the bind pose starting position.

Doesn't sound right. This should be the full inverse of the bind pose transform for the specified joint (rotation and translation). For the root joint this might just be a negative translation (i.e. 0 deg rotation), but for child joints in general, this is not the case.

What I'm doing is obviously not working for me, but is the theory sound? How far off am I?

Sounds like you're close. If trouble persists, I would do your transform compositing using matrices, and then do a matToDQ conversion on the tail end. Then later you can flip to DQs.
9. ### Writing to Render Target from Itself

Check out:

* GLSL : common mistakes, section "Sampling and Rendering to the Same Texture" (OpenGL wiki)

NV_texture_barrier might be useful to you on NVidia specifically, but I don't know of a cross-vendor way to support this.

OpenCL IMO is a non-starter except for limited use cases, as IIRC flipping back and forth requires a full pipeline flush/sync (that is, in the absence of cl_khr_gl_event / ARB_cl_event). An OpenGL compute shader is much more interesting in terms of avoiding that overhead, but I'm not an expert on those yet.