
Member Since 22 May 2004
Offline Last Active Yesterday, 09:07 AM

#4983924 Math API performance: saving CPU cycles?

Posted by clb on 26 September 2012 - 03:14 AM

Answering the original question..

For example, it's my understanding that multiplication is slightly faster than division, and saves a few CPU cycles here and there.

Is this correct/true, and should I be doing it this way? And what other optimizations might I use in general to make my math code blazing fast and efficient?

Assuming that the SSE instruction set is used instead of the old x87 FPU stack, a single-precision scalar float division (the DIVSS instruction) has a latency of 14-32 cycles and a processing time of 14-32 cycles, depending on the architecture. Double-precision scalar float division (DIVSD) has a latency of 22-39 cycles and a processing time of 20-39 cycles.

Compare that to multiplication: a single-precision scalar float multiplication (MULSS) has a latency of 4-7 cycles and a processing time of 1-2 cycles, and a double-precision scalar multiplication (MULSD) has a latency of 5-7 cycles and a processing time of 1-2 cycles.

The figures were taken from the Intel Intrinsics Guide.

So, comparing processing times, multiplication is roughly 10 to 20 times faster (assuming perfectly pipelined instructions).

I'm ignoring here the fact that you're using C# rather than C/C++ with direct SSE asm/intrinsics, but the point is that yes, division is considerably slower *for the CPU* to execute than multiplication, even on modern CPUs. Whether that can be seen in the C# execution environment is then a matter of profiling.

MathGeoLib uses this 'multiplication by inverse' form, as do most of the game math libraries I've seen as well. Note that x / s versus x * (1/s) are not arithmetically identical, since first computing the inverse as a float and multiplying by it does lose some precision.
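To make the precision remark concrete, here is a minimal sketch. It is written in Python rather than C# or C++, and it uses double precision instead of single-precision floats; both are assumptions made purely for illustration, and the function names are hypothetical. It counts how often x / s and x * (1/s) disagree for small integer inputs, and how large the disagreement gets:

```python
def divide_directly(x, s):
    # One rounding step: the quotient is rounded once.
    return x / s


def multiply_by_inverse(x, s):
    # Two rounding steps: the inverse is rounded, then the product is rounded.
    inverse = 1.0 / s
    return x * inverse


def count_mismatches(limit):
    """Compare both forms for all integer x, s in 1..limit.

    Returns (number of differing results, worst relative difference).
    """
    mismatches = 0
    worst_relative_difference = 0.0
    for s in range(1, limit + 1):
        for x in range(1, limit + 1):
            a = divide_directly(float(x), float(s))
            b = multiply_by_inverse(float(x), float(s))
            if a != b:
                mismatches += 1
                difference = abs(a - b) / abs(a)
                worst_relative_difference = max(worst_relative_difference, difference)
    return mismatches, worst_relative_difference


mismatches, worst = count_mismatches(100)
print("differing results:", mismatches)
print("worst relative difference:", worst)
```

The two forms do disagree for some inputs, but only around the last bit of the result, which is why trading the division for a multiplication is usually acceptable in game math code.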

And what other optimizations might I use in general to make my math code blazing fast and efficient?

It should be noted that in C/C++ both a single function call and an 'if' statement can be far slower than performing a single division (for instance, when the call is not inlined or the branch is mispredicted). However, again, in the context of C#, I recommend profiling your real application hotspots to see what kind of effects these have, since that's quite a different context than low-level C code on the assembly/intrinsic level.

#4982592 Rotating Hitboxes, images and more

Posted by clb on 22 September 2012 - 12:28 AM

It is a standard technique/feature that you can render your 2D sprite rotated to an arbitrary angle in realtime. They just use a rotation matrix that they apply to the sprite rectangle vertices when rendering, and the GPU deals with the rotation and filtering in realtime, no problem. Rendering a sprite axis-aligned versus rendering it at an arbitrary angle does not even carry a performance penalty; it's the same cost for the GPU. As for the assets, you only need a single animation sequence for the effect in the axis-aligned position. Googling for "2d rotating a sprite" finds some good hits, e.g. this is an example of how to do it in SDL.

For hitboxes, the case is the same. They probably have defined a rectangle or a polygon that marks the hit area of the effect. This vector shape is rotated to the appropriate angle from its default axis-aligned orientation before testing for collision against the vector shapes of the other objects. Since only a few points are needed to represent such a shape (perhaps 10 at most?), it's very cheap and can be done in realtime without performance implications even on a mobile device.
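As a rough illustration of rotating a hitbox before testing it, here is a minimal Python sketch. The function names, the example rectangle and the choice of a point-in-convex-polygon test are all assumptions made for the example; a real game would more likely test shape against shape.

```python
import math


def rotate_points(points, angle_rad, pivot=(0.0, 0.0)):
    """Rotate 2D points counter-clockwise around pivot with a rotation matrix."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py = pivot
    rotated = []
    for x, y in points:
        dx, dy = x - px, y - py
        rotated.append((px + c * dx - s * dy, py + s * dx + c * dy))
    return rotated


def point_in_convex_polygon(point, polygon):
    """Return True if point is inside a convex polygon given in counter-clockwise order."""
    for i in range(len(polygon)):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % len(polygon)]
        # The cross product sign tells on which side of edge a->b the point lies.
        cross = (bx - ax) * (point[1] - ay) - (by - ay) * (point[0] - ax)
        if cross < 0:
            return False
    return True


# Axis-aligned hitbox, 4 wide and 2 tall, centered on the origin (counter-clockwise).
hitbox = [(-2.0, -1.0), (2.0, -1.0), (2.0, 1.0), (-2.0, 1.0)]
rotated_hitbox = rotate_points(hitbox, math.radians(90))

print(point_in_convex_polygon((0.0, 1.5), hitbox))
print(point_in_convex_polygon((0.0, 1.5), rotated_hitbox))
```

Only four points are transformed per hitbox here, so doing this every frame costs next to nothing.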

#4980861 OpenGL or Directx for learner?

Posted by clb on 17 September 2012 - 07:00 AM

I develop and maintain a graphics engine that works on a number of platforms, and it abstracts Direct3D11, OpenGL3, GLES2 and WebGL. If line counts are any measure, my Direct3D-specific codebase is 3094 lines, and my OpenGL-specific codebase is 3576 lines.

My opinion is that Direct3D11 is way easier to work with. Developing the Direct3D codepath was considerably easier, and I faced way fewer cryptic bugs and issues when implementing the Direct3D11 path than the OpenGL path. Some notes:
- Direct3D11 actually has a debug layer that gives out *very* helpful error messages in the debug console while developing. These are diagnostics like "hey, you don't seem to have a render target bound." and "hey, your vertex buffer did not have a position stream.". The typical "oops, I'm an idiot" stuff is very often detected by D3D11 debug layer. Additionally, the returned HRESULT error codes are descriptive and give out good differentiated error messages.
- In OpenGL, there is no debug layer. There are no diagnostics. Silent failure and "hmm why is this just rendering black" are very common. There are only about 5 or 6 different error codes returned by OpenGL, and they are generally unhelpful for getting a clue of what is wrong.
- Direct3D11 is typesafe. In OpenGL it's all GLints and GLenums. Strongly typed API always trumps a weakly typed one.
- The Direct3D way of using vertex declarations, input layouts and vertex buffers is an explicit and easy-to-comprehend way to structure the renderable data. In OpenGL, the corresponding terms are VBOs (Vertex Buffer Objects) and VAOs (Vertex Array Objects), and there are also the terms GL_ARRAY_BUFFER and GL_ELEMENT_ARRAY_BUFFER. I think the OpenGL nomenclature alone is something that is easily confused.
- In Direct3D, you have explicit guarantee of the features that work when you initialize a certain feature level. In OpenGL, you'll have to work with enumerating extensions, and hunting down extension specifications. You'll need to bind function entry points, or incorporate libraries like GLEW or GLEE or similar. There's an annoying amount of overlap with the extensions, i.e. the extensions tend to be vendor-oriented, and not feature-oriented (see e.g. how many different extensions there are for enabling VAO support for different vendors).
- In OpenGL, the VAOs and FBOs (FrameBuffer Objects) are oddly functioning cache objects. The API for applying and defining a VAO or an FBO is the same. It's easy to forget a VAO or an FBO bound to the device, and overwrite its state accidentally when specifying new state. This is a very common bug I've often seen beginners get caught by.

Note however that as a 3D developer, you probably won't have the luxury of choosing, but you'll have to learn both, as there's a market segment for both APIs. Whichever you choose, be sure to familiarize yourself with PIX, NVIDIA Parallel Nsight and the Direct3D11 debug layer for D3D, and gDEBugger for OGL, since they'll save you a ton of grief when things go wrong.

#4980852 OpenGL troubleshooting

Posted by clb on 17 September 2012 - 06:24 AM

I recommend using gDEBugger, AMD CodeAnalyst and Intel VTune to profile performance issues. The "app needs to be reinstalled" error may be caused by a bad DLL dependency, so use e.g. depends22.exe to see if there are missing DLL dependencies. Also, have a look at the .embed.manifest file generated by Visual Studio to check what kind of dependencies were generated by the build, and see if there are any debug DLLs there.

#4978595 Finding point on edge of circle in 3-D space

Posted by clb on 10 September 2012 - 09:08 AM

Here's how I do it in MathGeoLib: Circle::GetPoint. If you have requirements on what direction the angle of 0 degrees corresponds to, you'll have to be more precise about how you define the directions BasisU and BasisV in that codepath.
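For readers without the library at hand, the general idea can be sketched as follows. This is not MathGeoLib's code: it is a minimal Python sketch, the names BasisU and BasisV are borrowed from the post, and the way the two basis directions are picked here (via an arbitrary helper axis) is an assumption. A point on the circle is center + radius * (cos(angle) * BasisU + sin(angle) * BasisV), where BasisU and BasisV are perpendicular unit vectors in the plane of the circle.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def circle_basis(normal):
    """Pick two perpendicular unit vectors spanning the plane of the circle.

    The helper axis is an arbitrary choice, so the direction that angle 0
    maps to is arbitrary as well.
    """
    n = normalize(normal)
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    basis_u = normalize(cross(n, helper))
    basis_v = cross(n, basis_u)  # unit length, since n and basis_u are perpendicular unit vectors
    return basis_u, basis_v


def circle_point(center, normal, radius, angle_rad):
    """Return the point on the circle's edge at the given angle."""
    basis_u, basis_v = circle_basis(normal)
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return tuple(center[i] + radius * (c * basis_u[i] + s * basis_v[i])
                 for i in range(3))


point = circle_point((1.0, 2.0, 3.0), (1.0, 1.0, 1.0), 2.0, math.radians(30))
print(point)
```

If angle 0 has to point in a particular direction, replace the helper-axis choice with an explicit BasisU that points there.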

#4978502 Effective way to detect whether my 3D cubes are outside the 2D screen boundar...

Posted by clb on 10 September 2012 - 01:33 AM

I would do the out-of-screen check in 2D, using bounding spheres instead of the cubes.

Offline: Compute the bounding sphere of the cube (the radius of the sphere that encloses the whole cube).

1. Transform the center point P of the cube to 2D screen coordinates P'.
2. Transform the radius R of the bounding sphere at the distance of the cube to 2D screen coordinates R'.
3. Check whether the 2D point P' lies more than R' outside any of the four edges of the screen, and if so, discard the cube.

This method should be considerably more lightweight than SAT of 3D AABB vs 3D frustum.
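The steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions: a simple pinhole camera, the cube center already given in camera space with z as the depth in front of the camera, and a focal length expressed in pixels. All function and parameter names are hypothetical. Scaling the radius by focal length over depth is an approximation, since a sphere away from the view axis projects to a slightly larger ellipse.

```python
import math


def cube_bounding_radius(edge_length):
    # Offline step: half of the cube's space diagonal.
    return edge_length * math.sqrt(3.0) / 2.0


def project_to_screen(point_cam, focal_px, width, height):
    """Step 1: perspective-project a camera-space point to pixel coordinates."""
    x, y, z = point_cam
    return (width / 2.0 + focal_px * x / z,
            height / 2.0 - focal_px * y / z)


def is_offscreen(center_cam, radius, focal_px, width, height):
    """Return True if the bounding sphere lies fully outside the screen rectangle."""
    depth = center_cam[2]
    if depth <= 0.0:
        # Simplification: a center at or behind the camera counts as not visible.
        return True
    px, py = project_to_screen(center_cam, focal_px, width, height)
    radius_px = focal_px * radius / depth  # step 2: radius at the cube's distance
    # Step 3: compare against the four screen edges.
    return (px + radius_px < 0.0 or px - radius_px > width or
            py + radius_px < 0.0 or py - radius_px > height)


radius = cube_bounding_radius(2.0)
print(is_offscreen((0.0, 0.0, 10.0), radius, 500.0, 800, 600))
print(is_offscreen((20.0, 0.0, 10.0), radius, 500.0, 800, 600))
```

A cube whose center has just left the screen is still kept until the whole sphere is outside, so cubes do not pop out of view at the edges.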

Another even simpler method would be to just check when the cube passes the camera near plane, and kill it there.

#4977613 Platform-agnostic renderer

Posted by clb on 07 September 2012 - 07:20 AM

+1 for ignoring any design decision Ogre3D did.

As for the discussion "should a mesh draw itself, i.e. should I have a function Mesh::Draw()?" the answer is strongly no. In a typical 3D scene, you never do anything just once. You never just draw a single mesh, you never just update the physics of a single rigid body, and so on. The most important design decision for a performant 3D engine is to allow batch, batch, batch! This means that you will need a centralized renderer that can control the render process, query the renderable objects, sort by state changes, and submit render calls as optimally as possible. The information required to do that is scene-wide, and inside a single Mesh::Draw() function you can't (or shouldn't) make these state optimization decisions. It's the renderer's role to know how to draw a mesh, and it's also the renderer's role to know how to do that as efficiently as possible with respect to all other content in the scene.
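To illustrate the difference from a Mesh::Draw() design, here is a minimal Python sketch of a centralized renderer that collects renderables and groups them by render state before drawing. The class names, the fields and the choice of (shader, texture) as the state key are all hypothetical; a real engine would sort on more state and issue actual graphics API calls.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Renderable:
    name: str
    shader: str
    texture: str


class Renderer:
    """Owns the draw order; objects only describe what they need."""

    def __init__(self):
        self.queue = []

    def submit(self, renderable):
        self.queue.append(renderable)

    def flush(self):
        """Sort the queue by state and return one batch per distinct state."""
        def state_key(r):
            return (r.shader, r.texture)

        batches = []
        ordered = sorted(self.queue, key=state_key)
        for state, group in groupby(ordered, key=state_key):
            # One state change, then every object that shares the state.
            batches.append((state, [r.name for r in group]))
        self.queue.clear()
        return batches


renderer = Renderer()
renderer.submit(Renderable("rock1", "lit", "stone"))
renderer.submit(Renderable("tree1", "lit", "bark"))
renderer.submit(Renderable("rock2", "lit", "stone"))
renderer.submit(Renderable("tree2", "lit", "bark"))
batches = renderer.flush()
for state, names in batches:
    print(state, names)
```

Four objects submitted in interleaved order end up in two batches, which a per-mesh Draw() call has no way of arranging because it only sees its own mesh.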

#4976847 Gimbal Lock. Trying to describe mathematically.

Posted by clb on 05 September 2012 - 09:12 AM

I remember writing about gimbal lock in a thread some time ago. Does this thread help?

#4971445 What are the downsides to using Unity?

Posted by clb on 20 August 2012 - 06:10 AM

Unity has some serious memory management issues. A lot of people get hit by performance spikes (5-50msecs) caused by the poor Mono garbage collector. There are no good mechanisms to profile memory usage and detect where your memory is being leaked or consumed. The GUI system behaves badly in this respect. These issues are worsened by crash bugs in Mono 2.6; googling for "too many root sets" and "too many heap sections" will find some links [1], [2], [3], [4], [5]. The common practice is to avoid allocating new objects while your game is running, to avoid the risk of fragmenting the Mono heap to the point where it crashes.

I'm not happy to say that so far (about three years of Unity development), we haven't been able to write an application for Unity that could sustain a stable uptime of a week without crashing in Mono/GC. Some apps we've built do run ok for about 24h, but they've mostly been an exception. Whenever using a closed source middleware/engine, one is naturally afraid of finding these kinds of showstoppers, but I think the above issue is the only one we've met with Unity. If writing server software/long uptime is not a particularly important goal, then I think Unity's gonna be ok.

#4964755 Generating spheres with uniform vertex density

Posted by clb on 31 July 2012 - 02:04 AM

If geospheres might be adequate for this purpose, see e.g. this and this.

#4960524 JavaScript Games on Android Phones

Posted by clb on 18 July 2012 - 09:57 AM

I did a quick evaluation of using JavaScript + WebGL on Android and iOS, with the prospect of writing cross-platform games for desktop, web, iOS and Android using the same JavaScript codebase. The brief report can be read here. The summary is that I found JavaScript+WebGL to be too slow on Android phones, and WebGL virtually unsupported by mobile browsers on both Android and iOS. I do not know how much of the slowness was caused by the WebGL portion specifically, or how JavaScript without WebGL would perform on Android.

After that, I switched the strategy to developing a cross-platform C++ codebase, which works with native performance on all of the platforms I am interested in. To bring the C++ codebase over to web, I use the emscripten compiler. It is not perfect, but results so far have been positive. Some tests: QuadTree.html: click anywhere with a mouse, Geometry.html, aabb_obb_sphere.html, SceneView.html: use WSAD and arrows to move around.

Most JS engines nowadays have some form of JIT support. Even then, I am quite certain that JavaScript in web browsers is not JITted to native code that has the same performance as code written natively to start with. The above samples all run at 60fps on my Android Tegra3 device (when compiled to native Android applications), but if I try to run them in a web browser on that Tegra3, I get performance ranging from 2 to 10fps (try it yourself if you have an Android phone, I'm interested in hearing how other devices fare). The mobile Opera web browser was the only one I could find with WebGL support.

I tried to find more information in the web site you pasted, but the links seemed to be locked behind a password.

#4959973 Put these 3D graphics terms into relation for me

Posted by clb on 17 July 2012 - 05:58 AM

There are no separate 'constant registers' and 'constant buffers'. All constant variables are assigned into some constant buffer, either explicitly by the shader author, or automatically by the shader compiler. All constant variables outside any constant buffers go into the $Global constant buffer. From http://msdn.microsoft.com/en-us/library/windows/desktop/bb509581(v=vs.85).aspx : "There are two default constant buffers available, $Global and $Param. Variables that are placed in the global scope are added implicitly to the $Global cbuffer, using the same packing method that is used for cbuffers."

The layout inside a Constant Buffer consists of a list of Constant Registers, each of which is a 4xfloat vector, and the variables in each Constant Buffer are laid out into these Constant Registers.
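As a rough illustration of how variables land in 4xfloat registers, here is a minimal Python sketch of a simplified packing model. It only handles float scalars and float2/float3/float4 vectors, and assumes the rule that a variable may share a register with the previous one but may not straddle a register boundary. Arrays, matrices and structs follow additional rules that are not modelled here, and the function and variable names are hypothetical.

```python
def pack_constants(variables):
    """Assign each (name, float_count) variable to a register and component.

    Returns (layout, register_count), where layout maps a variable name to
    (register_index, first_component).
    """
    layout = {}
    offset = 0  # running position, counted in float components
    for name, count in variables:
        if not 1 <= count <= 4:
            raise ValueError("only 1 to 4 float components are supported")
        used_in_register = offset % 4
        if used_in_register + count > 4:
            # Would straddle a register boundary: start a new register.
            offset += 4 - used_in_register
        layout[name] = (offset // 4, offset % 4)
        offset += count
    register_count = (offset + 3) // 4
    return layout, register_count


layout, register_count = pack_constants([
    ("lightDir", 3),   # float3
    ("uvScale", 2),    # float2
    ("intensity", 1),  # float
    ("color", 4),      # float4
])
for name, (register, component) in layout.items():
    print(name, "-> register", register, "component", "xyzw"[component])
print("registers used:", register_count)
```

In this model the float2 does not fit behind the float3 and moves to the next register, while the single float fills the gap after the float2.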

Textures and Texture Samplers are separate from Constant Buffer Slots, and these three are all assigned separately.

#4959887 Interleaved Arrays

Posted by clb on 16 July 2012 - 11:59 PM

Had a look over your code snippets, but couldn't find anything obviously wrong. Do you have glGetError() checks in place to detect if a call is failing?

#4959884 Handling object collisions

Posted by clb on 16 July 2012 - 11:51 PM

I use this QuadTree implementation as the broad-phase structure to detect collisions in my asteroids game. While developing I didn't sweat about this too much. At first I just had a naive O(n^2) iteration over all object pairs, and only when I got to having so many objects that I could see it in the performance profile did I start to implement something smarter. The current QuadTree code can also be swapped out if it proves to be problematic, but I think I'm quite content with it, since I also use the structure for PVS determination for rendering and other logic.
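The naive starting point mentioned above can be sketched as follows. This is a minimal Python sketch that assumes every object is a circle given as (x, y, radius); it is not the code from the asteroids game, and the function names are hypothetical.

```python
from itertools import combinations


def circles_overlap(a, b):
    """Return True if two (x, y, radius) circles touch or overlap."""
    ax, ay, ar = a
    bx, by, br = b
    dx, dy = ax - bx, ay - by
    # Compare squared distances to avoid a square root.
    return dx * dx + dy * dy <= (ar + br) ** 2


def colliding_pairs(objects):
    """Naive O(n^2) pass: test every unordered pair exactly once."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(objects), 2)
            if circles_overlap(a, b)]


asteroids = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (10.0, 10.0, 1.0)]
pairs = colliding_pairs(asteroids)
print(pairs)
```

With n objects this performs n * (n - 1) / 2 tests per frame, which is fine for a handful of objects and is the part a QuadTree replaces once it shows up in the profile.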

#4959306 How to simulate daytime in directx?

Posted by clb on 15 July 2012 - 11:23 AM

For this purpose, I recommend Game Engine Gems 1, chapters 15 and 16. They describe a practical implementation of the commonly used Preetham sky model and related items, like how to compute the position of the sun and the moon. To me, these articles are "the" reference on the topic.