
clb

Member Since 22 May 2004

#4971445 What are the downsides to using Unity?

Posted by clb on 20 August 2012 - 06:10 AM

Unity has some serious memory management issues. A lot of people get hit by performance spikes (5-50 msecs) caused by the poor Mono garbage collector. There are no good mechanisms to profile memory usage and detect where your memory is leaking or being consumed. The GUI system behaves badly in this respect. These issues are worsened by crash bugs in Mono 2.6; googling for "too many root sets" and "too many heap sections" will find some links [1], [2], [3], [4], [5]. The common practice is to avoid allocating new objects while your game is running, to reduce the risk of fragmenting the Mono heap to the point where it crashes.

I'm not happy to say that so far (after about three years of Unity development) we haven't been able to write an application for Unity that could sustain a stable uptime of a week without crashing in Mono/GC. Some apps we've built do run OK for about 24h, but they've mostly been the exception. Whenever using a closed-source middleware/engine, one is naturally afraid of finding this kind of showstopper, but I think the above issue is the only one we've met with Unity. If writing server software or long uptime is not a particularly important goal, then I think Unity's gonna be OK.


#4964755 Generating spheres with uniform vertex density

Posted by clb on 31 July 2012 - 02:04 AM

If geospheres might be adequate for this purpose, see e.g. this and this.


#4960524 JavaScript Games on Android Phones

Posted by clb on 18 July 2012 - 09:57 AM

I did a quick evaluation of using JavaScript + WebGL on Android and iOS, with the prospect of writing cross-platform games for desktop, web, iOS and Android using the same JavaScript codebase. The brief report can be read here. The summary is that I found JavaScript+WebGL to be too slow on Android phones, and WebGL virtually unsupported by mobile browsers, both Android and iOS. I do not know how much of the slowness was caused by the WebGL portion specifically, or how JavaScript without WebGL would work on Android.

After that, I switched strategy to developing a cross-platform C++ codebase, which works with native performance on all of the platforms I am interested in. To bring the C++ codebase over to the web, I use the emscripten compiler. It is not perfect, but results so far have been positive. Some tests: QuadTree.html: click anywhere with a mouse, Geometry.html, aabb_obb_sphere.html, SceneView.html: use WSAD and arrows to move around.

Most JS engines nowadays have some form of JIT support. Even then, I am quite certain that JavaScript in web browsers is not JITted to native code that would have the same performance as code written natively to start with. The above samples all run at 60fps on my Android Tegra3 device (when compiled to native Android applications), but if I try to run them in a web browser on that Tegra3, I get performance in the range of 2-10fps (try it yourself if you have an Android phone, I'm interested in hearing how other devices fare). The mobile Opera web browser was the only one I could find with WebGL support.

I tried to find more information in the web site you pasted, but the links seemed to be locked behind a password.


#4959973 Put these 3D graphics terms into relation for me

Posted by clb on 17 July 2012 - 05:58 AM

There are no separate 'constant registers' and 'constant buffers'. All constant variables are assigned into some constant buffer, either explicitly by the shader author, or automatically by the shader compiler. All constant variables outside any constant buffers go into a global $Global constant buffer. From http://msdn.microsoft.com/en-us/library/windows/desktop/bb509581(v=vs.85).aspx : "There are two default constant buffers available, $Global and $Param. Variables that are placed in the global scope are added implicitly to the $Global cbuffer, using the same packing method that is used for cbuffers."

The layout inside a Constant Buffer consists of a list of Constant Registers, each of which is a 4xfloat vector, and the variables in each Constant Buffer are laid out to these Constant Registers.
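
To make the register layout concrete, here is a rough sketch of how float scalars and vectors get laid out into 4xfloat registers, assuming the usual HLSL rule that a variable must not straddle a register boundary. PackedVar and PackConstants are made-up names for illustration, and matrices, arrays and structs are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Location of one variable inside a constant buffer (hypothetical helper type).
struct PackedVar
{
    std::size_t reg;       // index of the 4xfloat constant register
    std::size_t component; // first component used: 0 = x, 1 = y, 2 = z, 3 = w
};

// Assigns float scalars/vectors (1 to 4 components each) to registers in
// declaration order. Simplified: matrices, arrays and structs are not handled.
inline std::vector<PackedVar> PackConstants(const std::vector<std::size_t> &componentCounts)
{
    std::vector<PackedVar> layout;
    std::size_t reg = 0;  // register currently being filled
    std::size_t used = 0; // components already taken in that register
    for (std::size_t count : componentCounts)
    {
        assert(count >= 1 && count <= 4);
        if (used + count > 4) // would straddle a register boundary: start a new one
        {
            ++reg;
            used = 0;
        }
        layout.push_back({reg, used});
        used += count;
    }
    return layout;
}
```

For example, a float3 followed by a float share one register (xyz and w), while a float2 followed by a float3 end up in two separate registers.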

Textures and Texture Samplers are separate from Constant Buffer Slots, and these three are all assigned separately.


#4959887 Interleaved Arrays

Posted by clb on 16 July 2012 - 11:59 PM

Had a look over your code snippets, but couldn't find anything obviously wrong. Do you have glGetError() checks in place to detect if a call is failing?
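
A minimal sketch of such a check, in case it helps. The error query is passed in as a callable so that the snippet compiles without a GL context; in a real program you would pass glGetError itself and take the constants from the GL header. CheckGlErrors and GlErrorName are made-up helper names.

```cpp
#include <cassert>
#include <cstdio>

// Error codes as defined by the OpenGL headers, repeated here only so the
// sketch is self-contained. In real code, use the constants from the GL header.
enum : unsigned
{
    kGlNoError = 0,               // GL_NO_ERROR
    kGlInvalidEnum = 0x0500,      // GL_INVALID_ENUM
    kGlInvalidValue = 0x0501,     // GL_INVALID_VALUE
    kGlInvalidOperation = 0x0502, // GL_INVALID_OPERATION
    kGlOutOfMemory = 0x0505       // GL_OUT_OF_MEMORY
};

inline const char *GlErrorName(unsigned code)
{
    switch (code)
    {
    case kGlInvalidEnum: return "GL_INVALID_ENUM";
    case kGlInvalidValue: return "GL_INVALID_VALUE";
    case kGlInvalidOperation: return "GL_INVALID_OPERATION";
    case kGlOutOfMemory: return "GL_OUT_OF_MEMORY";
    default: return "unknown GL error";
    }
}

// Drains the GL error queue (several errors may be pending at once) and logs
// each entry. Returns the number of errors found.
template <typename GetErrorFn>
int CheckGlErrors(GetErrorFn getError, const char *where)
{
    assert(where != nullptr);
    int count = 0;
    for (unsigned err = getError(); err != kGlNoError; err = getError())
    {
        std::fprintf(stderr, "GL error %s (0x%04X) after %s\n", GlErrorName(err), err, where);
        ++count;
    }
    return count;
}
```

Calling something like CheckGlErrors(glGetError, "glVertexAttribPointer") after each suspicious call narrows down which call is the first one to fail.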


#4959884 Handling object collisions

Posted by clb on 16 July 2012 - 11:51 PM

I use this QuadTree implementation as the broad-phase structure to detect collisions in my asteroids game. While developing I didn't sweat about this too much: at first I just had a naive O(n^2) iteration over all object pairs, and only when I got to having so many objects that I could see that in the performance profile did I start to implement something smarter. The current QuadTree code is also there as something that can be swapped out if it proves to be problematic, but I think I'm quite content with it, since I also use the structure for PVS determination for rendering and other logic.
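
For reference, the naive O(n^2) starting point can be sketched roughly like this. It assumes circular bounding shapes; Circle, Overlaps and FindCollidingPairs are illustrative names, not code from the game.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bounding circle of one game object (assumed shape for this sketch).
struct Circle
{
    float x, y, radius;
};

inline bool Overlaps(const Circle &a, const Circle &b)
{
    assert(a.radius >= 0.f && b.radius >= 0.f);
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float r = a.radius + b.radius;
    return dx * dx + dy * dy <= r * r; // compare squared distances, no sqrt needed
}

// Tests every unordered pair exactly once: n*(n-1)/2 overlap tests.
inline std::vector<std::pair<std::size_t, std::size_t>> FindCollidingPairs(const std::vector<Circle> &objects)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < objects.size(); ++i)
        for (std::size_t j = i + 1; j < objects.size(); ++j)
            if (Overlaps(objects[i], objects[j]))
                pairs.emplace_back(i, j);
    return pairs;
}
```

A broad-phase structure such as a quadtree replaces the inner loop with a query that only visits objects in nearby cells.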


#4959306 How to simulate daytime in directx?

Posted by clb on 15 July 2012 - 11:23 AM

For this purpose, I recommend Game Engine Gems 1 chapters 15 and 16. It describes a practical implementation of the commonly used Preetham sky model and related items, like how to compute the position of the sun and the moon. To me, these articles are "the" reference on the topic.


#4958928 GPU skinning, passing indices, weights and joints

Posted by clb on 13 July 2012 - 03:32 PM

They passed the joint matrices as uniforms and  vertex weights and indices as attributes using VBO's

I was wondering, is this how it is commonly done?


Yes, this is the standard. Since each vertex is affected by a different set of bones, you will need random access to the bone matrix palette in the vertex shader. The nice thing with this approach is that the VBO data always stays constant, and only the uniforms change.

The other option sometimes seen is to store the bone matrices in textures, and sample the textures from the vertex shader to read the bone matrices. This is done just to avoid the upper limit on the number of constants one can have, since texture memory can hold much more data. This method requires Vertex Texture Fetch support.

If you need more than four bone influences per vertex, you can use multiple vertex attributes for that, but that naturally increases processing costs. Using four influences is a kind of middle ground, and fits nicely to 4D vectors.
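
As a rough CPU-side illustration of what the vertex shader computes per vertex with four influences (linear blend skinning), assuming 3x4 affine bone matrices and made-up type names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3
{
    float x, y, z;
};

// 3x4 affine bone matrix, row-major; the last column holds the translation.
struct BoneMatrix
{
    float m[3][4];

    Vec3 Transform(const Vec3 &p) const
    {
        return {m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
                m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
                m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3]};
    }
};

// Per-vertex data that would live in the (static) VBO.
struct SkinnedVertex
{
    Vec3 position;
    std::array<unsigned char, 4> boneIndices; // indices into the matrix palette
    std::array<float, 4> boneWeights;         // expected to sum to 1
};

// 'palette' corresponds to the uniform array that changes every frame.
inline Vec3 SkinVertex(const SkinnedVertex &v, const std::vector<BoneMatrix> &palette)
{
    Vec3 result = {0.f, 0.f, 0.f};
    for (std::size_t i = 0; i < 4; ++i)
    {
        const float w = v.boneWeights[i];
        if (w == 0.f)
            continue; // unused influence slot
        assert(v.boneIndices[i] < palette.size());
        const Vec3 p = palette[v.boneIndices[i]].Transform(v.position);
        result.x += w * p.x;
        result.y += w * p.y;
        result.z += w * p.z;
    }
    return result;
}
```

The indices and weights each fit in one 4D vertex attribute, which is why four influences map so conveniently to the hardware.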


#4958529 mathematics library

Posted by clb on 12 July 2012 - 02:08 PM

There are dozens of them around.

I am the author of MathGeoLib. While it is not a library that aims to be a direct plug-and-play solution (I don't actively maintain build systems for different platforms for it), you can try to see if it's something of use.

When I was pondering whether I wanted to undertake the project of writing MathGeoLib, I did some research on existing libraries. If MathGeoLib is not suitable, see the Alternatives page in the MathGeoLib docs, which lists what I think are the most commonly used math libraries for games.


#4956141 Is rendering to a texture faster than rendering to the screen?

Posted by clb on 05 July 2012 - 05:16 PM

Render operations targeting an off-screen texture are not any different (on any sane platform) from rendering to the backbuffer that's going to be shown on the screen. A few things to consider, though:
- the backbuffer is sometimes constrained to be the size of the native display resolution. Rendering to a smaller offscreen surface first and then blitting that to the main backbuffer can reduce fillrate requirements. This kind of approach was used e.g. in the nVidia Vulcan demo, in which they rendered the particles to a smaller offscreen render target to reduce the fillrate & overdraw impact caused by the particles.
- for UI, rendering to an offscreen surface first can give performance improvements, as it can function as a form of batching & caching. Since the contents of an offscreen render target persist between frames, you can draw multiple UI elements to it and reuse the texture as long as the image doesn't change. One approach could be to store an offscreen cache texture for each individual window in your UI system, and render the contents of each window to these offscreen surfaces only when the window content changes. Each frame, you will then only need to render the UI cache textures to the main backbuffer. This helps to reduce render call counts and repeated rendering of identical content each frame.
- rendering to a screen-sized offscreen texture first, and then blitting that texture to the main backbuffer later, can also dramatically hurt performance. E.g. Tegra2 Android devices can do about 2.5 fullscreen blits per frame if they are targeting 60fps. That is not much, and rendering algorithms that require temporary offscreen surfaces (glow/bloom/hdr) can often be a no-go due to device fillrate restrictions.


#4955082 Using both TCP and UDP?

Posted by clb on 02 July 2012 - 04:24 PM

Sure, it's possible to simultaneously use both UDP and TCP.

We use that technique, but mostly for compatibility fallback purposes: clients primarily connect through UDP, but if firewalls/NATs don't allow it, they go through TCP (and finally, very brutishly, through TCP port 80). Our networking uses the kNet library, which abstracts the underlying TCP/UDP choice and exposes a message-based API.

To solve sending messages larger than the IP MTU, we manually fragment the messages into several UDP datagrams, and reassemble them on the receiving end. It's doable, and an alternative to using TCP.

If you're looking to utilize two socket connections simultaneously, one TCP and one UDP (the same applies if you used two TCP ports), perhaps the first problem is that you won't have ordering guarantees (or atomicity guarantees, when TCP is involved) across the two ports. E.g. sending to port A, then to port B, might be seen on the other end as receiving (even partially!) first from port B, then (again partially!) from port A, then from port B, port A, and so on.
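
The idea can be sketched roughly as follows. This is not kNet's wire format or API: the Fragment header fields and function names are made up for illustration, and retransmission of lost fragments is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One UDP datagram's worth of a larger message (hypothetical header layout).
struct Fragment
{
    std::uint32_t messageId; // which message this piece belongs to
    std::uint16_t index;     // position of this piece within the message
    std::uint16_t count;     // total number of pieces in the message
    std::vector<std::uint8_t> payload;
};

// Splits 'message' into pieces of at most 'maxPayload' bytes each.
inline std::vector<Fragment> SplitMessage(std::uint32_t messageId,
                                          const std::vector<std::uint8_t> &message,
                                          std::size_t maxPayload)
{
    assert(maxPayload > 0);
    const std::size_t count =
        std::max<std::size_t>(1, (message.size() + maxPayload - 1) / maxPayload);
    assert(count <= 0xFFFF); // must fit the 16-bit header fields
    std::vector<Fragment> fragments;
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::size_t begin = i * maxPayload;
        const std::size_t end = std::min(message.size(), begin + maxPayload);
        Fragment f;
        f.messageId = messageId;
        f.index = static_cast<std::uint16_t>(i);
        f.count = static_cast<std::uint16_t>(count);
        f.payload.assign(message.begin() + static_cast<std::ptrdiff_t>(begin),
                         message.begin() + static_cast<std::ptrdiff_t>(end));
        fragments.push_back(f);
    }
    return fragments;
}

// Rebuilds the message once every fragment has arrived; arrival order does not
// matter. Returns false if pieces are missing, foreign or duplicated (a real
// implementation would rather drop duplicates than fail).
inline bool Reassemble(const std::vector<Fragment> &received, std::vector<std::uint8_t> &message)
{
    if (received.empty() || received.size() != received.front().count)
        return false;
    std::vector<const Fragment *> ordered(received.size(), nullptr);
    for (const Fragment &f : received)
    {
        if (f.messageId != received.front().messageId || f.index >= ordered.size() || ordered[f.index])
            return false;
        ordered[f.index] = &f;
    }
    message.clear();
    for (const Fragment *f : ordered)
        message.insert(message.end(), f->payload.begin(), f->payload.end());
    return true;
}
```

Since UDP gives no delivery guarantee, the receiver also needs a timeout or resend request for messages whose fragments never all arrive.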


#4954857 Optimal representation of direction in 3D?

Posted by clb on 02 July 2012 - 04:35 AM

The problems with using normalized 3D vectors to represent direction are:
- takes 3xfloats when only two are needed.
- after manipulating the direction, the result can drift to be non-normalized.

A more compact representation of directions that doesn't suffer from the above problems is to use spherical coordinates: https://github.com/juj/MathGeoLib/blob/master/src/Math/float3.h#L284 . These are a set of two scalars (azimuth, inclination), which are rotation angles that represent the direction. This is similar to using polar coordinates in the 2D case. This reduces the space requirement to 2xfloats, and there's no way to drift the values to produce invalid non-directions. Using spherical coordinates is very effective e.g. when streaming data through a network, to minimize data requirements (e.g. kNet uses these).

The problems with spherical coordinates are
- you can't do arithmetic directly on them without converting them to vector form first. E.g. rotating a direction represented by spherical coordinates by a quaternion/matrix is convoluted.
- conversion between spherical<->euclidean is much slower than re-normalizing a direction vector.

Because you'd have to do the costly conversion between spherical<->euclidean at every point where you do math on them, it's far easier and faster to just use normalized euclidean direction vectors, and re-normalize at key points to avoid drifting.
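
A minimal sketch of the conversion in both directions, to show where the cost comes from. The axis convention here (inclination measured from +Z, azimuth in the XY plane) is an assumption for illustration and is not necessarily the one MathGeoLib uses; the type and function names are made up as well.

```cpp
#include <cassert>
#include <cmath>

struct Direction
{
    float x, y, z; // expected to be unit length
};

struct Spherical
{
    float azimuth;     // angle in the XY plane from +X, in (-pi, pi]
    float inclination; // angle from +Z, in [0, pi]
};

// One atan2 and one acos per conversion.
inline Spherical ToSpherical(const Direction &d)
{
    assert(std::fabs(d.x * d.x + d.y * d.y + d.z * d.z - 1.f) < 1e-3f);
    // Clamp guards acos against values slightly outside [-1, 1] due to rounding.
    const float z = d.z < -1.f ? -1.f : (d.z > 1.f ? 1.f : d.z);
    Spherical s;
    s.azimuth = std::atan2(d.y, d.x);
    s.inclination = std::acos(z);
    return s;
}

// Two sines and two cosines per conversion; the result is always unit length
// (up to rounding), whatever the two angles are.
inline Direction FromSpherical(const Spherical &s)
{
    const float sinIncl = std::sin(s.inclination);
    return {sinIncl * std::cos(s.azimuth), sinIncl * std::sin(s.azimuth), std::cos(s.inclination)};
}
```

Compare that with re-normalizing a vector, which needs one square root (or reciprocal square root) and a few multiplies.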


#4954480 Software renderer: write 4 colors at once without reading old pixels, how?

Posted by clb on 01 July 2012 - 04:25 AM

In AVX, there's the _mm_maskstore_ps/vmaskmovps instruction. In SSE2, there's the _mm_maskmoveu_si128/maskmovdqu instruction, but note that this instruction is in the class of byte-wide integer instructions, so it can generate a few cycles of stall in the pipeline when used (profile?) if a transition from float mode to int mode occurs.

If you are doing manual load-blend-store, there's the _mm_blend_ps/blendps and _mm_blendv_ps/blendvps instructions in SSE4.1, which can aid the process, although that kind of load followed by a store can have a large performance impact. For earlier than SSE4.1, that kind of blend between registers can be achieved by a sequence of and+andnot+or instructions.
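
The and+andnot+or sequence can be sketched like this with plain SSE intrinsics (x86 targets only). SelectPs and MaskedStore4 are made-up helper names, and the mask is assumed to have either all bits or no bits set in each lane, as produced by the SSE compare instructions.

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE: _mm_and_ps, _mm_andnot_ps, _mm_or_ps

// Per-lane select: where a mask lane has all bits set, take the lane from 'b',
// otherwise take it from 'a'.
inline __m128 SelectPs(__m128 a, __m128 b, __m128 mask)
{
    // _mm_andnot_ps(mask, a) computes (~mask) & a.
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// Manual load-blend-store: overwrites only those of the four floats at 'dst'
// whose mask lane is set. Note that this still reads the old pixels.
inline void MaskedStore4(float *dst, __m128 src, __m128 mask)
{
    assert(dst != nullptr);
    const __m128 old = _mm_loadu_ps(dst);
    _mm_storeu_ps(dst, SelectPs(old, src, mask));
}
```

A coverage mask for this typically comes straight out of a compare such as _mm_cmplt_ps on the edge function or depth values.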

I recommend the Intel Intrinsic Guide, which has the instructions in an easily searchable format.


#4954115 Cross-Platform and OpenGL ES

Posted by clb on 29 June 2012 - 04:11 PM

I am developing a cross-platform graphics API abstraction that runs against D3D11, OpenGL3, GLES2 and WebGL (gfxapi, and its platform coverage matrix). OpenGL3, GLES2 and WebGL are so close to each other that one can comfortably share the codebase between them. In my codebase there are minimal #ifdef cases for GLES2, mostly related to added checks for potentially unsupported features (e.g. non-pow2 mipmapping).

NaCl, iOS and Android all use GLES2, so you can get to a lot of platforms with the same API.


#4953470 VBO what does GPU prefers?

Posted by clb on 27 June 2012 - 04:03 PM

The first example is called an 'interleaved' format, and the second example is called a 'planar' format.

I believe that interleaved formats are always faster, and I cannot remember any source that has ever recommended a planar vertex buffer layout for GPU performance reasons. There are tons of sources that recommend using interleaved data, e.g. the Apple OpenGL ES Best Practices documentation. On every platform with a GPU chip I have programmed for (PC, PSP, Nintendo DS, Android, iOS ..), interleaved data has been preferred.
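
To make the two layouts concrete, here is a small sketch with an assumed position + normal + uv vertex. The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaved: all attributes of one vertex are adjacent in memory.
struct InterleavedVertex
{
    float position[3];
    float normal[3];
    float uv[2];
};
static_assert(sizeof(InterleavedVertex) == 8 * sizeof(float), "unexpected padding");

// Values one would pass as stride and attribute offsets (e.g. to
// glVertexAttribPointer) for the interleaved layout.
const std::size_t kStride = sizeof(InterleavedVertex);
const std::size_t kNormalOffset = offsetof(InterleavedVertex, normal);
const std::size_t kUvOffset = offsetof(InterleavedVertex, uv);

// Planar: each attribute lives in its own contiguous array.
struct PlanarMesh
{
    std::vector<float> positions; // x0 y0 z0 x1 y1 z1 ...
    std::vector<float> normals;   // nx0 ny0 nz0 nx1 ...
    std::vector<float> uvs;       // u0 v0 u1 v1 ...
};

// Regroups interleaved vertices into planar arrays, e.g. before compressing
// the mesh for storage on disk.
inline PlanarMesh ToPlanar(const std::vector<InterleavedVertex> &vertices)
{
    PlanarMesh mesh;
    for (const InterleavedVertex &v : vertices)
    {
        mesh.positions.insert(mesh.positions.end(), v.position, v.position + 3);
        mesh.normals.insert(mesh.normals.end(), v.normal, v.normal + 3);
        mesh.uvs.insert(mesh.uvs.end(), v.uv, v.uv + 2);
    }
    assert(mesh.positions.size() == vertices.size() * 3);
    return mesh;
}
```

With the interleaved layout, fetching one vertex touches a single 32-byte span, whereas the planar layout touches three separate memory regions.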

There is one (possibly slight) benefit for planar data, namely that it compresses better on disk than interleaved data. It is a common data compression technique to group similar data together, since it allows compressors to detect similar data better. E.g. the crunch library takes advantage of this effect in the context of textures and reorders the internal on-disk memory layout to be planar before compression.



