

Member Since 22 May 2004
Offline Last Active Apr 27 2015 09:07 AM

#4865462 GPGPU vs classic shader pipeline: rendering performance

Posted by clb on 24 September 2011 - 06:53 AM

GPGPU techniques are most often used to perform generic (non-rendering) computation, not for rendering.

That said, some research has been done on implementing rendering using GPGPU methods. For a very recent example, see the paper by Laine and Karras, High-Performance Software Rasterization on GPUs (in High-Performance Graphics 2011). They measured that the traditional rendering APIs beat a CUDA-implemented renderer by a factor of 2-8x.

Also, other sources exist. Perhaps a more interesting one is this: Alternative Rendering Pipelines in CUDA. The prospects for implementing a renderer in CUDA might be to solve the order-independent-transparency problem (see e.g. this), or to enable real-time raytracing.

I have no experience with GPGPU, but the argument put forward was that GPGPU will always be faster, because the API layer is thinner while accessing the exact same hardware, and also because of shared memory.

In the light of what I linked to above, I don't think this statement is true at all.

If someone knows about published papers similar to Laine and Karras, please share a link. This is an interesting topic to me as well.

#4859921 Quaternion To Local x-, y-, and z-axes (rotation matrix)

Posted by clb on 10 September 2011 - 02:50 AM

So when I want to test a ray vs my OBB, I need my U vector, not my Rotation?

Yes, the code in the book you mention will use the U vector array to detect intersections, and not the quaternion.

The code you posted looks good.

#4859730 Quaternion To Local x-, y-, and z-axes (rotation matrix)

Posted by clb on 09 September 2011 - 02:25 PM

The u vectors are normalized direction vectors which specify in world space the directions of the OBB local axes.

Meaning, u[0] points to the direction the OBB local X axis maps to in world space, u[1] for Y axis, and u[2] for Z axis.

If you have a quaternion that represents your OBB orientation, you can compute these world space axes by transforming the elementary basis vectors by the quaternion, like

u[0] = quat*(1,0,0)

u[1] = quat*(0,1,0)

u[2] = quat*(0,0,1)

(where * denotes the conjugation of a vector by a quaternion, as normal).
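As a sketch of that conjugation, here is the rotation of a vector by a unit quaternion expanded into plain vector arithmetic (all type and function names here are illustrative, not from any particular library):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; }; // w is the scalar part

// Computes q * v * q^-1 for a unit quaternion q, expanded so that no
// explicit quaternion multiplication is needed.
Vec3 Rotate(const Quat &q, const Vec3 &v)
{
    // t = 2 * cross(q.xyz, v)
    Vec3 t = { 2*(q.y*v.z - q.z*v.y),
               2*(q.z*v.x - q.x*v.z),
               2*(q.x*v.y - q.y*v.x) };
    // v' = v + w*t + cross(q.xyz, t)
    return { v.x + q.w*t.x + (q.y*t.z - q.z*t.y),
             v.y + q.w*t.y + (q.z*t.x - q.x*t.z),
             v.z + q.w*t.z + (q.x*t.y - q.y*t.x) };
}

// The three world-space OBB axes are then simply:
//   u[0] = Rotate(q, {1,0,0});
//   u[1] = Rotate(q, {0,1,0});
//   u[2] = Rotate(q, {0,0,1});
```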

The representation of your OBB using a center point, half-size extent vector and the u array is "sufficient", i.e. you don't need to store the original quaternion you derived the OBB from after that.

#4853544 Need some advice on an algorithm building a lookup list

Posted by clb on 25 August 2011 - 01:09 AM

After sleeping on it, I suspect I got the rounding wrong in the conversion. This might work correctly?


y=Y - X

and the opposite:

That gives the wanted (7,12) in the example you gave me, but I didn't try out other values, so be sure to test it properly.

#4853404 Need some advice on an algorithm building a lookup list

Posted by clb on 24 August 2011 - 03:51 PM

Then you could try covering the map with a double coordinate system, as follows: http://imageshack.us...2/unledefs.jpg/

Call the black coordinates CARTESIAN coordinates, in which it is easy to move in different compass directions just by incrementing or decrementing x and y.

Denote by STRANGE the red coordinates you are using.

Then you have a bijective mapping from a pair of STRANGE coordinates (x,y) to CARTESIAN coordinates (X,Y) as follows:

 X=x-y/2 (using integer division)
 Y=x+y/2 (using integer division)

and the reverse

x=(X+Y)/2 (using integer division)

Now, to compute movements on the map, transform your input STRANGE coordinates (x,y) to CARTESIAN (X,Y), then move e.g. 5 units north to get (X+5,Y), then convert this back to STRANGE (which gives in this case ( (X+Y+5)/2, y-5) ).

This is also constant time, and note that it doesn't require looping, since you can add arbitrary values to cartesian (X,Y), e.g. 4 units north and 7 units east is (X+4,Y+7).
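The conversions above can be sketched like this (a minimal sketch with illustrative names; the reverse y uses the y=Y-X form proposed later in the thread, and as the reply warns, the integer divisions mean you should test it against your own coordinate ranges, since odd y values are affected by the rounding):

```cpp
#include <cassert>

struct Coord { int x, y; };

// STRANGE (x,y) -> CARTESIAN (X,Y), using integer division as described.
Coord StrangeToCartesian(Coord s)
{
    return { s.x - s.y / 2, s.x + s.y / 2 };
}

// CARTESIAN (X,Y) -> STRANGE (x,y).
Coord CartesianToStrange(Coord c)
{
    return { (c.x + c.y) / 2, c.y - c.x };
}
```

Moving five units north is then StrangeToCartesian, add 5 to the cartesian Y... er, to the coordinate that represents north in your convention, and CartesianToStrange back, with no looping involved.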

#4853393 MSVC++ paths in #pragma

Posted by clb on 24 August 2011 - 03:23 PM

Try escaping the '\' characters, or try using forward-slashes, e.g.

#pragma comment( lib, "C:\\Folder\\SomeKindOfLib.lib" ) // or

#pragma comment( lib, "C:/Folder/SomeKindOfLib.lib" )

#4853385 Need some advice on an algorithm building a lookup list

Posted by clb on 24 August 2011 - 03:05 PM

I am using an isometric slide map style to draw my game. One of the problems with doing this is that the coordinate system is a bit different from a standard X/Y axis for figuring out movement etc.

To figure out distance you need to do tilewalks because as shown in this picture the coordinate system is a bit strange!

I wrote a function to do a standard tilewalk that, given a set of coordinates, returns the next space you would be on depending on what direction you move (compass directions)

But the problem is as you go up in movements it takes way more processing, exponentially more, ...

I really would recommend redoing your coordinate system in what you referred to as the "standard X/Y axis" format. That will allow you to optimize the distance function from exponential time to constant time!

Don't let that 45-degree rotation in rendering distract the logic part. Just imagine your map rotated back 45 degrees: that means your NORTH is towards the up-right on the map, EAST towards the down-right, SOUTH towards the down-left, and WEST towards the up-left. So if you go NORTH, you just do y++; EAST = x++, SOUTH = y--, and WEST = x--.

#4851822 Constrain a point within a circle with sliding behaviour

Posted by clb on 21 August 2011 - 01:14 AM

It's a 2D problem involving a simple point, with direction and speed, that must be contained within a (very) large circle. If in a single frame the point collides/intersects with the circumference of the circle, then any 'distance' remaining to travel should 'slide' it around the circumference, with respect to its original direction.

I'm contemplating perhaps using the tangent to calculate the final position (which would be outside of the circle), then using the line from the final position to the center of the circle to intersect with the circle circumference and move the point back within the circle. This, although not accurate, would be perfectly suitable for my needs, as it's not a simulation, just a mechanic to keep a game object within a circle and let it slide around.

I think you can use this method and solve the distance to travel exactly. If we denote by A the point before movement, by B the point after movement (outside the circle), and by O the circle with center P and radius R, then

1. Adapt this to compute the intersection of the line segment (A,B) and the circle O. Denote the intersection point by C.
2. Determine whether you need to go clockwise or counterclockwise along the circle: take the perp-dot product (e.g. here, but don't invoke sin(); use the method of replacing (x,y) with (-y, x) to generate a perpendicular vector to dot against) between the vectors normalized(B-C) and normalized(C-P), and depending on the sign of the result, you go CW or CCW.
3. Compute d=|C-B| to get the distance you need to travel along the circle perimeter. Generate a standard 2D rotation matrix M to rotate about origin. Use as the rotation angle the angle +/- d/R, depending on the sign of the perp-dot product.
4. Now the final point that has the sliding applied is M * (C-P)+P.
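The four steps above can be sketched as follows (a minimal sketch with illustrative names; it assumes A is inside the circle and B outside, so a segment/circle intersection exists, and it uses the equivalent dot-against-tangent form of the perp-dot test):

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

static Vec2 Sub(Vec2 a, Vec2 b) { return { a.x - b.x, a.y - b.y }; }
static float Dot(Vec2 a, Vec2 b) { return a.x*b.x + a.y*b.y; }
static float Length(Vec2 a) { return std::sqrt(Dot(a, a)); }

// A: start point (inside), B: attempted end point (outside),
// P: circle center, R: circle radius.
Vec2 SlideAlongCircle(Vec2 A, Vec2 B, Vec2 P, float R)
{
    // 1. Intersect the segment A->B with the circle: solve |A + t*(B-A) - P| = R.
    Vec2 d = Sub(B, A), f = Sub(A, P);
    float a = Dot(d, d), b = 2*Dot(f, d), c = Dot(f, f) - R*R;
    float t = (-b + std::sqrt(b*b - 4*a*c)) / (2*a); // the exit intersection
    Vec2 C = { A.x + t*d.x, A.y + t*d.y };

    // 2. Choose CW or CCW: compare the leftover movement direction B-C
    //    against the CCW tangent at C, which is perp(C-P) = (-(C-P).y, (C-P).x).
    Vec2 r = Sub(C, P);
    Vec2 tangentCCW = { -r.y, r.x };
    float sign = Dot(Sub(B, C), tangentCCW) >= 0 ? 1.0f : -1.0f;

    // 3. The arc length to travel is |C-B|; the rotation angle is +/- |C-B|/R.
    float angle = sign * Length(Sub(C, B)) / R;

    // 4. Rotate C-P about the circle center by that angle: M * (C-P) + P.
    float cs = std::cos(angle), sn = std::sin(angle);
    return { P.x + cs*r.x - sn*r.y, P.y + sn*r.x + cs*r.y };
}
```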

#4851545 Bounding Sphere from OBB

Posted by clb on 20 August 2011 - 04:47 AM

So you are saying that with any minimum OBB, taking the length of the vector formed by the 3 extents will give a minimum bounding sphere?

This statement is slightly ambiguous, and right or wrong depending on how you interpret it. If you have an object M (such as the character mesh in your picture), and the minimal enclosing OBB O for the object M, then the following statements hold:

1) The sphere S which minimally encloses O is computed by taking the center of O as the center of sphere, and using as the radius half the length of the diagonal of O.

2) Even if S computed like above minimally encloses O, it most often is not the minimal enclosing sphere for the object M. (It can be quite bad, as you noticed)

So you see that the "minimally enclosing" property is not transitive. If S minimally encloses O, and O minimally encloses M, then it does not follow that S also minimally encloses M.
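Statement 1) is a one-liner in code. Since the OBB axes are orthonormal, the half-diagonal length is just the length of the half-extents vector (a minimal sketch, names illustrative):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Radius of the sphere that minimally encloses an OBB with the given
// half-extents; the sphere center is the OBB center.
float EnclosingSphereRadius(Vec3 halfExtents)
{
    return std::sqrt(halfExtents.x*halfExtents.x +
                     halfExtents.y*halfExtents.y +
                     halfExtents.z*halfExtents.z);
}
```

Remember that, per statement 2), this sphere is tight around O but generally not tight around the object M inside it.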

To produce a minimal enclosing sphere, see for example the Miniball code, or Welzl's algorithm. For a fast and very good approximation, see Ritter's algorithm.

#4851502 Android C++ Development

Posted by clb on 20 August 2011 - 12:45 AM

Nowadays you can do almost everything just from the C++ side using the Android NDK. The amount of stuff you can do depends somewhat on which version of the platform you are targeting. There is a native-activity sample in the NDK demos, which is an example of how to develop an application completely without a Java activity wrapper. (See the folder android-ndk\samples\native-activity under where you installed the NDK.) Read the files in the docs folder of the NDK to see what can be done. Unfortunately, they haven't yet documented all of the stable APIs in the NDK, so you'll have to dig through the header files directly to find out which library functions exist (see android-ndk\platforms\android-9\arch-arm\usr\include). Fortunately, the .h files themselves are somewhat documented.

In the worst case, if there is no API to do something directly from the C++ side, you can fall back and write the missing functionality in Java, and then interop with those functions from the C++ side, so you are never completely blocked from accessing any API in the system.

For C++ side development, they have STLport to provide STL support, and I think RTTI support also came in recently. See android-ndk\docs\CPLUSPLUS-SUPPORT.html for more information on the C++ feature support.

The NDK uses its own build script system on top of the gcc toolchain. A good step to start is to try to get the NDK samples to build and run on your device. native-activity and native-plasma are ones which do not contain any java code. Go through the different build script files and try to figure out what commands you need to run to build and start the applications on your device (documented at the Android NDK page).

I am using vs-android for development, which does have some oddities, but all in all works very nicely, and makes the workflow a lot easier.

#4851498 help: advancing part of a frame

Posted by clb on 20 August 2011 - 12:26 AM

Having detected that a collision has happened, I need to back up the objects to somewhere between the previous frame (where the objects were not in collision) and the current frame (where they are in collision). I have the transformation matrix for last frame and for this frame. The question is, can I somehow create a new transformation matrix that is effectively a part way "interpolation" between the previous frame transformation matrix and the current frame transformation matrix (for some arbitrary fraction of 1 frame time)?

The usual way to approach this problem is to not do this backing up at the matrix level, but to go back to the physics subsystem, and apply a physics update with half the delta-time of the whole frame. Of course you can interpret the data from matrices, but it will require some conversions.

The problem is this. The transformation matrix is usually modified by only one or two transformations (one rotation and/or one translation). However, there is no limitation on how many rotations and translations can be performed between frames. Therefore, to try the alternative to "interpolating the matrix" I would have to queue up some representation of all rotations and translations (at least) between frames and keep them available for all objects, just in case they encountered a collision next frame.

Clearly it would be more convenient to interpolate matrices if that works.

The usual way to interpolate between matrices which represent affine orthogonal bases is to decompose the matrix into a series of translation (float3), rotation (quaternion) and scale (float, or float3 depending on whether you allow nonuniform scale). Then, you linearly interpolate the float3's, and slerp the quaternions. To decompose a matrix into scale, rotate and translate (assuming no shear), you read the 3x1 translation vector from the top three elements of the fourth column (transpose that if working in v*M notation, like in D3D), then extract the magnitudes of the first three columns for scale, and finally convert the normalized 3x3 upper part to a quaternion (see Q55 at this page).

After interpolating the components individually, you compose the interpolated data back to a matrix form.
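The interpolation step itself can be sketched as below, assuming the two matrices have already been decomposed into translation, rotation quaternion, and scale as described above (all names are illustrative; the matrix (de)composition itself is omitted):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Linear interpolation for the translation (and scale) components.
Vec3 Lerp(Vec3 a, Vec3 b, float t)
{
    return { a.x + (b.x-a.x)*t, a.y + (b.y-a.y)*t, a.z + (b.z-a.z)*t };
}

// Spherical linear interpolation for the rotation component.
Quat Slerp(Quat a, Quat b, float t)
{
    float d = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
    if (d < 0) { b = { -b.x, -b.y, -b.z, -b.w }; d = -d; } // take the shorter arc
    float wa, wb;
    if (d > 0.9995f) { wa = 1 - t; wb = t; } // nearly parallel: fall back to lerp
    else
    {
        float theta = std::acos(d);
        wa = std::sin((1 - t)*theta) / std::sin(theta);
        wb = std::sin(t*theta) / std::sin(theta);
    }
    return { wa*a.x + wb*b.x, wa*a.y + wb*b.y, wa*a.z + wb*b.z, wa*a.w + wb*b.w };
}
```

After computing the lerped translation/scale and the slerped quaternion for the desired fraction of the frame, recompose them into a matrix the same way you normally build one from translate/rotate/scale.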

I guess the alternative is to interpolate the intermediate matrix, if that is more practical. In other words, during any frame, all rotations and translations are applied to a clean [identity] matrix to accumulate the series of rotations and translations. When it is time to translate the vertices, that intermediate matrix is multiplied by the main transformation matrix that transforms local-coordinates into world-coordinates. Since that intermediate matrix ONLY contains information about the transformations happening this frame, perhaps that intermediate matrix is an easier item to interpolate.

I don't think working on an identity matrix for the duration of that frame makes it any easier: if we denote by A the local->world matrix at the start of the frame, and by B the local->world matrix at the end of the frame, then you can compute B * A^{-1} to get the difference between A and B, which is exactly the series of transformations you applied to A during this frame to get B. So you don't need to explicitly store a separate clean identity matrix on top of which to apply the per-frame transformations; you can always compute it if you know A and B.

#4837329 FBX Models

Posted by clb on 19 July 2011 - 05:56 AM

I've heard countless comments of "it's slow", "it's not meant for that", etc. Can anyone point out why it's "slow" for loading or "not meant for use in a 3D game engine" over, say, the .x format? You can save FBX as ASCII and look at the contents in Notepad, and they don't look any more "complex" than .x (yet nobody says "don't use .x!!!").

Another question I have: XNA uses FBX, and Unity has a default FBX importer built in (so you can just drop FBX models into Unity and they'll work properly). Maybe I'm just misinformed... Also, browsing through the content packs of some top games (in the model viewers), e.g. WoW or Torchlight, I see they're littered with FBX models. So uh... these games are actually saving the FBX model to disc, loading it and then saving it again in another format? That doesn't sound very "AAA" company like...(?)

Performance is relative. Something that's fast for someone can mean slow for someone else.

FBX stored in ascii format, (just like .x can be ascii or .obj can be ascii) has the advantage that it is easy to parse (== easy'ish to write a loader for, and can be debugged/modified using notepad to some extent), but for runtime loading purposes, ascii formats are suboptimal compared to binary formats.

A good development file format can be different than a good end-use release file format. At development time, it is useful if you can load files of any types. For Unity and other middleware, it is advantageous to have loaders for about everything, so that the user has the flexibility to produce content in whatever format their CAD packages can provide. In Unity, for the shipped release version, the input files are run through a content builder, which packages the data to a format that is optimized for loading into Unity mesh buffers.

To get an optimal performance format, the usual simple mechanism is to generate a file format that is just a dump of the raw data you feed to your internal Mesh/Texture/Shader/... data structures. To load these binary files, you don't do much else than a single fread() to stream in the header, followed by a fread() to read in the actual per-vertex data, and that's it. Because this method will not involve any kind of per-element parsing (e.g. ascii->binary) and is not filled with small-sized file loads (e.g. fscanf to get individual lines at a time), loading is an order-of-magnitude faster.
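The "raw dump" idea can be sketched like this: a fixed-size header followed by the vertex data verbatim, so loading is just two fread() calls. (The struct layouts and file name are made up for the example; a real format would also want a magic number and version field.)

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

struct MeshFileHeader
{
    unsigned int numVertices;
    unsigned int vertexSizeBytes; // sanity check against the runtime struct
};

struct Vertex { float pos[3]; float uv[2]; };

// Dumps the header and the vertex array back-to-back into the file.
bool SaveMesh(const char *filename, const std::vector<Vertex> &vertices)
{
    FILE *f = std::fopen(filename, "wb");
    if (!f) return false;
    MeshFileHeader header = { (unsigned int)vertices.size(), sizeof(Vertex) };
    std::fwrite(&header, sizeof(header), 1, f);
    std::fwrite(vertices.data(), sizeof(Vertex), vertices.size(), f);
    std::fclose(f);
    return true;
}

// Loading is one fread for the header and one for the whole vertex block:
// no per-element parsing at all.
bool LoadMesh(const char *filename, std::vector<Vertex> &vertices)
{
    FILE *f = std::fopen(filename, "rb");
    if (!f) return false;
    MeshFileHeader header;
    if (std::fread(&header, sizeof(header), 1, f) != 1 ||
        header.vertexSizeBytes != sizeof(Vertex)) { std::fclose(f); return false; }
    vertices.resize(header.numVertices);
    size_t read = std::fread(vertices.data(), sizeof(Vertex), vertices.size(), f);
    std::fclose(f);
    return read == vertices.size();
}
```

Note that this ties the file to the struct's in-memory layout (endianness, padding, field order), which is exactly why it loads fast and also why versioning such formats takes more care.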

Binary formats do have their drawbacks, e.g. supporting multiple versions can be more difficult than with ascii data.

Someone mentioned 0.2s loading times for .fbx meshes. If that sounds adequate, then it probably is. If you're looking to stream in meshes at runtime, it's most likely not fast enough (0.2s == 12 application frames if running at 60Hz).

For FBX, I'll restate these two comments from the previous posts:

"Some of the "bigger complaints" I've read is that it's not an "open" source."
"I also should mention that some malformed files caused the SDK to crash."

This can be a huge deal for some cases, or depending on your game, it might not mean a thing.

#4810999 Passing vector to a new thread

Posted by clb on 15 May 2011 - 02:34 AM

If you are referring to the Win32 API CreateThread() function, then you can use the fourth parameter lpParameter to pass arbitrary data into the thread as a pointer to void. This is a far more preferable way than using global variables to pass data to a new thread.

Since you can only pass in a pointer to a single object, you have to group the data to a single struct if you need to pass in multiple variables. What you can do is create something like
struct ThreadCreationParameters
{
   std::vector<int> myIntArray;
   int someOtherData;
   std::string someMoreData;
};

void StartThread()
{
   ThreadCreationParameters *params = new ThreadCreationParameters;
   // Populate contents of params here.
   CreateThread(..., params, ...);
}

DWORD WINAPI MyThreadMain(LPVOID param)
{
   ThreadCreationParameters *params = reinterpret_cast<ThreadCreationParameters *>(param);
   // Now we can access the parameters with params->xxx

   // Remember to delete params after we are done.
   delete params;
   return 0;
}

Note that it is important that you dynamically allocate the ThreadCreationParameters structure instead of placing it into the stack, since after exiting StartThread(), the stack would be cleared, and MyThreadMain would be accessing freed memory.

Also, remember that any variables you pass in ThreadCreationParameters by pointer or by reference need to be appropriately guarded with a synchronization primitive if both threads will be accessing the data.

#568749 Bin packing, texture atlasing, glyph caching, how do you do it?

Posted by clb on 19 April 2010 - 12:29 PM


The topic of bin packing comes up so often at GameDev that we have a thread about it roughly every month or two. Rectangular bin packing comes up in the contexts of light mapping, texture atlasing, font glyph caching, sprite packing and RPG-like Tetris inventory organization. Most of the veterans here are surely familiar with this topic.

Last August it was my turn to approach this problem, and around the same time, I had to write a smallish academic study on a topic of my choice. Now that the stuff is finally done, I thought I'd share the results here, as so many people work on these things. Hopefully someone can find this material useful.

The results of my study are available here: A Thousand Ways to Pack the Bin - A Practical Approach to Two-Dimensional Rectangle Bin Packing (pdf).

Along the way, I've written three blog posts to accompany the paper:
Rectangle Bin Packing: The recursive GUILLOTINE algorithm, a sample video.
More Rectangle Bin Packing: Videos and explanatory pictures about the SHELF, (iterative) GUILLOTINE, MAXRECTS and SKYLINE variants.
Even More Rectangle Bin Packing: Downloadable source code, links to resources.

Any feedback on the contents is greatly appreciated.

Here's a short catalog of some of the more informative threads that have come up here at GameDev:
6/15/2003, Algorithm to pack items into container: Algorithm discussion, Shelf method.
5/11/2006, Rectangle packing: jyk shares his code that implements the Guillotine method.
4/28/2007, Fitting small rectangles into a large rectangle: utilae shares his implementation of a bin packer. Some discussion about optimally solving bin packing.
11/2/2007, Need AI to fit rectangles in a larger square: Algorithm discussion.
2/5/2008, Automatic Texture Atlas Generation: Discussion on rectangle packing atlases.
11/22/2008, Packing rectangles into a rectangle: Q&A on rectangle packing and jyk's code.
12/23/2008, How to merge multi small pngs into a single one with smallest size?: Performance considerations.
5/24/2009, Allocating a texture large enough to hold all font glyphs: Font glyph caching.
8/17/2009, Lightmap generation with Radiosity: Bin packing in the context of lightmapping.
8/23/2009, Freetype: Font glyph caching. My initial results on bin packing.

The motivation of this post is not only to sum up a list of links from my behalf, but to ask all you guys, how do you do it?

Now, there are quite a lot of links in this post. No need to bother going through all of them before replying, as that could take quite a lot of time! If you have your own material on bin packing, please share it here! Code, knowledge, or pretty pictures at least? :)