

#5242504 Understanding Emscripten

Posted by clb on 24 July 2015 - 04:31 PM

I still can't wrap my head around how Fastcomp fits into the picture if asm.js also generates JS. Is Fastcomp used by asm.js to actually generate the JS? It sounds like Fastcomp was a replacement for the compiler core by Emscripten.


Historically, the final Emscripten JavaScript code generation pass was written as a JavaScript command line application (executed in the node.js shell). After a while this became a performance bottleneck (mostly due to having to parse LLVM IR bitcode files in JS), so we needed to optimize the JavaScript code generation pass. This was done by implementing the final JS code generation in C/C++, directly as a backend for LLVM (note that this is not currently a tablegenned backend, but similar to the LLVM CppBackend pass). The project lived in the github branch "fastcomp", and the name stuck around as the colloquial term for the portion of the Emscripten compiler that is implemented as part of the LLVM toolchain, as opposed to the Emscripten frontend, which is the collection of other Python and JS scripts that make up the rest of the Emscripten toolchain.


Currently "fastcomp" is the only supported code generation mode, but for a while we supported both the old and new code generation modes in parallel. Also, fastcomp is only able to target the asm.js style of JavaScript. The old pre-fastcomp JS compiler backend was able to target both asm.js and non-asm.js styles of JavaScript, in order to give projects a managed migration path from the very old non-asm.js codegen to the more performant asm.js codegen. Today there is no reason to use the old non-fastcomp compiler backend anymore, and it has long since been deprecated and removed from the Emscripten code tree, as the fastcomp backend has matured enough to be the only codegen path.

#5242356 Capture what is Rendered in OpenGL

Posted by clb on 24 July 2015 - 04:16 AM

When you say "capture", there's still the question of where you want to capture it to. Do you want to capture it to GPU memory for further GPU processing, or back to CPU memory, e.g. for saving to disk or for other CPU-side processing?


If you want to capture the data to GPU memory for further GPU operations, the best way is to create a texture or a renderbuffer, attach it to an FBO, and render the content directly to that FBO. After that you have the contents in a texture and can perform whatever GPU-side operations on it you want.


If you want to capture the data to CPU memory, then glReadPixels is your only choice if you are looking to use only core OpenGL ES 2 without assuming any extensions. Calling glReadPixels on the default backbuffer can be very heavy, so it might be useful to use an FBO here as well, and read back the contents of the FBO instead of the default backbuffer.


If you are looking to use OpenGL ES 3, or have the OpenGL ES 2 https://www.khronos.org/registry/gles/extensions/NV/NV_pixel_buffer_object.txt extension, then you can use PBOs (pixel buffer objects) as an optimized alternative to glReadPixels. (See the section "Asynchronous glReadPixels" in that extension's description.)

#5241940 How to find projection plane for 3d points

Posted by clb on 22 July 2015 - 09:24 AM

"Nicely" is a bit informal, but if maximizing the spread of the points on the projection plane is important, then two approaches come to mind:

  - Use PCA to compute the direction of maximum spread and use the axis information to form the plane, e.g. https://en.wikipedia.org/wiki/Principal_component_analysis and http://setosa.io/ev/principal-component-analysis/

  - First compute the diameter of the 3D point set, http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/intro.htm to find the first direction for one of the axes of the plane, and then form the second axis by finding the diameter of the 2D point set that is formed by projecting along the first chosen axis.


These might give different kinds of planes compared to Newell's method.
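The second approach can be sketched in a few lines. This is purely illustrative (the minimal Vec3 type and function names are my own, and the O(n^2) farthest-pair search stands in for the faster diameter algorithms behind the link):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 3D vector type for illustration.
struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 normalized(Vec3 v) { double l = std::sqrt(dot(v, v)); return {v.x/l, v.y/l, v.z/l}; }

// First plane axis: the direction of the farthest point pair (brute force O(n^2);
// the diameter algorithms linked above compute this faster for large sets).
Vec3 diameterDirection(const std::vector<Vec3> &pts) {
    double best = 0.0;
    Vec3 dir = {1, 0, 0};
    for (size_t i = 0; i < pts.size(); ++i)
        for (size_t j = i + 1; j < pts.size(); ++j) {
            Vec3 d = sub(pts[j], pts[i]);
            double l = dot(d, d);
            if (l > best) { best = l; dir = d; }
        }
    return normalized(dir);
}

// Second plane axis: the diameter of the point set projected onto the plane
// orthogonal to the first axis u.
Vec3 secondAxis(const std::vector<Vec3> &pts, Vec3 u) {
    std::vector<Vec3> proj;
    for (Vec3 p : pts) {
        double t = dot(p, u);
        proj.push_back(sub(p, {u.x*t, u.y*t, u.z*t}));  // remove the u component
    }
    Vec3 v = diameterDirection(proj);
    double t = dot(v, u);  // re-orthogonalize against u to guard against drift
    return normalized(sub(v, {u.x*t, u.y*t, u.z*t}));
}
```

The plane is then spanned by the two returned axes, anchored e.g. at the centroid of the points.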

#5240143 Computex 3D box coordinates from min() and max() points

Posted by clb on 13 July 2015 - 03:55 PM

As said above, if you have an oriented box in 3D space, i.e. one whose edges don't line up with the coordinate axes, and you only know two corners (diagonally opposite or not), there is still not enough information to compute the complete box. Not even when the box is a cube (all side lengths the same).


If you do have a rotation matrix or a quaternion representing the orientation of the box, in addition to knowing two diagonally opposite corner coordinates, then you can compute the other six corners of the oriented box. One way to do this is to compute the three world-space axis directions of the box from the rotation matrix (the columns or rows of the matrix, depending on your math conventions), then project the two corner points onto each of these directions in turn and take the extents.
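A sketch of that reconstruction, using a hypothetical minimal Vec3, where u, v, w are the three orthonormal axis directions read off the rotation matrix:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Vec3 { double x, y, z; };
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 scale(Vec3 v, double s) { return {v.x*s, v.y*s, v.z*s}; }
static Vec3 add3(Vec3 a, Vec3 b, Vec3 c) { return {a.x+b.x+c.x, a.y+b.y+c.y, a.z+b.z+c.z}; }

// Given two diagonally opposite corners a and b of an oriented box, and the
// box's three orthonormal world-space axis directions u, v, w, reconstruct all
// eight corners by projecting the two known corners onto each axis and taking
// the min/max extent per axis.
std::array<Vec3, 8> obbCorners(Vec3 a, Vec3 b, Vec3 u, Vec3 v, Vec3 w) {
    double lo[3] = { std::min(dot(a,u), dot(b,u)),
                     std::min(dot(a,v), dot(b,v)),
                     std::min(dot(a,w), dot(b,w)) };
    double hi[3] = { std::max(dot(a,u), dot(b,u)),
                     std::max(dot(a,v), dot(b,v)),
                     std::max(dot(a,w), dot(b,w)) };
    std::array<Vec3, 8> c;
    for (int i = 0; i < 8; ++i)  // bit k of i selects the min or max extent along axis k
        c[i] = add3(scale(u, (i & 1) ? hi[0] : lo[0]),
                    scale(v, (i & 2) ? hi[1] : lo[1]),
                    scale(w, (i & 4) ? hi[2] : lo[2]));
    return c;
}
```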

#5240142 Picking a Random Uniformly Distributed Point in a Circle

Posted by clb on 13 July 2015 - 03:48 PM

For some random sampling problems, rejection sampling ends up being the fastest and most elegant solution in practice. Related to this, the article states


You can use a while loop if you prefer but there is no way to know how long this operation will take. This might be acceptable if you are using the box method during an initialization process but if you need to pick random points in a circle per frame or in an update loop, there are much better ways of doing it.



however, I would have liked to see a proof of why using a square root is better than using a loop. The first intuition is most often, as presented in the article, that the loop must be slow, since there's no way to know how many iterations it will take. But in practice each iteration succeeds with probability A(circle)/A(box) = pi/4, so the probability that one takes more than e.g. three iterations is (1 - pi/4)^3 < 1%, and the rate diminishes very quickly from there. It is not immediately obvious that using a square root is faster than a loop which most of the time takes very few iterations, so it would be nice to see a practical benchmark to prove the superiority of the square root.


There exist other distributions which can be analytically reshaped from the uniform distribution, as shown here, but very often they end up so complicated at runtime that simpler iterative methods outrun them in performance.


One practical remark on rejection sampling, not just for this problem but in general: it is a good idea to have a fail-safe maximum number of iterations, e.g. 1000 or so, instead of blindly looping with for(;;). This protects against the case where the program later has to interoperate with unknown random number generators that might have a bug and produce a faulty stream under some conditions (e.g. generating the same number again and again, like a stream of zeroes), or where external calling code has a bug and passes in a circle radius of zero or negative, or something alike.
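Both methods side by side, as an illustrative sketch for the unit circle (the function names and the cap value of 1000 are my own):

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Rejection sampling with a fail-safe iteration cap, as discussed above.
// Each iteration succeeds with probability pi/4, so hitting the cap is
// astronomically unlikely with a healthy RNG.
bool randomPointInCircleRejection(std::mt19937 &rng, double &x, double &y) {
    std::uniform_real_distribution<double> d(-1.0, 1.0);
    for (int i = 0; i < 1000; ++i) {  // fail-safe cap instead of for(;;)
        x = d(rng); y = d(rng);
        if (x*x + y*y <= 1.0)
            return true;
    }
    return false;  // only reachable if the RNG stream is broken
}

// The analytic method from the article: r = sqrt(u) reshapes the radial
// distribution so the area density becomes uniform.
void randomPointInCircleSqrt(std::mt19937 &rng, double &x, double &y) {
    const double kPi = 3.14159265358979323846;
    std::uniform_real_distribution<double> d(0.0, 1.0);
    double r = std::sqrt(d(rng));
    double theta = 2.0 * kPi * d(rng);
    x = r * std::cos(theta);
    y = r * std::sin(theta);
}
```

Timing these two against each other would answer the benchmark question above.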

#5240137 Compute cube edges in screenspace

Posted by clb on 13 July 2015 - 03:28 PM

What I'd do is transform all the 8 corner points of the cube to 2D screen space, and then use a 2D convex hull algorithm to recompute which edges are the outside edges of the projected silhouette. That solution has the appeal of using only generic "stock" algorithms (3D->2D projection, and 2D convex hull), so it will be very easy to reason about and maintain in the long run.
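For the 2D convex hull step, something like Andrew's monotone chain works fine on the 8 projected corners; an illustrative sketch (Pt and convexHull are my own names):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pt { double x, y; };

// Cross product of (a - o) and (b - o): positive when o->a->b turns counter-clockwise.
static double cross2(Pt o, Pt a, Pt b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain; returns hull vertices in counter-clockwise order.
// Feed it the 8 projected cube corners: consecutive hull vertices are then
// exactly the silhouette edges. Assumes at least 3 non-collinear points.
std::vector<Pt> convexHull(std::vector<Pt> pts) {
    std::sort(pts.begin(), pts.end(), [](Pt a, Pt b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    std::vector<Pt> h(2 * pts.size());
    size_t k = 0;
    for (size_t i = 0; i < pts.size(); ++i) {  // build lower hull
        while (k >= 2 && cross2(h[k-2], h[k-1], pts[i]) <= 0) --k;
        h[k++] = pts[i];
    }
    for (size_t i = pts.size() - 1, t = k + 1; i > 0; --i) {  // build upper hull
        while (k >= t && cross2(h[k-2], h[k-1], pts[i-1]) <= 0) --k;
        h[k++] = pts[i-1];
    }
    h.resize(k - 1);  // last point equals the first, drop it
    return h;
}
```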

#5235493 recomputing an AABB for a model (opengl)

Posted by clb on 18 June 2015 - 08:55 AM

Hey all,


I have been working on creating an AABB class that can recompute the AABB of an object with the original AABB points when the model is first loaded in.


To clarify: you have a set of points in 3D, and you want to compute the minimum axis aligned box that encloses them? Then you can easily compute that by first making the AABB span an "infinitely negative" size, and then iterating over the points, enclosing them one by one. Here is some code that does that: https://github.com/juj/MathGeoLib/blob/master/src/Geometry/AABB.cpp#L93
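The idea in that code can be sketched roughly like this (a stripped-down stand-in for the linked MathGeoLib class, not its actual source):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

// Start from an "infinitely negative" box (min = +inf, max = -inf), then grow
// it to enclose each point in turn. The first Enclose() call snaps the box to
// that point; later calls only extend it where needed.
struct AABB {
    Vec3 minPt, maxPt;

    void SetNegativeInfinity() {
        float inf = std::numeric_limits<float>::infinity();
        minPt = { inf,  inf,  inf};
        maxPt = {-inf, -inf, -inf};
    }
    void Enclose(const Vec3 &p) {
        minPt.x = std::min(minPt.x, p.x); maxPt.x = std::max(maxPt.x, p.x);
        minPt.y = std::min(minPt.y, p.y); maxPt.y = std::max(maxPt.y, p.y);
        minPt.z = std::min(minPt.z, p.z); maxPt.z = std::max(maxPt.z, p.z);
    }
    void SetFrom(const std::vector<Vec3> &pts) {
        SetNegativeInfinity();
        for (const Vec3 &p : pts)
            Enclose(p);
    }
};
```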


I'm not sure how rotations, Euler angles and quaternions relate to that, however, so perhaps that is not the problem; instead, maybe you want to rotate an AABB and then recompute a new AABB that encloses the rotated AABB? If so, you can achieve that as shown by the code here: https://github.com/juj/MathGeoLib/blob/master/src/Geometry/AABB.cpp#L471
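The rotated-AABB case can be sketched by transforming all 8 corners and enclosing them, roughly like this (my own minimal types; the linked MathGeoLib code implements the same idea with fewer operations):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat3 { float m[3][3]; };  // row-major 3x3 rotation matrix

static Vec3 mul(const Mat3 &r, Vec3 v) {
    return { r.m[0][0]*v.x + r.m[0][1]*v.y + r.m[0][2]*v.z,
             r.m[1][0]*v.x + r.m[1][1]*v.y + r.m[1][2]*v.z,
             r.m[2][0]*v.x + r.m[2][1]*v.y + r.m[2][2]*v.z };
}

// Rotate the AABB [mn, mx] and compute the tightest axis-aligned box enclosing
// the rotated box: transform all 8 corners and take the per-axis min/max.
void rotatedEnclosingAABB(Vec3 mn, Vec3 mx, const Mat3 &rot, Vec3 &outMin, Vec3 &outMax) {
    outMin = { 1e30f,  1e30f,  1e30f};
    outMax = {-1e30f, -1e30f, -1e30f};
    for (int i = 0; i < 8; ++i) {
        Vec3 corner = { (i & 1) ? mx.x : mn.x,   // bit k of i selects min or max
                        (i & 2) ? mx.y : mn.y,   // along axis k
                        (i & 4) ? mx.z : mn.z };
        Vec3 p = mul(rot, corner);
        outMin.x = std::min(outMin.x, p.x); outMax.x = std::max(outMax.x, p.x);
        outMin.y = std::min(outMin.y, p.y); outMax.y = std::max(outMax.y, p.y);
        outMin.z = std::min(outMin.z, p.z); outMax.z = std::max(outMax.z, p.z);
    }
}
```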

#5232539 An Exact Algorithm for Finding Minimum Oriented Bounding Boxes

Posted by clb on 03 June 2015 - 04:12 AM

You say you're not an academic, but the standard of work seems up to it. Anything in particular inspire you to present your findings so rigorously?




Thanks! I do have a Master's degree in Mathematics (algebra and group theory), but then took a software engineering path instead of pursuing an academic research career. Several of my friends are researchers in mathematics and robotics, so I guess the atmosphere is still around :)

Initially I figured it would have been nice as a personal merit to get a publication out of it, but looking at it more, in some journals one has to pay a fee, they might limit your rights and sharing options to the work you did, plus I was shocked how long the review periods are. If I had wanted to pursue that option, I think the paper would have been out around the end of the year at the earliest, and suddenly it did not feel that interesting anymore.

The journal I initially submitted it to did conditionally accept the paper for later publication, if I shortened it while adding a significant new body of benchmark comparisons against the existing heuristic and approximation algorithms out there. It was a valid point, since for new techniques there are never enough comparisons to begin with, but I think the paper already has solid enough data to be worth reporting, and I had to draw the line somewhere, especially since I could only work on this outside my day job on weekends and holidays. There is plenty of room for further study on the tradeoffs of optimality versus speed, but I wanted to keep this paper focused and not get the theoretical treatment bogged down with such comparisons.

In conclusion, given that my earlier bin packing survey also "got out there" simply by posting the results to my website, I am sure that this one will as well. Plus this way I am able to publish an interactive program along with the paper, which one could not do with traditional media :)

#5232274 An Exact Algorithm for Finding Minimum Oriented Bounding Boxes

Posted by clb on 01 June 2015 - 07:01 PM



I'd like to present a research paper that I worked on earlier this year. It relates to a common problem in game/graphics dev so it might be useful to some of you guys as well.


In short: I have implemented a new algorithm for finding tight-fitting enclosing OBBs for 3D meshes. For suitably "nice" inputs, it runs in O(n * sqrt(n) * log^2(n)) time, compared to the previous best O(n^3) algorithm by O'Rourke. For degenerate/adversarial inputs, it runs in worst case O(n^3 * log n) time. A formal proof of the optimality of this new algorithm is unfortunately beyond reach, but for all tested cases, the algorithm has been able to find the optimal minimum volume OBB.


Find the links here:

I hope you find this interesting and useful!

#5212574 Collision resolution with walls

Posted by clb on 23 February 2015 - 05:51 PM

Definitely decouple collision testing from keyboard input, and compute those two completely separate from each other.


That is, when processing keyboard input, use the pressed keys to define an acceleration/impulse/velocity/position update for your object (whatever type of motion you want to model), and then, in collision detection and response, don't read the pressed keys at all. If necessary, you can store the previous position of the object, but just don't read the keys.
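A minimal sketch of that decoupling, with hypothetical names and a 1D "wall" interval for brevity:

```cpp
#include <cassert>

// Input only produces intent (a velocity); collision response only looks at
// positions, never at keys.
struct Player { float x, velX; };

// Step 1: input handling writes the desired velocity. (keyLeft/keyRight are
// placeholders for however your engine exposes key state.)
void applyInput(Player &p, bool keyLeft, bool keyRight, float speed) {
    p.velX = 0.0f;
    if (keyLeft)  p.velX -= speed;
    if (keyRight) p.velX += speed;
}

// Step 2: integration and collision response know nothing about the keyboard.
// Here the "wall" constraint keeps the player inside [wallMin, wallMax].
void integrateAndCollide(Player &p, float dt, float wallMin, float wallMax) {
    p.x += p.velX * dt;
    if (p.x < wallMin) p.x = wallMin;  // push back out of the wall
    if (p.x > wallMax) p.x = wallMax;
}
```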

#5212573 Non axis aligned collision detection

Posted by clb on 23 February 2015 - 05:48 PM

http://www.realtimerendering.com/intersections.html is a good source of links on intersection tests.

Since you say you know how to test for intersection when the objects are axis aligned, I presume you mean testing against axis-aligned bounding boxes (AABB); otherwise that statement makes little sense. There exist fast and simple algorithms for performing collision detection between oriented bounding boxes as well, see e.g. http://clb.demon.fi/MathGeoLib/nightly/docs/OBB_Intersects.php .

For intersecting more complex shapes, one typically uses convex polyhedra, to which the GJK and SAT algorithms apply, see e.g. https://github.com/juj/MathGeoLib/tree/master/src/Algorithm . Intersecting arbitrary concave polyhedra in 3D becomes an art in itself, for which spatial query structures like OBBTree and RAPID can be used, but these are not very fast for real-time use, and therefore not recommended.

#5158908 glDrawElements invalid operation

Posted by clb on 07 June 2014 - 08:00 AM

Try adding an assert(vao && numIndices >= 0 && numIndices % 3 == 0); in 'void Mesh::Draw()'. If everything draws ok but the error is being spammed, I assume you have one or two Meshes in the scene for which vao is null or similar.


If that doesn't help, I'd spin up AMD CodeXL (works on all GPUs, not just AMD) and take a trace of the GL calls with that. It will show the stream clearly and the state of the GL context at the time of the error.

#5158900 glFlush/Finish Uses

Posted by clb on 07 June 2014 - 07:02 AM

The short answer is: never.


A bit longer answer: all GL functions are properly guarded to ensure that if you e.g. glReadPixels or swap buffers, you will always get the data back after all previous operations have finished. However, in some environments you have access to companion APIs (e.g. EGL) that integrate OpenGL with some other environment, which could be a very low-level compositing library provided by the platform, or some other GPU-accelerated API. While the built-in synchronization primitives in GL properly guard all access to the data via the GL functions themselves, they often are unable to do so across different APIs or when low-level system access is performed. That is the reason the GL specification added the glFlush() and glFinish() commands: to be able to synchronize across APIs. It is quite rare for game developers to need either, so the normal operation in a game is to avoid flushing or finishing at all; they just stall the CPU for nothing.


Now, sometimes people argue that it improves performance to flush right after issuing the last rendering command of the frame. That sounds like bs, or a bad driver in action. Neither glFlush nor glFinish is a performance primitive.


Also, sometimes people do a glFinish() right before starting a gpu micro-benchmark, to guarantee that the GPU would be idle when the benchmark starts. That's probably the only semi-decent use of a finish that I've heard of.

#5158485 Determining if a 3D point is in a 3D polygon

Posted by clb on 05 June 2014 - 01:01 PM

One needs to be very careful with the definitions. Usually a polyhedron is a closed volume of space defined by polygonal faces, which subdivides a space into two disjoint regions ("inside" and "outside"). Navigation meshes aren't usually like this - they don't form a closed volume, but instead they are just a connected set of planar polygon faces in a 3D space.


To keep a point-polygon containment test numerically robust, one usual solution is to compute the closest point on the polygon list to the target point, and then compare the distance between that closest point and the target point against a threshold value. Another way is to imagine that the polygons have a "thickness", so that they are extruded in both the positive and negative normal direction by this thickness amount for the purposes of the test. This is what MathGeoLib does in the link I posted above.
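The "thickness" variant can be sketched like this for a planar convex polygon (my own names and a deliberate simplification of what the linked code does; vertices are assumed to wind counter-clockwise when viewed from the side the normal points to):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}
static Vec3 normalized(Vec3 v) { double l = std::sqrt(dot(v, v)); return {v.x/l, v.y/l, v.z/l}; }

// Point containment against a planar CONVEX polygon, imagined to be extruded
// by `thickness` in both normal directions.
bool containsWithThickness(const std::vector<Vec3> &poly, Vec3 p, double thickness) {
    // Plane normal from the first three vertices (polygon assumed planar).
    Vec3 n = normalized(cross(sub(poly[1], poly[0]), sub(poly[2], poly[0])));
    double dist = dot(sub(p, poly[0]), n);
    if (std::fabs(dist) > thickness)
        return false;  // too far from the plane of the polygon
    // Project p onto the plane, then require it to lie on the inner side of every edge.
    Vec3 q = {p.x - n.x*dist, p.y - n.y*dist, p.z - n.z*dist};
    for (size_t i = 0; i < poly.size(); ++i) {
        Vec3 a = poly[i], b = poly[(i + 1) % poly.size()];
        Vec3 outward = cross(sub(b, a), n);  // outward-facing edge normal
        if (dot(sub(q, a), outward) > 1e-9)
            return false;
    }
    return true;
}
```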

#5158479 Why do large engines use their own string class

Posted by clb on 05 June 2014 - 12:53 PM

In addition to the points mentioned above, I develop my own string class mostly because the std:: api is just plain stupid. I don't want to write over-verbose iterator madness or things like std::string::npos to operate the api. You can get a very good feeling for how obtuse it is to work with when you search for the most-voted questions on StackOverflow related to std::string: http://stackoverflow.com/questions/tagged/stdstring?sort=votes&pageSize=15 . With my own string class, I can make the functionality work as comfortably as I like. Compared to C#, JavaScript or Python, C++ has probably the weakest string api in existence.


C++11 improves handling of the different Unicode formats, but working with utf8/utf16/utf32 sources with just std::string and std::wstring is a pain (see -fshort-wchar et al.). Having my own string class with explicit awareness of the different encodings helps in my case.


Of course when you work on something like this, it should be associated with a good unit testing battery, and development will invariably take up time. It's not a "I'll just get it done in a weekend" type of project.