Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 22 May 2004
Offline Last Active Apr 24 2016 11:45 AM

#5267134 What will change with HDR monitor ?

Posted by clb on 20 December 2015 - 06:01 AM

I have a BenQ BL3201PT 32" UHD display that supports 10bit color channels on Windows desktop. In nVidia control panel settings, it shows up as a output color depth 10bpc in addition to 8pbc. Here is a screenshot of the control panel setting (googled from the web, that one has even 12bpc option it looks like)


As mentioned above, agree that simply having 10bpc over 8bpc doesn't mean much, unless the actual color space that the monitor can output would increase. This BenQ does not support the Rec.2020 colorspace (or even close), since I don't see any difference in anything with 10bpc vs 8bpc, i.e. 8bpc does not produce banding, nor does 10bpc mode produce any more brightness/luminance/colors/(dynamic) contrast. The TomsHardware test confirms that the display covers only 71.84% of Adobe RGB 1998 color space (and Rec.2020 is way larger than Adobe RGB 1998), and the 10bpc mode is implemented with FRC (frame rate control), so the "supports 10bpc" label on a display is not worth much alone.


Looking forward to seeing a display first hand that would support the full Rec.2020 space. Anyone knows if such displays are actually available for purchase yet?

#5261173 Extracting basis vectors from a quaternion

Posted by clb on 09 November 2015 - 12:02 PM

Code snippets like the above should definitely come with a comment disambiguating the used conventions. Inherently a quaternion (nor a matrix) does not have forward, right or up vectors, but it is the semantic conventions that give such meanings. What is 'Up' in one engine can be 'Forward' in the conventions of another engine. Quickly glancing, in the conventions used by the above snippet, the function ComputeRightVector is the image of (1,0,0) or +X vector when transformed (=conjugated) by the quaternion, and ComputeUpVector is the image of (0,1,0)/+Y vector, and ComputeForwardVector is the image of (0,0,1)/+Z vector transformed by the quaternion. The coordinate system is assumed to be left-handed. Were those the assumed conventions?

#5250167 What is latest version of DXSDK

Posted by clb on 01 September 2015 - 02:15 PM

If you are still targeting old Windows XP and such, then DirectX SDK June 2010 was the last released separate SDK, available here: http://www.microsoft.com/en-us/download/details.aspx?id=6812 . See also this: http://blogs.msdn.com/b/chuckw/archive/2011/12/09/known-issue-directx-sdk-june-2010-setup-and-the-s1023-error.aspx

#5248148 issues with binary files to VBO

Posted by clb on 21 August 2015 - 06:41 PM

When you examine in a hex editor the file you created on disk, is the data correct?


Did the vertex data get loaded correctly upon examination after the ifstream::read() function returns?


Does glGetError() give any errors?


Do you correctly render geometry that did not come from a file? (I assume so, since you specifically point out an issue with rendering from content from a file)


What does your rendering code look like?

#5247961 Fast sqrt for 64bit

Posted by clb on 20 August 2015 - 05:27 PM

There's benchmarks of different sqrt versions in MathGeoLib benchmark suite. Experience gained from toying around there:
    - the x87 fpu sqrt is slow, since modern math is all sse, and there's a penalty in mixing.
    - the "Greg Walsh's"/"Carmack's (inv)Sqrt" method should never be used, it is slower and more imprecise than what SSE gives.
    - don't assume std::sqrt or sqrtf would be fast, practice shows they might, or might not, and it depends on compiler+OS. Good compilers make these calls down to an intrinsic that gates on -msse or similar, bad ones perform a function call.
    - for SSE1-capable hardware there are exactly two methods to consider: _mm_sqrt_ss(x), or _mm_mul_ss(x, _mm_rsqrt_ss(x)).
    - _mm_rcp_ss(_mm_rsqrt_ss(x)) is inferior to mul+rsqrt in both performance and precision, on Intel at least.
In general, there is no free lunch, and when measuring what is fast, you are blind if you ignore the other side of the equation: precision. So whenever benchmarking for performance, remember to also benchmark for precision, since there's always a tradeoff. Custom Newton-Rhapson steps don't yield enough improvement to make any of the hacks better than SSE.
The results of the above sqrt variants on a rather old Core 2 Quad and 64-bit Win 8.1 and Visual Studio 2013 targeting 64-bit with SSE2:
Carmack's: time 10 clocks, max relative error (within x=[0, 1e20] range) to std::sqrt(): 1.7e-3
sqrtss: time 12 clocks, relative error 8.4e-8
mulss(rsqrtss): time 4.5 clocks, relative error 3.3e-4


Therefore the rule of thumb is that (on Intels at least) for precise sqrt, use _mm_sqrt_ss(x), and for the fastest possible sqrt, use _mm_mul_ss(x, _mm_rsqrt_ss(x)).

#5242504 Understanding Emscripten

Posted by clb on 24 July 2015 - 04:31 PM

I still can't wrap my head around how Fastcomp fits into the picture if asm.js also generates JS. Is Fastcomp used by asm.js to actually generate the JS? It sounds like Fastcomp was a replacement for the compiler core by Emscripten.


Historically, the final Emscripten JavaScript code generation pass was written as a JavaScript command line application (executed in node.js shell). After a while, this became a performance bottleneck (mostly due to having to parse LLVM IR bytecode files in JS), so we needed to optimize the JavaScript code generation pass. This was done by implementing the final JS code generation in C/C++, and directly as a backend for LLVM (note that this is not currently a tablegenning backend, but similar to the LLVM CppBackend pass). The project lived in the github branch "fastcomp", and the name stuck around as the colloquial term for the portion of the Emscripten compiler that is implemented as part of the LLVM toolchain, as opposed to the Emscripten frontend, which a bunch of other python and JS scripts that comprise the whole Emscripten toolchain.


Currently "fastcomp" is the only supported code generation mode, but for a while back, we supported both old and new code generation modes in parallel. Also, fastcomp is only able to target asm.js style of JavaScript. The old pre-fastcomp JS compiler backend was able to target both asm.js and non-asm.js style of JavaScript, in order to give projects a managed migration path from the very old non-asm.js codegen to the more performant asm.js codegen. Today there is no reason to use the old non-fastcomp compiler backend anymore, and it has long since been deprecated and removed from Emscripten code tree, as the fastcomp backend has matured enough to be the only codegen path.

#5242356 Capture what is Rendered in OpenGL

Posted by clb on 24 July 2015 - 04:16 AM

When you say "capture", there's still a question of where you want to capture it to? Do you want to capture it to GPU memory for further GPU processing, or do you want to capture it back to CPU memory for e.g. saving to disk, or for other CPU processing?


If you want to capture the data to GPU memory for further GPU operations, the best way is to create a texture or a renderbuffer, and attach that to an FBO, and directly render the content to that FBO. After that you have the contents in a texture and can do whatever GPU side operations to it you want.


If you want to capture the data to CPU memory, then glReadPixels is your only choice if you are looking to use only core OpenGL ES 2 without assuming any extensions. glReadPixeling the default backbuffer can be very heavy, so it might be useful to use a FBO here as well, and glReadPixel the contents of the FBO instead of the default backbuffer.


If you are looking to use OpenGL ES 3, or have the OpenGL ES 2 https://www.khronos.org/registry/gles/extensions/NV/NV_pixel_buffer_object.txt extension, then you can use PBOs (pixelbuffer objects) as an optimized alternative to glReadPixels. (See the section "Asynchronous glReadPixels" in the description of that extension)

#5241940 How to find projection plane for 3d points

Posted by clb on 22 July 2015 - 09:24 AM

"Nicely" is a bit informal, but if somehow maximizing the area of the distances between the points is important, then two approaches that come to mind:

  - Use PCA to compute the direction of maximum spread and use the axis information to form the plane, e.g. https://en.wikipedia.org/wiki/Principal_component_analysis and http://setosa.io/ev/principal-component-analysis/

  - First compute the diameter of the 3D point set, http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/intro.htm to find the first direction for one of the axes of the plane, and then form the second axis by finding the diameter of the 2D point set that is formed by projecting along the first chosen axis.


These might give different kinds of planes compared to Newell's method.

#5240143 Computex 3D box coordinates from min() and max() points

Posted by clb on 13 July 2015 - 03:55 PM

Like said above, if you have an oriented box in 3D space, i.e. one where the edges don't line up with the coordinate axes, then if you only know two corners (diagonally opposite or not), there is still not enough information to compute the complete box. Not even when the box is a cube (all side lengths the same).


If you do have a rotation matrix or a quaternion representing the orientation of the box, in addition to knowing two corner coordinates that are diagonally opposite, then you can compute other six corners of the oriented box. One way to do this is to compute the three world axis directions of the box from the rotation matrix (the columns or rows of the matrix, depending on your math conventions), and then project the two corner points to each of these directions in turn, and taking the extents.

#5240142 Picking a Random Uniformly Distributed Point in a Circle

Posted by clb on 13 July 2015 - 03:48 PM

For some random sampling problems, rejection sampling ends up being the fastest and most elegant solution in practice. Related to this, the article states


You can use a while loop if you prefer but there is no way to know how long this operation will take. This might be acceptable if you are using the box method during an initialization process but if you need to pick random points in a circle per frame or in an update loop, there are much better ways of doing it.



however I would have liked to see a proof of why using a square root is better than using a loop. The first intuition is most often, like presented in the article, that the loop must be slow, since there's no way to know how many iterations it will take, but in practice, the probability that one takes e.g. more than 20 iterations is A(box)/A(circle)^20 = 1/ ( pi * 0.25)^20 < 0.8%, and the rate diminishes here very fast. It is not immediately obvious that using a square root is faster than using a loop which most of the time takes very few iterations, so it would be nice to see a practical benchmark to prove the superiority of the square root.


There exists other distributions which can be analytically reshaped from the uniform distribution like shown here, but very often they end up too complicated at runtime that simpler iterative methods outrun them in performance.


One practical remark on using rejection sampling, not just to solve this problem but in general, is that it is a good idea to have a fail-safe max number of iterations, e.g. 1000 or similar, instead of blindly doing a for(;;), in the case if there is a chance that the program will later be worked to interoperate with unknown random number generators, that might have a bug and generate faulty rng streams under some conditions (e.g. generating the same number again and again, like a stream of zeroes or similar), or perhaps if (external) calling code has a bug that one passes in a circle radius of 0 or negative, or something alike.

#5240137 Compute cube edges in screenspace

Posted by clb on 13 July 2015 - 03:28 PM

What I'd do is transform all the 8 corner points of the cube to 2D screen space, and then use a 2D convex hull algorithm to recompute which edges are the outside edges of the projected silhouette. That solution has the appeal of using only generic "stock" algorithms (3D->2D projection, and 2D convex hull), so it will be very easy to reason about and maintain in the long run.

#5235493 recomputing an AABB for a model (opengl)

Posted by clb on 18 June 2015 - 08:55 AM

Hey all,


I have been working on creating an AABB class that can recompute the AABB of an object with the original AABB points when the model is first loaded in.


To clarify: you have a set of points in 3D, and you want to compute the minimum axis aligned box that encloses them? Then you can easily compute that by first making the AABB span an "infinitely negative" size, and then iterating over the points, enclosing them one by one. Here is some code that does that: https://github.com/juj/MathGeoLib/blob/master/src/Geometry/AABB.cpp#L93


I'm not sure how rotations, euler angles and quaternions relate to that however, so perhaps that is not the problem, but instead you want to rotate an AABB, and then recompute a new AABB that encloses the rotated AABB? If so, then you can achieve that as shown by the code in here: https://github.com/juj/MathGeoLib/blob/master/src/Geometry/AABB.cpp#L471

#5232539 An Exact Algorithm for Finding Minimum Oriented Bounding Boxes

Posted by clb on 03 June 2015 - 04:12 AM

You say you're not an academic, but the standard of work seems up to it. Anything in particular inspire you to present your findings so rigorously?




Thanks! I do have a history of studying a Master's degree in Mathematics in algebra and group theory, but then took a software engineering path instead of pursuing an academic research career. Several of my friends are researchers in mathematics and robotics, so I guess the atmosphere is still around :) Initially I figured it would have been nice as a personal merit to get a publication out of it, but looking at it more, in some journals one has to pay a fee, they might limit your rights and sharing options to the work you did, plus I was shocked how long the review periods are. If I had wanted to pursue that option again, I think the paper would have been out around the end of the year earliest, and suddenly it did not feel that interesting anymore. The journal I initially posted it did conditionally accept the paper for later publication, if I would have shortened it, while adding a significant new body of benchmark comparisons to the existing heuristic and approximation algorithms out there. It was a valid point, since for new techniques there are never enough comparisons to begin with, but I think the paper already does have solid data to be worthwhile of reporting and I have to draw the line somewhere, especially since I had time to work on this only outside my day job on weekends and holidays. There is plenty room for further studies on the tradeoffs of optimality versus speed, but I wanted to keep the focus in this paper and not get the theoretical treatise bogged down with such comparisons. In conclusion I figured that given that my earlier bin packing survey did also "get out there" simply by posting the results to my website, I am sure that this one will as well. Plus this way, I am able to publish an interactive program along with the paper, which one could not do with a traditional media :)

#5232274 An Exact Algorithm for Finding Minimum Oriented Bounding Boxes

Posted by clb on 01 June 2015 - 07:01 PM



I'd like to present a research paper that I worked on earlier this year. It relates to a common problem in game/graphics dev so it might be useful to some of you guys as well.


In short: I have implemented a new algorithm for finding tight-fitting enclosing OBBs for 3D meshes. For suitably "nice" inputs, it runs in O(n * sqrt(n) * log^2(n)) time compared to previous best O(n^3) from O'Rourke. For degenerate/adversarial inputs, it runs in worst case O(n^3 * log n) time. A formal proof of the optimality of this new algorithm is unfortunately beyond reach, but for all tested cases, the algorithm has been able to find the optimal minimum volume OBB.


Find the links here:

I hope you find this interesting and useful!

#5212574 Collision resolution with walls

Posted by clb on 23 February 2015 - 05:51 PM

Definitely decouple collision testing from keyboard input, and compute those two completely separate from each other.


That is, when processing keyboard input, use the pressed keys to define acceleration/impulse/velocity/position update to your object (whatever the type of motion you want to model is), and then in collision detection and response, don't read the keys at all that the user pressed. If necessary, you could store the previous position of the object, but just don't read the keys.