
#5274678 Uber-shader approach

Posted by on 06 February 2016 - 12:23 PM

Mitigate the cost of glUseProgram() by sorting objects by shader, so that objects sharing a shader can be drawn back-to-back without intervening glUseProgram() calls (render queues).
Branches inside shaders are almost free as long as all pixels in a block take the same branch. Still, there is a small cost to branching, and it never makes sense to keep a branch if you have to swap shaders anyway (such that the branch is always taken in one shader and never in the other). For example, opaque and translucent objects should always be separate permutations, with no "alpha or not" branch.
A branch is costly if, within a block, some pixels take one path and others take the other. In that case each path waits for the other, meaning every pixel effectively executes both branches.
When to create a permutation is decided by balancing these costs.
L. Spiro

BTW, if one block of the branch (or even branch chain) requires high register usage, it will reduce the number of warps in flight, which can have an overall negative impact on performance (your super simple "sub-shader" may be running with the same register allocation as your super complex one).

So as a rule of thumb, it is often better to group your complex shaders together in one uber shader and your simpler shaders in another - so slightly less uber ;)
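As a toy illustration of the render-queue idea above, here's a hedged Python sketch (all names invented, with the GL call reduced to a comment) showing how sorting a draw list by shader key collapses redundant program binds:

```python
# Hypothetical sketch: a draw is (shader_id, mesh). Sorting by shader_id
# groups objects sharing a shader so the program bind happens once per group.

def count_binds(draw_list):
    """Count how many program binds a list of (shader_id, mesh) draws needs."""
    binds, current = 0, None
    for shader_id, _mesh in draw_list:
        if shader_id != current:
            current = shader_id  # this is where glUseProgram(current) would go
            binds += 1
        # issue the draw call for _mesh here
    return binds

def sort_queue(draw_list):
    """A render queue: stable-sort draws by shader so equal shaders batch up."""
    return sorted(draw_list, key=lambda draw: draw[0])

draws = [(2, "a"), (0, "b"), (1, "c"), (0, "d"), (2, "e"), (1, "f")]
```

In a real renderer the sort key would typically pack shader, material, and depth bits into a single integer rather than sorting on the shader id alone.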

#5056292 QuickSort algorithm

Posted by on 24 April 2013 - 01:30 AM

- Is it possible to have better performances ? How ?

In addition to Hodgman's implementation changes:
1) Choose a random pivot element rather than the first. This greatly reduces the probability of degrading toward O(n^2) behaviour on poorly distributed (semi-sorted to fully sorted) input data. You may also want to compute the median of the first, middle, and last elements, or the median of a random subset (a trade-off between better medians and more computation to get them).
2) Drop to a simpler sort when the data gets small enough, e.g. insertion sort (at approx. 8-16 elements), to reduce overhead. A less conservative switch (which, IIRC, std::sort does) is to switch to a heap sort once a certain memory footprint or stack depth is reached. This bounds the stack depth, and heap sort becomes relatively efficient on smaller data sets (its cache-unfriendly memory access patterns hurt less there than on larger sets).
3) If you are using primitive types, use a SIMD sorting network when the data set for a particular recursion is small enough.
4) Separate your sorting keys from your data for cache and swap efficiency.
5) Sort multi-threaded.
6) Sort distributed.
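Tips 1 and 2 can be combined into a short sketch. This is illustrative Python, not a tuned implementation; the cutoff of 16 is an assumption, and the "recurse into the smaller side first" trick is what bounds the stack depth:

```python
import random

INSERTION_CUTOFF = 16  # tip 2: drop to insertion sort on small partitions

def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] inclusive, in place."""
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with random pivot and insertion-sort cutoff."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo < INSERTION_CUTOFF:
            insertion_sort(a, lo, hi)
            return
        # tip 1: random pivot avoids O(n^2) on (semi-)sorted input
        p = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:                      # Hoare-style partition
            while a[i] < p: i += 1
            while a[j] > p: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
        # recurse into the smaller side, loop on the larger: bounded stack
        if j - lo < hi - i:
            quicksort(a, lo, j); lo = i
        else:
            quicksort(a, i, hi); hi = j

nums = [5, 3, 8, 1, 9, 2, 7]
quicksort(nums)
big = list(range(100, 0, -1))  # reverse-sorted: worst case for a naive pivot
quicksort(big)
```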

#5048193 How does tan work exactly?

Posted by on 29 March 2013 - 06:58 PM

First you get some sun, then you get a tan :)

Tsk, tsk. Don't you know that puns about trigonometry are a sin? Why? 'Cos I said so!

#5030883 Anyone here a self-taught graphics programmer?

Posted by on 10 February 2013 - 07:30 PM

I started graphics dev towards the end of 1983. My Acorn Electron tapes had all stretched out from overuse, so all my games were broken. This encouraged me to read the BASIC and system programming manuals (at that age, I didn't really comprehend that writing software was different to playing a game -- I thought that everything done on a computer was a game :-) ). I wrote a few half-baked games (the first one was a lunar-lander type, partly pulled out of code samples written for the BBC Micro), and then a few ground-up little games.


I got my first x86 PC in 1986 (4.77 MHz FTW, a Hercules card, and no HDD), and started with GW-BASIC, then later Pascal, C, and assembler. At some point in the late '80s to early '90s, I had an epiphany: I don't really like making games at all, I just like to play them -- what I really enjoyed was making cool graphics via code (for games or otherwise). Through a bunch of BBSes and friends sharing coding "secrets", I got involved in the demo scene in the early '90s, which fitted my interests perfectly. I wrote some software 3D renderers (386DX, no-FPU era), a bunch of old Mode X type special effects, etc. I remember those days fondly: every year from 1986-2000, I would see a new effect or feature that I had never seen before, or even imagined possible (my imagination was possibly a bit limited :-) ).


Then I did a degree in Math and CS, which was pretty cool, because it helped a lot of the math I was doing fall into place. Then I did a PhD in CS (graphics/computational geometry), worked in visualization and VR for a few years, and then spent the better part of a decade at NVIDIA.


Now I work for a hedge fund.  Go figure. :-)

#4940948 Branching & picking lighting technique in a Deferred Renderer

Posted by on 17 May 2012 - 09:31 AM

My biggest concern is that your shader's register usage will be determined by your most complex lighting function, and will reduce your warp occupancy for simpler shading. The approach Hodgman mentioned will avoid this pitfall (which badly affects many engines out there that have tried it).

#4932670 Intersection of two CONVEX polyhedrons

Posted by on 18 April 2012 - 08:33 PM

Yeah, fortunately the BSP approach isn't necessary here. What makes convex polyhedra special is that they can be represented entirely as the intersection of a set of half-spaces (constraints), and so can the intersection of an arbitrary set of convex polyhedra. To compute the intersection of two polyhedra, you just need the vertices of the union of their sets of half-spaces.

The library approach would be the following: you start with convex polyhedra A and B, convert them to h-reps (sets of half-spaces; in 3D these are just (oriented) plane equations), hrep(A) and hrep(B), then compute their union and convert back to the v-rep (set of vertices). So intersection(A,B) = vrep( union( hrep(A), hrep(B) ) ). Depending on what the library does, you may need to convert the resulting vertices into a triangulated hull (any convex hull implementation should do this for you).
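As a toy, pure-Python illustration of intersection(A,B) = vrep( union( hrep(A), hrep(B) ) ): brute-force vertex enumeration that solves every triple of planes and keeps the feasible points. This is O(n^3) in the number of half-spaces and only sensible for tiny inputs; a real implementation would use cdd or QHull:

```python
from itertools import combinations

EPS = 1e-9

def solve3(rows):
    """Cramer's rule for a 3x3 system; rows are (a, b, c, d) for ax+by+cz=d."""
    (a1,b1,c1,d1), (a2,b2,c2,d2), (a3,b3,c3,d3) = rows
    det = a1*(b2*c3-b3*c2) - b1*(a2*c3-a3*c2) + c1*(a2*b3-a3*b2)
    if abs(det) < EPS:
        return None  # planes parallel or degenerate
    x = (d1*(b2*c3-b3*c2) - b1*(d2*c3-d3*c2) + c1*(d2*b3-d3*b2)) / det
    y = (a1*(d2*c3-d3*c2) - d1*(a2*c3-a3*c2) + c1*(a2*d3-a3*d2)) / det
    z = (a1*(b2*d3-b3*d2) - b1*(a2*d3-a3*d2) + d1*(a2*b3-a3*b2)) / det
    return (x, y, z)

def vrep(halfspaces):
    """Vertices of {x : n.x <= d for every (n, d)}: intersect plane triples,
    keep points satisfying all constraints, and drop duplicates."""
    verts = []
    for trio in combinations(halfspaces, 3):
        p = solve3([(n[0], n[1], n[2], d) for n, d in trio])
        if p is None:
            continue
        if all(n[0]*p[0] + n[1]*p[1] + n[2]*p[2] <= d + 1e-7 for n, d in halfspaces):
            if not any(sum((p[i]-q[i])**2 for i in range(3)) < 1e-12 for q in verts):
                verts.append(p)
    return verts

def cube_hrep(lo, hi):
    """Six half-spaces n.x <= d for the axis-aligned cube [lo, hi]^3."""
    hs = []
    for axis in range(3):
        n = [0.0, 0.0, 0.0]; n[axis] = 1.0
        hs.append((tuple(n), hi))
        n = [0.0, 0.0, 0.0]; n[axis] = -1.0
        hs.append((tuple(n), -lo))
    return hs

A = cube_hrep(0.0, 1.0)
B = cube_hrep(0.5, 1.5)
verts = vrep(A + B)  # intersection of the two cubes is the box [0.5, 1]^3
```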

cdd converts between h-reps and v-reps ( http://www.ifor.math...home/index.html ). Maybe the newer version also tessellates (I don't know; I last used it 11 years ago). Something like QHull will generate the hull mesh: http://www.qhull.org/.

The non-library/direct approach would be to take polyhedron A and iteratively refine it by intersecting it with the half-spaces that make up polyhedron B. So, start with A, then for each triangle of B, compute the correctly oriented plane equation and intersect A with the half-space. The fundamental algorithmic primitive here is "polyhedron-halfspace intersection". This is actually really easy to do: triangles of A completely outside the half-space are trivially discarded, triangles of A completely inside are trivially kept, and triangles that cross the plane need to be clipped.

Triangle-plane clipping is easy, and implementations are widely available. The last piece of the puzzle is to take the new vertices resulting from the clipping and generate a "cap" polygon out of them. This is also easy: project a copy of them onto a principal plane (use the largest component of the plane normal to decide which plane, to prevent degenerate projections; i.e., if |x| is the largest, project to the YZ plane), and sort them around their centroid by angle (use atan2).

Once you have the order of vertices, you can generate triangles from one vertex to all the others, and you're done.
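The projection-and-atan2 ordering step might look like this in Python (a sketch, assuming the points really are coplanar with the given normal):

```python
import math

def order_cap_vertices(points, normal):
    """Sort coplanar 3D points into a fan order around their centroid.

    Drops the axis matching the normal's largest component (this avoids a
    degenerate projection), then sorts by atan2 about the 2D centroid.
    """
    drop = max(range(3), key=lambda i: abs(normal[i]))
    keep = [i for i in range(3) if i != drop]
    cx = sum(p[keep[0]] for p in points) / len(points)
    cy = sum(p[keep[1]] for p in points) / len(points)
    return sorted(points,
                  key=lambda p: math.atan2(p[keep[1]] - cy, p[keep[0]] - cx))

# Shuffled square in the z = 0 plane; normal is +z, so the z axis is dropped
square = [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ring = order_cap_vertices(square, (0.0, 0.0, 1.0))
```

Once the ring order is known, triangulating is just fanning from `ring[0]` to each consecutive pair.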

Once you have gone through all triangles of B, you will be left with the intersection of A and B.

#4932134 Intersection of two CONVEX polyhedrons

Posted by on 17 April 2012 - 08:00 AM

Thanks for cross posting and reposting.


#4929795 Intersection of two 3D Polyhedrons

Posted by on 10 April 2012 - 12:48 AM

Sorry for the late answer, yes the polyhedra are convex

Then it's much easier: there are libraries that convert between vertex representations and hyperplane representations (and back). You just need to convert both polyhedra to hyperplane representations (just planes, actually, in 3D), take the union of these sets of planes, and convert that back to a vertex representation.

If you want a more direct approach in 3D, you can take one polyhedron and, for each triangle, generate a plane using the triangle normal and one of its vertices. Take these planes and intersect them with the other polyhedron, keeping only the inside half-space. Specifically, throw away all triangles whose vertices are all in the outside half-space, and keep all triangles whose vertices are all in the inside half-space. Then, for all the triangles that cross the plane, clip them to the inside half-space. The new vertices that result from the clipped edges will form a convex polygon embedded in the intersection plane. Triangulate this. Repeat for each triangle/plane of the first polyhedron, and you're done!
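A minimal sketch of the clipping primitive (Sutherland-Hodgman against a single half-space), assuming the polygon is a convex vertex loop; the epsilon handling here is deliberately naive:

```python
def clip_polygon_to_halfspace(poly, n, d, eps=1e-9):
    """Keep the part of a convex polygon satisfying n . p <= d.

    Walks each edge (p, q): keeps p when it's inside, and emits an
    interpolated vertex whenever the edge crosses the plane.
    """
    def side(p):
        return sum(a * b for a, b in zip(n, p)) - d
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        dp, dq = side(p), side(q)
        if dp <= eps:                      # p is inside: keep it
            out.append(p)
        if (dp <= eps) != (dq <= eps):     # edge crosses the plane: clip it
            t = dp / (dp - dq)
            out.append(tuple(p[k] + t * (q[k] - p[k]) for k in range(3)))
    return out

tri = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
quad = clip_polygon_to_halfspace(tri, (1.0, 0.0, 0.0), 1.0)  # keep x <= 1
```

The two new vertices it produces on the x = 1 plane are exactly the ones that would later feed the cap-polygon step.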

#4923682 Concurrent fragments writes

Posted by on 20 March 2012 - 11:19 AM

This is implementation-specific (and I know of cases where it does and does not hold), but yes, on most high-end graphics hardware this will happen. Post-shader, there is a requirement that fragments must appear to be rendered in triangle order.

#4923283 A question about pagetables and paging

Posted by on 19 March 2012 - 07:39 AM

Functionally, you can split it any way you like, but generally you want to minimize page-walk iterations on TLB misses (which usually calls for a roughly 50/50 split of the translated bits), and make sure that all the other system-specific constraints are optimized for.
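For a concrete example, take the classic x86 two-level scheme: 4 KiB pages leave 20 virtual-address bits to translate, split 50/50 into 10 bits per level (directory index and table index). A sketch of the address decomposition:

```python
PAGE_SHIFT = 12   # 4 KiB pages -> the low 12 bits are the page offset
LEVEL_BITS = 10   # the remaining 20 bits split 50/50: 10 + 10

def split_vaddr(va):
    """Decompose a 32-bit virtual address into (pde_index, pte_index, offset)."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    pte = (va >> PAGE_SHIFT) & ((1 << LEVEL_BITS) - 1)
    pde = (va >> (PAGE_SHIFT + LEVEL_BITS)) & ((1 << LEVEL_BITS) - 1)
    return pde, pte, offset

parts = split_vaddr(0xDEADBEEF)
```

An uneven split (say 14/6) would make one level's table huge and the other's tiny while still costing the same two memory accesses per walk, which is why the balanced split is the usual default.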

#4922706 Windows/Linux Flexible performance counter

Posted by on 16 March 2012 - 05:56 PM

How about just calling rdtsc from an asm block (code is on the wiki for Linux and Windows)? Since you just want relative times, you don't even have to do the unit-normalization math. It should be accurate to well under a microsecond, and there shouldn't be any clock-related issues unless you are running a really old PC.

BTW, why not use Atlas or MKL?

#4914460 Generating a bounding quad from a set of planar 3D vertices?

Posted by on 19 February 2012 - 02:37 AM

The depth information is implicit in the order that you traverse the cell portal graph. I.e. You first need to see through the portals of your current cell in order to see through the portals in the next cell. Mathematical induction does the rest.

No sub-frustum is needed: just the refined portal in image space, which should be the 2D intersection of whichever portals were traversed to get to it.
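A toy sketch of that refinement, assuming the portals have already been projected to conservative axis-aligned rectangles in image space (real portal clippers typically use arbitrary convex polygons):

```python
def intersect_rect(a, b):
    """Intersection of two screen-space rects (x0, y0, x1, y1); None if empty."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def refine_portal(view, portal_chain):
    """Shrink the visible region by each portal traversed, in order.

    The traversal order carries the depth information: you can only see
    through portal N via the already-refined region of portals 1..N-1.
    """
    region = view
    for portal in portal_chain:
        region = intersect_rect(region, portal)
        if region is None:       # chain fully occluded: stop recursing
            break
    return region

view = (0, 0, 100, 100)
chain = [(10, 10, 80, 90), (30, 5, 95, 70)]
```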

#4899458 high-end graphics engine

Posted by on 03 January 2012 - 08:01 PM

The exact answer largely depends on the number of attributes per triangle, the average size of the triangles, and your pixel complexity (and a gazillion other things). 200 Hz at 1M triangles is 200 million triangles/sec, which is easy for any high-end card today. Geometry rates are in the 1+ billion/s range (up to 2+ billion/s if you include tessellation) on something like a GTX 580.