About Crowley99
  1. OpenGL Uber-shader approach

    BTW, if one block of the branch (or even branch chain) requires high register usage, it will reduce the number of warps in flight, which can have an overall negative impact on performance (your super simple "sub-shader" may be running with the same register allocation as your super complex one). So as a rule of thumb, it is often better to group your complex shaders together in one uber shader and your simpler shaders in another - so slightly less uber ;)
  2. QuickSort algorithm

    In addition to Hodgman's implementation changes:
    1) Choose a random pivot element rather than the first. This greatly reduces the probability of performance degrading toward O(n^2) for poorly distributed (semi-sorted to fully sorted) input data. You may also want to compute the median of the first, middle and last elements, or the median of a random subset (a trade-off between better medians and more computation to get them).
    2) Drop to a simpler sort when the data gets small enough, e.g., insertion sort at approx. 8-16 elements, in order to reduce overhead. A less conservative switch (which, IIRC, std::sort does) is to switch to heap sort once a certain recursion depth is reached. This bounds the stack depth, and heap sort's cache-unfriendly memory access patterns (which inhibit it on larger sets) matter much less on small data sets.
    3) If you are sorting primitive types, use a SIMD sorting network when the data set for a particular recursion is small enough.
    4) Separate your sorting keys from your data for cache and swap efficiency.
    5) Sort multi-threaded.
    6) Sort distributed.
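A minimal C++ sketch of points 1) and 2) above (median-of-three pivot including a random element, plus an insertion-sort cutoff); the cutoff value and helper names are illustrative, not canonical:

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

const int kCutoff = 12; // drop to insertion sort below this size (tune it)

static void insertionSort(int* a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; ++i) {
        int v = a[i], j = i;
        while (j > lo && a[j - 1] > v) { a[j] = a[j - 1]; --j; }
        a[j] = v;
    }
}

static void quickSort(int* a, int lo, int hi) {
    while (hi - lo + 1 > kCutoff) {
        // Median of the first, middle and a random element as the pivot.
        int mid = lo + (hi - lo) / 2;
        int r = lo + std::rand() % (hi - lo + 1);
        if (a[mid] < a[lo])  std::swap(a[mid], a[lo]);
        if (a[r]   < a[lo])  std::swap(a[r],   a[lo]);
        if (a[r]   < a[mid]) std::swap(a[r],   a[mid]); // a[mid] = median
        int pivot = a[mid];
        int i = lo, j = hi;
        while (i <= j) { // Hoare-style partition
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        // Recurse on the smaller side, loop on the larger: bounds stack depth.
        if (j - lo < hi - i) { quickSort(a, lo, j); lo = i; }
        else                 { quickSort(a, i, hi); hi = j; }
    }
    insertionSort(a, lo, hi); // small ranges fall through to the simple sort
}
```

Recursing only into the smaller partition and looping on the larger keeps the stack depth at O(log n) even for bad pivots, which is the same motivation behind std::sort's depth-triggered fallback.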
  3.   I think that you will find that you need to ray-cast from a lot more than the corners to prevent popping/false-invisibility. If you subdivide far enough, this won't be an issue, but then you land up with an extremely dense octree that takes ages to compute (most of the computation is redundant, since you are mostly sampling the same lines again and again). For best results you will likely need to densely sample the full line space between your source and destination nodes. This can be done by ray-casting from all over the surface of your source node (to all over the surface of your destination node, stopping as soon as visibility is established). This problem has been well researched. You should check out: and for CPU-based sampling approaches that attempt to minimize the number of rays cast. You can also use the GPU to do the visibility determination for you, or compute it exactly: Chapter 4 talks about using the GPU for sampling visibility. The hardware at the time was dated: I expect that today you could avoid the readback by using occlusion queries or compute-shader processing of the z-buffer. Chapters 5 and 6 talk about computing the exact visibility set, but the math is a bit hairy, and it is probably overkill for most use cases.
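As a rough illustration of the dense line-space sampling idea, here is a C++ sketch that picks uniform random points on the surfaces of two axis-aligned nodes and casts rays between them; the `occluded` callback stands in for whatever ray-caster your engine provides and is purely an assumption:

```cpp
#include <array>
#include <cassert>
#include <cstdlib>
#include <functional>

struct AABB { float min[3], max[3]; };

static float frand() { return std::rand() / (float)RAND_MAX; }

// Pick a point uniformly on the surface of a box: choose a face with
// probability proportional to its area, then a uniform point on that face.
static std::array<float, 3> samplePointOnSurface(const AABB& b) {
    float d[3] = { b.max[0] - b.min[0], b.max[1] - b.min[1], b.max[2] - b.min[2] };
    float area[3] = { d[1] * d[2], d[0] * d[2], d[0] * d[1] }; // per axis pair
    float pick = frand() * 2.0f * (area[0] + area[1] + area[2]);
    std::array<float, 3> p = { b.min[0] + frand() * d[0],
                               b.min[1] + frand() * d[1],
                               b.min[2] + frand() * d[2] };
    for (int axis = 0; axis < 3; ++axis) {
        for (int side = 0; side < 2; ++side) {
            if (pick < area[axis]) {
                p[axis] = side ? b.max[axis] : b.min[axis]; // snap to this face
                return p;
            }
            pick -= area[axis];
        }
    }
    p[2] = b.max[2]; // numerical fallback (pick landed exactly on the total)
    return p;
}

// Returns true as soon as one unoccluded ray is found (node-to-node visible).
// `occluded(a, b)` is assumed to report whether the segment a->b is blocked.
static bool nodesMutuallyVisible(
    const AABB& a, const AABB& b, int maxRays,
    const std::function<bool(const std::array<float, 3>&,
                             const std::array<float, 3>&)>& occluded) {
    for (int i = 0; i < maxRays; ++i)
        if (!occluded(samplePointOnSurface(a), samplePointOnSurface(b)))
            return true;
    return false;
}
```

In practice you would cap `maxRays` per node pair per frame and amortize the sampling over time, conservatively treating the pair as visible until enough rays have failed.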
  4. I suggest splitting your static data on node boundaries after you have your visibility graph, since this allows you to associate data with nodes of your tree partition. Then you just draw the static geometry associated with a node if the node is visible. If you have instanced geometry in a node, you can just conservatively make the whole object visible - just be sure to do a scoreboard/mailbox check so that you don't draw it once for each visible node it intersects. You may want to do the same for non-instanced geometry if your subdivision is very fine and results in a lot of splits (since splits result in more draw calls). For dynamic objects, you can easily (via a hierarchical test) figure out which nodes an object intersects, and if it intersects a visible node, it is treated as visible. BTW, how do you intend to compute the PVS for the nodes? (Which approach/algorithm? It's a difficult problem.) For best results, I suggest a general axis-aligned BSP tree, since you ideally want to keep your leaf nodes cubic.
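The scoreboard/mailbox check mentioned above can be as simple as a per-object frame stamp; a minimal C++ sketch (names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

struct Object {
    uint32_t lastDrawnFrame = 0; // mailbox: frame this object was last drawn
};

// Called once per (visible node, contained object) pair. Returns true only
// the first time the object is encountered in a given frame, so an object
// spanning several visible nodes is drawn at most once.
// Note: start your frame counter at 1 so the zero-initialized stamp works.
static bool markForDraw(Object& obj, uint32_t frame) {
    if (obj.lastDrawnFrame == frame) return false; // already drawn this frame
    obj.lastDrawnFrame = frame;
    return true;
}
```

The stamp avoids clearing a "drawn" flag on every object each frame: advancing the frame counter implicitly invalidates all old marks.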
  5. How does tan work exactly?

    Tsk, tsk. Don't you know that puns about trigonometry are a sin? Why? 'Cos I said so!
  6. I started graphics dev towards the end of 1983. My Acorn Electron tapes had all stretched out from overuse, so all my games were broken. This encouraged me to read the BASIC and system programming manuals (at that age, I didn't really comprehend that writing software was different to playing a game -- I thought that everything done on a computer was a game :-) ). I wrote a few half-baked games (the first one was a lunar-lander type, partly pulled out of code samples written for the BBC Micro), and then a few ground-up little games. I got my first x86 PC in 1986 (4.77 MHz FTW, a Hercules card, and no HDD), and started with GW-BASIC, and later on Pascal, C and assembler. At some point in the late 80's to early 90's, I had an epiphany: I don't really like making games at all, I just like to play them -- what I really enjoyed was making cool graphics via code (for games or otherwise). Through a bunch of BBSes and friends sharing coding "secrets", I got involved in the demo scene in the early 90's, which fitted my interests perfectly. I wrote some software 3D renderers (386 DX, no-FPU era), a bunch of old Mode X type special effects, etc. I remember those days fondly: every year from 1986-2000, I would see a new effect or feature that I had never seen before, or even imagined possible (my imagination was possibly a bit limited :-) ). Then I did a degree in Math and CS, which was pretty cool, because it helped a lot of the math I was doing fall into place. Then I did a PhD in CS (graphics/computational geometry), worked in visualization and VR for a few years, and then spent the better part of a decade at NVIDIA. Now I work for a hedge fund. Go figure. :-)
  7. Graphics Pipeline Stages

    The above link will certainly answer your question, but the short form is: [list] [*]My question though, is at what point does the GPU stop working on one stage and then move on to the next? [/list] As long as there is data available for a particular stage, the GPU will work on it. In general, all of these stages are active at once (some warps are running vertex shaders, some warps are running pixel shaders (and likewise geometry, hull/domain shaders, etc.), and the fixed-function hardware doing raster, clipping, raster ops, etc. is also running simultaneously), all working on different triangles. [list] [*]For example, does the GPU immediately begin rasterizing a triangle the moment it has three vertices fully processed, or does it finish processing the entire mesh and THEN begin rasterizing? [/list] Like most things in the GPU, workloads are batched. The GPU will finish work on a batch of vertices (on the order of ~32), and then pass them down to the setup and rasterization stages along with their connectivity information. [list] [*]Similarly, does the GPU start running the fragment shader on each fragment as soon as it is generated by rasterization, or does it create a batch of fragments and then shade the whole set? [/list] The rasterizer will try to create a batch of fragments that are available "simultaneously". This gets processed in chunks by the pixel shader.
  8. The previous post on duality is correct - IIRC, the planes do need to contain the origin for this to work (if they don't, you can translate the planes by any point inside the volume). For a direct solution: take any 4 planes in general position from your set; there are four ways to choose three of them, and the intersection of each triple of planes gives you a vertex. This gives you four vertices - turn them into a tetrahedron (4 triangles, 4 vertices, 6 edges). Then iterate through the remaining planes, clipping the volume at each iteration. At the end, you have the resulting polyhedron. If you search my posts, I gave a lot more detail in one of my last posts about how to do the clipping (as part of a convex polyhedron intersection algorithm).
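The three-planes-to-a-vertex step can be done with Cramer's rule on the 3x3 system n_i . p = d_i; a small C++ sketch, assuming planes are stored as (normal, offset) pairs:

```cpp
#include <cassert>
#include <cmath>

struct Plane { float n[3]; float d; }; // plane equation: n . x = d

static float det3(const float m[3][3]) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Solves n_a.p = d_a, n_b.p = d_b, n_c.p = d_c for the vertex p.
// Returns false when the planes are (near-)degenerate and no unique
// intersection point exists.
static bool intersectThreePlanes(const Plane& a, const Plane& b,
                                 const Plane& c, float out[3]) {
    const float m[3][3] = { { a.n[0], a.n[1], a.n[2] },
                            { b.n[0], b.n[1], b.n[2] },
                            { c.n[0], c.n[1], c.n[2] } };
    float det = det3(m);
    if (std::fabs(det) < 1e-6f) return false; // parallel/degenerate triple
    for (int col = 0; col < 3; ++col) {
        // Cramer's rule: replace column `col` with the right-hand side (d's).
        float mc[3][3];
        for (int r = 0; r < 3; ++r)
            for (int k = 0; k < 3; ++k)
                mc[r][k] = (k == col) ? (r == 0 ? a.d : r == 1 ? b.d : c.d)
                                      : m[r][k];
        out[col] = det3(mc) / det;
    }
    return true;
}
```

The degeneracy check is why "general position" matters when picking the initial four planes: a triple with a near-zero determinant should be skipped and another plane chosen.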
  9. [quote name='InvalidPointer' timestamp='1337279772'] Nope, you're going to take all paths and pick the result you would get at the end. CPUs can generally jump around in the instruction stream to skip work, but that's not really a functionality GPUs have-- you will have to settle for a conditional move that copies a register value if a certain condition evaluates to true. Obviously, this ends up being rather inexpressive. You could use dynamic branching, but that pretty much just rearranges the masking so that you run each group of BRDF'd pixels in sequence. It's better, but is still skirting the 'unacceptable' performance range. GPUs work best when computing lots of instruction/data streams in parallel, and as soon as you start introducing sequencing (transparent as it may be in the initial code) you reduce work opportunities. [/quote] GPUs can and do do dynamic branching to skip work. If only 3 possibilities are realized within a warp, those will be the only 3 that are executed. Predicated moves will only occur if the branched code is short enough to not justify the jump.
  10. My biggest concern is that your shader's register usage will be determined by your most complex lighting function, which will reduce your warp occupancy for simpler shading. The approach Hodgman mentioned avoids this pitfall (which badly affects many engines out there that have tried it).
  11. [quote name='Outthink The Room' timestamp='1336334674' post='4937858'] But since the empty space between polygon points would now be replaced with atoms, wouldn't the need for textures be replaced as well? [/quote] Yes, the surface textures could be replaced by atoms - many, many atoms. :-) He was setting up a straw man: it may (arguably) be more efficient to represent certain surfaces that way, but then it becomes unclear how to apply standard techniques such as coloring, displacement mapping or normal mapping on top of this - you can of course represent the surface detail achieved by these techniques with these "atoms" directly, but then the atom count explodes, and the representation becomes far less efficient, not more.
  12. Intersection of two CONVEX polyhedrons

    Good to hear it :-)
  13. [quote name='MJP' timestamp='1335427587' post='4934994'] You also have to watch out for increased register pressure from having too many branches, since the compiler will need to allocate enough registers to handle both sides of the branch. [/quote] Just highlighting this. If you mix shaders that do a lot of complex lighting math with shaders that are relatively simple, the register requirements of the complex case will kill your warp occupancy (and hence performance) in the simple case.
  14. Intersection of two CONVEX polyhedrons

    Yeah, fortunately the BSP approach isn't necessary here. The thing that makes convex polyhedra unique is that they can be represented entirely as the intersection of a set of half spaces (constraints), and so can the intersection of an arbitrary set of convex polyhedron. To compute the intersection of two polyhedra, you just need to generate intersection of the union of the set of half-spaces. The library approach would be to do the following: You start with the convex polyhedrons A and B, convert them to h-reps (set of half-spaces - in 3D, these will just be (oriented) plane equations), hrep(A) and hrep(B), then computer their union and convert back to the v-rep (set of vertices). So intersection(A,B) = vrep( union( hrep(A), hrep(B) ) ). Depending on what the library does, you may need to convert the resulting vertices into a trianguled hull (any convex hull implementation should do this for you). [b]cdd[/b] converts between hreps and vreps ( [url=""]http://www.ifor.math...home/index.html[/url] ). Maybe the newer version also tesselates (I don't know - I used it last 11 years ago). Something like [b]QHull [/b]will generate the hull mesh [url=""][/url]. The non-library/direct approach would be to just take polyhedron A, and iteratively refine it, by intersecting it with the half-spaces that make up polyhedron B. So, start with A, then for each triangle of B, compute the correctly oriented plane equation, and intersect A with the half space. The fundamental algorithmic primitive here is "polyhedron-halfspace intersection". This is actually, really easy to do - triangles of A completely on the outside of the half-space, are trivially discarded, triangles of A completely on the inside of the half-space are trivially kept. Triangles that cross the plane need to be clipped. Triangle-plane clipping is easy, and implementations are widely available. 
The last piece of the puzzle is to take the new vertices, resulting from the clipping, and generate a "cap" polyhedron out of them. This is also easy, just project a copy of them onto a primary plane (use the largest component of the plane normal to decide which plane to prevent degenerate projections - i..e, if |x| is the largest, then project to the YZ plane), and sort them around their centroid by angle (use atan2). Once you have the order of vertices, you can generate triangles from one vertex to all the others, and you're done. Once you have gone through all triangles of B, you will be left with the intersection of A and B.
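The cap-generation step (project to a primary plane, sort by atan2 around the centroid, fan-triangulate) might look like this C++ sketch; the structure and function names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct V3 { float x, y, z; };

// Indices of the two coordinates to keep when dropping the largest |normal|
// component (e.g. |x| largest -> project onto the YZ plane).
static void projectionAxes(const V3& n, int& u, int& v) {
    float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
    if (ax >= ay && ax >= az)      { u = 1; v = 2; } // drop x
    else if (ay >= ax && ay >= az) { u = 0; v = 2; } // drop y
    else                           { u = 0; v = 1; } // drop z
}

// Sorts `verts` into a consistent winding around their centroid, then returns
// fan triangles as index triples (0, i, i+1) into the sorted array.
static std::vector<int> buildCap(std::vector<V3>& verts, const V3& normal) {
    int u, v;
    projectionAxes(normal, u, v);
    auto coord = [](const V3& p, int i) {
        return i == 0 ? p.x : i == 1 ? p.y : p.z;
    };
    float cu = 0, cv = 0; // projected centroid
    for (const V3& p : verts) { cu += coord(p, u); cv += coord(p, v); }
    cu /= verts.size(); cv /= verts.size();
    std::sort(verts.begin(), verts.end(), [&](const V3& a, const V3& b) {
        return std::atan2(coord(a, v) - cv, coord(a, u) - cu)
             < std::atan2(coord(b, v) - cv, coord(b, u) - cu);
    });
    std::vector<int> tris;
    for (int i = 1; i + 1 < (int)verts.size(); ++i) {
        tris.push_back(0); tris.push_back(i); tris.push_back(i + 1);
    }
    return tris;
}
```

This works because the cap of a convex polyhedron is a convex polygon, so sorting by angle about any interior point (the centroid) recovers the boundary order.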
  15. Intersection of two CONVEX polyhedrons

    That's not what concerns me. You should always give a link to what you have so far, to keep someone else from wasting their time giving the same, similar, or less complete answers.