Jump to content

  • Log In with Google      Sign In   
  • Create Account

Awesome job so far everyone! Please give us your feedback on how our article efforts are going. We still need more finished articles for our May contest theme: Remake the Classics

Eric Lengyel

Member Since 25 Nov 2003
Offline Last Active Today, 06:35 PM
-----

#4853084 Normal mapping? Tangents? Binormals?

Posted by Eric Lengyel on 23 August 2011 - 11:20 PM

The best article I've found on tangents so far is this one, but even it assumes that I already want a tangent space and want it aligned a certain way.


I wrote that. As Hodgman describes, the tangent frame provides a matrix that allows you to transform light and view directions from object or world space into the local axis-aligned coordinate system of the normal map, or vice-versa. Basically, the tangent and bitangent at each vertex point in the directions that the x and y axes of the normal map point at those locations if you were to pull the texture off of the mesh and project it onto the tangent plane.

Btw, the term binormal is still used in a lot of places, but it's not correct in the context of normal mapping. The proper term for the direction perpendicular to both the normal and tangent is the bitangent.


#4852688 Normal mapping? Tangents? Binormals?

Posted by Eric Lengyel on 23 August 2011 - 01:47 AM

Note that using the derivative instructions to get a screen-space basis results in a severe quality reduction. It's much better to use precomputed per-vertex tangents.

Here are some comparison images. In each one, the region left of the white line uses the derivative instructions as described by the paper above. The region right of the white line uses the conventional method in which tangents are precomputed per-vertex and bitangents are generated in the vertex shader.

Posted Image

Posted Image

Posted Image

Posted Image

Posted Image

Posted Image


#4851413 Voxel Algorithm

Posted by Eric Lengyel on 19 August 2011 - 04:35 PM

Ah, I see that the location numbering in Figure 4.16 is not the numbering for which the tables were generated. This would certainly cause some confusion -- sorry about that -- I will update the paper. The case code for a transition cell should be constructed using the following location numbering:

0x040 --- 0x020 --- 0x010
  | 		| 		|
0x080 --- 0x100 --- 0x008
  | 		| 		|
0x001 --- 0x002 --- 0x004

The tables are known to be absolutely 100% correct for this numbering.  Each of the 512 cases has been manually created in a test environment and individually inspected by experts to make sure that the correct triangles are produced. The algorithm has also been tested in a real-world production environment by thousands of people.

Using the numbering above, your first case is 0x18F, and your second  case is still 0x007. These both map to class #4 in the tables, which has the same topological layout as class #12 (i.e, the same numbers of  vertices and triangles, and the same triangle layout). The class numbers in the data tables have been remapped in order to collapse topologically equivalent classes even though the vertex positions are different.


#4842640 Voxel Algorithm

Posted by Eric Lengyel on 30 July 2011 - 03:34 PM

The 3-bit direction code just tells the triangulation code where to go to find a previously generated vertex that can be reused in the current cell. Direction codes are provided in the data tables and don't need to be generated by your program. As an example, suppose you've determined that a vertex needs to appear on the edge of a cell between corners 4 and 6, then you know that vertex was already created for the cell edge between corners 5 and 7 for the preceding cell in the same row (unless this is the first cell in the row, which is why there's a mask). In this case, the binary direction code is 001, indicating that you subtract 1 from the x-coordinate of the cell. The hex code 0x11 shown in figure 3.8(b) for this case means that you apply the direction code 1 (the high 8 bits) and then choose vertex number 1 (the low 8 bits) from that cell.


#4830523 Frostbite rendering architecture question.

Posted by Eric Lengyel on 02 July 2011 - 10:17 PM

as an example, D3D10/11 hardware does not have alphatest, that's why the api also does not support it, but you can run dx9 software, that need obviously a new shader).


The alpha test is still implemented directly in the hardware on DX10/11 GPUs, and turning it on or off in DX9 or OpenGL does not cause the driver to recompile a shader. In general, however, you are correct that there are states that require the driver to recompile a shader, but they usually involve things like texture formats and framebuffer formats. (I personally feel that it was a mistake for Microsoft to remove the alpha test state from the API, and an even bigger mistake for the ARB to remove it from the "core" OpenGL.)


#4830240 glDrawElements() terrible lag, glDrawArrays fine?

Posted by Eric Lengyel on 02 July 2011 - 12:16 AM


The performance problem is due to the fact that you are not using vertex buffer objects (VBOs). Without VBOs, the driver has to copy all of your vertex data into the command buffer when you call glDrawArrays() or glDrawElements() before those functions can return. In today's multithreaded drivers, that can get very expensive, especially because putting all that data in the command buffer can cause a lot of flushes to happen, which cause the driver threads to have to synchronize with each other. The performance is worse with glDrawElements() because the driver is actually de-indexing your vertex data.

For any kind of reasonable performance, you absolutely must put the vertex data and index data in VBOs. Then, the glDrawArrays() and glDrawElements() functions can return immediately, no data has to be copied by the driver, and no thread synchronization has to occur in the driver.


I don't think that's quite true.  There are plenty of practical real-world examples that don't use VBOs but which can still get reasonable performance despite that (even on today's multithreaded hardware), and - in fact - if you're using a dynamic VBO that you must refill every time you draw from it, then client-side arrays are most likely going to be the fast path anyway.  Plus, while glDrawElements certainly has some additional overhead, it's nowhere near that much.  A drop to 1 FPS is certainly not symptomatic of not using VBOs.  (Interesting to note that if you do use VBOs but don't have a good understanding of how the hardware and CPU/GPU synchronisation works then you will lose performance with them when compared to client-side arrays, typically from multiple GPU-stalls per frame.)

There's something else wrong here.  I wonder what would happen if the OP split his drawing into two batches of 8k quads each.


I didn't see the mention of 1 fps in the first post before. Yeah, there is something else going on. However, everything I said is valid. I don't mean that not using VBOs will perform terribly, but you should always be able to beat client-side arrays by using VBOs, even for dynamic vertex data that changes every frame. Client-side arrays are never faster unless you're talking about a very small number of vertices. In either case (dynamic VBO or client-side arrays), the driver has to copy the vertex data to an internal buffer, and the hardware has to read that data through a DMA. The difference is that by using a VBO, you don't risk filling up the command buffer and causing a flush that could stall the CPU, and you don't pay for the CPU copy if the vertex data doesn't change. And don't forget that the driver has to de-index the vertex data if you use client-side arrays with glDrawElements(). There's no way that's ever going to be faster than using VBOs.


#4828857 glDrawElements() terrible lag, glDrawArrays fine?

Posted by Eric Lengyel on 28 June 2011 - 04:37 PM

The performance problem is due to the fact that you are not using vertex buffer objects (VBOs). Without VBOs, the driver has to copy all of your vertex data into the command buffer when you call glDrawArrays() or glDrawElements() before those functions can return. In today's multithreaded drivers, that can get very expensive, especially because putting all that data in the command buffer can cause a lot of flushes to happen, which cause the driver threads to have to synchronize with each other. The performance is worse with glDrawElements() because the driver is actually de-indexing your vertex data.

For any kind of reasonable performance, you absolutely must put the vertex data and index data in VBOs. Then, the glDrawArrays() and glDrawElements() functions can return immediately, no data has to be copied by the driver, and no thread synchronization has to occur in the driver.


#4824205 GameEngine for building on 3D concepts..

Posted by Eric Lengyel on 16 June 2011 - 01:50 PM

If you're looking for C++, a solid architecture, OpenGL-based rendering, physics, and everything else found in a full-featured game engine, then C4 might be of interest to you:

http://www.terathon.com/c4engine/index.php


#4771574 Marching cubes, Octree, and LOD seams

Posted by Eric Lengyel on 08 February 2011 - 04:37 PM

The look-up tables that I said I would release are now available here:

http://www.terathon.com/voxels/


#4753715 Marching cubes, Octree, and LOD seams

Posted by Eric Lengyel on 02 January 2011 - 02:00 PM

Quote:
Original post by c_olin
Has anyone successfully implemented seamless LOD with marching cubes before?
Quote:
Original post by XeonXT
I've read a substantial amount of literature on MC, and don't believe I have ever encountered someone doing chunked LOD terrain with it.


The C4 Engine has been rendering voxel-based terrain with Marching Cubes and chunked LOD for about a year now.

Quote:
Original post by swiftcoder
Welcome to my (least) favourite topic!
Quote:
Original post by PolyVox
http://www.terathon.com/lengyel/Lengyel-VoxelTerrain.pdf

I haven't looked in detail, but I believe he is introducing new MC cases which straddle the different LOD levels.
I can't download the 50 MB PDF over my ghetto internet connection, but if I recall correctly, C4 does all of its voxel tessellation offline, which gives you a lot more choices in how you resolve inter-tile edge dependencies. Eric is somewhere around the forums though - perhaps he will weigh in at some point.


Here's a version with low-resolution images -- it's 4.87 MB:

http://www.terathon.com/lengyel/Lengyel-VoxelTerrain-Low.pdf

The C4 terrain is designed for real-time interactive modification, so the triangles generated along the transition between LODs depend only on very localized data.

To implement all of this requires some look-up tables of nontrivial size. I plan to release mine in the near future.


#4483438 Why non-linear depth buffer?

Posted by Eric Lengyel on 01 July 2009 - 01:19 PM

Z is nonlinear because perspective-correct rasterization requires linear interpolation of 1/z -- linear interpolation of z itself does not produce the correct results. The hardware must calculate 1/z at each vertex and interpolate it across a triangle, so it's convenient to just write that value to the depth buffer instead of performing an expensive division at every pixel to recover z.

The fact that you get more z precision closer to the near plane is just a side effect and has nothing to do with the motivation behind 1/z interpolation.



#2750408 Integer factorization, the fast way?

Posted by Eric Lengyel on 02 November 2004 - 05:37 PM

The following pseudocode is called the Brent variation of the Pollard Rho algorithm.  For an explanation, I highly recommend chapter 5 of this book:

Bressoud, David M., Factorization and Primality Testing, Springer-Verlag, 1989.

This algorithm is very nice for factoring moderate-size numbers for which trial division would be prohibitively slow.  It's a probabilistic algorithm, however, and could have a very long running time for some input numbers.

A typical generalized factoring program would first perform trial division by all of the primes less than, say, a million.  (You'll probably want to have a table of these primes available as a data file that you've precomputed using the sieve of Eratosthenes.)  Once trial division has removed all of the small factors, the program would move on to more powerful, but still relatively simple, algorithms such as Pollard Rho.  If those fail to completely factor the input number, then you have to pull out the big guns like the Multiple Polynomial Quadratic Sieve.

Anyway, here's the Pollard Rho algorithm:


n = number to be factored
c = constant value (see comment below)
max = maximum iterations (depends on how long you're willing to wait)
check = how often to check gdc (something around every 10 iterations or so)

x1 = 2
x2 = 4 + c
range = 1
product = 1
terms = 0

for (int i = 0; i < max; i++)
{
   for (int j = 0; j < range; j++)
   {
      x2 = (x2 * x2 + c) mod n
      product *= (x1 - x2) mod n
      if (++terms == check)
      {
         g = gdc(product, n)
         if (g > 1) return (g)
         product = 1
         terms = 0
      }
   }

   x1 = x2
   range *= 2
   for (int j = 0; j < range; j++)
   {
      x2 = (x2 * x2 + c) mod n
   }
}

return (0)




The input value of c needs to be an integer that makes the polynomial x^2 + c irreducible. Ordinarily, you just start with c=1 and count up for cases 1 and 2 below.

This algorithm will return one of three possible values:

1) 0 means that the maximum number of iterations ran without a factor being found. Try choosing another value of c.
2) The number n itself means that the gcd did not find a proper factor. Try choosing another value of c.
3) Any other number means you found a nontrivial factor of n.

-- Eric Lengyel




PARTNERS