Dual contouring implementation on GPU

Started by
7 comments, last by BlackJoker 7 years, 3 months ago

Hi. I`ve recently started to study dual contouring/manifold dual contouring algorithms and see that they are using an octree.

But because of the fact that HLSL/GLSL has no pointers, it is impossible to implement octree in shaders at least the same way it exists on CPU.

So I wanted to ask does someone already faced with such issue and could say how to implement such algorithms completely on GPU side?

Maybe there is a way to replace octree with data structure more suitable for shaders or something like that?

Advertisement

They store the pointers in textures, as indexes to nodes.

Check out how they store octrees in here:

http://on-demand.gputechconf.com/gtc/2012/presentations/SB134-Voxel-Cone-Tracing-Octree-Real-Time-Illumination.pdf

Thanks for that.

After googling a little I found this article from GPU Gems:

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter37.html

Will study this more detail.

Hey!

I suggest you use linear octrees:

0) Build a full-res, maximally subdivided octree (basically, the same as uniform grid), simplify it by collapsing leaf nodes and create linear octree cells (store only leaf nodes with their address, locational codes or Morton codes).

I've (probably) invented the following simple approach to Adaptive Dual Contouring:

1) Sort cells by size in increasing order (so that the smallest cells (or the deepest) are placed at the beginning of the array).

2) Create a mesh vertex for each cell.

3) Iterate each cell and create quads for each active edge of the cell, using binary search for finding edge-adjacent cells.

As a powerful optimization, you can mantain an active edge mask for each cell as in "Out-of-core adaptive iso-surface extraction from binary volume data, R. Uribe Lobello et al. / Graphical Models 76 (2014), PP. 593–608: 3.2.3.1. Linear connectivity generation (P. 598)"; this will also allow you to sort cells in any order, or not sort them at all and using a hash-table for cell searching.

Advantages:

- super simple (10x less code that the standard, recursive top-down impl);

- the resulting meshes are more vcache-friendly.

Disadvantages:

- somewhat slower (but OK for chunks of 32^3 cells);

- may create isolated, unconnected vertices.

I've been using this algo in my DC/DMC engine since may 2016. It's implemented on the CPU, but it can be easilty ported to GPU.

Here's the reference code for ADC:


/// Signed linear octrees are well-suited for out-of-core isosurface extraction.
struct LinearOctree : LinearOctreeBase
{
	struct Cell
	{
		Morton32	morton;	//!< the (unique) locational code of this node

		// from LSB to MSB:
		// X8,Y8,Z7 - quantized position of the representative vertex (relative to the cell's AABB)
		// S8 - signs at corners (1 = inside, 0 = outside), cannot be zero or 0xFF
		// F1 - is this a sharp-feature vertex?

		U32			data;	//8,8,7, 1, 8 - XYZ, sharp?, signs

	public:
		void PackData( const V3f& xyz01, const U8 signs, const bool sharp )
		{
			//mxASSERT(vxNONTRIVIAL_CELL(signs));
			this->data =
				(TQuantize<8>::EncodeUNorm( xyz01.x ) << 0 )|
				(TQuantize<8>::EncodeUNorm( xyz01.y ) << 8 )|
				(TQuantize<7>::EncodeUNorm( xyz01.z ) << 16)|
				(signs << 23)|
				(sharp << 31);
		}
		const V3f UnpackPosition() const
		{
			const U32 V = this->data;
			return V3f(
				TQuantize<8>::DecodeUNorm( V & 0xFF ),
				TQuantize<8>::DecodeUNorm( (V >> 8) & 0xFF ),
				TQuantize<7>::DecodeUNorm( (V >> 16) & 0x7F )
			);
		}
		const UINT UnpackSigns() const
		{
			return (this->data >> 23) & 0xFF;
		}
		bool HasSharpFeature() const
		{
			return (this->data >> 31) & 0x1;
		}
		bool operator == ( const Cell& other ) const {
			return this->morton == other.morton;
		}
		bool operator != ( const Cell& other ) const {
			return this->morton != other.morton;
		}
		bool operator < ( const Cell& other ) const
		{
			// smaller(=deeper) nodes must be placed first in the array
			return this->morton > other.morton;
		}
	};
public:
	DynamicArray< Cell >	cells;	//!< both keys and data are in the same array for simplicity

public:
	LinearOctree( AHeap & heap );

public:
	/// Builds a linear octree directly from the isosurface.
	ERet FromSignedDistanceField(
		const SDF::Isosurface* _volume,
		const BuildOptions& _options,
		const AABB& _worldBounds,	//!< octree bounds
		const UINT _resolution		//!< octree resolution
	);

	ERet FromHermiteData(
		const AVolumeSampler& _volume,
		const BuildOptions& _options,
		const V3f& _localBoundsSize,//!< size of octree AABB
		const UINT _resolution	//!< octree resolution
	);

	ERet FromSignedOctree(
		const SignedOctree& source,
		const AABB& octreeBounds,
		OctreeStats &stats
	);

	void Empty();
};

ERet LinearOctree::Contour_ReferenceImpl(
	TriMesh & _mesh,
	const ContourOptions& _options
)
{
	// The smallest leaves (or the deepest) are placed at the beginning:
	std::sort( cells.begin(), cells.end() );

	// create vertices
	{
		// Scene bounds for de-quantizing vertex positions.
		const OctreeBoundsF rootBounds = {
			V3f::Zero(),
			_options.rootSize
		};
		mxDO(_mesh.vertices.SetNum(cells.Num()));

		// calculate world-space _mesh bounds
		AABB meshBounds;
		AABB_Clear( &meshBounds );

		for( UINT iCell = 0; iCell < cells.Num(); iCell++ )
		{
			const Cell& cell = cells[ iCell ];
			//mxASSERT(vxNONTRIVIAL_CELL(cell.UnpackSigns()));

			const U32 cellDepth = Morton32_GetCellDepth( cell.morton );
			mxASSERT( cellDepth < MORTON32_MAX_OCTREE_LEVELS );

			const AABB cellAABB = Calculate_Bounds_From_Morton_code( cell.morton, rootBounds ).ToAABB();
			const V3f vertexPosition = AABB_GetOriginalPosition( cellAABB, cell.UnpackPosition() );
			_mesh.vertices[ iCell ].xyz = vertexPosition;

			AABB_AddPoint( &meshBounds, vertexPosition );
		}

		_mesh.localAabb = meshBounds;
	}

	{
		// node offsets for each edge of the cube (I'm using a right-handed Z-up coordinate system)
		static const Int3 adjacentCellOffsets[12][3] =
		{
			Int3( 0, -1,  0 ), Int3( 0,  -1, -1 ), Int3( 0,  0, -1 ),	// 0 (X axis)
			Int3( 0,  0, -1 ), Int3( 0,   1, -1 ), Int3( 0,  1,  0 ),	// 1 (X axis)
			Int3( 0,  1,  0 ), Int3( 0,   1,  1 ), Int3( 0,  0,  1 ),	// 2 (X axis)
			Int3( 0,  0,  1 ), Int3( 0,  -1,  1 ), Int3( 0, -1,  0 ),	// 3 (X axis)

			Int3( 0,  0, -1 ), Int3( -1,  0, -1 ), Int3( -1, 0,  0 ),	// 4 (Y axis)
			Int3(-1,  0,  0 ), Int3( -1,  0,  1 ), Int3(  0, 0,  1 ),	// 5 (Y axis)
			Int3( 0,  0,  1 ), Int3(  1,  0,  1 ), Int3(  1, 0,  0 ),	// 6 (Y axis)
			Int3( 1,  0,  0 ), Int3(  1,  0, -1 ), Int3(  0, 0, -1 ),	// 7 (Y axis)

			Int3(-1,  0,  0 ), Int3( -1, -1,  0 ), Int3(  0,-1,  0 ),	// 8 (Z axis)
			Int3( 0, -1,  0 ), Int3(  1, -1,  0 ), Int3(  1, 0,  0 ),	// 9 (Z axis)
			Int3( 1,  0,  0 ), Int3(  1,  1,  0 ), Int3(  0, 1,  0 ),	// 10 (Z axis)
			Int3( 0,  1,  0 ), Int3( -1,  1,  0 ), Int3( -1, 0,  0 ),	// 11 (Z axis)
		};

		for( UINT iCurrentCell = 0; iCurrentCell < cells.Num() - 1; iCurrentCell++ )
		{
			const Cell& currentCell = cells[ iCurrentCell ];

			const U32 cellDepth = Morton32_GetCellDepth( currentCell.morton );	// get position of the depth bit
			const U32 cellDepthBitMask = (1U << (cellDepth * 3));	// the index of the depth bit
			const U32 getBitsBelowDepthBit = cellDepthBitMask - 1;	// mask to extract bits below the depth bit
			const Morton32 cellCodeWithoutDepthBit = currentCell.morton & getBitsBelowDepthBit;
			const U32 getBitsAboveDepthBit = ~getBitsBelowDepthBit;	// for checking overflow

			const Int3 cellCoords = Morton32_Decode( cellCodeWithoutDepthBit );

			const UINT cornerSigns = currentCell.UnpackSigns();
			//mxASSERT(vxNONTRIVIAL_CELL(cornerSigns));

			U32 edgeMask = CUBE_EDGES_LUT.edgeMask[ cornerSigns ];
			while( edgeMask )
			{
				U32	iCurrentEdge;
				edgeMask = ExtractNextEdge( edgeMask, &iCurrentEdge );

				U32	adjacentCells[3];	//!< indices of the other three cells sharing the edge
				UINT numAdjacentCells = 0;

				UINT octantMask = (1U << topMostOctant);

				for( UINT iAdjacentCell = 0; iAdjacentCell < 3; iAdjacentCell++ )
				{
					mxOPTIMIZE("use dilated integers");
					const Int3 edgeNeighborPos = cellCoords + adjacentCellOffsets[iCurrentEdge][iAdjacentCell];

					// no need to check for underflow, because negative indices have the sign bit set and will fail the overflow check
					// check for underflow
//					if( (edgeNeighborPos.x >= 0) && (edgeNeighborPos.y >= 0) && (edgeNeighborPos.z >= 0) )
					{
						const Morton32 neighborCode = Morton32_Encode_NoOverflowCheck( (U32)edgeNeighborPos.x, (U32)edgeNeighborPos.y, (U32)edgeNeighborPos.z );
						// check for overflow:
						// after addition, Morton codes can 'wrap' around, connecting cells across the octree boundary and causing stretched polys.
						if( neighborCode & getBitsAboveDepthBit ) {
							// the neighbor cannot lie deeper in the octree than this cell
							continue;
						}
						// add the depth bit
						const Morton32 neighborCodeWithDepthBit = neighborCode | cellDepthBitMask;
						const U32 neighborIndex = FindLeafIndex( neighborCodeWithDepthBit, iCurrentCell+1/*skip this cell and all smaller*/ );
						if( neighborIndex != CELL_NOT_FOUND )
						{
							adjacentCells[ numAdjacentCells++ ] = neighborIndex;  // found an adjacent leaf
						}
						else {
							continue;	// neighbor not found - no quads should be formed
						}
					}
				}//For each edge neighbor.

//NOTE: may contain a duplicate cell (which will correspond to the largest neighbor)
if( numAdjacentCells == 3 )
{
					// determine orientation of the quad from the signs of the edge's endpoints
					// signs must be different => only one point should be examined
					const UINT iPointA = CUBE_EDGE[ iCurrentEdge ][0];
					//const UINT iPointB = CUBE_EDGE[ currentEdge ][1];
					// signs can only be 0 or 1
					const UINT signA = (cornerSigns >> iPointA) & 1;
					//const UINT signB = (cornerSigns >> iPointB) & 1;

					//if( signA > signB ) {
					if( signA ) {
						// The first corner is inside the surface, the second is outside
						detail::AddQuad( iCurrentCell, adjacentCells[0], adjacentCells[1], adjacentCells[2], _mesh );
					} else {
						// The second corner is inside the surface, the first is outside
						detail::AddQuad( adjacentCells[2], adjacentCells[1], adjacentCells[0], iCurrentCell, _mesh );
					}
				}//If the edge is shared by 4 cells.
			}//For each active edge of the current cell.

		}//For each cell.
	}//Create Quads

	return ALL_OK;
}

static int FindCellIndex(
								 const U32 _numLeaves, const LinearOctree::Cell* _leaves,
								 const U32 _startIndex,
								 const Morton32 _code
									)
{
	int lo = _startIndex, hi = _numLeaves - 1;
	while( lo <= hi )
	{
		int mid = lo + (hi - lo) / 2;
		if( _code == _leaves[mid].morton ) {
            return mid;
		}
        else if( _code < _leaves[mid].morton )
            lo = mid + 1;
		else {
			hi = mid - 1;
		}
	}
	return -1;
}

/// searches for the deepest (i.e. the smallest) leaf containing the given point
U32 LinearOctree::FindLeafIndex( Morton32 _code, U32 _startIndex ) const
{
	mxASSERT(Morton32_GetCellDepth(_code) > 0);

	while( _code > 3 )
	{
		const int iCell =
			FindCellIndex
			(
			this->cells.Num(), this->cells.Raw(),
			_startIndex, _code
			);
		if( iCell != -1 ) {
			return iCell;
		}
		_code = _code >> 3;	// search one level higher
	}

	return CELL_NOT_FOUND;
}

Cube/Edge relations:



/*
	   Z
	   |  /Y
	   | /
	   |/_____X
	 (0,0,0)

Face orientation (normal) - apply right-hand rule to the edge loop.

         UP  NORTH
         |  /
         | /
   WEST--O--EAST
        /|
       / |
  SOUTH  DOWN

LABELING OF VERTICES, EDGES AND FACES:

Vertex enumeration:
        6___________7
       /|           /
      / |          /|
     /  |         / |
    4------------5  |
    |   2________|__3
    |   /        |  /
    |  /         | /
    | /          |/
    0/___________1

or using locational (Morton) codes (X - lowest bit, Y - middle, Z - highest):

       110__________111
       /|           /
      / |          /|
     /  |         / |
  100-----------101 |
    |  010_______|__011
    |   /        |  /
    |  /         | /
    | /          |/
  000/___________001

(see Gray code, Hamming distance, De Bruijn sequence)

NOTE: two vertices are connected by an edge, if their indices differ by one and only one bit.

Cube edge enumeration
(edges are split into 3 groups by axes X,Y,Z - this allows: edge_axis = edge_index / 4;
(numbered using right-hand rule):
        ______2______
       /|           /|
      5 |11        6 |
     /  |         /  |10
    |------3-----|   |
    |   |_____1__|___|
    |   /        |   /
   8|  4        9|  7
    | /          | /
    |/___________|/
           0
*/


static const U32 CUBE_EDGES_AXIS_X = (BIT(0) | BIT(1) | BIT(2) | BIT(3));
static const U32 CUBE_EDGES_AXIS_Y = (BIT(4) | BIT(5) | BIT(6) | BIT(7));
static const U32 CUBE_EDGES_AXIS_Z = (BIT(8) | BIT(9) | BIT(10) | BIT(11));

/// maps the given cube edge to the diagonally opposite one, e.g. 0 -> 2, 3 -> 1, 10 -> 8, etc.
static mxFORCEINLINE UINT DiagonallyOppositeEdge( UINT iEdge )
{
	return ((iEdge + 2) % 4) + (iEdge & ~3);
}

/// Returns the index of the edge on the given edge-adjacent cube 'iAdjCube',
/// where iAdjCube is the index [0..3] of the cube around the edge, listed in circular order.
/// E.g. for a cube edge 2 the returned values will be 2, 3, 0, 1;
/// for a cube edge 8 the returned values will be 8, 9, 10, 11;
/// for a cube edge 6 the returned values will be 6, 5, 4, 7.
static mxFORCEINLINE UINT EdgeOnNextAdjacentCube( UINT iEdge, UINT iAdjCube )	// iAdjCube <E [0..3]
{
	const UINT iEdgeOnNeighbor = (iEdge + iAdjCube) % 4 + (iEdge & ~3);
	return iEdgeOnNeighbor;
}

mxFORCEINLINE U32 ExtractNextEdge( U32 edgeMask, U32 *edgeIndex )
{
	DWORD iEdge;
	_BitScanReverse( &iEdge, edgeMask );
	edgeMask &= ~(1u << iEdge);	// remove the edge bit from mask
	*edgeIndex = iEdge;
	return edgeMask;
}

/// Edge-table gives all the edges intersected in a given cube configuration:
struct ActiveEdges
{
	U16	edgeMask[256];	//!< set bits correspond to active edges
};

alignas(128) extern ActiveEdges	CUBE_EDGES_LUT;

// precomputes a set of intersecting edges for each cube/sign configuration
void GenerateActiveCubeEdgesTable( ActiveEdges & LUT );

Anfaengar, how are you placing your vertices on the dual grid? The papers usually mention some least squares approximation, but never show code for that part. A lot of code I've seen seems to use ad-hoc methods. I'd be quite interested to hear your approach.

Anfaengar, how are you placing your vertices on the dual grid? The papers usually mention some least squares approximation, but never show code for that part. A lot of code I've seen seems to use ad-hoc methods. I'd be quite interested to hear your approach.

I use the standard approach:
find the vertex which position minimizes the squared distance to the tangent planes,
i.e. planes built from intersection points (of the cell's edges with the surface) and the surface normals at those points.
If the minimizer lies outside the cell, I clamp it to the cell's bounds or place it into the mass point (average) of all intersection points
(which causes unsightly dimples and notches in the reconstructed surface, e.g. see MergeSharp, SHREC papers).
DC/DMC are far from perfect - a sharp cone-like feature gets clamped if it passes through a face, but doesn't intersect any of the cell's edges.
You can try to reconstruct a rotated cube, with one of its corners pointing up, and most likely that corner will get 'chamfered'.
In my framework, I use different 'QEF' solvers (using a common 'virtual' interface):
1) QEF_Solver_Bloxel - simply places the vertex in the cell's center (for blocky, Minecraft-style / Bloxel / Boxel / Cuberille worlds).
2) QEF_Solver_Simple - simply places the vertex into the mass point (i.e. averages all intersections - smooth, Surface-Nets style).
3) QEF_Solver_ParticleBased - uses Leonardo Augusto Schmitz's easy-to-understand method with exact normals at intersection points to reduce complexity,
can reconstruct smooth surfaces with sharpish features: http://gamedev.stackexchange.com/a/83757, "Efficient and High Quality Contouring of Isosurfaces on Uniform Grids, 2009, Particle-based minimizer function". I think, some google summer-of-code Java impl used this method with ADC.
4) QEF_Solver_Direct_AtA - tries to solve the Least Squares problem using normal equations: A^T*A*x = A^T*b, simple and fast, but unstable. For an example you can see that swedish guy's impl on github, Emil (forgot his name), his blog is called "I love tits".
5) QEF_Solver_QR - Least Squares Solver from original authors (Scott Schaefer et al., 2011).
6) QEF_Solver_SVD_Eigen - uses the (excellent) Eigen math lib to solve LLS, e.g. as here: https://github.com/mkeeter/ao/blob/master/kernel/src/render/octree.cpp
7) QEF_Solver_SVD2 - based on the code by Nick Guildea: https://github.com/nickgildea/qef

Anfaenger

It would be cool if QEF_SOLVER_SVD2 have at least few comments to understand what is going on in code.

I assume that there is no good enought commented code for QEF in whole Internet

Anfaenger

It would be cool if QEF_SOLVER_SVD2 have at least few comments to understand what is going on in code.

I assume that there is no good enought commented code for QEF in whole Internet

IIRC, Ronen Tzur's (Uniform) Dual Contouring sample contains some well-commented numerical code.
For the underlying theory, I can recommend the excellent book "Matrix Analysis & Applied Linear Algebra" [2000] by Carl D. Meyer,
apart from classics such as "Matrix computations" and "Numerical Recipes in C".
(I'm in no way a math geek, but the first book explains the theory very well.)

Thanks for the link!

This topic is closed to new replies.

Advertisement