
On the path with a ramblin' man

AWOL Update

Posted by , 28 June 2009 - - - - - - · 316 views

Afternoon all!
Writing this from my Nokia 5800's mobile internet, so apologies in advance for any typos or bad formatting.
Mobile internet has certainly improved since I worked in the field ('06-'07) but I'm not yet convinced it's quite there. I still get the occasional urge to throw said mobile out the window [smile]
Anyway, my PC is still broken and I don't really know why. I've tried a few combinations of OSes and HDDs and it still isn't happy. In particular the performance is very poor, VERY poor. I'm wondering if the mobo is at fault, maybe some sort of heat/power issue? I am overclocking a Q6600 quite a long way, after all...
Consequently I doubt I'll be active online for a while, factoring in a day job, social life and replacing hardware/rebuilding a PC [sad]
It's particularly irritating as I've just been on "holiday" for a week walking the 2nd section of the Pennine Way - 90 miles in 6 days, out of 270 miles for the full route. There were so many occasions in the last week when I was miles from civilization and the only noise was wildlife and the wind. Reminded me of how much I enjoy getting away from big cities and technology every now and then! The mind naturally wanders whilst walking and I thought up various crazy ideas and neat solutions to programming puzzles... To not have a PC to play with upon my return may see these forgotten...

Once I've got a working PC I'll put my photos up for anyone interested in what the more remote and picturesque parts of England look like.


Windows 7

Posted by , 19 June 2009 - - - - - - · 342 views

Well, I sort of have a PC working again.

Saturday last week my machine was working absolutely fine. Sunday morning the boot disk apparently failed and the machine wouldn't even boot. Great.

I'd previously tried dual-booting Windows 7 on this machine (it's Vista x64 normally) and it wasn't happy at all. Then a week or two later the whole machine broke, which says to me that either Win7 is very good at spotting an error early or that it itself caused the error.

So, having lost everything (maybe - I still need to try a few disk recovery tricks), I tried installing Windows 7 just to get any OS up and running - I happened to have the 7 disk closer to hand.

Anyway, that simply wouldn't go anywhere - the installer randomly crashed, or it wouldn't allow me to pick a disk...

Gave up and stuck a Vista x64 disk in... worked first time.

In short, I now have zero confidence/trust in Win7 and will most likely stick with Vista for the foreseeable future. That kinda runs counter to the general consensus, but I never really had any objections to Vista so I don't much care [grin]


P.S. I'm off on holiday tomorrow. I'll rebuild my machine in July and get back to some D3D11/SlimDX dev work then...


SlimDX11 and stuff

Posted by , 08 June 2009 - - - - - - · 440 views

Evening all,

Been quite busy lately, hence the recent drought of journal updates. Not entirely sure where the time has been going, but I hope to get back on the case soon! Although, what with the RMT paralysing London for the next 3-4 days, I doubt this week will be anything more than a total write-off (which I think is totally unreasonable on their part, but hey-ho, I just have to deal with it...)

Anyway, Mike Popoloski recently asked me to have a look at the SlimDX API and their adaptation of the new Direct3D 11 API. I've been wanting to make time to check out their work for a while and this seemed like a suitable opportunity.

Grabbed the latest version of TortoiseSVN, pointed it at their repository, and successfully built SlimDX.sln in VS'08 with the latest DXSDK - I think I'm good to go.

In fact, downloading TortoiseSVN from SourceForge was the slowest part! I'm quite impressed that I could just pull down all the code, hit "build solution" and get only 27 seemingly unimportant warnings, no errors and a shiny new SlimDX.dll waiting to be used [cool]

Getting a little late to do any more now, but I intend to implement the Curved Point Normal Triangles (aka ATI TruForm from the D3D8 era) sometime this week using their API.


Windows 7

Posted by , 25 May 2009 - - - - - - · 314 views

hmmm.

Cleared out a spare partition and installed the Windows 7 Release Candidate. It took 90 minutes before I saw the desktop, which wasn't impressive given that Vista takes all of 30 minutes to do the same.

But that's nothing compared to the fact that it's almost totally unresponsive once it gets to the desktop. Even trying to get Task Manager and My Computer loaded failed!

I hope for its sake that it was just busy doing some one-time initialization or whatever. A fast quad core with plenty of RAM really shouldn't be struggling to display the start menu [lol]


Another stab at Managed DirectX?!

Posted by , 20 May 2009 - - - - - - · 455 views

Was just reading the private MVP newsgroups and ZMan posted a link to a Win7 developer blog: Windows 7 Managed Code APIs.

I must admit I've not looked into it in much detail, but Andy flagged up the "Support for Direct3D 11.0 and DXGI 1.0/1.1 APIs" comment near the top. I get the impression it's not "MDX 3.0" or any sort of official successor to the now-dead MDX API, but from a functional standpoint it sounds like it might fit in the same space...

Figured you guys might well find that interesting [smile]


More Compute Shader Goodness

Posted by , 19 May 2009 - - - - - - · 1,482 views

Evening all,

Was ill most of last week so haven't really done all that much.

  • Downloading Windows 7 RC x64, probably dual boot that on my Vista 64 machine over the weekend if I'm not too hungover
  • Also downloading Visual Studio 2010 Beta 1. No particular reason, but figured I might as well pair it up with Win7 [smile]

    What follows is part of my ongoing article (ongoing...and going...and going...and going... 40+ A4 pages and counting). Took me f'in ages to upload these images (I <3 GDNet [razz]) so you'd better appreciate it [wink]




    Pre-processing the height map

    Despite being a pre-processing step in this context, the approach taken here is very similar to the idea of post-processing, which has been common in real-time graphics for several years.

    The input texture will be divided up into kernels; each kernel area will have its pixels read in and four values generated from the raw data, which are then stored in a 2D output texture. This output texture will then be indexed by the Hull Shader, giving it the previously described context when making LOD computations.
    The key design decision is how to map the 16x16 height map samples down to a single per-patch output value.

    It is relatively straightforward to compute a variety of statistics from the source data, but all that the Hull Shader really cares about is having a measure of how much detail the patch requires. Is this piece of terrain flat? If yes, generate less detail. Alternatively, is this piece of terrain very bumpy and noisy? If yes, generate more detail.
    A good objective for this pre-pass is therefore to find a statistical measure of coplanarity - to what extent do the 256 samples lie on the same plane in 3D space?

    Consider the following two diagrams:



    The left-hand diagram shows a relatively uniform slope, possibly the side of a hill or valley. The right-hand diagram, however, is much more erratic and noisy, originating from a more complex section of terrain. Ideally the Hull Shader would give the right-hand example a much higher level of detail as it quite simply requires more triangles to represent.



    The above diagram shows the same two examples, but with a plane inserted into the dataset. Whilst Direct3D isn't capable of rendering quads natively, the plane is the best possible surface if no tessellation were involved and only a single primitive were used to represent this piece of landscape. Notice that the plane in the left-hand diagram is a much closer match to the surface than in the right-hand diagram.



    This next diagram shows a side-on view of a terrain segment with the plane, and lines indicating how far each sample is from it. It is on this basis that we can measure coplanarity - the shorter the lines between the samples and the plane, the more coplanar the data is.
    Picking the plane on which to base these calculations requires a 'best fit' approach: the plane needs to be representative of the overall shape of the patch, yet it is unlikely that any generated plane will perfectly match the real data.



    The above diagram demonstrates one computationally efficient method of getting an acceptable 'best fit' plane. On the left is the original patch geometry introduced earlier; on the right is the same geometry but with only the four corners joined together. Whilst this simplified primitive appears coplanar in this case, there is no guarantee that it always will be.

    For each of the four corners the two adjacent neighbours are also known, and from here it is trivial to generate the pairs of vectors denoted in red. The cross-product of each pair of vectors results in a normal vector for that corner, denoted in blue. Combining and normalizing these four raw normal vectors will result in a single unit length normal vector for the patch, one that is generally representative of the underlying surface. By taking any of the four corner positions it is possible to derive a standard plane equation:

    Ax + By + Cz + D = 0
    A = Nx
    B = Ny
    C = Nz
    D = -(N•P)

    where N is the normal vector and P is a corner point.
    With this plane equation known, the Compute Shader can evaluate each height map sample for its distance from the plane.
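
    For example (values illustrative only): a patch normal of N = (0, 1, 0) and corner point P = (0.25, 5.0, 0.75) gives D = -(N•P) = -5.0, i.e. the plane y - 5 = 0, so any sample at height 5.0 lies exactly on the plane and is zero distance from it.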

    Implementing with a Compute Shader



    Notation and indexing in the compute shader are not immediately obvious; the above diagram introduces two of the key variables in the context of a terrain rendering pre-pass.
    The core HLSL shader has an entry point with the [numthreads(x,y,z)] attribute attached to it:

    [numthreads(16,16,1)]
    void csMain( uint3 Gid  : SV_GroupID,          // system-generated values,
                 uint3 GTid : SV_GroupThreadID,    // described below; these names
                 uint3 DTid : SV_DispatchThreadID, // are used in the later fragments
                 uint  GI   : SV_GroupIndex )
    {
        /* shader body here */
    }

    This attribute defines a group, aka a kernel; in the above context it defines a 16x16 array of threads per group. The body of the csMain method is written for a single thread, but via system-generated values it is able to identify which of these 256 (16x16) threads it actually is. Knowing this, the code can be written to ensure each thread reads from and writes to the correct location.

    In the preceding diagram the Dispatch(x, y, z) call is also introduced. This is made by the application and is essentially a draw call as it begins execution of the compute shader. At this level the parameters indicate how many groups of 16x16 threads to create. For this particular algorithm the application simply divides the input height map texture dimensions by 16 and uses this as the number of kernels.

    For a 1024x1024 height map there will be 64x64 kernels, each kernel being 16x16 threads. Conceptually this implies a very large number of threads - one per pixel in this case - but it is up to the implementation quite how these tasks are scheduled on the GPU and how many actually execute concurrently.
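    As a minimal sketch of the application-side arithmetic (heightMapWidth and heightMapHeight are hypothetical names for the source texture dimensions, assumed here to be exact multiples of 16):

    // One 16x16 thread group per kernel of the height map
    UINT xCount = heightMapWidth  / 16;   // e.g. 1024 / 16 = 64
    UINT yCount = heightMapHeight / 16;
    g_pContext->Dispatch( xCount, yCount, 1 );
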
    A key detail omitted till now is how an invocation is able to identify itself relative to its group as well as the entire dispatch call. Direct3D defines four system generated values for this purpose:

    1. SV_GroupID
      This uint3 returns indexes into the parameters provided by ID3D11DeviceContext::Dispatch(). It allows this invocation to know which group it is relative to all others being executed. In particular, this value is useful for determining the output location in a many:one relationship. In this algorithm it is the index into the output texture where the results for the whole group are written.

    2. SV_GroupThreadID
      This uint3 returns indexes local to the current group – the parameters provided at compile-time as part of the [numthreads()] attribute. In this algorithm it is used to know which threads represent corner pixels for the current 16x16 area.

    3. SV_DispatchThreadID
      This uint3 is a combination of the previous two. Whereas they index relative to only one set of input parameters (::Dispatch() or [numthreads()]), this is a global index - essentially the group and thread indices combined. For a 64x64 dispatch of 16x16 threads this system value will vary between 0 and 1023 in both axes (64*16=1024); thus for this algorithm it provides the thread with the address of the source pixel to read from.

    4. SV_GroupIndex
      This uint gives the flattened index into the current group. For a 16x16 area this value will be between 0 and 255; for the purposes of this algorithm it is essentially the thread ID, used only to coordinate work across the group. The relationships between these four values are sketched after this list.
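
    For the [numthreads(16,16,1)] case used here, the four values relate as follows (an illustrative note, using the shader variable names from the fragments below):

    // DTid = Gid * uint3(16,16,1) + GTid    -- global ID = group ID * group size + local ID
    // GI   = GTid.y * 16 + GTid.x           -- flattened local index (GTid.z is always 0 here)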


    The final piece of the puzzle is the ability for threads to communicate with each other. This is done via a 4kb chunk of shared memory and synchronization intrinsics. Variables defined at global scope with the 'groupshared' prefix can be both read from and written to by all threads in the current group:

    groupshared float  groupResults[16 * 16];  // per-thread height, later distance-to-plane
    groupshared float4 plane;                  // best-fit plane coefficients (A, B, C, D)
    groupshared float3 rawNormals[2][2];       // normal vector at each corner
    groupshared float3 corners[2][2];          // 3D position of each corner

    Synchronization is done via a choice of six barrier functions. The code can use either a *MemoryBarrier() or a *MemoryBarrierWithGroupSync() call - the former blocks until pending memory operations have finished, but execution can continue before remaining ALU instructions complete; the latter blocks until all threads in the group have reached the specified point, so both memory and arithmetic instructions must be complete. The barrier can be 'All', 'Device' or 'Group' scoped, with decreasing scope at each level. Thus AllMemoryBarrierWithGroupSync() is the heaviest intrinsic to employ, whereas GroupMemoryBarrier() is more lightweight. In this algorithm only GroupMemoryBarrierWithGroupSync() is used.



    The first phase of the algorithm utilizes four threads, one for each corner of the 16x16 pixel group. Each of the four threads reads in a single sample and stores the height in groupResults[] and then a 3D position in corners[][]. All other threads are idle at this point. The code for this is as follows:

    if(
        ((GTid.x ==  0) && (GTid.y ==  0)) ||
        ((GTid.x == 15) && (GTid.y ==  0)) ||
        ((GTid.x ==  0) && (GTid.y == 15)) ||
        ((GTid.x == 15) && (GTid.y == 15))
      )
    {
        // This is a corner thread, so we want it to load
        // its value first
        groupResults[GI] = texHeightMap.Load( uint3( DTid.xy, 0 ) ).r;

        corners[GTid.x / 15][GTid.y / 15] = float3( GTid.x / 15, groupResults[GI], GTid.y / 15 );

        // The above will unfairly bias based on the height ranges
        corners[GTid.x / 15][GTid.y / 15].x /= 64.0f;
        corners[GTid.x / 15][GTid.y / 15].z /= 64.0f;
    }

    // Block until all threads have finished reading
    GroupMemoryBarrierWithGroupSync();




    The next phase sees the same four threads continuing to process the corner points. In this instance they need to know about their neighbouring corners so that they can generate the cross-product and hence a normal vector for each corner – entirely ALU work. Concurrently the other 252 threads can be reading in the remaining height map samples.

    if((GTid.x == 0) && (GTid.y == 0))
    {
        rawNormals[0][0] = normalize( cross(
                               corners[0][1] - corners[0][0],
                               corners[1][0] - corners[0][0]
                           ) );
    }
    else if((GTid.x == 15) && (GTid.y == 0))
    {
        rawNormals[1][0] = normalize( cross(
                               corners[0][0] - corners[1][0],
                               corners[1][1] - corners[1][0]
                           ) );
    }
    else if((GTid.x == 0) && (GTid.y == 15))
    {
        rawNormals[0][1] = normalize( cross(
                               corners[1][1] - corners[0][1],
                               corners[0][0] - corners[0][1]
                           ) );
    }
    else if((GTid.x == 15) && (GTid.y == 15))
    {
        rawNormals[1][1] = normalize( cross(
                               corners[1][0] - corners[1][1],
                               corners[0][1] - corners[1][1]
                           ) );
    }
    else
    {
        // This is just one of the other threads, so let it
        // load its sample into shared memory
        groupResults[GI] = texHeightMap.Load( uint3( DTid.xy, 0 ) ).r;
    }

    // Block until all the data is ready
    GroupMemoryBarrierWithGroupSync();




    Phase four is where the next big chunk of work takes place, but prior to that the group must have a plane from which to measure offsets. This third phase only requires a single thread and simply implements the plane-from-point-and-normal equation shown earlier:

    if(GI == 0)
    {
        // Let the first thread alone determine the plane coefficients

        // First, decide on the average normal vector
        float3 n = normalize(
                       rawNormals[0][0]
                     + rawNormals[0][1]
                     + rawNormals[1][0]
                     + rawNormals[1][1]
                   );

        // Second, decide on the lowest corner point on which to base it
        float3 p = float3( 0.0f, 1e9f, 0.0f );
        for(int i = 0; i < 2; ++i)
            for(int j = 0; j < 2; ++j)
                if(corners[i][j].y < p.y)
                    p = corners[i][j];

        // Third, derive the plane from point+normal
        plane = CreatePlaneFromPointAndNormal( n, p );
    }

    GroupMemoryBarrierWithGroupSync();




    With a plane available it is necessary to process each of the raw heights originally loaded from the height map. Each thread takes a single height, computes the distance between that sample and the previously computed plane, and replaces the original raw height value:

    // All threads now translate the raw height into the distance
    // from the base plane
    groupResults[GI] = ComputeDistanceFromPlane( plane,
                           float3( (float)GTid.x / 15.0f, groupResults[GI], (float)GTid.y / 15.0f ) );

    GroupMemoryBarrierWithGroupSync();




    The final phase of the algorithm takes all of the distance values and computes the standard deviation from the surface of the plane. This single value is a good metric of how coplanar the 256 individual height samples are - lower values imply a flatter surface, higher values a noisier, more varying patch. Because the distances are measured about the best-fit plane their mean is close to zero, so the sum of squared distances can be used directly. This value and the plane's normal vector are written out as a float4 in the output texture - 256 height map samples reduced down to four numbers.
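
    In effect the code below computes:

    stddev = sqrt( (d0^2 + d1^2 + ... + d255^2) / (256 - 1) )

    where each d is a sample's distance from the plane, with the mean distance taken to be zero.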

    if(GI == 0)
    {
        float stddev = 0.0f;

        for(int i = 0; i < 16*16; ++i)
            stddev += pow( groupResults[i], 2 );

        stddev /= ((16.0f * 16.0f) - 1.0f);
        stddev = sqrt( stddev );

        // Write the normal vector and standard deviation
        // to the output buffer for use by the Domain and Hull Shaders
        bufferResults[uint2( Gid.x, Gid.y )] = float4( plane.xyz, stddev );
    }


    Two utility functions were referenced in the above fragments; for completeness they are as follows:

    float4 CreatePlaneFromPointAndNormal( float3 n, float3 p )
    {
        return float4( n, -dot( n, p ) );
    }

    float ComputeDistanceFromPlane( float4 plane, float3 position )
    {
        // Signed distance is Ax + By + Cz + D; with D = -(N•P) stored
        // in plane.w, that is dot(plane.xyz, position) plus plane.w
        return dot( plane.xyz, position ) + plane.w;
    }


    Integrating the Compute Shader

    The previous section detailed the Compute Shader implementing the algorithm, but it is still necessary for the application to coordinate this work.
    Firstly the output texture needs to be created. This will be bound as an output of the Compute Shader but later used as an input to the Hull Shader. The underlying type is a regular 2D texture, with the important detail of having D3D11_BIND_UNORDERED_ACCESS as one of its bind flags:

    D3D11_TEXTURE2D_DESC outputDesc;
    ZeroMemory( &outputDesc, sizeof( D3D11_TEXTURE2D_DESC ) );

    outputDesc.ArraySize          = 1;
    outputDesc.BindFlags          = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    outputDesc.Usage              = D3D11_USAGE_DEFAULT;
    outputDesc.Format             = DXGI_FORMAT_R32G32B32A32_FLOAT;
    outputDesc.Width              = TERRAIN_WIDTH;
    outputDesc.Height             = TERRAIN_LENGTH;
    outputDesc.MipLevels          = 1;
    outputDesc.SampleDesc.Count   = 1;
    outputDesc.SampleDesc.Quality = 0;

    if( FAILED( hr = g_pd3dDevice->CreateTexture2D( &outputDesc, NULL, &g_pPrePassResults ) ) )
    {
        LOG( L"Failed to create 2D pre-pass results texture!" );
        return hr;
    }

    // Create an SRV onto the output texture so the HS can read it
    if( FAILED( hr = g_pd3dDevice->CreateShaderResourceView( g_pPrePassResults, NULL, &g_pPrePassResultsView ) ) )
    {
        LOG( L"Failed to create a SRV for the pre-pass results texture!" );
        return hr;
    }

    Next, an unordered access view needs to be created so that the Compute Shader can read from and write to the texture just created:

    ID3D11UnorderedAccessView* pUAV = NULL;
    D3D11_UNORDERED_ACCESS_VIEW_DESC outputUAV;

    outputUAV.Format             = DXGI_FORMAT_R32G32B32A32_FLOAT;
    outputUAV.ViewDimension      = D3D11_UAV_DIMENSION_TEXTURE2D;
    outputUAV.Texture2D.MipSlice = 0;

    if( FAILED( hr = g_pd3dDevice->CreateUnorderedAccessView( g_pPrePassResults, &outputUAV, &pUAV ) ) )
    {
        LOG( L"Failed to create unordered access view for CS output!" );
        SAFE_RELEASE( pUAV );
        return hr;
    }


    At this point the necessary resources have been created so they simply need to be bound to the pipeline and the Compute Shader initiated:

    g_pContext->CSSetShaderResources( 0, 1, &g_pHeightMapView );

    ID3D11UnorderedAccessView* outputView[ 1 ] = { pUAV };
    // The final parameter (initial counts) only applies to
    // append/consume buffers, so NULL is fine here
    g_pContext->CSSetUnorderedAccessViews( 0, 1, outputView, NULL );

    g_pContext->CSSetShader( g_pPrePassComputeShader, NULL, 0 );

    // One 16x16 group per kernel, as computed from the height map dimensions
    g_pContext->Dispatch( xCount, yCount, 1 );

    SAFE_RELEASE( pUAV );

    // Unbind the resources so they can be used elsewhere in the pipeline
    ID3D11ShaderResourceView* nullEntry = NULL;
    g_pContext->CSSetShaderResources( 0, 1, &nullEntry );
    ID3D11UnorderedAccessView* nullView[ 1 ] = { NULL };
    g_pContext->CSSetUnorderedAccessViews( 0, 1, nullView, NULL );



    The code after the Dispatch() call is particularly important. Without it the UAV would still be bound to the pipeline, referencing the 2D output texture; Direct3D would then stop that texture being bound as an input to the Hull Shader, as it is illegal to have a resource set as both an input and an output at the same time!

    At this point the work is done and the texture can be used by the Hull Shader. However, there is one additional piece of work that can greatly improve the quality of the results - normalizing the standard deviations. Currently the values stored in the texture are raw, as-is deviation values from each per-patch plane. The range of these values across the entire dataset can be very small, often between 0.0 and 0.4, which leaves very little separation between the flattest and the bumpiest terrain segments. By post-processing the Compute Shader results the values can be stretched out to the full 0.0 to 1.0 range, giving a much better spread of detail when the Hull Shader executes.

    outputDesc.BindFlags      = 0;
    outputDesc.Usage          = D3D11_USAGE_STAGING;
    outputDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE;

    ID3D11Texture2D *pStaging = NULL;
    if( FAILED( hr = g_pd3dDevice->CreateTexture2D( &outputDesc, NULL, &pStaging ) ) )
    {
        LOG( L"Failed to create staging resource to copy CS output data to!" );
        return hr;
    }

    g_pContext->CopyResource( pStaging, g_pPrePassResults );


    The above code creates a CPU-accessible staging resource and copies the GPU results into it. This copy of the data can then be normalized by the application using the following construct:

    D3D11_MAPPED_SUBRESOURCE data;
    if( SUCCEEDED( g_pContext->Map( pStaging, 0, D3D11_MAP_READ_WRITE, 0, &data ) ) )
    {
        float minStdDev = 1e9f;
        float maxStdDev = -1e9f;

        // Mapped texture rows can be padded, so step through the data
        // using RowPitch rather than assuming it is tightly packed
        for( UINT y = 0; y < outputDesc.Height; ++y )
        {
            D3DXVECTOR4 *pResults = (D3DXVECTOR4*)( (BYTE*)data.pData + y * data.RowPitch );
            for( UINT x = 0; x < outputDesc.Width; ++x )
            {
                if(pResults[x].w > maxStdDev)
                    maxStdDev = pResults[x].w;

                if(pResults[x].w < minStdDev)
                    minStdDev = pResults[x].w;
            }
        }

        float scalar = maxStdDev - minStdDev;
        if(scalar <= 1e-5f) // avoid divide-by-zero
            scalar = 1.0f;

        for( UINT y = 0; y < outputDesc.Height; ++y )
        {
            D3DXVECTOR4 *pResults = (D3DXVECTOR4*)( (BYTE*)data.pData + y * data.RowPitch );
            for( UINT x = 0; x < outputDesc.Width; ++x )
                pResults[x].w = (pResults[x].w - minStdDev) / scalar;
        }

        g_pContext->Unmap( pStaging, 0 );
    }


    Because the above operation had to be performed on a CPU-accessible staging resource, it is necessary to copy the staging resource back over the GPU-accessible texture - failing to do this would mean the GPU simply uses the results directly from the Compute Shader:

    g_pContext->CopyResource( g_pPrePassResults, pStaging );
    SAFE_RELEASE( pStaging );

    Results

    The following images, based on height map data for Puget Sound, Washington State, USA ([Georgia Institute, 01]), demonstrate the difference the new algorithm makes:


    Naive Distance Based LOD
    208,068 Triangles Generated (60% rasterized)


    Compute Shader Based LOD
    200,452 Triangles Generated (64% rasterized)


    Whilst the top image may appear more aesthetically pleasing due to the smooth gradients, the more chaotically shaded bottom image is by far the better output from a geometric perspective. In both images the patch detail is translated into a colour – red for high detail, green for mid detail and blue for low detail.

    Consider the area marked by the box; it is a good example of the benefits of the deviation-based heuristic. In the top image note that the majority of tiles being rendered have all been assigned the same LOD (notable by being the same shade of green), yet the region to the left of the box is very flat and the region to the right is very rough. Conversely, in the bottom image the flat region to the left is predominantly blue and the rough area to the right is mostly green.

    The bottom image is demonstrating that the Hull Shader is using the pre-pass information to assign detail to patches that warrant it and reducing detail from those that do not require it.


    Improved LOD algorithm

    Posted by , 16 May 2009 - - - - - - · 264 views

    Been ill most of this week so I haven't really been up to much development-wise. That said, I still spent most of this afternoon playing around with my Direct3D 11 Terrain Renderer.

    During the week I came across a link for the Puget Sound dataset that I thought I'd make use of as it's definitely better than anything I can invent myself!

    For those who aren't familiar with Puget Sound, it's the bay area around Seattle. Compare the colour map with a Google map (depressingly, I had to resort to using a Google service as none of the others have a 'terrain' mode [headshake]) and you should be able to identify it - look for the triangle of peaks in the lower-left portion of the image: Mt Rainier (top), Mt St. Helens (bottom-left) and Mt Adams (bottom-right).

    Courtesy of y2kiah's comments in my previous journal entry I implemented a plane-based error metric and pushed ahead with a standard deviation based calculation.

    I now have a more complex Compute Shader pre-pass that generates a plane for each patch and feeds the distance of each point from that plane into the standard deviation equation, rather than just the raw height as I had previously done. Essentially I end up with a measurement of coplanarity (is that a word?) - low values indicate that all the points in the patch roughly follow the same flat surface, and high values indicate a much rougher/uneven patch. Perfect!

    I also modified the way the final LOD is chosen. I use the standard deviation as well as distance as inputs and have taken to simply adding them together with a given bias - currently 35% distance and 65% deviation, along the lines of the sketch below.
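
    As a minimal sketch of that weighting (a hypothetical helper, not my actual shader code; both inputs assumed normalized to the 0.0-1.0 range):

    // Hypothetical weighted blend of the two LOD inputs
    float ComputeLodFactor( float distanceFactor, float deviationFactor )
    {
        // 35% distance, 65% deviation, as described above
        return 0.35f * distanceFactor + 0.65f * deviationFactor;
    }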

    From initial testing the results are pretty much exactly what I want. Flat and distant areas are low detail and rough pieces are high detail subject to their distance to the camera.




    Compute Shader seeded Terrain Tessellation

    Posted by , 09 May 2009 - - - - - - · 623 views

    I've spent most of this afternoon playing around with more complex LOD selection algorithms.

    I tweaked and fixed the last few bugs in the Compute Shader pre-pass I previously discussed and now have it seeding my Hull Shader with additional per-patch data.

    However, I'm not really happy with the results. Today's experiments have mostly demonstrated to me that a "one size fits all" metric is very hard to find - some height maps suit particular heuristics better than others, and a multi-variable LOD scheme is very hard to balance regardless of the target data. It's proven far too easy to invalidate one variable in favour of another, to have multiple variables cancel each other out, or to have variables working well only in different parts of the image (my main problem)...



    The above is the naive approach - simply take the distance from the camera, clamped to a maximum distance.

    Two main problems exist - there is extra detail where it's not needed (flat areas are the same/similar shade of green to the hilly areas), and there is no red because the geometry closest to the camera is removed by near-plane clipping in the view/projection transform.



    The above revises the previous approach by implementing a near plane as well as a far plane. Notice that you can now see all three graduations of detail - red, green and blue.

    Still, the problem of detail where it's not necessary remains.



    The above is a static LOD metric using the standard deviation of the height values. The idea here is that 'noisy' patches have a high standard deviation whereas flatter areas have a very low one. This should distribute detail to the patches that vary the most and thus deserve it the most.

    It works pretty well, but there are a few cases where it can be thrown off quite badly - particularly where most of a patch is flat and only the edge is raised. Like the skirting tiles around the islands.



    The above is based on the spread of heights - basically maximum less minimum. This achieves a similar effect to the standard deviation but isn't so easily fooled by skirting tiles, at the expense of generating a few patches with more detail than they probably need.



    The above modulates the standard deviation by the distance from the camera, which should work well as a hybrid. However, the typically very small standard deviation (a maximum of 0.305 in this image) means the result either leans heavily on the distance and gives mostly blue or, with different weights, the deviation is drowned out by the distance metric.



    The above modulates distance with the spread of heights and seems to produce much more pleasing results, with a better distribution of detail. At this time it's my preferred hybrid metric for LOD; a sketch follows.
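
    As an illustrative sketch of this hybrid (a hypothetical helper, not my actual shader code; inputs assumed normalized to 0.0-1.0):

    // Hypothetical modulation of distance by the per-patch height spread
    float ComputeLodFactor( float distanceFactor, float minHeight, float maxHeight )
    {
        float spread = maxHeight - minHeight; // per-patch spread from the pre-pass
        return distanceFactor * spread;       // combine the two multiplicatively
    }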




    I've posted another YouTube video of the latter algorithm, and in it (as well as in the above images) you can spot some gaps between patches - the twinkling white pixels. This is really not good, and seems to be a discontinuity introduced by my "improved" distance-from-camera equation, which is a shame. Something I need to look into tomorrow.

    I want to start capturing some of the amplification ratios and other statistics as part of the display. I've got them writing to the console, but I want those in the videos so you can see the actual geometric complexity differences.


    Thoughts?





