Member Since 14 Feb 2007
Online Last Active Today, 04:38 AM

#5191604 Slope-scale depth bias shadow map in HLSL

Posted by Hodgman on 06 November 2014 - 06:52 PM

Depth Biasing with slope-scale and offset shouldn't be done in your shader -- there's fixed function hardware sitting around for that job!
The interpretation of the depth-bias values depends on the depth target format.
For integer depth buffers:
  Bias = scale * max(|dzdx|,|dzdy|) + offset * 1/2^bits_in_z_format
For floating point depth buffers:
  Bias = scale * max(|dzdx|,|dzdy|) + offset * 2^(exponent(max_z_in_primitive) - mantissa_bits_in_z_format)
  If clamp is >0: FinalBias = min(Bias, clamp)
  Else If clamp is <0: FinalBias = max(Bias, clamp)
  Else: FinalBias = Bias
  Z += FinalBias
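
To make the formulas above concrete, here's a rough CPU-side sketch of the same arithmetic in C++. It's only for illustration, since the rasterizer hardware does this for you. The function names are made up, and it assumes that exponent() means the unbiased IEEE exponent of the largest depth value in the primitive.

```cpp
#include <algorithm>
#include <cmath>

// Constant part of the bias for an integer depth buffer with `bits` bits of depth:
//   offset * 1/2^bits
inline float ConstantBiasUnorm(float offset, int bits)
{
    return offset * std::ldexp(1.0f, -bits);
}

// Constant part of the bias for a floating point depth buffer:
//   offset * 2^(exponent(max_z_in_primitive) - mantissa_bits_in_z_format)
// Assumption: exponent() is the unbiased IEEE exponent, which is one less than
// the exponent that std::frexp reports (frexp normalizes to [0.5, 1)).
inline float ConstantBiasFloat(float offset, float maxZInPrimitive, int mantissaBits)
{
    int frexpExponent = 0;
    std::frexp(maxZInPrimitive, &frexpExponent);
    return offset * std::ldexp(1.0f, (frexpExponent - 1) - mantissaBits);
}

// Slope-scaled part plus constant part, then the clamp rules from above.
inline float FinalBias(float scale, float maxSlope, float constantBias, float clamp)
{
    float bias = scale * maxSlope + constantBias;
    if (clamp > 0.0f) return std::min(bias, clamp);
    if (clamp < 0.0f) return std::max(bias, clamp);
    return bias;
}
```

Here maxSlope stands for max(|dzdx|,|dzdy|), and the result of FinalBias is what gets added to Z.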

In D3D9, you set the offset & scale variables above with this code (clamp is always 0 / unused in d3d9):

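//both values are floats, but SetRenderState takes a DWORD, so pass the raw bit pattern
device->SetRenderState(D3DRS_DEPTHBIAS, *(DWORD*)&offset);
device->SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, *(DWORD*)&scale);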

In D3D11, the offset/scale/clamp variables are part of the RasterState:

//make a raster state object
ID3D11RasterizerState* state;
D3D11_RASTERIZER_DESC desc = {};
desc.DepthBias = offset;
desc.DepthBiasClamp = clamp;
desc.SlopeScaledDepthBias = scale;
desc.... = ...;//set all the other members too!
device->CreateRasterizerState(&desc, &state);

//set the raster state from the pre-prepared object
context->RSSetState( state );

#5191453 Best way to map a buffer

Posted by Hodgman on 05 November 2014 - 11:17 PM

Changes that you make to the buffer are only actually flushed through to the GPU when you call unmap.

Also, all GL commands are asynchronous -- the GPU will likely execute them many milliseconds after you call gl functions.


So, you need map-modify-unmap-draw-map-modify-unmap-draw... not map-modify-draw-modify-draw-unmap.


However, while this change will make it correct, it will also make it extremely slow.


The map command is requesting to modify the buffer, which is currently in use by the GPU. When you make the 2nd call to map, GL realizes that you're trying to modify a buffer that's in-use, so it will block and wait for the GPU to finish performing the first draw command before returning... The GPU might be running with 30ms latency behind the CPU, so this second map call has to wait ~30ms for the GPU to drain its command buffer...

This might be interesting reading:


#5191423 Is there a known bug with glMemoryBarrier() when using the latest AMD Catalys...

Posted by Hodgman on 05 November 2014 - 04:12 PM

Is this a known bug in the latest Catalyst driver?
Might be good to ask on the AMD forums here: http://devgurus.amd.com/welcome

#5191407 Funniest line of code ever ?

Posted by Hodgman on 05 November 2014 - 02:34 PM

Technically writing to one member of a union and then reading the other is undefined behaviour.

In practice, most compilers support it as the preferred way to break aliasing rules.
Also, no sane compiler would add padding in that case -- the size of a structure has to be padded up to a multiple of its alignment already (or arrays wouldn't work), so there's no need for extra padding.

Still, instead of the union, I'd prefer an array with named enum values as the indices.
...or making the whole animation system data-driven so there's no need to hard-code names to begin with ;)
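
A minimal sketch of the enum-indexed array idea, assuming a hypothetical set of animations (the Anim_* names and the Animation struct are invented for illustration; they're not from the code being discussed):

```cpp
#include <array>
#include <cstddef>

// Named indices. Anim_Count is always last, so it doubles as the array size.
enum AnimId : std::size_t { Anim_Idle, Anim_Walk, Anim_Run, Anim_Count };

struct Animation { float duration = 0.0f; };

struct AnimationSet
{
    std::array<Animation, Anim_Count> anims{};

    // Named access (set[Anim_Walk]) and looping over anims both use the
    // same storage, so there's no union and no aliasing question.
    Animation&       operator[](AnimId id)       { return anims[id]; }
    const Animation& operator[](AnimId id) const { return anims[id]; }
};
```

You get the "access by name or iterate over all of them" convenience of the union trick without relying on how the compiler lays out the named members.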

#5191092 Datatype Size and Struct Compiler Padding

Posted by Hodgman on 04 November 2014 - 06:51 AM

Short story: avoid direct binary read/writes of internal data structs if you ever ever will do that on a machine other than one single development machine that will never be upgraded.

That's a bit dramatic; sure there are a lot of pitfalls and caveats, but I've seen game engines do this forever: generating data structures from a Windows PC toolchain, and then using those blobs on processors from MIPS/Imagination, Sony, Toshiba, IBM, Apple, Intel, AMD, ATI, nVidia, Motorola, ARM, DMP...
Yes it requires a pact with your compiler, but in my experience with game engines it's also standard practice.

You also see it in much more mundane software like (cross platform mind you) image file loaders, which define C structures that map to the different headers defined by those formats.

If you're a hobbyist in the PC space, with friendly x86 CPUs and very mature compilers, then the pitfalls are pretty easily traversed.


Even in the more professional space, with games we generally know the exact architecture that we're shipping on. If you're writing code in the Linux kernel, on the other hand, then yeah, you'd want to be much more conservative and keep portability in mind.

#5191065 Datatype Size and Struct Compiler Padding

Posted by Hodgman on 04 November 2014 - 12:31 AM

I use in-place data structures extensively, just because it's great to have no deserialization step :D


All of the basic datatypes can change their size. Use the types in stdint.h for explicit sizes.


For padding, you just have to read the documentation for each compiler, and accept that you're writing compiler-specific code.

Use static assertions to validate your assumptions about padding. e.g.

struct Foo { int8_t a; int32_t b; }; static_assert( sizeof(Foo) == 8, "" );


Whenever you change the data structures, increment a magic version number at the start of the "blob". It's prone to human-error, but works 100% of the time, most of the time.
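
As a rough sketch of how the version check might look when combined with the static assertions above: the header layout, the magic value and the version number here are all made-up placeholders, not a real format.

```cpp
#include <stdint.h>
#include <cstddef>
#include <cstring>

// Hypothetical header at the very start of every blob.
struct BlobHeader
{
    uint32_t magic;   // identifies the file type
    uint32_t version; // incremented by hand whenever a structure changes
};
static_assert(sizeof(BlobHeader) == 8, "unexpected padding in BlobHeader");
static_assert(offsetof(BlobHeader, version) == 4, "unexpected layout of BlobHeader");

constexpr uint32_t kBlobMagic   = 0x424F4C42u; // arbitrary placeholder value
constexpr uint32_t kBlobVersion = 3;           // placeholder: current layout version

// Returns true only if the blob was written by a tool using the same layout.
inline bool IsBlobCompatible(const void* data, std::size_t size)
{
    if (size < sizeof(BlobHeader))
        return false;
    BlobHeader header;
    std::memcpy(&header, data, sizeof(header)); // no alignment assumptions on data
    return header.magic == kBlobMagic && header.version == kBlobVersion;
}
```

If the check fails, the sensible reaction is to rebuild the binary from its source file rather than trying to interpret stale data.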


For ease of development, keep all of your source/editable data files in some other format (XML/JSON/etc), and build a tool that converts them into these optimized binary formats. If you change the data-structures, the tool can rebuild the new binaries from the source files. Don't support editing binaries -- keep it a one-way flow.


For cases where you want to regularly change the structure-layout / file-format, or where you can't easily recreate the binaries from XML/JSON/etc, then I wouldn't recommend using these kinds of blobs. Use a structured serialization system that supports mutable schemas. User save-games are a good example of this -- you don't want to invalidate someone's saved progress when a new patch comes out, so use something slightly less optimal but much more flexible here.


For platforms that require different data structures (or different endianness), make different versions of those structures in your engine (with an #ifdef around them) and allow the tool to be told which platform it's generating data for.

I also allow the use of "pointers" inside my blobs, by encoding them as either an absolute offset (unsigned number of bytes from the start of the blob), or a relative offset (signed number of bytes from the "pointer" variable itself). When you have complex structures with many parts linked together, this allows the data-writing tool to completely change the layout of the binary file format without changing any of the game/engine code.
As well as faster load times, sometimes you can get better runtime performance by having the tool lay out your data structures to improve CPU caching, or by avoiding extra runtime memory management by pre-allocating memory inside the blob. It's a very nice feature of some middleware packages where they request a large blob of RAM up-front (with some data from disk streamed into it), and then they reuse that large allocation internally, instead of making lots of global malloc/free calls.

#5190728 Steam takes 30% ?

Posted by Hodgman on 02 November 2014 - 07:26 AM

Steam know they're going to sell a tonne more games through their website than you will through yours.
Yes, they let you generate as many redeemable steam keys as you like and then do with them what you like.

#5190499 Steam takes 30% ?

Posted by Hodgman on 31 October 2014 - 06:43 PM

The standard rate used by all the digital distributors is 30%. It would be a big surprise if steam asked for a different cut.

When I worked on AAA console games ("boxed product"), the best cut we got from retail sales was 0.006%.
(or if you include the fact that the publisher paid development costs upfront, that figure rises to 2%)
No joke.

70% is a pretty good deal in comparison ;)

One great thing steam does as well, is they allow you to sell the steam version of your game via your own website, with you keeping 100% (minus tax).

As for VAT, it varies greatly by region. You'll also pay taxes on that income too. Maybe twice depending on what country you're in.

#5190266 Surface to texture

Posted by Hodgman on 30 October 2014 - 07:19 PM

No, you have to create a texture from a file. After that, you can optionally get a surface from the texture if it's required.


If you create a surface from a file, you can't use it as a texture.

#5190258 Surface to texture

Posted by Hodgman on 30 October 2014 - 06:49 PM

You can't do it that way; you have to go in the opposite direction :(

You need to have an IDirect3DTexture9 interface, and then call GetSurfaceLevel to get the corresponding IDirect3DSurface9 interface.

#5188456 Heap Error

Posted by Hodgman on 21 October 2014 - 11:04 PM

The main causes will be a use-after-free, or a buffer-overrun error:

Foo* foo = new Foo;
delete foo;
foo->member = 42; // write-after-free
vector<Foo> myVector;
myVector[9000].member = 42; // out-of-bounds write

Application Verifier is a good windows tool that can help track these down, but it's complicated...




#5187897 Indexed Drawing - Is it always useless when every face should have its own no...

Posted by Hodgman on 18 October 2014 - 05:24 PM

In the specific case of flat shading, there are other solutions, such as setting the interpolation mode of the normal variable to flat/no-interpolate, or not having any per-vertex normal data at all and calculating the surface normal in the pixel shader using the derivative of the position.

#5187631 Integrating Image Based Lighting

Posted by Hodgman on 17 October 2014 - 05:18 AM

In an offline quality renderer, you'd sample every pixel in the environment map, treating them as a little directional light (using your full BRDF function), which is basically integrating the irradiance.
This is extremely slow, but correct... Even in an offline renderer, you'd optimise this by using importance sampling to skip most pixels in the environment map.

For a realtime renderer, you can 'prefilter' your environment maps, where you perform the above calculations ahead of time. Unfortunately the inputs to the above are, at a minimum, the surface normal, the view direction, the surface roughness and the spec-mask/colour... That's 4 input variables (some of which are multidimensional), which makes for an impractically huge lookup table.
So when prefiltering, typically you make the approximation that the view direction is the same as the surface normal and the spec-color is white, leaving you just with surface normal and roughness.
In your new cube-map, the pixel location corresponds to the surface normal and the mip-level corresponds to the roughness. For every pixel in every mip of this new cube-map, sample all/lots of the pixels in the original cube-map * your BRDF (using the normal/roughness corresponding to that output pixel's position).

#5187576 latching on to vsync

Posted by Hodgman on 16 October 2014 - 10:06 PM

Well I was thinking about sampling input at some multiple of the screen refresh rate and some other timing sensitive thoughts I was mulling over.  But I want the phase of the samples to be in sync with refresh.

Yeah there's no reliable way to do that, that I know of -- aside from writing a busy loop that eats up 100% CPU usage until the next vblank :(

On older systems, the video-out hardware will send the CPU an interrupt on vblanking, and the CPU will then quickly flip the buffer pointers around.
However, modern systems will not involve the CPU at all -- the CPU will queue up multiple frames' worth of drawing commands to the GPU, and the GPU will be responsible for waiting on vblanks, flipping pointers, and syncing on back/front buffer availability...

By isolating these details inside the device drivers, the OS is able to evolve in this way... instead of being stuck with the old interrupt model, wasting CPU time, for the sake of compatibility with old software ;)


You probably just want to create a background thread, and put it to sleep with small intervals similar to the refresh rate.

e.g. in doom 3, their input gathering thread is hard-coded to 16.66ms / 60Hz.
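
A rough sketch of that kind of fixed-rate background thread using the C++ standard library. The class name and the sample counter are placeholders for illustration, and note that this is not phase-locked to the vblank; it just ticks at roughly the chosen period.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

class InputSampler
{
public:
    explicit InputSampler(std::chrono::microseconds period)
        : period_(period), running_(true), samples_(0), thread_([this] { Run(); }) {}

    ~InputSampler()
    {
        running_ = false;
        thread_.join();
    }

    int SampleCount() const { return samples_.load(); }

private:
    void Run()
    {
        // Sleep until absolute deadlines, so timing errors don't accumulate.
        auto next = std::chrono::steady_clock::now();
        while (running_)
        {
            // Placeholder: this is where the input devices would be polled.
            ++samples_;
            next += period_;
            std::this_thread::sleep_until(next);
        }
    }

    std::chrono::microseconds period_;
    std::atomic<bool>         running_;
    std::atomic<int>          samples_;
    std::thread               thread_; // declared last so the other members exist before Run() starts
};
```

For a 60Hz rate you'd construct it with std::chrono::microseconds(16666). The OS scheduler decides when the thread actually wakes up, so expect some jitter on each sample.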

#5187365 latching on to vsync

Posted by Hodgman on 16 October 2014 - 05:46 AM

It's really not a good idea unless you're writing a device driver... what do you want to use this for?


It's easy to find a lot of old / outdated material on this - e.g.




slightly newer:


newer again:



But again, this is not something you want to do in 99.9% of cases... There might be an alternative solution to your actual problem.