
Member Since 14 Feb 2007

#5291749 Proper output buffering algorithm

Posted by Hodgman on 15 May 2016 - 06:55 PM

Can you make the loop's idle state have a timeout - so while waiting for commands, it will also wake up on its own if no command is received within a certain amount of time?
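A minimal C++ sketch of that idea, assuming the worker thread waits on a std::condition_variable. The CommandQueue class and its method names are hypothetical, not from the original thread. wait_for with a predicate returns false when the timeout expires with nothing queued, which gives the loop a chance to do its periodic work (e.g. flushing buffered output) even when no command arrives.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical command queue shared between a producer and a worker loop.
class CommandQueue {
public:
    void push(std::string cmd) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            commands_.push_back(std::move(cmd));
        }
        cv_.notify_one();  // wake the worker if it is idle
    }

    // Waits up to `timeout` for a command. Returns std::nullopt on timeout,
    // so the caller can still run its periodic work.
    std::optional<std::string> waitPop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!cv_.wait_for(lock, timeout, [this] { return !commands_.empty(); }))
            return std::nullopt;  // timed out with nothing queued
        std::string cmd = std::move(commands_.front());
        commands_.pop_front();
        return cmd;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> commands_;
};
```

The worker loop would then call waitPop with, say, a 50 ms timeout, handle the command if one arrived, and run its flush step either way.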

#5291741 PBR 3D Models

Posted by Hodgman on 15 May 2016 - 05:23 PM

Most tools/engines are converging on two different workflows for specular maps.


In "traditional" game art, you usually had:

Specular Power: size/shape of the highlight

Specular Mask: intensity of the highlight.

These had lots of different names, such as gloss maps, spec-color maps, or just specular maps... but it usually boiled down to a power value and a mask value.


In PBR, the F0 ~= mask/spec-colour, and roughness/glossiness ~= power.


The two new workflows for authoring these values are the "spec/gloss" workflow and the "roughness/metalness" workflow.

Spec/gloss is very similar to the traditional workflow -- the monochrome gloss map controls the size/shape of the highlight, and the RGB specular map contains F0, which acts very similarly to a traditional RGB mask value. This workflow is easy for traditional game artists to understand due to the similarity. The main difference is that traditionally, artists put a lot of details into their masks/spec-colours, when they should now be putting detail into the power/roughness/gloss maps instead.

Metal/rough is different, but IMHO simpler and more intuitive -- the monochrome roughness map controls the size/shape of the highlight, but in a slightly different way (in some engines, it's inverted, so black = small highlights and white = large highlights), and the monochrome metalness map indirectly specifies the F0 value. If metalness is zero, then the F0 value is some hard-coded non-metal value, such as vec3(0.02, 0.02, 0.02), otherwise if metalness is one, F0 is the value stored in your material colour map. Moreover, metalness also affects your diffuse colour! If metalness is zero, the diffuse colour is the value stored in your material colour map, otherwise if metalness is one, the diffuse colour is black.

This is because pure metals should have bright RGB F0 values and black diffuse values, and non-metals should have monochrome, dark F0 values in approx the 2%-4% range, most of the time. So this workflow makes it harder for artists to create "impossible" materials -- such as having a bright blue diffuse colour and a bright red F0.


So with Spec/gloss, you'd have diffuse colour (RGB), specular colour (RGB) and glossiness (mono).

And with Metal/rough, you'd have material colour (RGB), metalness (mono) and roughness (mono).
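To make the relationship concrete, here's a minimal C++ sketch converting the metal/rough inputs into the diffuse colour and F0 that spec/gloss stores directly. It assumes a linear blend for metalness values between 0 and 1 (the text only describes the 0 and 1 cases) and uses the 0.02 non-metal F0 from the example above; the type and function names are illustrative.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

// Per-channel linear interpolation: a + (b - a) * t.
Vec3 lerp3(const Vec3& a, const Vec3& b, float t) {
    return {a[0] + (b[0] - a[0]) * t,
            a[1] + (b[1] - a[1]) * t,
            a[2] + (b[2] - a[2]) * t};
}

struct SpecGloss {
    Vec3 diffuse;
    Vec3 f0;
};

// Assumed hard-coded non-metal F0 (the 2% example value).
constexpr Vec3 kDielectricF0 = {0.02f, 0.02f, 0.02f};

SpecGloss fromMetalRough(const Vec3& materialColour, float metalness) {
    SpecGloss out;
    // metalness 0: F0 = dielectric constant; metalness 1: F0 = material colour.
    out.f0 = lerp3(kDielectricF0, materialColour, metalness);
    // metalness 0: diffuse = material colour; metalness 1: diffuse = black.
    out.diffuse = lerp3(materialColour, Vec3{0.0f, 0.0f, 0.0f}, metalness);
    return out;
}
```

Roughness passes through unchanged in this sketch; whether it maps to gloss as-is or inverted depends on the engine.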

#5291664 Texture Masking for Pseudo-Lens Flares

Posted by Hodgman on 15 May 2016 - 12:29 AM

But regarding the other approach you mentioned, wouldn't it be an expensive task to calculate the vector between the sun and the center of the camera, and rendering sprites based on the vector? Also, would using a uniform buffer object for sprites help with the performance?

No, it's ridiculously cheap!
You can probably do it on the CPU in less than a microsecond. Any sprite rendering technique can be used, such as creating dynamic vertex data from the CPU, etc...
Last time I used this technique, we had a static VBO containing half a dozen quads. Each vertex had an attribute identifying which corner of its quad it was ([0,0] to [1,1]), and an attribute identifying which quad it belonged to (0,1,2...). The vertex shader then received the sun position from a UBO, computed the line, and placed the quads at the appropriate positions.
This will be a thousandfold cheaper than any technique based on actually analyzing the framebuffer for bright areas :) It's how every 90s game achieved lens flares :lol:
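A rough C++ sketch of that setup, with the vertex shader's work emulated on the CPU for illustration. The vertex layout, the per-quad tables (position along the line and half-size) and the function names are assumptions, not the original code. Positions are in normalised device coordinates, where the screen centre is (0,0).

```cpp
#include <array>
#include <vector>

// One vertex of the static flare VBO.
struct FlareVertex {
    float cornerX, cornerY;  // which corner of the quad, [0,0]..[1,1]
    float quadIndex;         // which flare sprite this vertex belongs to
};

// Build the static vertex data once: 6 vertices (two triangles) per quad.
std::vector<FlareVertex> buildFlareVbo(int quadCount) {
    static const float corners[6][2] = {{0.f, 0.f}, {1.f, 0.f}, {1.f, 1.f},
                                        {0.f, 0.f}, {1.f, 1.f}, {0.f, 1.f}};
    std::vector<FlareVertex> v;
    v.reserve(quadCount * 6);
    for (int q = 0; q < quadCount; ++q)
        for (const auto& c : corners)
            v.push_back({c[0], c[1], static_cast<float>(q)});
    return v;
}

// CPU emulation of the vertex shader. sunNdc would come from a UBO;
// tAlongLine and halfSize are hypothetical per-quad tables.
std::array<float, 2> flareVertexPosition(const FlareVertex& vtx,
                                         std::array<float, 2> sunNdc,
                                         const std::vector<float>& tAlongLine,
                                         const std::vector<float>& halfSize) {
    const int q = static_cast<int>(vtx.quadIndex);
    const float t = tAlongLine[q];
    // Point on the line from the sun (t = 0) through the screen centre (t = 1).
    const float cx = sunNdc[0] * (1.0f - t);
    const float cy = sunNdc[1] * (1.0f - t);
    // Expand the corner from [0,1] to [-1,1] and scale by the quad's size.
    return {cx + (vtx.cornerX * 2.0f - 1.0f) * halfSize[q],
            cy + (vtx.cornerY * 2.0f - 1.0f) * halfSize[q]};
}
```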

Also, if I understand correctly, according to the GPU technique, we get the bright spots and calculate the flare geometry (positions) in the shader, and read that using the SSBO, and then render the flares using that data?

Yep. This gives automatic flares on any bright surface, but is very complex.

#5291640 Instance Rendering doesn't work :(

Posted by Hodgman on 14 May 2016 - 06:55 PM

But whats behind row_major, has he switched the rows with columns?

You use it to make sure that HLSL interprets the order of a matrix's elements in memory the same way as your CPU-side math library does.


Given a matrix:

    | A B |
    | C D |

Some math libraries will store it in RAM as ABCD (row-major), and others as ACBD (column-major).

By default, HLSL assumes your math library is doing the 2nd option, column-major (even though the D3DX math library has always used the 1st option, row-major!!)
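A small C++ sketch of the two storage orders for that 2x2 matrix; the helper names are made up for illustration. If your CPU library is row-major, you either mark the HLSL matrix as row_major or transpose it before uploading, and that transpose is just the index swap in toColumnMajor.

```cpp
#include <array>

// Row-major: each row is contiguous, so [A B; C D] is stored as A B C D.
float rowMajorAt(const std::array<float, 4>& m, int row, int col) {
    return m[row * 2 + col];
}

// Column-major: each column is contiguous, so [A B; C D] is stored as A C B D.
float colMajorAt(const std::array<float, 4>& m, int row, int col) {
    return m[col * 2 + row];
}

// Re-storing a row-major matrix in column-major order (a transpose of the storage).
std::array<float, 4> toColumnMajor(const std::array<float, 4>& rowMajor) {
    std::array<float, 4> out{};
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            out[c * 2 + r] = rowMajor[r * 2 + c];
    return out;
}
```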

#5291634 when to use concurrency in video games

Posted by Hodgman on 14 May 2016 - 06:11 PM

*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix

MS doesn't use the LOCK instruction for volatile reads and writes. LOCK would provide sequential consistency, but MS volatile only guarantees acquire/release. On x86, reads naturally have acquire semantics and writes naturally have release semantics (assuming they're not non-temporal). The MS volatile just ensures that the compiler doesn't re-order or optimize out instructions in a way that would violate the acquire/release semantics.

Yeah, you're right - I've struck that bit out in my above post. From memory I thought that they'd gone as far as basically using their Interlocked* intrinsics silently on volatile integers, but it's a lot weaker than that. I even just gave it a go in my compiler and couldn't get it to emit a LOCK prefix except when calling InterlockedCompareExchange/InterlockedIncrement manually :)


This means that even with MS's stricter form of volatile, it would be very hard to use it to write correct inter-thread synchronization (i.e. you should still only see it deep in the guts of synchronization primitives, and not in user code).
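For comparison, a portable way to get the acquire/release behaviour that MS volatile provides is std::atomic with explicit memory orders. A minimal sketch of publishing one value from a producer thread to a consumer thread (the payload/ready names are illustrative):

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // synchronisation flag

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish it
}

int consumer() {
    // Once the acquire load sees true, the write to payload is visible too.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}
```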


Posted by Hodgman on 14 May 2016 - 04:39 PM

You might run foul of regulation.

As you're actually storing and trading in a real world currency (btc can be directly converted to and from dollars, pounds etc and even stolen and directly spent) you'll find that you have to operate to the same terms and conditions as banks and money lenders, much like PayPal is forced to within the EU.

It's a slippery slope and one I personally would rather avoid...

That's the thing about BTC though - you're not dealing with real currency. The exchanges deal with real money, but you don't have to.
Yes, paying real money back to your players is fraught with regulation -- we've looked into in-game semi-pro e-sport tournaments before, and the legalities get super complex as soon as you're moving money around between players... We might still do it, but by partnering with a different company who is happy to do the financial side of things :lol:

As long as a different company is actually the one taking and giving money from/to your players, then they take on all of that legal risk. There's a lot of companies popping up right now who are happy to take on the regulatory work, e.g. https://esportshero.com/


BTC is as much a real currency as WoW gold or EVE ISK - for now.
Furthermore, BTC doesn't even require the money to pass through you in the same way that real money would do. Players can directly be trading "money" with each other, without any of the money ever being in your control, or being an asset of yours (this is important because while BTC isn't a currency, it is an asset in some countries, where a business would still have to pay capital gains tax on it). If you used USD, you'd need players to transfer it to you, and then you'd transfer it to the other player. Player to player transfers can't be validated by your game in the same way that they can with BTC, where all transactions are publicly known/verifiable.
So for a trading game where players used USD - your company takes on a massive legal burden, a massive taxation burden, and a massive security burden (cheaters have a financial incentive to steal accounts, so you need two-factor-auth, and a full-time anti-cheat team...). A trading game where players use BTC dodges the legal and taxation burdens, and some of the security burden -- hacking you/the game would be useless, but stealing people's accounts would still be an issue (the same as regular BTC wallet theft).


Posted by Hodgman on 14 May 2016 - 07:14 AM

The game could handle a BTC account/wallet for you internally. You could not even be aware that it's using a cryptocurrency.

It would be interesting to find out later that all of your in-game credits are actually transferrable to other BTC wallets, or crypto currency exchanges :lol:


There are plenty of services that allow you to purchase BTC with USD/etc, via typical payment providers. You could still allow people to buy your in-game currency using real currency.

#5291545 when to use concurrency in video games

Posted by Hodgman on 14 May 2016 - 06:30 AM


If you use volatile for multi-threading, you're doing it wrong.

That isn't necessarily true, granted using it everywhere or for everything which is shared between two -or more- threads would be morally reprehensible.
If all you need to do is read some memory, and you don't truly care if SNAFU is the word of the day.. then feel free to use volatile.
I used volatile on certain data members I wanted to draw to screen as text. Health, ammo, score. Single values which get updated *maybe* once a frame, so if a couple frames got fubar'd I didn't care. In practice, every single frame was A-OK. Perhaps my loads weren't heavy enough.


Volatile does absolutely* nothing for multi-threading. You could leave those values as non-volatile and it would act the same way.

*except on some MS compilers, where on x86, volatile reads/writes are specified as having acquire/release semantics - but that compiler behavior is, frankly, wrong :P [edit: I originally wrote that these reads/writes are generated using the LOCK instruction prefix, which acts as an SMP memory fence; that was wrong]

In general, volatile does not ensure that reads/writes are atomic, or that reads/writes occur in order with respect to the rest of the code. If you see that keyword in multi-threaded code (perhaps except for inside the implementation details of std::atomic), then you have a bug. 

See also: https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
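As a sketch of what to use instead: on a volatile int, ++counter is still a separate load, add and store, so concurrent increments can be lost (and it's a data race, i.e. undefined behaviour). std::atomic's fetch_add makes each increment indivisible. Relaxed ordering is enough here because join() already synchronizes the final read; for display-only values like health/ammo/score, relaxed atomic loads/stores are likewise the correct tool.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each of `threads` workers adds `perThread` to a shared counter.
int countWithAtomic(int threads, int perThread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // indivisible
        });
    }
    for (auto& w : workers) w.join();  // join synchronizes with the final load
    return counter.load();
}
```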

#5291543 [D3D12] Using SetGraphicsRoot*View functions

Posted by Hodgman on 14 May 2016 - 06:18 AM

IIRC, these functions actually skip the need to create a view at all... which makes them slightly faster, and much more dangerous.
Normally a CBV has two members: BufferLocation and SizeInBytes.
When you use SetGraphicsRootConstantBufferView, it's equivalent to just setting the BufferLocation and leaving SizeInBytes as a magic value of "unknown".

This means that if you provide an address of a GPU allocation that's only 4 bytes large, but your shader is expecting to be able to read a 256 byte structure, then very bad things will happen -- you'll most likely crash the GPU hardware, etc...
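Since a root CBV only carries an address, any bounds checking is up to you. A minimal debug-build guard might look like the C++ below. GpuAllocation and PerObjectConstants are hypothetical application types, not D3D12 API, and the CPU struct is assumed to mirror the HLSL cbuffer's packing exactly.

```cpp
#include <cstdint>

// Hypothetical record of a GPU allocation made by the application.
struct GpuAllocation {
    std::uint64_t gpuAddress;
    std::uint64_t sizeInBytes;
};

// Hypothetical CPU mirror of the shader's cbuffer layout.
struct PerObjectConstants {
    float world[16];
    float tint[4];
};

// True if the allocation is at least as large as the struct the shader reads.
template <typename ShaderCBufferLayout>
bool coversShaderCBuffer(const GpuAllocation& alloc) {
    return alloc.sizeInBytes >= sizeof(ShaderCBufferLayout);
}
```

You'd call this (or assert on it) right before binding the address with SetGraphicsRootConstantBufferView.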

Regarding SRVs, AFAIK this isn't usable for textures - only for buffers. This is because textures really do require knowledge of the members in the SRV, such as width/height/format.
It's possible for a shader to read from a StructuredBuffer without knowing the full SRV details / only knowing the GPU address... but the same dangers apply: if you read out of bounds, you're dead.

Likewise, a shader can read from a RWStructuredBuffer without knowing the full UAV details (if you're careful).

#5291541 Texture Masking for Pseudo-Lens Flares

Posted by Hodgman on 14 May 2016 - 06:11 AM

Blurring one image using another image is a convolution filter. The straightforward way of doing it is extremely expensive - it's O(N * M), where N is your pixels to be blurred, and M is the pixels in the shape image.

So a 64x64 pixel shape requires your blur pixel shader to perform 4096 (64x64) texture fetches for every pixel, which obviously isn't practical :wink:


Instead of using a "gather" based technique, it's much more efficient to use a "scatter" based technique.

i.e. instead of:

  for each pixel, gather all pixels within the shape-blur area around it, to find any flare sources that should contribute.

use:

  for each pixel that is a flare source, scatter the blur shape to the surrounding area.


These kinds of scatter-based effects can't be done with simple post-processing shaders. You need to use UAVs/SSBOs for read-write access, or more likely: you need a two-pass technique, where the first pass generates some geometry (stream out / transform feedback / compute to generate vertices), and then the second pass renders that geometry to draw the blur sprites.
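A CPU reference sketch of the scatter formulation on a single-channel image; the Image type and the brightness threshold are assumptions. On the GPU, the outer loop corresponds to the first pass that emits one sprite per flare source, and the inner stamping loop is what rasterizing that sprite does for you. Only flare sources do any work, so the cost scales with the number of sources times the shape size, rather than with every pixel.

```cpp
#include <vector>

// Single-channel image stored row-major.
struct Image {
    int width, height;
    std::vector<float> pixels;
    Image(int w, int h) : width(w), height(h), pixels(w * h, 0.0f) {}
    float& at(int x, int y) { return pixels[y * width + x]; }
    float at(int x, int y) const { return pixels[y * width + x]; }
};

// Each pixel brighter than `threshold` stamps `shape` (centred on itself)
// into the output, scaled by its brightness.
Image scatterFlares(const Image& src, const Image& shape, float threshold) {
    Image out(src.width, src.height);
    const int ox = shape.width / 2, oy = shape.height / 2;
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            const float v = src.at(x, y);
            if (v <= threshold) continue;  // not a flare source
            for (int sy = 0; sy < shape.height; ++sy) {
                for (int sx = 0; sx < shape.width; ++sx) {
                    const int tx = x + sx - ox, ty = y + sy - oy;
                    if (tx < 0 || ty < 0 || tx >= out.width || ty >= out.height) continue;
                    out.at(tx, ty) += v * shape.at(sx, sy);
                }
            }
        }
    }
    return out;
}
```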


The same problem is faced with DOF effects that want to use a custom/textured "bokeh" shape. You might find something by searching for Bokeh shaders.


However, instead of doing a fancy modern GPU-based scatter blur shader, you can just go old-school. Modern GPU-based lens flares have the advantage of allowing any bright pixel to create a lens flare, but in your screenshots it looks like you only need the sun to create a lens flare.

On the CPU, you can determine where the sun is going to be on the screen, and then define a 2D line that passes through this sun position and the centre of the screen. Along this 2D line, you can then place several different sprites of your lens flare shapes, and you're done :)
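A minimal C++ sketch of the CPU side, assuming you already have the sun's clip-space position (view-projection matrix times the sun position). The function names and the choice of t values are illustrative. t = 0 sits on the sun, t = 1 at the screen centre (NDC origin), and t > 1 continues past the centre to the opposite side.

```cpp
#include <array>
#include <optional>
#include <vector>

using Vec2 = std::array<float, 2>;

// Perspective divide; a sun behind the camera (w <= 0) gets no flare.
std::optional<Vec2> clipToNdc(const std::array<float, 4>& clip) {
    if (clip[3] <= 0.0f) return std::nullopt;
    return Vec2{clip[0] / clip[3], clip[1] / clip[3]};
}

// Points along the line from the sun through the screen centre.
// The centre is (0,0), so lerp(sun, centre, t) simplifies to sun * (1 - t).
std::vector<Vec2> flareSpritePositions(Vec2 sunNdc, const std::vector<float>& tValues) {
    std::vector<Vec2> out;
    out.reserve(tValues.size());
    for (float t : tValues)
        out.push_back({sunNdc[0] * (1.0f - t), sunNdc[1] * (1.0f - t)});
    return out;
}
```

Each t value would then be paired with its own flare texture and size when drawing the sprites.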

#5291485 Downsampling texture to half resolution

Posted by Hodgman on 13 May 2016 - 10:12 PM

If you've already done bilateral blurring, then it should be pretty easy :D

To do depth-aware upsampling, when going from half-res to full-res, for each full-res pixel, point-sample the nearest 4 half-res pixels and generate standard bilinear weights for them.
Then, perform a depth/normal threshold test of some kind, to determine if each of those samples is 'valid' or not. If a sample is not valid, set its weight to zero.
Renormalize the weights so they sum to 1.0 (e.g. weights.xyzw /= dot(weights.xyzw, (float4)1))
-- But, take care to handle the case where all weights are zero: in that case, there's no valid low-res data that corresponds to your high res pixel, and the above code snippet will divide by zero! So, take the closest depth match, or average all 4 samples, or just use the initial bilinear weights, etc...
Combine the 4 samples using their new weights.
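A minimal single-channel C++ sketch of those steps for one full-res pixel. It only uses a depth test (no normal test), and the relative depthThreshold is a hypothetical tuning parameter. For the all-zero case it picks the closest depth match, one of the fallbacks listed.

```cpp
#include <array>
#include <cmath>

float bilateralUpsample(const std::array<float, 4>& lowColour,
                        const std::array<float, 4>& lowDepth,
                        std::array<float, 4> weights,  // standard bilinear weights
                        float fullResDepth,
                        float depthThreshold) {
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i) {
        // Invalidate samples whose depth doesn't match this pixel's surface.
        if (std::fabs(lowDepth[i] - fullResDepth) > depthThreshold * fullResDepth)
            weights[i] = 0.0f;
        sum += weights[i];
    }
    if (sum <= 0.0f) {
        // No valid sample: fall back to the closest depth match.
        int best = 0;
        for (int i = 1; i < 4; ++i)
            if (std::fabs(lowDepth[i] - fullResDepth) < std::fabs(lowDepth[best] - fullResDepth))
                best = i;
        return lowColour[best];
    }
    float result = 0.0f;
    for (int i = 0; i < 4; ++i)
        result += lowColour[i] * (weights[i] / sum);  // renormalised weights
    return result;
}
```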

#5291477 Downsampling texture to half resolution

Posted by Hodgman on 13 May 2016 - 08:55 PM

Yes, options 1 and 2 are the standard approaches. A single texture fetch with bilinear filtering, placed at the shared corner of each 2x2 block of source texels, will calculate the average for you :)

However, you can't average depth and normals... Well, you can, but the results won't be sensible -- at the edges of objects, the averaging will take two discontinuous surfaces (e.g. a character and the background), and "invent" a new surface that's half-way between both (something floating half way between the character and the background).

In this case, you want to simply throw away 75% of your data when downsampling (keep one of every four samples), and then use a bilateral depth/normal-aware upsampling filter when going back to full resolution.
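A minimal C++ sketch contrasting the two downsampling rules on a single-channel, row-major image with even dimensions. Keeping the top-left sample of each 2x2 block is an arbitrary choice for illustration; other choices (e.g. the min or max depth of the block) are also possible.

```cpp
#include <vector>

// Colour: average each 2x2 block.
std::vector<float> downsampleAverage(const std::vector<float>& src, int w, int h) {
    std::vector<float> dst((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x) {
            const int sx = 2 * x, sy = 2 * y;
            dst[y * (w / 2) + x] = 0.25f * (src[sy * w + sx] + src[sy * w + sx + 1] +
                                            src[(sy + 1) * w + sx] + src[(sy + 1) * w + sx + 1]);
        }
    return dst;
}

// Depth/normals: keep one sample per 2x2 block and discard the other three,
// so no "invented" in-between surfaces appear at object edges.
std::vector<float> downsamplePoint(const std::vector<float>& src, int w, int h) {
    std::vector<float> dst((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x)
            dst[y * (w / 2) + x] = src[(2 * y) * w + 2 * x];  // top-left of the block
    return dst;
}
```

With a block containing a character at depth 1 and background at depth 100, averaging produces a bogus surface at 50.5, while the point rule keeps a real one.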

#5291393 Non-trivial union

Posted by Hodgman on 13 May 2016 - 05:15 AM

class Type
  Type() :

  Type(const Type& that) :

I read that you may only initialize one member of the union (otherwise the constructor is ill formed). In this case I choose the mObject variable.

You don't initialize any of the union members (see the initializer lists quoted above). You only perform assignment to the uninitialized members within the constructor body.
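A minimal C++17 sketch of the two correct ways to bring a non-trivial union member to life. The class and member names follow the question, but the member types and contents are assumptions: initialize the member in the mem-initializer list, or construct it with placement new in the body. The destructor also has to destroy it explicitly, since the union won't.

```cpp
#include <memory>
#include <new>
#include <string>

class Type {
public:
    // Option 1: initialize the union member in the mem-initializer list.
    Type() : mObject("default") {}

    // Option 2: construct it in the body with placement new. Plain assignment
    // (mObject = that.mObject;) would call operator= on an unconstructed string.
    Type(const Type& that) { ::new (static_cast<void*>(&mObject)) std::string(that.mObject); }

    Type& operator=(const Type& that) {
        mObject = that.mObject;  // fine: mObject is already constructed here
        return *this;
    }

    // The union won't destroy its non-trivial member for us.
    ~Type() { std::destroy_at(&mObject); }

    const std::string& object() const { return mObject; }

private:
    union {
        std::string mObject;  // the active member in this sketch
        double mNumber;       // hypothetical alternative member
    };
};
```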

#5291360 when to use concurrency in video games

Posted by Hodgman on 12 May 2016 - 08:31 PM

I am asking when to use concurrency when developing a video game.

Whenever you've got a function that has to operate on more than one object. Most game systems have more than one object in them... So, everywhere.
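As a minimal sketch of that idea, using a hypothetical GameObject type: split the per-object loop into disjoint ranges, one per thread, so no locking is needed. Real engines would normally hand these slices to a job system or thread pool rather than spawning threads every frame.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct GameObject {  // hypothetical
    float position = 0.0f;
    float velocity = 1.0f;
};

void parallelUpdate(std::vector<GameObject>& objects, float dt, unsigned threadCount) {
    if (threadCount == 0) threadCount = 1;
    const std::size_t n = objects.size();
    const std::size_t chunk = (n + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        // Each thread only touches objects[begin, end), so there is no shared write.
        workers.emplace_back([&objects, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                objects[i].position += objects[i].velocity * dt;
        });
    }
    for (auto& w : workers) w.join();
}
```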

#5291114 Atmospheric Scattering with SkyDome, shader implementation

Posted by Hodgman on 11 May 2016 - 06:40 AM

First things first, try adding a tonemapping operator at the end of your shader.
e.g. Reinhard:
    atmos = pow( clamp(atmos / (atmos+1.0),0.0,1.0), vec3(1.0/2.2) );
or logarithmic:
    atmos = pow( clamp(smoothstep(0.0, 12.0, log2(1.0+atmos)),0.0,1.0), vec3(1.0/2.2) );
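If you want to check what these curves do before touching the shader, here are per-channel C++ ports of the two lines above. The helper names are illustrative, and smoothstep follows the GLSL definition. Reinhard only approaches 1.0 asymptotically, while the log curve saturates once log2(1+x) reaches 12, i.e. at x = 4095.

```cpp
#include <cmath>

float clamp01(float x) { return std::fmin(std::fmax(x, 0.0f), 1.0f); }

// GLSL smoothstep: Hermite interpolation between edge0 and edge1.
float smoothstep(float edge0, float edge1, float x) {
    const float t = clamp01((x - edge0) / (edge1 - edge0));
    return t * t * (3.0f - 2.0f * t);
}

// Reinhard x / (x + 1), then a 1/2.2 gamma approximation of sRGB.
float tonemapReinhard(float c) {
    return std::pow(clamp01(c / (c + 1.0f)), 1.0f / 2.2f);
}

// Logarithmic curve over 12 stops, then the same gamma.
float tonemapLog(float c) {
    return std::pow(clamp01(smoothstep(0.0f, 12.0f, std::log2(1.0f + c))), 1.0f / 2.2f);
}
```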