
#5299394 Frame buffer speed, when does it matter?

Posted by Hodgman on 06 July 2016 - 04:18 PM

GPU ALU (computation) speeds keep getting faster and faster -- so if a shader was ALU-bottlenecked on an old GPU, on a newer GPU with faster ALU processing, that same shader would likely become memory-bottlenecked -- so faster GPUs need faster RAM to keep up :)


Any shader that does a couple of memory fetches is potentially bottlenecked by memory.

Say for example that a memory fetch has an average latency of 1000 clock cycles, and a shader core can perform one math operation per cycle. If the shader core can juggle two thread(-groups) at once, then an optimal shader would only perform one memory fetch per 1000 math operations.

e.g. say the shader was [MATH*1000, FETCH, MATH*1000]. The core would start on thread-group #1, do 1000 cycles of ALU work, perform the fetch, and have to wait 1000 cycles for the result (before doing the next 1000 cycles of work). While it's blocked here though, it will switch to thread-group #2 and do its first block of 1000 ALU instructions. By the time it gets to thread-group #2's FETCH instruction (which forces it to block/wait out a 1000-cycle memory latency), the results of thread-group #1's fetch will have arrived from memory, so the core can switch back to thread-group #1 and perform its final 1000 ALU instructions. By the time it's finished doing that, thread-group #2's memory fetch will have completed, so it can go on to finish thread-group #2's final 1000 ALU instructions.


If a GPU vendor doubles the speed of their ALU processing unit -- e.g. it's now 2 ALU-ops per cycle, then it doesn't really make this shader go much faster:

The core initially does thread-group #1's first block of 1000 ALU instructions in just 500 cycles, but then hits the fetch, which will take 1000 cycles. So as above, it switches over to processing thread-group #2 and performs its first block of 1000 ALU instructions in just 500 cycles... but now we're only 500 cycles into a 1000-cycle memory latency, so the core has to go idle for 500 cycles, waiting for thread-group #1's fetch to finish.

The GPU vendor would also have to halve their memory latency in order to double the speed of this particular shader.
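To make the arithmetic above concrete, here's a tiny Python sketch of just the timing model in this example (not how a real GPU scheduler works) -- it tallies cycles for a [MATH*1000, FETCH, MATH*1000] shader run across two thread-groups:

```python
def total_cycles(ops_per_cycle, math_ops=1000, mem_latency=1000, groups=2):
    # Timing model for a [MATH, FETCH, MATH] shader: the core runs one
    # thread-group until it blocks on memory, then switches to the next.
    math_time = math_ops // ops_per_cycle
    t = 0
    fetch_done = []
    # First math block of each group, issuing its fetch at the end:
    for _ in range(groups):
        t += math_time
        fetch_done.append(t + mem_latency)
    # Second math block of each group, once its fetch result has arrived:
    for g in range(groups):
        t = max(t, fetch_done[g])  # idle here if the data isn't back yet
        t += math_time
    return t
```

Doubling ALU throughput only takes this shader from 4000 cycles to 2500 cycles -- a 1.6x speedup, not 2x.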


Increasing memory speed is hard though. The trend is that processing speed improves 2x every 2 years, but memory speed improves 2x every 10 years... in which time processing speed has gotten 32x faster... so over a 10 year span, memory tends to end up 16x slower relative to processing speed :o
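As a sanity check on those numbers:

```python
years = 10
compute_speedup = 2 ** (years / 2)   # doubles every 2 years  -> 32x
memory_speedup = 2 ** (years / 10)   # doubles every 10 years -> 2x
relative_gap = compute_speedup / memory_speedup  # memory falls 16x behind
```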

Fancy new technologies like HBM aren't really bucking this trend; they're clawing to keep up with it.


So GPU vendors have other tricks up their sleeve to reduce observed memory latency, independent of the actual memory latency. In my above example, the observed memory latency is 0 cycles on the first GPU, and 500 cycles on the second GPU, despite the actual memory latency being 1000 cycles in both cases. Adding more concurrent thread-groups allows the GPU to form a deep pipeline and keep the processing units busy while performing these high-latency memory fetches.


So as a GPU vendor increases their processing speed (at a rate of roughly 2x every 2 years), they also need to increase their memory speeds and/or the depth of their pipelining. As above, as an industry, we're not capable of improving memory at the same rate as we improve processing speeds... so GPU vendors are forced to improve memory speed when they can (when a fancy new technology comes out every 5 years), and increase pipelining and compression when they can't.


On that last point -- yep, GPUs also implement a lot of compression on either end of a memory bus in order to decrease the required bandwidth. E.g. DXT/BC texture formats don't just reduce the memory requirements for your game; they also make your shaders run faster as they're moving less data over the bus! Or more recently: it's pretty common for neighbouring pixels on the screen to have similar colours, so AMD GPUs have a compression algorithm that exploits this fact - to buffer/cache pixel shader output values and then losslessly block-compress them before they're written to GPU-RAM. Some GPUs even have hardware dedicated to implementing LZ77, JPEG, H264, etc...

Besides hardware-implemented compression, compressing your own data yourself has always been a big optimization. e.g. back on PS3/Xb360 games, I've shaved a good number of milliseconds off the frame-time by changing all of our vertex attributes from 32-bit floats to a mixture of 16-bit float and 16/11/10/8-bit fixed-point values, reducing the vertex shader's memory bandwidth requirement by over half.
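As an illustration of that kind of attribute packing (the exact mix of formats is game-specific; this sketch just shows the size win, using Python's struct module, where 'e' is a 16-bit half-float):

```python
import struct

def snorm8(x):
    # Map [-1, 1] to a signed 8-bit fixed-point value.
    return int(round(max(-1.0, min(1.0, x)) * 127))

# "Fat" vertex: position, normal, uv, all 32-bit floats = 32 bytes.
fat = struct.pack("<8f", 1.0, 2.0, 3.0,  0.0, 0.0, 1.0,  0.25, 0.75)

# Packed vertex: half-float position, 8-bit snorm normal (+1 pad byte),
# half-float uv = 6 + 4 + 4 = 14 bytes -- less than half the bandwidth.
packed = struct.pack("<3e4b2e",
                     1.0, 2.0, 3.0,
                     snorm8(0.0), snorm8(0.0), snorm8(1.0), 0,
                     0.25, 0.75)
```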

#5299074 Unity vs Unreal Physics for driving game

Posted by Hodgman on 04 July 2016 - 07:46 PM

Depending on the type of driving game, you'll be writing a lot of the vehicle physics yourself, and just using the underlying physics engine for collision detection and integration.

#5298956 Porting OpenGL to Direct3D 11 : How to handle Input Layouts?

Posted by Hodgman on 03 July 2016 - 09:56 PM

While writing the abstraction I hit a bit of a road block: Input Layouts. To my knowledge, in Direct3D 11 you have to define an Input Layout per shader (by providing shader bytecode), whereas in OpenGL you have to make glVertexAttribPointer calls for each attribute.

It's not per-shader, but per vertex shader input structure. If two shaders use the same vertex structure as their input, they can share an Input Layout. The bytecode parameter when creating an IL is actually only used to extract the shader's vertex input structure and pair it up with the attributes described in your descriptor.
In my engine, I never actually pass any real shaders into that function -- I compile dummy code for each of my HLSL vertex structures which is only used during IL creation -- e.g. given some structure definitions:
StreamFormat("colored2LightmappedStream",  -- VBO attribute layouts
	[VertexStream(0)] = 
		{ Float, 3, Position },
	[VertexStream(1)] = 
		{ Float, 3, Normal },
		{ Float, 3, Tangent },
		{ Float, 2, TexCoord, 0 },
		{ Float, 2, TexCoord, 1, "Unique_UVs" },
		{ Float, 4, Color, 0, "Vertex_Color" },
		{ Float, 4, Color, 1, "Vertex_Color_Mat" },
)
VertexFormat("colored2LightmappedVertex",  -- VS input structure
	{ "position",  float3, Position },
	{ "color",     float4, Color, 0 },
	{ "color2",    float4, Color, 1 },
	{ "texcoord",  float2, TexCoord, 0 },
	{ "texcoord2", float2, TexCoord, 1 },
	{ "normal",    float3, Normal },
	{ "tangent",   float3, Tangent },
)
StreamFormat("basicPostStream",  -- VBO attribute layouts
	[VertexStream(0)] = 
		{ Float, 2, Position },
		{ Float, 2, TexCoord },
)
VertexFormat("basicPostVertex",  -- VS input structure
	{ "position", float2, Position },
	{ "texcoord", float2, TexCoord },
)
this HLSL file is automatically generated and then compiled by my engine's toolchain, to be used as the bytecode when creating IL objects:
Pass( 0, 'test_basicPostVertex', {
	vertexShader = 'vs_test_basicPostVertex';
	vertexLayout = 'basicPostVertex';
})
float4 vs_test_basicPostVertex( basicPostVertex inputs ) : SV_POSITION
{
	float4 hax = (float4)0;
	hax += (float4)(float)inputs.position;
	hax += (float4)(float)inputs.texcoord;
	return hax;
}
Pass( 1, 'test_colored2LightmappedVertex', {
	vertexShader = 'vs_test_colored2LightmappedVertex';
	vertexLayout = 'colored2LightmappedVertex';
})
float4 vs_test_colored2LightmappedVertex( colored2LightmappedVertex inputs ) : SV_POSITION
{
	float4 hax = (float4)0;
	hax += (float4)(float)inputs.position;
	hax += (float4)(float)inputs.color;
	hax += (float4)(float)inputs.color2;
	hax += (float4)(float)inputs.texcoord;
	hax += (float4)(float)inputs.texcoord2;
	hax += (float4)(float)inputs.normal;
	hax += (float4)(float)inputs.tangent;
	return hax;
}

Won't claim this is the best/only way of doing this, but I define a "Geometry Input" object that is more or less equivalent to a VAO. It holds a vertex format and the buffers that are bound all together in one bundle. The vertex format is defined identically to D3D11_INPUT_ELEMENT_DESC in an array. In GL, this pretty much just maps onto a VAO. (It also virtualizes neatly to devices that don't have working implementations of VAO. Sadly they do exist.) In D3D, it holds an input layout plus a bunch of buffer references and the metadata for how they're bound to the pipeline.

The only problem with that is that an IL is a glue/translation object between a "Geometry Input" and a VS input structure -- it doesn't just describe the layout of your geometry/attributes in memory, but also describes the order that they appear in the vertex shader. In the general case, you can have many different "Geometry Input" data layouts that are compatible with a single VS input structure -- and many VS input structures that are compatible with a single "Geometry Input" data layout.
i.e. in general, it's a many-to-many relationship between the layouts of your buffered attributes in memory, and the structure that's declared in the VS.
In my engine:
* the "geometry input" object contains an "attribute layout" object handle, which describes how the different attributes are laid out within the buffer objects.
* the "shader program" object contains a "vertex layout" object handle, which describes which attributes are consumed and in what order.
* When you create a draw-item (which requires specifying both a "geometry input" and a "shader program"), then a compatible D3D IL object is fetched from a 2D table, indexed by the "attribute layout" ID and the "vertex layout" ID.
* This table is generated ahead of time by the toolchain, by inspecting all of the attribute layouts and vertex layouts that have been declared, and creating input layout descriptors for all the compatible pairs.
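A minimal sketch of that lookup table in Python (the names here are hypothetical, and `create_il` stands in for the real `ID3D11Device::CreateInputLayout` call):

```python
class InputLayoutTable:
    """Caches one IL per compatible (attribute layout, vertex layout) pair."""

    def __init__(self, create_il):
        self.create_il = create_il  # stand-in for CreateInputLayout
        self.table = {}             # (attrib_id, vertex_id) -> IL object

    def build(self, attribute_layouts, vertex_layouts, compatible):
        # Built ahead of time by the toolchain: create an IL for every
        # compatible pair of declared layouts.
        for a_id, a in attribute_layouts.items():
            for v_id, v in vertex_layouts.items():
                if compatible(a, v):
                    self.table[(a_id, v_id)] = self.create_il(a, v)

    def lookup(self, attrib_id, vertex_id):
        # Called when creating a draw-item from a "geometry input"
        # plus a "shader program".
        return self.table[(attrib_id, vertex_id)]
```

Here "compatible" just means the buffers supply every attribute the vertex shader consumes; the many-to-many relationship falls out of the table naturally.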

#5298885 draw lights in render engine

Posted by Hodgman on 03 July 2016 - 06:22 AM

Those are both different implementations of forward lighting: (1) is single-pass forward lighting and (2) is multi-pass forward lighting.
(2) used to be popular back before shaders, or with the early shader models.
(1) replaced it when shaders became flexible enough.

They should both produce the same visual result -- except if you're not doing HDR (e.g. are using an 8-bit back buffer). In that situation, (2) will have an implicit saturate(result) at the end of every light, whereas (1) will only have this implicit clamp right at the end of the lighting loop.
There's also a middle-ground that prevents a technique explosion -- stop at material + N lights, and use
foreach(model in models) 
  for( i = 0; i < model.HitLights.Count; i += N )
    model.draw(model.material, model.HitLights.SubRange(i, i+N));
Or another alternative -- you used to pre-compile many shader permutations (material + 1 light, material + 2 lights ...) because using a dynamic loop inside a shader used to be extremely slow.
These days, loops are pretty damn cheap though, so you can just put the number of lights (and an array of light data) into a cbuffer and use a single shader technique for any number of lights in one pass.
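The middle-ground loop above can be sketched as (hypothetical names; `draw` stands in for the actual draw-call submission, with each pass additively blended):

```python
N = 4  # lights baked into each "material + N lights" shader permutation

def draw_with_lights(model, hit_lights, draw):
    # Draw the model once per chunk of up to N lights, so a model hit by
    # 10 lights costs 3 passes instead of 10 (or one giant permutation).
    for i in range(0, len(hit_lights), N):
        draw(model, hit_lights[i:i + N])
```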

#5298794 Copy texture from back buffer in Directx12

Posted by Hodgman on 02 July 2016 - 06:30 AM

If it's crashing, are you just not checking for success/failure? If it's failing, make sure you've got the DirectX debug layer installed.

#5298780 Does adding Delegates/Function pointers to an entity break ECS ideology?

Posted by Hodgman on 02 July 2016 - 02:10 AM

^What Josh said. ECS is not one particular pattern.

I'd say "pure ECS ideology" means: entities don't have any logic, and neither do components. Logic goes into your systems. Entities are IDs, components are data, systems have all the logic. End of story.
So you wouldn't do any of what you mentioned (neither adding functions to an entity nor adding functions to a component).

Well a delegate is data, so that fits your description. A "CallDelegateOnConditionMet" system would have components containing game-state conditions to check, and delegates for the system to call when those conditions have been met :lol:

#5298626 Axis orientation

Posted by Hodgman on 30 June 2016 - 06:12 AM

By default, with no camera logic or anything, the hardware itself assumes that x is across the screen to the right, y is either up or down the screen, and z is either into or out of the screen.


On top of that, you can build any convention that you like.

Often games use a right-handed coord system -- hold up your thumb and first two fingers, with thumb pointing right, index pointing up, and middle finger pointing towards you -- that's X, Y and Z.

Other games use right handed with Z as up -- thumb right, index finger away from you, middle finger up.


Other games use a left-handed coordinate system... Z up, Z in, Z out, etc...

There's no "standard" :(


Back in the 80's and early 90's, 3d level editors were usually just 2d applications with a top down view, which would show a floor-plan with X/Y axis, which meant that Z became up and down. That convention is still popular with a lot of level designers, or anyone who's used CAD software.

Nowadays, Z defaults to in/out of the screen as I said at the start, so other games keep that convention and use Y as up and down in the world...


Then you've also got to define your rotational conventions. Is a positive X-axis rotation clockwise when looking from the origin out along +X, or anti-clockwise? :lol:

#5298620 Water and Fresnel

Posted by Hodgman on 30 June 2016 - 05:27 AM

What is it about water that makes it more reflective than other materials? Is it roughness?

What makes you say that it's more reflective than other materials?


A still pool of water will be almost perfectly smooth, so the weak reflections that it does have will have very little blurring occurring... which might make you notice them more, but there's actually very little energy being reflected off water at zero incidence.

#5298599 Water and Fresnel

Posted by Hodgman on 30 June 2016 - 01:42 AM

F0 is "reflectance at zero incidence", so reflectivity when looking straight down at a puddle on the floor. Water is mostly transparent when viewed at this angle.

When you view a puddle at a glancing angle (e.g. one in the distance / seen from the side), it becomes more reflective as the Fresnel function basically blends from F0 towards 100%.
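That blend is usually modelled with Schlick's approximation; a quick Python sketch using water's F0 of roughly 0.02:

```python
import math

def fresnel_schlick(cos_theta, f0=0.02):
    # F = F0 + (1 - F0) * (1 - cos(theta))^5
    # cos_theta is the cosine of the angle between view dir and surface normal.
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

straight_down = fresnel_schlick(1.0)                     # ~2% reflective
grazing = fresnel_schlick(math.cos(math.radians(85.0)))  # ~64% reflective
```

So the same puddle goes from almost transparent underfoot to mirror-like in the distance.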

#5298571 One channel textures not working for ES 2.0?

Posted by Hodgman on 29 June 2016 - 06:28 PM

GL_LUMINANCE stores a single value, but when sampled in a shader, it fetches vec4(value, value, value, 1.0). I'm not sure what happens when you try to render to one of these textures (via an FBO).

From the texture_rg extension it seems to be an attempt to modernize the API:

Historically one- and two-component textures have been specified in OpenGL
ES using the luminance or luminance-alpha (L/LA) formats. With the advent
of programmable shaders and render-to-texture capabilities these legacy
formats carry some historical artifacts which are no longer useful.

For example, when sampling from such textures, the luminance values are
replicated across the color components. This is no longer necessary with
programmable shaders.

It is also desirable to be able to render to one- and two-component format
textures using capabilities such as framebuffer objects (FBO), but
rendering to L/LA formats is under-specified (specifically how to map
R/G/B/A values to L/A texture channels).

This extension adds new base internal formats for one-component RED and
two-component RG (red green) textures as well as sized RED and RG internal
formats for renderbuffers. The RED and RG texture formats can be used for
both texturing and rendering into with framebuffer objects

#5298521 [MSVC] Why does SDL initialize member variables?

Posted by Hodgman on 29 June 2016 - 07:24 AM

So it seems that is my answer right there. Seems that in this case you will obviously want to have SDL turned on all the time in all builds, because there is no point just protecting your development/debug build. In this case I think I'll just turn it off altogether; as of now I don't require the additional security (I dare you to hack my offline games :D ) and I certainly don't want the overhead of additional checks for things that I could have easily found using something like cppcheck.

Security isn't necessarily about attacking a particular program, but using a weak program as a tool to attack a whole computer. e.g. in an insecure game, a save-game file, a custom level, a custom skin, a mod, etc, could contain data specifically designed to trigger a buffer overflow (or other security bug), in a very particular way, which allows the attacker to cause your game to run their code.
This could mean that someone sharing a save-game file in your game's community could actually be infecting all your players with a key-logger that steals their online banking password... which would be a disaster for you.


I'm not really a fan of the MS secure code generation features, but security is actually kinda important for all programmers to keep in mind.

#5298461 Dear people who actually work at videogame industry:

Posted by Hodgman on 28 June 2016 - 05:12 PM

In general, walking the walk trumps talking the talk :) So when comparing two candidates, the one who can prove they can actually do the job is in a better position.


...but often there's not two candidates, but two hundred candidates. That's a lot of resumes to read and interviews to conduct... so companies often use shortcuts. One common shortcut is to filter by arbitrary achievements -- e.g. throwing out all candidates who didn't finish high school, and then all candidates who don't have a university degree, etc...


If it's an option, practice your skills as much as you can in your spare time, and stay in school to get those arbitrary certificates as long as you can.

If you're in a country with a public / affordable university system (e.g. not America) then I'd especially recommend getting a degree.


Besides the learning that you'll do, university teaches some life skills, and the important skill of sticking through something that you don't really want to be doing for 4 years :lol:

It can also be important if you want to work in other countries. Often, to get a "skilled worker" visa/permit, you need a tertiary degree to prove it.

#5298380 [MSVC] Why does SDL initialize member variables?

Posted by Hodgman on 28 June 2016 - 07:30 AM

When /sdl is enabled, the compiler generates code to perform these checks at run time:
*Performs class member initialization

This seems vague -- does it check for uninitialized members, or zero them, or both? If it generates a breakpoint when a debugger is attached, but also zero-fills if no debugger is attached, then it's arguably helpful -- helps find the bug and mitigates its harm in the wild.
But yes, generally it's better for your compiler/runtime to initialize members to something like 0xcdcdcdcd by default, as that's likely to cause your code to crash, which lets you find the bug.
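For what it's worth, the reason 0xCD works so well as a poison pattern can be seen by reinterpreting those bytes (a quick Python sketch):

```python
import struct

poison = b"\xcd" * 4                       # MSVC's debug fill byte, repeated
as_int = struct.unpack("<I", poison)[0]    # 0xCDCDCDCD -- an implausible
                                           # pointer on most platforms
as_float = struct.unpack("<f", poison)[0]  # -431602080.0 -- an obviously
                                           # bogus value in most gameplay math
```

Either interpretation tends to blow up loudly and immediately, which is exactly what you want from an uninitialized-memory bug.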

I can imagine that when inheriting an old, unmaintainable, buggy codebase, it might be very tempting to have a "zero-initialize all memory by default" compiler switch! :D

#5298228 Launching game using Steam Greenlight?

Posted by Hodgman on 27 June 2016 - 07:11 AM

Greenlight by itself is pretty much useless for gaining an audience or getting new press attention. If you have the option of skipping it and getting onto steam directly, there's no harm done in not having a greenlight campaign.


Most of your organic greenlight audience will appear within the first few days of you submitting your game. If you don't get enough votes during this time, then you're in trouble, because organic traffic disappears. After the first few days, you will need an external traffic source (e.g. press coverage, advertising) to drive viewers to the greenlight page.


Steam usually approves a new greenlight batch about once a month. There was initially a very large queue/backlog, but not so much these days. If you've got a decent game (one that can get significant positive votes in its first few days), you'll usually get approved in the next batch -- so about 2-6 weeks. If you're unlucky, you might sit in greenlight hell for a year or more.


In any case, I'd plan to do a PR push at the same time as the greenlight launch, so that hopefully you get some press, which will hopefully drive more traffic to your greenlight page to ensure success. I'd probably engage a game-specific PR consultant to go over your greenlight page itself and review your PR plan before publishing it too.

P.S. there are a lot of very dodgy services out there who promise to get your game through greenlight... Some even call themselves "publishers" and will try to get you to sign contracts where they own a large chunk of your income, when in fact they're just small-time key-giveaway twitter accounts. Some of these services will actually ruin your chances of ever being greenlit, because if Steam finds out that people have been bribed into upvoting your game in exchange for a chance to win free game keys, they might just ban you outright.

#5298227 Diagnosing problems in shaders

Posted by Hodgman on 27 June 2016 - 07:02 AM

RenderDoc is my go-to debugging tool for graphics.


For "printf" debugging like Nanoha described, it really helps if you set up a key in your game that reloads all the shaders from disk. This should only take a few seconds to reload every shader, and it lets you very quickly write some "printf" test code, hit ctrl+s, hit alt+tab, press your reload-shaders button, and see new results. Sometimes it means you can find bugs in one minute instead of 30 minutes :D