Member Since 14 Feb 2007

#5206119 Vertex Shader - Pixel Shader linkage error: Signatures between stages are inc...

Posted by on 22 January 2015 - 09:34 PM

Change your vs_out struct (in both shaders) to:
struct vs_out
{
    float4 colour : COLOR0;
    float2 tex : TEXCOORD0;
    float4 pos : SV_POSITION;
};
The pixel shader doesn't use pos or tex, so they get optimized out, leaving the pixel shader with an input structure of:
  [0]  float4 colour;

The vertex shader doesn't use tex, so it gets optimized out, leaving the vertex shader with an output structure of:
  [0]  float4 pos;
  [1]  float4 colour;

Looking at the array indices on these generated interpolants, vs[0]'s semantic doesn't match ps[0]'s semantic, and vs[1] isn't even used.

With my change, you should end up with a pixel shader input structure of:
  [0]  float4 colour;

and a vertex shader output structure of:
  [0]  float4 colour;
  [1]  float4 pos;

#5206107 Possible shadow artifact on the caster in a ray-tracer.

Posted by on 22 January 2015 - 08:32 PM

1: “Ain’t” ain't a word.

FTFW

#5206104 no vsync means double buffering to avoid tearing, right?

Posted by on 22 January 2015 - 08:24 PM

single buffering = not possible on modern OSes

double buffering = mandatory! Screen tearing will be visible.

double buffering w/ vsync = no tearing, but CPU/GPU sleeps occur (waiting for vblank) if frame-time doesn't line up with refresh rate nicely

triple buffering = greater latency...

triple buffering w/ vsync = no tearing, greater latency, but fewer CPU sleeps.

#5206098 Current-Gen Lighting

Posted by on 22 January 2015 - 07:48 PM

Unity 5 and UE4's PBR solution (both powered by Enlighten, so of course they'll appear similar) is to reduce specular lighting down to a binary value that's either reflective or not (metal or dielectric). As for maps, there's a base color texture that shouldn't have any lighting baked into it to portray any type of depth. There's also roughness, which determines how reflective/shiny an object is.
This "metal" property is a new type of spec-map. In the traditional model you had a specular-mask-map (which was a multiplier for how intense the specular was) and a specular-power-map (which defined how 'tight'/small the highlights were).


With PBR you can still have these two kinds of maps, or another popular choice is the "metalness" workflow. This workflow is based on the observation that most real (physical) dielectrics have monochrome specular masks, all with almost the same value (about 0.03-0.04)... so there's not much point in having a map for them -- just hardcode 0.04 for non-metals!

Metals on the other hand, need a coloured specular mask, but at the same time, they all have black diffuse colours!

So you end up with this neat memory saving, as well as a simple workflow -- 

specPower = roughnessTexture;
if( metal )
{
  specMask = colorTexture;
  diffuseColor = 0;
}
else
{
  specMask = 0.04;
  diffuseColor = colorTexture;
}
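As a rough illustration, the branch above could be written as a small C++ helper. The `Color` and `SurfaceParams` types and the `unpackMetalness` name are made up for this sketch; only the 0.04 constant and the metal/non-metal split come from the post.

```cpp
// Hypothetical types for illustration; not taken from any engine.
struct Color { float r, g, b; };

struct SurfaceParams {
    Color diffuseColor; // colour used by the diffuse term
    Color specMask;     // specular colour
    float specPower;    // read from the roughness texture, as in the pseudocode above
};

// Unpack one texel of a metalness-workflow material.
constexpr SurfaceParams unpackMetalness(Color colorTexture, float roughnessTexture, bool metal)
{
    return metal
        // Metals: coloured specular, black diffuse.
        ? SurfaceParams{ Color{0.0f, 0.0f, 0.0f}, colorTexture, roughnessTexture }
        // Non-metals: hardcoded monochrome specular, coloured diffuse.
        : SurfaceParams{ colorTexture, Color{0.04f, 0.04f, 0.04f}, roughnessTexture };
}
```

One colour texture plus a single metal bit therefore stands in for separate diffuse and specular colour maps.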


Normalized blinn-phong is typical N * L lighting, right? Cook-torrence is how eye-to-normal half-vector used to compute last-gen's concept of specular lighting, right?
NdotL is the core of every lighting algorithm, basically stating that if light hits a surface at an angle, then the light is being spread over a larger surface area, so it becomes darker.


This bit of math is actually part of the rendering equation (not the BRDF).

The lambertian BRDF actually is just "diffuseColor".

The rendering equation says that incoming light is scaled by "saturate(dot(N,L))".


So when writing a lambertian/diffuse shader, you write:

nDotL = saturate(dot(N,L));

result = nDotL * diffuseColor; // incoming light energy (from rendering equation) * the BRDF


Blinn-phong is:

result = NdotL * pow( NdotH, specPower ) * specMask;

Although traditionally, a lot of people wrongly excluded nDotL from this, and just wrote

result = pow( NdotH, specPower ) * specMask;


However, blinn-phong is not "energy conserving" -- with high specular power values, lots of energy just goes missing (is absorbed into the surface for no reason).

Normalized blinn-phong fixes this, so that all the energy is accounted for (an important feature of "PBR").

result = NdotL * pow( NdotH, specPower ) * specMask * ((specPower+1)/(2*Pi));
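For anyone who wants to poke at the numbers on the CPU, here is a minimal C++ sketch of the normalized formula, reading the normalization factor as (specPower+1) divided by (2*Pi). The function name and the `kPi` constant are my own; the inputs are assumed to be already clamped to [0, 1].

```cpp
#include <cassert>
#include <cmath>

constexpr float kPi = 3.14159265358979f;

// Normalized Blinn-Phong specular term, using the normalization factor from the post.
// NdotL and NdotH are expected to be clamped to [0, 1] by the caller.
float normalizedBlinnPhong(float NdotL, float NdotH, float specPower, float specMask)
{
    assert(specPower >= 0.0f);
    const float normalization = (specPower + 1.0f) / (2.0f * kPi);
    return NdotL * std::pow(NdotH, specPower) * specMask * normalization;
}
```

The peak value (NdotH = 1) grows with specPower, which is how the tighter highlight keeps the energy that the un-normalized version throws away.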


Cook-Torrance has almost become like a BRDF framework with "plugins", which takes the form:

result = nDotL * distribution * fresnel * geometry * visibility


e.g. with normalized blinn phong distribution, schlick's fresnel, and some common geometry/visibility formulas --

distribution = pow( NdotH, specPower ) * specMask * ((specPower+1)/(2*Pi));

fresnel = specMask + (1-specMask)*pow( 1-NdotV, 5 );

geometry = min( 1, min(2*NdotH*NdotV/VdotH, 2*NdotH*NdotL/VdotH) )

visibility = 1/(4*nDotV*nDotL)


Some newer games are replacing normalized-blinn-phong with GGX (within their Cook-torrance framework).

#5206080 Managing State

Posted by on 22 January 2015 - 05:58 PM

In practice, at the beginning of the draw call processing, the default set of states is copied onto a local set of states. The state as available by DrawItem is then written "piecewise" onto the local set. Then the local set is compared with the state set that represents the current GPU state, and any difference cause a call to the render context as well as adapting the latter set accordingly.

I actually do this kind of "layering" earlier, and the result is a "compiled" DrawItem structure.
Often I have multiple state vectors being overlaid, such as defaults on the bottom, shader-specific defaults on top of that, then material states, then per-object states, then per-pass overrides.
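A minimal sketch of that overlaying step, in C++. All names here (`StateSlot`, `StateLayer`, `flatten`) are invented for the example and the three state groups are placeholders; the point is only that each layer overrides the slots it explicitly sets and leaves the rest alone.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical state groups; a real renderer would have more.
enum StateSlot { Raster, DepthStencil, Blend, SlotCount };

using ResolvedState = std::array<std::uint8_t, SlotCount>;

// One layer of overrides: only the slots flagged in 'isSet' take part.
struct StateLayer {
    std::array<bool, SlotCount> isSet{};
    ResolvedState value{};

    void set(int slot, std::uint8_t v)
    {
        assert(slot >= 0 && slot < SlotCount);
        isSet[slot] = true;
        value[slot] = v;
    }
};

// Overlay the layers bottom-to-top on the defaults; later layers win.
ResolvedState flatten(const ResolvedState& defaults, const std::vector<StateLayer>& layers)
{
    ResolvedState out = defaults;
    for (const StateLayer& layer : layers)
        for (int slot = 0; slot < SlotCount; ++slot)
            if (layer.isSet[slot])
                out[slot] = layer.value[slot];
    return out;
}
```

Calling `flatten(defaults, {material, perObject, perPass})` once, ahead of time, gives the fully resolved state that can be baked into the draw item.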

#5206064 Options for GPU debugging DX11 on Windows 7

Posted by on 22 January 2015 - 04:31 PM

RenderDoc is an awesome PIX replacement

The vendors all provide debuggers, which I think work with any GPU -

Intel GPA

AMD Perfstudio

nVidia NSight

#5205976 OpenGL to DirectX

Posted by on 22 January 2015 - 06:56 AM

thanks all for the answer, i try unity and i find it very easy, you can make games from copy pasting

You can make a game with D3D by copy pasting too, but you won't learn much that way...
You didn't answer if you want to learn how to program a GPU (what GL/D3D are for) or just want to make a game.
Why not make a game from scratch (no copy and pasting) on a real game engine first?
If you don't want to use an existing engine (for whatever reason), you can still use an existing graphics library, which is just a wrapper around GL/etc.
E.g. Look at Horde3D - you still have to write all your own C++ code, but they've abstracted GL from an unwieldy GPU-API into an understandable API designed for people who are making their own game engines.

i want to learn the core of game dev

On a professional game programming team of 20 staff, only one of them will write D3D/GL code - it's a specialist engine development skill, not a core game development skill.

if i choose to start in directx, will it be easy to switch to opengl? or vise versa?

Yes. They're both just APIs for sending commands to the GPU. Once you understand how GPUs work, then learning a 2nd/3rd/4th API is much easier.

#5205974 Encapsulation through anonymous namespaces

Posted by on 22 January 2015 - 06:47 AM

There's nothing wrong with this, and it's what anonymous namespaces are for. BTW, it's the same as:
static int privateVariable = 54;
static int PrivateHelperFunction()
{
    // can change this to whatever i want without breaking the public interface
    return 123;
}
This style used to be quite common in C code, and you might even call it an ADT instead of a public interface if you came from those circles...

However, the 'there can only be one' restriction, and thus the singleton/global state, is a code smell. Why dictate that the library has to have a single global state and restrain it like that if you don't have to?

#5205943 Managing State

Posted by on 22 January 2015 - 02:05 AM

I wrap up every API in a stateless abstraction. e.g. at the lowest level, every renderable object is made up of (or dynamically creates per frame) DrawItems similar to below.

The renderer then just knows how to consume these DrawItems, which fully define the underlying API state, so it's impossible to accidentally forget to unset some previous state.

enum DrawType { Linear, Indexed, Indirect };
struct Resources { /* vertex/instance/texture/cbuffer pointers */ };
struct DrawItem { u8 raster; u8 depthStencil; u8 blend; u8 primitive; u8 drawType; u16 shader; u16 inputLayout; Resources* bind; u32 vertexCount; u32 vbOffset; u32 ibOffset; };
typedef vector<DrawItem*> DrawList;
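To show what "consuming" these items might look like, here is a cut-down C++ sketch. It is not the actual renderer: the `DrawItem` is reduced to three state IDs, the list holds values instead of pointers, `CountingContext` is a stand-in for a real device context, and the 0xFF sentinel is an assumption of the example. Every item carries its complete state, and the context only touches the API where that differs from what is currently bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Cut-down version of the DrawItem above: just three state IDs.
struct DrawItem { std::uint8_t raster, depthStencil, blend; };

// Stand-in for a real device context; it only counts state changes.
struct CountingContext {
    DrawItem current{ 0xFF, 0xFF, 0xFF }; // 0xFF = "nothing bound yet" (assumed sentinel)
    int stateChanges = 0;

    void submit(const DrawItem& item)
    {
        setIfChanged(current.raster, item.raster);
        setIfChanged(current.depthStencil, item.depthStencil);
        setIfChanged(current.blend, item.blend);
        // A real context would issue the draw call here. Whatever was drawn
        // before, the bound state now matches the item exactly.
        assert(current.raster == item.raster &&
               current.depthStencil == item.depthStencil &&
               current.blend == item.blend);
    }

private:
    void setIfChanged(std::uint8_t& bound, std::uint8_t wanted)
    {
        if (bound != wanted) { bound = wanted; ++stateChanges; }
    }
};

// Submit a whole list and report how many state changes were actually needed.
int submitAll(CountingContext& ctx, const std::vector<DrawItem>& list)
{
    for (const DrawItem& item : list)
        ctx.submit(item);
    return ctx.stateChanges;
}
```

Because nothing is ever "left over" from a previous draw, redundant-state filtering becomes a pure optimization inside the context rather than something the calling code has to get right.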

#5205938 Triangles can't keep up?

Posted by on 22 January 2015 - 01:35 AM

I could use instancing, but I've heard and get the notion that this is a trap. That the performance is poor unless I'm doing a certain amount of vertices per object and that a instancing a quad is not worth it. Something to best be avoided much like a GEO-shader. Maybe I have my wires crossed here?

Yes, the performance wins with instancing mostly appear when you have many vertices per instance. Only having 4 verts per instance is not ideal... but might still be worth it because it makes your code for drawing particles very simple -- One buffer with per-vertex data (just 4 tex-coords/corner values), one buffer with per-particle data (position/etc).


At the moment I am actually using this instancing technique to draw a crowd of 100000 people (each person is a textured quad, so 100k instances of a 4-vertex mesh) - so the performance is not terrible, it's just not as good as it could be in theory. 


That I could just use the glVertexAttribDivisor call, but not actually use one of the glDraw** instance calls

On GL2/D3D9 you can do this... On GL3/D3D10 you have to use instancing and the per-instance divisor (or do it yourself in a shader).

I interpret this two different ways. In the top portion you make it seem like I have the ability to do:
//In vertex shader main
int index = gl_VertexId % 4;
gl_Position[index] = vec4(1,1,1,0);

No, more like

int cornerIndex = gl_VertexID % 4;   // which corner of the quad
int particleIndex = gl_VertexID / 4; // which particle
vec3 position = u_PositionBuffer[particleIndex];
vec2 texcoord = u_VertexBuffer[cornerIndex];
gl_Position = mvp * vec4(position + vec3(texcoord*2.0-1.0, 0.0), 1.0);
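The index arithmetic is easy to sanity-check on the CPU. This is just a sketch with made-up names (`QuadVertex`, `splitVertexId`), assuming 4 vertices per quad: the remainder picks the corner, and the quotient picks the particle.

```cpp
// Hypothetical helper mirroring the shader arithmetic on the CPU.
struct QuadVertex {
    int particle; // which particle (quad) this vertex belongs to
    int corner;   // which of the quad's 4 corners it is
};

constexpr QuadVertex splitVertexId(int vertexId)
{
    return QuadVertex{ vertexId / 4, vertexId % 4 };
}
```

So vertex 7 is corner 3 of particle 1, and vertex 8 starts particle 2 at corner 0. The per-particle buffer is indexed with the quotient and the 4-entry corner buffer with the remainder.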

My question here is do you literally mean I can have a VBO sent through a uniform?

Shaders can have raw uniforms (e.g. a vec4 variable), UBOs (buffers that hold a structure of raw uniforms), Textures (Which you can sample/load pixel from) and yes, VBOs (which you can load elements from).
In D3D, you can't directly bind textures/buffers (resources) to shaders - you can only bind 'views' of those resources. So once you have a Texture or Buffer resource, you create a "Shader Resource View" for it, and then you bind that shader-resource-view to the shader. This means that binding a buffer to a shader is exactly the same as binding a texture to a shader -- they're both just "resource views".

In GL it's a bit different. In GL there's no "resource views", instead, you can just bind texture resources to shaders directly (I assume you know how to do this already - texturing is important).

In order to bind a buffer to a shader, you have to make GL think that it is a texture. You do this by making a "buffer texture" object, which links to your VBO but gives you a new texture handle! You can then bind this to the shader like any other texture, but internally it'll actually be reading the data from your VBO. In your shader, you can read vertices from your buffer by using the texelFetch GLSL function, e.g.

uniform samplerBuffer u_positionBuffer;
vec3 position = texelFetch( u_positionBuffer, index ).xyz;

#5205863 OpenGL to DirectX

Posted by on 21 January 2015 - 03:57 PM

The real question is what you want to do. Do you want to learn the core of 3D rendering? That is procedural meshes and shaders. Do you want to learn rendering in general? Choose one: DirectX or OpenGL. Do you want to write a real game? Don't use either. Start with [a game engine like Unity, or at least an existing OpenGL/DirectX wrapper like Ogre/Horde3D/etc].

^^ this (slightly edited for my opinion).

#5205723 Developing or Designing, Which Should I Do?

Posted by on 21 January 2015 - 01:46 AM

1) From what I understand, the designers do most of the creative work, putting characters, maps, and levels together, and the developers mostly translate it into coding and put it all together.
2) My only problem with being a designer is that I have no artistic ability beyond stick figures. Could I succeed in a game designing career without artistic skills?
3) On the other hand, I don't know if I want to be a full-time coder either ... coding for 8 hours a day, 5 days a week would get too tedious.
4) I definitely want to have some creative impact on my games' development -- Is there a way that I could take on both roles, help with the coding and design?
5) Should I just stick to writing and consult with the designers so I can get my ideas put into the games?

1) Nope, you're mixing together a LOT of different jobs:
Game developer -- anyone who works for a games company and works on the game. This includes designers!
Game designer -- is an expert at talking about game mechanics. Can design a board game that's actually fun. Also often has to do a lot of similar work to a Producer / Project Manager, in order to make sure everyone else is effectively moving forward on the project. Also, these people are rare, making up about 1% of the whole team -- e.g. you might have 50 other staff for each game designer.
Level designer -- knows a lot about game design, and how spaces affect gameplay. They work hand-in-hand with the 3d-environment-artists and game-designers to design the spaces in which the game will take place. After they've designed a space, the 3d environment artists will make it look pretty.
Game programmer -- writes the code that makes all the things in the game happen. Sometimes they do exactly what the game designer says to do, other times they have a lot of freedom to interpret and iterate on the designer's original ideas.
Concept artist -- draws illustrations to guide the whole team, helping them visualise the end-product before it's been created. Often are the ones who design the 'look' of the characters/environments during the pre-production phase.
Environment artists -- make the pretty art/models/textures for buildings, locations, worlds.
Character artists -- sculpt the characters for the game.
Texture artists -- sometimes studios have dedicated people whose whole job is to paint the surfaces of 3D objects created by other artists.
Rigger -- takes the characters (and other moving objects) and attaches them to a skeleton so they can be animated.
Animator -- takes the rigged characters/other objects and creates all the different animations required, such as walking/running/jumping/etc... Sometimes using mo-cap data as a base.
Writer -- not usually a full-time job at a studio. Writes the storylines, etc... but this has zero impact on the game mechanics.

2) Designers produce zero artwork, so you're fine there.
3) If you don't like the idea of doing one job for 8 hours a day, then the workplace is not going to be fun.
4) At some studios, game-programmers have a lot of input when translating the game-designers' mechanic ideas into reality... but at other studios you don't have any creative input. If you don't enjoy the creativity of writing code itself, you might not want to be a coder...
Going the "indie" route lets you be responsible for every single role in a company though.
5) Being a writer is a completely different job to being a game designer.

#5205713 I have an idea that I think could be successful, what do I do with it?

Posted by on 21 January 2015 - 12:16 AM

Do you have just the dot points, or are you actually designing it?
What you have at the moment is only the very, very starting point for a game idea. There's months of work ahead to turn it into a rough game design.

#5205645 Composers - Do you ever need other skills?

Posted by on 20 January 2015 - 04:53 PM

All of the composers that I've worked with *who were full-time/salaried employees of a game company* also were sound designers (fx, foley, etc) and had to be able to use the engine's basic tools somewhat.

Usually there's only a small number of sound staff at games companies (e.g. One sound designer/composer and one audio-programmer in a 100-person office), so they kinda have to do everything sound related.

#5205446 OpenGL samplers, textures and texture units (design question)

Posted by on 19 January 2015 - 08:48 PM

Oh you want to start a war, don't you?

Sorry

On current generation hardware

To clarify myself here -- AMD has won the console wars for now, with them supplying Microsoft, Sony and Nintendo with GPU architectures. As far as AAA stuff goes, the GCN architecture is the only one you really have to optimize for; it is the current generation -- acting as both your primary target, and your minimum spec for the PC port.
Even though nVidia has a majority market share in PC gaming, they're now the "alternative" GPU that a minority of total consumers will be using.
The assumption then is that if you've optimized it to run well on your min-spec GCN GPU, it will run fine on nVidia cards anyway.
Everything below is relevant to AMD's GCN architecture. nVidia's architecture isn't quite as bindless yet. 

Timothy Lottes has two posts with a very thorough analysis on both styles on modern HW.

As much as I like Lottes (I was really looking forward to playing his game, until he shelved it to start work at the Graphics Mafia) he's playing the GL-apologist here and has stopped examining the AMD side of things as soon as he got the conclusions he (and his employer) was looking for.
I won't go into his "Re: Things that drive me nuts about OpenGL" post because it's off topic and I don't have anything nice to say.
A bunch of issues with his "Bindless and Descriptors" post though--
1- He fails to mention that his "D3D" examples could be used by GL drivers, and his "GL" examples could be used by D3D drivers. There is an API<->Hardware mismatch already, with the drivers converting between API abstractions and hardware realities. If his "GL" examples are indeed more optimal, you can expect that D3D drivers will be using them.
If the API uses D3D11-style split texture-view/sampler objects, it's very easy for the driver to support GL-style hardware realities by merging those two objects prior to submitting the draw-call. Vice-versa is also possible, though much harder on the driver.
That's one good reason that APIs should follow the D3D11-style abstraction -- it allows the driver/hardware designers more flexibility in choosing different solutions, while keeping the drivers clean and simple.
Emulating D3D-style APIs on GL-style hardware is dead easy, emulating GL-style APIs on D3D-style hardware is complex and/or slow.
Modern AMD hardware is D3D-style. nVidia hardware is still leaning towards GL-style. An API that's optimal everywhere should expose the abstraction that's easy to use on both sets of hardware.
2- AMD GCN is a fully bindless architecture. His "AMD GL Non-bindless" and "AMD DX Non-bindless" examples are actually bindless examples - examples of how the D3D/GL drivers themselves are internally using bindless for you, when you're still using these old non-bindless APIs.
There's no "texture registers" like in the nVidia non-bindless examples; the descriptors being loaded are the actual guts of ShaderResourceView/SamplerState objects, not handles(slots) to special registers containing that information. These Views/States always have to be loaded into SGPRs for use, and you've got a generous number of SGPRs such that it's not really a problem (VGPR pressure is usually a bigger problem, bottlenecking your occupancy).
So, these examples are showing how a slot-based DX/GL API would be emulated on bindless hardware already... and they're also showing how a bindless API would work on bindless hardware!
The "AMD GL Bindless" example is then how to pointlessly emulate nVidia-style indirection into a common register table, which isn't how I'd implement a bindless API on top of this hardware...
3- He mentions that S_LOAD_DWORD can either load one Texture/Sampler descriptor, or, that if you pair them it can also load the pair in one go (his conclusion: you may as well just pair them all the time).
3.a - Firstly on this, optimizing for scalar instruction counts is really scraping the bottom of the barrel -- he theorizes it might be helpful when occupancy is so low that the hardware can't dual issue any more... but if you're in that situation, you're going to have terrible performance across the board (no memory latency hiding), so you should instead be optimizing to get occupancy back up. If you can't do that, then with such low occupancy you're probably suffering from the horrible latency on your vector loads (64x wider than your scalar loads), so you'd probably want to optimize them first too...
3.b - S_LOAD_DWORD can actually load 1-16 DWORDS though, not just 4/8. So if you have a descriptor table like:
struct Table {
  SamplerDesc s0;
  TextureDesc t0, t1, t2;
};
^ then you can load all 4 of those descriptors with one load instruction, as the table size is 16 DWORDS.
If we convert that table to his GL version, where textures/samplers are always paired...
struct Table {
  SamplerDesc s0; TextureDesc t0;
  SamplerDesc s1; TextureDesc t1;
  SamplerDesc s2; TextureDesc t2;
};
...then we need two load instructions as the table size is now 24 DWORDS... The opposite of what he claimed is true -- always pairing your samplers/textures actually results in more load instructions.
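The arithmetic behind those two tables can be sketched in a few lines of C++. The constants are assumptions lifted from the examples above (every descriptor counted as 4 DWORDs, at most 16 DWORDs per scalar load), not hardware documentation, and the function names are made up.

```cpp
// Assumptions taken from the examples above: every descriptor is 4 DWORDs,
// and one scalar load can fetch at most 16 DWORDs.
constexpr int kDwordsPerDescriptor = 4;
constexpr int kMaxDwordsPerLoad = 16;

constexpr int tableSizeInDwords(int descriptorCount)
{
    return descriptorCount * kDwordsPerDescriptor;
}

// Minimum number of loads needed to fetch a contiguous table in full
// (integer division rounded up).
constexpr int loadsForTable(int descriptorCount)
{
    return (tableSizeInDwords(descriptorCount) + kMaxDwordsPerLoad - 1) / kMaxDwordsPerLoad;
}
```

With one shared sampler and three textures that is 4 descriptors, 16 DWORDs and one load; with three sampler/texture pairs it is 6 descriptors, 24 DWORDs and two loads.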
3.c- He mentions in the 'D3D' case that you need one load for each texture, plus one load every time a sampler is used for the first time. As shown above in 3.b, this just isn't true, but even if we assumed it was (and that S_LOAD_DWORD only loads either 4 or 8 DWORDS), it's still possible for him to apply his 'GL' optimization here!
e.g. A shader with two textures and one sampler might produce a table like below:
struct Table {
  TextureDesc t0; // Lottes fetch #0  // Alternate: fetch #0  // Reality: fetch #0
  SamplerDesc s0; // Lottes fetch #1  // Alternate: fetch #0  // Reality: fetch #0
  TextureDesc t1; // Lottes fetch #2  // Alternate: fetch #1  // Reality: fetch #0
};
He says that you'd require a load for t0 plus a load for s0 (as it's being used for the first time), then later you'd need another load for t1.
Applying his 'GL' optimization, you'd get a single double-sized load for t0+s0 in one instruction, then later another load for t1.
But as above, in reality you could load that whole table with one instruction anyway...
3.d - Even if we're optimizing to minimize peak scalar-GPR usage, as well as optimizing for scalar instruction count, the D3D-style gives the driver way more options.
Say we've got three textures that all use one sampler.
The shader compiler may decide that it doesn't want to keep around the SGPR data for the sampler all the time. In that case, the driver can choose to waste memory/bandwidth and duplicate the sampler-desc - producing the "GL-style" example with paired textures/samplers and three memory fetches to get them into SGPRs:
struct Table {
  TextureDesc t0;       //fetch #0
  SamplerDesc s0;       //fetch #0
  TextureDesc t1;       //fetch #1
  SamplerDesc s0_clone; //fetch #1
  TextureDesc t2;       //fetch #2
  SamplerDesc s0_clone2;//fetch #2
};
Or it can get fancy by realizing it doesn't need s0_clone, as t1 is actually still contiguous to s0!
struct Table {
  TextureDesc t0;      //fetch #0
  SamplerDesc s0;      //fetch #0 & fetch #1
  TextureDesc t1;      //           fetch #1
  TextureDesc t2;      //fetch #2
  SamplerDesc s0_clone;//fetch #2
};
Or maybe the shader compiler decides that it can keep s0 around in SGPR's, but only between the usage of t0/t1, and that it will have to be re-fetched later for t2. In that case, if we're still also optimizing for scalar instruction count, the driver can produce:
struct Table {
  TextureDesc t0;      //fetch #0
  SamplerDesc s0;      //fetch #0
  TextureDesc t1;      //fetch #0
  TextureDesc t2;      //fetch #1
  SamplerDesc s0_clone;//fetch #1
};
It doesn't have to be one or the other. The driver is free to use hybrids between the GL-style and D3D-style examples.
If the high level API has already merged samplers/textures into one object, the driver is robbed of this flexibility (unless it wants to do the complex stuff from my last post, of finding unique sets, etc).

4- he mentions this crucial detail but then doesn't factor it into his examples:
"As can be seen in the AMD programming guide under "SGPR Initialization", up to 16 scalars can be pre-loaded before shader start."
Let's say we have a simple shader with: one cbuffer, one sampler, two textures.
That's a descriptor table like:
struct Table {
  BufferDesc b0;
  SamplerDesc s0;
  TextureDesc t0, t1;
};
As this descriptor table is exactly 16 DWORDS in size, we get it pre-loaded for free, which removes all the S_LOAD_DWORD instructions from all of his examples.
If we used his GL example though where we always pair up our Textures/Samplers, our table flows past the 16 DWORD limit, so we have to split it in two structures now:
struct TableBase {
  BufferDesc b0;
  SamplerDesc s0;
  TextureDesc t0;
  void* extra;
  //2 spare DWORDS, could have another void* if required
};
struct TableExtra
{
  SamplerDesc s1;
  TextureDesc t1;
};
We then still get the data in TableBase loaded for free, and we can load all the data from TableExtra with a single S_LOAD_DWORD instruction.
For more complex shaders, we end with this general rule of thumb:
descriptorCount = numBuffers + numTextures + numSamplers;
if( descriptorCount <= 4 )
  loadsRequired = 0;
else
  loadsRequired = ceil( (descriptorCount-3)/4 );
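If you want to play with that rule of thumb, here is a small C++ version. It assumes, as in the examples above, that 4 descriptors fit in the 16 pre-loaded scalars and that once the table overflows, one of those four slots is spent on the pointer to the extra table. The function name is made up. Note that in C/C++ the division has to be rounded up with integer arithmetic, because `(descriptorCount-3)/4` on ints truncates before `ceil` ever sees it.

```cpp
// Rule-of-thumb count of scalar load instructions for a shader's descriptors.
// Assumes 4 descriptors fit in the pre-loaded SGPRs and 4 descriptors per load.
constexpr int loadsRequired(int numBuffers, int numTextures, int numSamplers)
{
    const int descriptorCount = numBuffers + numTextures + numSamplers;
    if (descriptorCount <= 4)
        return 0; // everything arrives pre-loaded

    // 3 descriptors stay in the pre-loaded base table (the 4th slot holds the
    // pointer to the extra table); the rest are fetched 4 at a time.
    const int extraDescriptors = descriptorCount - 3;
    return (extraDescriptors + 3) / 4; // integer ceil(extraDescriptors / 4)
}
```

For example, one cbuffer, two textures and two samplers (5 descriptors) costs one load, and it stays at one load up to 7 descriptors before an 8th pushes it to two.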