
DX11 ShaderPack in renderer design


Hi all,

I've been making nice progress in abstracting/ creating my new renderer.

At this point I'm adding ShaderPacks for rendering (a pack is a combination of a PS, VS, GS etc., and a set of defines/macros etc.).


My question is if you'd say I'm on the right track of handling the following abstraction.

This is the idea I want to implement:


1. IShaderPack is the base shaderpack class

2. DX11ShaderPack is derived and contains DX11 API objects for the shaders

3. IShaderMgr is the base shader manager class

(where all IShaderPacks are stored, current shader index is stored, returns const ref to current IShaderPack etc.)

4. IShaderMgr will get a virtual 'SetShaderPack' function.

5. DX11ShaderMgr will implement this function and handle inputlayouts in the background, bind PS, VS etc. to the device etc.

6. My main renderer class will have an IShaderMgr object (if dx11 is used it will 'live' as a new DX11ShaderMgr)

7. Main renderer class will get a public 'SetShaderPack' method for the frontend, only taking a GUID/ handle (known in asset data).

This function will simply 'forward' the call to the IShaderMgr object in the renderer, containing this same method (SetShaderPack)


This way all implementation is hidden from the frontend, and the main renderer class 'SetShaderPack' function only has to call the IShaderMgr->SetShaderPack function. That function can be completely different depending on the API, but this is then hidden from the renderer.
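
A minimal sketch of how steps 1-7 could fit together, assuming the GUID/handle is a plain integer. `ShaderPackHandle`, `FakeShaderPack` and `FakeShaderMgr` are hypothetical names used only for illustration; the fake manager records pack names at the point where a real DX11 manager would bind shaders and input layouts.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ShaderPackHandle = std::uint32_t;  // hypothetical handle type

// Step 1: API-agnostic base class for a pack.
class IShaderPack {
public:
    virtual ~IShaderPack() = default;
    virtual const std::string& Name() const = 0;
};

// Steps 3 and 4: base manager with a virtual SetShaderPack.
class IShaderMgr {
public:
    virtual ~IShaderMgr() = default;
    virtual bool SetShaderPack(ShaderPackHandle handle) = 0;
};

// Stand-in for DX11ShaderPack (step 2): holds a name instead of API objects.
class FakeShaderPack : public IShaderPack {
public:
    explicit FakeShaderPack(std::string name) : name_(std::move(name)) {}
    const std::string& Name() const override { return name_; }
private:
    std::string name_;
};

// Stand-in for DX11ShaderMgr (step 5): logs instead of binding to a device.
class FakeShaderMgr : public IShaderMgr {
public:
    void Add(ShaderPackHandle handle, std::string name) {
        packs_[handle] = std::make_unique<FakeShaderPack>(std::move(name));
    }
    bool SetShaderPack(ShaderPackHandle handle) override {
        auto it = packs_.find(handle);
        if (it == packs_.end()) return false;   // unknown handle
        bound_.push_back(it->second->Name());   // a real backend binds VS/PS/IL here
        return true;
    }
    const std::vector<std::string>& BoundLog() const { return bound_; }
private:
    std::unordered_map<ShaderPackHandle, std::unique_ptr<IShaderPack>> packs_;
    std::vector<std::string> bound_;
};

// Steps 6 and 7: the renderer only forwards the handle.
class Renderer {
public:
    explicit Renderer(std::unique_ptr<IShaderMgr> mgr) : shaderMgr_(std::move(mgr)) {}
    bool SetShaderPack(ShaderPackHandle handle) { return shaderMgr_->SetShaderPack(handle); }
private:
    std::unique_ptr<IShaderMgr> shaderMgr_;
};
```

The frontend would only ever see `Renderer::SetShaderPack(handle)`; which manager sits behind it is decided once, when the renderer is constructed.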


What do you think about this approach and/or how would you do it differently?

(btw, I've already implemented step 1 and 2, working like a charm)


Sounds reasonable. I have a few notes, but they are pretty minor suggestions, and could mostly be left as 'later, if needed' changes.


- What does 'current shader index' mean? If you intend to thread things later, be careful about global state. If this is just internal to the renderer -- the current shader while recording its internal D3D11 cmd list -- that's fine.


- If you don't need rendering backends to be swappable at runtime, you can bypass virtuals. Your different shader backends can just each implement a single non-virtual SetShaderPack, and only the one that a given implementation needs gets compiled in, either by using ifdef or putting them in separate files and only compiling the one needed.
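
A rough sketch of the ifdef variant, assuming a build-system define named `RENDERER_BACKEND_DX11` (a made-up name) and a do-nothing fallback backend. Both classes share the name `ShaderMgr`, so the renderer holds one by value and no virtual call is involved.

```cpp
#include <cstdint>

using ShaderPackHandle = std::uint32_t;  // hypothetical handle type

#if defined(RENDERER_BACKEND_DX11)
// Would normally live in its own file and call into D3D11.
class ShaderMgr {
public:
    void SetShaderPack(ShaderPackHandle handle) {
        current_ = handle;  // bind shaders through the D3D11 device context here
    }
    ShaderPackHandle Current() const { return current_; }
private:
    ShaderPackHandle current_ = 0;
};
#else
// Fallback "null" backend: remembers the handle and binds nothing.
class ShaderMgr {
public:
    void SetShaderPack(ShaderPackHandle handle) { current_ = handle; }
    ShaderPackHandle Current() const { return current_; }
private:
    ShaderPackHandle current_ = 0;
};
#endif

class Renderer {
public:
    // Non-virtual forward; the concrete ShaderMgr is fixed at compile time.
    void SetShaderPack(ShaderPackHandle handle) { shaderMgr_.SetShaderPack(handle); }
    const ShaderMgr& Shaders() const { return shaderMgr_; }
private:
    ShaderMgr shaderMgr_;
};
```

The separate-files variant looks the same from the renderer's side; only the build script decides which `ShaderMgr` translation unit gets compiled.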


- Whether shader stages need to be tightly coupled depends on the backend and on which stages are in use. It might be a good idea to do some kind of de-duplication (and avoid unnecessary platform BindShader calls) for shader packs that share some stages; for example, you might use the same vertex shader with many different pixel shaders.
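
One way the de-duplication could look, as a sketch only: shaders are identified by made-up integer ids, and the counters stand in for the platform bind calls that would be skipped.

```cpp
#include <cstdint>

using ShaderId = std::uint32_t;  // hypothetical id, 0 means "nothing bound"

struct ShaderPack {
    ShaderId vs = 0;
    ShaderId ps = 0;
};

class StageBinder {
public:
    // Rebinds only the stages that differ from what is currently bound.
    void Bind(const ShaderPack& pack) {
        if (pack.vs != boundVs_) {
            boundVs_ = pack.vs;
            ++vsBinds_;  // a real backend would bind the vertex shader here
        }
        if (pack.ps != boundPs_) {
            boundPs_ = pack.ps;
            ++psBinds_;  // a real backend would bind the pixel shader here
        }
    }
    int VsBinds() const { return vsBinds_; }
    int PsBinds() const { return psBinds_; }
private:
    ShaderId boundVs_ = 0;
    ShaderId boundPs_ = 0;
    int vsBinds_ = 0;
    int psBinds_ = 0;
};
```

Binding packs `{1,10}`, `{1,11}`, `{1,11}`, `{2,11}` in that order would issue two vertex shader binds and two pixel shader binds instead of four of each.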

Edited by ShaneYCG


At the low level you've got individual shader programs (or packs of programs as you call them), but usually at a higher level, engines will have some kind of larger / more abstract shader object.

e.g. FX/CgFX has 'Effects' (high level shader pack), which have 'Techniques' (for different purposes - transparent/forward opaque/deferred gbuffer/shadow map/etc), which have 'Passes' (your low level shader packs).

Or Unity has 'Shaders' with 'SubShaders' with 'Passes'.


In my engine I have Techniques (high level shader pack), which have Passes (for different purposes), which have Permutations (defines on/off).

The user of the API only binds a technique. The pass is chosen by a different object that also holds the render-target pointers. The user can also set "shader options", which are used to automatically select the appropriate permutation from a pass internally.
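
To make the Technique / Pass / Permutation idea concrete, here is a small sketch. It is not the engine's actual code: the type names, the `PassType` values and the bit-per-option indexing are all assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class PassType { ForwardOpaque, GBuffer, ShadowMap };  // hypothetical pass purposes

struct Pass {
    std::vector<std::string> optionNames;      // option i maps to bit i of the index
    std::vector<std::uint32_t> permutations;   // program ids, one per option combination
};

struct Technique {
    std::map<PassType, Pass> passes;
};

// Picks the permutation of the requested pass that matches the enabled options.
// Returns 0 when the technique has no such pass or the index is out of range.
std::uint32_t SelectProgram(const Technique& technique, PassType type,
                            const std::vector<std::string>& enabledOptions) {
    auto it = technique.passes.find(type);
    if (it == technique.passes.end()) return 0;
    const Pass& pass = it->second;

    std::size_t index = 0;
    for (std::size_t bit = 0; bit < pass.optionNames.size(); ++bit) {
        for (const std::string& option : enabledOptions) {
            if (option == pass.optionNames[bit]) {
                index |= (std::size_t{1} << bit);
                break;
            }
        }
    }
    return index < pass.permutations.size() ? pass.permutations[index] : 0;
}
```

In this sketch the user binds a `Technique` and sets options, the object that owns the render targets supplies the `PassType`, and the permutation lookup stays internal.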


If you go with your low-level 'packs' instead of high level ones, then in my experience, at some point you'll have to build a high level shader system on top of it.


I personally prefer this system to be a part of the core rendering API rather than built on top of it because it often ends up cleaner for the user. In one engine that I've used which didn't do this cleanly, we ended up with routines that would: loop over every object, swap its shader for a shadow map shader while remembering the original, render the shadow map pass, loop over every object restoring its original shader, render the gbuffer pass, etc...



I talked to you a bit on the chat about input layouts - there's two main approaches:

* hard code the way that vertex attributes are stored in memory. Each VS then is paired with one IL that maps the VS inputs to that storage structure.

* support multiple different vertex structures. Each VS is then paired with a collection of IL's, one for each vertex structure that's compatible with the VS inputs.


For a high level game renderer, the first option is perfectly fine. For a flexible rendering API that can be used to implement any kind of effect/pipeline, then the second one becomes more important.
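
For the second approach, the pairing could be stored as a lookup keyed by both the VS input signature and the vertex structure. This is only a sketch: the three id types are invented, and `InputLayoutId` stands in for whatever the backend really stores (an `ID3D11InputLayout*` in D3D11).

```cpp
#include <cstdint>
#include <map>
#include <utility>

using VsInputHash    = std::uint32_t;  // hypothetical hash of the VS input signature
using VertexFormatId = std::uint32_t;  // hypothetical id of an in-memory vertex structure
using InputLayoutId  = std::uint32_t;  // stand-in for the backend IL object, 0 = none

class InputLayoutCache {
public:
    // Registers the IL that maps one vertex structure onto one VS input signature.
    void Register(VsInputHash vs, VertexFormatId format, InputLayoutId layout) {
        layouts_[{vs, format}] = layout;
    }
    // Returns 0 when the vertex structure is not compatible with the VS inputs.
    InputLayoutId Find(VsInputHash vs, VertexFormatId format) const {
        auto it = layouts_.find({vs, format});
        return it == layouts_.end() ? 0 : it->second;
    }
private:
    std::map<std::pair<VsInputHash, VertexFormatId>, InputLayoutId> layouts_;
};
```

With the first approach the `VertexFormatId` part of the key disappears, because each VS has exactly one IL.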


Thanks both.
In the meantime I've implemented most of this. The inputlayouts are a pain in the *ss though );

The universal struct for input Vertex attributes works fine and lets me reconstruct the d3d11 element descs easily, which helps with the abstraction.

Regarding the inputlayout ID I've found a solution too. I store an MD5 checksum of the input attributes in the IShaderMgr, which in case of dx11 is the checksum of the d3d11 element descs. This I can simply compare to the result of GetVSInputChecksum of the IShaderPack (which in case of DX11ShaderPack is also the checksum of the d3d11 element descs).

What's left is that I still need to pass the VS shaderblob to be able to create the inputlayout. But this is too dx11 specific. I've read one can create a dummy VS when creating the inputlayout, using the VS input I need. Do you have an example on how I could do that?

Last but not least, comparing 16 char strings to select the inputlayout sounds less efficient. Would it be possible to convert these strings to a uint somehow? (Without losing the unique identification, i.e. abcdef shouldn't give the same value as fedcba)

Edited by cozzie


"which in case of dx11 is the checksum of the d3d11 element descs"

One pitfall to look out for here -- the standard way to initialize a structure in C++ is with:

FOO_DESC desc = {}; //initialize to zero efficiently

desc.bar = 42;

But this won't necessarily zero out any padding bytes within the structure. If you're going to be hashing these structs, you need to ensure the padding bytes hold consistent values. So you have to use the heavyweight version:

FOO_DESC desc;

memset( &desc, 0, sizeof(desc) ); // or: SecureZeroMemory( &desc, sizeof(desc) );

desc.bar = 42;
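
A small sketch of why this matters for hashing, using a made-up `ElementDesc` struct in place of a real D3D11 desc. After the memset, two descs filled with the same values should compare equal byte for byte on typical compilers, so any byte-wise hash over them agrees too.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an API desc struct; most ABIs insert
// three padding bytes between 'slot' and 'offset'.
struct ElementDesc {
    std::uint8_t  slot;
    std::uint32_t offset;
};

// Zeroes the whole object (padding included) before setting the fields.
void InitDesc(ElementDesc& desc, std::uint8_t slot, std::uint32_t offset) {
    std::memset(&desc, 0, sizeof(desc));
    desc.slot = slot;
    desc.offset = offset;
}

// Byte-wise comparison, which is what a hash over the raw struct relies on.
bool SameBytes(const ElementDesc& a, const ElementDesc& b) {
    return std::memcmp(&a, &b, sizeof(ElementDesc)) == 0;
}
```

Hashing field by field instead of hashing the raw bytes would sidestep the padding question entirely, at the cost of a little more code per struct.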

"I've read one can create a dummy VS when creating the inputlayout, using the VS input I need. Do you have an example on how I could do that?"

For every particular set of vertex input attributes used by a shader, my toolchain spits out a dummy HLSL file like the one below, which is compiled into a dummy shader binary to be used during IL creation at runtime:

// Hash: A54D7D16
struct colored2LightmappedVertex
{
  float3 position : POSITION0;
  float4 color : COLOR0;
  float4 color2 : COLOR1;
  float2 texcoord : TEXCOORD0;
  float2 texcoord2 : TEXCOORD1;
  float3 normal : NORMAL0;
  float3 tangent : TANGENT0;
};
float4 vs_test_colored2LightmappedVertex( colored2LightmappedVertex inputs ) : SV_POSITION
{
  float4 hax = (float4)0;
  hax += (float4)(float)inputs.position;
  hax += (float4)(float)inputs.color;
  hax += (float4)(float)inputs.color2;
  hax += (float4)(float)inputs.texcoord;
  hax += (float4)(float)inputs.texcoord2;
  hax += (float4)(float)inputs.normal;
  hax += (float4)(float)inputs.tangent;
  return hax;
}

The 'hax' variable just makes sure that the HLSL compiler doesn't optimize out any of the input variables.


As long as the actual VS used alongside the IL does actually have inputs that match the dummy shader, then everything works fine. If your dummy shader inputs and actual shader inputs differ, you get undefined behavior. 
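
If you wanted to generate such dummy shaders from C++ rather than from a separate tool, the text could be assembled roughly like this. It is a sketch under assumptions: `VertexAttribute` and `MakeDummyVertexShader` are invented names, and the output only mirrors the shape of the listing above.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one vertex shader input.
struct VertexAttribute {
    std::string type;      // e.g. "float3"
    std::string name;      // e.g. "position"
    std::string semantic;  // e.g. "POSITION0"
};

// Builds the HLSL source of a dummy VS that reads every attribute once,
// so the compiler cannot strip any of them from the input signature.
std::string MakeDummyVertexShader(const std::string& structName,
                                  const std::vector<VertexAttribute>& attributes) {
    std::string src = "struct " + structName + "\n{\n";
    for (const VertexAttribute& a : attributes) {
        src += "  " + a.type + " " + a.name + " : " + a.semantic + ";\n";
    }
    src += "};\n";
    src += "float4 vs_test_" + structName + "( " + structName +
           " inputs ) : SV_POSITION\n{\n";
    src += "  float4 hax = (float4)0;\n";
    for (const VertexAttribute& a : attributes) {
        src += "  hax += (float4)(float)inputs." + a.name + ";\n";
    }
    src += "  return hax;\n}\n";
    return src;
}
```

The resulting string would then go to whatever HLSL compile step you already have, either offline or at runtime.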

"comparing 16 char strings to select the inputlayout sounds less efficient. Would it be possible to convert these strings to a uint somehow?"

Well 16 chars is 4 uints :)
You should use a different hash function than MD5. MD5 was originally designed as a cryptographic hash for detecting file tampering -- you don't need something that strong. I currently use FNV32a (32-bit FNV-1a) for most things like this, but check out this overview of many choices: http://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/
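
For reference, a minimal 32-bit FNV-1a over a byte range looks like the sketch below. The function name `Fnv1a32` is my own; the offset basis and prime are the standard published FNV constants.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
std::uint32_t Fnv1a32(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;             // FNV prime
    }
    return hash;
}

// Convenience overload for strings.
std::uint32_t Fnv1a32(const std::string& text) {
    return Fnv1a32(text.data(), text.size());
}
```

Because every byte is mixed in sequentially, the order of the input matters, so reordered inputs such as "ab" and "ba" hash differently. Like any 32-bit hash it can still collide on unrelated inputs, so treat equality of hashes as "very likely the same", not as proof.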


Thanks, this helps a lot.

Maybe a stupid question, but do you just create a temporary ASCII HLSL file? (using C++ std IO libraries).

I assume you remove the file after the IL is created.


I'll also look into the hash functions. In the end I would hope to have something else than a string, because a string compare will always be slower than comparing some number (Disclaimer: this is an assumption, not profiled :))


"Maybe a stupid question, but do you just create a temporary ASCII HLSL file? (using C++ std IO libraries). I assume you remove the file after the IL is created."

I do this as part of my toolchain, not the engine, so yeah I write a hlsl file to disc from the C# tool code and then launch an FXC.exe process to compile it into a bytecode file.
If you're doing this at runtime, there's likely no need to touch the disc. You can use the API to compile HLSL from memory, to an in-memory bytecode blob IIRC. You also can just use any of your real vertex shaders that happen to use the particular vertex structure that's appropriate. This fake VS code idea just lets you create all your IL's up front, without any dependencies on the shader system / shader loading.

Edited by Hodgman


With some bumps and reviewing on the chat, I've managed to solve it.

The solution follows the following principles:

- the IShaderPack class no longer has DX11 specific stuff in it

-- thus, the DX11ShaderManager, inherited from IShaderMgr contains IShaderPacks

- the IShaderPack no longer exposes a void* for the VS shader blob

-- instead it has 3 new generic const getters: for the VS filename, VS entrypoint and shadertarget

- both the DX11 elementDesc's and generic VtxAttributes are no longer stored in a class (just temporaries)

-- I even got rid of the generic VtxAttributes struct completely (because I don't need them for anything else yet)


To be able to achieve this, I've implemented some DX11 helper functions (pseudo code, because I'm at work :)):

std::vector<D3D11_INPUT_ELEMENT_DESC> CreateDX11ElementDescs(ID3DBlob *pVsShaderBlob);
ID3D11InputLayout* CreateDX11InputLayout(const std::string &pVSFilename, const std::string &pVSEntryPoint, const std::string &pShaderTarget);
std::string CreateDX11VSInputChecksum(ID3DBlob *pVsShaderBlob);

What I basically do now:


- For each shader:

-- Compile and create the shader

-- within scope directly create the VS input checksum (hash over the DX11 descs, which are created in the helper using the other helper).

This is easy/ convenient because here I already have the VS shaderblob around.


- Find unique inputlayouts by iterating over the checksums


- For each unique inputlayout

-- call helper CreateDX11InputLayout, i.e.

recompile shader using vsfilename, entrypoint and target

-- create temporary/ local scope element descs using the DX11 helper

-- Create DX11 IL using shaderblob and DX11 descs in local scope
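
The "find unique inputlayouts" step could be sketched as below, assuming the checksums are already plain 32-bit numbers rather than MD5 strings. `LayoutAssignment` and `FindUniqueLayouts` are hypothetical names, and the actual IL creation is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct LayoutAssignment {
    std::vector<std::uint32_t> uniqueChecksums;     // one input layout to create per entry
    std::vector<std::size_t> layoutIndexPerShader;  // index into uniqueChecksums, per shader
};

// Walks the per-shader VS input checksums and collapses duplicates,
// so each distinct input signature gets exactly one input layout.
LayoutAssignment FindUniqueLayouts(const std::vector<std::uint32_t>& checksums) {
    LayoutAssignment result;
    for (std::uint32_t checksum : checksums) {
        std::size_t index = 0;
        while (index < result.uniqueChecksums.size() &&
               result.uniqueChecksums[index] != checksum) {
            ++index;
        }
        if (index == result.uniqueChecksums.size()) {
            result.uniqueChecksums.push_back(checksum);  // first time this signature appears
        }
        result.layoutIndexPerShader.push_back(index);
    }
    return result;
}
```

A linear search is used because the number of distinct layouts is usually small; with many layouts a hash map from checksum to index would be the obvious replacement.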


I'm actually quite happy with how the solution grew to how it is now.


Another/ last change I'm gonna make here is to get rid of the mCurrentInputLayoutId within the IShaderPack, because it's too DX11'ish :) I plan to do this by changing the SetShaderPack function of the manager from calling GetInputLayoutId on the IShaderPack to simply comparing the GetVSInputChecksum with the checksums that are stored in a vector within the Manager. This currently is a bit expensive (MD5 checksums), but in combination with moving to xxHash32, this should work fine (I think, because then it's no longer a string compare of 16 chars but a number comparison).


@Hodgman/others: any last thoughts/ remarks?


(Ps.; next step is constant buffers :cool:)

Edited by cozzie
