# DX11 - Swap Chain - Slow Engine


## Recommended Posts

Hi guys,

PS. FPS I get in my engine: 50-70 FPS at 1920x1080. The system can run Crysis 3 at high settings, so the hardware is not the problem.

Scene:

Right now I'm facing another issue: the performance of my engine. I don't think the problem is my architecture, so I tried running a profiler.

Most of the time was spent in initialization, of course. Later on, in the render loop, rendering the depth, normal, and diffuse maps is quite fast relative to other things. The real problem comes when swapchain->present(0,0) is called. As I understand it, it takes a long time because it's waiting for the previous frame to finish, right?

I have heard rumors that if statements in a shader behave oddly: even when the condition is false, everything inside the if statement is still semi-run/checked, which makes it quite slow. Is this true or just rubbish?

Also, my FPS is NOT affected if I change the regular shader for all the objects (I tried removing everything so that only the positions were calculated in world space and a white color was returned), but if I change my post-processing shader to a very simple version which ONLY returns the diffuse map, my FPS is boosted 2x-3x! But why? (The shader is below if you're interested, but there are still errors in it.)

Texture2D t_dffmap : register(t0);
Texture2D t_depthmap : register(t1);
Texture2D t_normalmap : register(t2);
Texture2D t_random : register(t3);
Texture2D t_blmextract : register(t4);
Texture2D t_megaparticles : register(t5);
Texture2D t_fractalnoise : register(t6);
Texture2D t_softp : register(t7);
Texture2D t_softp_depth : register(t8);
Texture2D t_glowmap : register(t10);
SamplerState ss;

cbuffer PARAMSBUFFER : register(b0)
{
float time;
float blur;
float bloomExtract;
float bloom;
float pixelDisortion;
float pixelDisorterAmount;
float ssao;
float bluramount;
float megaparticles;
float fractalNoise;
float glowmap;

matrix view;
};

cbuffer BloomBuffer : register(b1)
{
float BloomThreshold;
float BloomSaturation;
float BaseSaturation;
float BloomIntensity;
float BaseIntensity;
};

cbuffer SSAOBuffer : register(b2)
{
float g_scale;
float g_bias;
float g_intensity;
float ssaoIterations;
float3 pppspace;
};

cbuffer GODRAYBuffer : register(b3)
{
float3 LightPosition;
matrix WorldViewProjection;

float GOD_Density;
float GOD_Weight;
float GOD_Decay;
float GOD_Exposure;
float GOD_NUM_SAMPLES;
float GodRays;
};

struct VS_Output
{
float4 Pos : SV_POSITION;
float2 Tex : TEXCOORD0;
float2 LightPos : TEXCOORD1;
};

{
VS_Output Output;
Output.Tex = float2((id << 1) & 2, id & 2);
Output.Pos = float4(Output.Tex * float2(2,-2) + float2(-1,1), 0, 1);

return Output;
}

// Helper for modifying the saturation of a color.
float4 AdjustSaturation(float4 color, float saturation)
{
// The constants 0.3, 0.59, and 0.11 are chosen because the
// human eye is more sensitive to green light, and less to blue.
float grey = dot(color, float3(0.3, 0.59, 0.11));

return lerp(grey, color, saturation);
}

// Ambient Occlusion Stuff --------------------------------------------------

float3 getPosition(in float2 uv)
{
return mul( t_depthmap.Sample(ss, uv).xyz, view);
}

float3 getNormal(in float2 uv)
{
return normalize(t_normalmap.Sample(ss, uv).xyz * 2.0f - 1.0f);
}

float2 getRandom(in float2 uv)
{
return normalize( mul(t_random.Sample(ss, float2(800, 600) * uv / float2(64, 64)).xy * 2.0f - 1.0f, view) );
}

float doAmbientOcclusion(in float2 tcoord,in float2 uv, in float3 p, in float3 cnorm)
{
float3 diff = getPosition(tcoord + uv) - p;
const float3 v = normalize(diff);
const float d = length(diff)*g_scale;
return max(0.0,dot(cnorm,v)-g_bias)*(1.0/(1.0+d))*g_intensity;
}

// End

{
if (bloomExtract == 1)
{
// Look up the original image color.
float4 c = t_dffmap.Sample(ss, input.Tex);

// Adjust it to keep only values brighter than the specified threshold.
return saturate((c - BloomThreshold) / (1 - BloomThreshold));
}

float4 color = float4(1.0f, 1.0f, 1.0f, 1.0f);

if (pixelDisortion == 1)
{
// Distortion factor
float NoiseX = pixelDisorterAmount * (time/1000) * sin(input.Tex.x * input.Tex.y+time/1000);
NoiseX=fmod(NoiseX,8) * fmod(NoiseX,4);

// Use our distortion factor to compute how much it will affect each
// texture coordinate
float DistortX = fmod(NoiseX,5);
float DistortY = fmod(NoiseX,5+0.002);

// Create our new texture coordinate based on our distortion factor
input.Tex = float2(DistortX,DistortY);
}

if (fractalNoise == 1)
{
float offset = saturate((t_fractalnoise.Sample(ss, input.Tex) / 10.0f));

input.Tex += 1 * (t_fractalnoise.Sample(ss, input.Tex).xy - 0.5)/15;
}

float4 dffMAP = t_dffmap.Sample(ss, input.Tex);

if (megaparticles == 1)
{
dffMAP.a = 0.0f;

dffMAP += t_megaparticles.Sample(ss, input.Tex);
}

color = dffMAP;

if(bloom == 1)
{
// Look up the bloom and original base image colors.
float4 cbloom = t_blmextract.Sample(ss, input.Tex);
float4 base = color;

// Adjust color saturation and intensity.
cbloom = AdjustSaturation(cbloom, BloomSaturation) * BloomIntensity;
base = AdjustSaturation(base, BaseSaturation) * BaseIntensity;

// Darken down the base image in areas where there is a lot of bloom,
// to prevent things looking excessively burned-out.
base *= (1 - saturate(cbloom));

// Combine the two images.
color = base + cbloom;
}

if (blur == 1)
{
float hblur[17] = {0, -1, 2, -3, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13, 14, -15, 16};

int i = 0;
for(;i < bluramount;)
{
color += t_dffmap.Sample(ss, input.Tex + float2(0.002f * hblur[i+1], 0.002f * hblur[i]));

i++;
}

i++;
color = color / i;
}
{
const int nsamples = 16;

input.Tex -= 0.5;

for(int i=0; i<nsamples; i++)
{
color += t_dffmap.Sample(ss, input.Tex.xy*scale + radial_center );
}
color /= nsamples;
}

if (ssao == 1)
{
// Apply SSAO

const float2 vec[4] = {float2(1,0),float2(-1,0),
float2(0,1),float2(0,-1)};

float3 p = getPosition(input.Tex);
float3 n = getNormal(input.Tex);
float2 rand = getRandom(input.Tex);

float ao = 0.0f;

//**SSAO Calculation**//
int iterations = ssaoIterations;
for (int j = 0; j < iterations; ++j)
{
float2 coord2 = float2(coord1.x*0.707 - coord1.y*0.707,
coord1.x*0.707 + coord1.y*0.707);

ao += doAmbientOcclusion(input.Tex,coord1*0.25, p, n);
ao += doAmbientOcclusion(input.Tex,coord2*0.5, p, n);
ao += doAmbientOcclusion(input.Tex,coord1*0.75, p, n);
ao += doAmbientOcclusion(input.Tex,coord2, p, n);
}
ao/=(float)iterations*4.0;
color.rgb *= ao;
}

// Soft Particles
float pDepth = t_softp_depth.Sample(ss, input.Tex);
float wDepth = t_depthmap.Sample(ss, input.Tex);
float twD = t_depthmap.Sample(ss, input.Tex);

if ((wDepth < pDepth) || twD == 0)
{
float4 pColor = t_softp.Sample(ss, input.Tex);

if (twD != 0)
pColor.a = saturate((wDepth - pDepth) * 1);

color += pColor;
}
// End

if (glowmap == 1)
{
color += t_glowmap.Sample(ss, input.Tex);
}

if (GodRays == 1)
{
float2 deltaTexCoord = (input.Tex - input.LightPos.xy);
deltaTexCoord *= GOD_Density / GOD_NUM_SAMPLES;
//float4 color = t_dffmap.Sample(ss, input.Tex);
float illuminationDecay = 1.0f;
for (int i = 0; i < GOD_NUM_SAMPLES; i++)
{
input.Tex -= deltaTexCoord;
float4 sample = t_dffmap.Sample(ss, input.Tex);
sample *= illuminationDecay * GOD_Weight;
color += sample;
illuminationDecay *= GOD_Decay;
}

return color * GOD_Exposure;
}

return color;
}


I understand that GameDev has many experienced graphics programmers, so I ask you: what can speed up an engine, and what should I avoid? Any tricks?

And that's basically it, thanks for taking interest!

##### Share on other sites

1. You are most obviously fillrate-bound. Post the FPS in lower resolutions (e.g. 640x480).

2. You took a shorter / simpler route and created an uber-shader. That's great during debugging/development, but as you noticed, it has a negative impact.

3. It is very easy to check the impact of conditions. Just create a separate technique/shader pair that has only a single code path (say, just bloom), without any conditions whatsoever.

4. Make sure VSync is off. With 52 FPS, it doesn't look like VSync is on, but better safe than sorry...

Of course, as always, there will be 10 other things that impact FPS (many of them on the CPU side: AI, pathfinding, ...), but let's first address those that have the biggest impact.
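To make point 1 concrete, here is a rough C++ sketch of the arithmetic behind the low-resolution test. It assumes the frame is entirely fill-bound, so frame time scales linearly with the number of pixels shaded. That is an idealization, and the helper names (`Resolution`, `predictedFillBoundFps`) are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper type: a render target size in pixels.
struct Resolution {
    int width;
    int height;
};

inline double pixelCount(Resolution r) {
    assert(r.width > 0 && r.height > 0);
    return static_cast<double>(r.width) * r.height;
}

// Frame rate we would expect at `target` if the frame time measured at `ref`
// scaled purely with pixel count (assumption: 100% fill-bound, no CPU cost).
inline double predictedFillBoundFps(double fpsAtRef, Resolution ref, Resolution target) {
    assert(fpsAtRef > 0.0);
    // Frame time is proportional to pixel count, so FPS is inversely proportional.
    return fpsAtRef * pixelCount(ref) / pixelCount(target);
}
```

For example, `predictedFillBoundFps(60.0, {1920, 1080}, {640, 480})` gives 405, because 1920x1080 has 6.75 times as many pixels as 640x480. If the measured frame rate at 640x480 climbs a long way toward that figure, the pixel work is probably the bottleneck; if it barely moves, the cost is likely elsewhere.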

##### Share on other sites

So is it possible to put all the different effects in one shader?

##### Share on other sites

You can put them in the same hlsl file, as long as you're willing to compile it more than once.

You simply swap tests like "if (blur == 1) { ... }" for "#if defined(BLUR) ... #endif" and compile once with "/D BLUR" on the command line, once with "/D SSAO", etc. You can also set the defines in code if you're compiling shaders at runtime.

You then pick the correct shader to use, instead of setting a constant.
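One way to organize this is sketched below in C++: a bitmask of enabled effects is turned into a list of defines and then into one `fxc` command line per permutation. `BLUR` and `SSAO` come from the post above; the `PostFx` enum, the `BLOOM` and `GODRAYS` macro names, the `ps_5_0` profile, and the output file naming are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feature bits, one per effect in the post-processing shader.
enum PostFx : unsigned {
    FX_BLUR    = 1u << 0,
    FX_BLOOM   = 1u << 1,
    FX_SSAO    = 1u << 2,
    FX_GODRAYS = 1u << 3,
};

// Macro names to define for a given feature mask.
inline std::vector<std::string> definesFor(unsigned mask) {
    std::vector<std::string> defines;
    if (mask & FX_BLUR)    defines.push_back("BLUR");
    if (mask & FX_BLOOM)   defines.push_back("BLOOM");
    if (mask & FX_SSAO)    defines.push_back("SSAO");
    if (mask & FX_GODRAYS) defines.push_back("GODRAYS");
    return defines;
}

// Builds an fxc command line that compiles one permutation of `file`.
// The output name "post_<mask>.cso" is an arbitrary convention for this sketch.
inline std::string fxcCommand(const std::string& file, const std::string& entry, unsigned mask) {
    assert(!file.empty() && !entry.empty());
    std::string cmd = "fxc /T ps_5_0 /E " + entry;
    for (const std::string& define : definesFor(mask)) {
        cmd += " /D " + define;
    }
    cmd += " /Fo post_" + std::to_string(mask) + ".cso " + file;
    return cmd;
}
```

At draw time the same mask can index a table of compiled shaders, so choosing effects becomes a lookup of the right shader object rather than a constant-buffer write.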

##### Share on other sites

SSAO and god rays can be very pixel heavy effects and I'm guessing you're doing them at full resolution (1920x1080).  Even full screen blurs can put a fair amount of pressure on fill rate at high resolutions.  When you take into account that more than likely every one of your branches is being evaluated even if the conditions are false, this could be adding up to make a very expensive shader.

A lot of these effects are rendered to smaller render targets, such as something like 1/4 size of the backbuffer (experiment with the size to get a good image quality vs. performance trade-off).  And as mentioned above, even though it's 2013 we still really need to be using the preprocessor for our branches rather than if statements.  My recommendation would be to render SSAO to a small target by itself, then god rays to another small target by themselves, then have your big post-process shader at the end composite those effects along with blurs, distortion, etc., using #defines to turn effects on and off as needed.
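A quick C++ sketch of why the smaller targets help: it counts texture fetches per frame for a full-screen effect at full size and at a reduced size. The interpretation of "1/4 size" as half the width and half the height is an assumption, as are the function names and the figure of 16 fetches per pixel.

```cpp
#include <cassert>

// Texture fetches per frame for a full-screen effect that takes
// `samplesPerPixel` fetches per pixel on a width x height target.
inline long long fetchesPerFrame(int width, int height, int samplesPerPixel) {
    assert(width > 0 && height > 0 && samplesPerPixel >= 0);
    return static_cast<long long>(width) * height * samplesPerPixel;
}

// Same effect on a target whose width and height are both divided by `divisor`
// (assumption: divisor 2 is what "1/4 size" means, i.e. a quarter of the pixels).
inline long long fetchesAtReducedSize(int width, int height, int samplesPerPixel, int divisor) {
    assert(divisor > 0);
    return fetchesPerFrame(width / divisor, height / divisor, samplesPerPixel);
}
```

With 16 fetches per pixel, a 1920x1080 target needs 33,177,600 fetches per frame, while a 960x540 target needs 8,294,400, a quarter of the work. The final composite pass still runs at full resolution, but it only has to read the small results.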

##### Share on other sites

So I need to compile individual shaders for texturing, bump, parallax mapping?

How would I blend these together?

##### Share on other sites

In HLSL, are if's very heavy?

##### Share on other sites

If statements can be cheap in hlsl, if there's no branch involved and the code you execute is simple. For example this if statement is a cheap one:

if (x > 7) x = 7;

They are also free if the condition can be evaluated at compile time.

They get expensive when the extra code that gets executed is significant, because the compiler will generally execute the code anyway and multiply the result by either 0 or 1 depending on the result of the if.

You can also ask the compiler to [branch] instead of evaluating the whole thing and throwing away the result. The expense of that depends on things like what pattern of pixels goes down each path, but it can be beneficial if you avoid executing the extra code a lot of the time. While the compiler will sometimes automatically decide to do a real branch, you're best off specifying it yourself, as you get extra errors back if it can't do a branch due to texturing issues (i.e. tex2D vs tex2Dlod).
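The difference between the two strategies can be modelled on the CPU. The C++ sketch below is only an analogy, not shader code: `expensiveEffect` stands in for a costly block such as the SSAO loop, and `WorkCounter` is a hypothetical way to count how often that block runs.

```cpp
#include <cassert>

// Counts how many times the "expensive" body runs; a stand-in for the
// texture fetches and ALU work an effect would cost in a real shader.
struct WorkCounter {
    int expensiveCalls = 0;
};

inline float expensiveEffect(float color, WorkCounter& counter) {
    ++counter.expensiveCalls;
    return color * 0.5f + 0.25f;
}

// Flattened form: the body always runs and the result is scaled by 0 or 1.
inline float shadeFlattened(float color, bool enabled, WorkCounter& counter) {
    const float effect = expensiveEffect(color, counter);
    const float mask = enabled ? 1.0f : 0.0f;
    return color + effect * mask;
}

// Branched form: the body runs only when the condition is true.
inline float shadeBranched(float color, bool enabled, WorkCounter& counter) {
    if (enabled) {
        return color + expensiveEffect(color, counter);
    }
    return color;
}
```

Both functions return the same value for the same inputs, but with `enabled == false` the flattened version still pays for one call to `expensiveEffect` and the branched version pays for none. On a GPU the saving is less clear-cut, because it depends on neighbouring pixels taking the same path, as noted above.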

Your best option when optimizing is to use a tool like GPU Shader Analyzer to see what instructions get generated, as well as profiling the performance yourself, because if statement performance depends on the input data.

##### Share on other sites

Would sending many buffers per draw call slow it down? I know it does, but how much?

##### Share on other sites

OK, now I've improved the frame rate a bit. Basically, I now have an individual material (a shader) for each mesh, which can be modified by the user on creation. This also helped me escape the fixed shading. The only problem now is that I need to write a class that can parse any kind of shader and its needs, because some shaders need a specific input and some don't, and the class needs to detect that.

And a funny note: whilst doing this I lost some shader data, basically my whole post-processing shader, because I closed Visual Studio without undoing, but then I realized that I had a copy here in this forum.