DX11 Rendering an Object the fastest way

Migi0027    4628

Hi guys,

My engine has reached a stage where instancing will be required. What I do right now is roughly this:

void RenderFunc(..)


 If instancing: // This one knows if I'm rendering an instance



The thing is that I'm using one render function to render all meshes. Is that a bad idea? My full render function looks like this:


void C3DEngineObject::RenderMesh(UMesh &mesh, bool recursive)
	#define Passes      mesh.passes  // Passes is just a class containing information for the passes

	ObjectBuffer.AmbientColor = Passes.GlobalMaterial.AmbientColor;
	/////////PREPARE ALPHA SETTINGS////////////
	ObjectBuffer.Alpha = 1.0f;
	ObjectBuffer.Alpha = Passes.getAlpha();
	if (Passes.Lighting)
		ObjectBuffer.Alpha += Lights.Directional.size();

	float alphaUp = 1.0f/ObjectBuffer.Alpha;
	ObjectBuffer.Alpha = 1.0f/ObjectBuffer.Alpha;

	#define AlphaStepUp ObjectBuffer.Alpha += alphaUp

	if (!recursive) // Don't send the buffers if recursive
		mesh.SendBuffers(devcon, dev, Camera, ObjectBuffer, Bufferphase);

	/////////RENDER LIGHTING/////////////////

	if (Passes.Lighting)
		////      For each directional light          ////

		FOREACH (Lights.Directional.safe_size())
			Shaders[7].ApplyShader(dev, devcon);

			// INJECT!

			Shaders[7].UpdateObject(ObjectBuffer, devcon);
			Shaders[7].UpdateWorld(FrameBuffer, devcon);
			//Shaders[7].SendRegBuffer(FrameBuffer, 1, 0, devcon);

			Shaders[7].Render(devcon, Bufferphase);

			// Step up one alpha

	/////////RENDER THE MESH/////////////////

	for (int p = 1; p <= Passes.options; p++) // Get all the passes
		if (Passes.getAt(p) == true) // Is this pass enabled?
			int relative = Passes.getRelative(p); // If so, find the mat id.
			Material* active = getMaterial(relative, Passes); // Find the mat of the id

			active->ApplyMaterial(dev, devcon);

			active->ShaderRef->UpdateObject(ObjectBuffer, devcon); // Send the object and world buffers
			active->ShaderRef->UpdateWorld(FrameBuffer, devcon);

			if (relative == BUMPMAP_SHADER) // Is this bump mapping?
			{ = ObjectBuffer.Final;
				active->ShaderRef->UpdateMatrix(MatrixBuffer, devcon);
				active->ShaderRef->SendTexture(Passes.base->pTexture, 0, devcon);
				active->ShaderRef->SendTexture(Passes.bumpmap->pTexture, 1, devcon);

			if (relative == TEXTURE_SHADER) // Texturing?
				active->ShaderRef->SendTexture(Passes.base->pTexture, 0, devcon);
				//active->ShaderRef->SendTexture(Passes.opacityMap->pTexture, 1, devcon);

			if (mesh.Instanced && !recursive) // Is this mesh instanced
				FOREACH (mesh.InstancedData.size()) // For all instances
					mesh.SendInstancedData(ObjectBuffer, Camera, i, Bufferphase); // Send the location, rotation and scaling of this instance.

					RenderMesh(mesh, true); // Call this function with recursive set to true so the original matrices won't be re-sent (which would remove the instance loc, rot and scaling).

			// Render the mesh in diffuse mode
			active->Render(devcon, Bufferphase);

			// Step up one alpha

	if (recursive)




PS. mesh.passes is just a class which contains the passes (e.g. Texture, Shadows, Lighting).

If you look at my code, each pass checks whether it is bump mapping or texturing. Is this a slow method, and how could it be improved? Or even better, how can it be improved with instancing in mind?

Also, if you have time, it would be nice if you could post a good or efficient way of handling mesh rendering, as I'm just following my own intuition and not a professional's advice.

Also, how much time does it take to check if a bool is true?

Thank You

Edited by Migi0027

Jason Z    6434

It seems to me that you are looking for improvements without a specific need in mind, which will make it very hard to find the solution you are looking for.  In general, CPU-side manipulations account for a fairly small amount of work in the grand scheme of things unless you are using lots and lots of small objects.  Even in that case, the GPU work can often take more time than the CPU side, meaning that optimizing the CPU side won't make any difference to the final framerate.


For your specific question about testing a bool - it is pretty much one of the fastest operations that exist, so you don't need to worry about changing that.  You should invest some time in learning how to use profiling tools (free or otherwise) and also tools like PIX/Graphics Debugger to learn more about the performance of your engine/app.  That will allow you to pinpoint the bottlenecks and make effective use of your optimization time.
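To put a rough number on the bool question, here is a minimal timing sketch. It assumes `std::chrono::steady_clock` is precise enough for a ballpark figure, and the helper names (`countTrue`, `nsPerTest`) are made up for illustration; they are not part of the engine above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many flags are set, one bool test per element.
// std::vector<char> is used instead of std::vector<bool> to avoid bit packing.
std::size_t countTrue(const std::vector<char>& flags)
{
    std::size_t count = 0;
    for (char f : flags)
        if (f) // the bool test being measured
            ++count;
    return count;
}

// Times one call to countTrue and returns the average nanoseconds per test.
// The count is written to countOut so the work has an observable result.
double nsPerTest(const std::vector<char>& flags, std::size_t& countOut)
{
    assert(!flags.empty());
    const auto start = std::chrono::steady_clock::now();
    countOut = countTrue(flags);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(flags.size());
}
```

Treat the result as a ballpark only: the optimizer is free to transform the loop, and a micro-benchmark says little about a whole frame. A profiler run on the real engine is still the measurement that matters.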

e.s.    110

Most of the loss of performance is /not/ the draw calls, but a Game Engine's utilization of them... 


From an architectural point of view, the code hints that you may not be implementing standard "Separation of Concerns" patterns (ensuring that methods/classes don't do more than one thing, but are each responsible for one specialized purpose).
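As one possible illustration of that idea, the bump-map/texture checks inside the pass loop could move into the pass types themselves, so the render loop only iterates and draws. This is a sketch under assumptions, not the poster's engine: `FakeContext`, `PassBinder`, `TexturePass`, `BumpMapPass` and `renderPasses` are all hypothetical names, and the "context" only records calls instead of talking to Direct3D.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the device context: it only records what was bound.
struct FakeContext
{
    std::vector<std::string> calls;
};

// Each pass type knows how to bind its own resources.
class PassBinder
{
public:
    virtual ~PassBinder() = default;
    virtual void bind(FakeContext& ctx) const = 0;
};

class TexturePass : public PassBinder
{
public:
    void bind(FakeContext& ctx) const override
    {
        ctx.calls.push_back("texture:base@0");
    }
};

class BumpMapPass : public PassBinder
{
public:
    void bind(FakeContext& ctx) const override
    {
        ctx.calls.push_back("matrix");
        ctx.calls.push_back("texture:base@0");
        ctx.calls.push_back("texture:bump@1");
    }
};

// The render loop no longer checks which kind of pass it is handling.
void renderPasses(const std::vector<std::unique_ptr<PassBinder>>& passes,
                  FakeContext& ctx)
{
    for (const auto& pass : passes)
    {
        assert(pass != nullptr);
        pass->bind(ctx);
        ctx.calls.push_back("draw");
    }
}
```

The point here is responsibility rather than speed: a virtual call and an `if` are both cheap, but adding a new pass type no longer means editing the render function.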


In other words, at this stage, I believe everyone would agree that getting your engine to do what you want is more important than the performance.


For example, you are using "renderfunc" and "rendermesh" as function names. Okay, I know this sounds a little out of left field and nitpicky, but bear with me ... rather than RenderScene -> RenderEachLayout -> RenderBufferSet -> RenderBufferPartitions -> or some other rendering step in the context of your Game Engine's OWN pipeline.


In a game engine, there is /massive/ importance on the efficient management of the associations between Modeling entities, specifically:

1. Scenes to Layouts

2. Layouts To Buffers

3. Buffers to Buffer Partitions, Shaders, etc

4. Model to: (Mesh, Behaviors, Textures, Shaders, etc)

5. Many Textures to Many Models and Meshes

6. Many Meshes to Many Models

7. and the list goes on and on and on and on.
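One of the associations above (many models sharing few textures) might be handled with a small cache that hands out shared handles. The sketch below is only an assumption about how that could look; `Texture`, `TextureCache` and `Model` are hypothetical types, and "loading" is reduced to constructing a named struct.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical texture; a real one would own a GPU resource.
struct Texture
{
    std::string name;
};

// Hands out one shared Texture per name, so many models can reference it.
class TextureCache
{
public:
    std::shared_ptr<Texture> get(const std::string& name)
    {
        assert(!name.empty());
        auto it = cache_.find(name);
        if (it != cache_.end())
            if (auto existing = it->second.lock())
                return existing; // already loaded and still in use

        auto created = std::make_shared<Texture>(Texture{name});
        cache_[name] = created; // weak reference: the cache does not keep it alive
        ++loads_;
        return created;
    }

    std::size_t loads() const { return loads_; }

private:
    std::unordered_map<std::string, std::weak_ptr<Texture>> cache_;
    std::size_t loads_ = 0;
};

// A model only references its texture; it does not own a private copy.
struct Model
{
    std::shared_ptr<Texture> texture;
};
```

Two models that ask for the same name get the same `Texture` object, and the texture is released once the last model holding it goes away.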


The GPU operations are /fast/ ....  Managing memory, garbage collection, managing collections in memory that are hundreds of megabytes, etc., are among the biggest game performance hits.  This is where a lot of your optimizations will need to occur.  As horrible as it sounds, use placeholder shaders, textures, etc. at first (greyscale gradients, etc.) to ensure your Game Engine's Framework is working right.


In other words, pick a game engine feature, or a set of features, that you want, program towards them, find out whether those features are running slow or not, and then performance tune them.


Stage 1 Features:

1. Rendering Multiple Instances of the Same Model definition.  (sidewalk squares)

2. Rendering Multiple Instances of the Same Model, but with different Shaders,  (stars, the same car but different color, etc)

3. Rendering Multiple Types of Models, (Tree, Sidewalk Square, etc)

4. Pre-Defined animations for a subset of models, (rotations, movement/translation in a circle pattern, movement along a Bezier, whatever).

5. User Driven Movement,

6. etc.
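Feature 1 in the list above (many instances of the same model) could be approached by grouping draw requests per mesh and submitting each group once, instead of recursing into the render function per instance. The following is a rough sketch with made-up names (`InstanceData`, `DrawItem`, `Batch`, `buildBatches`, `submit`); the actual GPU submission is left as a comment, where a Direct3D 11 renderer would typically fill an instance buffer and call `ID3D11DeviceContext::DrawIndexedInstanced`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-instance data; a real engine would store a full world matrix.
struct InstanceData
{
    float x, y, z;
};

// One request to draw a given mesh at a given place.
struct DrawItem
{
    int meshId;
    InstanceData instance;
};

// All instances of one mesh, ready to be submitted together.
struct Batch
{
    int meshId;
    std::vector<InstanceData> instances;
};

// Groups draw items by mesh so each mesh is submitted once with all its instances.
std::vector<Batch> buildBatches(const std::vector<DrawItem>& items)
{
    std::map<int, std::vector<InstanceData>> grouped;
    for (const DrawItem& item : items)
        grouped[item.meshId].push_back(item.instance);

    std::vector<Batch> batches;
    batches.reserve(grouped.size());
    for (auto& entry : grouped)
        batches.push_back(Batch{entry.first, std::move(entry.second)});
    return batches;
}

// Stand-in for the submit step: returns the number of draw calls issued.
std::size_t submit(const std::vector<Batch>& batches)
{
    std::size_t drawCalls = 0;
    for (const Batch& batch : batches)
    {
        assert(!batch.instances.empty());
        // Here: upload batch.instances to an instance buffer, then issue
        // one instanced draw covering batch.instances.size() instances.
        ++drawCalls;
    }
    return drawCalls;
}
```

With four draw items spread over two meshes, this issues two draw calls rather than four. Feature 2 (same model, different shaders) would need the grouping key to include the shader or material as well.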


There are a /lot/ of design patterns that can be used to organize the complexity of the code, so that the application will scale appropriately with the features you want.


You may think that implementing facade, multiton, or factory design patterns is a /lot/ of overhead for a small program, but in actuality this is not the case, as compilers optimize and inline what needs to be.


Still, the point that I am trying to make is that the use of Design Patterns, and the avoidance of Anti-Patterns, is what will always determine your game engine's performance story.


Then, there are fancy shmancy GPU render operations... See, the funny thing about fancy shmancy GPU operations is that /only if/ your architecture is "sound" in the first place can you implement GPU operations on a priority basis (maybe ensure that that fancy shmancy model never attempts to get rendered more than once a frame, or maybe once every two frames; there is a LOT of overrendering in games).
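The "at most once a frame, or once every two frames" idea can be sketched as a small guard keyed on the frame counter. This assumes the engine keeps a monotonically increasing frame number; `RenderThrottle` and `shouldRender` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Allows an expensive render at most once every `interval` frames.
class RenderThrottle
{
public:
    explicit RenderThrottle(std::uint64_t interval) : interval_(interval)
    {
        assert(interval >= 1);
    }

    // Returns true if the expensive effect should render on this frame.
    // Repeated calls within the same frame return false after the first.
    bool shouldRender(std::uint64_t frame)
    {
        if (hasRendered_ && frame < lastFrame_ + interval_)
            return false;
        hasRendered_ = true;
        lastFrame_ = frame;
        return true;
    }

private:
    std::uint64_t interval_;
    std::uint64_t lastFrame_ = 0;
    bool hasRendered_ = false;
};
```

With an interval of 1 this only guards against rendering the same thing twice in one frame; with an interval of 2 the cached result from the previous frame would be reused every other frame.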


Architectural techniques and optimizations are probably where you want to start off first when facing performance problems, and regardless, you will be able to do more with your engine anyway.

Edited by e.s.
