Migi0027

DX11: Rendering an Object the fastest way


Hi guys,

My engine has come to a stage where instancing is required. What I do right now looks somewhat like this:

void RenderFunc(..)
{
    RenderAllLights();

    RenderDiffuse();

    if (instancing) // this call knows whether an instance is being rendered
        RenderFunc(instance);
}
Now, the thing is that I'm using one render function to render all meshes. Is that a bad idea? My full render function looks like this:
void C3DEngineObject::RenderMesh(UMesh &mesh, bool recursive)
{
	#define Passes      mesh.passes // Passes is just a class containing information for the passes

	ObjectBuffer.AmbientColor = Passes.GlobalMaterial.AmbientColor;
	
	/////////PREPARE ALPHA SETTINGS////////////
	ObjectBuffer.Alpha = 1.0f;
	
	ObjectBuffer.Alpha = Passes.getAlpha();
	if (Passes.Lighting)
	{
		ObjectBuffer.Alpha += Lights.Directional.size();
		BlendStateManager.AlphaPass(devcon);
	}

	float alphaUp = 1.0f/ObjectBuffer.Alpha; // Per-pass alpha step
	ObjectBuffer.Alpha = alphaUp;

	#define AlphaStepUp ObjectBuffer.Alpha += alphaUp

	if (!recursive) // Don't send the buffers if recursive
		mesh.SendBuffers(devcon, dev, Camera, ObjectBuffer, Bufferphase);

	/////////RENDER LIGHTING/////////////////

	if (Passes.Lighting)
	{
		//////////////////////////////////////////////////
		////      For each directional light          ////
		//////////////////////////////////////////////////

		FOREACH (Lights.Directional.safe_size())
		{
			Shaders[7].ApplyShader(dev, devcon);

			// INJECT!
			Lights.Directional[i].Inject(FrameBuffer);

			Shaders[7].UpdateObject(ObjectBuffer, devcon);
			Shaders[7].UpdateWorld(FrameBuffer, devcon);
			//Shaders[7].SendRegBuffer(FrameBuffer, 1, 0, devcon);

			Shaders[7].Render(devcon, Bufferphase);

			// Step up one alpha
			AlphaStepUp;
		}
	}

	/////////RENDER THE MESH/////////////////

	for (int p = 1; p <= Passes.options; p++) // Get all the passes
	{
		if (Passes.getAt(p) == true) // Is this pass enabled?
		{
			int relative = Passes.getRelative(p); // If so, find the mat id.
			Material* active = getMaterial(relative, Passes); // Find the mat of the id

			active->ApplyMaterial(dev, devcon);

			active->ShaderRef->UpdateObject(ObjectBuffer, devcon); // Send the object and world buffers
			active->ShaderRef->UpdateWorld(FrameBuffer, devcon);

			if (relative == BUMPMAP_SHADER) // Is this bump mapping?
			{
				MatrixBuffer.world = ObjectBuffer.Final;
				active->ShaderRef->UpdateMatrix(MatrixBuffer, devcon);
				active->ShaderRef->SendTexture(Passes.base->pTexture, 0, devcon);
				active->ShaderRef->SendTexture(Passes.bumpmap->pTexture, 1, devcon);
			}

			if (relative == TEXTURE_SHADER) // Texturing?
			{
				active->ShaderRef->SendTexture(Passes.base->pTexture, 0, devcon);
				//active->ShaderRef->SendTexture(Passes.opacityMap->pTexture, 1, devcon);
			}


			if (mesh.Instanced && !recursive) // Is this mesh instanced
			{
				FOREACH (mesh.InstancedData.size()) // For all instances
				{
					mesh.SendInstancedData(ObjectBuffer, Camera, i, Bufferphase); // Send the location, rotation and scaling of this instance.

					RenderMesh(mesh, true); // Call this function with recursive = true so the original matrices won't be re-sent (which would overwrite the instance's location, rotation and scaling).
				}
			}

			// Render the mesh in diffuse mode
			active->Render(devcon, Bufferphase);

			// Step up one alpha
			AlphaStepUp;
		}
	}

	if (recursive)
		return;

	BlendStateManager.RestoreBlendState(devcon);
}
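As a readability aside on the code above: the `AlphaStepUp` macro and the two divisions could be folded into a small helper type instead. A minimal sketch (the `AlphaStepper` name and interface are my own, not from the engine):

```cpp
#include <cstddef>

// Hypothetical helper that splits a total weight of 1.0 evenly across
// N additive passes, replacing the AlphaStepUp macro and the manual
// 1.0f / ObjectBuffer.Alpha bookkeeping.
class AlphaStepper {
public:
    explicit AlphaStepper(std::size_t passCount)
        : step_(passCount ? 1.0f / static_cast<float>(passCount) : 1.0f),
          current_(0.0f) {}

    // Returns the cumulative alpha to use for the next pass:
    // 1/N after the first call, 2/N after the second, ... 1.0 after the last.
    float next() { current_ += step_; return current_; }

    float step() const { return step_; }

private:
    float step_;
    float current_;
};
```

Constructed once per mesh with the total pass count, it yields the same 1/N, 2/N, ... sequence the macro accumulates, without a `#define` leaking out of the function.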

 

 

PS. The mesh.passes is just a class which contains the passes (e.g. texture, shadows, lighting).

Now if you look at my code, each pass checks whether it is bump mapping or texturing. Is this a slow method, and how could it be improved? Or even better, how can it be improved with instancing in mind?
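On the instancing part of the question: the usual D3D11 approach is hardware instancing, i.e. putting the per-instance transforms into a second vertex buffer and issuing a single `ID3D11DeviceContext::DrawIndexedInstanced` call, rather than recursively calling the render function once per instance. A minimal CPU-side sketch of packing the per-instance data (the struct layout and function name are hypothetical, not from the engine):

```cpp
#include <array>
#include <vector>

// Per-instance element of a second (per-instance) vertex buffer, as it
// would be consumed by DrawIndexedInstanced. Layout is an assumption.
struct InstanceData {
    float world[16]; // row-major 4x4 world matrix
};

// Builds a contiguous array of per-instance translation matrices.
std::vector<InstanceData> BuildInstanceBuffer(
        const std::vector<std::array<float, 3>>& positions) {
    std::vector<InstanceData> out;
    out.reserve(positions.size());
    for (const auto& p : positions) {
        InstanceData d = {};
        // Identity with the translation in the fourth row (row-vector
        // convention, as with HLSL mul(v, world) and row-major storage).
        d.world[0] = d.world[5] = d.world[10] = d.world[15] = 1.0f;
        d.world[12] = p[0];
        d.world[13] = p[1];
        d.world[14] = p[2];
        out.push_back(d);
    }
    return out;
}
```

The resulting array would be copied into a dynamic vertex buffer with Map/Unmap once per frame, and the input layout would mark those elements as D3D11_INPUT_PER_INSTANCE_DATA so the vertex shader receives one matrix per instance; the whole batch then goes out in one draw call instead of one per instance.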

Also, if you have time, it would be nice if you could post a good (or at least efficient) way of handling mesh rendering, as I'm just following my own mind's advice rather than a professional's.
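One common way to handle this is to stop deciding per draw inside the render function: queue draw requests, sort them by a shader/material key, and bind state only when the key changes. A hedged sketch of the sorting step (all names here are hypothetical, not from the engine):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A queued draw request. The sort key packs the shader id into the high
// bits and the material id into the low bits, so sorting groups together
// all draws that share GPU state.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t materialId;
    int meshId; // hypothetical handle into the engine's mesh list

    std::uint64_t key() const {
        return (static_cast<std::uint64_t>(shaderId) << 32) | materialId;
    }
};

// Sorts the frame's draw list so that state changes are minimized.
void SortDrawList(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return a.key() < b.key();
              });
}
```

When walking the sorted list, the per-draw "is this bump mapping?" branch disappears: all draws sharing a shader are contiguous, so the shader and its textures are bound once per group rather than tested once per draw.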

Also, how much time does it take to check if a bool is true?

Thank You

Edited by Migi0027


It seems to me like you are looking for improvements without a strong need in mind, which will make it very hard to find the solution you are looking for.  Ultimately, you will find that the CPU-side manipulations normally account for a fairly small amount of work in the grand scheme of things, unless you are using lots and lots of small objects.  However, even in that case, the GPU work being done can often take up more time than the CPU side - meaning that any optimization on the CPU side won't make any difference to the final framerate.

 

For your specific question about testing a bool - it is pretty much one of the fastest operations that exist, so you don't need to worry about changing that.  You should invest some time in learning how to use profiling tools (free or otherwise) and also tools like PIX/Graphics Debugger to learn more about the performance of your engine/app.  That will allow you to pinpoint the bottlenecks and make effective use of your optimization time.
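For the CPU side, even a minimal `std::chrono` bracket around each render stage will show where the time goes before reaching for heavier tools. Note that this measures CPU cost only: D3D11 calls are buffered, so GPU cost needs timestamp queries or a tool like PIX. A sketch (the class is my own, not a standard utility):

```cpp
#include <chrono>

// Minimal scoped CPU timer; accumulates elapsed milliseconds into a
// total supplied by the caller. Wrap a render stage in a block scope to
// time it. Measures CPU-side cost only - the GPU runs asynchronously.
class ScopedTimer {
public:
    explicit ScopedTimer(double& totalMs)
        : total_(totalMs), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        total_ +=
            std::chrono::duration<double, std::milli>(end - start_).count();
    }

private:
    double& total_;
    std::chrono::steady_clock::time_point start_;
};
```

Accumulating one total per stage (lights, passes, present) and printing them once a second gives a crude but useful per-stage breakdown.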


Most of the performance loss is /not/ in the draw calls themselves, but in how a game engine makes use of them...

 

From an architectural point of view, the code hints that you may not be applying the standard "Separation of Concerns" principle (ensuring that methods and classes don't do more than one thing, but are instead each responsible for a single, specialized purpose).

 

In other words, at this stage, I believe everyone would agree that getting your engine to do what you want is more important than the performance.

 

For example, you are using "RenderFunc" and "RenderMesh" as function names. Okay, I know this sounds a little out of left field and nitpicky, but bear with me... consider instead RenderScene -> RenderEachLayout -> RenderBufferSet -> RenderBufferPartitions, or some other rendering steps defined in the context of your Game Engine's OWN pipeline.

 

In a game engine, there is /massive/ importance in efficiently managing the associations between modeling entities, specifically:

1. Scenes to Layouts

2. Layouts To Buffers

3. Buffers to Buffer Partitions, Shaders, etc

4. Model to: (Mesh, Behaviors, Textures, Shaders, etc)

5. Many Textures to Many Models and Meshes

6. Many Meshes to Many Models

7. and the list goes on and on and on and on.
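The many-to-many associations above (items 5 and 6) can, for instance, be kept as a plain table of id pairs rather than webs of pointers, so ownership of the actual Model/Mesh objects stays in flat arrays elsewhere. A minimal sketch under that assumption (names hypothetical):

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical many-to-many link table between models and meshes. The
// pairs are kept in a sorted set so duplicate links are ignored and
// per-model lookups stay simple; the Model/Mesh data itself lives in
// flat arrays indexed by these ids.
class ModelMeshLinks {
public:
    void link(std::uint32_t modelId, std::uint32_t meshId) {
        links_.insert({modelId, meshId});
    }

    std::vector<std::uint32_t> meshesOf(std::uint32_t modelId) const {
        std::vector<std::uint32_t> out;
        for (const auto& l : links_)
            if (l.first == modelId) out.push_back(l.second);
        return out;
    }

private:
    std::set<std::pair<std::uint32_t, std::uint32_t>> links_;
};
```

The same table shape works for textures-to-models, buffers-to-partitions, and the other associations in the list.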

 

The GPU operations are /fast/....  Managing memory, garbage collection, and collections in memory that are hundreds of megabytes is one of the biggest game performance hits.  This is where a lot of your optimizations will need to occur.  As horrible as it sounds, use placeholder shaders, textures, etc. at first (greyscale gradients, etc.) to ensure your Game Engine's Framework is working right.

 

In other words, pick a game engine feature or set of features that you want, program towards them, find out whether those features are running slowly, and then performance-tune them.

 

Stage 1 Features:

1. Rendering Multiple Instances of the Same Model definition.  (sidewalk squares)

2. Rendering Multiple Instances of the Same Model, but with different Shaders,  (stars, the same car but different color, etc)

3. Rendering Multiple Types of Models, (Tree, Sidewalk Square, etc)

4. Pre-Defined animations for a subset of models, (rotations, movement/translation in a circle pattern, movement along a Bezier, whatever).

5. User Driven Movement,

6. etc.

 

There are a /lot/ of design patterns that can be used to organize the complexity of the code, so that the application will scale appropriately with the features you want.

 

You may think that implementing the facade, multiton, or factory design patterns is a /lot/ of overhead for a small program, but in actuality this is not the case, as compilers optimize and inline what needs to be.

 

Still, the point I am trying to make is this: the use of design patterns, and the avoidance of anti-patterns, is what will ultimately determine your game engine's performance story.

 

Then, there are fancy shmancy GPU render operations... See, the funny thing about fancy shmancy GPU operations is that /only if/ your architecture is "sound" in the first place can you implement them on a priority basis (for example, ensuring that a fancy shmancy model never attempts to get rendered more than once a frame, or maybe only once every two frames; there is a LOT of overrendering in games).
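The render-at-most-once-per-frame idea in the previous paragraph can be sketched as a simple frame stamp on each model (names hypothetical):

```cpp
#include <cstdint>

// Hypothetical guard against rendering the same model twice in one
// frame: stamp the model with the current frame number on its first
// draw and skip any later submission carrying the same stamp.
struct RenderStamp {
    std::uint64_t lastFrame = ~0ull; // sentinel: "never rendered"
};

// Returns true exactly once per (stamp, frame) pair.
bool ShouldRender(RenderStamp& stamp, std::uint64_t currentFrame) {
    if (stamp.lastFrame == currentFrame)
        return false;
    stamp.lastFrame = currentFrame;
    return true;
}
```

Throttling to every other frame would just compare against `currentFrame - 1` as well; either way the check is a single integer compare per submission.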

 

Architectural techniques and optimizations are probably where you want to start when facing performance problems, and regardless, you will be able to do more with your engine anyway.

Edited by e.s.


