Why does my static mesh drop FPS from 60 to 4?


At first I thought my graphics card was dying. The static mesh I created is a cup with 18,000 polygons in total, and when it's on screen the FPS drops from 60 to 4-8. The cup is being rendered by the deferred renderer.

 

I also wanted to see how Unreal Engine 4 handles it, so I loaded the mesh into Unreal Engine 4 and it ran just fine. I'm not at the expert level of Epic Games, but my game's rendering is suffering from heavy frame drops.

 

I also loaded a game character from another game that is about 12K polygons in total. With three of them in my level editor, the frame rate dropped from 60 to 10 frames per second. In Amnesia, multiple creatures like that chase the player at once, so obviously my rendering system is doing something badly wrong.

 

I disabled deferred shading and lowered the shadow map resolution from 2048 x 2048 to 1024 x 1024. That didn't really help much.

 

I use std::vectors to store the mesh data and such.

 

This is the render function each scene object calls (I will get around to restructuring it):

void SceneObject::RenderSceneMesh(GraphicsDevice *device, MaterialShader &mShader, XMMATRIX &world, XMMATRIX &view, XMMATRIX &proj) {

	if (isCulled) { // note: draws when isCulled is true (presumably "passed the cull test")

		// Bind geometry and input layout
		UINT stride = sizeof(ObjectVertexData);
		UINT offset = 0;

		device->devcon->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
		device->devcon->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);
		device->devcon->IASetInputLayout(mShader.eInputLayout);

		// Update the per-object constants (dynamic buffer, Map/WRITE_DISCARD)
		D3D11_MAPPED_SUBRESOURCE map;
		device->devcon->Map(PrimaryConstantBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &map);
		MESHES_CB_BUFFER *sceneCBuffer = (MESHES_CB_BUFFER*)map.pData;

		world = scaleMatrix * rotationMatrix * translationMatrix;
		XMMATRIX wvp = world * view * proj;

		sceneCBuffer->WVP = XMMatrixTranspose(wvp);
		sceneCBuffer->WorldMatrix = XMMatrixTranspose(world);
		sceneCBuffer->viewMatrix = XMMatrixTranspose(view);
		sceneCBuffer->projectionMatrix = XMMatrixTranspose(proj);
		sceneCBuffer->modelWorld = XMMatrixTranspose(world);

		XMVECTOR det;
		XMMATRIX invWorld = XMMatrixInverse(&det, world);
		sceneCBuffer->invWorldMatrix = invWorld;

		sceneCBuffer->UVTile.x = gMaterial.TextureTile.x;
		sceneCBuffer->UVTile.y = gMaterial.TextureTile.y;

		if (!isPlaced) {
			sceneCBuffer->ghostModeEnabled = XMFLOAT2(1.0f, 0.0f);
		}
		else {
			sceneCBuffer->ghostModeEnabled = XMFLOAT2(0.0f, 0.0f);
			isPlaced = true;
		}
		sceneCBuffer->isSelected = XMFLOAT2(isSelected, 0.0f);
		sceneCBuffer->padding = XMFLOAT2(0, 0);

		device->devcon->Unmap(PrimaryConstantBuffer, 0);

		// Update the camera constants
		D3D11_MAPPED_SUBRESOURCE camMap;
		device->devcon->Map(cameraConstantBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &camMap);
		cameraConstantBuff *cameraCBuffer = (cameraConstantBuff*)camMap.pData;

		cameraCBuffer->cameraPosition = cameraPosition;
		cameraCBuffer->padding = XMFLOAT4(0, 0, 0, 0);

		device->devcon->Unmap(cameraConstantBuffer, 0);

		// Update the light/shadow constants
		D3D11_MAPPED_SUBRESOURCE lightMapped;
		device->getDeviceContext()->Map(lightCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &lightMapped);
		lightConstantBuffer *lightCBuff = (lightConstantBuffer*)lightMapped.pData;

		XMMATRIX lightProjWS = XMLoadFloat4x4(&ShadowProjWS);
		XMMATRIX ShadowWS = world * lightProjWS;
		lightCBuff->lightViewMatrix = XMMatrixTranspose(ShadowWS);
		lightCBuff->lightProjMatrix = XMLoadFloat4x4(&ShadowProjWS);

		device->getDeviceContext()->Unmap(lightCB, 0);

		// Bind shaders, constant buffers, textures, and samplers
		device->devcon->VSSetShader(mShader.eVertexShader, 0, 0);
		device->devcon->PSSetShader(mShader.ePixelShader, 0, 0);

		ID3D11Buffer *constantbuffers[3] = { PrimaryConstantBuffer, cameraConstantBuffer, lightCB };

		device->devcon->VSSetConstantBuffers(0, 3, constantbuffers);
		device->devcon->PSSetConstantBuffers(0, 3, constantbuffers);

		ID3D11ShaderResourceView *srvs[7] = { diffuseSRV, normalSRV, specularSRV, ambientOccSRV, displacementSRV, ShadowMapSRV, SRV3D };
		device->devcon->PSSetShaderResources(0, 7, srvs);

		ID3D11SamplerState *samplers[2] = { device->pointTextureSampleState, device->clampTextureSampleState };
		device->devcon->PSSetSamplers(0, 2, samplers);

		device->enableDepthBuffer();

		device->devcon->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

		device->devcon->DrawIndexed(renderObject.numIndices(), 0, 0);

		aabb.isBuilt = true;

		// Unbind the SRVs so the render targets can be written again next pass
		ID3D11ShaderResourceView *nullSRVS[7] = { 0 };
		device->getDeviceContext()->PSSetShaderResources(0, 7, nullSRVS);
	}
}

Could it be the dynamic constant buffers that are hurting performance? Thanks for chiming in.

 


You set a lot of GPU parameters and resources, apparently for every object. One improvement would be to sort your objects by shader and by texture. If many objects use the same shader, the same view/projection, etc., there's no need to update those constants or rebind the same shader and resources for each draw (see the sketch after the list below).

 

I can't for the life of me find the list of rendering tips here on gamedev, but among the ones that may be applicable:

 

- Don't use 32-bit indices. Use 16-bit. Why send 4 bytes per index to the GPU when 2 will suffice? E.g., 18,000 triangles means 54,000 indices, and at most 54,000 distinct vertices - well within the 65,536 a 16-bit index can address. 54,000 indices are 216K at 32-bit but only 108K at 16-bit - just half the throughput for a single object.

- Sort opaque objects front-to-back to reduce overdraw. You can do a lot of CPU sorting in the time it takes the GPU to overdraw an occluded object.

- If you have several duplicate objects (characters, etc.) that share the same vertex/index buffers, render them consecutively so the shader and buffers only need to be bound once.
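A minimal sketch of the state-sorting idea - the DrawItem struct and its fields (shaderId, textureId, depth) are assumptions for illustration, not actual types from the posted code:

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record; the field names are placeholders.
struct DrawItem {
	uint16_t shaderId;   // which shader pair this draw uses
	uint16_t textureId;  // which texture set this draw uses
	float    depth;      // non-negative view-space distance
	void    *object;     // the scene object to draw
};

// Pack shader, then texture, then coarsely quantized depth into one key.
// Sorting by it groups draws that share state, and orders opaque geometry
// front-to-back within each group.
static uint64_t SortKey(const DrawItem &d) {
	uint64_t key = (uint64_t)d.shaderId << 48;
	key |= (uint64_t)d.textureId << 32;
	key |= (uint32_t)(d.depth * 1000.0f); // coarse depth quantization
	return key;
}

void SortDrawList(std::vector<DrawItem> &items) {
	std::sort(items.begin(), items.end(),
	          [](const DrawItem &a, const DrawItem &b) {
	              return SortKey(a) < SortKey(b);
	          });
}

When walking the sorted list, compare the current shaderId/textureId against the previous draw and skip the shader/resource binding calls when they match.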

Edited by Buckeye


Can you confirm that the presented code is the bottleneck?

 

I can see that you are setting lots of redundant parameters, such as render targets and camera and light constants. Still, I'm not totally convinced that the above code could hurt the FPS as much as you have observed. How many times is the above code called when rendering your big object?

 

Are you creating the device with the debug flag? If not, you should enable it to check for invalid function calls. If you are, be aware that the debug layer itself hurts performance too.
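For reference, a minimal sketch of creating the device with the debug layer on in debug builds only (the wrapper function name is just for illustration):

#include <d3d11.h>

HRESULT CreateDeviceWithDebugLayer(ID3D11Device **device, ID3D11DeviceContext **context) {
	UINT flags = 0;
#if defined(_DEBUG)
	flags |= D3D11_CREATE_DEVICE_DEBUG; // emits the D3D11 WARNING/ERROR messages, at a performance cost
#endif
	D3D_FEATURE_LEVEL featureLevel;
	return D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, flags,
	                         nullptr, 0, D3D11_SDK_VERSION,
	                         device, &featureLevel, context);
}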

 

Cheers!

Edited by kauna


Why are you using:

 

device->devcon

 

In part of the code, but also using:


device->getDeviceContext()

 

In other parts of it? Does your getDeviceContext call do anything special? At the very least this is a code smell; at worst, getDeviceContext may be doing a whole lot of unnecessary extra work.
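For example, if getDeviceContext is supposed to be nothing more than a getter for devcon, it should be a trivial inline accessor like this (a sketch assuming the GraphicsDevice layout implied by the posted code) - anything heavier gets paid on every call, per object, per frame:

#include <d3d11.h>

class GraphicsDevice {
public:
	ID3D11DeviceContext *devcon = nullptr;

	// A trivial accessor like this compiles away entirely. Locking,
	// lookups, or lazy creation here would be repeated costly work.
	ID3D11DeviceContext *getDeviceContext() { return devcon; }
};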


Have you tried the graphics frame analyzer in Visual Studio 2013, or RenderDoc? With these programs you can analyze a frame and see the GPU cost of every draw call. It is very useful for finding out exactly where the bottleneck is.


I use std::vectors to store the mesh data and such.

 

I'm no expert, but maybe it's something simple, like you're accidentally recreating your resources every frame instead of once. I'm most likely wrong but I figured I'd try to help.

 

Edit: I'm wrong - I misread you and didn't realize you said the framerate drops. Sorry.

Edited by Infinisearch


You set a lot of GPU parameters and resources, apparently for every object. One improvement would be to sort your objects by shader and by texture. [...]

I attempted to use 16-bit indices and got back a weird-looking mesh. I also made most of the constantbuffers[3], srvs[7], and samplers[2] arrays global instead of initializing them in the render loop. That didn't help either.


Have you tried the graphics frame analyzer in Visual Studio 2013, or RenderDoc? [...]

I looked at RenderDoc for a few minutes. My notion was that binding the srvs[5], samplers[2], and constants[3] arrays every draw was killing performance, but it wasn't that. However, I do get these D3D11 warning messages from the debug layer:

 

D3D11 WARNING: ID3D11DeviceContext::PSSetShaderResources: Resource being set to PS shader resource slot 5 is still bound on output! Forcing to NULL. [ STATE_SETTING WARNING #7: DEVICE_PSSETSHADERRESOURCES_HAZARD]
D3D11 WARNING: ID3D11DeviceContext::OMSetRenderTargets: Resource being set to OM RenderTarget slot 0 is still bound on input! [ STATE_SETTING WARNING #9: DEVICE_OMSETRENDERTARGETS_HAZARD]
D3D11 WARNING: ID3D11DeviceContext::OMSetRenderTargets[AndUnorderedAccessViews]: Forcing PS shader resource slot 5 to NULL. [ STATE_SETTING WARNING #7: DEVICE_PSSETSHADERRESOURCES_HAZARD]
D3D11 WARNING: ID3D11DeviceContext::PSSetShaderResources: Resource being set to PS shader resource slot 5 is still bound on output! Forcing to NULL. [ STATE_SETTING WARNING #7: DEVICE_PSSETSHADERRESOURCES_HAZARD]
The program '[1096] SICStudio.exe' has exited with code 0 (0x0).
 

 

I'm trying to fix them.


The SRV in slot 5 would be the shadow map SRV. I'll disable rendering the shadow map and see if there's any performance change and fewer D3D11 warnings.


I disabled the shadow map; the FPS went up and there are no more D3D11 warning messages. So the issue also lies in deferredRendering.cpp.

 

So the sequence is:

- set the render targets

- clear the render targets

- render the scene to the render targets

- reset the render target

- reset the viewport

- set back to the backbuffer and depth stencil

- render the scene to the default render target

 

How would I normally reset the render target for the shadow map, so that the shadow map can then be used as a shader resource input?
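A minimal sketch of the usual pattern (ShadowMapDSV, ShadowMapSRV, backbufferRTV, and depthStencilView are assumptions standing in for the actual resources): unbind the shadow map from the pixel shader before binding it as a depth target, and unbind it as a target before reading it as an SRV. That avoids the read/write hazard, which is exactly what triggers the "still bound on output! Forcing to NULL" warnings above.

#include <d3d11.h>

void RenderShadowPass(ID3D11DeviceContext *ctx,
                      ID3D11DepthStencilView *ShadowMapDSV,   // depth view of the shadow map texture
                      ID3D11ShaderResourceView *ShadowMapSRV, // read view of the same texture
                      ID3D11RenderTargetView *backbufferRTV,
                      ID3D11DepthStencilView *depthStencilView) {
	// 1. Unbind the shadow map SRV (slot 5 in the posted code) before
	//    binding the same texture as a depth target.
	ID3D11ShaderResourceView *nullSRV = nullptr;
	ctx->PSSetShaderResources(5, 1, &nullSRV);

	// 2. Bind the shadow map as the only target and clear it.
	ctx->OMSetRenderTargets(0, nullptr, ShadowMapDSV);
	ctx->ClearDepthStencilView(ShadowMapDSV, D3D11_CLEAR_DEPTH, 1.0f, 0);

	// ... render the shadow casters here ...

	// 3. Switch back to the backbuffer; the shadow map is no longer a target.
	ctx->OMSetRenderTargets(1, &backbufferRTV, depthStencilView);

	// 4. Now it is safe to bind the shadow map as an input for the main pass.
	ctx->PSSetShaderResources(5, 1, &ShadowMapSRV);

	// ... render the main scene here ...
}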


I believe I know what the issue surrounding the slow rendering is. I launched the Heaven benchmark on extreme settings and it didn't miss a beat, which tells me the hardware is fine - my rendering is fubared and needs to be corrected before I can continue building the game engine.

 

A couple of things that could be underlying the issue are the inheritance hierarchies that I thought would make things easier: IOBJECT, IMESH, IENTITY, and then Staticmesh deriving from IENTITY. That's a lot of memory to store all those layers of inheritance.

 

Now it's time to go back, rework the entire renderer, and change the classes so they use less memory and processing power. I may also change the constant buffers from dynamic to default usage and use UpdateSubresource inside the render loop - the constant updates are probably part of what's clogging performance.
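For reference, a sketch of the two constant buffer update paths side by side (MESHES_CB_BUFFER is the struct from the posted code; the function names are placeholders). With either path, the real win is skipping updates for data that hasn't changed between draws:

#include <d3d11.h>
#include <cstring>

struct MESHES_CB_BUFFER; // the per-mesh constants struct from the posted code

// Dynamic path: buffer created with D3D11_USAGE_DYNAMIC and
// D3D11_CPU_ACCESS_WRITE, updated via Map/WRITE_DISCARD
// (this is what RenderSceneMesh above already does).
void UpdateDynamicCB(ID3D11DeviceContext *ctx, ID3D11Buffer *cb,
                     const MESHES_CB_BUFFER *data, size_t size) {
	D3D11_MAPPED_SUBRESOURCE mapped;
	if (SUCCEEDED(ctx->Map(cb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped))) {
		memcpy(mapped.pData, data, size);
		ctx->Unmap(cb, 0);
	}
}

// Default path: buffer created with D3D11_USAGE_DEFAULT and no CPU
// access flags, updated via UpdateSubresource.
void UpdateDefaultCB(ID3D11DeviceContext *ctx, ID3D11Buffer *cb,
                     const MESHES_CB_BUFFER *data) {
	ctx->UpdateSubresource(cb, 0, nullptr, data, 0, 0);
}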



I attempted to use 16-bit indices and got back a weird-looking mesh.

 

That has nothing to do with using a 16-bit index versus any other size. It's likely you have code elsewhere that assumes something other than 16-bit. E.g., perhaps you're not setting the index format to DXGI_FORMAT_R16_UINT when you call IASetIndexBuffer.
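A minimal sketch of what has to change together (the function name is a placeholder): the index data itself must be stored as 16-bit values, and IASetIndexBuffer must be told DXGI_FORMAT_R16_UINT. Mixing a 32-bit buffer with a 16-bit format, or vice versa, produces exactly the kind of scrambled mesh described above.

#include <d3d11.h>
#include <cstdint>
#include <vector>

// Build the index buffer from 16-bit data; this only works if every
// index value fits in 16 bits (i.e., fewer than 65,536 vertices).
ID3D11Buffer *CreateIndexBuffer16(ID3D11Device *dev,
                                  const std::vector<uint16_t> &indices16) {
	D3D11_BUFFER_DESC desc = {};
	desc.Usage = D3D11_USAGE_IMMUTABLE;
	desc.ByteWidth = (UINT)(indices16.size() * sizeof(uint16_t)); // 2 bytes per index
	desc.BindFlags = D3D11_BIND_INDEX_BUFFER;

	D3D11_SUBRESOURCE_DATA init = {};
	init.pSysMem = indices16.data();

	ID3D11Buffer *buffer = nullptr;
	dev->CreateBuffer(&desc, &init, &buffer);
	return buffer;
}

// ...and the format passed at bind time must agree with the data:
//   ctx->IASetIndexBuffer(buffer, DXGI_FORMAT_R16_UINT, 0); // not R32_UINT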


I forgot to clean up after myself in the mesh loading function. The vertex and index data were stored in std::vectors, and after the buildBuffers() call I now clear both, since I don't need them any longer - the data lives in the vertex and index buffers. When I import the monster from Amnesia now, the framerate no longer drops from 60 to 30 fps like it did before.
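For anyone who hits the same thing, a sketch of that cleanup (buildBuffers, ObjectVertexData, and GraphicsDevice echo names from earlier in the thread; the wrapper and stub types are illustrative): once the GPU buffers exist, the CPU-side copies can be released.

#include <cstdint>
#include <vector>

class GraphicsDevice; // from the posted code
struct ObjectVertexData { float px, py, pz, nx, ny, nz, u, v; }; // stand-in vertex layout

struct MeshData {
	std::vector<ObjectVertexData> vertices; // CPU-side copies, only needed
	std::vector<uint32_t> indices;          // until the GPU buffers are built

	void buildBuffers(GraphicsDevice *device); // uploads into the D3D11 buffers

	void load(GraphicsDevice *device) {
		// ... fill vertices/indices from the mesh file ...
		buildBuffers(device);

		// clear() alone keeps the heap allocation alive; swapping with an
		// empty temporary actually releases the memory.
		std::vector<ObjectVertexData>().swap(vertices);
		std::vector<uint32_t>().swap(indices);
	}
};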

 

Again, the issue is fixed - it just teaches me to make sure I clean up after myself in code. Poor programming habits like this can make a game stall or crash on a client's computer, so I have to get better at that.

 

Thanks all.
