Updating large buffers

5 comments, last by MJP 8 years, 9 months ago
Hi,

I am working on a voxel based game and found out that updating "large" vertex buffers leads to stuttering. It is not much stuttering, only a few frames are affected, but it is annoying.

I have about 1k to 16k vertices per chunk and I update an existing vertex buffer (or sometimes generate a new one if none with enough space exists) every time I generate a new chunk. Each vertex is 8 bytes. I tried to avoid this stuttering by only updating 2k vertices (16 kB) at once and then sleeping for 100 ms. Generation of chunks and vertex buffer updates are done in a separate thread, so sleeping does not affect the render thread. This reduces stuttering, but it still occurs. I use buffers with D3D11_USAGE_DEFAULT and update them with UpdateSubresource().

When I walk in my game, 8 to 15 chunks are loaded at once, but I only start 5 chunk updates per second. (A chunk update is a call to the method that updates the buffer and then sleeps as mentioned above, so more than 5 chunk vertex buffer updates may be running at once if some of them take longer than 200 ms because of the sleeping.)

I found out that stuttering is more likely to occur when I move towards mountains (there are more vertices there than in a flat landscape).

I started the .exe directly from the folder, not from Visual Studio. (Starting it using the Visual Studio debugger reduces performance a lot, even in release build.)

Is there something I am doing wrong? How can I get rid of this stuttering? My computer should be fast enough to play a game without stuttering (i7-3930K and GTX Titan).

You could try using two buffers: one is used for rendering and the second receives the new data. Once the data has been written, you switch buffers. That approach should avoid the situation where the GPU has to wait for a buffer update to finish. Using D3D11_USAGE_DYNAMIC and Map() can also be faster. Of course, you then do not need to sleep.
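As a rough illustration of the two-buffer idea, here is a minimal C++ sketch of just the bookkeeping. It deliberately leaves Direct3D out: `DoubleBuffered` and its member names are hypothetical, and `Buffer` stands in for whatever wraps an ID3D11Buffer in the real game. It is not thread-safe as written, so the assumption is that `swap()` is called by the render thread between frames, after the worker has signalled that its write is complete.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: holds two buffers and tracks which one is drawn from.
template <typename Buffer>
class DoubleBuffered {
public:
    // Buffer the worker thread fills with new data (never the one being drawn).
    Buffer& writeTarget() { return buffers_[1 - front_]; }

    // Buffer the render thread draws from.
    const Buffer& renderSource() const { return buffers_[front_]; }

    // Flip roles once the write target is completely filled.
    void swap() { front_ = 1 - front_; }

private:
    std::array<Buffer, 2> buffers_{};
    std::size_t front_ = 0;  // index of the buffer currently used for rendering
};
```

The cost of this scheme is twice the memory per chunk buffer, which is worth weighing against the 8 bytes per vertex mentioned in the first post.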

The GPU does not have to wait for buffer updates since no rendering is done before the whole chunk is loaded completely. I think the rendering of the other chunks is affected because the GPU copies the data from RAM to VRAM. But it's just less than 1 MB per second, this should not be too much. I don't know why this does not work. I disabled everything, even plants, there is just the voxel vertex buffer filling left, so this must cause the stuttering.
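To put numbers on the "less than 1 MB per second" estimate, the figures from the first post can be worked through as below. This is only arithmetic, under two assumptions: "2k" and "16k" mean 2000 and 16000 vertices, and the time taken by the copy itself is ignored. All names are made up for the example.

```cpp
#include <cstdint>

// Figures from the first post: 8 bytes per vertex, 2k vertices per slice,
// 100 ms sleep after each slice. "2k" is assumed to mean 2000.
constexpr std::uint32_t kBytesPerVertex   = 8;
constexpr std::uint32_t kVerticesPerSlice = 2000;
constexpr std::uint32_t kSleepMs          = 100;

// Bytes uploaded per slice.
constexpr std::uint32_t sliceBytes() {
    return kBytesPerVertex * kVerticesPerSlice;
}

// Upper bound on bytes per second for one updating chunk (copy time ignored).
constexpr std::uint32_t maxBytesPerSecond() {
    return sliceBytes() * (1000 / kSleepMs);
}

// Number of slices needed for a chunk, rounded up.
constexpr std::uint32_t slicesForChunk(std::uint32_t vertexCount) {
    return (vertexCount + kVerticesPerSlice - 1) / kVerticesPerSlice;
}

// Time spent sleeping while one chunk uploads: one sleep per slice.
constexpr std::uint32_t minUploadMs(std::uint32_t vertexCount) {
    return slicesForChunk(vertexCount) * kSleepMs;
}
```

With these numbers a single updating chunk moves at most 160 kB per second, and a 16k-vertex chunk spends at least 800 ms in its update call, which is how several updates can end up overlapping even at 5 starts per second.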

Should I wait one second after filling the buffers before I start rendering the chunk?

Should I wait one second after filling the buffers before I start rendering the chunk?

Definitely. Though a few frames should be enough. Don't read from the buffer at all after write until several frames have passed and see if it makes a difference.
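One possible way to try the "wait several frames" experiment is to stamp each chunk buffer with the frame number of its last write and skip drawing it until enough frames have gone by. The sketch below assumes the render loop keeps a running frame counter; `ChunkBufferState`, `markWritten`, `readyToDraw` and the delay of 3 frames are all hypothetical choices, not something from the thread.

```cpp
#include <cstdint>

// Assumed delay; "several frames" in the reply above, 3 is an arbitrary pick.
constexpr std::uint64_t kFramesToWait = 3;

// Per-chunk bookkeeping kept next to the vertex buffer.
struct ChunkBufferState {
    std::uint64_t lastWriteFrame = 0;
    bool everWritten = false;
};

// Call when the last slice of the chunk has been written.
inline void markWritten(ChunkBufferState& state, std::uint64_t currentFrame) {
    state.lastWriteFrame = currentFrame;
    state.everWritten = true;
}

// True once the buffer has data and the delay has elapsed.
inline bool readyToDraw(const ChunkBufferState& state, std::uint64_t currentFrame) {
    return state.everWritten &&
           currentFrame >= state.lastWriteFrame + kFramesToWait;
}
```

If the stutter disappears with a delay like this, that would point at the GPU reading the buffer too soon after the write; if not, the cause is more likely elsewhere.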

Also, check feature support on your device and make sure it supports multi-threading properly: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476497%28v=vs.85%29.aspx

I would try Map like the above poster suggests as well, could make a big difference.

I update an existing vertex buffer (or sometimes generate a new one if none with enough space exists)

This is a problem. Resource generation is not fast. It can cause the driver to do all sorts of maintenance tasks.
Make sure you've preallocated big enough vertex buffers so you don't have to create more.
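One way to read the preallocation advice is to create the vertex buffers once at startup and hand out fixed-size slots from them, so that chunk loading never creates a resource. The sketch below shows only the slot bookkeeping, without any Direct3D calls. `ChunkSlotPool` and `kNoSlot` are hypothetical names, and the slot size in the usage note assumes the 16k vertices at 8 bytes each from the first post (with 16k taken as 16000).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returned by acquire() when every slot is in use.
constexpr std::uint32_t kNoSlot = 0xFFFFFFFFu;

// Hands out byte offsets of fixed-size slots inside one preallocated buffer.
class ChunkSlotPool {
public:
    ChunkSlotPool(std::uint32_t slotCount, std::uint32_t bytesPerSlot)
        : bytesPerSlot_(bytesPerSlot) {
        free_.reserve(slotCount);
        // Push in reverse so that slot 0 is handed out first.
        for (std::uint32_t i = slotCount; i > 0; --i) {
            free_.push_back(i - 1);
        }
    }

    // Byte offset of a free slot, or kNoSlot if the pool is exhausted.
    std::uint32_t acquire() {
        if (free_.empty()) return kNoSlot;
        const std::uint32_t slot = free_.back();
        free_.pop_back();
        return slot * bytesPerSlot_;
    }

    // Return a slot when its chunk is unloaded.
    void release(std::uint32_t byteOffset) {
        assert(byteOffset % bytesPerSlot_ == 0);
        free_.push_back(byteOffset / bytesPerSlot_);
    }

private:
    std::uint32_t bytesPerSlot_;
    std::vector<std::uint32_t> free_;  // indices of unused slots
};
```

For example, `ChunkSlotPool pool(512, 128000);` would cover 512 chunks of up to 16000 vertices each. Fixed-size slots waste space for small chunks; that trade-off is an assumption of this sketch, not a recommendation from the thread.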

This reduces stuttering, but it still occurs. I use buffers with D3D11_USAGE_DEFAULT and update them with UpdateSubresource().

That's another problem. The DX runtime will try to copy your data to a temporary location and defer the actual upload to the GPU in order to keep UpdateSubresource from being a blocking call. But if the buffer is too large or the runtime runs out of temporary storage, it will block and upload on the spot.
You basically lose control over when your data is truly uploaded.

Use a dynamic buffer mapped with D3D11_MAP_WRITE_NO_OVERWRITE (and when you've filled the buffer and therefore need to start again from 0, either map it with D3D11_MAP_WRITE_DISCARD or use event queries as a means of synchronizing with the GPU), or use a staging buffer, map it with D3D11_MAP_FLAG_DO_NOT_WAIT, and then issue a CopySubresourceRegion from the staging buffer to the vertex buffer.
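The first option above boils down to a write cursor that decides which map mode to use for each upload. A minimal sketch of that decision follows. It compiles without d3d11.h, so `MapMode` is only a stand-in for D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD, and `DynamicBufferCursor` and `MapRequest` are hypothetical names; the actual Map() call and the event-query alternative are left out.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for D3D11_MAP_WRITE_NO_OVERWRITE / D3D11_MAP_WRITE_DISCARD.
enum class MapMode { NoOverwrite, Discard };

// What the caller should pass to Map() and where to write in the buffer.
struct MapRequest {
    MapMode mode;
    std::uint32_t offsetBytes;
};

// Tracks how much of a dynamic buffer has been written since the last discard.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::uint32_t capacityBytes)
        : capacity_(capacityBytes) {}

    // Reserve sizeBytes; wraps to offset 0 with a discard when it does not fit.
    MapRequest reserve(std::uint32_t sizeBytes) {
        assert(sizeBytes <= capacity_);
        if (capacity_ - cursor_ < sizeBytes) {
            cursor_ = sizeBytes;  // the new data occupies the start of the buffer
            return {MapMode::Discard, 0};
        }
        const std::uint32_t offset = cursor_;
        cursor_ += sizeBytes;
        return {MapMode::NoOverwrite, offset};
    }

private:
    std::uint32_t capacity_;
    std::uint32_t cursor_ = 0;  // first byte not yet written
};
```

Note that after a map with D3D11_MAP_WRITE_DISCARD the previous contents of the buffer are undefined. For chunk data that has to stay resident, the wrap-around would therefore mean re-uploading whatever is still needed, or synchronizing with queries instead of discarding.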

Thank you.

I disabled all rendering except terrain rendering and I stood still in the game without even moving the camera to look around. There are no buffer updates after initial terrain generation since I don't move. The same draw calls with the same data, all the time, nothing changes.

And it still stutters. I got about 200 fps, but every few seconds the framerate drops to 30 fps and it stutters for about 500 ms. I reduced the render distance to 96 meters and it still stutters! WTF?

How could it be? Is there any logical explanation for this?

Is my graphics card broken? Wait, I try another game...

Edit: No, it's not broken, I can play GTA 5 in 4K without any stuttering.

Edit 2: I disabled terrain rendering and I still have problems with stuttering, although NOTHING is rendered except postprocessing and the GUI.

Edit 3: I found the reason for stuttering: screen space post processing. But I don't know why it stutters, it is always the same input, so computation (rendering) time should be identical, but it isn't.

Edit 4: Is it maybe the light pixel shader? This doesn't make sense since no light is rendered, but it is the only post processing shader with loops and I had updated it a few days ago to render spot lights.

Edit 5: I realized that the constant buffer for this pixel shader has a size of 35376 bytes. It is updated every frame using map/unmap. Is it maybe too big for updating it every frame? Should I create multiple constant buffers and use them alternating?
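The 35376 bytes can be reproduced from the LightBuffer declaration in the shader posted with this edit. The sketch below adds it up, assuming standard HLSL constant buffer packing: each matrix takes 64 bytes, and each float3-plus-scalar struct or pair fills exactly one 16-byte register. The constant and function names are invented for the example.

```cpp
#include <cstdint>

// Limits as defined at the top of the shader.
constexpr std::uint32_t kMaxDirLights            = 4;
constexpr std::uint32_t kMaxTexturesPerLight     = 8;
constexpr std::uint32_t kMaxSpotLightsWithShadow = 32;
constexpr std::uint32_t kMaxSpotLights           = 128;
constexpr std::uint32_t kMaxPointLights          = 256;  // literal 256 in the cbuffer

constexpr std::uint32_t kMatrixBytes   = 64;  // 4x4 floats
constexpr std::uint32_t kRegisterBytes = 16;  // one float4 register

// All four matrix arrays: directional, point, point orientation (6), spot.
constexpr std::uint32_t matrixBytesTotal() {
    return (kMaxDirLights * kMaxTexturesPerLight + kMaxPointLights + 6 +
            kMaxSpotLightsWithShadow) * kMatrixBytes;
}

// Everything else; each term counts 16-byte registers.
constexpr std::uint32_t lightDataBytesTotal() {
    return (1                        // ambientColor + NumDirectionalLights
            + 2 * kMaxDirLights      // DiffuseColorsAndBias, LightDirsAndNumTextures
            + 2 * kMaxPointLights    // pointLightPositions, pointLightColors
            + 1                      // pointLightCount + padding0
            + 3 * kMaxSpotLights     // spot positions, directions, colors
            + 1)                     // spotLightCount + padding1
           * kRegisterBytes;
}

constexpr std::uint32_t lightBufferBytes() {
    return matrixBytesTotal() + lightDataBytesTotal();
}

constexpr std::uint32_t lightBufferRegisters() {
    return lightBufferBytes() / kRegisterBytes;
}

static_assert(lightBufferBytes() == 35376, "matches the size quoted in Edit 5");
```

The matrices account for 20864 of those bytes, and PointLightMatrices[256] alone for 16384. The total is 2211 registers, which is below the 4096-register limit for a Direct3D 11 constant buffer, so the size by itself is legal; whether it is too much to rewrite every frame is a separate question that this arithmetic does not answer.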

[spoiler]
#define MAX_NUM_DIR_LIGHTS 4
#define MAX_NUM_TEXTURES_PER_LIGHT 8

#define MAX_NUM_SPOTLIGHTS_WITH_SHADOW 32
#define MAX_NUM_SPOTLIGHTS 128

Texture2D colorTexture : register(t0);
Texture2D normalTexture : register(t1);
Texture2D positionTexture : register(t2);

Texture2DArray<float> pointLightTexture : register(t3);
Texture2DArray<float> spotLightTexture : register(t4);
Texture2D<float> DirectionalLightTextures[MAX_NUM_DIR_LIGHTS*MAX_NUM_TEXTURES_PER_LIGHT];


static const float PI = 3.14159265f;


SamplerState SampleTypePoint : register(s0);
SamplerState SampleTypeClamp : register(s1);

struct LightDirectionAndNumTextures{
	float3 Direction;
	uint NumTextures;
};

struct DiffuseColorAndBias{
	float3 DiffuseColor;
	float Bias;
};

struct LightColorAndDistance_t{
	float3 Color;
	float MaxLightRange;
};

struct LightPositionAndFlags_t{
	float3 Position;
	uint Flags;
};

struct SpotLightDirAndAngle_t{
	float3 DirNormalized;
	float Angle;
};

cbuffer LightBuffer
{
	matrix DirectionalLightMatrices[MAX_NUM_DIR_LIGHTS*MAX_NUM_TEXTURES_PER_LIGHT];
	matrix PointLightMatrices[256];
	matrix PointLightOrientationMatrices[6];
	matrix SpotLightMatrices[MAX_NUM_SPOTLIGHTS_WITH_SHADOW];

	float3 ambientColor;
	uint NumDirectionalLights;

	DiffuseColorAndBias DiffuseColorsAndBias[MAX_NUM_DIR_LIGHTS];

	LightDirectionAndNumTextures LightDirsAndNumTextures[MAX_NUM_DIR_LIGHTS];

	LightPositionAndFlags_t pointLightPositions[256];
	LightColorAndDistance_t pointLightColors[256];
	uint pointLightCount;
	float3 padding0;

	LightPositionAndFlags_t spotLightPositions[MAX_NUM_SPOTLIGHTS];
	SpotLightDirAndAngle_t spotLightDirs[MAX_NUM_SPOTLIGHTS];
	LightColorAndDistance_t spotLightColors[MAX_NUM_SPOTLIGHTS];
	uint spotLightCount;
	float3 padding1;
};


struct PixelInputType
{
    float4 position : SV_POSITION;
    float2 tex : TEXCOORD0;
};







float4 main(PixelInputType input) : SV_TARGET
{
    float4 normals;
	float4 positions;
    float lightIntensity;
    float4 OutputColor;
	float3 distanceVector;
	float4 pointLightShadow;
	uint i, j;
	float2 DirectionalLightTexCoords;
	float2 SpotLightTexCoords;
	
    normals = normalTexture.Sample(SampleTypePoint, input.tex);
	positions = positionTexture.Sample(SampleTypePoint, input.tex);

	OutputColor.rgb = ambientColor;
	OutputColor.a = 1.0f;

	if (positions.w == 1.0f){ //positions.w has been set to 1.0f in other pixel shaders if light calculations should be done for this pixel
		
		//////////////////////////
		/// DIRECTIONAL LIGHTS ///
		//////////////////////////
		
		for (i = 0; i < NumDirectionalLights && i < MAX_NUM_DIR_LIGHTS; i++){

			for (j = 0; j < LightDirsAndNumTextures[i].NumTextures && j < MAX_NUM_TEXTURES_PER_LIGHT; j++){

				float4 LightPosition = mul(float4(positions.xyz, 1.0f), DirectionalLightMatrices[i*MAX_NUM_TEXTURES_PER_LIGHT + j]);
				DirectionalLightTexCoords.x = LightPosition.x / LightPosition.w / 2.0f + 0.5f;
				DirectionalLightTexCoords.y = -LightPosition.y / LightPosition.w / 2.0f + 0.5f;

				if (DirectionalLightTexCoords.x >= 0.0f && DirectionalLightTexCoords.x <= 1.0f && DirectionalLightTexCoords.y >= 0.0f && DirectionalLightTexCoords.y <= 1.0f){

					float Depth = DirectionalLightTextures[i*MAX_NUM_TEXTURES_PER_LIGHT + j].Sample(SampleTypeClamp, DirectionalLightTexCoords);

					float LightDepth = LightPosition.z / LightPosition.w;

					LightDepth -= DiffuseColorsAndBias[i].Bias;

					if (LightDepth < Depth){

						OutputColor.rgb += (DiffuseColorsAndBias[i].DiffuseColor.xyz * saturate(dot(normals.xyz, LightDirsAndNumTextures[i].Direction)));


					}
					if (LightDepth <= 1.0f){
						break; //use only the most detailed depth texture of the light's textures for those the world position is in range of this texture
					}
				}

			}
		}
	


		////////////////////
		/// POINT LIGHTS ///
		////////////////////

		
			for (i = 0; i<pointLightCount && i<256; i++){
				distanceVector = positions.xyz - pointLightPositions[i].Position;
				float distance2 = dot(distanceVector, distanceVector);

				if (distance2 == 0.0f){
					OutputColor.rgb += pointLightColors[i].Color;
				}
				else if (distance2 < pointLightColors[i].MaxLightRange*pointLightColors[i].MaxLightRange){
					OutputColor.rgb += pointLightColors[i].Color * pow(sqrt(distance2) / (pointLightColors[i].MaxLightRange) - 1, 2);
				}

			}

		///////////////////
		/// SPOT LIGHTS ///
		///////////////////

		
			for (i = 0; i<spotLightCount && i<MAX_NUM_SPOTLIGHTS_WITH_SHADOW; i++){

				float SpotLightIntensityShadow = 1.0f;
				if ((spotLightPositions[i].Flags & 1) != 0){ //parentheses make the intended precedence explicit (!= binds tighter than &)

					float4 LightPosition = mul(float4(positions.xyz, 1.0f), SpotLightMatrices[i]);
					SpotLightTexCoords.x = LightPosition.x / LightPosition.w / 2.0f + 0.5f;
					SpotLightTexCoords.y = -LightPosition.y / LightPosition.w / 2.0f + 0.5f;

					if (SpotLightTexCoords.x >= 0.0f && SpotLightTexCoords.x <= 1.0f && SpotLightTexCoords.y >= 0.0f && SpotLightTexCoords.y <= 1.0f){

						float Depth = spotLightTexture.Sample(SampleTypeClamp, float3(SpotLightTexCoords, i));

						float LightDepth = LightPosition.z / LightPosition.w;

						LightDepth *= 0.9999f;

						if (LightDepth >= Depth){
							SpotLightIntensityShadow = 0.5f;
						}
					}
				}
				distanceVector = positions.xyz - spotLightPositions[i].Position;

				float distance2 = dot(distanceVector, distanceVector);

				float SpotLightIntensity = acos(dot(normalize(distanceVector), spotLightDirs[i].DirNormalized));

				if (SpotLightIntensity > spotLightDirs[i].Angle) continue;

				SpotLightIntensity = 1 - pow(SpotLightIntensity / spotLightDirs[i].Angle, 2);

				if (distance2 < spotLightColors[i].MaxLightRange*spotLightColors[i].MaxLightRange){
					OutputColor.rgb += spotLightColors[i].Color * pow(sqrt(distance2) / (spotLightColors[i].MaxLightRange) - 1, 2) * SpotLightIntensity * SpotLightIntensityShadow;
				}

			}
			for (i = MAX_NUM_SPOTLIGHTS_WITH_SHADOW; i<spotLightCount && i<MAX_NUM_SPOTLIGHTS; i++){
				distanceVector = positions.xyz - spotLightPositions[i].Position;

				float distance2 = dot(distanceVector, distanceVector);

				float SpotLightIntensity = acos(dot(normalize(distanceVector), spotLightDirs[i].DirNormalized));

				if (SpotLightIntensity > spotLightDirs[i].Angle) continue;

				SpotLightIntensity = 1 - pow(SpotLightIntensity / spotLightDirs[i].Angle, 2);

				if (distance2 < spotLightColors[i].MaxLightRange*spotLightColors[i].MaxLightRange){
					OutputColor.rgb += spotLightColors[i].Color * pow(sqrt(distance2) / (spotLightColors[i].MaxLightRange) - 1, 2) * SpotLightIntensity;
				}

			}

		/*if(lightColor.r != 0.0f || lightColor.g != 0.0f || lightColor.b != 0.0f){
			lightColor=lightColor/max(lightColor.r,max(lightColor.g,lightColor.b));
		}*/
		


		OutputColor.rgb=saturate(OutputColor.rgb);
		

	}
	else{
		OutputColor.rgb = ambientColor;
	}

	
    return float4(OutputColor.rgb,1.0f);
}

[/spoiler]
The tool that you need to use to dig into problems like this is GPUView. To get the ETW capture, you can try using UIforETW if you don't want to use the command line version. You may also want to read this presentation, and try doing what it suggests.

This topic is closed to new replies.
