Vulkan render-call performance drain

Started by
5 comments, last by duckflock 7 years, 11 months ago

My Vulkan program is running extremely slowly, and I'm trying to figure out why. I've noticed that even a few draw calls drain performance far more than they should.

For instance, here's an extract (pseudocode) for rendering a few meshes:


int32_t numCalls = 0;
int32_t numIndices = 0;
for(auto &mesh : meshes)
{
	auto vertexBuffer = mesh.GetVertexBuffer();
	auto indexBuffer = mesh.GetIndexBuffer();

	vk::DeviceSize offset = 0;
	drawCmd.bindVertexBuffers(0, 1, &vertexBuffer, &offset); // drawCmd = CommandBuffer for all drawing commands (single thread)
	drawCmd.bindIndexBuffer(indexBuffer, offset, vk::IndexType::eUint16);

	drawCmd.drawIndexed(mesh.GetIndexCount(), 1, 0, 0, 0);

	numIndices += mesh.GetIndexCount();
	++numCalls;
}

There are 238 meshes being rendered, with a total index count of 52,050. The GPU is definitely not overburdened (the shaders are extremely cheap).

If I run my program with the code above, the frame is rendered in approximately 46 ms. Without it, it's a mere 9 ms.

I'm using FIFO present mode with 2 swapchain images, and only a single primary command buffer at this time (no secondary/pre-recorded command buffers); the same buffer is reused for all frames.

My problem is, I don't really know what to look for. These few rendering calls should barely make a dent, so the source of the problem must be somewhere else.

Can anyone give me any hints on how I should tackle this? Are there any profilers around for Vulkan already?

I just need a nudge in the right direction.

// EDIT:

So, it looks like vkDeviceWaitIdle takes about 32ms to execute, if all 238 meshes are rendered. (If none are rendered, it's < 1ms).

Most of the stalling stems from there, but I still don't know what to do about it.


... What if you don't wait until the device has executed all the commands? 90% of the point with GPUs these days is that you submit work, and then (hopefully, otherwise you'll require fences) forget about it, so they can work asynchronously while the CPU builds the next set of commands.

If you wait for the device every frame you're basically synchronizing the CPU with the GPU, not allowing the CPU to work ahead on more commands. Like calling glFlush/glFinish every frame.
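The idea above can be sketched with vulkan.hpp roughly as follows. This is only an illustrative sketch, not the OP's code: FrameData, recordDrawCommands, and renderFrame are made-up names, and it assumes the command pool was created with the reset flag. The key point is to wait only on a per-frame fence instead of the whole device:

```
// Per-frame resources; one FrameData per swapchain image (names are illustrative).
struct FrameData {
	vk::CommandBuffer cmd;
	vk::Fence inFlight; // signaled when the GPU finishes this frame's submission
};

void renderFrame(vk::Device device, vk::Queue queue, FrameData &frame)
{
	// Wait only for *this* frame's previous submission, not the whole device.
	device.waitForFences(1, &frame.inFlight, VK_TRUE, UINT64_MAX);
	device.resetFences(1, &frame.inFlight);

	frame.cmd.reset({});           // requires a pool with the reset flag
	recordDrawCommands(frame.cmd); // the mesh loop from the OP goes here

	vk::SubmitInfo submit{};
	submit.commandBufferCount = 1;
	submit.pCommandBuffers = &frame.cmd;
	queue.submit(1, &submit, frame.inFlight); // no vkDeviceWaitIdle afterwards
}
```

With two frames in flight like this, the CPU records frame N+1 while the GPU is still executing frame N, instead of stalling on vkDeviceWaitIdle every frame.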

Then again, I haven't delved into Vulkan, so I dunno if there is any specific use for vkDeviceWaitIdle.

EDIT: Good job on the OP for measuring the actual issue btw! Most often the code provided has nothing to do with the actual issue since no one bothers to profile.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

If you wait for the device every frame you're basically synchronizing the CPU with the GPU, not allowing the CPU to work ahead on more commands. Like calling glFlush/glFinish every frame.

That's fine, except for the fact that the GPU isn't done with its work for 32 ms, so either way the GPU work is taking forever.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

What GPU do you have? Also since drivers are immature have you updated to the latest version?

I don't know Vulkan, but I noticed you're using a separate vertex and index buffer per mesh and rebinding for each mesh in your mesh list. This might cause a performance problem... try putting your meshes into the minimum number of vertex and index buffers and see what happens.
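One way to do that, sketched below with made-up names (Mesh, MeshDraw, packMeshes are not from the OP's code), is to append every mesh into one shared vertex array and one shared index array, remembering each mesh's firstIndex and vertexOffset so a single pair of bind calls can serve all the draw calls:

```cpp
#include <cstdint>
#include <vector>

struct Mesh {                       // hypothetical CPU-side mesh data
	std::vector<float>    vertices; // 3 floats per vertex
	std::vector<uint16_t> indices;
};

struct MeshDraw {           // parameters for one drawIndexed call
	uint32_t indexCount;
	uint32_t firstIndex;    // offset into the shared index buffer
	int32_t  vertexOffset;  // added to each index before vertex fetch
};

// Pack all meshes into one vertex/index array: bind once, draw many times.
std::vector<MeshDraw> packMeshes(const std::vector<Mesh> &meshes,
                                 std::vector<float> &sharedVerts,
                                 std::vector<uint16_t> &sharedIndices)
{
	std::vector<MeshDraw> draws;
	for (const Mesh &m : meshes) {
		MeshDraw d;
		d.indexCount   = static_cast<uint32_t>(m.indices.size());
		d.firstIndex   = static_cast<uint32_t>(sharedIndices.size());
		d.vertexOffset = static_cast<int32_t>(sharedVerts.size() / 3);
		sharedVerts.insert(sharedVerts.end(), m.vertices.begin(), m.vertices.end());
		sharedIndices.insert(sharedIndices.end(), m.indices.begin(), m.indices.end());
		draws.push_back(d);
	}
	return draws;
}
```

The render loop would then bind the shared buffers once before the loop and issue drawCmd.drawIndexed(d.indexCount, 1, d.firstIndex, d.vertexOffset, 0) per mesh, so the 238 bindVertexBuffers/bindIndexBuffer pairs collapse into one.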

edit - also, how many command buffers are your 238 meshes divided among? (one, if I read correctly)

also are you sorting your meshes front to back? or by texture?

edit - how many triangles per mesh? Also, is the code you provided an example, or your actual code?

-potential energy is easily made kinetic-

..as above. Vulkan still is not going to save you from unnecessary/redundant state changes.

I'm not sure the same state changes from previous APIs are as expensive in D3D12/Vulkan.

-potential energy is easily made kinetic-

Are you using different Queues for present and render or the same Queue?

Have you tried using fences on the present operation instead of vkDeviceWaitIdle? The documentation suggests that vkDeviceWaitIdle should be used to wait in a shutdown situation, so it might be rather pessimistic in some implementations.

Vulkan's timeout functions are typically bound to the OS's timer granularity, which is usually 16 ms on Windows (IIRC), so a 32 ms wait could simply be two timer ticks. One way to check whether the driver really needs the full 32 ms is to set the timer granularity to 1 ms; Chrome does this by default, for example.
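On Windows, the granularity can be raised with timeBeginPeriod from winmm; the fragment below is Windows-only and purely illustrative of that suggestion:

```
#include <windows.h>
#include <timeapi.h>   // link against winmm.lib

// Raise the scheduler's timer resolution to 1 ms at startup...
timeBeginPeriod(1);

// ... run the frame loop; timed waits now round to ~1 ms instead of ~16 ms ...

// ...and restore the default resolution on shutdown.
timeEndPeriod(1);
```

Note that this changes a system-wide setting and has a power cost, which is why apps are expected to pair every timeBeginPeriod with a matching timeEndPeriod.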

This topic is closed to new replies.
