... What if you don't wait until the device has executed all the commands? 90% of the point of GPUs these days is that you submit work and then forget about it (and when you do need to know it has finished, you use fences), so the GPU works asynchronously while the CPU builds the next set of commands.
If you wait for the device every frame, you're basically synchronizing the CPU with the GPU and not letting the CPU work ahead on more commands. It's like calling glFinish every frame (glFlush on its own only pushes the commands to the GPU; it doesn't wait for them to complete).
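To make the difference concrete, here's a toy Python simulation. It is not Vulkan code, just a sketch under some assumptions: a hypothetical `FakeGpu` thread stands in for the GPU queue, `threading.Event` objects play the role of fences, and `wait_idle()` plays the role of vkDeviceWaitIdle/glFinish. `MAX_FRAMES_IN_FLIGHT = 2` is an assumed value. The fence path loosely mirrors the usual vkWaitForFences, vkResetFences, vkQueueSubmit loop, and the single `wait_idle()` at the end mirrors waiting once before tearing resources down.

```python
import queue
import threading
import time

# Assumed value: how many frames the CPU may record ahead of the (fake) GPU.
MAX_FRAMES_IN_FLIGHT = 2


class FakeGpu:
    """Hypothetical stand-in for a GPU queue: runs submitted work on its own thread."""

    def __init__(self):
        self._work = queue.Queue()
        self._lock = threading.Lock()
        self._completed = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._work.get()
            try:
                if item is None:  # shutdown sentinel
                    return
                duration, fence = item
                time.sleep(duration)  # pretend to execute the command buffer
                with self._lock:
                    self._completed += 1
                fence.set()  # signal the fence, like the fence passed to a submit
            finally:
                self._work.task_done()

    def submit(self, duration, fence):
        self._work.put((duration, fence))  # returns immediately; work runs asynchronously

    def wait_idle(self):
        self._work.join()  # block until everything submitted so far has executed

    def completed_count(self):
        with self._lock:
            return self._completed

    def shutdown(self):
        self._work.put(None)
        self._thread.join()


def render(num_frames, wait_idle_every_frame, gpu_time=0.01):
    """Returns (frames completed, max frames in flight observed right after a submit)."""
    gpu = FakeGpu()
    # One fence per frame slot, created signaled so the first waits return at once.
    fences = [threading.Event() for _ in range(MAX_FRAMES_IN_FLIGHT)]
    for fence in fences:
        fence.set()
    max_in_flight = 0
    for frame in range(num_frames):
        fence = fences[frame % MAX_FRAMES_IN_FLIGHT]
        fence.wait()   # wait only for the frame that last used this slot
        fence.clear()  # reset the fence before reusing it
        # ... the CPU would record this frame's commands here ...
        gpu.submit(gpu_time, fence)
        max_in_flight = max(max_in_flight, frame + 1 - gpu.completed_count())
        if wait_idle_every_frame:
            gpu.wait_idle()  # glFinish-style: the CPU stalls until the GPU is idle
    gpu.wait_idle()  # one full wait before teardown
    done = gpu.completed_count()
    gpu.shutdown()
    return done, max_in_flight
```

With `wait_idle_every_frame=True` at most one frame is ever in flight, so the CPU never records frame N+1 while the GPU is still busy with frame N. With fences it can get up to `MAX_FRAMES_IN_FLIGHT` frames ahead.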
Then again, I haven't delved into Vulkan, so I don't know if there's any specific use for vkDeviceWaitIdle.
EDIT: Good job to the OP for measuring the actual issue, btw! Most of the time the code people post has nothing to do with the real problem, because nobody bothers to profile.