Buffer size increase reduces frame rate
Members - Reputation: 149
Posted 13 August 2012 - 08:52 PM
Members - Reputation: 149
Posted 14 August 2012 - 12:23 AM
Finally, I wondered whether the GPU didn't load the buffers until they were actually used inside a shader, so I wrote a simple shader that writes values of 0 into the buffer. After about 10 frames, I switched to my actual shader, and voila! The frame rate was much, much higher (600fps compared to the 15fps I was measuring for the first few frames). Changing the buffer size also did not significantly change the frame rate. From what I see, it seems the GPU does not load the buffers into memory until a shader accesses them, so large buffers incur a large overhead for the first 10 frames or so. In my opinion, this is quite strange behavior, since I'd like to think that the GPU would be able to load all the resources when given enough time (i.e. stalling the CPU to allow the GPU to catch up). Maybe something else is at work here, so please chime in if you have any ideas.
If you're wondering why it was necessary for me to measure the first frame (and not just the average fps), it was because my shader builds a sparse octree. The shader is much more demanding when the octree has to be subdivided many times, so the first frame (or frames with dynamic objects) requires much more time, and I needed a way of checking if the changes in my shaders were improving how quickly the tree could be built. So if you're looking for a way to profile the first few frames of a shader, make sure you have a "warmup" shader that writes values of 0 (or something that would have no effect) into the buffers you're using. After a few frames, switch to your actual shader and the frame rate should be more indicative of what you would get in a true continuous run.
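For anyone who wants to try the warm-up approach, here is a minimal sketch of the timing side in C++. It is not the code I actually used: `profile_after_warmup`, `warmup_frame` and `real_frame` are hypothetical names, and the two callbacks stand in for "dispatch the zero-writing shader" and "dispatch the actual shader". It also assumes each callback blocks until the GPU has finished the frame; otherwise the CPU timer only measures submission time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical harness: run some untimed warm-up frames, then time each
// measured frame individually and return the per-frame times in milliseconds.
// Assumption: both callbacks block until the GPU work for that frame is done.
std::vector<double> profile_after_warmup(const std::function<void()>& warmup_frame,
                                         const std::function<void()>& real_frame,
                                         int warmup_frames,
                                         int measured_frames) {
    assert(warmup_frames >= 0 && measured_frames > 0);

    // Warm-up phase: touch the buffers so any first-use cost is paid here
    // and kept out of the measurements.
    for (int i = 0; i < warmup_frames; ++i) {
        warmup_frame();
    }

    // Measurement phase: keep every frame time, since the first real frame
    // is the one of interest, not just the average.
    std::vector<double> frame_ms;
    frame_ms.reserve(static_cast<std::size_t>(measured_frames));
    for (int i = 0; i < measured_frames; ++i) {
        const auto start = std::chrono::steady_clock::now();
        real_frame();
        const auto end = std::chrono::steady_clock::now();
        frame_ms.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return frame_ms;
}
```

Calling it with 10 warm-up frames and a handful of measured frames matches what I described above; `frame_ms[0]` is then the first frame of the real shader with the buffers already touched.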
Members - Reputation: 207
Posted 14 August 2012 - 06:47 AM
The best way to determine a bottleneck is to use PIX and profile your engine/code. This will give you a better idea of what DirectX is doing with the CPU/GPU memory.
Moderators - Reputation: 18019
Posted 14 August 2012 - 01:04 PM
Also, with regard to profiling with queries: you have to be careful with the results from that. It really only gives you the latency from when the GPU starts the query to when it reaches the end query in the command buffer, which doesn't necessarily give you the total time that the GPU spends on all of the commands within that query, since it might be executing unrelated tasks concurrently. Things can also get really tricky when an expensive decompression or synchronization step is involved, since that won't happen until you do something later that causes it to happen. For instance, I've seen this when trying to profile how long it takes my AMD GPU to fill an MSAA G-Buffer. I thought it was only taking a short amount of time, but when I forced a sync/decompression by running a short compute shader that samples from the G-Buffer textures with one thread, it showed up as taking much longer. I don't think you'll run into anything like that with buffers, since as far as I'm aware neither AMD nor Nvidia does anything fancy for buffers in terms of memory layout or compression. But there may be some synchronization costs that you could miss with a query.
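To make the "latency between begin and end" point concrete, here is a small sketch of how a pair of timestamps is usually turned into milliseconds, assuming D3D11-style timestamp queries. The `DisjointData` struct is a hypothetical stand-in that mirrors the `Frequency` and `Disjoint` fields of `D3D11_QUERY_DATA_TIMESTAMP_DISJOINT`, and `gpu_interval_ms` is an illustrative name, not a DirectX function.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT,
// so the arithmetic can be shown without the Windows headers.
struct DisjointData {
    std::uint64_t frequency;  // GPU timestamp ticks per second
    bool disjoint;            // true if the timestamps in this span are unreliable
};

// Converts two timestamp readings into elapsed milliseconds.
// Returns false (and leaves out_ms untouched) when the interval is unusable.
// Note: the result is the span between the two timestamps in the command
// stream, not the cost of the enclosed commands alone, and it cannot include
// work that is deferred until after the end timestamp.
bool gpu_interval_ms(std::uint64_t begin_ticks,
                     std::uint64_t end_ticks,
                     const DisjointData& data,
                     double& out_ms) {
    if (data.disjoint || data.frequency == 0 || end_ticks < begin_ticks) {
        return false;
    }
    out_ms = static_cast<double>(end_ticks - begin_ticks) * 1000.0 /
             static_cast<double>(data.frequency);
    return true;
}
```

If a later pass forces a sync or decompression, that cost lands in whichever interval contains the later pass, which is why forcing it with a tiny compute shader inside the measured span changes the number.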