Can anyone with Windows 10 PC help test my App on a performance issue

The interesting thing is that even the simplest app using the D3D12 API without VSync runs at only 120 FPS on my desktop PC (AMD 3.2 GHz CPU and Radeon R7 260X GPU), while it reaches 2000 FPS on a laptop (Intel Core i5 2.7 GHz and GeForce GTX 950M). That's unreasonably slow on the desktop, so I monitored CPU and GPU load to see where the bottleneck is.

The result: the CPU sits below 10% load on both machines, which looks right. But the Radeon R7's GPU stays under 17% load and often drops to zero, while the GTX 950M runs consistently at 86% load.

It looks like a deliberate "VSync" on the GPU side for power-saving purposes?

I've linked my app here.
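
For context, the FPS figures above come from a per-frame counter. Here's a minimal sketch of how such a counter is typically implemented with QueryPerformanceCounter - hypothetical code, not the app's own:

```cpp
#include <windows.h>
#include <cstdio>

// Hypothetical frame counter (not the app's actual code): call once per
// frame after Present(); prints the average FPS once per second.
void CountFrame()
{
    static LARGE_INTEGER freq = [] { LARGE_INTEGER f; QueryPerformanceFrequency(&f); return f; }();
    static LARGE_INTEGER last = [] { LARGE_INTEGER t; QueryPerformanceCounter(&t); return t; }();
    static int frames = 0;

    ++frames;
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    double elapsed = double(now.QuadPart - last.QuadPart) / double(freq.QuadPart);
    if (elapsed >= 1.0)
    {
        std::printf("FPS: %.1f\n", frames / elapsed);
        frames = 0;
        last   = now;
    }
}
```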

You mentioned this is the "simplest app" - profiling a program that does virtually nothing isn't too useful. Try profiling your application as it's really being put through its paces.

Also, have you updated your driver for your desktop?


It's simple, but it already does things like world/projection matrix transforms on a quad, with two textures mapped, along with per-pixel Phong lighting. I don't think it's reasonable for such a simple task to run at just 120 FPS while both CPU and GPU are almost idle.

BTW, I'll try to update the Radeon driver tonight; in the meantime I'd like to see if any of you observe the same result with my app.

Oops, I forgot to mention I tried to run your app, but since I didn't have VS2015's runtime installed, it failed to start due to missing DLLs that I didn't want to install.

I don't think it's reasonable for such a simple task to run at just 120 FPS while both CPU and GPU are almost idle.

Sounds reasonable to me. If your program isn't really doing anything, why should the videocard hand your program extra resources?

I don't think it's reasonable to run at 2000 frames per second, if 1940 of those frames are pretty much being wasted.

I'm ignorant in this area, so hopefully someone more educated will come along and confirm, but I would -guess- that unless your application is handing the videocard a few tens of thousands of triangles a frame, it's not going to bother spinning up to full power.

Unless your program drops below your desired target framerate (e.g. 60 FPS, or whatever your goal is), or until it's actually processing a workload similar to what your game will really be running, I wouldn't bother profiling at the 'whole application' level.

Oops, I forgot to mention I tried to run your app, but since I didn't have VS2015's runtime installed, it failed to start due to missing DLLs that I didn't want to install.

Such a pity you can't run it!

It's quite possible (and reasonable, in your words) that such FPS control is done in AMD's video driver. If so, I agree it's OK, for the same reason we favor VSync. The only problem is that developers still need to evaluate rendering performance, which tends to be the main reason non-VSync mode is still useful.

Isn't 120 FPS considered the ideal frame rate for VR? That makes it sound even more like something intentional.

The only problem is that developers still need to evaluate rendering performance, which tends to be the main reason non-VSync mode is still useful.

Yes, but it's (usually) only important to evaluate the performance of code that actually does something significant. If you are sending very little work to the videocard, even if you profile it accurately, it'd be of almost no value. It'd tell us things like 40% of your time is being spent clearing the screen and 10% drawing textured triangles... because you're only drawing a single triangle. The proportions of work, not being a realistic workload, would give misleading information - or rather, would give accurate information (spending 4x more time clearing the screen than rendering textures), but most likely lead to misleading conclusions being pulled from it (the wrong conclusion that we need to optimize screen clearing).

Another example is economies of scale - a lot of graphics work has setup costs before a draw operation, and then is designed to churn through a bucket of work. Drawing three times more triangles doesn't cost three times as much time, so it's not easy to extrapolate framerates ("I'm drawing 10,000 triangles at 10 ms a frame, so I should be able to draw 30,000 triangles at 30 ms a frame" would be an incorrect extrapolation). It's just better to profile realistic workloads.
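
To make that concrete, here's a sketch of measuring what the GPU itself spends on the draw calls using D3D12 timestamp queries, independent of Present pacing. This is hypothetical helper code; device/queue/command-list creation, fence synchronization, and error handling are all omitted:

```cpp
#include <windows.h>
#include <d3d12.h>
#include <cstdint>
#include <cstdio>

// Bracket a chunk of GPU work with two timestamps and read them back.
ID3D12QueryHeap* queryHeap = nullptr;
ID3D12Resource*  readback  = nullptr;

void CreateTimestampObjects(ID3D12Device* device)
{
    D3D12_QUERY_HEAP_DESC qhd = {};
    qhd.Type  = D3D12_QUERY_HEAP_TYPE_TIMESTAMP;
    qhd.Count = 2;                                  // begin + end
    device->CreateQueryHeap(&qhd, IID_PPV_ARGS(&queryHeap));

    D3D12_HEAP_PROPERTIES hp = {};
    hp.Type = D3D12_HEAP_TYPE_READBACK;
    D3D12_RESOURCE_DESC rd = {};
    rd.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    rd.Width            = 2 * sizeof(UINT64);
    rd.Height           = 1;
    rd.DepthOrArraySize = 1;
    rd.MipLevels        = 1;
    rd.SampleDesc.Count = 1;
    rd.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
    device->CreateCommittedResource(&hp, D3D12_HEAP_FLAG_NONE, &rd,
        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&readback));
}

void RecordTimestamps(ID3D12GraphicsCommandList* cmdList)
{
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);
    // ... record the draw calls you want to measure here ...
    cmdList->EndQuery(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);
    cmdList->ResolveQueryData(queryHeap, D3D12_QUERY_TYPE_TIMESTAMP,
                              0, 2, readback, 0);
}

// Call after the command list has finished executing (fence signalled).
void PrintGpuTime(ID3D12CommandQueue* queue)
{
    UINT64 freq = 0;
    queue->GetTimestampFrequency(&freq);            // ticks per second

    UINT64* ts = nullptr;
    D3D12_RANGE range = { 0, 2 * sizeof(UINT64) };
    readback->Map(0, &range, reinterpret_cast<void**>(&ts));
    std::printf("GPU time: %.3f ms\n",
                1000.0 * double(ts[1] - ts[0]) / double(freq));
    D3D12_RANGE empty = { 0, 0 };
    readback->Unmap(0, &empty);
}
```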

Plus, consumers greatly outnumber developers - AMD should have its default settings configured for consumers. :)

Anyway, about vsyncing, maybe your AMD control panel ("Catalyst") has Vsync turned on, even if the application turns it off. Those control panels can override applications' explicit requests, making applications run with different settings than they asked for. Probably worth a look.

DX12 doesn't run at a bazillion fps by calling Present(0, 0) like DX11 does. Your Present call is going to block once you've called it as many times as you have swap chain buffers before the current frame finishes presenting.

You can make your app do a bunch of throwaway busy work by running command lists over and over again if you want, and copying data to the swap chain texture when there's one available. You can also use a waitable swap chain to make sure present never blocks.
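
For the record, here's a sketch of the waitable-swap-chain approach mentioned above. It assumes an existing DXGI factory, D3D12 command queue, and window; error handling is omitted:

```cpp
#include <windows.h>
#include <d3d12.h>
#include <dxgi1_3.h>

// Create the swap chain with the waitable-object flag, then block on the
// waitable handle at the top of the frame instead of inside Present().
HANDLE g_frameWaitable = nullptr;

IDXGISwapChain2* CreateWaitableSwapChain(IDXGIFactory2* factory,
                                         ID3D12CommandQueue* queue,
                                         HWND hwnd)
{
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount      = 3;                      // triple buffer
    desc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_DISCARD;
    desc.Flags            = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;

    IDXGISwapChain1* sc1 = nullptr;
    factory->CreateSwapChainForHwnd(queue, hwnd, &desc, nullptr, nullptr, &sc1);

    IDXGISwapChain2* sc2 = nullptr;
    sc1->QueryInterface(IID_PPV_ARGS(&sc2));
    sc1->Release();

    sc2->SetMaximumFrameLatency(2);                 // at most 2 queued frames
    g_frameWaitable = sc2->GetFrameLatencyWaitableObject();
    return sc2;
}

// Per frame: block here (not inside Present) until a buffer is free.
//     WaitForSingleObjectEx(g_frameWaitable, INFINITE, TRUE);
//     ... record/execute command lists ...
//     swapChain->Present(0, 0);
```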

Anyway, about vsyncing, maybe your AMD control panel ("Catalyst") has Vsync turned on, even if the application turns it off.

I haven't found a way in Catalyst to enable/disable this feature, though I did find a "frame rate control" option; after adjusting the FPS value there, I didn't see any change in my app's FPS.

All in all, I'm fairly confident it's intentional behavior in the video driver rather than a coincidence (otherwise, why would it be exactly 120 FPS rather than some other number?).

Your Present call is going to block once you've called it as many times as you have swap chain buffers before the current frame finishes presenting.

If that's the case, why isn't it blocked when running on the GeForce 950M? So I don't think it's a mechanism at the D3D level.

It used to have that behaviour but it doesn't any more, at least on newer builds of Windows 10. I have GPUs from the Red, Green and Blue teams and they all run at "a bazillion fps" with Present(0,0).

Do you, by any chance, have an older build of Windows 10 on the machine with the AMD card in?

If you're building the application separately on each machine, are they both targeting the 10586 Windows SDK?

I'm afraid I can't remember whether it was a build of Windows or the SDK itself that allows the uncapped FPS, but I'm pretty sure it'll be one of those two things.

Run "winver" from a command prompt and report back the full version string from each machine (eg "Version 1511 (OS Build 14342.1000)") from each machine.

And also from the Project Properties in Visual Studio, "General -> Target Platform Version" (eg 10.0.10586.0).
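
For what it's worth, later Windows 10 SDKs (newer than the 10586 one mentioned above) expose uncapped windowed presentation explicitly through a tearing flag. A sketch of the support check, assuming a DXGI 1.5 capable SDK:

```cpp
#include <windows.h>
#include <dxgi1_5.h>

// Query DXGI for tearing (uncapped windowed present) support.
// Requires a DXGI 1.5 capable SDK, i.e. newer than the 10586 SDK above.
bool SupportsTearing(IDXGIFactory5* factory)
{
    BOOL allowTearing = FALSE;
    HRESULT hr = factory->CheckFeatureSupport(DXGI_FEATURE_PRESENT_ALLOW_TEARING,
                                              &allowTearing, sizeof(allowTearing));
    return SUCCEEDED(hr) && allowTearing;
}

// If supported, create the swap chain with DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING
// in Flags, then present uncapped with:
//     swapChain->Present(0, DXGI_PRESENT_ALLOW_TEARING);
```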

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group
