From a developer tool perspective, GPUs have followed a similar path to that of general-purpose CPUs. First we had assemblers, then command-line compilers, IDE tool support, and finally advanced specialty tools. Over the years NVIDIA has made a wide range of tools following this arc. Their latest offering is NVPerfHUD 3.
As you might infer from the name, NVPerfHUD is a performance analysis tool for NVIDIA graphics cards that takes the form of a HUD (heads-up display). It is built around DirectX and takes advantage of instrumentation in NVIDIA's graphics drivers and hardware to get the low-level support it needs to do its work. Version 3 is a complete rewrite of NVPerfHUD.
NVPerfHUD requires a modern development machine with a reasonably recent NVIDIA (GeForce 3+) graphics card, a current graphics driver and the latest version of DirectX 9. NVPerfHUD supports all modern NVIDIA GPUs but has reduced functionality on anything older than the GeForce FX (NV3x) series.
The download is a reasonable 11 MB file and installs to about 25 MB. Installation is clean, except that there is no option to specify where in the Start menu the program group should go. After installation the program group contains shortcuts to NVPerfHUD, the user manual, quick reference card, release notes, license and an uninstall option. There is also a shortcut to launch NVPerfHUD with an included sample application so you can immediately explore its features. All of the documentation is in Adobe's PDF format, which now has good load times thanks to the recently released Adobe Reader 7.0.
The user manual is light yet complete at about 45 pages. It's clearly written and is a good mixture of "control X does Y" descriptions and explanations of how and why you might use the features of this tool. A quarter of the manual is dedicated to identifying performance bottlenecks, with hints on how to avoid and/or fix them. The two-page quick reference card includes a convenient listing of all the hot-keys, a bottleneck identification chart, and suggested optimizations.
Using NVPerfHUD with your own application is straightforward. Because NVPerfHUD could be used as a powerful reverse engineering tool, your application has to "opt in" before it can be analyzed. This is done by simply adding a few lines to your DirectX initialization code. The opt-in has not changed since NVPerfHUD 2.x, so existing code should be able to get version 3 support without even a recompile.
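The manual has the exact opt-in lines. As a rough sketch of the general idea only, assume the opt-in amounts to looking through the adapters Direct3D reports for a specially named one and creating the device on it when it is present. The `AdapterInfo` struct, the `chooseAdapter` function, and the marker string passed to it are all hypothetical names used for illustration. In a real Direct3D 9 application the descriptions would come from `IDirect3D9::GetAdapterIdentifier`, and the actual string to match should be taken from the NVPerfHUD user manual.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-adapter data a Direct3D 9 application
// would read with IDirect3D9::GetAdapterIdentifier.
struct AdapterInfo {
    unsigned index;           // adapter ordinal
    std::string description;  // human-readable adapter name
};

// Returns the ordinal of the first adapter whose description contains
// `marker`, or `fallback` (normally the default adapter) if none does.
// The marker value is an assumption; check the manual for the real one.
unsigned chooseAdapter(const std::vector<AdapterInfo>& adapters,
                       const std::string& marker,
                       unsigned fallback) {
    assert(!marker.empty());  // an empty marker would match every adapter
    for (const AdapterInfo& adapter : adapters) {
        if (adapter.description.find(marker) != std::string::npos) {
            return adapter.index;
        }
    }
    return fallback;
}
```

When the application is started normally, no adapter matches and the function falls back to the default adapter, so the same build works with or without the tool attached.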
Using NVPerfHUD is simple: either drag your application onto the NVPerfHUD icon or run NVPerfHUD with your application as a command-line argument.
NVPerfHUD has two user interfaces. Running NVPerfHUD with no arguments brings up a configuration dialog which allows you to set such things as an activation hot-key for NVPerfHUD and the methods your application uses for mouse and keyboard input.
The second interface appears when an application is run using NVPerfHUD. It's drawn on top of the existing DirectX canvas (figure 1 above) and contains a mixture of text, graphs, and GUI controls. Input can be toggled between the application and NVPerfHUD by using the activation hot-key. Most of the interface is driven by keyboard hot-keys, which allows for quick navigation once you learn the keys. F1 provides a summary of the keys for quick reference, and there's always the quick reference card and user guide for more detailed help. There are three main modes in this interface: Performance Analysis, Debug Console, and Frame Analysis.
For previous users of NVPerfHUD, Performance Analysis mode will be the most familiar. The main elements of the interface are an information strip across the top, a resource creation monitor in the middle and several graphs towards the bottom. The information strip shows the frames per second, triangles per frame, and elapsed time into the run. The resource creation monitor is a list of various resources, such as textures and vertex buffers, which flashes an indicator next to a resource type whenever one is created. The four main graphs show usage of the various forms of DrawPrimitive, performance usage, card memory usage, and AGP/PCI/PCI-Express memory usage as a function of time.

The memory graphs are the simplest: they show the current amount of allocated memory on the card and across the AGP/PCI/PCI-Express bus. For example, zooming in World Wind causes the card memory usage to jump from about 10 MB to about 32 MB as it loads more detailed textures (Figure 3).
The DrawPrimitive graph is also straightforward. It shows the number of calls per frame of the various forms of DrawPrimitive, such as DrawPrimitive(), DrawPrimitiveUP() and DrawIndexedPrimitive(). It can be toggled to a batch histogram using the "B" key, in which case it shows the distribution of vertices per DrawPrimitive call.
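To make the batch histogram concrete, here is a minimal sketch of how such a distribution could be computed from a list of per-call vertex counts. The function name, the fixed bucket width, and the clamping of large batches into the last bucket are my own assumptions; how NVPerfHUD actually buckets its data is not something I have verified.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buckets draw calls by vertex count. Bucket i counts the calls with
// i * bucketWidth <= vertices < (i + 1) * bucketWidth; the last bucket
// also absorbs anything larger.
std::vector<std::size_t> batchHistogram(
        const std::vector<std::size_t>& verticesPerCall,
        std::size_t bucketWidth,
        std::size_t bucketCount) {
    assert(bucketWidth > 0 && bucketCount > 0);
    std::vector<std::size_t> histogram(bucketCount, 0);
    for (std::size_t vertices : verticesPerCall) {
        std::size_t bucket = vertices / bucketWidth;
        if (bucket >= bucketCount) {
            bucket = bucketCount - 1;  // clamp oversized batches
        }
        ++histogram[bucket];
    }
    return histogram;
}
```

A histogram that piles up in the lowest bucket means the frame is made of many small batches, which usually points at per-call CPU overhead rather than at the GPU.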
The performance graph plots four lines: the time to render each frame, the time spent in the driver, the time the driver spent waiting on the GPU, and the time that the GPU is idle.
Some more advanced visualization options are available through hot-keys. Various levels of pixel shaders can be visualized by toggling them on and off, which causes them to be drawn in colors tied to the pixel shader version. Also available are a wireframe toggle and a depth complexity toggle, which shows the amount of pixel overdraw in your application. There is also a hot-key to toggle the display of the graphs to reduce display clutter when needed.
However, some of the most interesting stuff is in the pipeline experiments. Using hot-keys you can turn various portions of the graphics pipeline on and off to see how each affects performance. You can replace all of the textures with a single 2x2 texture to see how texture-bound your application is, turn off the raster stage of the pipeline to isolate the vertex unit, and even turn off the entire pipeline to see how fast your core application logic would be if your application were attached to an infinitely fast card.
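One simple way to read the result of an experiment is to compare the frame time with and without it. The sketch below is only that arithmetic; the function name is hypothetical and the frame times would be numbers you read off the HUD yourself.

```cpp
#include <cassert>

// Fraction of the baseline frame time that disappears when an experiment
// (for example, forcing 2x2 textures) is switched on. 0.0 means no change;
// values near 1.0 mean the disabled work dominated the frame.
double frameTimeSaved(double baselineMs, double experimentMs) {
    assert(baselineMs > 0.0 && experimentMs >= 0.0);
    return (baselineMs - experimentMs) / baselineMs;
}
```

For instance, a frame that drops from 20 ms to 15 ms with the 2x2 texture experiment on gives `frameTimeSaved(20.0, 15.0) == 0.25`, suggesting roughly a quarter of the frame time was tied to texturing. A result near zero says to look elsewhere in the pipeline.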
The Debug Console is by far the simplest mode (see figure 2 above). It captures the output of NVPerfHUD diagnostics, the DirectX runtime, and OutputDebugString() to a text box. Each line is timestamped. Several options are available, such as turning off logging and clearing the log after each frame. Unfortunately, the amount of output can be significant, especially if your code has a lot of diagnostic output. It would be nice to have a way to filter it: I'd like to see an option to reduce the output to the lines that match a user-defined string fragment (i.e. a grep).
Frame Analysis is by far the most impressive mode of NVPerfHUD. Text or static pictures do little justice to seeing it at work. The first time I played with it I was impressed by its teaching potential. A few moments in this mode do more to show how advanced graphics pipelines work than anything else I have seen.
Frame Analysis Mode freezes the current frame of the application and allows you to step through it one DrawPrimitive call at a time. Using this, it's easy to see exactly how a frame is built up. As if this weren't enough, there is an advanced mode in which you can see even more detail. Turning on advanced mode (figures 4-7) allows you to see each stage of the graphics pipeline: the geometry to be rendered, the vertex shader used, the pixel shader used, and the raster operations used for each call. Each stage has its own set of inspectors, so you can see all of the inputs, such as the pixel shader source, the state of all of its input constants and the input textures. The analysis is static, so you cannot do things like step through the shader instructions, but what is present is very impressive and useful.
This mode may not work for all applications out of the box, as it imposes some additional requirements. Basically, the application must support time-based rendering and needs to be able to handle a time delta of 0, or at least a very small value. Since most modern games use time-based rendering, this should be an easy modification if any is needed.
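As an illustration of what "handling a time delta of 0" means in practice, here is a minimal sketch with hypothetical names (`Particle`, `update`, `averageSpeed`). It is not taken from the NVPerfHUD manual. The two points it shows are that state should advance in proportion to the delta, and that any division by the delta needs a guard.

```cpp
#include <cassert>

struct Particle {
    double position;  // arbitrary units
    double velocity;  // units per second
};

// Advances the particle by dtSeconds. With dtSeconds == 0 the state is
// left unchanged, so the same frame can be redrawn any number of times.
void update(Particle& particle, double dtSeconds) {
    assert(dtSeconds >= 0.0);
    particle.position += particle.velocity * dtSeconds;
}

// Average speed over a step. The guard returns the current velocity for
// a zero delta instead of dividing by zero.
double averageSpeed(double distance, double dtSeconds, double currentVelocity) {
    if (dtSeconds <= 0.0) {
        return currentVelocity;
    }
    return distance / dtSeconds;
}
```

Code that advances animation by a fixed amount per frame, or divides by the frame delta without a check, is the kind of thing that would need changing before a frame can be frozen and replayed.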
The main thing I found missing in this mode is the ability to step to a next frame. For example, if you were trying to analyze an explosion effect you might not be able to toggle to frame analysis mode at the right time to catch it. It would be nice if you could specify an interval (say 100ms) and be able to step through the frames until you found the effect that you were looking for.
No matter how good a tool is developers always want more and I'm no exception. I wish it supported OpenGL as it's my API of choice. The good news is that NVIDIA plans to support instrumenting of OpenGL applications in its upcoming NVPerfKit toolkit. Some details of this toolkit are also available on NVIDIA's GDC 2005 Presentations page. It is currently available in beta form for registered NVIDIA developers.
I also wish that NVPerfHUD could approximate other NVIDIA cards and/or configurations. I realize this is a tall order, but it's hard for a large development shop, much less a small one, to test on many cards. It would be nice to be able to lower the amount of video RAM available, vary the GPU/memory clocks, and turn off some of the graphics pipelines from NVPerfHUD. It might be impossible to do but it's a nice wish...
Developer support for 3D cards has come a long way in a short time. Not too long ago developer tools were a header file, some documentation and a few code examples. Today we have companies like NVIDIA offering suites of high-quality free tools to aid in many aspects of developing cutting edge graphics software.
NVPerfHUD should be of interest to anybody who develops software that employs DirectX graphics. Even if you don't have an NVIDIA card it's worth the price of a GeForce 6600 or 6800 to get this tool.
It's not very often that you get a tool that's actually fun to use. I wish my CPU development tools were as nice as my NVIDIA GPU tools.