Baffling performance issue

6 comments, last by MJP 13 years, 9 months ago
Hello,

Let me preface this by saying I am NOT a very good DirectX programmer by any means; I know only enough to accomplish what I was attempting. Briefly, my application reads frame buffers from a machine vision camera into a texture and processes it on the GPU with a pixel shader (v3) using Managed DirectX (v9...obsolete, I know). The pixel shader analyzes a 5x5, 7x7 or 9x9 neighborhood of pixels centered on each output pixel and adjusts the output pixel's color according to the result. It's more or less a convolution (strictly speaking not even that, since I'm not doing any multiplications, just additions over the kernel window).

At the same time, the application uses an open-source audio library called NAudio to play test tones (sine waves). The sine waves are computed intermittently on the UI thread and loaded into a circular buffer for audio output while the graphics processing is going on. This has been working very smoothly on my desktop machine (GeForce GTS 250 and Core i5): I get real-time frame rates, smooth updating and seamless audio output.
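As a point of reference, here is a minimal CPU sketch of the per-pixel operation described above: a plain sum over a k x k window centered on each pixel. It assumes a grayscale image stored as a list of rows and clamps coordinates at the borders (roughly what clamp texture addressing would do). The function name is hypothetical, and the step that maps the sum to an output color isn't described in the post, so it's left out.

```python
def neighborhood_sum(image, k):
    """Return the sum of the k x k window centered on every pixel (k must be odd).

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            # k * k reads per output pixel: 25, 49 or 81 for the three kernel sizes.
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total
    return out


# A constant image of ones sums to k * k everywhere.
print(neighborhood_sum([[1] * 4 for _ in range(4)], 3)[0][0])  # 9
```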

Yesterday I tried the same application on a Core i7 laptop with an ATI Mobility Radeon HD 5470 and was surprised to see the graphics and audio performance deteriorate hugely. With the 5x5 neighborhood pixel shader the frame rate was fine and the audio was good, but at 9x9 it slowed to a fraction of what the desktop's GeForce manages: about 5 frames a second, no exaggeration. In addition, I got some pretty bad stuttering in the sine wave output, and the more I tax the GPU with larger kernels, the worse the stuttering gets. The only time the audio is seamless and the frame rate is good is with the 5x5 kernel.

The weird thing is that the convolution processing is (supposedly) happening on the GPU, whereas the sine wave generation is on the UI thread, so a larger kernel in the pixel shader shouldn't affect the audio processing. I've tried moving the audio processing to a secondary thread, but that made no difference. I've also noticed that interface responsiveness slows down. It's almost as if the pixel shader were running on the CPU and not the GPU. Any ideas what may be going on here? This has me stumped.

The device is created with HardwareVertexProcessing, and it does support pixel shader 3, but its maximum instruction slot count is 512. I've also noticed that I can't create the device on the laptop with the "PureDevice" flag, as this throws an exception, but then again I wasn't using that flag for the desktop version either, and that one screams.

Any thoughts anyone? Any help much appreciated!
-L
Well, if your GPU performance tanks and it can no longer keep up with the CPU, the driver will block your CPU thread when you call Present so that the GPU can catch up (the driver typically doesn't let the CPU get more than 2 or 3 frames ahead of the GPU). My guess is that since you're sitting around in the driver, this is screwing up your audio processing.
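The blocking behavior can be illustrated with a toy model (an assumption for illustration, not how any particular driver is implemented): the CPU spends cpu_ms preparing each frame, the GPU spends gpu_ms rendering it, and Present doesn't return until the frame max_queued places back has finished. Once the GPU is the bottleneck, the thread calling Present ends up paced by the GPU.

```python
def simulate_present(num_frames, cpu_ms, gpu_ms, max_queued=3):
    """Toy model of a driver that lets the CPU run at most max_queued frames ahead.

    Returns (present_returns, gpu_done), both lists of times in milliseconds.
    """
    present_returns = []
    gpu_done = []
    cpu_clock = 0.0
    for i in range(num_frames):
        cpu_clock += cpu_ms  # CPU-side work for frame i
        if i >= max_queued:
            # Present blocks until frame i - max_queued has finished on the GPU.
            cpu_clock = max(cpu_clock, gpu_done[i - max_queued])
        present_returns.append(cpu_clock)
        # The GPU starts frame i once it is submitted and the previous frame is done.
        gpu_start = max(cpu_clock, gpu_done[-1] if gpu_done else 0.0)
        gpu_done.append(gpu_start + gpu_ms)
    return present_returns, gpu_done


# GPU-bound: 5 ms of CPU work, 200 ms of GPU work per frame.
returns, _ = simulate_present(10, cpu_ms=5, gpu_ms=200)
print(returns[9] - returns[8])  # 200.0 -> the CPU thread is paced at 5 fps
```

In this model the CPU thread spends most of each frame parked inside Present, which is the situation described above.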

To get more information about your performance problems, you'll probably have to dig deeper with GPU PerfStudio to see what's bottlenecking you. A Mobility 5470 is your typical low-bandwidth, low-throughput laptop GPU, so I'm not that surprised that it's not happy with 81 texture samples per pixel. However, I still wouldn't have expected it to drop to 5fps.
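To put the 81 samples in perspective, the per-frame texture traffic grows with the square of the kernel size. The thread doesn't say what resolution the camera delivers, so the 640x480 below is purely an assumed example.

```python
def texture_fetches_per_frame(kernel, width, height):
    """Texture reads for one full-screen pass that sums a kernel x kernel window."""
    return kernel * kernel * width * height


# Assumed 640x480 camera frame; the real resolution isn't given in the thread.
for k in (5, 7, 9):
    print(k, texture_fetches_per_frame(k, 640, 480))
# 5 7680000
# 7 15052800
# 9 24883200
```

Whatever the actual resolution, going from 5x5 to 9x9 multiplies the sampling work by 81/25, a bit over 3x.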

Also in college for my senior project I worked on an autonomous vehicle, and I wrote all the routines for processing the images from the machine vision camera to detect obstacles. Good times! :P
MJP, thanks very much for the reply!

Stupidly, I forgot to mention that all the DirectX device "stuff" happens on another thread. In other words:

1) Create the DirectX device on the UI thread, initialize it, set up vertices, etc.

2) When necessary, launch a worker thread that loops, loading the frame buffer into the texture, calling device.Present, etc.

3) At the same time, the UI thread intermittently processes/loads the looping audio buffer.
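Step 3 can be sketched as a fixed-size ring buffer that a producer tops up with sine samples and the audio output drains. This is a hypothetical stand-in: the class and its methods are made up for illustration (it is not NAudio's API), and 44.1 kHz is an assumed sample rate. Two details matter for stutter: the generator carries its phase across calls so consecutive chunks join without clicks, and if the producer falls behind, the reader runs out of samples and outputs silence, which is heard as a dropout.

```python
import math


class SineRingBuffer:
    """Fixed-size ring buffer fed by a phase-continuous sine generator (hypothetical)."""

    def __init__(self, capacity, sample_rate=44100, freq=440.0):
        self.buf = [0.0] * capacity
        self.capacity = capacity
        self.read_pos = 0
        self.count = 0  # samples currently stored
        self.phase = 0.0
        self.step = 2.0 * math.pi * freq / sample_rate

    def fill(self, n):
        """Generate up to n samples (fewer if the buffer is full); return how many were written."""
        n = min(n, self.capacity - self.count)
        for _ in range(n):
            write_pos = (self.read_pos + self.count) % self.capacity
            self.buf[write_pos] = math.sin(self.phase)
            # Carry the phase across calls so consecutive chunks join without a click.
            self.phase = (self.phase + self.step) % (2.0 * math.pi)
            self.count += 1
        return n

    def read(self, n):
        """Consume n samples; missing samples come back as silence (an underrun)."""
        out = []
        for _ in range(n):
            if self.count == 0:
                out.append(0.0)  # underrun: heard as a gap or stutter
            else:
                out.append(self.buf[self.read_pos])
                self.read_pos = (self.read_pos + 1) % self.capacity
                self.count -= 1
        return out
```

If the thread that calls fill is stalled for longer than the buffer takes to drain (capacity / sample_rate seconds), read starts returning silence; a larger buffer trades latency for more tolerance of such stalls.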

So I believe that if the graphics driver were to block a thread, it would be blocking the worker thread (the one calling Present()), no? In that case my UI thread should be free to carry on with the sine wave processing. I have 4 full cores to play with, so I'm dismayed to see the audio get choppy. I can live with a 7x7 kernel but not with discontinuous audio.

The NAudio library uses the WaveOut API. Is it possible that the audio is being routed through the graphics card as well, given that the audio output suffers as soon as the card starts to bog down?

-L
Because you're just doing additions over the filter kernel window, you may be able to optimize that pixel shader significantly. For a 9x9 kernel, instead of doing 81 reads per pixel, you could get away with 18 by separating it into one horizontal pass and one vertical pass of 9 reads each. See http://blogs.mathworks.com/steve/2006/10/04/separable-convolution/
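A minimal sketch of the two-pass idea, written on the CPU for clarity: sum each row window first, then sum that intermediate result down each column. It assumes clamped borders, in which case it gives exactly the same result as a brute-force k x k sum. In a shader version the horizontal pass would be rendered to an intermediate texture; since sums of up to 81 values exceed 1.0, that texture would presumably need a floating-point format (or the sums would need scaling).

```python
def box_sum_separable(image, k):
    """Sum of the k x k window around each pixel, computed as two 1-D passes.

    Each pass reads k values per pixel, so 2 * k reads replace k * k
    (18 instead of 81 for k = 9). Borders are clamped to the edge pixel.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w = len(image), len(image[0])
    r = k // 2

    def clamp(v, hi):
        return min(max(v, 0), hi)

    # Pass 1 (horizontal): sum along each row.
    horiz = [[sum(image[y][clamp(x + d, w - 1)] for d in range(-r, r + 1))
              for x in range(w)]
             for y in range(h)]
    # Pass 2 (vertical): sum the row sums along each column.
    return [[sum(horiz[clamp(y + d, h - 1)][x] for d in range(-r, r + 1))
             for x in range(w)]
            for y in range(h)]


print(box_sum_separable([[1] * 3 for _ in range(3)], 3))  # [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
```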

I'd also make sure that if your rendering thread gets blocked, that doesn't also block your UI thread, for example because the UI thread is waiting on some critical section the rendering thread holds. Creating a third, higher-priority thread just for sound may help here.
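Here's a small illustration of why a separate thread (even a higher-priority one) doesn't help if it shares a lock with the render thread. The lock, the function names and the 200 ms "Present" are all hypothetical stand-ins; time.sleep simply simulates a render thread that is blocked while holding the lock.

```python
import threading
import time


def render_frame(lock, lock_taken, present_s):
    """Simulated render thread: holds a lock while 'Present' blocks."""
    with lock:
        lock_taken.set()
        time.sleep(present_s)  # stand-in for a Present() call that stalls


def audio_wait_ms(share_lock, present_s=0.2):
    """How long an audio refill waits when it does (or doesn't) share the render lock."""
    render_lock = threading.Lock()
    audio_lock = render_lock if share_lock else threading.Lock()
    lock_taken = threading.Event()
    t = threading.Thread(target=render_frame, args=(render_lock, lock_taken, present_s))
    t.start()
    lock_taken.wait()  # make sure the render thread holds its lock first
    start = time.perf_counter()
    with audio_lock:
        pass  # the audio buffer would be refilled here
    waited = (time.perf_counter() - start) * 1000.0
    t.join()
    return waited


print(audio_wait_ms(share_lock=True))
print(audio_wait_ms(share_lock=False))
```

With share_lock=True the audio side waits for roughly the whole simulated Present; with share_lock=False it gets its lock immediately. Raising the audio thread's priority wouldn't change the first case, because the thread is waiting on the lock, not on CPU time.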

To try to rule out hardware/driver sound problems, you could play some music with another program and see whether it still plays correctly.
Music seems to play fine, and according to Windows my driver is the latest (although it's dated January). I've tried running the sound loop in a worker thread to make sure it doesn't get blocked, but I still get intermittent stuttering with the larger kernels. Dang strange!

Is it possible that the instruction count for the larger kernels exceeds the maximum and DirectX automatically falls back to a software implementation of the pixel shader?
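A back-of-the-envelope check on the 512-slot limit, assuming a fully unrolled shader with one texture fetch per tap, optionally one coordinate add per tap, one accumulate per tap after the first, and one final output write. These per-tap costs are assumptions; the compiler's actual assembly listing (e.g. from fxc) is the authoritative number, and a ps_3_0 loop would use fewer slots than an unrolled version.

```python
def estimated_ps_slots(k, coord_ops_per_tap=1):
    """Rough instruction-slot estimate for an unrolled k x k sum in a pixel shader."""
    taps = k * k
    fetches = taps                         # one texture fetch per tap
    coord_math = taps * coord_ops_per_tap  # offsetting the texture coordinate
    accumulates = taps - 1                 # adds to combine the samples
    return fetches + coord_math + accumulates + 1  # + 1 for the final write


for k in (5, 7, 9):
    print(k, estimated_ps_slots(k, coord_ops_per_tap=0), estimated_ps_slots(k))
# 5 50 75
# 7 98 147
# 9 162 243
```

Even the pessimistic figure stays under 512, which suggests (under these assumptions) that the 9x9 shader isn't hitting the slot limit. As far as I know, a shader that did exceed it would fail to compile for ps_3_0 rather than silently run in software.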

-L
Check the card. I had a semi-similar issue where a Radeon 4850 or something was losing out badly to an 8400 GS, which is basically impossible. I called them and they said there was a problem with that model's memory interface and to ship it back. I have since tried an HD 5770, and it works fine, comparable to my GTX 275.
Does anyone know of some sort of benchmark or diagnostic tool I can download and run that would give me objective numbers?
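Besides external tools, one way to get objective numbers from the app itself is to record a timestamp after every Present (in .NET, System.Diagnostics.Stopwatch would do) and summarize the deltas. A minimal sketch of the summary step; the function name and the choice of statistics are just one option, and the 99th percentile here is a simple approximation.

```python
import statistics


def frame_time_stats(timestamps_s):
    """Summarize per-frame timestamps given in seconds (e.g. taken right after each Present)."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    deltas_ms = [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]
    ordered = sorted(deltas_ms)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    mean_ms = statistics.mean(deltas_ms)
    return {
        "avg_fps": 1000.0 / mean_ms,
        "mean_ms": mean_ms,
        "worst_ms": ordered[-1],        # longest single frame (a visible hitch)
        "p99_ms": ordered[p99_index],   # approximate 99th-percentile frame time
    }


print(frame_time_stats([0.0, 0.25, 0.5, 1.5]))
# {'avg_fps': 2.0, 'mean_ms': 500.0, 'worst_ms': 1000.0, 'p99_ms': 1000.0}
```

Looking at worst-case frame times alongside the average helps separate a uniformly slow GPU from occasional long stalls.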
Yeah, I think you mentioned that you were using a separate thread (if not, I assumed it). Anyway, using separate threads doesn't mean they won't sync with or block each other. This is especially true with drivers, which can use a variety of system-wide locks/mutexes. Also, I know ATI desktop cards have an HDMI controller with a built-in audio device (so that audio can be output over HDMI), and that might be causing extra trouble.

Anyway, to get to the bottom of your GPU performance problem, I would strongly recommend running GPU PerfStudio. It will give you detailed information about where your bottlenecks are for a particular portion of a frame.

This topic is closed to new replies.
