Speeding up LockRect
Hi all,
I'm working on some software that relies on having graphics in main memory.
I'm doing some rendering to a Direct3D surface, then locking the surface and reading the image back to main memory.
Unfortunately this is /really/ slow (3 frames per second). Is there any way to speed this up?
I'm using a PCI Express video card, and I heard that this was supposed to speed up reading back from video memory. Is there some way to turn on this ability?
Thanks very much
That sounds really slow. How large is the texture?
Things to try (just guesswork): use DX release mode, update drivers.
Reading back from video memory is still very slow. It's faster on a PCI-Express slot, but still slow as hell.
You should get more than 3 FPS, though; I suspect there's something else going on in your code. Can we see some code? And are you using the debug D3D runtimes? That'll give you much more information if there's something going wrong.
Just a random thought really... have you run your profiling on a Release build of your code?
In my experience, the debug builds tend to be much slower when dealing with DMA - lots of under/over run and other access-related debugging stuff can really hurt performance.
Also, are you reading back each pixel (e.g. a nested for() loop) or are you grabbing the whole block? It'll be substantially faster to grab a single huge block of binary data and then process each element than to combine both during a lock/unlock...
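The row-at-a-time copy could look something like this (a minimal sketch: the `lockedBits`/`pitch` parameters stand in for what you'd get from `D3DLOCKED_RECT::pBits` and `Pitch`, so no actual D3D calls here):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a locked, pitch-aligned surface into a tightly packed buffer one
// row at a time, instead of reading pixel-by-pixel while holding the lock.
// 'pitch' is the byte stride between rows (often > width * bytesPerPixel).
void CopySurfaceRows(const uint8_t* lockedBits, int pitch,
                     int width, int height, int bytesPerPixel,
                     std::vector<uint8_t>& out)
{
    const int rowBytes = width * bytesPerPixel;
    out.resize(static_cast<size_t>(rowBytes) * height);
    for (int y = 0; y < height; ++y)
    {
        // One memcpy per row instead of width * height individual reads.
        std::memcpy(&out[static_cast<size_t>(y) * rowBytes],
                    lockedBits + static_cast<size_t>(y) * pitch,
                    rowBytes);
    }
}
```

Then you unlock immediately and do all your per-pixel processing on `out`, outside the lock.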
Another one that I've posted about before that people have told me works quite well is to use a ring-buffer approach. Store N images, rendering to each one after another. For each frame you download 1/Nth of the previous frame. This way you can maximize concurrency between the CPU and GPU and keep both busy at the same time without stalling either unnecessarily...
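The index bookkeeping for that might look roughly like this (my simplified take on the idea: it reads back the oldest surface whole rather than in 1/N slices, but the ring principle is the same; `RingIndices` is a made-up helper, not a D3D call):

```cpp
// N-buffered readback: each frame, render into one surface while locking
// the oldest one. Its contents are N-1 frames old, so the lock never has
// to wait for work the GPU is still doing on it.
struct FrameIndices
{
    int render;   // surface to render into this frame
    int readback; // oldest surface in the ring, safe to lock and read
};

FrameIndices RingIndices(int frame, int n)
{
    return { frame % n, (frame + 1) % n };
}
```

With N = 3, frame 0 renders surface 0 and reads surface 1, frame 1 renders 1 and reads 2, and so on; the surface you lock was always rendered N-1 frames earlier.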
hth
Jack
Use GetRenderTargetData(). It's *much* faster than locking a full render target surface. (It's still not blistering fast, but it should improve your frame rate.)
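For reference, the usual pattern is something like this (untested sketch; `device`, `renderTarget`, `width`, and `height` are assumed to exist, and the system-memory surface must match the render target's size and format):

```cpp
// Copy the render target into a system-memory surface with
// GetRenderTargetData(), then lock the cheap system-memory copy
// instead of the render target itself.
IDirect3DSurface9* sysmem = NULL;
device->CreateOffscreenPlainSurface(width, height, D3DFMT_A8R8G8B8,
                                    D3DPOOL_SYSTEMMEM, &sysmem, NULL);
device->GetRenderTargetData(renderTarget, sysmem);

D3DLOCKED_RECT lr;
sysmem->LockRect(&lr, NULL, D3DLOCK_READONLY);
// ... read lr.pBits row by row, using lr.Pitch as the stride ...
sysmem->UnlockRect();
```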
I posted this reply to a similar question recently. Using this technique may avoid stalling the GPU each frame.