Fast copying to rendertarget in D3D9

Started by
3 comments, last by Dr_Asik 10 years, 8 months ago

Hello,

I'm a newbie (as ever) with Direct3D and I'd like to know the most efficient way to do the very simple thing I want to do. I have a bunch of pictures in system memory that all get updated 60 times per second, and after every update I want to copy each of them into a different IDirect3DSurface9 that has the USAGE_RENDERTARGET flag set (for use with WPF). I am using SharpDX but I can easily translate from native C++ if you're more comfortable with that.

I have a working sample but I'm not happy with the performance. It starts choking with about 8 640x480 renderers, while I can get 50 renderers using a software approach. It doesn't seem right that a hardware approach would be slower. :S

So far here's what I've done (using SharpDX; I hope D3D aficionados will understand it easily):

Device creation (where s_hwnd is the window handle of the application):


            s_d3d = new Direct3DEx();
            var d3dpp = new PresentParameters
            {
                BackBufferCount = 1,
                BackBufferHeight = 1,
                BackBufferWidth = 1,
                BackBufferFormat = Format.Unknown,
                DeviceWindowHandle = s_hwnd,
                SwapEffect = SwapEffect.Discard,
                Windowed = true
            };
            s_device = new DeviceEx(s_d3d, 0, DeviceType.Hardware, s_hwnd, 
                CreateFlags.HardwareVertexProcessing | CreateFlags.FpuPreserve | CreateFlags.Multithreaded,
                new PresentParameters[] { d3dpp }, new DisplayModeEx[] {});
This only happens once and the device is used to create all textures.
Each renderer gets its own surface, created thus:

Surface.CreateRenderTargetEx(s_device, 640, 480, Format.X8R8G8B8, MultisampleType.None, 0, true /* lockable */, Usage.None);

At render time we copy the data (a byte array) to the renderer's surface; this happens for each renderer.


        public void Render(byte[] data)
        {
            // wpf code
            m_d3dImage.Lock();


            // d3d code
            var textureData = m_surface.LockRectangle(LockFlags.None);
            Marshal.Copy(data, 0, textureData.DataPointer, data.Length);
            m_surface.UnlockRectangle();


            // wpf code
            m_d3dImage.SetBackBuffer(D3DResourceType.IDirect3DSurface9, m_surface.NativePointer);
            m_d3dImage.AddDirtyRect(new Int32Rect(0, 0, 640, 480));
            m_d3dImage.Unlock();
        }
I'm thinking that individually locking each texture is probably what is taking most of the time; perhaps there is a texture format more appropriate for this task, or an asynchronous way to lock all textures at once? Would Direct3D 10 or 11 provide a faster method? I've searched online but without success so far.
Thanks for your help!

It looks like you're just using render targets to display texture data you have CPU-side, sending it to the GPU each frame. That hardly qualifies as "hardware accelerated".

Perhaps it would be better if you tell us what you are trying to achieve and how you are currently doing it.

Maybe this reading can help you better understand what GPUs excel at and what they don't.


I told you exactly what I'm trying to achieve. I need to update a number of IDirect3DSurface9s with arbitrary image data that resides in system memory, several times (30-60) per second, to display them in WPF using the D3DImage control. To do this I'm currently using the code shown above, but I'm not achieving good performance compared to another approach (using WriteableBitmap). I want to know if there's a faster way of transferring data to the GPU than what I'm currently doing, for instance by batching calls to lock, or perhaps by using another texture format, I don't know. WPF itself does this far more efficiently, so I suspect there's something I'm missing.


Mmm... not exactly what I'm looking for.

What you're doing is simply not fast because it stresses bus bandwidth without reaping the benefits of using a GPU. By the way, check whether the surfaces were created without the dynamic flag (and whether your locks are missing the discard flag); if so, fixing that would help performance a lot.

The interesting question is what you mean by "arbitrary image data". If it means (for example) that you're using libcairo to render nice & complex 2D graphics and then sending the result to multiple D3D surfaces, then that's not going to be fast. You're wasting your time trying to use the GPU.

Just combine them on the CPU and send the final result to a single D3D surface.

If by "arbitrary image data" you mean loading a few icons or pictures from a file, then you should do the update only once, not every frame.

If by "arbitrary image data" you mean images created through compositing (e.g. static images, or rectangles layered on top of each other with different alpha blending operations, such as Photoshop-like blend modes), then your method is not the right way to do it; you should upload the static data once, then use pixel shaders to perform the operations you were doing on the CPU and achieve the same result.

Cheers
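The "upload once, composite with the GPU" idea above could look roughly like this in SharpDX (a sketch only; the `CompositeOnGpu` method, the `layer0`/`layer1` textures and the sprite-based blend setup are my assumptions, not code from this thread, and the exact overload signatures may differ between SharpDX versions):

```csharp
using SharpDX.Direct3D9;
using SharpDX.Mathematics.Interop;

static class Compositor
{
    // Sketch: composite two static layers on the GPU instead of on the CPU.
    // Assumes layer0/layer1 were uploaded once as Pool.Default textures and
    // renderTarget is the lockable render target created earlier.
    public static void CompositeOnGpu(Device device, Surface renderTarget,
                                      Texture layer0, Texture layer1)
    {
        device.SetRenderTarget(0, renderTarget);
        device.BeginScene();

        // The alpha blending that was done on the CPU now happens in the
        // GPU's blend stage; each Draw is just a textured quad.
        using (var sprite = new Sprite(device))
        {
            sprite.Begin(SpriteFlags.AlphaBlend);
            sprite.Draw(layer0, new RawColorBGRA(255, 255, 255, 255));
            sprite.Draw(layer1, new RawColorBGRA(255, 255, 255, 255));
            sprite.End();
        }

        device.EndScene();
    }
}
```

In a real renderer the Sprite object would be created once and reused, but the point stands: the per-frame traffic over the bus drops to zero because the layers already live in video memory.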

check whether the surfaces were created without the dynamic flag (and whether your locks are missing the discard flag); if so, fixing that would help performance a lot.

I tried to pass Usage.Dynamic to CreateRenderTargetEx, however this fails with D3DERR_INVALIDCALL. Is it possible to create a "dynamic" render target, or should I use an intermediate surface, or...?
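For what it's worth, D3D9 render targets cannot be created dynamic, which is why that call fails. The usual workaround (my sketch, not something confirmed in this thread; the `TileUploader` class name and the 640x480 size are assumptions carried over from the earlier code) is an intermediate system-memory surface per tile plus `UpdateSurface`, which lets the driver schedule the transfer instead of stalling in `LockRectangle` on a video-memory surface:

```csharp
using System.Runtime.InteropServices;
using SharpDX.Direct3D9;

class TileUploader
{
    readonly Device m_device;
    readonly Surface m_renderTarget;  // the lockable render target from earlier
    readonly Surface m_staging;       // SYSTEMMEM surface, cheap to lock

    public TileUploader(Device device, Surface renderTarget)
    {
        m_device = device;
        m_renderTarget = renderTarget;
        // Same size and format as the render target; UpdateSurface requires
        // matching formats and a non-multisampled Pool.Default destination.
        m_staging = Surface.CreateOffscreenPlain(
            device, 640, 480, Format.X8R8G8B8, Pool.SystemMemory);
    }

    public void Upload(byte[] data)
    {
        // Locking a SYSTEMMEM surface does not stall the GPU.
        var rect = m_staging.LockRectangle(LockFlags.None);
        // Assumes pitch == 640 * 4; a robust version copies row by row
        // using rect.Pitch.
        Marshal.Copy(data, 0, rect.DataPointer, data.Length);
        m_staging.UnlockRectangle();

        // The driver performs this copy asynchronously where it can.
        m_device.UpdateSurface(m_staging, m_renderTarget);
    }
}
```

The WPF side (D3DImage Lock/SetBackBuffer/AddDirtyRect/Unlock) would stay the same as in the Render method above; only the byte copy moves off the render target itself.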

The interesting question is what you mean by "arbitrary image data".
Think sequences of pre-generated images, i.e. video. Every tile renders a different sequence of images, so these cannot be computed on the GPU. The composition is done by WPF, so it cannot be done beforehand either.
Sending everything as one D3D surface looks like an interesting optimisation, but the tiles may all be of different sizes and are all potentially rather large, so I'm not sure they can always be combined efficiently.

This topic is closed to new replies.
