Fast copying to rendertarget in D3D9


4 replies to this topic

#1 Asik   Members   -  Reputation: 136


Posted 29 July 2013 - 04:02 PM

Hello,

 

I'm a newbie (as ever) with Direct3D and I'd like to know the most efficient way to do something very simple. I have a bunch of pictures in system memory that all get updated 60 times per second, and after every update I want to get each of them into a different IDirect3DSurface9 that has the D3DUSAGE_RENDERTARGET flag set (for use with WPF). I am using SharpDX, but I can easily translate from native C++ if you're more comfortable with that.

 

I have a working sample but I'm not happy with the performance. It starts choking with about 8 640x480 renderers, while I can get 50 renderers using a software approach. It doesn't seem right that a hardware approach would be slower. :S

 

So far here's what I've done (using SharpDX; I hope D3D aficionados will understand easily):

 

Device creation (where s_hwnd is the window handle of the application):

 

            s_d3d = new Direct3DEx();
            var d3dpp = new PresentParameters
            {
                BackBufferCount = 1,
                BackBufferHeight = 1,
                BackBufferWidth = 1,
                BackBufferFormat = Format.Unknown,
                DeviceWindowHandle = s_hwnd,
                SwapEffect = SwapEffect.Discard,
                Windowed = true
            };
            s_device = new DeviceEx(s_d3d, 0, DeviceType.Hardware, s_hwnd, 
                CreateFlags.HardwareVertexProcessing | CreateFlags.FpuPreserve | CreateFlags.Multithreaded,
                new PresentParameters[] { d3dpp }, new DisplayModeEx[] {});
This only happens once and the device is used to create all textures.
Each renderer gets its own surface, created thus:
Surface.CreateRenderTargetEx(s_device, 640, 480, Format.X8R8G8B8, MultisampleType.None, 0, true, Usage.None);

At render time we copy the data (a byte array) to the renderer's surface; this happens for each renderer.

        public void Render(byte[] data)
        {
            // wpf code
            m_d3dImage.Lock();


            // d3d code
            var textureData = m_surface.LockRectangle(LockFlags.None);
            Marshal.Copy(data, 0, textureData.DataPointer, data.Length);
            m_surface.UnlockRectangle();


            // wpf code
            m_d3dImage.SetBackBuffer(D3DResourceType.IDirect3DSurface9, m_surface.NativePointer);
            m_d3dImage.AddDirtyRect(new Int32Rect(0, 0, 640, 480));
            m_d3dImage.Unlock();
        }
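
A side note on the copy above: a single Marshal.Copy of data.Length bytes is only correct when the row pitch of the locked surface equals the tightly packed row size (640 × 4 bytes here). Direct3D 9 reports the actual pitch in D3DLOCKED_RECT.Pitch, and a driver may pad rows. Below is a minimal sketch of a pitch-aware copy, written in native C++ as plain memory code. CopyRows and its parameter names are hypothetical, chosen for illustration; they are not part of Direct3D or SharpDX.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a tightly packed image (rowBytes per row, no padding) into a
// destination whose rows start dstPitch bytes apart (dstPitch >= rowBytes).
// Padding bytes at the end of each destination row are left untouched.
void CopyRows(const std::uint8_t* src, std::uint8_t* dst,
              std::size_t rowBytes, std::size_t rows, std::size_t dstPitch)
{
    if (dstPitch == rowBytes) {
        // No padding: one bulk copy is enough.
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }
    // Padded rows: copy one row at a time, skipping the padding.
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dstPitch, src + y * rowBytes, rowBytes);
    }
}
```

With a locked rectangle, dst would be the locked pointer and dstPitch the reported pitch. When the pitch matches the row size, this reduces to the single bulk copy used above.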
I'm thinking that individually locking each texture is probably what takes the most time; perhaps there is a texture format more appropriate for this task, or an asynchronous way to lock all textures at once? Would Direct3D 10 or 11 provide a faster method? I've searched online but without success so far.
 
Thanks for your help!

Edited by Asik, 29 July 2013 - 04:11 PM.



#2 Matias Goldberg   Crossbones+   -  Reputation: 3470


Posted 29 July 2013 - 05:34 PM

It looks like you're just using RenderTargets to display texture data you have CPU side and sending them to GPU each frame. That hardly qualifies as "hardware accelerated".

 

Perhaps it would be better if you told us what you are trying to achieve and how you are currently doing it.

Maybe this reading can help you better understand what GPUs excel at and what they don't.



#3 Asik   Members   -  Reputation: 136


Posted 29 July 2013 - 06:08 PM

It looks like you're just using RenderTargets to display texture data you have CPU side and sending them to GPU each frame. That hardly qualifies as "hardware accelerated".

 

Perhaps it would be better if you told us what you are trying to achieve and how you are currently doing it.

Maybe this reading can help you better understand what GPUs excel at and what they don't.

I told you exactly what I'm trying to achieve. I need to update a number of IDirect3DSurface9s with arbitrary image data that resides in system memory, several times (30-60) per second, to display them in WPF using the D3DImage control. To do this I'm currently using the code shown above, but not achieving good performance compared to another approach (using WriteableBitmap). I want to know if there's a more efficient way of transferring data to the GPU than what I'm currently doing, for instance by batching calls to lock, or by using another texture format; I don't know. WPF itself does this far more efficiently, so I suspect there's something I'm missing.


Edited by Asik, 29 July 2013 - 06:17 PM.


#4 Matias Goldberg   Crossbones+   -  Reputation: 3470


Posted 29 July 2013 - 08:45 PM

 

 

Mmm... not exactly what I'm looking for.

 

What you're doing is simply not fast because it just stresses bus bandwidth without reaping the benefits of using a GPU. By the way, check whether the surfaces were created with the dynamic flag and whether your locks use the discard flag; if not, adding those flags would help performance a lot.
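
To put a rough number on the bus traffic, here is a small arithmetic sketch in native C++. It uses the figures from the first post (640x480 surfaces, 60 updates per second) and assumes 4 bytes per pixel for X8R8G8B8. UploadBytesPerSecond is a hypothetical helper name, and the result counts raw payload only; it says nothing about lock stalls or driver overhead.

```cpp
#include <cstdint>

// Bytes per second needed to re-upload every surface in full on every frame.
constexpr std::uint64_t UploadBytesPerSecond(std::uint64_t surfaces,
                                             std::uint64_t width,
                                             std::uint64_t height,
                                             std::uint64_t bytesPerPixel,
                                             std::uint64_t framesPerSecond)
{
    return surfaces * width * height * bytesPerPixel * framesPerSecond;
}

// 8 renderers at 640x480, 4 bytes per pixel, 60 Hz.
static_assert(UploadBytesPerSecond(8, 640, 480, 4, 60) == 589824000ULL,
              "8 renderers move roughly 590 MB per second");
```

By the same formula, 50 renderers would need about 3.7 GB per second of raw payload.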

 

The question is what you mean by "arbitrary image data". If "arbitrary image data" means (for example) that you're using libcairo to render nice and complex 2D graphics and then sending the result to multiple D3D surfaces, then that's not going to be fast. You're wasting your time trying to use the GPU.

Just combine them on the CPU and send the final result to a single D3D surface.

 

If by "arbitrary image data" you mean loading a few icons or pictures from a file, then you should do the update only once, not every frame.

 

If by "arbitrary image data" you mean images created through compositing (e.g. static images, or rectangles layered on top of each other with different alpha blending operations, such as Photoshop-like blend modes), then your method is not the right way to do it; you should upload the static data once, and then use pixel shaders to perform the operations you were doing on the CPU to achieve the same result.

 

Cheers


Edited by Matias Goldberg, 29 July 2013 - 08:46 PM.


#5 Asik   Members   -  Reputation: 136


Posted 30 July 2013 - 08:27 AM

What you're doing is simply not fast because it just stresses bus bandwidth without reaping the benefits of using a GPU. By the way, check whether the surfaces were created with the dynamic flag and whether your locks use the discard flag; if not, adding those flags would help performance a lot.

I tried to pass Usage.Dynamic to CreateRenderTargetEx, however this fails with D3DERR_INVALIDCALL. Is it possible to create a "dynamic" render target, or should I use an intermediate surface, or...?

The question is what you mean by "arbitrary image data". If "arbitrary image data" means (for example) that you're using libcairo to render nice and complex 2D graphics and then sending the result to multiple D3D surfaces, then that's not going to be fast. You're wasting your time trying to use the GPU.
Just combine them on the CPU and send the final result to a single D3D surface.
 
If by "arbitrary image data" you mean loading a few icons or pictures from a file, then you should do the update only once, not every frame.
 
If by "arbitrary image data" you mean images created through compositing (e.g. static images, or rectangles layered on top of each other with different alpha blending operations, such as Photoshop-like blend modes), then your method is not the right way to do it; you should upload the static data once, and then use pixel shaders to perform the operations you were doing on the CPU to achieve the same result.

Think sequence of pre-generated images, i.e. video. Every tile renders a different sequence of images. These cannot be computed on the GPU. The composition is done by WPF so it cannot be done beforehand either.
 
Sending all as one D3D surface looks like an interesting optimisation, but they may all be of different sizes and they're all potentially rather large so I'm not sure there's a way of always efficiently combining them.
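
On combining tiles of different sizes, one simple option is a "shelf" layout: place tiles left to right and start a new row when the current one is full. The sketch below, in native C++, is only an illustration of that idea under stated assumptions. Tile, Placement and ShelfPack are hypothetical names, the layout is not optimal, and it does not address whether WPF can display sub-regions of a shared surface.

```cpp
#include <algorithm>
#include <vector>

struct Tile { int width; int height; };
struct Placement { int x; int y; };

// Places tiles left to right on horizontal shelves inside an
// atlasWidth x atlasHeight area. On success, out[i] is the top-left
// corner assigned to tiles[i]. Returns false (and clears out) if a
// tile does not fit.
bool ShelfPack(const std::vector<Tile>& tiles, int atlasWidth, int atlasHeight,
               std::vector<Placement>& out)
{
    out.clear();
    int x = 0, y = 0, shelfHeight = 0;
    for (const Tile& t : tiles) {
        if (t.width > atlasWidth) { out.clear(); return false; }
        if (x + t.width > atlasWidth) {
            // Current shelf is full: open a new one below it.
            y += shelfHeight;
            x = 0;
            shelfHeight = 0;
        }
        if (y + t.height > atlasHeight) { out.clear(); return false; }
        out.push_back({x, y});
        x += t.width;
        shelfHeight = std::max(shelfHeight, t.height);
    }
    return true;
}
```

Sorting tiles by height before packing usually wastes less space, at the cost of remapping tile indices.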




