D3DXLoadSurfaceFromMemory is slow?

5 comments, last by jollyjeffers 18 years, 6 months ago
I have a very simple application that is trying to render 30 fps live video in a window. The capture device delivers frames as UYVY surfaces so I have been making use of the D3DXLoadSurfaceFromMemory() function. My render loop is as follows:


bool MyDraw::UpdateWindow( void *pBuffer, int width, int height )
{
    HRESULT ddrval;

    if( NULL == _d3d9_device )
        return false;

    RECT _rc_source = { 0, 0, width, height };

    //  Begin the scene

    if( SUCCEEDED( ddrval = _d3d9_device->BeginScene() ) )
    {
        IDirect3DSurface9 * back_buffer;

        ddrval = _d3d9_device->GetBackBuffer( 0, 0, D3DBACKBUFFER_TYPE_MONO, &back_buffer );

        if( SUCCEEDED( ddrval ) )
        {
            DWORD start = ::GetTickCount();

            //  Let DirectX handle the conversion of our input format, UYVY, to
            //  the current display mode. The source pitch is 1440 bytes: UYVY
            //  packs 2 bytes per pixel, and these frames are 720 pixels wide.

            ddrval = D3DXLoadSurfaceFromMemory( back_buffer,
                                                NULL,             // no dest palette
                                                NULL,             // entire dest surface
                                                pBuffer,
                                                D3DFMT_UYVY,
                                                1440,             // source pitch in bytes
                                                NULL,             // no source palette
                                                &_rc_source,
                                                D3DX_FILTER_NONE,
                                                0 );              // no color key

            DWORD elapsed = ::GetTickCount() - start;
            _RPT1( _CRT_WARN, "Elapsed time %lu\r\n", elapsed );

            back_buffer->Release();
        }

        //  End the scene

        _d3d9_device->EndScene();
    }

    //  Present the backbuffer contents to the display

    _d3d9_device->Present( NULL, NULL, NULL, NULL );

    return true;
}


While this all works, the problem is the performance. It doesn't matter whether my video settings are RGB-16 (565) or RGB-32 (xRGB), I can't seem to render any more than 11-12 fps. Am I doing something wrong? It seems to me that if hardware overlay is available, DirectX should recognize this and this method should be very fast indeed (I am trying to move away from DirectDraw). Thanks, Jim
Whilst you might be able to do a few things to up the performance, I think the fundamental problem is your "algorithm" design. It, quite simply, is not a high-performance route.

Whilst you still have access to the "core" swap-chain and depth-stencil surfaces, that access is never particularly efficient. As of D3D8, the API hid them enough to allow the drivers/hardware more control over what/how/where things are stored. As a consequence, any optimizations may need to be "undone" before you can gain access. A simple example: if the back buffer is in a proprietary format not directly mappable to an IDirect3DSurface9, then some sort of format conversion will have to occur before the D3D API can return you the data.

Using surfaces for rendering is generally slow. There can be more limits involved, but generally speaking, filling an IDirect3DTexture9 and then rendering it to the screen with a TLQuad will be a lot faster.

Also, locking/modifying resources can be very slow - best performance typically comes from "static" resources that don't change. Okay, so this isn't possible in your case, but you might be able to implement some sort of threading+buffering algorithm to offset the delay in transmitting data across the bus. Also check the usage and pool flags for any resources you create - they can be vital to performance.

hth
Jack

<hr align="left" width="25%" />
Jack Hoxley <small>[</small><small> Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]</small>

Thanks for the response Jack.

Since the app is basically a monitor of a live feed, there is not much I can do short of rendering less than the actual number of FPS. Guess that means I need to go back to DirectDraw. Hmm.... Seems like such a basic need.

Jim
Can you not perform some sort of buffering, or hand over the format conversion to a worker thread?

Basically, if you have a 30Hz live feed, that gives you ~33ms to get the data in and on the screen. If, for whatever reason, the process takes ~50ms, then you could use threading (or similar) to buffer one frame behind... it's a bit difficult to show without a diagram [smile]

0ms - first frame delivered
- begin processing first frame, converting formats and preparing data (possibly upload in small chunks)
33ms - second frame delivered
- begin same process with second frame, but a different thread
50ms - first frame's processing is complete, display it to the screen.
66ms - third frame delivered
83ms - second frame's processing is complete, display it to the screen.
99ms - fourth frame delivered
116ms - third frame is ready, display it to the screen
132ms - fifth frame delivered
149ms - fourth frame ready, display it to the screen
...

After the initial delay you should be able to get into a suitably regular pattern with a relatively small delay.

If you do this, there are various changes you can implement to maximise concurrent processing... the main thread should be responsible for uploading the data (multithreaded D3D access can be a pain!), but the other thread should deal in whole chunks of data straight from the live feed, so that it can work on the CPU without getting stuck on (or involved with) the GPU side of things.
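The one-frame-behind idea above can be sketched with standard C++ threading. This is a hypothetical skeleton, not from any real capture API: `Frame`, `FramePipeline`, and `Convert()` are illustrative names, and the "conversion" here is a stand-in for the real UYVY work. The worker does CPU-only processing; the main/D3D thread just polls for finished frames so it never blocks.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Placeholder frame type; in the real app this would hold the raw UYVY data.
struct Frame { std::vector<unsigned char> pixels; };

class FramePipeline {
public:
    // Called from the capture callback: hand the raw frame to the worker.
    void Deliver(Frame raw) {
        std::lock_guard<std::mutex> lock(_mutex);
        _raw.push(std::move(raw));
        _cv.notify_one();
    }

    // Called from the main/D3D thread: fetch a converted frame if one is
    // ready. Returns false instead of blocking, so rendering never stalls.
    bool TryGetConverted(Frame &out) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_converted.empty())
            return false;
        out = std::move(_converted.front());
        _converted.pop();
        return true;
    }

    void Start() { _worker = std::thread(&FramePipeline::Run, this); }

    void Stop() {
        { std::lock_guard<std::mutex> lock(_mutex); _done = true; }
        _cv.notify_one();
        _worker.join();
    }

private:
    void Run() {
        for (;;) {
            Frame raw;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _cv.wait(lock, [this] { return _done || !_raw.empty(); });
                if (_raw.empty())
                    return;                 // _done was set and queue drained
                raw = std::move(_raw.front());
                _raw.pop();
            }
            Convert(raw);                   // CPU-only work, no D3D calls here
            std::lock_guard<std::mutex> lock(_mutex);
            _converted.push(std::move(raw));
        }
    }

    // Stand-in for the real UYVY -> RGB conversion (just inverts each byte).
    static void Convert(Frame &f) {
        for (auto &b : f.pixels)
            b = static_cast<unsigned char>(b ^ 0xFF);
    }

    std::queue<Frame> _raw, _converted;
    std::mutex _mutex;
    std::condition_variable _cv;
    std::thread _worker;
    bool _done = false;
};
```

In practice you would bound the queue depths to a handful of frames (per the timeline above, one or two in flight is enough) and drop frames rather than let them pile up if the worker falls behind.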

hth
Jack


That is a very good suggestion, Jack. I hadn't thought about it in that manner.

D3DXLoadSurfaceFromMemory seems to take approximately 66 ms per frame to convert from UYVY to RGB16 or 32 (the only 2 video formats supported by the video driver on the target platform), so it would take 3 or 4 buffers to make this method viable. Easily doable with the amount of memory on today's systems (even this hunk of junk Dell server with its feeble XGI Volari video).

Thanks for the assist.

Jim
Glad I could help / my pleasure [smile]

Given the necessity for high performance you may well find that it's worth implementing your own form of D3DXLoadSurfaceFromMemory() as you, with added knowledge of what it's being used for, might be able to optimize it more aggressively.

A simple example might be that you can re-use fixed buffer(s) between frames rather than allocate/de-allocate each time you want to load some new data.
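One way to sketch that buffer re-use, assuming a fixed video mode so every frame is the same size (the `FrameBufferPool` name and round-robin scheme are just illustrative):

```cpp
#include <cstddef>
#include <vector>

// Allocate all conversion buffers once, up front, and hand them out in
// round-robin order. No allocation or deallocation happens per frame.
class FrameBufferPool {
public:
    FrameBufferPool(std::size_t bufferCount, std::size_t bytesPerFrame)
        : _buffers(bufferCount, std::vector<unsigned char>(bytesPerFrame)),
          _next(0) {}

    // Returns the next buffer; cheap enough to call once per frame. The
    // caller must be done with a buffer before the pool wraps around to it.
    std::vector<unsigned char> &Acquire() {
        std::vector<unsigned char> &buf = _buffers[_next];
        _next = (_next + 1) % _buffers.size();
        return buf;
    }

private:
    std::vector<std::vector<unsigned char>> _buffers;
    std::size_t _next;
};
```

With 3-4 buffers in the pool (matching the buffering estimate above), a frame can still be in flight on the upload side while the worker fills the next one.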

Also, if you're going to take a multi-threading approach then I'd highly suggest you do the conversion/processing manually and don't call D3DX from a worker/slave thread.

Get the slave thread to just run through the raw data, format it into a correct representation for an IDirect3DTexture9, and then have it hand the result back to your master thread to be uploaded via an IDirect3DTexture9::LockRect() call.

If you can also get your master/D3D thread to re-use its IDirect3DTexture9 objects, that would help - a general rule of thumb with D3D performance is to try and avoid resource allocation/de-allocation inside the main application loop.
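The manual conversion the slave thread would do might look something like this. It is a minimal sketch using the standard BT.601 integer approximation, writing pixels in the byte layout of D3DFMT_X8R8G8B8; the function and variable names are illustrative, not from any D3D header, and a production version would likely want per-row pitch handling and SIMD.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp an intermediate value into the 0..255 range of one color channel.
static inline uint8_t Clamp8(int v)
{
    return static_cast<uint8_t>(std::min(std::max(v, 0), 255));
}

// src: packed UYVY data, 2 bytes per pixel (U0 Y0 V0 Y1 covers two pixels).
// dst: one uint32_t per pixel, 0x00RRGGBB (X8R8G8B8 layout).
// Assumes width is even and rows are tightly packed.
void ConvertUyvyToXrgb(const uint8_t *src, uint32_t *dst,
                       int width, int height)
{
    for (int i = 0; i < width * height / 2; ++i)
    {
        int u  = src[0] - 128;
        int y0 = src[1] - 16;
        int v  = src[2] - 128;
        int y1 = src[3] - 16;
        src += 4;

        // Chroma contributions are shared by both luma samples in the pair.
        int rAdd = 409 * v + 128;
        int gAdd = -100 * u - 208 * v + 128;
        int bAdd = 516 * u + 128;

        int c0 = 298 * y0, c1 = 298 * y1;
        *dst++ = (Clamp8((c0 + rAdd) >> 8) << 16) |
                 (Clamp8((c0 + gAdd) >> 8) << 8)  |
                  Clamp8((c0 + bAdd) >> 8);
        *dst++ = (Clamp8((c1 + rAdd) >> 8) << 16) |
                 (Clamp8((c1 + gAdd) >> 8) << 8)  |
                  Clamp8((c1 + bAdd) >> 8);
    }
}
```

Since the output already matches the texture format, the master thread's job reduces to a LockRect(), a row-by-row memcpy honoring the locked pitch, and an UnlockRect().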

hth
Jack


Just reading some of my emails and I spotted a useful thread on the DIRECTXDEV mailing list:

Load texture From File - Reply 1, 2, 3, 4, 5, 6

Might make for a useful secondary source of ideas...

Jack


