Member Since 18 Sep 2009
Offline Last Active Jun 09 2014 11:15 AM

Topics I've Started

Smeared edges pattern using HLSL for color space conversion

07 June 2014 - 09:34 AM

I've asked the same question on Stack Overflow, but I doubt I'll get a good answer there.

I'm trying to write a YUV to RGB shader in HLSL. Specifically, it converts the YUV420p format, which consists of an N*M plane of Y values, followed by an (N/2)*(M/2) plane of U values and then an (N/2)*(M/2) plane of V values. For example, this 1280x720 picture:


looks like this in YUV format interpreted as an 8-bit, 1280x1080 texture:


In Direct3D 11, I'm loading this as a Texture2D with format R8_UNorm and dimensions 1280x1080. The tricky part is reconstituting the U and V planes because, as you can see, half of their lines end up on the left side of the texture and the other half on the right side (this is simply due to how Direct3D views the buffer as a texture; in memory the lines simply follow one another). In the shader, I do this like so:

    struct PS_IN
    {
        float4 pos : SV_POSITION;
        float2 tex : TEXCOORD;
    };

    Texture2D picture;
    SamplerState pictureSampler;

    float4 PS(PS_IN input) : SV_Target
    {
        int pixelCoord = input.tex.y * 720;
        bool evenRow = (pixelCoord / 2) % 2 == 0;
        //(...) illustrating U values:
        float ux = input.tex.x / 2.0;
        float uy = input.tex.y / 6.0 + (4.0 / 6.0);
        if (!evenRow)
            ux += 0.5;
        float u = picture.Sample(pictureSampler, float2(ux, uy)).r;
        u *= 255.0;
        // for debug purposes, display just the U values
        float4 rgb;
        rgb.r = u;//y + (1.402 * (v - 128.0));
        rgb.g = u;//y - (0.344 * (u - 128.0)) - (0.714 * (v - 128.0));
        rgb.b = u;//y + (1.772 * (u - 128.0));
        rgb.a = 255.0;
        return rgb / 255.0;
    }
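
For comparison, the same addressing can be written on the CPU with integer texel coordinates. The following is a minimal C++ sketch, not the poster's code: `chromaTexel`, `yuvToRgb`, `Texel` and `Rgb` are hypothetical names, the 1280x720 frame size is hard-coded from the example above, and the conversion coefficients are the ones in the commented-out shader lines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Frame size taken from the example in the post (assumption: fixed 1280x720).
constexpr int kWidth = 1280;   // luma width, also the width of the R8 texture
constexpr int kHeight = 720;   // luma height

struct Texel { int col; int row; };

// Texel of the 1280x1080 texture that holds the chroma sample for output
// pixel (x, y). plane = 0 selects U, plane = 1 selects V.
Texel chromaTexel(int x, int y, int plane) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    assert(plane == 0 || plane == 1);
    const int chromaW = kWidth / 2;    // 640 samples per chroma line
    const int chromaH = kHeight / 2;   // 360 chroma lines
    const int cx = x / 2;              // each chroma sample covers 2x2 pixels
    const int cy = y / 2;
    // Byte offset in the buffer: the Y plane first, then U, then V.
    const int offset = kWidth * kHeight
                     + plane * chromaW * chromaH
                     + cy * chromaW + cx;
    // Reinterpret the linear offset as a (column, row) in a 1280-wide texture.
    return { offset % kWidth, offset / kWidth };
}

struct Rgb { std::uint8_t r, g, b; };

// Conversion with the coefficients from the commented-out shader lines;
// all inputs are in the 0..255 range.
Rgb yuvToRgb(int y, int u, int v) {
    auto clamp8 = [](double c) {
        return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, c)) + 0.5);
    };
    const double r = y + 1.402 * (v - 128.0);
    const double g = y - 0.344 * (u - 128.0) - 0.714 * (v - 128.0);
    const double b = y + 1.772 * (u - 128.0);
    return { clamp8(r), clamp8(g), clamp8(b) };
}
```

With this layout, even chroma lines land in the left half of a texture row and odd chroma lines in the right half, which matches the `ux += 0.5` adjustment in the shader. Unlike `Sample`, the sketch never blends neighbouring texels.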

However, for some strange reason, this seems to produce a weird horizontal pattern of smeared edges:


Note that if I put either true or false as the condition (so that ux is either always or never incremented by 0.5f), the pattern doesn't appear, although of course we then get half the resolution. Also note that I did a basically copy-paste C# translation of the HLSL code and it doesn't produce this effect:


FWIW, I'm using WPF to create the window and initialize Direct3D using its HWND via WindowInteropHelper. The size of the window is set to exactly 1280x720.

SharpDX Sharing textures between 32-bit and 64-bit process

27 August 2013 - 12:19 PM

I have a bit of a strange use case. My application uses 2 processes. One process is 32-bit and will create a texture (IDirect3DSurface9, i.e. SharpDX.Direct3D9.Surface) for sharing. It will send the shared handle to another process which is 64-bit and also uses SharpDX. From this process it will read the shared handle and attempt to open the shared texture. Ultimately the use case is that the 32-bit process writes into the texture using some unmanaged 32-bit libraries, and the 64-bit process displays the texture using AnyCPU/64-bit libraries.


I wonder if this can work at all considering IntPtr is 64-bit on 64-bit processes and 32-bit on 32-bit processes, and this is the type used to represent shared texture handles in SharpDX. Direct3D is a 32-bit-only API, right? Therefore the upper 32 bits of an IntPtr used to store a Direct3D handle should be unused, right?
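
The width concern can be illustrated with plain integer conversions. This is a minimal C++ sketch with hypothetical helper names (`widenHandle`, `narrowHandle`). Windows documents that for handles shared between 32-bit and 64-bit applications only the lower 32 bits are significant; that Direct3D 9 shared surface handles follow the same rule is an assumption here, not something the post confirms.

```cpp
#include <cassert>
#include <cstdint>

// Value as received by the 64-bit process from the 32-bit one.
// ASSUMPTION: sign extension is the right way to widen, as for other
// handles shared between 32-bit and 64-bit applications.
std::int64_t widenHandle(std::uint32_t handle32) {
    return static_cast<std::int64_t>(static_cast<std::int32_t>(handle32));
}

// Value as sent from the 64-bit process back to the 32-bit one:
// keep only the low 32 bits.
std::uint32_t narrowHandle(std::int64_t handle64) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(handle64) & 0xFFFFFFFFu);
}
```

Under that assumption a handle value survives the round trip unchanged, whatever ends up in the upper 32 bits on the 64-bit side.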

Fast copying to rendertarget in D3D9

29 July 2013 - 04:02 PM



I'm a newbie (as ever) with Direct3D and I'd like to know what is the most efficient way to do the very simple thing I want to do. I have a bunch of pictures in system memory that all get updated 60 times per second, and after every update I want to get each of them into a different IDirect3DSurface9 that has the USAGE_RENDERTARGET flag set (for use with WPF). I am using SharpDX but I can easily translate from native C++ if you're more comfortable using that.


I have a working sample but I'm not happy with the performance. It starts choking with about eight 640x480 renderers, while I can get 50 renderers using a software approach. It doesn't seem right that a hardware approach would be slower. :S
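
To put numbers on the workload, here is a small C++ sketch of the upload bandwidth involved. `uploadBytesPerSecond` is a hypothetical helper, and 4 bytes per pixel is assumed from the X8R8G8B8 format used for the surfaces.

```cpp
#include <cassert>
#include <cstdint>

// Bytes copied from system memory per second when every renderer is
// updated once per frame.
std::uint64_t uploadBytesPerSecond(int renderers, int width, int height,
                                   int bytesPerPixel, int fps) {
    assert(renderers > 0 && width > 0 && height > 0 && bytesPerPixel > 0 && fps > 0);
    return static_cast<std::uint64_t>(renderers) * width * height * bytesPerPixel * fps;
}
```

For example, `uploadBytesPerSecond(8, 640, 480, 4, 60)` is 589,824,000 bytes (about 0.59 GB/s), and the 50-renderer case is 3,686,400,000 bytes (about 3.7 GB/s).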


So far here's what I've done (using SharpDX, I hope d3d aficionados will understand easily):


Device creation (where s_hwnd is the window handle of the application):


    s_d3d = new Direct3DEx();
    var d3dpp = new PresentParameters
    {
        BackBufferCount = 1,
        BackBufferHeight = 1,
        BackBufferWidth = 1,
        BackBufferFormat = Format.Unknown,
        DeviceWindowHandle = s_hwnd,
        SwapEffect = SwapEffect.Discard,
        Windowed = true
    };
    s_device = new DeviceEx(s_d3d, 0, DeviceType.Hardware, s_hwnd,
        CreateFlags.HardwareVertexProcessing | CreateFlags.FpuPreserve | CreateFlags.Multithreaded,
        new PresentParameters[] { d3dpp }, new DisplayModeEx[] {});

This only happens once and the device is used to create all textures.

Each renderer gets its own surface, created thus:

    Surface.CreateRenderTargetEx(s_device, 640, 480, Format.X8R8G8B8, MultisampleType.None, 0, true, Usage.None);

At render time we copy the data (a byte array) to the renderer's surface; this happens for each renderer.

    public void Render(byte[] data)
    {
        // wpf code

        // d3d code
        var textureData = m_surface.LockRectangle(LockFlags.None);
        Marshal.Copy(data, 0, textureData.DataPointer, data.Length);

        // wpf code
        m_d3dImage.SetBackBuffer(D3DResourceType.IDirect3DSurface9, m_surface.NativePointer);
        m_d3dImage.AddDirtyRect(new Int32Rect(0, 0, 640, 480));
    }

I'm thinking that individually locking each texture is probably what is taking the most time; perhaps there is a texture format more appropriate for this task, or an asynchronous way to lock all textures at once? Would Direct3D10 or 11 provide a faster method? I've searched online but without success so far.
Thanks for your help!
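
One detail of the copy above: a single block copy into the locked surface is only correct when the pitch reported by the lock equals the row size (640 * 4 bytes here). The following is a minimal C++ sketch of a pitch-aware copy. `copyRows` is a hypothetical helper, and whether a given driver pads the rows is an assumption. It does not by itself address the performance question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `rows` rows of `srcRowBytes` bytes each from a tightly packed source
// into a destination whose rows start `dstPitch` bytes apart (the pitch
// reported when the surface is locked).
void copyRows(const std::uint8_t* src, std::size_t srcRowBytes,
              std::uint8_t* dst, std::size_t dstPitch, std::size_t rows) {
    assert(src != nullptr && dst != nullptr);
    assert(dstPitch >= srcRowBytes);
    if (dstPitch == srcRowBytes) {
        // No padding: one contiguous copy is enough.
        std::memcpy(dst, src, srcRowBytes * rows);
        return;
    }
    for (std::size_t row = 0; row < rows; ++row) {
        // Padding bytes at the end of each destination row are left untouched.
        std::memcpy(dst + row * dstPitch, src + row * srcRowBytes, srcRowBytes);
    }
}
```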


05 July 2013 - 02:36 PM

I have difficulty understanding the difference between the Discard and Sequential values of the DXGI_SWAP_EFFECT enumeration. The MSDN documentation states:

DXGI_SWAP_EFFECT_DISCARD: Use this flag to specify the bit-block transfer (bitblt) model and to specify that DXGI discard the contents of the back buffer after you call IDXGISwapChain1::Present1. This flag is valid for a swap chain with more than one back buffer, although, applications only have read and write access to buffer 0. Use this flag to enable the display driver to select the most efficient presentation technique for the swap chain.

DXGI_SWAP_EFFECT_SEQUENTIAL: Use this flag to specify the bitblt model and to specify that DXGI persist the contents of the back buffer after you call IDXGISwapChain1::Present1. Use this option to present the contents of the swap chain in order, from the first buffer (buffer 0) to the last buffer. This flag cannot be used with multisampling.

So, Sequential is for displaying the contents of the chain "in order", in other words, in the same order as you called Present(). If so, is Discard not in order? Obviously an older picture should never be shown before a newer one. What kind of "most efficient presentation technique" would this flag enable then?
What if your monitor's refresh rate is 60 Hz and your code is able to render at 90 fps? In this case, one out of every two Presents will find the queue full. What happens then? Does Present block until the next vsync, capping your rendering rate at 60 fps and introducing input lag, or does it replace the oldest buffer in the queue with the new one and allow you to go on with your rendering code as fast as you can? Do these DXGI_SWAP_EFFECT flags have any bearing on the issue?
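
The first of the two possibilities in the question can be explored with a toy model. This C++ sketch is not a description of what DXGI actually does: it simply assumes that Present blocks while the queue is full, and `simulateBlockingPresent` and `PresentStats` are hypothetical names.

```cpp
#include <cassert>

struct PresentStats {
    int presented;  // frames accepted into the queue
    int displayed;  // frames taken off the queue at a vsync
};

// Toy model only: time advances in ticks of 1/180 s, so rendering one frame
// takes 2 ticks (90 fps) and a vsync occurs every 3 ticks (60 Hz).
// ASSUMPTION: "Present" blocks while the queue already holds `capacity` frames.
PresentStats simulateBlockingPresent(int capacity, int ticks) {
    assert(capacity > 0 && ticks >= 0);
    int queued = 0;
    int presented = 0;
    int displayed = 0;
    int workLeft = 2;  // ticks of rendering left for the current frame
    for (int t = 1; t <= ticks; ++t) {
        // A vsync consumes the oldest queued frame, if there is one.
        if (t % 3 == 0 && queued > 0) {
            --queued;
            ++displayed;
        }
        if (workLeft > 0) {
            --workLeft;
        }
        // A finished frame is queued only when there is room; otherwise the
        // renderer stays blocked and tries again on the next tick.
        if (workLeft == 0 && queued < capacity) {
            ++queued;
            ++presented;
            workLeft = 2;
        }
    }
    return {presented, displayed};
}
```

Under this assumption, one simulated second (180 ticks) displays 60 frames whatever the queue capacity, and the number of presented frames exceeds that only by what is still waiting in the queue. In other words, the renderer ends up throttled to the refresh rate once the queue has filled.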

How to correctly synchronize a shared surface?

31 May 2013 - 08:51 AM



I am trying to share a direct3d9 surface between two processes. One process (let's call it A) writes to the surface, and the other (B) displays it on screen. Currently, process A does a StretchRect of its rendering surface to a shared surface, and then sets a flag in shared system memory to tell B that it's done. When B sees the flag, it then does a StretchRect of the shared surface to its own display surface. Process B then sets the flag again to tell A it is done.


It seems, however, that after the StretchRect on the shared surface returns, the texture has not necessarily finished copying, because sometimes Process B gets the previous picture, and sometimes there is even tearing (i.e. one half of picture N + one half of picture N + 1).


As I understand it, Direct3D is largely asynchronous under the hood, and does not ensure synchronisation between processes. I therefore need to ensure myself that Process A has finished copying before Process B displays the picture, and vice versa. Am I correct in my interpretation of the situation, and how would I achieve this? I am experimenting with LockRect() but I'm not sure if that's optimal or even guaranteed to work.
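
The flag protocol described above can be modelled as a small ownership state machine. This is a minimal C++ sketch under stated assumptions: `SurfaceHandshake` and its methods are hypothetical, and the flag is a `std::atomic` rather than a value in real cross-process shared memory. The step that matters is left as a precondition: `publish()` may only be called once the copy into the shared surface is known to have completed on the GPU. Direct3D 9 offers event queries (D3DQUERYTYPE_EVENT) as one mechanism for detecting that earlier commands have finished; whether that is the best fit here is not settled by this sketch.

```cpp
#include <atomic>
#include <cassert>

// Who currently owns the shared surface.
enum class Owner : int { Writer = 0, Reader = 1 };

// Single-writer, single-reader handshake modelled on the flag in the post.
class SurfaceHandshake {
public:
    // Process A: true when A may overwrite the shared surface.
    bool writerMayWrite() const {
        return owner_.load(std::memory_order_acquire) == Owner::Writer;
    }
    // Process A: hand the surface to B.
    // PRECONDITION: the copy into the shared surface has actually completed.
    void publish() {
        assert(writerMayWrite());
        owner_.store(Owner::Reader, std::memory_order_release);
    }
    // Process B: true when a complete picture is available.
    bool readerMayRead() const {
        return owner_.load(std::memory_order_acquire) == Owner::Reader;
    }
    // Process B: hand the surface back to A.
    // PRECONDITION: B's own copy out of the shared surface has completed.
    void release() {
        assert(readerMayRead());
        owner_.store(Owner::Writer, std::memory_order_release);
    }
private:
    std::atomic<Owner> owner_{Owner::Writer};
};
```

The model makes the two failure modes in the post easy to name: getting the previous picture or a torn one corresponds to calling `publish()` or `release()` before the corresponding copy has really finished.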