MysteryX

  1. I fixed the issue by compiling the 20 files as 55 resource files within the DLL. Initialization is considerably faster too.
  2. Except that D3DCompile didn't exist when D3DXCompileShader was created. Now I have D3dcompiler_47.dll ... it's a 3.53 MB file. I'm not sure about adding a 3.53 MB dependency to a 400 KB library just to simplify the use of a few files. Either that, or I work out how to generate and use all the necessary CSO files for the various combinations of settings.
  3. I'm also using D3DXLoadSurfaceFromSurface. According to them, it works when running an older script containing only .cso files but fails with the newer script containing .hlsl files. If it's *only* D3DXCompileShader that fails, and it has all these dependencies, then another option is to use D3DCompile instead and ship D3dcompiler_47.dll with it. And then I need to install the Windows SDK just for this...
  4. That's what I'm thinking, that there's some dependency that must be installed. It's working fine for everything else, but compiling the HLSL files requires something that is missing. Does anyone know what DLL dependency that would be?
  5. One thing I was thinking about is that I'm using a deprecated function. They recommend using D3DCompile instead. After some searching, it turns out that with D3DCompile some users on Windows 7 get an error that they're missing D3dcompiler_47.dll, so it's not any better. Writing a log would be complicated... I'd have to write the code for it and send it to that user. It can be done, but I'd first want to get an idea of where it could be failing. Yes, the code is multi-threaded. In this case, it is failing during the initialization phase that is being called here. In his case, a single instance of the class gets created, so there cannot be a race condition during initialization. Any other ideas? Is there some kind of system or DLL dependency for calling this function?
  6. Debug build works fine and I shipped release.   I was previously shipping compiled shaders but some shaders have many variants and pre-compiling with all the variants would become complicated while they compile on the fly very quickly.
  7. I have DX9 code working perfectly fine on my computer. However, some users are telling me that they're getting an error opening up and compiling HLSL files. Here's where the files get compiled. https://github.com/mysteryx93/AviSynthShader/blob/master/Src/D3D9RenderImpl.cpp#L379   One such user told me he's using Windows 7 Ultimate 64bit (6.1 Build 7601), DirectX version 11.   Why is this failing for some users?
  8. MysteryX

    HLSL Dithering

    This looks close enough. It appears to be working, unless I'm missing something.

    sampler s0 : register(s0);
    uniform sampler s1 : register(s1); // the Bayer Matrix texture
    float4 p0 : register(c0);
    float2 p1 : register(c1);
    uniform float4 MatrixSize : register(c2); // width, height, 1/width, 1/height

    #define width (p0[0])
    #define height (p0[1])
    #define px (p1[0])
    #define py (p1[1])

    float Bayer(float2 uv) {
        uv = uv * MatrixSize.zw;
        float val = dot(tex2D(s1, uv).bg, float2(256.0 * 255.0, 255.0));
        val = val * MatrixSize.z * MatrixSize.w;
        return val;
    }

    // -- Main code --
    float4 main(float2 tex : TEXCOORD0) : COLOR {
        float4 c0 = tex2D(s0, tex);
        c0.xyz += ((Bayer(tex) - 128.0) / 256.0 / 255.0);
        return c0;
    }
  9. MysteryX

    HLSL Dithering

    I wrote the code to create the Bayer Matrix, trimming the 32x32 matrix from MPC-HC at 16x16 and copying each value into the B and G fields of a BGRA texture (does the order between both byte fields matter?) #include "Dither.h" // Dither matrix in 16-bit floating point format const unsigned short DITHER_MATRIX[DITHER_MATRIX_SIZE][DITHER_MATRIX_SIZE] = { 0x2c90, 0x38f4, 0x3bba, 0x29e0, 0x35f4, 0x3230, 0x3bbc, 0x3924, 0x3a46, 0x3644, 0x39e2, 0x370c, 0x3444, 0x3b1a, 0x3140, 0x39d2, 0x385a, 0x3b24, 0x2c10, 0x38c6, 0x3808, 0x2780, 0x3bbe, 0x37f8, 0x350c, 0x3a6c, 0x3368, 0x3bc0, 0x3000, 0x3886, 0x31b0, 0x3554, 0x3a94, 0x3618, 0x3430, 0x3a34, 0x3834, 0x39fe, 0x2740, 0x3758, 0x3494, 0x3b7a, 0x2700, 0x3958, 0x3858, 0x3a24, 0x364c, 0x3bc2, 0x3278, 0x3a22, 0x353c, 0x39de, 0x3268, 0x3a98, 0x36fc, 0x2ed0, 0x39e0, 0x30f0, 0x381a, 0x3996, 0x35ac, 0x3af2, 0x39b8, 0x37bc, 0x3250, 0x39dc, 0x3800, 0x30e8, 0x3b42, 0x34d4, 0x3970, 0x3afe, 0x3020, 0x3898, 0x33e8, 0x3b34, 0x2e10, 0x3320, 0x391a, 0x26c0, 0x3784, 0x38de, 0x3060, 0x3b5c, 0x3600, 0x38e6, 0x3490, 0x3b2a, 0x387a, 0x365c, 0x3b3c, 0x2be0, 0x37ac, 0x33d8, 0x2680, 0x3b98, 0x38d6, 0x2a60, 0x3b7e, 0x391e, 0x36d0, 0x2fe0, 0x3812, 0x32a0, 0x3a84, 0x36b0, 0x3a50, 0x357c, 0x37dc, 0x3b68, 0x3594, 0x3aca, 0x344c, 0x3a7c, 0x3674, 0x3884, 0x2d30, 0x3a48, 0x3170, 0x398e, 0x2900, 0x3a30, 0x34bc, 0x38ea, 0x3b70, 0x3a3c, 0x3852, 0x3460, 0x3b04, 0x37a0, 0x351c, 0x2d40, 0x3a80, 0x394e, 0x3b84, 0x3614, 0x3900, 0x2b20, 0x396c, 0x31b8, 0x38ca, 0x3a0c, 0x3038, 0x385c, 0x39a2, 0x2c70, 0x3ba2, 0x3464, 0x3992, 0x36dc, 0x3bc4, 0x3580, 0x3824, 0x32d0, 0x3abc, 0x2ec0, 0x3560, 0x30f8, 0x3974, 0x3610, 0x3a12, 0x3110, 0x3aaa, 0x38a2, 0x35e4, 0x341c, 0x28c0, 0x3a02, 0x34a8, 0x3b60, 0x3790, 0x3aa2, 0x2c40, 0x346c, 0x373c, 0x3bc6, 0x32f0, 0x37e8, 0x391c, 0x3100, 0x3af6, 0x2640, 0x3868, 0x3098, 0x3b3e, 0x3944, 0x3620, 0x3870, 0x39da, 0x374c, 0x3bc8, 0x2e20, 0x3804, 0x3932, 0x3660, 0x3260, 0x3bca, 0x38ce, 0x3ade, 0x382e, 0x30a0, 0x389e, 0x33a0, 0x363c, 0x3b86, 
0x3910, 0x3a58, 0x2820, 0x36a0, 0x3b28, 0x34e0, 0x3a40, 0x3768, 0x3510, 0x3a54, 0x390e, 0x36e8, 0x2ae0, 0x3bcc, 0x31a0, 0x3aa4, 0x2600, 0x38cc, 0x3400, 0x3ac4, 0x2800, 0x3b4a, 0x39ee, 0x2cc0, 0x3764, 0x31c8, 0x35cc, 0x3bb6, 0x39a8, 0x2f30, 0x3a1e, 0x3816, 0x3160, 0x35b0, 0x389a, 0x3a86, 0x3070, 0x3848, 0x2d70, 0x38ba, 0x3baa, 0x2e60, 0x3414, 0x3ae4, 0x3544, 0x3a06, 0x37fc, 0x347c, 0x36d8, 0x3b12, 0x35a4};

    HRESULT __stdcall CopyDitherMatrixToSurface(InputTexture* dst, IScriptEnvironment* env) {
        // Copy into BG values of BGRA texture
        int TempMatrix[DITHER_MATRIX_SIZE][DITHER_MATRIX_SIZE]{ };
        short* pOut;
        for (int i = 0; i < DITHER_MATRIX_SIZE; ++i) {
            for (int j = 0; j < DITHER_MATRIX_SIZE; ++j) {
                *(short*)&TempMatrix[i][j] = DITHER_MATRIX[i][j];
            }
        }
        HR(CopyAviSynthToBuffer((byte*)&TempMatrix, 4 * DITHER_MATRIX_SIZE, 1, DITHER_MATRIX_SIZE, DITHER_MATRIX_SIZE, dst, env));
        return S_OK;
    }

    HRESULT __stdcall CopyAviSynthToBuffer(const byte* src, int srcPitch, int clipPrecision, int width, int height, InputTexture* dst, IScriptEnvironment* env) {
        // Copies source frame into main surface buffer, or into additional input textures
        RECT SrcRect;
        SrcRect.top = 0;
        SrcRect.left = 0;
        SrcRect.right = width;
        SrcRect.bottom = height;
        HR(D3DXLoadSurfaceFromMemory(dst->Surface, nullptr, nullptr, src, GetD3DFormat(clipPrecision, false), srcPitch, nullptr, &SrcRect, D3DX_FILTER_NONE, 0));
        return S_OK;
    }

    However, I don't understand the logic of the shader.

    sampler s0 : register(s0);
    uniform sampler s1 : register(s1); // the Bayer Matrix texture
    float4 p0 : register(c0);
    float2 p1 : register(c1);
    uniform float4 MatrixSize : register(c2); // width, height, 1/width, 1/height

    #define width (p0[0])
    #define height (p0[1])
    #define px (p1[0])
    #define py (p1[1])

    float Bayer(float2 uv) {
        uv = uv * MatrixSize.zw;
        float2 val = dot(tex2D(s1, uv).bg, float2(256.0 * 255.0, 255.0));
        val = val * MatrixSize.z * MatrixSize.w;
        return val;
    }

    // -- Main code --
    float4 main(float2 tex : TEXCOORD0) : COLOR {
        float4 c0 = tex2D(s0, tex);
        c0.x = c0.x + 1 / Bayer(tex);
        c0.y = c0.y + 1 / Bayer(tex);
        c0.z = c0.z + 1 / Bayer(tex);
        return c0;
    }

    Can someone review this HLSL code?

    Thanks
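On the question of whether the order of the two byte fields matters, here is a minimal CPU-side sketch of the round trip. It assumes a little-endian host, the B, G, R, A in-memory byte order of D3DFMT_A8R8G8B8, and point sampling that returns each channel as byte / 255. The names `Bgra`, `PackLowHigh`, `DecodeBG` and `DecodeGB` are hypothetical and only mirror what the `short*` store and the shader's `dot` appear to do.

```cpp
#include <cmath>
#include <cstdint>

// One pixel as it would sit in memory (assumed B, G, R, A byte order).
struct Bgra { std::uint8_t b, g, r, a; };

// Equivalent of storing a 16-bit value through a short* at the start of the
// pixel on a little-endian host: low byte lands in B, high byte in G.
Bgra PackLowHigh(std::uint16_t value) {
    Bgra p{};
    p.b = static_cast<std::uint8_t>(value & 0xFF);
    p.g = static_cast<std::uint8_t>(value >> 8);
    return p;
}

// dot(sample.bg, float2(256 * 255, 255)): B is weighted as the high byte.
long DecodeBG(const Bgra& p) {
    return std::lround((p.b / 255.0) * (256.0 * 255.0) + (p.g / 255.0) * 255.0);
}

// dot(sample.gb, float2(256 * 255, 255)): G is weighted as the high byte.
long DecodeGB(const Bgra& p) {
    return std::lround((p.g / 255.0) * (256.0 * 255.0) + (p.b / 255.0) * 255.0);
}
```

Under these assumptions the `.bg` form returns the byte-swapped value (0x2c90 comes back as 0x902c), while `.gb`, or swapped weights, returns the stored integer. The sketch treats the table entries as plain integers; the source comment describes them as 16-bit floating point values, which it does not interpret.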
  10. MysteryX

    HLSL Dithering

    I'm really not good with HLSL but here's what I managed to fetch. Is this script correct? Either way it compiles and will be easy to change after your feedback. It just gives "warning X3206: implicit truncation of vector type" on "return val".

    sampler s0 : register(s0);
    sampler s1 : register(s1); // the Bayer Matrix texture
    float4 p0 : register(c0);
    float2 p1 : register(c1);
    float2 MatrixSize : register(c2);

    #define width (p0[0])
    #define height (p0[1])
    #define px (p1[0])
    #define py (p1[1])

    float Bayer(float2 uv) {
        uv = uv * p1.xy;
        float2 val = dot(tex2D(s1, uv).rg, float2(256.0 * 255.0, 255.0));
        val = val * MatrixSize.x * MatrixSize.y;
        return val;
    }

    // -- Main code --
    float4 main(float2 tex : TEXCOORD0) : COLOR {
        float4 c0 = tex2D(s0, tex);
        c0.x = c0.x + Bayer(tex);
        c0.y = c0.y + Bayer(tex);
        c0.z = c0.z + Bayer(tex);
        return c0;
    }

    Do I apply the same noise on all 3 channels?

    If I'm copying these values (0x2c90, 0x38f4, 0x3bba, 0x29e0, etc) into a D3DFMT_A8R8G8B8 texture, it would make more sense to use 'bg' instead of 'rg' and copy each value as the first 2 bytes of the 4-byte pixel.
  11. MysteryX

    HLSL Dithering

      OK. This needs to be called from another PS_3_0 HLSL file that has a Main entry point, correct? What would that main function look like?     I found this 32x32 Bayer Matrix in MPC-HC's source code. I can just trim it into 16x16 and discard the extra values, correct?
  12. MysteryX

    HLSL Dithering

    OK, your version allows using a 4x4, 5x5 or 6x6 matrix while D3DX_FILTER_DITHER uses only 4x4. Result-wise, I suppose it should be the same?   However, I ran into an implementation problem. It needs to be applied when I go down from 16-bit to 8-bit. It also needs to be done on the GPU because transferring back to the CPU is a serious bottleneck. I need to read from the Render Target, copy into another texture while applying Dither, and then transfer to the CPU. However, it appears that transferring back to the CPU with D3DXLoadSurfaceFromSurface is *MUCH* slower than using GetRenderTargetData (no idea why).   One work-around would be to re-run the dithered image through another loop of processing (ugly design).   Another work-around would be to use D3DRS_DITHERENABLE while rendering, but I have no clue what kind of dithering it applies.   Or perhaps I'm best off adapting the code you posted. I'll need to translate it into PS_3_0 and C++ (and I know nothing about writing HLSL).   Oh, your code uses a 16x16, 32x32 or 64x64 matrix instead of 4x4! That's a considerable difference, although I'm not sure which is best before doing x265 encoding.     Which means that to convert 16-bit (65536 levels) to 8-bit (256 levels), I need a ratio of 256, which means I need a 16x16 grid. The 4x4 grid definitely won't be optimal. I could code the generation of the matrix, or perhaps it's best to simply hardcode the values. Does anyone know where I can find it?
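Generating the matrix rather than hard-coding it can be sketched as follows. This is a minimal sketch of the classic recursive Bayer construction, not the MPC-HC table mentioned elsewhere in the thread; `MakeBayerMatrix` is a hypothetical name. A 16x16 result holds each value 0..255 exactly once, which matches the 256:1 ratio between 16-bit and 8-bit levels.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Builds a size x size Bayer threshold matrix (row-major), where size is a
// power of two. Each doubling step maps a matrix M to
//   [ 4M     4M+2 ]
//   [ 4M+3   4M+1 ]
std::vector<int> MakeBayerMatrix(int size) {
    assert(size > 0 && (size & (size - 1)) == 0);  // power of two only
    std::vector<int> m(1, 0);                      // 1x1 base case
    for (int n = 1; n < size; n *= 2) {
        const int w = 2 * n;                       // width after doubling
        std::vector<int> next(static_cast<std::size_t>(w) * w);
        for (int y = 0; y < n; ++y) {
            for (int x = 0; x < n; ++x) {
                const int v = 4 * m[y * n + x];
                next[y * w + x] = v;               // top-left
                next[y * w + x + n] = v + 2;       // top-right
                next[(y + n) * w + x] = v + 3;     // bottom-left
                next[(y + n) * w + x + n] = v + 1; // bottom-right
            }
        }
        m = std::move(next);
    }
    return m;
}
```

Calling `MakeBayerMatrix(16)` yields 256 entries that could then be copied into a texture; how they are packed into texels is a separate choice.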
  13. MysteryX

    HLSL Dithering

    Thanks! So I'd have to translate that TypeScript into C++?   I just found this.   D3DXLoadSurfaceFromSurface has the option D3DX_FILTER_DITHER: "The resulting image must be dithered using a 4x4 ordered dither algorithm."   Unlike D3DRS_DITHERENABLE, this one clearly defines what it does.   Is that what I'm looking for, or is your implementation better?
  14. MysteryX

    HLSL Dithering

    I'm processing video data through a series of HLSL shaders with 16-bit-depth before returning 8-bit-depth data back to the CPU, using DirectX 9.   I just realized I'm not applying any dithering!   I saw this option in DirectX. What kind of dithering does it apply? Some people told me it probably depends on the GPU and I shouldn't rely on it.

    m_pDevice->SetRenderState(D3DRS_DITHERENABLE, TRUE);

    I would like to apply Ordered Dither at the last stage of processing. Has anyone done such an implementation?
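For reference, ordered dithering at the 16-bit to 8-bit step can be sketched on the CPU as below. This is an illustrative sketch using the standard 4x4 Bayer matrix, not what D3DRS_DITHERENABLE does (that is not specified in this thread); `kBayer4` and `DitherTo8Bit` are hypothetical names. A shader version would apply the same per-pixel threshold before the final quantization.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Standard 4x4 Bayer threshold matrix, values 0..15.
constexpr int kBayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

// Quantizes one 16-bit sample to 8 bits. The pixel position picks a threshold
// in (0, 1) that is added before flooring, so the fraction lost by the
// quantization decides how many pixels of a tile round up.
std::uint8_t DitherTo8Bit(std::uint16_t value, int x, int y) {
    assert(x >= 0 && y >= 0);
    const double scaled = value / 257.0;  // maps 65535 to 255.0
    const double threshold = (kBayer4[y % 4][x % 4] + 0.5) / 16.0;
    const double q = std::floor(scaled + threshold);
    return static_cast<std::uint8_t>(std::min(q, 255.0));
}
```

A flat input of about 100.5 on the 8-bit scale comes out as an even mix of 100 and 101 across each 4x4 tile, so the average level is preserved instead of being rounded away.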
  15. MysteryX

    Using Max Capacity with 1 Engine

    I found the solution. The main issue was that I was using the flag D3DPRESENT_INTERVAL_DEFAULT instead of D3DPRESENT_INTERVAL_IMMEDIATE, which limited performance as if I wanted to display at 60 Hz (except that I don't display anything on the screen).   From there, I did several optimizations.   With one engine and 8 threads calling it, and locking the thread for the part that uses the renderer (excluding memory transfer in/out), I can now get decent performance.   I get the best performance by creating 2 devices and alternating between them. Frame 1 goes to device A, frame 2 goes to device B and so on.   Now the performance is pretty good! Almost twice as fast as before.
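The alternating scheme can be sketched without any Direct3D code. This is a minimal sketch in which `RenderSlot` is a hypothetical stand-in for one device plus its lock, and `SlotPool` is a hypothetical helper; the real code would hold an IDirect3DDevice9 per slot and do the memory transfers outside the lock.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Stand-in for one rendering device and the lock that guards it.
struct RenderSlot {
    std::mutex lock;
    int framesRendered = 0;
};

class SlotPool {
public:
    // Frames alternate between the two slots: odd frames share one slot,
    // even frames share the other.
    RenderSlot& SlotForFrame(std::size_t frameNumber) {
        return slots_[frameNumber % slots_.size()];
    }

    // Runs the render step for a frame while holding only that slot's lock,
    // so two worker threads can render on different slots at the same time.
    template <typename Fn>
    void Render(std::size_t frameNumber, Fn&& work) {
        RenderSlot& slot = SlotForFrame(frameNumber);
        std::lock_guard<std::mutex> guard(slot.lock);
        work(slot);
        ++slot.framesRendered;
    }

private:
    std::array<RenderSlot, 2> slots_;
};
```

Each worker thread would call `pool.Render(frameNumber, ...)` with the work for its frame; consecutive frames then contend for different locks.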