About yuri410
  1. The constant table is reconstructed by D3DX by extracting the CTAB section from the bytecode; it is just a comment section.

     http://www.gamedev.net/topic/648016-replacement-for-id3dxconstanttable/

     From the looks of it, sebi707 provided some code from Wine for reading the data in this section. Once you can read it, you can write it as well. The way I did it was to strip that comment section and append one in my own format describing the constants.

     Edit: unfortunately his code does not seem to handle the default values, but Wine's does seem to check for them.
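The stripping step can be sketched on the CPU. A minimal sketch, assuming the standard D3D9 token layout (little-endian DWORDs, the D3DSIO_COMMENT opcode in the low 16 bits of a token, payload size in bits 16..30); `strip_ctab` and the synthetic bytecode are my own, not from the thread:

```python
import struct

COMMENT_OPCODE = 0xFFFE   # D3DSIO_COMMENT in the low 16 bits of a token
CTAB_FOURCC = 0x42415443  # 'CTAB' as a little-endian DWORD

def strip_ctab(bytecode: bytes) -> bytes:
    """Drop the CTAB comment block from D3D9 shader bytecode.

    The first DWORD is the version token; a comment token carries the
    number of payload DWORDs that follow it in bits 16..30.
    """
    count = len(bytecode) // 4
    tokens = list(struct.unpack('<%dI' % count, bytecode))
    out = [tokens[0]]  # keep the version token
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok & 0xFFFF == COMMENT_OPCODE:
            size = (tok >> 16) & 0x7FFF
            # skip the whole comment if its payload starts with 'CTAB'
            if size > 0 and i + 1 < len(tokens) and tokens[i + 1] == CTAB_FOURCC:
                i += 1 + size
                continue
        out.append(tok)
        i += 1
    return struct.pack('<%dI' % len(out), *out)
```

A custom constant-description comment could then be appended the same way, as long as its comment token declares the right payload size.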
  2. Have you considered using a smaller format for the packed texture? This may be less relevant in your case, but in my D3D9 setup I had a depth texture in A32B32G32R32F while only sampling one channel; sampling an R32F instead was clearly faster. Since the bottleneck is the number of samples taken on the depth value (14 in my case), the bandwidth cost is higher for the larger format. That is just my case though. I believe your blur pass (3) uses a similar calculation pattern, right? (accumulating a series of samples combined with a value from a previous texture lookup)

     Also, my noise texture is just 2x2, so I replaced it with an array lookup; IIRC that is insignificant performance-wise for such a small texture, though I am not sure about rand(). I also used a 16-bit floating-point buffer for the normal input, which is a tiny bit faster. I believe this is because the dependent calculation later on uses the normal vector, which becomes cheaper to fetch.
  3. Camera transformation enquiry

     I think what he is saying is to extract the three axis vectors from the current matrix:

     dx = { M11, M12, M13 }
     dy = { M21, M22, M23 }
     dz = { M31, M32, M33 }

     Then you pick one of these, or a combination. Think of them as the axes of the local coordinate space, expressed in the global coordinate system. Once you have your translation vector, add it to the { M41, M42, M43 } row. In your case, if "sideways" can be represented by dx alone, that is simpler.
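The row-extraction idea above can be sketched in a few lines. A minimal pure-Python sketch for row-major 4x4 matrices (`move_along_local_axes` is my own name, not from the thread):

```python
def move_along_local_axes(world, amounts):
    """Translate a row-major 4x4 transform (list of 4 rows) along its own axes.

    Rows 0..2 are the local x/y/z axes (dx = {M11, M12, M13}, etc.);
    row 3 holds the translation {M41, M42, M43}.
    """
    m = [row[:] for row in world]
    for axis_row, amount in zip((0, 1, 2), amounts):
        for c in range(3):
            # move `amount` units along that local axis
            m[3][c] += amount * m[axis_row][c]
    return m
```

For an identity matrix this is an ordinary translation; for a rotated matrix the movement follows the rotated axes, which is exactly the "sideways in local space" behavior.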
  4. Well, one of the ideas is to put secret checks deep inside the game code. When a pirated copy is detected, in addition to preventing the game from loading, also deliberately mess up some of the game logic or rendering. Make it look intentional, or the software will just look buggy; think of the giant toilet thing in The Sims 4. It is much harder to track down a single multiplication in disassembly when there are no obvious API calls around it. Also, try not to rely on a single type of executable check, or that becomes one point of weakness for crackers to focus on.
  5. When blurring a monochromatic texture with a separated X/Y Gaussian blur, one can use the following pixel shader:

     #define SAMPLE_COUNT 25
     sampler2D tex : register(s0);
     float weights[SAMPLE_COUNT];
     float2 offsets[SAMPLE_COUNT];

     float4 main(float2 uv : TEXCOORD0) : COLOR
     {
         float color = 0;
         for (int i = 0; i < SAMPLE_COUNT; i++)
             color += tex2D(tex, uv + offsets[i]).r * weights[i];
         return float4(color, color, color, 1);
     }

     The issue is that even if only one channel is read, or an L8 format is used, the number of texture lookups stays the same, and with a large sample count that is a significant performance loss. So the idea is: why not pack each 2x2 pixel block into one texel of an A8R8G8B8 texture and blur all four channels together? After packing, both the texture dimensions and the number of blur samples are halved.

     So I compared two blurs: the traditional one, and a packed one whose pipeline is Pack -> BlurX -> BlurY -> Unpack. Blurring a 1280x720 image took 1.51 ms (661 fps) in traditional mode and 0.58 ms (1716 fps) with the packed blur.

     There is one downside: the pixels within each 2x2 quad are not blurred against each other, so some detail can remain. I ended up applying bilinear filtering when unpacking the texture. I checked the diff, shown below.

     [Images: Original / Traditional Gauss blur / Packed Gauss blur / Diff shown at a high scale, which seems to be in the hundreds range in linear.]

     I guess this would come in handy for blurring objects' drop shadows on screen.

     Here is the packer:

     float4 main(float2 uv : TEXCOORD0) : COLOR
     {
         float4 r;
         r[0] = tex2D(tex, uv).r;
         r[1] = tex2D(tex, uv + float2(1.0 / TEX_WIDTH, 0.0)).r;
         r[2] = tex2D(tex, uv + float2(0.0, 1.0 / TEX_HEIGHT)).r;
         r[3] = tex2D(tex, uv + float2(1.0 / TEX_WIDTH, 1.0 / TEX_HEIGHT)).r;
         return r;
     }

     And here is the unpacker:

     float4 main(float2 uv : TEXCOORD0) : COLOR
     {
         float2 c = float2(TEX_WIDTH, TEX_HEIGHT) * uv / 2;
         float2 cfrac = frac(c);
         float4 colors = tex2D(tex, uv);
         float lvl = lerp(lerp(colors[0], colors[1], cfrac.x),
                          lerp(colors[2], colors[3], cfrac.x),
                          cfrac.y);
         return float4(lvl, lvl, lvl, 1);
     }
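To see why the packed blur works, note that the four channels never mix: channel k of the packed image is just sub-grid k of the original, so one blur pass over the packed texture blurs all four sub-grids at once with half as many taps. A rough CPU sketch of the pack step and a packed horizontal blur (pure Python with clamp addressing; all names are mine, not from the post):

```python
def pack_2x2(img):
    """Pack each 2x2 block of a grayscale image into one 4-channel texel:
    channels hold (top-left, top-right, bottom-left, bottom-right)."""
    return [[(img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1])
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def blur_x_packed(packed, weights):
    """One horizontal blur pass over the packed image; all four channels are
    blurred together, so each tap effectively covers two source columns."""
    w = len(packed[0])
    r = len(weights) // 2
    out = []
    for row in packed:
        new_row = []
        for x in range(w):
            acc = [0.0, 0.0, 0.0, 0.0]
            for i, wt in enumerate(weights):
                px = row[min(max(x + i - r, 0), w - 1)]  # clamp at the edges
                for c in range(4):
                    acc[c] += px[c] * wt
            new_row.append(tuple(acc))
        out.append(new_row)
    return out
```

The missing cross-talk inside each 2x2 quad is also visible here: a change to the top-right pixel of a block can never influence channel 0 of any output texel, which is exactly the detail the bilinear unpack has to smooth over.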
  6. Finally worked this out. There are two issues combined.

     First, getPosition(): compared to sampling the position from a texture, it actually behaves differently, because it does not simulate the texture filtering, in this case NearestPoint. getPosition() is a continuous function, but the depth value is discrete. When sampling the surrounding points in SSAO, if the texture coordinate is not aligned the way NearestPoint sampling would align it, the sample is incorrect: continuous xy combined with a discrete z.

     float3 positionFromDepth(float2 uv)
     {
         float2 alignedUV = uv;
         alignedUV = saturate(alignedUV);                           // clamp
         alignedUV = floor(alignedUV * screenSize) * invScreenSize; // NearestPoint
         float3 dir = 1;
         dir.xy *= alignedUV * float2(2, -2) - float2(1, -1);
         dir.xy *= unprojScale.xy + unprojScale.zw; // only meant to increase precision; seems useless, a float2 unprojScale works too
         dir *= tex2D(DepthTex, uv).zzz;
         return dir;
     }

     So I tried to make it behave like NearestPoint, but never got it perfect. I am not sure why, but the HLSL intrinsics always seem to have less accuracy. I simulated the SSAO effect on the CPU, using floor() to align the texture coordinates, and it works perfectly there; not in the shader though, where I always get noise with horizontal/vertical lines mixed into the result.

     Luckily, I found the distortion always shows up when the camera is near +45, -45, +135, -135 degrees (top-down RTS camera). So as a workaround I pre-rotate the matrix by 30 degrees around the Y axis, and for now the artifacts are gone. I suspect this is caused by edge conditions in the floor() function: at those angles the SSAO sample offsets may land on texel boundaries where floor() flips unexpectedly.

     I am going to test this a bit more and see if the distortion shows up in other conditions.

     Edit/Update: the source of this noise distortion has been identified; floor() is innocent. Correct texture filtering still needs to be simulated when unprojecting.
     But consider the following, which only involves textures. DepthTex is the view-space position buffer; when only the z component is sampled, treat it as a view-Z depth buffer.

     float3 positionFromDepth(float2 uv)
     {
         float3 dir = tex2D(SameDepthTex, uv).xyz; // SameDepthTex and DepthTex are the same
         dir = normalize(dir);
         dir /= dir.z;                             // normalize to the z=1 plane
         dir *= tex2D(DepthTex, uv).zzz;
         return dir;
     }

     Or, equivalently:

     float3 positionFromDepth(float2 uv)
     {
         float3 dir = 1;
         dir.xy = tex2D(xyTex, uv).rg; // xyTex holds the same unprojection as before, but computed on the CPU
         dir *= tex2D(DepthTex, uv).zzz;
         return dir;
     }

     Compared to:

     float3 getPosition(float2 uv)
     {
         return tex2D(DepthTex, uv).xyz;
     }

     The first two lose precision because they multiply by the large Z value. The good news is that I no longer need to worry about the "mysterious" part of it, since the source of the problem is identified. I'll just add an option in the options menu to turn the optimization off.
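The floor()-based NearestPoint alignment in positionFromDepth above can be reproduced on the CPU. A pure-Python sketch (the texel-corner convention, D3D9 half-texel issues aside, and all names are my assumptions, not from the post):

```python
import math

def aligned_uv(u, v, width, height):
    """Snap a continuous UV to the texel that NearestPoint sampling would
    pick: clamp to [0, 1] (saturate), then floor to the texel corner."""
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    return (math.floor(u * width) / width, math.floor(v * height) / height)

def point_sample(depth, u, v):
    """Point-sample a depth grid (list of rows) the way NearestPoint does."""
    h, w = len(depth), len(depth[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return depth[y][x]
```

The property the shader relies on: aligning the UV never changes which texel the depth fetch hits, so the continuous xy computed from the aligned UV stays consistent with the discrete z that is actually sampled.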
  7. Based on my testing, the precision ranks as follows (worst to best):

     Z/W <= frustum-corner ray < normalized unprojected ray to the far plane * ray length < unprojected ray to the z=1 plane * viewZ

     The idea is to use as few divisions and intrinsic functions as possible, for less error. Maybe it is just my implementation, though.

     However, I figured out something interesting: this SSAO artifact has nothing to do with this precision error.

     The weird thing is, for the exact same data, if you sample xyz from a texture and use it, the result is correct. But if you sample z and calculate xy', there are artifacts, even though xy' == xy. I verified this by forcing the xy stored in the texture to always equal the calculated xy'. So now I am playing with compiler options in case this has anything to do with optimization.
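The compounding of float32 rounding with a large view Z can be illustrated on the CPU. A sketch (pure Python; the `f32` round-trip through struct is my stand-in for shader precision, and all the value ranges are made up):

```python
import random
import struct

def f32(x):
    """Round a Python float to float32, standing in for shader arithmetic."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def reconstruct_x_f32(ndc_x, unproj_scale_x, view_z):
    """x = ndc.x * unprojScale.x * viewZ with every step rounded to float32."""
    return f32(f32(f32(ndc_x) * f32(unproj_scale_x)) * f32(view_z))

random.seed(1)
max_err = 0.0
for _ in range(1000):
    ndc_x = random.uniform(-1.0, 1.0)
    scale = random.uniform(0.3, 0.9)        # unprojScale "around 0.x", as in the post
    view_z = random.uniform(10.0, 10000.0)  # distant pixels multiply a big Z
    exact = ndc_x * scale * view_z          # double-precision reference
    max_err = max(max_err, abs(reconstruct_x_f32(ndc_x, scale, view_z) - exact))
```

The error is tiny in relative terms but grows proportionally with view_z in absolute terms, which matches the observation that far objects show a scaled-up diff.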
  8. Alright, I tried simulating the result in a C# program in double precision. It turns out there is no way this can be made exact. The proof needs only one step: unproject a ray to the far plane and compare its direction against the ray to the corresponding position stored in the position buffer, both in view space. They never match 100%, even with the standard way of unprojecting, not just the simplified one. The pattern of this precision error matches the one above. I suspect it is caused by imprecision in rasterization. Now the only way to keep this optimization is a SSAO variant that tolerates error better, and that is going to be very complicated, since the current one fits the graphical style very well...
  9. Hello,

     I have a SSAO effect that operates on positions and normals. Initially it samples the position buffer (A32B32G32R32F), which uses a lot of bandwidth, so I decided to optimize it: reconstruct the position from an R32F buffer holding the view-space z, so the bandwidth usage is much smaller. Based on my testing this should save 2 ms of GPU time on my machine at 1080p.

     The issue is that the SSAO effect is sensitive to precision errors. With the large position buffer it works just fine; with depth reconstruction I get annoying artifacts that jitter as the camera moves. I have been trying to figure out the cause and made a little progress: there are precision errors in the reconstructed position, and the error is at least in the x and y of the calculated position, not z. So I am asking whether there are any areas worth looking at to improve the precision.

     I store the view-space Z in an earlier pass, like:

     VS: float4 viewPos = mul(input.Position, worldView);
         output.ViewZ = viewPos.z;
     PS: output.Color = input.ViewZ;

     In a later pass, I reconstruct the position like this:

     float2 unprojScale : register(c0);
     float2 invScreenSizeHalf : register(c1);
     sampler2D DepthTex : register(s0);

     float3 getPosition(float2 uv)
     {
         float2 b = invScreenSizeHalf;
         float3 dir;
         dir.xy = (uv - b) * float2(2, -2) - float2(1, -1);
         dir.xy *= unprojScale;
         dir.z = 1;
         return dir * tex2D(DepthTex, uv).r;
     }

     unprojScale.x is the first element of the first row of the inverse projection matrix, and unprojScale.y is the second element of the second row. This is effectively identical to using the inverse projection directly, because the other elements are either 0 or unused. invScreenSizeHalf is 0.5 / ViewportSize.

     Here is a diff comparing the original position against the reconstructed position. The error only shows up after a 10000x scale; otherwise it is too small to be visible:

     float2 diff = posFromDepth.xy - posOrig.xy;
     float d = length(diff) * 10000;

     As mentioned earlier, the z of the reconstructed position is identical to the source; I also checked the render target dump. The error values flash a lot when the camera moves, and the error appears to correlate with the mesh triangles even when they are completely flat. The bright object on the right is far away, and clearly its error has been scaled up by multiplying by the large view z. I am not sure how this error in x and y is caused by the unprojection; the values in unprojScale are around 0.x, nothing extreme.
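The claim that the two-scalar reconstruction matches the full inverse projection can be checked on the CPU. A sketch assuming a standard D3D-style row-vector perspective projection, where only P11 and P22 affect NDC x and y (the fov/aspect values are made up):

```python
import math

def project_xy(p, fov_y, aspect):
    """NDC x/y of a view-space point under a D3D-style perspective matrix
    (row-vector convention); the z/w row does not affect x and y."""
    h = 1.0 / math.tan(fov_y / 2)  # P22
    w = h / aspect                 # P11
    x, y, z = p
    return (x * w / z, y * h / z)  # divide by clip.w, which equals view z

def reconstruct(ndc_x, ndc_y, view_z, fov_y, aspect):
    """position = dir * viewZ with dir.xy = ndc * unprojScale, dir.z = 1;
    unprojScale is (1/P11, 1/P22), the only inverse-projection terms used."""
    h = 1.0 / math.tan(fov_y / 2)
    w = h / aspect
    return (ndc_x / w * view_z, ndc_y / h * view_z, view_z)
```

In exact arithmetic the round trip recovers the original view-space point, so the artifacts discussed in this thread come from finite precision (and, per the later posts, from skipping the texture-filtering simulation), not from the algebra.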