galop1n last won the day on August 29

galop1n had the most liked content!

Community Reputation

977 Good

  1. Sky Domes

    2 triangles? A sky dome is an infinitely distant surface: all you need is a view vector from your pixel position, which you use to intersect a virtual geometry and sample your sky textures.
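A minimal C++ sketch of that idea, run on the CPU for clarity (in practice it lives in the pixel shader). The virtual geometry is assumed to be a unit sphere centred on the camera and the sky texture a Y-up latitude/longitude (equirectangular) map; neither choice comes from the post, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Normalizes a direction vector.
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// View vector from the camera through the pixel's world position.
// The sky is infinitely distant, so only the direction is kept.
Vec3 viewDirection(Vec3 pixelWorldPos, Vec3 cameraPos) {
    return normalize({pixelWorldPos.x - cameraPos.x,
                      pixelWorldPos.y - cameraPos.y,
                      pixelWorldPos.z - cameraPos.z});
}

// Assumed mapping: the virtual geometry is a unit sphere centred on the camera,
// so the hit point equals the direction, which is then mapped to a Y-up
// latitude/longitude (equirectangular) texture coordinate in [0,1]^2.
void skyUV(Vec3 dir, float& u, float& v) {
    const float pi = 3.14159265358979f;
    u = 0.5f + std::atan2(dir.x, dir.z) / (2.0f * pi);
    v = 0.5f - std::asin(dir.y) / pi;
}
```

Because only the direction matters, moving the camera never moves the sky, which is exactly the "infinitely distant" behavior a 2-triangle full-screen pass gives you.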
  2. The article you were given shows the issue nicely with the normals and edge flipping, so I won't go over it again; just accept that there is no single answer to the question "what is my vertex normal?".

    As for using a texture instead of vertex colors: with vertices, the interpolation uses barycentric coordinates between 3 values, so each triangle is not aware that it was part of a quad, and that precious information is lost. If you use a texture instead, and derive the texture coordinate in your pixel shader from the interpolated position so that texel centers align with the vertices, you now get bilinear interpolation, which takes all 4 vertex values into account.

    Now it is time to blow a fuse with sRGB. Luminance and brightness are two different things. Luminance is the physical quantity: twice the amount of light is twice the value, it is linear. Brightness is not: twice as bright is not twice the value. That is your gamma space. An RGBA8 swap chain expects sRGB values, i.e. brightness. If you are not careful and treat your values as luminance where it matters, you will screw up light accumulation, blending and gradients.

    Let's take an example. You have a pure gray surface of value 128 (0.5) that reflects light toward you. Say one part of the screen is lit by a powerful light of intensity 1, and another part by an almighty light of intensity 2. The resulting pixels are 0.5 * 1 and 0.5 * 2, giving you 0.5 and 1. But in sRGB, a brightness of 1 is a luminance of 1, while a brightness of 0.5 is only about 0.21 luminance, and now you realize that your second light did not come out twice as bright but almost 5 times as bright! And this only applies to the surface you lit: because of the mistake, a different color would have gotten a different amount of boost or damping. This is why it is important to work in linear space, and the same problems exist for the same reasons with gradients and alpha blending.

    What you need to be sRGB compliant:
      1. Albedo textures use the _SRGB format variants: their content is sRGB (brightness values) that needs to be converted to luminance when read in your shader.
      2. All colors edited in a color picker, like in Photoshop, are brightness too; when given as constants or vertex colors, they need to be converted manually to luminance with the proper formula.
      3. Light intensities are physical values, they don't change (but remember that their color needs to be converted).
      4. The render target view also uses an _SRGB format, so the GPU does the opposite conversion on write and your swap chain receives the proper content.

    The only reason we need to switch between these representations is that we are more sensitive to low luminance, so we need more bits in the dark to store values without banding. That is what sRGB does.
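To make the "triangle is not aware it was part of a quad" point concrete, here is a small C++ sketch (function names are made up for illustration). It takes a quad with a single non-zero corner and reads the value at the quad centre through both possible triangle splits and through bilinear filtering, which is what a texture fetch gives you.

```cpp
#include <cassert>

// Corner values of one grid quad: v00 at (0,0), v10 at (1,0), v01 at (0,1), v11 at (1,1).
struct Quad { float v00, v10, v01, v11; };

// Bilinear interpolation: every sample sees all four corners.
float bilinear(const Quad& q, float s, float t) {
    float bottom = q.v00 + (q.v10 - q.v00) * s;
    float top = q.v01 + (q.v11 - q.v01) * s;
    return bottom + (top - bottom) * t;
}

// Barycentric interpolation when the quad is split along the v00-v11 diagonal:
// each sample only sees the three corners of its own triangle.
float splitDiagonal00to11(const Quad& q, float s, float t) {
    if (s >= t)  // triangle (v00, v10, v11)
        return q.v00 * (1.0f - s) + q.v10 * (s - t) + q.v11 * t;
    return q.v00 * (1.0f - t) + q.v01 * (t - s) + q.v11 * s;  // triangle (v00, v01, v11)
}

// Same quad split along the other diagonal (v10-v01).
float splitDiagonal10to01(const Quad& q, float s, float t) {
    if (s + t <= 1.0f)  // triangle (v00, v10, v01)
        return q.v00 * (1.0f - s - t) + q.v10 * s + q.v01 * t;
    return q.v11 * (s + t - 1.0f) + q.v10 * (1.0f - t) + q.v01 * (1.0f - s);  // triangle (v10, v01, v11)
}
```

With corners {0, 0, 0, 1}, the centre reads 0.5 with one split and 0 with the other, while bilinear gives 0.25, the average of all four corners. The result depends on the edge choice only in the triangle case.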
  3. I see a few ways to fix that:
      1. Use 5 vertices, with a point at the center that you can fill with the proper interpolated value, but you now have 4 triangles instead of 2.
      2. Store the vertex colors in a texture and read it directly in the pixel shader. It will not give you the quad shape, but the lozenge (diamond) shape, which is more logical in a sense.

    Worth mentioning: if you stay with 2 triangles per quad, you will have issues with normal generation and lighting as soon as the quads are not planar. Solution 1 is again the simplest in that case, as the center height will be derived from the whole quad and you do not have to deal with swapping edges.

    One more thing: you do not seem to be sRGB compliant, and your gradients are messed up. Do not forget that the display buffer holds sRGB content and textures hold sRGB content, but lighting and color interpolation should all happen in linear space inside a shader. You have to use the "_SRGB" variant of the DXGI_FORMAT for shader resource and render target views to let the hardware do the conversion for you. Colors in vertices or constant buffers have to be converted either manually in code (which needs at least 16 bits per value) or in the vertex shader (before interpolation).
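A sketch of option 1 in C++, assuming the centre value is simply the average of the four corners (the post only says "the proper interpolated value") and a fan of 4 triangles sharing the centre vertex. The types and names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Vertex { float x, y, value; };

// Corner order: 0 = (0,0), 1 = (1,0), 2 = (1,1), 3 = (0,1), 4 = centre.
struct QuadMesh {
    std::array<Vertex, 5> vertices;
    std::array<std::uint16_t, 12> indices;
};

// Builds one quad as 4 triangles around a centre vertex whose value is the
// average of the four corners (assumed choice of "interpolated value").
QuadMesh buildFiveVertexQuad(float v00, float v10, float v11, float v01) {
    QuadMesh m{};
    m.vertices[0] = {0.0f, 0.0f, v00};
    m.vertices[1] = {1.0f, 0.0f, v10};
    m.vertices[2] = {1.0f, 1.0f, v11};
    m.vertices[3] = {0.0f, 1.0f, v01};
    m.vertices[4] = {0.5f, 0.5f, 0.25f * (v00 + v10 + v11 + v01)};
    // Each triangle is (corner i, corner i+1, centre), same winding for all four.
    for (std::uint16_t i = 0; i < 4; ++i) {
        m.indices[i * 3 + 0] = i;
        m.indices[i * 3 + 1] = static_cast<std::uint16_t>((i + 1) % 4);
        m.indices[i * 3 + 2] = 4;
    }
    return m;
}
```

The same centre vertex can carry an averaged height, which is why this variant also sidesteps the edge-swapping problem for non-planar quads.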
  4. Below is a possible implementation. I don't claim it is the fastest, but it shows the logic clearly and is quite easy to understand. Only profiling and tweaking the group count and the way the texture is traversed will lead to the optimum, but it should already be quite fast. This is just two dispatches with one small intermediate texture of w/8 by 1 pixels. The first pass computes one average per 8-pixel-wide column and writes it to the intermediate resource; the second pass computes the average of the columns. Each pass first computes a local average for its own thread, then combines the values of the group through groupshared storage, and finally writes the result from the first thread of the group. There is potential for errors in the code, I did not test it, but it should be quite close.

    EDIT: unlike PS4/Xbox One, PC has no float atomics, so the group sum cannot be a float InterlockedAdd on a groupshared variable; it has to go through a groupshared array and a tree reduction.

```hlsl
// Assumes the original image dimensions are multiples of 8, for clarity.
// Create a texture of dimension [w/8, 1] of type float with UAV/SRV bindings, call it Columns.
// Create a texture of dimension [1, 1] of type float with UAV/SRV bindings, call it Result.
// At runtime:
//   SetCompute Pass1
//   Set Columns to u0
//   Set SourceImage to t0
//   Dispatch(width / 8, 1, 1);
//   SetCompute Pass2
//   Set Columns to t0
//   Set Result to u0
//   Dispatch(1, 1, 1);
// Voilà

// Common.hlsli
float Lum(float3 rgb) { return dot(rgb, float3(0.25, 0.60, 0.15)); }

// Pass1.hlsl
#include "Common.hlsli"
Texture2D<float3> sourceImage : register(t0);
RWTexture2D<float> columns : register(u0);

groupshared float partial[64]; // one slot per thread of the 8x8 group

[numthreads(8, 8, 1)]
void main(uint2 GTid : SV_GroupThreadID, uint gidx : SV_GroupIndex, uint2 Gid : SV_GroupID)
{
    uint2 dim;
    sourceImage.GetDimensions(dim.x, dim.y);
    uint rowCount = dim.y / 8;

    // Local average: each thread walks down the 8-pixel-wide column, one texel per 8x8 tile.
    // operator[] loads a single texel; a sampler + Sample at half-pixel UVs could fetch a 2x2 average instead.
    float tmp = 0.f;
    for (uint row = 0; row < rowCount; ++row)
        tmp += Lum(sourceImage[GTid + uint2(Gid.x, row) * 8]) / float(rowCount);
    partial[gidx] = tmp / 64.f;
    GroupMemoryBarrierWithGroupSync();

    // Group sum through a groupshared tree reduction (no float atomics on PC).
    for (uint s = 32; s > 0; s >>= 1)
    {
        if (gidx < s)
            partial[gidx] += partial[gidx + s];
        GroupMemoryBarrierWithGroupSync();
    }

    if (gidx == 0)
        columns[uint2(Gid.x, 0)] = partial[0];
}

// Pass2.hlsl
#include "Common.hlsli"
Texture2D<float> columns : register(t0);
RWTexture2D<float> average : register(u0);

groupshared float partial[64];

[numthreads(64, 1, 1)]
void main(uint gidx : SV_GroupIndex)
{
    uint2 dim;
    columns.GetDimensions(dim.x, dim.y);

    // Local sum: each thread accumulates every 64th column.
    float tmp = 0.f;
    for (uint col = gidx; col < dim.x; col += 64)
        tmp += columns[uint2(col, 0)];
    partial[gidx] = tmp;
    GroupMemoryBarrierWithGroupSync();

    for (uint s = 32; s > 0; s >>= 1)
    {
        if (gidx < s)
            partial[gidx] += partial[gidx + s];
        GroupMemoryBarrierWithGroupSync();
    }

    if (gidx == 0)
        average[uint2(0, 0)] = partial[0] / float(dim.x);
}
```
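To check the GPU results, a CPU reference of the same two passes can help. This C++ sketch is an assumption-based stand-in, not part of the original shaders; it reuses the luminance weights from Common.hlsli. Because every column strip covers the same number of pixels, the average of the column averages equals the average of the whole image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same luminance weights as Common.hlsli above.
float lum(float r, float g, float b) { return 0.25f * r + 0.60f * g + 0.15f * b; }

struct ImageRGB {
    std::size_t width, height;  // both assumed to be multiples of 8
    std::vector<float> rgb;     // width * height * 3 floats, row-major
};

// Pass 1 reference: one average per 8-pixel-wide column strip.
std::vector<float> averageColumns(const ImageRGB& img) {
    assert(img.width % 8 == 0 && img.height % 8 == 0);
    std::vector<float> columns(img.width / 8, 0.0f);
    for (std::size_t c = 0; c < columns.size(); ++c) {
        double sum = 0.0;
        for (std::size_t y = 0; y < img.height; ++y)
            for (std::size_t x = c * 8; x < c * 8 + 8; ++x) {
                const float* p = &img.rgb[(y * img.width + x) * 3];
                sum += lum(p[0], p[1], p[2]);
            }
        columns[c] = static_cast<float>(sum / double(8 * img.height));
    }
    return columns;
}

// Pass 2 reference: average of the column averages, i.e. the 1x1 result.
float averageOfColumns(const std::vector<float>& columns) {
    assert(!columns.empty());
    double sum = 0.0;
    for (float c : columns) sum += c;
    return static_cast<float>(sum / double(columns.size()));
}
```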
  5. Are you interested only in the 1x1 version, or do you need the whole chain? In short, are you computing the average luminance for exposure adaptation, or something else? If you are only interested in the 1x1 result, as I understand from your question, forget about pixel shaders: a compute shader looping over the image and keeping the running average in groupshared memory or in registers will beat the bandwidth cost of writing and reading full surfaces, and you also get rid of the expensive pipeline flushes between the reduction passes (caused by switching each surface from RTV to SRV). If you are interested in the full chain, compute can also win: for example, you can save on reads by having one compute pass generate 3 mips in one run, producing the extra 2 from what it already read for the first reduction and working in groupshared memory. Also forget about the legacy GenerateMips: it is not a hardware feature and usually does a suboptimal job compared to a hand-crafted solution.
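A CPU sketch in C++ of the "3 mips in one run" idea, assuming an 8x8 tile per group and a plain 2x2 box filter (the post does not specify the filter). The tile is read once; the 4x4, 2x2 and 1x1 levels are derived from values already in hand. On the GPU those intermediate levels would stay in groupshared memory; here they are plain arrays, and the names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct TileMips {
    std::array<float, 16> mip1;  // 4x4
    std::array<float, 4> mip2;   // 2x2
    float mip3;                  // 1x1
};

// 2x2 box filter from a srcSize x srcSize level to a (srcSize/2) x (srcSize/2) level.
template <std::size_t N, std::size_t M>
void downsample(const std::array<float, N>& src, std::size_t srcSize, std::array<float, M>& dst) {
    std::size_t dstSize = srcSize / 2;
    assert(N == srcSize * srcSize && M == dstSize * dstSize);
    for (std::size_t y = 0; y < dstSize; ++y)
        for (std::size_t x = 0; x < dstSize; ++x) {
            std::size_t sx = 2 * x, sy = 2 * y;
            dst[y * dstSize + x] = 0.25f * (src[sy * srcSize + sx] + src[sy * srcSize + sx + 1] +
                                            src[(sy + 1) * srcSize + sx] + src[(sy + 1) * srcSize + sx + 1]);
        }
}

// One read of the 8x8 tile produces three mip levels.
TileMips buildTileMips(const std::array<float, 64>& tile) {
    TileMips m{};
    downsample(tile, 8, m.mip1);
    downsample(m.mip1, 4, m.mip2);
    m.mip3 = 0.25f * (m.mip2[0] + m.mip2[1] + m.mip2[2] + m.mip2[3]);
    return m;
}
```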
  6. 32Bit Depth

    It is worth noting that there is one case where you may want UNORM behavior, and D16 is usually enough for it: orthographic projections (directional lights). In that case there is no perspective divide, so depth is linear, and a UNORM format, which spreads its values evenly over [0, 1], makes better use of its bits than a FLOAT format, which concentrates its precision near zero. Plus you save half the memory compared to 32 bits.
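A small C++ sketch of what D16 gives you with an orthographic projection, assuming the usual linear mapping of view-space depth to [0, 1]: the step between representable depths is constant, (far - near) / 65535. For example, a 500-unit depth range gives steps of about 0.0076 units everywhere in the range. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// With an orthographic projection, stored depth is a linear remap of view-space z:
// d = (z - near) / (far - near). A D16_UNORM buffer stores round(d * 65535).
std::uint16_t quantizeOrthoDepthD16(float z, float nearZ, float farZ) {
    assert(farZ > nearZ);
    float d = (z - nearZ) / (farZ - nearZ);
    d = std::fmin(std::fmax(d, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(d * 65535.0f));
}

// Depth resolution is uniform across the whole range.
float orthoDepthResolutionD16(float nearZ, float farZ) {
    return (farZ - nearZ) / 65535.0f;
}
```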
  7. append/consume buffers, speed penalty?

    This is shader-model independent, and the only documentation you need is CreateUnorderedAccessView:
  8. These two things are unrelated. TTF files are just storage for vector glyph representations, no different from PNG being storage for bitmap images. DirectX 11 is a graphics API to draw stuff (in a very broad sense). It is true that some draw operations are simpler than others: it has, for example, the concept of a texture, which maps directly to a bitmap stored in any image format of your choice. As you may have noticed, though, it does not have a native way to render vector graphics. That does not mean it is impossible: there are techniques to render TrueType fonts directly from the original TTF data, usually involving some preprocessing with weird triangulations and fancy shaders. But as that is complex, most of the time it is enough to rasterize the font into a bitmap on the CPU, using stb_truetype, FreeType or any alternative, and fall back to the simpler native way to display stuff: textures. How you pack the various glyphs into your DirectX texture is totally under your control and depends on your needs: static allocation, a dynamic glyph cache, a distance field representation, etc.
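For the static allocation case, a minimal C++ sketch of a "shelf" packer (a simple technique chosen here for illustration, not something the post prescribes). It places already-rasterized glyph bitmaps left to right in rows of the atlas texture and opens a new row when the current one is full. Names are hypothetical.

```cpp
#include <cassert>

struct Rect { int x, y, w, h; };

class ShelfPacker {
public:
    ShelfPacker(int width, int height) : width_(width), height_(height) {}

    // Places a w x h glyph; returns false when it does not fit (atlas full).
    bool pack(int w, int h, Rect& out) {
        assert(w > 0 && h > 0);
        if (cursorX_ + w > width_) {  // row full: open a new shelf below
            shelfY_ += shelfHeight_;
            cursorX_ = 0;
            shelfHeight_ = 0;
        }
        if (w > width_ || shelfY_ + h > height_) return false;
        out = {cursorX_, shelfY_, w, h};
        cursorX_ += w;
        if (h > shelfHeight_) shelfHeight_ = h;  // shelf height = tallest glyph in the row
        return true;
    }

private:
    int width_, height_;
    int cursorX_ = 0, shelfY_ = 0, shelfHeight_ = 0;
};
```

Each returned Rect is where the glyph's bitmap would be copied into the texture, and its normalized coordinates become the glyph's UVs. Shelf packing wastes some space with mixed glyph heights, but it is trivial to implement.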
  9. append/consume buffers, speed penalty?

    The DX11 version usually implements the counter in the global data share (GDS), which is like the local data share of your compute shaders and is limited to 64 KB. (You can access it with CUDA/OpenCL, I believe, but not with DirectX.) The DX12 version is explicit: you create the counter buffer yourself, so in theory there is no GDS for you, but the driver is still free to sneak it in, and who knows whether it does or not.
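Conceptually, the hidden counter is just a uint incremented atomically to reserve a slot. This C++ analogy (a hypothetical class, with CPU atomics standing in for GPU ones) shows the mechanism; it says nothing about where a driver actually stores the counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append buffer analogy: a fixed-size element array plus a counter that every
// append() bumps atomically to reserve its own slot.
class AppendBuffer {
public:
    explicit AppendBuffer(std::size_t capacity) : data_(capacity) {}

    // Returns false when the buffer is full; this sketch simply rejects the value.
    bool append(std::uint32_t value) {
        std::uint32_t slot = counter_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= data_.size()) return false;
        data_[slot] = value;  // each caller writes a distinct slot
        return true;
    }

    // Number of valid elements (the counter may overshoot after rejected appends).
    std::uint32_t count() const {
        std::uint32_t c = counter_.load();
        return c < data_.size() ? c : static_cast<std::uint32_t>(data_.size());
    }

    const std::vector<std::uint32_t>& data() const { return data_; }

private:
    std::vector<std::uint32_t> data_;
    std::atomic<std::uint32_t> counter_{0};
};
```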
  10. Swap chain creation failed

    With DX11 it is easy: to write to a UAV, you bind it at a fixed slot with a single API call, and it is represented by a fat IUnknown object. With DX12, all you need is to allocate a slot in a descriptor heap, set the offset as a descriptor table, and voilà. The catch is that descriptor validity is up to the user: descriptors are volatile, you could even write them after the command list is closed if you feel like it, or set a fence and trigger some nifty callback to write a descriptor late based on whatever. In short, who knows what may be done! This is why the runtime can't assume anything, and for the back buffer, which has stronger requirements because the compositor uses it, that was not OK.
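To illustrate "allocate a slot in a descriptor heap": a D3D12 CPU descriptor handle is a struct wrapping a pointer-sized integer (D3D12_CPU_DESCRIPTOR_HANDLE::ptr), and slots are located by plain offset arithmetic. The C++ sketch below uses a stand-in struct so it compiles without d3d12.h; in real code the heap start comes from ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart() and the stride from ID3D12Device::GetDescriptorHandleIncrementSize(). The allocator is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE.
struct CpuDescriptorHandle { std::size_t ptr; };

// A slot is just heap start + index * increment; nothing checks that the slot
// holds a live descriptor, keeping it valid is the application's job.
CpuDescriptorHandle descriptorAt(CpuDescriptorHandle heapStart, std::uint32_t index,
                                 std::uint32_t incrementSize) {
    return {heapStart.ptr + static_cast<std::size_t>(index) * incrementSize};
}

// Minimal linear allocator handing out consecutive slots from one heap.
class DescriptorAllocator {
public:
    DescriptorAllocator(CpuDescriptorHandle start, std::uint32_t capacity, std::uint32_t increment)
        : start_(start), capacity_(capacity), increment_(increment) {}

    CpuDescriptorHandle allocate() {
        assert(next_ < capacity_ && "descriptor heap exhausted");
        return descriptorAt(start_, next_++, increment_);
    }

private:
    CpuDescriptorHandle start_;
    std::uint32_t capacity_, increment_, next_ = 0;
};
```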
  11. Swap chain creation failed

    This is the answer from a Microsoft guy on the topic:
  12. The space is a shader model 5.1 feature that allows two or more unbounded arrays of resources, like "Texture2D foo[] : register(t0, space1)"; it does not concern you. And no, you can't bind the same writable object to two registers: it is a violation, and it would lead to cache coherency issues if it were allowed. If you want two variables to reference the same resource, use one variable and rewrite your code to not use globals in the first place!
  13. Human eyes are more sensitive to low luminance, so images are stored in gamma space (sRGB) to give them enough bits in the dark areas: a 50% gray stored as 128 in your texture is in fact only about 21% luminance in linear space.

    Step one: when you read your albedo, you first need to convert it back to linear space. It is as simple as choosing the format of your texture view, DXGI_FORMAT_BC7_UNORM_SRGB instead of DXGI_FORMAT_BC7_UNORM, and the GPU does the conversion for you.

    Step two: you are now in linear space, which is the correct one to accumulate lighting (and also the proper space for alpha blending).

    Step three: in an LDR pipeline (not your case), the render target would also be an sRGB variant and the GPU would do the opposite conversion for you. BUT you want HDR, which means your render target format has no sRGB variant, and that step becomes your responsibility later!

    Last step: to display your final image, you have linear values in the range [0, infinity) that need to map to [0, 1] in gamma space, which is what your display understands. This happens in two substeps. First, the tonemapping pass: the formula rgb/(rgb+1) is a tonemap operator, and if you plot the curve, you will see it compresses [0, infinity) back into [0, 1), which is what we need. Second, take that [0, 1] range, still in linear space, to gamma space, either by applying the sRGB formula yourself or by using an sRGB format variant to let the GPU do it for you.
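The steps above for a single color channel and a single light, as a C++ sketch. The sRGB functions are the standard piecewise sRGB transfer curves (what an _SRGB view applies in hardware); the shading is reduced to albedo times light intensity purely for illustration, and the function names are made up.

```cpp
#include <cassert>
#include <cmath>

// Standard sRGB decode: gamma-space value in [0,1] -> linear luminance.
float srgbToLinear(float c) {
    return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Standard sRGB encode: linear [0,1] -> gamma-space value.
float linearToSrgb(float l) {
    return l <= 0.0031308f ? l * 12.92f : 1.055f * std::pow(l, 1.0f / 2.4f) - 0.055f;
}

// Tonemap operator from the post: compresses [0, inf) into [0, 1).
float tonemap(float x) { return x / (x + 1.0f); }

// sRGB albedo -> linear -> HDR lighting -> tonemap -> sRGB for display.
float shadeChannel(float albedoSrgb, float lightIntensity) {
    assert(lightIntensity >= 0.0f);
    float albedo = srgbToLinear(albedoSrgb);  // step one (an _SRGB view does this)
    float hdr = albedo * lightIntensity;      // step two: lighting in linear space
    float ldr = tonemap(hdr);                 // last step, part 1: [0, inf) -> [0, 1)
    return linearToSrgb(ldr);                 // last step, part 2: back to gamma space
}
```

srgbToLinear(0.5f) is about 0.214, which is where the "128 is only about 21% luminance" figure comes from.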
  14. I'm having trouble for texturing.

    You are using clamp addressing; you need wrap addressing. It is a sampler state parameter, and you have to set it for both U and V (W is for 3D textures).
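What the two addressing modes do to a texture coordinate, as a C++ sketch (function names are illustrative; the real work happens in the sampler, separately for U, V and, for 3D textures, W). With clamp, coordinates outside [0, 1] stick to the edge, so a tiled surface just smears its border; with wrap, only the fractional part is kept and the texture repeats.

```cpp
#include <cassert>
#include <cmath>

// Clamp: anything outside [0,1] sticks to the edge.
float addressClamp(float u) { return std::fmin(std::fmax(u, 0.0f), 1.0f); }

// Wrap: keep only the fractional part, so the texture repeats.
float addressWrap(float u) { return u - std::floor(u); }

// Texel index for a texture of `size` texels with wrap addressing
// (handles negative indices, unlike a plain %).
int wrapTexel(int i, int size) {
    assert(size > 0);
    int r = i % size;
    return r < 0 ? r + size : r;
}
```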
  15. Yes, so it does not make sense after all. There are only two things you need for the basics. One, render to a surface that can represent values above one, like R11G11B10_FLOAT or R16G16B16A16_FLOAT. Two, apply tonemapping to compress the range back to [0, 1], the simplest being rgb/(rgb+1). That is all! Then looking good is all in the details: PBR materials so that surfaces look plausible under various lighting conditions, the choice of a BRDF (usually GGX in games) that is energy conserving, and not screwing up color spaces (albedo is sRGB and must be linearized, lighting happens in linear space, and the display goes back to gamma-space sRGB).
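On the "energy conserving" point, one property you can check is the normalization of the GGX (Trowbridge-Reitz) normal distribution function: the projected microfacet area must integrate to 1 over the hemisphere. Below is a C++ sketch of the D term and a numerical check; it is only one ingredient of a full BRDF, and the function names are made up. Engines commonly set alpha = roughness², a convention mentioned here for context only; the code takes alpha directly.

```cpp
#include <cassert>
#include <cmath>

// GGX / Trowbridge-Reitz normal distribution function (the D term of a
// microfacet BRDF). alpha is the GGX width parameter.
double ggxD(double NdotH, double alpha) {
    const double pi = 3.14159265358979323846;
    double a2 = alpha * alpha;
    double d = NdotH * NdotH * (a2 - 1.0) + 1.0;
    return a2 / (pi * d * d);
}

// Numerically integrates D(h) * (n.h) over the hemisphere, which should be 1.
// Midpoint rule over theta; the phi integral is a constant 2*pi because D is isotropic.
double ggxProjectedArea(double alpha, int steps) {
    const double pi = 3.14159265358979323846;
    double dTheta = (pi / 2.0) / steps, sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        double theta = (i + 0.5) * dTheta;
        sum += ggxD(std::cos(theta), alpha) * std::cos(theta) * std::sin(theta) * dTheta;
    }
    return 2.0 * pi * sum;
}
```

The check holds for any alpha in (0, 1]: a rougher surface spreads the lobe out, a smoother one sharpens it, but the total stays 1.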