# Tsus

1. ## How to calculate Lumens?

@Hodgman: I have this vague feeling that it's not my place to correct you, but since you asked...

You're mixing photometric and radiometric terminology here. Candela (cd) is the unit of luminous intensity (photometric). The equivalent radiometric quantity is radiant intensity, and its unit is watts per steradian.

Candela is lumen per solid angle (lm/sr).

The unit of luminance (photometric) is cd/m².

@CGEngine: If I'm not totally mistaken, you can just plug the photometric units into the rendering equation. For some reason the (English) rendering literature often uses the radiometric names for the quantities (radiance, irradiance, ...) but photometric units (lumen, candela, ...), which is totally confusing.

The radiometric units are used in radiation physics. The photometric units are used whenever human perception is concerned. (They only differ in how they weight light of different wavelengths. If you're not working with spectral lighting effects, you don't really need to care.)

The dot product formula you posted earlier is the photometric way of mixing RGB values into a luminance value (the weighting follows the spectral response curves of the "average" eye).
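To make the weighted sum concrete, here is a minimal C++ sketch. The formula from the earlier post is not quoted in this thread, so the Rec. 709 weights below are an assumption, as is the helper name `luminance`; the input is assumed to be linear (not gamma-encoded) RGB.

```cpp
#include <cassert>

// Relative luminance of a linear RGB color, written as a dot product
// between the color and a weight vector.
// Assumption: Rec. 709 / sRGB primaries; other color spaces use other weights.
double luminance(double r, double g, double b) {
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);  // light cannot be negative
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}
```

Green gets the largest weight and blue the smallest, which mirrors the sensitivity of the eye; white (1, 1, 1) maps to a luminance of about 1.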
2. ## Recommend a book for algorithmic 3D modelling theory

Hi axefrog,

I have some book references for you that cover at least the NURBS part.

"Fundamentals of Computer Aided Geometric Design" by Hoschek and Lasser is a standard reference for modeling with curves and surfaces. It also covers various methods for interpolation, intersections, blending and smoothing of surfaces.

Farin's "Curves and Surfaces for Computer Aided Geometric Design" gives a bit more background on continuity, splines, and the different kinds of operations you can do with curves and surfaces. It also contains exercises.

Here are some applets that demonstrate a few of the algorithms discussed in those books.

Hope that helps a little. ;)

Cheers!
3. ## Depth stencil state issue

Hi!

The black depth channel is somewhat expected, due to the non-linear distribution of the z-values. Right beside the combobox where you selected the "depth" channel, there is a gray bar that lets you filter the displayed range linearly. It has tiny triangles (top-left and bottom-right of the bar). These are the lower and upper bounds. You can move them around, so that the lower bound sits around 0.95. That should show you something.

On the top left of the window is a small "save" icon. You could save the PIXrun and upload it, so that we might have a look.

Best, Tsus
4. ## Depth stencil state issue

Ok. So, with all resources being successfully created and no errors in the output window during draw calls, it's time to use a graphics debugger like PIX. PIX is included in the DxSDK and lets you capture the draw calls and states of a frame (every time you hit F12). When you click on a draw call in the list PIX presents, the vertices of the triangles are shown before and after transformation. If you right-click on the rendering output, you can debug individual pixels and see why fragments were discarded. You can also dig into the states bound at a draw call. Hopefully, this sheds some light on the matter.
5. ## Depth stencil state issue

Hi eltharynd,

Usually, CPU access flags are not required for the depth buffer. Try turning on the debug layer, as eppo suggested. D3D will most probably tell you what is wrong. The behavior you described sounds like invalid parameters. Pass the flag D3D11_CREATE_DEVICE_DEBUG in the Flags parameter of the D3D11CreateDevice call.

Could you check that all your mDevice->Create... methods return S_OK? If not, the debug layer will tell you in the Output window of Visual Studio why the Create method failed. (This works if you run in Debug mode with F5, not Ctrl+F5.)

Clearing structs with ZeroMemory is good practice (in my eyes). Could you do it for the depth stencil desc, too? (Just to be safe.)

    ZeroMemory(&dsDesc, sizeof(D3D11_DEPTH_STENCIL_DESC));

Also, you have a copy-paste error when clearing the depth stencil view desc. It should read:

    ZeroMemory(&depthStencilViewDesc, sizeof(D3D11_DEPTH_STENCIL_VIEW_DESC));

Just to be safe, set a rasterizer state. Whenever you move code around, you don't want to depend on the states of some previous code block.

    D3D11_RASTERIZER_DESC rsDesc;
    ZeroMemory(&rsDesc, sizeof(D3D11_RASTERIZER_DESC));
    rsDesc.CullMode = D3D11_CULL_BACK;
    rsDesc.DepthBias = 0;
    rsDesc.DepthBiasClamp = 0;
    rsDesc.FillMode = D3D11_FILL_SOLID;
    rsDesc.AntialiasedLineEnable = false;
    rsDesc.DepthClipEnable = true;
    rsDesc.FrontCounterClockwise = true;
    rsDesc.MultisampleEnable = true;
    rsDesc.ScissorEnable = false;
    rsDesc.SlopeScaledDepthBias = 0;
    if (FAILED(mDevice->CreateRasterizerState(&rsDesc, &mRsBackfaceCulling)))
        return false;

Then, later:

    mDeviceContext->RSSetState(mRsBackfaceCulling);

And don't worry. You'll figure it out.

Best, Tsus
6. ## Depth stencil state issue

Hi eltharynd!

If you don't need the stencil test, you probably want to disable it.

    dsDesc.StencilEnable = false;

In this case, you can put full precision into the depth value by using a full float for it:

    depthTexDesc.Format = DXGI_FORMAT_D32_FLOAT;

Currently, your depth testing function permits all fragments to be written. Instead, let only those fragments through that have a smaller (or equal) depth value.

    dsDesc.DepthFunc = D3D11_COMPARISON_LESS_EQUAL;

It might just be due to copy-pasting into the browser, but did you use the correct depth stencil view description when creating the depth stencil view? (2nd argument changed)

    mDevice->CreateDepthStencilView(depthStencilTexture, &depthStencilViewDesc, &mDepthStencilView);

I guess you did, but for completeness I'm just saying it: you bound your depth buffer like so, right?

    ID3D11RenderTargetView* rtvs[] = { mRenderTargetView_Backbuffer };
    mImmediateContext->OMSetRenderTargets(1, rtvs, mDepthStencilView);

Do you clear the depth stencil view before drawing?

    mImmediateContext->ClearDepthStencilView(mDepthStencilView, D3D11_CLEAR_DEPTH, 1.0f, 0);

Also, were all your resources successfully created? (That is, they are not NULL?)

Best, Tsus
7. ## blending question

Hi! Will your float texture ever contain negative values? It's not strikingly elegant, but: in case you need to clear a pixel, you could write out a large negative value, giving you after the additive blending something that is smaller than zero. When you use the texture later, you could clamp the value again up to zero: max(0, texture_color).
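A small CPU-side sketch of this trick in C++, emulating one channel of the float texture. The names `kClearValue`, `blendAdditive`, `clearTexel` and `readClamped` are hypothetical, and the sentinel magnitude is an assumption: it only has to be larger than any sum the texture can accumulate.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sentinel written out to "clear" a pixel.
constexpr float kClearValue = -1.0e6f;

// Emulates additive blending (source ONE, destination ONE) for one channel.
float blendAdditive(float dst, float src) { return dst + src; }

// "Clears" a texel by blending the large negative sentinel on top of it.
float clearTexel(float dst) {
    assert(dst < -kClearValue);  // the sentinel must dominate the stored sum
    return blendAdditive(dst, kClearValue);
}

// Emulates the later read: max(0, texture_color).
float readClamped(float texel) { return std::max(0.0f, texel); }
```

A texel holding 1.25 reads back as 1.25; after `clearTexel` it is negative and reads back as 0.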
8. ## OIT with weighted average

Hi,

Your suggestion is already the solution. So, yeah, you got it right. I'll summarize the steps briefly for you.

Your transparent rendering would happen in the rendering loop after the deferred pass:

- Fill your deferred buffer (color, depth, …)
- Do the deferred lighting
- Render the transparent objects

For the last step, do the following:

1. Disable depth writing (glDepthMask(GL_FALSE)) -> we want no transparent object to be culled by other transparent objects
2. Enable depth test (glEnable(GL_DEPTH_TEST)) -> compare with your deferred depth buffer (cull by opaque geometry)
3. Use additive blending (source: GL_ONE, dest: GL_ONE)
4. Bind two new render targets; let's call them AccumColor (four (half) float components) and AccumCount (single float component)
5. Render the transparent objects into those two targets (explained in a moment)
6. Bind deferred color buffer as render target
7. Disable depth test (glDisable(GL_DEPTH_TEST))
8. Use back-to-front blending (source: GL_SRC_ALPHA, dest: GL_ONE_MINUS_SRC_ALPHA), since the alpha written in step 10 is the opacity of the transparent layers
9. Bind AccumColor and AccumCount as textures
10. Full screen pass: compute the average color/transparency (explained in a moment) and blend with the deferred color buffer

The idea is to compute the average of the colors, weighted by their transparencies. For blending with the background, we additionally need the average opacity. (FYI: I assume that alpha = 1 means opaque, and alpha = 0 means transparent.)

Step 5 computes the sums. This is what the fragment shader does:

    Input:  vec3 color, float alpha
    Output: AccumColor = vec4(color*alpha, alpha); // color multiplied (=weighted) with alpha!
            AccumCount = 1;

Step 10 computes the average color. Again, the fragment shader:

    Input: vec4 AccumColor, float AccumCount

    if (AccumCount < 0.00001 || AccumColor.w < 0.00001) {
      discard;  // nothing here; discard the fragment.
    } else {
      vec4 avgColor = vec4(
        AccumColor.xyz / AccumColor.w,    // weighted average color
        AccumColor.w / AccumCount);       // average alpha
      // the alpha used to blend with the background is computed by assuming
      // that all transparent layers have the average alpha:
      float dstAlpha = 1.0 - pow(max(0.0, 1.0 - avgColor.w), AccumCount);
      // write out the average color and the alpha used for compositing
      result = vec4(avgColor.xyz, dstAlpha);
    }

Hope that gives some insights.

Best regards!
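To check the arithmetic of the accumulation and resolve steps outside of a shader, here is a CPU reference sketch in C++ for a single pixel. The types and function names (`Fragment`, `Accum`, `accumulate`, `resolve`) are hypothetical. It also assumes that the final composite weights the averaged color by the computed opacity and the background by one minus that opacity, which is how the written alpha is interpreted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Color { float r, g, b; };
struct Fragment { Color color; float alpha; };  // alpha = 1 opaque, alpha = 0 transparent

// Contents of the two accumulation targets for one pixel.
struct Accum {
    Color weighted{0.0f, 0.0f, 0.0f};  // AccumColor.xyz: sum of color * alpha
    float alphaSum = 0.0f;             // AccumColor.w:   sum of alpha
    float count = 0.0f;                // AccumCount:     number of layers
};

// Accumulation pass: additive blending sums the alpha-weighted fragments.
Accum accumulate(const std::vector<Fragment>& layers) {
    Accum acc;
    for (const Fragment& f : layers) {
        assert(f.alpha >= 0.0f && f.alpha <= 1.0f);
        acc.weighted.r += f.color.r * f.alpha;
        acc.weighted.g += f.color.g * f.alpha;
        acc.weighted.b += f.color.b * f.alpha;
        acc.alphaSum += f.alpha;
        acc.count += 1.0f;
    }
    return acc;
}

// Resolve pass: average the colors, estimate the total opacity, and
// composite over the background (the opaque, already lit pixel).
Color resolve(const Accum& acc, const Color& background) {
    if (acc.count < 1e-5f || acc.alphaSum < 1e-5f) {
        return background;  // corresponds to "discard" in the shader
    }
    const float avgAlpha = acc.alphaSum / acc.count;
    // Assume every layer has the average alpha: opacity = 1 - (1 - avgAlpha)^n.
    const float opacity =
        1.0f - std::pow(std::max(0.0f, 1.0f - avgAlpha), acc.count);
    const Color avg{acc.weighted.r / acc.alphaSum,
                    acc.weighted.g / acc.alphaSum,
                    acc.weighted.b / acc.alphaSum};
    return Color{avg.r * opacity + background.r * (1.0f - opacity),
                 avg.g * opacity + background.g * (1.0f - opacity),
                 avg.b * opacity + background.b * (1.0f - opacity)};
}
```

Because the accumulation is a plain sum, feeding the layers in any order gives the same result, which is the order-independent property. For example, a half-transparent red layer and a half-transparent blue layer over white give an average color of (0.5, 0, 0.5) with an opacity of 0.75.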
9. ## DX11 D3D11 - RenderTargetView at slot 0 is not compatable with the DepthStencilView

Hi!

Your render target view (which views the texture to render into) and your depth stencil view (the corresponding depth buffer) have different multi-sampling settings. When using multi-sampling, every pixel needs to store extra data for the sub-samples (their depth and the coverage bits). Each texture resource is prepared for one particular multi-sampling setting, and to make them work together, both the color texture and the depth buffer (after all, it's just another texture) need the same setting. You can have resources with different settings, but you can only bind them together if their settings coincide.

If you disable multi-sampling, it would be:

    sampleDesc.Count = 1;
    sampleDesc.Quality = 0;

High-quality 4x MSAA (for instance) is achieved by setting:

    sampleDesc.Count = 4;
    sampleDesc.Quality = 16;

(The valid Quality range depends on the hardware and the format; ID3D11Device::CheckMultisampleQualityLevels reports it.)

Keep in mind that MSAA comes at a cost. If you don't need it, try to avoid it. Also, combining multi-sampled with single-sampled textures requires you to convert one into the other. (I'd advise you to first familiarize yourself with rendering to textures before working with multi-sampling.)

Okay, so try to set in your default texture (system class):

    texd.SampleDesc.Count = 1;

and make sure that your view dimensions are set to the single-sampled types, e.g., for the depth stencil view: D3D11_DSV_DIMENSION_TEXTURE2D (not: D3D11_DSV_DIMENSION_TEXTURE2DMS).

Cheers!
10. ## Waterfall textures

I'm not quite sure, but right next to the big rock in the center there seems to be a pattern that repeats (the faster texture layer). The layer also seems to move very, very slowly perpendicular to the flow direction, because this pattern ends up closer to the rock after 3 or 4 iterations. From that I'd guess they used static textures. Perhaps just try to implement this with static textures first. I guess having nice textures does the trick here.
11. ## rendering to 3DTexture performance question

Hi!

Since you have to specify the slice you want to render into from a geometry shader (via the semantic SV_RenderTargetArrayIndex), I'd suggest using the geometry shader approach. I would render 100 vertices as a point list, without actually binding a vertex buffer (so that you get 100 vertex shader invocations). The vertex shader does nothing. In the geometry shader, you can use SV_PrimitiveID (in 0..99) to select the target slice to render into. Then expand a quad (as a triangle strip) with the coordinates (-1,-1,0), (-1,1,0), (1,-1,0), (1,1,0). The pixel shader receives the texel coordinate within the slice through SV_Position. (Additionally, you can pass the SV_PrimitiveID from the geometry shader to the pixel shader if you need the slice id.) Be careful, though: SV_Position comes in as float, so better cast it to int.

You can also use a compute shader if you want to avoid the rasterization pipeline. (It is hard to tell whether it will be faster, since you'd need to switch from the rasterization pipeline to the compute pipeline and back. If I recall correctly, someone mentioned in a talk at Gamefest that frequent switches should be avoided.)

Cheers!
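The geometry shader itself would be written in HLSL; the following C++ sketch only emulates on the CPU what it would emit for one input point, to make the data flow explicit. `StripVertex`, `expandSliceQuad` and `texelIndex` are hypothetical names, not part of any API.

```cpp
#include <array>
#include <cassert>

// One emitted vertex: clip-space position plus the target slice, which is
// the value the shader would write to SV_RenderTargetArrayIndex.
struct StripVertex {
    float x, y, z;
    unsigned slice;
};

// Emulates the expansion of one point into a full-slice quad (triangle strip).
// primitiveId plays the role of SV_PrimitiveID and selects the slice.
std::array<StripVertex, 4> expandSliceQuad(unsigned primitiveId, unsigned sliceCount) {
    assert(primitiveId < sliceCount);
    std::array<StripVertex, 4> quad = {{
        {-1.0f, -1.0f, 0.0f, primitiveId},
        {-1.0f,  1.0f, 0.0f, primitiveId},
        { 1.0f, -1.0f, 0.0f, primitiveId},
        { 1.0f,  1.0f, 0.0f, primitiveId},
    }};
    return quad;
}

// SV_Position arrives at the pixel center (x + 0.5, y + 0.5), so truncating
// the float yields the integer texel index.
int texelIndex(float svPositionComponent) {
    return static_cast<int>(svPositionComponent);
}
```

With 100 points, primitive 42 produces a quad covering slice 42, and a pixel shader position of 17.5 maps to texel 17.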
12. ## DX11 (Order Independent) Transparency

Hi!

What you ended up using is called "alpha test" and is not truly a solution to the blending of many transparent objects, though it is a common hack to avoid the problem. Whenever you really need true order-independent transparency, there are a number of methods:

- Depth peeling [Everitt 01, Bavoil & Myers 08]. Requires knowing the number of layers, but was used quite often in research papers. Also works on Dx9.
- Concurrent linked list construction [Yang et al. 10, Yakiimo 10]. Do not look into the DxSDK sample (OIT11). Last time I looked, it was poorly implemented (didn't use shared memory at all). Check the version of Yakiimo. He implemented it faster, even with multi-sampling. This technique grew more useful with the recent advances in graphics hardware. Requires Dx11.
- Stochastic transparency [Enderton et al. 10]. Also rather a research thingy. It can handle an arbitrary number of layers, but costs a lot of performance to get it frame-to-frame coherent. Needs Dx 10.1 if I recall correctly, as it works on the coverage mask.
- The list goes on... stencil routing etc. (Look into the related work of the papers I linked if you want to learn more.)

AFAIK, true order-independent transparency has so far been too expensive for games. (If someone knows a game that used OIT, please let me know!) It is easier to work around it or just sort the transparent objects by depth. (Of course, this doesn't work in all cases, but you can tell your artist to circumvent the ugly cases.) I'm creating technical demos and prototypes, so I can afford to use the concurrent linked list construction.

Best regards!

PS: Links don't seem to work at the moment, so I post the URLs.

- Everitt 01: http://gamedevs.org/uploads/interactive-order-independent-transparency.pdf
- Bavoil & Myers 08: http://developer.download.nvidia.com/SDK/10/opengl/src/dual_depth_peeling/doc/DualDepthPeeling.pdf
- Yang et al. 10: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8659.2010.01725.x/abstract
- Yakiimo 10: http://www.yakiimo3d.com/2010/07/19/dx11-order-independent-transparency/
- Enderton et al. 10: http://www.enderton.org/eric/pub/StochasticTransparency_I3D2010.pdf
13. ## Efficient Rendering of Hundreds of Billboards

Hi!

> Tsus,
>
> I see now; I would have an "artificially" large VB and alter the vertex position in the shader. Even though they are all the same mesh, I would do the location transform when loading up the VB rather than using a matrix to transform it at render time. Is the resultant call so quick that I don't need to bother about culling? Or would you subdivide the IBs into sets based on your world graph so that you don't end up rendering invisible billboards?
>
> Thanks,

Using one large VB in a single draw call is definitely faster than drawing each quad individually. Sooner or later, though, you will run into scalability problems, too. Thus, at some point culling is advisable. You could try it out and see whether you can live without culling. (No point in optimizing things if you don't know how hard they hit the performance, right?) The actual performance depends on the size of the quads on the screen (and their overlap, i.e., the increase in fillrate), the complexity of the shaders, the blending operations you apply, etc. For the culling, two options come to my mind:

1. As you said, you can update your buffers (either a static vertex buffer with a dynamic index buffer, or directly a dynamic vertex buffer) to draw only the quads that are visible. The problem is that you need dynamic resources even for objects that are actually static.

2. You can divide your scene into small static blocks (aligned in a grid or a grid hierarchy) and cull them conservatively. This means you would sometimes render a few billboards that are not on the screen, since you only cull entire blocks. On the plus side, all static objects would reside in static buffers.

In the second approach you would need one draw call per block (for the static objects). Depending on your scene data structure, you may already have a space partition of your scene. If your scene is already organized in a grid, you could cook up the vertex buffers for each grid cell individually. Essentially, the grid size is a trade-off between the number of objects you can cull and the size of the batches. (The smaller the grid cells, the better you can cull. The larger the grid cells, the fewer draw calls you need.) Billboards are usually pretty light-weight, so it is no catastrophe if you miss a few in your culling.

Best regards!
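A minimal C++ sketch of the second option, conservative culling of whole blocks. For brevity it assumes the visible region is approximated by an axis-aligned rectangle in the ground plane (a real implementation would test against the view frustum); `Rect`, `Block` and `visibleBlocks` are hypothetical names, and the index ranges stand in for the per-block static buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned rectangle in the ground plane (x/z).
struct Rect { float minX, minZ, maxX, maxZ; };

// One static batch of billboards per grid cell: the cell bounds plus the
// range of indices that one draw call for this cell would cover.
struct Block {
    Rect bounds;
    std::size_t firstIndex;
    std::size_t indexCount;
};

bool overlaps(const Rect& a, const Rect& b) {
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minZ <= b.maxZ && a.maxZ >= b.minZ;
}

// Returns the blocks that need a draw call. Blocks are kept or rejected as a
// whole, so a kept block may still contain a few off-screen billboards.
std::vector<std::size_t> visibleBlocks(const std::vector<Block>& blocks,
                                       const Rect& view) {
    assert(view.minX <= view.maxX && view.minZ <= view.maxZ);
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (overlaps(blocks[i].bounds, view)) {
            result.push_back(i);
        }
    }
    return result;
}
```

With a 2x2 grid of 10x10 cells, a view rectangle that lies inside the first cell yields a single draw call, while one straddling the shared corner yields four. Shrinking the cells rejects more billboards but increases the number of draw calls.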
14. ## Efficient Rendering of Hundreds of Billboards

Hi!

You're using XNA, right? A common trick to render many billboards (if you don't have geometry shaders available) is to send four vertices with identical position to the vertex shader and to expand the quad in view space by offsetting each of the vertices using their texture coordinates. (The texture coordinate identifies the corner into which the vertex is moved.) This way, you don't need to calculate rotation matrices on the CPU (or the GPU). Thus you don't have to set additional effect constants, and therefore you don't need to call Apply for every single billboard, but only when the texture changes. (So batching, i.e., sorting per material, would be a good idea.) Even better, you can render all billboards that share the same texture with a single draw call by throwing all billboards into a single vertex buffer. See here, for an example (Section 1.2).

Best regards!

PS: You do not necessarily need a quad. You could also use a single right triangle that covers the whole quad. (Of course, some area would be unused.) If you go down this road, you trade input assembler load against rasterizer load. You would need to profile to see what's better in your situation. For starters, I would suggest using quads, since they are more intuitive.
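The corner expansion would normally live in the vertex shader; this C++ sketch only shows the arithmetic for one vertex. `Vec2`, `Vec3` and `expandCorner` are hypothetical names, and it assumes texture coordinates in {0, 1} with v growing downward.

```cpp
#include <cassert>

struct Vec2 { float u, v; };
struct Vec3 { float x, y, z; };

// Moves a billboard center (already transformed to view space) to the corner
// selected by the texture coordinate. Offsetting only x and y in view space
// keeps the quad facing the camera, so no rotation matrix is needed.
Vec3 expandCorner(const Vec3& centerView, const Vec2& uv, float halfSize) {
    assert(halfSize >= 0.0f);
    const float dx = (uv.u * 2.0f - 1.0f) * halfSize;  // u: 0 -> left, 1 -> right
    const float dy = (1.0f - uv.v * 2.0f) * halfSize;  // v: 0 -> top,  1 -> bottom
    return Vec3{centerView.x + dx, centerView.y + dy, centerView.z};
}
```

All four vertices of a billboard carry the same center and differ only in their texture coordinate, so the vertex buffer can hold any number of billboards that share a texture.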
15. ## reading from texture performance

Hi!

Reading from the same address in texture memory won't be as bad as writing to the same address. But for reading from a single address, rather use constant buffers instead of a texture, for two reasons (cf. Montes, http://wwwae.ciemat.es/~cardenas/CUDA/T6-ConstantMemory.pdf):

1. Constant memory is optimized for broadcasts. This means that if all threads (i.e., pixel shader threads) in a warp (i.e., a group of 16 or 32 threads, depending on the hardware) read from the same constant buffer address, only one memory access is requested and its result is "broadcast" (sent) to all threads. This saves 15 or 31 memory accesses and therefore memory bandwidth.
2. Each multiprocessor has its own constant memory cache (8 KB), so only the first warp will read from memory; all others find the constant directly in the cache.

Best regards!