  1. Here is one from PowerVR, which is quite different from a PC graphics card: https://www.imgtec.com/blog/a-look-at-the-powervr-graphics-architecture-tile-based-rendering/
  2. Hi! vkCmdPipelineBarrier needs srcStageMask and srcAccessMask. What should the initial values of these parameters be right after a resource is created? In DirectX 12 you can specify an initial state for resources. What about Vulkan?
  3. In Metal, MTLVertexDescriptor + [[stage_in]] is very similar to the input layout in DX11, but [[stage_in]] does not support packed vector types. Why? The problem is that metal::float3 needs to be aligned on a 16-byte boundary, while metal::packed_float3 does not. If I don't use [[stage_in]], vertex buffers and constant buffers are forced to share the same [[buffer(...)]] namespace, while in DX11 constant buffers have their own namespace. That means the same constant buffer would have to use a different index in the DX and Metal shaders. Any suggestions? Thank you!
  4. For question 1: we use UVAtlas in production to unwrap UVs. After the UVs are unwrapped, the face count remains the same, but the vertex count may increase. Try to imagine unwrapping a teapot onto a texture: parts of the teapot must be separated, so some vertices are duplicated and placed at different positions in UV space. To use UVAtlas: 1. DirectX::GenerateAdjacencyAndPointRep 2. DirectX::UVAtlasCreate 3. DirectX::UVAtlasApplyRemap. I can't remember the details; taking a look at the command line tools MS provides will help.
  5. Still investigating this issue. We clear color, depth and stencil every frame, but in two calls: one clears color, the other clears depth and stencil. What is strange is that other games' present packets are in the 3D queue, but ours is in the copy queue. And our game runs faster on a notebook with an NV 635m than on the brand new NV 960m, using the same configuration.
  6. Our game has a huge frame rate drop on a notebook equipped with an Nvidia GTX 960M with Optimus. It turns out it spends a large amount of time on the copy queue, and there are many "signal command packet" and "wait command packet" entries. It looks like they block the render thread until the copy has finished. Any idea why this happens? Thank you.   By the way, I also checked League of Legends on this machine. It has "signal command packet" entries as well, but no sign of "wait".   [Update] After twiddling with the Nvidia control panel, the frame rate doubled. There are still present packets in the hardware copy queue, but they take significantly less time than in previous runs.  I am still not sure which option did the magic. Anyway, it works for now. Thanks everyone for your suggestions.
  7. I'm stuck with the DX9 API.  LogLuv did show artifacts when bilinear filtering was on. Is LogLuv only good as a slim HDR color buffer? Can I implement an HDR deferred shading pipeline while using LDR light maps at the same time?
  8. Hello! I'm using LogLuv to encode HDR light maps. I think I can use two DXT5 textures to compress the xy and zw components separately, for a 2:1 compression ratio. Is this the way to go? Thank you!
  9. Suppose we have a 2D curve, and we want to generate a ribbon with width r. At the beginning, we draw a circle with radius r. Then we move to the next extrude point along the curve and draw another circle. We repeat this step until we reach the end of the curve, and end up with a curve that looks like a pearl necklace. Then we extrude. Say we extrude at a point P on the curve: if the extruded vertex is inside a circle which does not belong to P, it's a self-intersection.
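
A minimal sketch of the circle test described above, under some assumptions that are not in the original post: the curve is already sampled into points, each sample comes with a unit normal giving the extrusion direction, and the names `Vec2` and `find_self_intersections` are made up for illustration. It is a brute-force O(n^2) check, not a tuned implementation.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };  // hypothetical 2D point type

// For each sample i, extrude to points[i] + r * normals[i] and report whether
// that vertex lies strictly inside the radius-r circle of another sample j != i.
// Assumes points and normals have the same size and normals are unit length.
std::vector<bool> find_self_intersections(const std::vector<Vec2>& points,
                                          const std::vector<Vec2>& normals,
                                          float r) {
    std::vector<bool> flagged(points.size(), false);
    // Shrink the threshold slightly so vertices that sit exactly on a
    // circle are not flagged because of rounding.
    const float limit = r * r * (1.0f - 1e-4f);
    for (std::size_t i = 0; i < points.size(); ++i) {
        const Vec2 v{points[i].x + r * normals[i].x,
                     points[i].y + r * normals[i].y};
        for (std::size_t j = 0; j < points.size(); ++j) {
            if (j == i) continue;  // the circle that belongs to P itself
            const float dx = v.x - points[j].x;
            const float dy = v.y - points[j].y;
            if (dx * dx + dy * dy < limit) {
                flagged[i] = true;
                break;
            }
        }
    }
    return flagged;
}
```

On a straight segment nothing is flagged, because every extruded vertex is farther than r from all other samples. On a bend whose radius of curvature is smaller than r, the vertices extruded toward the inside of the bend land inside neighbouring circles and get flagged.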
  10. How about this: for each vector in the series, first recover (x,y,z) from (x/y, face), then compute dot(v, c), where c is the direction of the center of the circular region.  This gives you cos(a), the cosine of the angle between the vector and the direction of the circular region.  The radius of the circular region can then be derived from cos(a).
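
A small sketch of the dot-product step, assuming the vectors have already been decoded to unit-length (x,y,z) and that c is unit length too. The decoding from (x/y, face) is not shown because the post does not spell out that encoding. `Vec3`, `dot` and `min_cos_to_center` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical 3D vector type

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns the smallest cos(a) between the center direction c and any vector
// in dirs. With unit-length inputs, the vector with the smallest cosine is
// the one farthest from the center, so it bounds the circular region.
// Returns 1.0f for an empty input.
float min_cos_to_center(const std::vector<Vec3>& dirs, const Vec3& c) {
    float min_cos = 1.0f;
    for (const Vec3& v : dirs) {
        min_cos = std::min(min_cos, dot(v, c));
    }
    return min_cos;
}
```

If an actual angle is needed, std::acos of the returned value gives the half-angle of the cone around c that contains every vector.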
  11. I now use _mm_loadu_si128 and process 4 normals at a time, and the stall is gone. Thank you!
  12. It's like a regular shadow map with an extra twist. If a pixel is not in shadow, nothing changes. If a pixel is in shadow, you can look up the "transparent texture" to tell whether it's in transparent shadow or in opaque shadow, and what the transparency value should be.   StarCraft II uses a similar approach, see section 5.8 of http://developer.amd.com/wordpress/media/2013/01/Chapter05-Filion-StarCraftII.pdf
  13. Hi! I'm trying to convert a normal vector from a DWORD to a float3, using the following code:   __m128i n_i; n_i.m128i_i32[0] = n & 0xff; n_i.m128i_i32[1] = (n >> 8) & 0xff; n_i.m128i_i32[2] = (n >> 16) & 0xff; n_i.m128i_i32[3] = 0; __m128 n_f = _mm_cvtepi32_ps(n_i); ...   Here is the assembly:   ... mov dword ptr [esp], ecx mov dword ptr [esp+0x4], edx mov dword ptr [esp+0x8], eax mov dword ptr [esp+0xc], 0 movdqa xmm0, xmmword ptr [esp] cvtdq2ps xmm0, xmm0 ...   "cvtdq2ps xmm0, xmm0" has a high CPI (1.65).  According to https://fgiesen.wordpress.com/2013/03/04/speculatively-speaking/ , the CPU cannot forward multiple smaller stores to one larger load. I wonder whether this is a load-hit-store or not.
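
One possible way to avoid the four scalar stores followed by a 16-byte load is to move the DWORD into an XMM register directly and widen the bytes there. The sketch below uses only SSE2 intrinsics and assumes the same byte layout as the code in the post (x in bits 0-7, y in bits 8-15, z in bits 16-23). The function name `unpack_normal_u8` is made up, and whether this actually removes the stall would need to be confirmed in a profiler.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Converts the low three bytes of n into floats (x, y, z, 0) without going
// through memory, so there is no wide load that depends on narrow stores.
inline __m128 unpack_normal_u8(std::uint32_t n) {
    const __m128i zero = _mm_setzero_si128();
    // Clear the top byte so the fourth lane becomes 0, as in the original code.
    __m128i v = _mm_cvtsi32_si128(static_cast<int>(n & 0x00ffffffu));
    v = _mm_unpacklo_epi8(v, zero);   // 8-bit lanes  -> 16-bit lanes
    v = _mm_unpacklo_epi16(v, zero);  // 16-bit lanes -> 32-bit lanes
    return _mm_cvtepi32_ps(v);        // int32 -> float
}
```

The result still holds raw byte values in the range 0 to 255. Any scale and bias to bring them into the range of a normal has to be applied afterwards, as in the original code path.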
  14. This one should be helpful: "Don't Throw it all Away: Efficient Buffer Management" https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf
  15. Nice screenshot!  Here are some questions: What about instancing? Say I have 10 trees and each of them has different occlusion data. How do you manage to put them into one draw call? And the dark parts between the grass blades, are they made by SSAO?