[DX11] Tile-based Deferred Shading in BF3 discussion


Recommended Posts

DICE released this presentation that talks about how their renderer uses tile-based deferred shading with DX11:

http://publications.dice.se/attachments/GDC11_DX11inBF3_Public.pptx

The tile-based approach starts on slide 10.

On slide 12 they say they use 1 thread per pixel, and 16x16 thread groups per tile. To process the entire screen, I assume they use the ID3D11DeviceContext::Dispatch() parameters to spawn a bunch of those 16x16 thread groups. For example, for a resolution of 1360x768, they'd call Dispatch( 85, 48, 1 ). Does that sound about right?
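In code, I'm imagining something like this on the application side (just a sketch; the shader and variable names are placeholders), with the group counts rounded up so partial tiles along the right and bottom edges are still covered:

    const UINT TILE_SIZE = 16;
    UINT groupsX = (screenWidth  + TILE_SIZE - 1) / TILE_SIZE;  // 1360 -> 85
    UINT groupsY = (screenHeight + TILE_SIZE - 1) / TILE_SIZE;  //  768 -> 48
    context->CSSetShader(tileLightingCS, nullptr, 0);           // the tile lighting compute shader
    context->Dispatch(groupsX, groupsY, 1);                     // one thread group per 16x16 tile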

On slide 13 they have each thread group determine the min/max depth for its 16x16 pixel screen tile. This is done through groupshared data and interlocked instructions.
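My mental sketch of that reduction looks roughly like this (not DICE's actual code; depthTexture and the register assignments are my assumptions). Since depth values are non-negative, asuint() preserves their ordering, so the integer interlocked ops can be used on groupshared memory:

    Texture2D<float> depthTexture : register(t0);

    groupshared uint gMinDepth;
    groupshared uint gMaxDepth;

    [numthreads(16, 16, 1)]
    void TileLightingCS(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
    {
        // one thread initializes the shared values for the whole group
        if (gi == 0) { gMinDepth = 0x7F7FFFFF; gMaxDepth = 0; }   // 0x7F7FFFFF == asuint(FLT_MAX)
        GroupMemoryBarrierWithGroupSync();

        // each thread reads its pixel's depth (edge tiles would need a bounds
        // check at resolutions that aren't a multiple of 16)
        float depth = depthTexture.Load(int3(dtid.xy, 0));
        InterlockedMin(gMinDepth, asuint(depth));
        InterlockedMax(gMaxDepth, asuint(depth));
        GroupMemoryBarrierWithGroupSync();

        float tileMinZ = asfloat(gMinDepth);
        float tileMaxZ = asfloat(gMaxDepth);
        // ... build the tile's bounding volume from tileMinZ / tileMaxZ ...
    }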

Slide 15 describes how they perform culling of the light list vs. the screen-aligned bounding box established on slide 13. Instead of each thread in the 16x16 thread group processing a pixel, each thread now processes a light from the incoming light list and, if that light intersects the bounding box, adds the light index to the group-shared list of lights. At the end of this phase, each thread group has a list of lights that potentially intersect the pixels in its tile.
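Roughly like this, I imagine (again a sketch; struct Light, g_lights, g_lightCount, tileBounds and the IntersectsTileBounds() helper are my own placeholder names):

    #define MAX_LIGHTS_PER_TILE 256

    StructuredBuffer<Light> g_lights : register(t1);
    cbuffer LightInfo : register(b0) { uint g_lightCount; };

    groupshared uint gTileLightCount;
    groupshared uint gTileLightIndices[MAX_LIGHTS_PER_TILE];

    // inside the compute shader, after the tile bounds have been built:
    if (gi == 0) gTileLightCount = 0;
    GroupMemoryBarrierWithGroupSync();

    // 16x16 = 256 threads, each one strides across the global light list
    for (uint i = gi; i < g_lightCount; i += 256)
    {
        if (IntersectsTileBounds(g_lights[i], tileBounds))
        {
            uint slot;
            InterlockedAdd(gTileLightCount, 1, slot);   // reserve a slot in the shared list
            if (slot < MAX_LIGHTS_PER_TILE)
                gTileLightIndices[slot] = i;
        }
    }
    GroupMemoryBarrierWithGroupSync();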

Slide 15 handles only point lights. What if we wanted to handle both point and spot lights? Two ideas come to mind. One is to expand struct Light to include the additional parameters needed for spot lights. Another is to use two independent structures, one for point lights and the other for spot lights. In the first case, we continue to use a single for() loop and conditionally select which intersection test to use based on the light type. In the second case, we use two for() loops, one over the point lights and then another over the spot lights. The second approach feels like it should be more efficient than the first due to coherency between the threads in the thread group.
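In rough HLSL, the second approach would look something like this (the buffer names, counts and intersection helpers are all placeholders, and AppendTileLight() stands for the InterlockedAdd pattern above):

    // every thread walks the point lights first, then the spot lights, so all
    // threads in the group are testing the same light type at any given time
    for (uint p = gi; p < g_pointLightCount; p += 256)
        if (SphereIntersectsTile(g_pointLights[p], tileBounds))
            AppendTileLight(POINT_LIGHT_BASE + p);

    for (uint s = gi; s < g_spotLightCount; s += 256)
        if (ConeIntersectsTile(g_spotLights[s], tileBounds))
            AppendTileLight(SPOT_LIGHT_BASE + s);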

Slide 16 switches back to processing pixels. Each thread iterates through the list of lights potentially intersecting its tile's bounding box and performs the lighting calculation for its pixel. This all makes sense. Is there further culling that should be performed at this stage? For example, would it be beneficial to test each pixel against the spot light cone, or is it better to simply rely on a clamp instruction?
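Something like this is what I have in mind for that phase (EvaluateLight(), surface and outputTexture are placeholders for whatever the real G-buffer decode and shading code would be):

    RWTexture2D<float4> outputTexture : register(u0);

    // back to one thread per pixel: walk only the tile's culled light list
    float3 lighting = 0;
    for (uint li = 0; li < gTileLightCount; ++li)
    {
        Light light = g_lights[gTileLightIndices[li]];
        lighting += EvaluateLight(light, surface);   // attenuation, N.L, specular, ...
    }
    outputTexture[dtid.xy] = float4(lighting, 1.0f);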

One thing not mentioned in the presentation is how they make the initial unculled list of lights available to the Compute Shader, other than that they use a StructuredBuffer for the light data and a Constant Buffer for the # lights. According to NVIDIA, if a Buffer is created as Dynamic, it resides in AGP memory all the time. You can lock it, update selective portions, and unlock it and yet nothing will get uploaded to the graphics card. When the shader reads from the buffer, only the needed data is uploaded at PCI speeds, but the entire buffer is never uploaded to video memory. In contrast, non-dynamic buffers reside in video memory. They can be updated with UpdateSubresource, in which case the data updated is copied to a temporary buffer in system memory and eventually uploaded to video memory before the shader needs it. The first method is slower for the graphics hardware (reading memory over PCI is slower than reading it from video memory), and the second method imposes more overhead on the CPU (from all that copying).

Since the unculled list of lights probably changes every frame, it's unclear which method would be faster. But it's easy to switch between the two methods, so once I get to that point, I'll try them both. My gut feel is that with so many threads accessing the light buffer, it's probably best to go with the UpdateSubresource method and have everything reside in video memory.
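For reference, the two update paths I'm weighing look roughly like this (buffer names are placeholders):

    // Option A: dynamic buffer, Map/Unmap every frame; the GPU reads the data
    // over the bus when the shader runs
    D3D11_MAPPED_SUBRESOURCE mapped;
    context->Map(dynamicLightBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    memcpy(mapped.pData, lights.data(), lights.size() * sizeof(Light));
    context->Unmap(dynamicLightBuffer, 0);

    // Option B: default-usage buffer, UpdateSubresource; the runtime copies the
    // data and uploads it to video memory before the dispatch
    context->UpdateSubresource(defaultLightBuffer, 0, nullptr, lights.data(), 0, 0);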

Hey,

If you want to see some actual code of a tile-based deferred renderer: Deferred rendering for current and future rendering pipelines by Andrew Lauritzen.

He dispatches as you mention. And he calculates, like DICE probably does, a mini-frustum for each tile (znear and zfar are the min and max values from the tile's depth buffer samples) and culls the point lights with a point-light-sphere vs. frustum test. He doesn't do any (per-pixel) culling after that.
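A sphere vs. tile-frustum test along those lines could look like this (a sketch, not his actual code; it assumes the six tile planes are built in view space with inward-facing normals):

    bool SphereInsideFrustum(float3 centerVS, float radius, float4 planes[6])
    {
        [unroll]
        for (int i = 0; i < 6; ++i)
        {
            // signed distance from the sphere center to the plane
            if (dot(planes[i].xyz, centerVS) + planes[i].w < -radius)
                return false;   // entirely behind one plane -> culled
        }
        return true;
    }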

His sample only uses point lights. The ways you mention for including different types of lights are also the only ones I can think of, but I'm curious about other reactions. But yeah, I also have that same feeling of "wow, there is a lot of dynamic branching going on".

To get the light data into a GPU memory resource, you can upload the data into a staging buffer and then copy it to a default-usage buffer - that way there shouldn't be any big issue with the shader having to stream the light data from AGP memory.
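Something along these lines (just a sketch; the buffer names are placeholders, and the staging buffer is assumed to be created with D3D11_USAGE_STAGING and CPU write access):

    // fill the CPU-accessible staging buffer...
    D3D11_MAPPED_SUBRESOURCE mapped;
    context->Map(stagingLightBuffer, 0, D3D11_MAP_WRITE, 0, &mapped);
    memcpy(mapped.pData, lights.data(), lights.size() * sizeof(Light));
    context->Unmap(stagingLightBuffer, 0);

    // ...then copy it into the default-usage buffer the compute shader reads
    context->CopyResource(defaultLightBuffer, stagingLightBuffer);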

Their slides do mention that they support the other light shapes; they just don't provide the sample code for it. I don't have a copy of the game, but I assume the shader code exists somewhere in the installation, so you might check that out if you have already purchased it.

One other thing I would find interesting is whether there is any benefit to pre-sorting the lights on the CPU and then passing a semi-sorted list of lights in the structured buffer. This would probably drastically cut down on the number of lights that need to be processed in each thread group, but at the expense of building the sorted light spatial data structure. However, if the structure is maintained from frame to frame, then it could be an overall win...

I think my engine needs a tile based renderer sample :)

@Litheon - thanks for the link. This will really help out. In his code, he's using the Map/Unmap method and so his light data stays in host memory. Even so, he's able to render 1024 lights in around 6ms on my 450 GTS.

@Jason Z:
Wouldn't using UpdateSubresource() do the same thing as the staging buffer method, only with less implementation work? UpdateSubresource() copies the data to a temp buffer in host memory and then uploads that data to video memory before the shader executes. So either method performs the data copy / upload steps.

I have the 360 version of BF3. Great game BTW. Very pretty graphics.

Regarding pre-sorting the lights, pre-sort with respect to what? Do you mean pre-cull against the frustum? We're using Umbra 3 in our game and so it would be trivial to have Umbra cull out all non visible lights before I upload them to the card.

What I mean about the pre-sorting is that the mini-frustums for each tile are known beforehand (since they are a function of the camera orientation and position). If the lights are already sorted into some spatial hierarchy, then it should be possible to determine fairly efficiently which lights intersect (or could potentially intersect) each tile. That would effectively reduce the number of tests each tile needs to do before the threads are even dispatched. The sorted data could be provided in some data structure (i.e. something in a raw byte address buffer) or perhaps in a number of structured buffers...

About the resource updating, it depends on how the destination buffer is being used. If you explicitly copy the data between resources yourself, then you have a little more control over how the update occurs. If you can ensure that your staging buffer won't have any contention, then the copy can take the fastest path available.

I'm forging ahead on my implementation of tile based CS lighting. One thing I ran into is that since the mini-frustum vs. light culling that the threads do is in view space, my light data (position and direction) needs to be in view space, too. In my game, all lights are stored in world space, so I could simply transform them to view space on the CPU as they're being written to the StructuredBuffer. I'm not too excited about doing this since our games tend to be CPU limited.

One idea that came to mind is that I can upload the light data in world space and have the CS transform them into view space. I'm currently using a StructuredBuffer. Could I change that to a RWStructuredBuffer so the CS can make a pass at the data and transform it in place, writing it back into the same buffer? Would there be any conflict with the game code on the CPU updating the buffer at the same time the CS is writing to it? I'd think not because the CPU would get a fresh buffer when it calls Map().

Since the work of transforming the lights can be distributed across the threads in the CS, there's no chance of conflict where two or more threads are trying to transform the same light.
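Something like this is what I have in mind (a rough sketch; the Light layout, g_lightsRW and the constant buffer contents are just assumptions about my own setup):

    struct Light
    {
        float3 positionWS;
        float3 directionWS;
        float3 positionVS;
        float3 directionVS;
        // ... color, range, etc. ...
    };

    RWStructuredBuffer<Light> g_lightsRW : register(u0);

    cbuffer TransformCB : register(b0)
    {
        float4x4 viewMatrix;
        uint     lightCount;
    };

    // dispatched with context->Dispatch((lightCount + 63) / 64, 1, 1)
    [numthreads(64, 1, 1)]
    void TransformLightsCS(uint3 dtid : SV_DispatchThreadID)
    {
        if (dtid.x >= lightCount)
            return;
        Light light = g_lightsRW[dtid.x];
        light.positionVS  = mul(float4(light.positionWS, 1.0f), viewMatrix).xyz;   // point: w = 1
        light.directionVS = mul(float4(light.directionWS, 0.0f), viewMatrix).xyz;  // direction: w = 0
        g_lightsRW[dtid.x] = light;
    }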

I'm new to CS programming, so if there's a better way to do this, I'd love to hear about it!

Another thought is that I could have the CS transform the light from world space to view space just during the mini-frustum phase and then discard the transformed data, and do the lighting computations in world space. This would eliminate the need to store the view space data back to a buffer at all because it won't be needed again (I think).

Currently I store the worldLightPos and viewLightPos in one RWStructuredBuffer, and I transform them from world to view space with a compute shader, writing back into the same RWStructuredBuffer. But I haven't measured the performance.

I don't think you will have conflicts with a Map/Unmap, but maybe the staging buffer is a good way to go. Then you have more control over what is allocated in memory.


Please keep posting your results, it is an interesting read! 


I'm forging ahead on my implementation of tile based CS lighting. One thing I ran into is that since the mini-frustum vs. light culling that the threads do is in view space, my light data (position and direction) needs to be in view space, too. In my game, all lights are stored in world space, so I could simply transform them to view space on the CPU as they're being written to the StructuredBuffer. I'm not too excited about doing this since our games tend to be CPU limited.

Why not convert the mini-frustums to world space instead? That would effectively require you to get the world-space position and orientation of the camera; from those you can generate your mini-frustums. That way your lights stay in world space, your mini-frustums are in world space, and no transformation is required on the CPU or GPU.

Would that work in your use case?
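As a rough illustration of the idea, a tile corner can be taken straight to world space by unprojecting its NDC position with the inverse view-projection matrix (a sketch; invViewProj is an assumed input):

    float3 UnprojectToWorld(float2 ndcXY, float ndcDepth, float4x4 invViewProj)
    {
        float4 p = mul(float4(ndcXY, ndcDepth, 1.0f), invViewProj);
        return p.xyz / p.w;   // perspective divide back to world space
    }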

I captured a quick video of my progress and put it up on YouTube. It's a cube being lit by 6,000 tiny moving point lights. It runs at 60 FPS on a GeForce 460 GTX. Sorry for the bad quality - I'll upload something better in the future. More info is in the description of the video.



Next step is implementing projected spot lights. But I won't be able to start that for another week.
