MJP last won the day on July 2

MJP had the most liked content!

Community Reputation

19972 Excellent

1 Follower

About MJP

  • Rank
    XNA/DirectX Moderator & MVP


  1. Generally you won't store irradiance * albedo in your lightmaps, because that would mean that your albedo resolution is limited by your lightmap resolution. Typically you'll store some form of irradiance in your lightmap (usually irradiance / Pi), and then compute the diffuse reflectance in your fragment shader by sampling both the lightmap and albedo map and multiplying the results together.
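
A minimal CPU-side sketch of the multiply described above, assuming the lightmap stores irradiance / Pi per channel. The `Float3` type and the function names are hypothetical stand-ins for what a fragment shader would do with its own vector types.

```cpp
#include <cassert>

// Simple RGB triple; a stand-in for a shader float3 (hypothetical type).
struct Float3 { float r, g, b; };

// Component-wise product, as a shader computes for float3 * float3.
inline Float3 Mul(const Float3& a, const Float3& b) {
    return { a.r * b.r, a.g * b.g, a.b * b.b };
}

// lightmapSample is assumed to hold irradiance / Pi.
// A Lambertian BRDF is albedo / Pi, so the outgoing diffuse radiance is
// (albedo / Pi) * irradiance, which equals albedo * (irradiance / Pi).
inline Float3 ShadeDiffuse(const Float3& albedoSample, const Float3& lightmapSample) {
    // Albedo is a reflectance, so it should not be negative.
    assert(albedoSample.r >= 0.0f && albedoSample.g >= 0.0f && albedoSample.b >= 0.0f);
    return Mul(albedoSample, lightmapSample);
}
```

Because the albedo is sampled separately, its texture can be much higher resolution than the lightmap.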
  2. MJP

    what is Texture Streaming?

    Like ChuckNovice explained, it's the concept of streaming texture data in and out of GPU memory based on which textures are necessary to render the current viewpoint. The basic idea is that you only need high-resolution textures when the camera is close to a surface that uses the texture, so you can drop the higher-resolution mip levels as you get further away. So for instance if you have a 2048x2048 texture, as you get further away you would drop down to 1024x1024, and then 512x512, and so on.

    Generally there are two main parts of this that can be tricky. The first is the actual infrastructure for streaming texture data off disk (possibly decompressing it), and then making sure that the data finds its way into GPU memory using whichever graphics API you're currently utilizing. You usually want to stream as fast as you can, but you also want to make sure that you don't use too many CPU/GPU resources, since that can interfere with the game's performance.

    The other hard part is actually determining which mip level should be used for each texture. There are several possible avenues for doing this, with different tradeoffs in accuracy, runtime performance, and the ability for things to move around at runtime. Many games with mostly-static worlds will chunk up their scenes into discrete regions, and pre-compute the required texture mip levels for each region. Then as the camera moves through the world from one region to the next, the engine's streaming system moves textures in and out of memory based on the current visible set for that region. You can see this presentation about Titanfall 2 for an example of this approach.

    Other games will try to compute the required texture set on the fly, possibly by reading back information from the GPU itself. RAGE took this approach, where they rendered out IDs to a low-resolution render target and read that back on the CPU to feed into their streaming system. In their case they were streaming in pages for their virtual texture system, but the basic concept is the same for normal texturing. More details on that here. The upside of that approach is that it handled fully dynamic geometry, and gave rather accurate results for the current viewpoint. The downside was that it was always at least a frame behind the current camera movement, and couldn't predict very quick camera motions. So if you spun the camera around 180 degrees really quickly, you might see some textures slowly pop in as they got streamed off disk. In their case the issue was made worse by their choice to have totally unique texture pages for the whole world, but you could still have the same problem with traditional textures.
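
To make the "drop mips as you get further away" idea concrete, here is a rough sketch. The distance-doubling rule and the names `RequiredTopMip`, `MipSize`, and `fullResDistance` are assumptions made for illustration, not what any particular engine does; real systems usually work from projected texel density or precomputed per-region data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical heuristic: mip 0 (full resolution) is needed up to fullResDistance,
// and every doubling of the distance beyond that lets us drop one more mip.
inline uint32_t RequiredTopMip(float distance, float fullResDistance, uint32_t mipCount) {
    assert(fullResDistance > 0.0f && mipCount > 0);
    if (distance <= fullResDistance) return 0;
    const float level = std::floor(std::log2(distance / fullResDistance));
    // Clamp in float first so very large distances cannot overflow the integer cast.
    const float clamped = std::min(level, static_cast<float>(mipCount - 1));
    return static_cast<uint32_t>(clamped);
}

// Width/height of a given mip level of a square texture (never smaller than 1).
inline uint32_t MipSize(uint32_t baseSize, uint32_t mip) {
    assert(mip < 32);
    return std::max<uint32_t>(baseSize >> mip, 1);
}
```

With a 2048x2048 texture and a `fullResDistance` of 10 units, a surface 25 units away would only need the 1024x1024 mip and below, and one 50 units away would only need 512x512 and below.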
  3. 2 * 512 threads is 1024 threads per CU, which works out to an occupancy of 4 for each of the 4 SIMDs on a CU. While an occupancy of 4 isn't amazing, it's not terrible either. I would be careful about using more than 64 VGPRs, since if you go above 64 your occupancy will drop below 4 and your performance may suffer.
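
The arithmetic above can be sketched as follows, assuming GCN-style hardware (64 threads per wavefront, 4 SIMDs per CU, a budget of 256 VGPRs per lane on each SIMD, and at most 10 waves per SIMD). The function names are made up for the example, and it ignores VGPR allocation granularity as well as SGPR and LDS limits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed GCN-style figures (see the lead-in); other architectures differ.
constexpr uint32_t kWaveSize = 64;
constexpr uint32_t kSimdsPerCU = 4;
constexpr uint32_t kVgprBudget = 256;
constexpr uint32_t kMaxWavesPerSimd = 10;

// Waves per SIMD allowed by VGPR usage alone.
inline uint32_t WavesPerSimdFromVgprs(uint32_t vgprsPerThread) {
    assert(vgprsPerThread > 0 && vgprsPerThread <= kVgprBudget);
    return std::min(kVgprBudget / vgprsPerThread, kMaxWavesPerSimd);
}

// Waves per SIMD implied by a number of resident threads on one CU.
inline uint32_t WavesPerSimdFromThreads(uint32_t threadsPerCU) {
    return threadsPerCU / kWaveSize / kSimdsPerCU;
}
```

Under these assumptions 64 VGPRs allows 4 waves per SIMD, while 65 VGPRs only allows 3, which is the drop described above.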
  4. Yup, it's exactly the same as Load().
  5. You certainly don't want anisotropic filtering for a per-pixel noise texture. In fact, you don't want any filtering at all. Instead, you probably want to sample the texture with integer coordinates and bypass filtering entirely. If you do it this way, you can use an integer mod operation to "tile" the 64x64 noise texture across the entire screen:

    // Assumed declaration of the 64x64 noise texture (not shown in the original post)
    Texture2D<float4> randomTexture;

    float4 PS_main(in float4 screenPos : SV_Position) : SV_TARGET0
    {
        // Wrap the integer pixel position so the noise repeats every 64 pixels
        uint2 rvecCoord = uint2(screenPos.xy) % 64;
        // operator[] loads the texel directly, with no filtering
        float3 rvec = randomTexture[rvecCoord].xyz;
        return float4(rvec, 1.0f);
    }
  6. The counter is 32 bits, so you will easily overflow it if you never clear it. But of course you'll be limited by the actual size of your structured buffer resource long before you hit a count of 0xFFFFFFFF, at which point writes will be discarded like turanszki already explained.
  7. Engines commonly handle this sort of thing using some flavor of an entity/component system (ECS). For instance you might have a TransformComponent, ScoreComponent, NicknameComponent, and a MeshComponent all assigned to the same entity/actor. The score and nickname components can store their data separately without having to care about the mesh and transform, while the mesh component can pull the actor's current transform from the transform component when it's time to render (the same transform component can also be queried by other components that rely on it, for instance a PhysicsComponent). For this kind of setup, you probably want a simplified interface on your MeshComponent (or whatever your equivalent is) that only exposes the things relevant to other components, without exposing unnecessary implementation details. So for instance you might let other components hide/show the mesh, but other components don't necessarily care that your lower-level renderer uses a deferred rendering setup that rasterizes the mesh to a G-Buffer.
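
Here is one way that kind of setup might look, as a bare-bones sketch. The component names come from the post above, but every member and method is made up for illustration; a real engine would have far more going on.

```cpp
#include <cassert>
#include <string>

// Plain data components (hypothetical layouts).
struct TransformComponent { float x = 0.0f, y = 0.0f, z = 0.0f; };
struct ScoreComponent { int score = 0; };
struct NicknameComponent { std::string nickname; };

// Exposes only what other components care about (show/hide, current transform).
// How the renderer actually draws the mesh stays hidden behind this interface.
class MeshComponent {
public:
    explicit MeshComponent(const TransformComponent* transform) : transform_(transform) {
        assert(transform != nullptr);
    }

    void SetVisible(bool visible) { visible_ = visible; }
    bool IsVisible() const { return visible_; }

    // Pulled by the renderer when it's time to draw; always reflects the latest transform.
    const TransformComponent& CurrentTransform() const { return *transform_; }

private:
    const TransformComponent* transform_;  // owned by the entity, not by the mesh
    bool visible_ = true;
};
```

The score and nickname components never touch the mesh, and a PhysicsComponent could hold the same kind of pointer to the TransformComponent that the mesh does.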
  8. MJP

    Floating point luminosity

    Floating point values actually work pretty nicely for storing HDR (scene-referred) intensities that ultimately end up getting mapped to a visible range. This is because floating point is inherently exponential in how it represents numbers (due to the exponent and mantissa), which effectively means that as you get further from zero you end up with larger and larger "buckets", where a bucket is the smallest difference that can be represented between two values without having to round up or down. So if you're at a higher intensity like 10,000, you're not going to be able to represent 10,000.000001 in a 16-bit float. However that doesn't really matter, since the visual system really works more on a logarithmic scale. What that means is that going from 0.00001 to 0.00002 may be perceived as a big difference by a human (because relatively speaking you've doubled the intensity), but going from 10,000.00001 to 10,000.00002 is imperceptible. So floating point tends to naturally work out for that by "discarding" very small differences as you get into the higher ranges (or vice versa as you get into lower ranges closer to 0).
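
To put numbers on the "bucket" size, here is a small sketch that computes the gap between adjacent representable values for a binary float with a given number of explicit mantissa bits (10 for a 16-bit float, 23 for a 32-bit float). The `Spacing` helper is made up for this example and assumes the value is in the format's normal range, so it ignores denormals and overflow.

```cpp
#include <cassert>
#include <cmath>

// Gap between neighboring representable values around x for a binary
// floating point format with mantissaBits explicit mantissa bits.
// Assumption: x is positive and within the format's normal range.
inline double Spacing(double x, int mantissaBits) {
    assert(x > 0.0);
    // ilogb gives floor(log2(x)); one step in the last mantissa bit is 2^(exponent - mantissaBits).
    return std::ldexp(1.0, std::ilogb(x) - mantissaBits);
}
```

For a 16-bit float this gives a gap of 8 around 10,000 but roughly 0.001 around 1.0. In both cases the gap relative to the value itself is under 0.1%, which is why the precision lines up well with a roughly logarithmic perception of intensity.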
  9. For UpdateSubresource the "left" and "right" members of the box should be offsets in bytes when updating a buffer resource, so the code that you posted for your setData function looks correct. What exactly do you mean when you say that "the data is not smooth"?
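
For reference, filling out the box for a byte range of a buffer might look like the sketch below. The `Box` struct is a stand-in with the same fields as `D3D11_BOX` so that the example compiles without `d3d11.h`, and `MakeBufferUpdateBox` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in mirroring the fields of D3D11_BOX (assumption: used here only for illustration).
struct Box {
    uint32_t left, top, front, right, bottom, back;
};

// For a buffer, left/right are byte offsets: left is the first byte written and
// right is one past the last byte. The other dimensions span a single row/slice.
inline Box MakeBufferUpdateBox(uint32_t byteOffset, uint32_t byteCount) {
    assert(byteCount > 0);
    return Box{ byteOffset, 0, 0, byteOffset + byteCount, 1, 1 };
}
```

With the real API you would fill a `D3D11_BOX` the same way and pass it as the `pDstBox` argument of `UpdateSubresource`.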
  10. MJP

    SSAO running slow

    There's no option to do that here. Sometimes people like to jump in with an extra question or comment even after the initial question was answered, so we leave the threads open.
  11. I agree with pcmaster: if you're going to try to have your own abstractions over D3D (or other APIs), you're best off if you structure your engine's "internal" API in a way that requires you to specify all render targets simultaneously. That way you can map it to any backend that your engine supports. In our engine we have something we call a "render target state", which handles setting render targets + depth targets, optionally clearing them, and also specifying the barriers (or low-level sync operations for consoles) that are necessary for using those render targets and potentially reading from them afterwards. This setup initially came from working with the Vita, which has a mobile GPU that requires you to be rather explicit about your render target usage due to the way that tile-based deferred rendering (TBDR) works. But it turned out to work well for modern consoles and PC graphics APIs.
  12. That depends on how you combine the blurred texture with your scene. The two simplest ways to do that are to combine in a pixel shader, or use alpha blending. Doing it in a shader is more flexible, but requires an additional render target that will contain the result of combining the two images. Doing it with alpha blending is less flexible (you have to use fixed-function blend equations), but allows you to modify the "main" render target in-place. Even if you use hardware alpha blending, you may not need the alpha value at all. The typical alpha blend equation looks like this: (SrcColor.rgb * SrcColor.a) + (DstColor.rgb * (1 - SrcColor.a)) where "SrcColor" is what comes out of your pixel shader, and "DstColor" is what's in your render target. This is basically doing a linear blend of SrcColor and DstColor, similar to doing lerp(DstColor.rgb, SrcColor.rgb, SrcColor.a) in a shader. But that's usually not what you want for a glow effect, which is usually composited with a straightforward additive operation. In that case you don't need to use the alpha value in your blend equation at all, so it won't matter. You may still want to multiply your glow texture by some scale factor before combining it with your scene, but you can do that very easily by just multiplying it with your pixel color in the shader.
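
Written out on the CPU, the two ways of combining look roughly like this. It's just a sketch: `Color`, `AlphaBlend`, `AdditiveBlend`, and `glowScale` are names invented for the example, and on the GPU the blend unit or your pixel shader does this work for you.

```cpp
#include <cassert>

// RGBA color (hypothetical type for the example).
struct Color { float r, g, b, a; };

// Standard alpha blend: src * src.a + dst * (1 - src.a), applied to RGB.
// The destination alpha is left untouched here to keep the sketch short.
inline Color AlphaBlend(const Color& src, const Color& dst) {
    assert(src.a >= 0.0f && src.a <= 1.0f);
    const float inv = 1.0f - src.a;
    return { src.r * src.a + dst.r * inv,
             src.g * src.a + dst.g * inv,
             src.b * src.a + dst.b * inv,
             dst.a };
}

// Additive composite for glow: dst + glow * scale. The source alpha is never used.
inline Color AdditiveBlend(const Color& glow, const Color& dst, float glowScale) {
    return { dst.r + glow.r * glowScale,
             dst.g + glow.g * glowScale,
             dst.b + glow.b * glowScale,
             dst.a };
}
```

Note that the additive version can push values above 1.0, which is fine for an HDR target but will clamp on an 8-bit one.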
  13. Are you using bilinear filtering when you downscale and upscale? It looks to me like you downsampled to 1/4 size in each dimension with point filtering, which is giving you that "blocky" look due to the aliased edges. So to start off I would make sure that you're using bilinear filtering for your sampler states when you're performing the scaling passes. It also looks like your blur is pretty narrow, which isn't helping you to get that "glowy" look that you want. I assume that those kernel weights are from a Gaussian function, and you're applying them to the 9 texels in the neighborhood of the center pixel that you're processing? That should give you a decent amount of blurring, but you may want to tweak the sigma parameter of the Gaussian to get a wider-looking kernel. It also looks like your weights don't sum to 1, which is why the resulting image is darker than the source image. You can correct for that by summing up your weights, and then multiplying your result by 1 / sumOfWeights. You may also want to read through this article, which has a good overview of how to do an efficient blur for a bloom/glow effect.
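
The weight normalization can be sketched like this, assuming a separable 1D Gaussian with 2 * radius + 1 taps (radius 4 gives 9 taps). `GaussianWeights` is a made-up helper for the example; the weights would normally be computed once on the CPU and uploaded as shader constants.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Builds 2 * radius + 1 Gaussian weights and rescales them so they sum to 1,
// which keeps the blurred image at the same overall brightness as the source.
inline std::vector<float> GaussianWeights(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> weights(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        // Unnormalized Gaussian; the constant factor cancels in the division below.
        const float w = std::exp(-(i * i) / (2.0f * sigma * sigma));
        weights[i + radius] = w;
        sum += w;
    }
    for (float& w : weights) {
        w /= sum;  // same as multiplying by 1 / sumOfWeights
    }
    return weights;
}
```

Raising sigma flattens the weights and gives a wider-looking blur, but once sigma gets large relative to the radius the kernel is visibly truncated, so the two usually need to grow together.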
  14. The technique that you've described (downscale + blur passes) is pretty standard for achieving the effect that you want. Could you perhaps explain what looks bad about your current implementation? It's possible that you have a simple bug that you could fix, or that some simple tweaks could give you better results. If you could get some screenshots or a video, that would probably be helpful.
  15. This is before my time and there are no docs available, but from what I gather it's a relic of the days when vertices were fully processed on the CPU. Back then you would transform the vertices and calculate lighting if necessary on the CPU, and send the resulting data to the GPU to be rasterized and textured. These old GPUs typically required triangles to be clipped before they could be rasterized, so that also had to happen on the CPU. The old versions of D3D could do vertex processing and clipping for you, which is what that Nvidia doc that Endurion linked to alludes to. It sounds like D3D would track the extents of processed triangles for you for some reason (perhaps something related to clipping), and that flag would prevent the D3D runtime from doing this. A good chunk of this functionality is still present in D3D9, which was the last version of the API to support fixed-function vertex processing as well as CPU-side vertex processing. However it also supported full GPU-side vertex/pixel processing (with shader support), which is what you would use on any GPU made in the last 15 years. So unless you're targeting some realllllllly old hardware, I would just use full hardware vertex processing and not worry about that flag.