
Member Since 29 Mar 2007

#5243906 Normal offsets in view space?

Posted by MJP on 31 July 2015 - 04:14 PM

There should be no difference between doing this in world space or view space, provided that everything you're working with is in the same coordinate space. I would suspect that you have something in world space that hasn't been converted to view space.

#5243437 Is it worth to use index buffers?

Posted by MJP on 29 July 2015 - 03:14 PM

The way most GPUs work is that they have a "post-transform vertex cache", which stores the results of the vertex shader for some number of vertices. The idea is that if the same index comes up multiple times, the GPU won't have to invoke the vertex shader each time. However, since the cache size is limited, you want to sort your triangles so that repeated indices are near each other in the index buffer.

Another thing to consider is that there is sometimes another cache (or several) used for reading the actual vertex data needed by the vertex shader. For instance, on AMD's recent hardware all vertex fetching is done as standard vector memory loads that go through both the L2 and L1 caches. In light of that, you may also want to sort the elements in your vertex buffer for better locality, which reduces cache misses.

You should be able to find some links if you search for "vertex cache optimization" on Google. You'll probably want to use an existing implementation, like this one.
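If you want to get a feel for how much ordering matters, it's easy to simulate. The snippet below is a toy Python model, not how any real GPU cache works exactly: it assumes a simple FIFO post-transform cache and a made-up cache size of 4, and counts how many vertex shader invocations an index buffer would cause. Real caches differ in size and replacement policy, so treat the numbers as illustrative only.

```python
from collections import deque

def shaded_vertex_count(indices, cache_size=16):
    """Simulate a FIFO post-transform vertex cache: each index that
    misses the cache costs one vertex shader invocation."""
    cache = deque(maxlen=cache_size)  # oldest entries are evicted first
    invocations = 0
    for idx in indices:
        if idx not in cache:
            invocations += 1
            cache.append(idx)
    return invocations

# The same four triangles over vertices 0..7, in two different orders:
adjacent    = [0, 1, 2,  0, 2, 3,  4, 5, 6,  4, 6, 7]  # shared verts together
interleaved = [0, 1, 2,  4, 5, 6,  0, 2, 3,  4, 6, 7]  # shared verts far apart

print(shaded_vertex_count(adjacent, cache_size=4))     # 8 invocations
print(shaded_vertex_count(interleaved, cache_size=4))  # 12 invocations
```

With this toy model the adjacent ordering hits the cache on the repeated indices, while the interleaved ordering has evicted them by the time they come up again, which is exactly the effect that vertex cache optimizers exploit.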

#5243258 why do we need to divide w in texture projection

Posted by MJP on 28 July 2015 - 03:15 PM

guys, frankly, I thank you all for reply, but I still don't get it.
Why is this divide called perspective divide? and it's a  (x or y / depth) => tan( fov / 2) right? 
I'm gonna go through some basic knowledge again, I must be missing sth, seriously!

The "perspective divide" comes from the use of homogeneous coordinates. If you're not familiar with that term, I would suggest reading some background material. This link that I found explains some of the basics, and how it applies to graphics.
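To make that concrete, here's a small Python sketch of a D3D-style projection matrix and the divide itself. The row-vector convention and matrix layout here are assumptions for illustration, but the key point holds generally: the projection puts the view-space depth into w, and dividing by w is what produces the perspective foreshortening.

```python
import math

def perspective(fov_y, aspect, near, far):
    """Build a D3D-style perspective projection matrix (row-vector
    convention). After transforming, x/w and y/w land in [-1, 1] for
    points inside the frustum -- that division is the perspective divide."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0,                          0.0],
        [0.0,        f,   0.0,                          0.0],
        [0.0,        0.0, far / (far - near),           1.0],
        [0.0,        0.0, -near * far / (far - near),   0.0],
    ]

def transform(v, m):
    # Row vector (x, y, z, 1) times a 4x4 matrix.
    return [sum(v[i] * m[i][c] for i in range(4)) for c in range(4)]

proj = perspective(math.radians(90.0), 1.0, 0.1, 100.0)
clip = transform([3.0, 0.0, 10.0, 1.0], proj)  # a view-space point
ndc = [c / clip[3] for c in clip[:3]]          # the perspective divide
# With a 90 degree fov, tan(fov/2) is ~1, so ndc x is x/z, i.e. ~0.3
```

Note that w ends up equal to the view-space z, which is why the divide scales things down with distance.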

#5243116 why do we need to divide w in texture projection

Posted by MJP on 27 July 2015 - 11:30 PM

tex2Dproj is intended for texture lookups whose coordinates are the result of applying a perspective projection. The most common example (and almost certainly the only intended use case) is shadow maps: typically you sample a shadow map by transforming your pixel position by a light-space projection matrix, and then use the resulting projected position to compute the UV address for the shadow map texture. tex2Dproj just saves you the last step in that process, which is the perspective divide. It was also the means of accessing so-called "hardware PCF" functionality on older GPUs, hence the Z coordinate of the projected position being passed in (it was compared against the shadow map depth).
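Here's roughly what that "last step" looks like, sketched in Python. The [0, 1] remap and the y flip follow D3D conventions, and `shadow_uv_and_depth` is just a made-up helper name for illustration:

```python
def shadow_uv_and_depth(pos_light_clip):
    """Given a position already transformed by the light's view-projection
    matrix (a 4D homogeneous coordinate), perform the divide that
    tex2Dproj would do implicitly, then remap NDC xy to [0, 1] UVs."""
    x, y, z, w = pos_light_clip
    ndc = (x / w, y / w, z / w)   # the perspective divide
    u = ndc[0] * 0.5 + 0.5        # [-1, 1] -> [0, 1]
    v = -ndc[1] * 0.5 + 0.5       # flip y for texture space (D3D convention)
    return (u, v), ndc[2]         # ndc z gets compared with the stored depth

uv, depth = shadow_uv_and_depth((2.0, -1.0, 5.0, 10.0))
# uv is (0.6, 0.55), depth is 0.5
```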

#5241843 Shader Returns Black Screen When Compiling With ps_3_0

Posted by MJP on 21 July 2015 - 08:18 PM

Are you using it with a vs_3_0 vertex shader?

#5241828 Lightmapping

Posted by MJP on 21 July 2015 - 06:43 PM

An important thing to consider is that Unity and UE4 both support light mapping out of the box. I don't know what percentage of 3D Unity games use it, but I bet it's pretty high, especially for games targeting mobile or other low-spec hardware. I'm sure that plenty of UE4 games also use light mapping, unless they have a particularly strong reason not to (like dynamic time of day, and even in those cases you probably precompute plenty of things). UE4 actually has a really nice baking pipeline that was also heavily used for UE3 games.

And yes, The Order used lightmaps that stored indirect and environment lighting using spherical radial basis functions.

#5241792 Lightmapping

Posted by MJP on 21 July 2015 - 04:27 PM

Light-maps are no longer used.
Lighting is all real-time these days with rare minor exceptions.

That is not even close to true.

#5241789 Updating large buffers

Posted by MJP on 21 July 2015 - 04:09 PM

The tool that you need to use to dig into problems like this is GPUView. To get the ETW capture, you can try using UIforETW if you don't want to use the command line version. You may also want to read this presentation, and try doing what it suggests.

#5241785 Are there any limitations to the DirectX Toolkit?

Posted by MJP on 21 July 2015 - 04:03 PM

I'd say that the spectrum, from highest-level to lowest, is something like this:
-- UE4/Unity
-- XNA
-- DirectXTK
-- DirectX/OpenGL

DirectXTK is just a set of simple helper types that you can use for demos, or for common tasks that are tedious to implement yourself (like drawing basic text, or loading a DDS file as a texture). These kinds of helpers were present in XNA (and were quite similar, thanks to sharing the same author), but XNA had a lot more functionality. In particular, XNA had the content pipeline, as well as a more robust set of wrappers around things like audio and save data.

UE4 and Unity are in a totally different class altogether, being that they're full-featured engines that include an entire toolchain for developing content.

#5241577 Real-time 2D camera lens blur achieveable?

Posted by MJP on 20 July 2015 - 01:22 PM

Your typical high-quality defocus blur in 2D goes something like this:

for each pixel p1:
    color = p1
    weight = 1
    for each pixel p2:
        compute CoC for p2
        if CoC >= dist(p1, p2):
            color += p2 * bokeh
            weight += bokeh
    p1 = color / weight
The fact that every pixel considers every other pixel is what makes it so expensive. Most of your optimization effort goes into making that inner loop smaller, so that you consider fewer pixels. Generally you'll clamp to some maximum CoC size so that you only consider neighbors in a small region around the pixel. You can also do more complex things like precomputing the maximum CoC per region and using that to early-out of the loop. Or you can do what Kryp0n mentioned, and turn it into a scattering problem by rendering a quad per pixel with the appropriate CoC size. In games it's also common to do most of this work at a lower resolution, which can reduce the cost substantially.
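For reference, here is the pseudocode above as runnable Python, reduced to a 1D row of grayscale pixels for brevity. The CoC values are passed in rather than computed from a depth buffer, and `bokeh` is simplified to a constant weight:

```python
def defocus_blur_1d(colors, coc, bokeh=1.0):
    """Gather-style defocus blur on a 1D row of pixels: pixel p2
    contributes to p1 whenever p2's circle of confusion is large
    enough to reach p1."""
    out = []
    for p1 in range(len(colors)):
        color = colors[p1]
        weight = 1.0
        for p2 in range(len(colors)):
            if p2 == p1:
                continue
            if coc[p2] >= abs(p1 - p2):   # p2's CoC overlaps p1
                color += colors[p2] * bokeh
                weight += bokeh
        out.append(color / weight)
    return out

# An in-focus row (CoC of 0 everywhere) passes through unchanged,
# while a large CoC averages neighboring pixels together:
sharp = defocus_blur_1d([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
blurred = defocus_blur_1d([1.0, 0.0, 0.0], [2.0, 2.0, 2.0])
```

The O(n^2) structure of the two nested loops is plainly visible here, which is why all the optimizations mentioned above amount to shrinking the inner loop's range.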

One thing I should mention about doing this in 2D is that you're always going to have issues with out-of-focus areas in the foreground. Handling that correctly requires knowledge of what's behind the object, and you just don't have that in 2D. Instead you have to "make up" the information, usually by using surrounding pixels.

#5241266 Getting rid of platform-specific ifdefs

Posted by MJP on 18 July 2015 - 02:39 PM

We primarily use separate .cpp files for platform-specific code, with a suffix at the end to indicate the platform. So "thread_win.cpp" would be recognized by the build system as being Windows-specific, and would only be compiled/linked for Windows builds. In general we would try to do this for fairly low-level things, and would use a C-like interface for these functions. If we wanted a C++ object-oriented wrapper for something, then that wrapper would call the C-like "system" interface. Doing it this way has 2 big advantages:

1. The platform-specific code is cleanly separated for each platform
2. You know what you need to implement when adding a new platform

That second one is pretty key. It's really easy to fall into the trap of doing something like this in your code:

#if PS4_
    // Console code
#else
    // PC code
#endif
Then if you add a new platform, you might miss this bit of code and the new platform will take a path that you assumed would only run on PC. To try to prevent that, we usually do this when using platform-specific macros:

#if PS4_
    // PS4 code
#elif Windows_
    // PC code
#else
    #error "Unsupported platform!"
#endif
This forces you to at least look at the code and make a choice when adding a new platform.

For our graphics code, I also used feature macros instead of platform macros. So for instance instead of doing this:

#if PS4_
    // Do async compute stuff
#elif Windows_
    // Do normal compute stuff
#else
    #error "Unsupported platform!"
#endif
...I would instead have "platformfeatures_win.h" and "platformfeatures_ps4.h" header files full of feature defines. The Windows version would #define AsyncCompute_ to 0, and the PS4 version would define the same macro as 1. Then I could use the following code:

#if AsyncCompute_
    // Do async compute stuff
#else
    // Do normal compute stuff
#endif
I found this cleaner to use, and less error-prone. It was really nice for starting out on a new platform, because we could just disable a bunch of fancy features and then bring them online one by one. It also lets you quickly disable something for testing or profiling, or if it turns out that it's not optimal.

#5240966 Books / Tutorials to read to prepare for Vulkan?

Posted by MJP on 16 July 2015 - 07:47 PM

I suspect the part of Mantle/DX12/Vulkan that will give existing developers the most trouble is the manual synchronization. When you're dealing with graphics on a PC, you really have three kinds of parallelism: CPU/CPU parallelism (multiple threads/cores running concurrently on the CPU), CPU/GPU parallelism (the CPU and GPU running concurrently), and GPU/GPU parallelism (multiple execution units on the GPU running concurrently with each other). Like any kind of parallelism, all three require careful synchronization in order to avoid race conditions. Historically, the synchronization was either not required or was abstracted away from you. You're probably already familiar with CPU-based parallelism and synchronization, since it's a common topic and is directly exposed to you. However, CPU/GPU and GPU/GPU synchronization were totally abstracted away in older APIs, so I suspect it will be new territory for a lot of programmers.
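If you're already comfortable with CPU threading primitives, the fence objects these newer APIs expose are conceptually similar to this toy Python model. To be clear, these are not real D3D12/Vulkan calls; the "GPU" here is just a thread signaling monotonically increasing values, which mirrors how the CPU waits on a fence value before reusing per-frame resources:

```python
import threading

class Fence:
    """Toy model of a GPU fence: the 'GPU' signals monotonically
    increasing values, and the 'CPU' blocks until a value is reached."""
    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            self._value = max(self._value, value)
            self._cond.notify_all()

    def wait(self, value):
        with self._cond:
            self._cond.wait_for(lambda: self._value >= value)

fence = Fence()
log = []

def gpu():
    # Pretend the GPU finishes three frames, signaling after each one.
    for frame in range(1, 4):
        log.append("gpu finished frame %d" % frame)
        fence.signal(frame)

t = threading.Thread(target=gpu)
t.start()
fence.wait(3)   # CPU blocks until the GPU has finished frame 3
t.join()
```

The hard part in the real APIs is deciding where these waits and signals need to go; forget one and you get a race on a resource the GPU is still reading.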

#5239967 Is it ok to use texture to store vertices and use it to load vertices into ve...

Posted by MJP on 12 July 2015 - 05:08 PM

You'll see this sort of thing occasionally referred to as "software vertex fetch", since you're basically bypassing any hardware vertex fetch functionality that might be present in the GPU. It's totally valid to do if it makes sense for your use case, and shouldn't cause any major performance problems. In fact, on recent AMD hardware there is no dedicated vertex fetch hardware anymore: under the hood, the driver just generates a preamble for your vertex shader (they call it a "fetch shader") that loads the vertex data out of buffers and places it into registers. I think Nvidia and Intel still have a dedicated HW path for vertex fetch, but they should still handle SW fetch just fine.

However as others have mentioned, it will probably be easier to use a StructuredBuffer for this instead of a 1D texture.

#5239146 [D3D11-OGL4] Gamma Correction - sRGB : render target / diffuse texture

Posted by MJP on 09 July 2015 - 01:29 AM

Using the hardware for sRGB conversion has three big advantages over doing it manually:

1. The hardware will convert texels from sRGB->linear before filtering, so that you don't filter in gamma space
2. The hardware will convert from sRGB->linear when performing alpha blending, so that you don't blend in gamma space
3. It's free
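A quick numeric sketch in Python shows why point 1 matters. The transfer functions below follow the sRGB spec, and the example filters (averages) a black texel and a white texel both ways:

```python
def srgb_to_linear(c):
    """IEC 61966-2-1 sRGB decode (per channel, [0, 1] range)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """IEC 61966-2-1 sRGB encode (per channel, [0, 1] range)."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

a, b = 0.0, 1.0  # sRGB-encoded black and white texels

# Filtering in gamma space (wrong): just average the encoded values.
wrong = (a + b) / 2  # 0.5

# Filtering in linear space (what the hardware does for sRGB formats):
# decode, average, re-encode.
right = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
# 'right' is about 0.735 -- gamma-space filtering darkens the result
```

The same decode-blend-encode logic applies to alpha blending (point 2), which is why getting it from the hardware for free is such a good deal.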

#5238487 Update a specific region of the render target?

Posted by MJP on 05 July 2015 - 04:51 PM

"Fillrate" is an older term that typically refers to the maximum rate at which the hardware can output to a render target. It's not particularly meaningful as originally defined for modern hardware, since you're almost never using a simple pass-through pixel shader anymore. These days it's commonly used to refer to the bandwidth and ROP resources that can become a bottleneck when rendering lots of "dumb" pixels, especially if alpha blending is used.

As for the scissor test, all hardware that I'm familiar with applies the scissor during rasterization so that it can cull pixels before they're ever shaded, since there would be no point in executing the pixel shader for a pixel that would just be discarded anyway. In fact, in D3D10 they even moved the scissor test into the logical "Rasterizer Stage", although the logical pipeline still doesn't necessarily dictate when something happens on real hardware.