

Member Since 29 Mar 2007
Offline Last Active Yesterday, 11:43 PM

#5059091 D3DXComputeTangentFrame output?

Posted by MJP on 04 May 2013 - 12:12 AM

^^^ what Nik02 said. If you want to see an example, I'm pretty sure that the old ParallaxOcclusionMapping sample from the DirectX SDK can show you how to load a mesh, clone it, and generate a tangent frame.

#5059090 Performance improvements from Dx9 to Dx10/11

Posted by MJP on 04 May 2013 - 12:10 AM

"Old cruft" would mostly refer to the fixed-function pipeline and everything that went with it.


As for "more functionality", yes this would mostly be the things enabled by the higher feature levels. So things like new shader types, new resource types (texture arrays, structured buffers, append buffers), access to MSAA sample data, new texture sampling functions (Gather, SampleCmp), integer instructions, new render target and compressed texture formats, etc.

#5059053 Performance improvements from Dx9 to Dx10/11

Posted by MJP on 03 May 2013 - 05:00 PM

That's a pretty weird question to ask at an interview, IMO. Off the top of my head...

  • Changed the driver model so that the driver has a user-mode DLL in between the D3D runtime DLL and the kernel-mode driver DLL
  • Changed the APIs so that you can set multiple textures/buffers/render targets in a single call instead of one at a time
  • Removed a lot of old cruft that was being emulated on GPUs, or not supported at all
  • Introduced constant buffers, which can be much faster to update compared to the old constant registers used in D3D9
  • Moved device states into immutable state objects. This results in fewer API calls, and allows validation to be done once during initialization.
  • Input layouts allow the driver to create fetch shaders earlier, and re-use them more efficiently
  • D3D11 added multithreaded rendering, which can potentially allow you to improve total performance if you have a few cores to spare.

These are all of the changes I can think of that relate to CPU performance of the API itself, as opposed to performance gains you might get on the GPU due to having access to more functionality.

#5058202 How does VSYNC work?

Posted by MJP on 30 April 2013 - 05:18 PM

I would suggest reading up on raster display devices. A lot of the terminology comes from the days of CRT displays, but most of it still applies to LCD displays.


The display adapter (on your video card) is configured to send frame data at a set refresh rate. For normal Windows desktop usage this refresh rate is configured as part of your display settings; for fullscreen D3D applications the refresh rate is specified when you create a swap chain for an adapter. Once the refresh rate is configured, the adapter sends frames to the display at the agreed-upon rate. The data is sent one scan line at a time, until the bottom line is sent and the image is complete. Once complete, the display is in its VBLANK period.

The scanline data that's sent to the display comes from an area of memory on the video card called the frame buffer. Typically you have 2 frame buffers for a double-buffering setup, where one buffer (the back buffer) is written to by the GPU while the other (the front buffer) is being read and sent to the display. At the end of each frame, once you've finished rendering the back buffer, you "flip" the buffers: the back buffer becomes the front buffer (and vice versa). If this flip happens during the VBLANK, everything is fine: you'll see one image for an entire refresh period of the display, and then the next image for the next refresh period. However, if you flip outside of the VBLANK while the scanlines are being sent to the display, you'll get a "tear". This is because some of the lines from the previous frame will have been sent to the display, and then suddenly the scanlines switch to the next frame.


Now what VSYNC does is it causes the GPU to wait until the VBLANK period before flipping. This ensures you have no tearing, since you always flip in the VBLANK. The downside is that you might have to wait a while for the next VBLANK, especially if you just missed it. So for instance if your refresh rate is 100Hz your VBLANK will be every 10ms. So if it takes you 10.1ms to finish rendering a frame, you'll have to wait until the next VBLANK which means it will have effectively taken you 20ms to display that frame. If VSYNC had been disabled, you would just get a tear at the top of the screen instead. This is why frame dips are more jarring when VSYNC is enabled.

When you're not in fullscreen mode, things are different because your D3D/OpenGL adapter won't have exclusive access to the display. Instead it might go through a desktop manager, depending on the OS you're using. On Windows Vista/7/8 with desktop composition enabled, the GPU will present the back buffer to the DWM instead of the display, and then the DWM will composite the image with the rest of the desktop. This is why you don't get tearing in windowed mode, even if VSYNC is disabled. Enabling VSYNC will still limit your app to the refresh rate; however, the behavior will be different.

#5057675 Performance between half and float under SM3.0

Posted by MJP on 29 April 2013 - 01:24 AM

The only GPUs that ever supported half precision in shaders were Nvidia's FX series, 6000 series, and 7000 series. On the FX series, using half precision was actually critical for achieving good performance, since full precision came with a significant performance penalty. ATI hardware used a weird 24-bit precision internally for everything on their early DX9 hardware, since the SM2.0 spec was somewhat loose in how it defined the precision and format of floating-point operations. Later ATI DX9 hardware used full 32-bit precision for everything, since SM3.0 required IEEE compliance (or at least something much closer to it).

For SM4.0, the half-precision instructions and registers were completely removed from the specification. Using the "half" type in HLSL will cause the compiler to use full-precision instructions, and in practice no DX10 or DX11 GPUs support half-precision arithmetic internally. Weirdly enough, lower-precision instructions have made a comeback in D3D11.1, primarily for mobile hardware. However, in 11.1 the syntax for using them is different: you have to use types like "min16float" and "min16uint".

#5056837 How much to tessellate?

Posted by MJP on 26 April 2013 - 12:02 AM

You might be able to just directly call D3DCompile from that DLL using p/invoke.

#5056708 Ugly lines on screen.

Posted by MJP on 25 April 2013 - 12:27 PM

Size of backbuffer is the same as window, and model was created in Blender (smooth normals), and this model looked great in OpenGL. I also tried VSync (off/on), didn't help.


Is it the same size as the window, or the same size as the client area of the window?

It's not a model issue, it's an artifact resulting from upscaling or downscaling an image with point filtering. This is what happens when you create your backbuffer with a different size than the window's client area, which is why I asked about it.

#5056482 How much to tessellate?

Posted by MJP on 24 April 2013 - 04:20 PM

At work we used to do something very similar to what's described in this presentation, back when we were planning on using a lot of tessellation. It seemed to work well enough, although it was never battle-tested in a production setting.

#5056466 Ugly lines on screen.

Posted by MJP on 24 April 2013 - 03:27 PM

Make sure that the size of your back buffer/swap chain is the same as the size of your window's client area.

#5056284 where can i find shader assemble docs

Posted by MJP on 24 April 2013 - 12:41 AM

Vertex shader instructions


Pixel shader instructions


'dcl_color' could be dcl_usage input, dcl_usage output, or dcl_semantics, depending on the context and whether it's used in a vertex shader or a pixel shader.

#5054737 Normal Mapping Upside Down

Posted by MJP on 18 April 2013 - 06:06 PM

Tangent/bitangent vectors need to point in the direction that your U and V texture coordinates are increasing, so it depends on how you set up the texture coordinates of your quad vertices.

#5053678 DirectX ToolKit

Posted by MJP on 15 April 2013 - 08:30 PM

There are a few samples on the CodePlex page, have you had a look at those?

#5053565 Is real time rendering book, third edition, still good?

Posted by MJP on 15 April 2013 - 02:39 PM

Just buy it, you won't regret it.

#5053368 Light halos, lens flares, volumetric stuff

Posted by MJP on 15 April 2013 - 01:48 AM

Lens flares and lenticular halos don't really have anything to do with volumetric lighting; they're phenomena that result from light refracting and reflecting inside of a lens enclosure. Producing a physically-plausible result in a game would require some attempt at simulating the path of light through the various lenses, for instance by using the ray-tracing approach taken by this paper. Just about all games crudely simulate these effects using screen-space blur kernels combined with sprites controlled by occlusion queries.

Volumetrics is mostly concerned with the scattering and absorption of light as it travels through participating media. Most games don't come anywhere close to simulating this, since it's complex and expensive. I don't think you will get very far with pre-computing anything, since you typically want to simulate the fog so that it moves about the level. You also usually want to attenuate the density with noise, to produce more realistic-looking cloud shapes.

#5053041 Point Sprite Vertex Size

Posted by MJP on 14 April 2013 - 12:22 AM

What does a value being in the emitter properties have to do with using shaders?

I'm going to be blunt here: fixed function is a waste of time. All available hardware supports shaders, and uses shaders under the hood to implement the fixed-function feature set from DX9 and earlier. There's absolutely no reason to learn it, and there's no reason to use it in any new project. Anything you can do in fixed-function can be done in shaders, and probably more efficiently since you can tailor it to your exact needs.

FVF codes are outdated cruft from the pre-DX9 era. If you're set on using DX9 then you should at least use vertex declarations. They completely replace FVF codes, and offer functionality that can't be expressed with FVF codes (for instance, the aforementioned PSIZE). In your case you would add an additional float to your vertex struct for storing the point size, and then you would specify a D3DVERTEXELEMENT9 with D3DDECLUSAGE_PSIZE.