Member Since 04 Apr 2007

#4912452 Multithreading for Games

Posted by on 12 February 2012 - 09:35 PM

But when it comes to processing more complex structures, like the entities in an Octree, I keep finding blocks where the code will end up becoming serialised regardless of threading

You're Doing It Wrong™

Instead of shoehorning parallelism into an explicitly serial design, ask yourself whether an alternative approach would scale better in a threaded environment. Trees and linked lists are inherently serial data structures; there's no effective way to traverse them in parallel without the threads stepping on each other's toes. (Before you ask, skip lists are cheating/hacky.) As DICE found out when building their visibility system for Battlefield 3, the 'dumb' approach can end up a lot faster than clinging to a conventional 'smart' solution simply because that's the way it's been done previously. Data is, ultimately, king.
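To make the 'dumb' approach a bit more concrete, here's a minimal sketch of brute-force visibility over a flat array, split into contiguous chunks so each thread only touches its own slice. This is not DICE's actual code; the `Sphere`, `Plane` and `cullParallel` names, the inward-facing plane convention and the one-thread-per-chunk scheme are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical types: a bounding sphere per entity and inward-facing planes.
struct Sphere { float x, y, z, radius; };
struct Plane  { float nx, ny, nz, d; };   // inside when n.p + d >= 0

// A sphere is rejected only if it lies fully behind any one plane.
bool sphereVisible(const Sphere& s, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// Each worker gets a contiguous slice of the array and writes only to its own
// output slots, so no locks are needed. One byte per result is used instead of
// std::vector<bool>, whose packed bits would be unsafe to write concurrently.
std::vector<std::uint8_t> cullParallel(const std::vector<Sphere>& spheres,
                                       const std::vector<Plane>& planes,
                                       std::size_t workerCount) {
    assert(workerCount > 0);
    std::vector<std::uint8_t> visible(spheres.size(), 0);
    const std::size_t chunk = (spheres.size() + workerCount - 1) / workerCount;
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(w * chunk, spheres.size());
        const std::size_t end   = std::min(begin + chunk, spheres.size());
        if (begin == end) break;   // nothing left to hand out
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                visible[i] = sphereVisible(spheres[i], planes) ? 1 : 0;
        });
    }
    for (std::thread& t : workers) t.join();
    return visible;
}
```

In a real engine you'd hand the chunks to an existing job system rather than spawning threads per call, but the data layout is the point: no pointers to chase, no shared writes.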

#4912449 tga file format and directx 10

Posted by on 12 February 2012 - 09:25 PM

I'd personally suggest ditching TGA, period, and moving to DDS, as it's a much better overall 'game' format. MJP mentions the official texture conversion tool, but there are also excellent libraries out there for texture cooking/compression that may be worth looking into depending on the needs of your renderer. nvtt, the NVIDIA Texture Tools, comes personally recommended.

#4908402 UpdateSkinnedMesh vs GPU Update

Posted by on 01 February 2012 - 10:23 AM

I just wanna know Absolutic common answer.

My English writting skill level is very low...

I'm sorry --a

Your answer is I wanna know answer.

Thank you very much ^^a

Have a Nice DAY !!!

It's fine; we're not (or at least, I'm not) yelling at you or questioning your English ability, just pointing out that you're asking for an absolute answer that doesn't exist. It's much like asking "How do I downsample depth?" In that case, you can take a local maximum, a local minimum or a regular, boring old mean; all are useful for different purposes, and we need to know more about what you need it for in order to give you a meaningful answer.
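To illustrate the depth example, here's a rough CPU-side sketch of a 2x2 downsample with the three reductions mentioned. The `DepthReduce` enum, the `downsampleDepth2x` name and the row-major float buffer are assumptions for illustration; on the GPU you'd do the same thing in a shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical selector for the three reductions discussed above.
enum class DepthReduce { Min, Max, Mean };

// Halves a row-major width x height depth buffer in both dimensions,
// combining each 2x2 block of source texels with the chosen reduction.
std::vector<float> downsampleDepth2x(const std::vector<float>& depth,
                                     std::size_t width, std::size_t height,
                                     DepthReduce mode) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(depth.size() == width * height);
    const std::size_t outWidth  = width / 2;
    const std::size_t outHeight = height / 2;
    std::vector<float> out(outWidth * outHeight);
    for (std::size_t y = 0; y < outHeight; ++y) {
        for (std::size_t x = 0; x < outWidth; ++x) {
            const float a = depth[(2 * y) * width + 2 * x];
            const float b = depth[(2 * y) * width + 2 * x + 1];
            const float c = depth[(2 * y + 1) * width + 2 * x];
            const float d = depth[(2 * y + 1) * width + 2 * x + 1];
            float result = 0.0f;
            switch (mode) {
                case DepthReduce::Min:  result = std::min({a, b, c, d}); break;
                case DepthReduce::Max:  result = std::max({a, b, c, d}); break;
                case DepthReduce::Mean: result = (a + b + c + d) * 0.25f; break;
            }
            out[y * outWidth + x] = result;
        }
    }
    return out;
}
```

All three results are 'correct'; which one you want depends entirely on what consumes the downsampled buffer, which is the whole point.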

Since you're asking about 'readbacks,' this is simply the process of reading the skinned vertex data back onto the CPU, probably for something like physics calculations or the like. More on that in a second.

As an example of 'what else are you doing,' the use of shadow volumes is an excellent argument in favor of keeping everything CPU-side. Getting this sort of data back into main memory from the GPU can be rather slow under typical game circumstances because of how the graphics driver implements your draw calls: generally, they're queued up in a big FIFO behind the scenes, and the GPU will need to burn through a few frames(!) worth of commands before it's in a state where it can copy things out of VRAM. Again, the specifics of what else you're doing will influence the commands actually sent to the GPU and can in turn affect that latency. We just want some detailed information about what you plan on doing with your mesh(es) :)

#4908091 UpdateSkinnedMesh vs GPU Update

Posted by on 31 January 2012 - 12:01 PM

I'm going to echo Adam here and say 'depends what else you're doing.' 'Same' does not answer questions like the relative power of the GPU/CPU combination(s) installed, extra data availability requirements or additional workload.

#4907042 Nvidia SDK sample not working correctly

Posted by on 28 January 2012 - 10:06 AM

You can check this more thoroughly by running the sample in the reference rasterizer. If it looks fine there, smells like a driver and/or hardware problem.

#4897083 is there a way to load shaders faster?

Posted by on 24 December 2011 - 09:47 AM

Yes. Precompile it using fxc or D3DCompile() in your own utility and load that instead. That won't help you if most of your time is spent reading from disk, but I'd wager the compilation and optimization's what's killing you in this instance.

#4896210 [DX11] Battlefield 3 GBuffer Question

Posted by on 21 December 2011 - 10:42 AM

In most lighting models, "Specularity" can't be expressed by a single number. Traditional models use a "specular mask" (intensity) and a "specular power" (glossiness).

Many of the more complex lighting models use two similar but different parameters -- "roughness" and "index of refraction" instead. IOR is a physically measurable property of real-world materials, and using some math, you can convert it into a "specular mask" value, which determines the ratio of reflected vs refracted photons. This alone only describes the 'type' of surface though (e.g. glass and stone will have different IOR values).
Alongside this, you'll also have a 'roughness' value, which is a measure of how bumpy the surface is at a microscopic scale. If you had ridiculously high-resolution normal maps, you wouldn't need this value, but seeing as normal maps usually only describe bumpiness on a mm/cm/inch scale, the roughness parameter is used to measure bumpiness on a micrometer scale.

In simple terms, you can think of the "specular" value as being the same as a "spec mask" or "IOR", and you can think of the "smoothness" as being equivalent to "spec power" or "roughness".
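For the IOR-to-"specular mask" conversion, one common choice for the 'some math' is the Fresnel reflectance at normal incidence. The sketch below assumes that formula and assumes the surrounding medium is air (IOR of about 1.0) unless told otherwise; the function name is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Fraction of light reflected when hitting the surface head-on:
// F0 = ((n1 - n2) / (n1 + n2))^2
// surroundingIor defaults to 1.0 (air), which is an assumption.
float specularMaskFromIor(float ior, float surroundingIor = 1.0f) {
    assert(ior > 0.0f && surroundingIor > 0.0f);
    const float r = (ior - surroundingIor) / (ior + surroundingIor);
    return r * r;
}
```

With an IOR of 1.5 (typical for glass) this comes out to roughly 0.04, i.e. about 4% of head-on light is reflected and the rest is refracted.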

As for the sky-visibility term, I assume it's an input for their ambient/indirect lighting equation.

As an addendum: the Blinn/Phong specular power bit is actually an approximation to evaluating a Gaussian distribution with a specific variance. As Hodgman points out, pretty much every specular model on the market today works on the idea of microfacet reflections: that all surfaces are actually perfect, mirror-like reflectors. The catch is that when you look at them at a fine enough level of detail, the surfaces themselves are made up of really, really tiny facets that reflect light in (theoretically) all directions. The Gaussian term from before uses some probability to get around having to evaluate a bunch of reflection math, essentially estimating what percentage of the surface is oriented in such a way that it will reflect light towards the viewer (we can do this because of the Law of Large Numbers, which is slightly outside scope). This is actually what the half vector describes, if you think about it: it's the normal a facet would need in order to bounce light from the light source to the viewer. Recall the Law of Reflection for a moment; this states that the angle of incidence is equal to the angle of reflection. Therefore, reflecting the light vector around the half vector yields the view vector.
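The half-vector claim is easy to check numerically. Here's a small sketch, assuming unit-length vectors that point away from the surface (towards the light and towards the viewer); the `Vec3` type and the helper names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return v * (1.0f / len);
}

// Direction halfway between the light and view vectors.
Vec3 halfVector(Vec3 toLight, Vec3 toViewer) {
    return normalize(toLight + toViewer);
}

// Mirror v about a unit-length axis: r = 2 (v . axis) axis - v.
Vec3 reflectAbout(Vec3 v, Vec3 axis) {
    return axis * (2.0f * dot(v, axis)) - v;
}

// Classic Blinn-Phong specular term: (N . H)^power, clamped at zero.
float blinnPhongSpecular(Vec3 normal, Vec3 toLight, Vec3 toViewer, float power) {
    const Vec3 h = halfVector(toLight, toViewer);
    return std::pow(std::max(dot(normal, h), 0.0f), power);
}
```

Calling `reflectAbout(toLight, halfVector(toLight, toViewer))` gives back `toViewer` up to floating-point error, and the specular term peaks when the surface normal lines up with the half vector.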

This has been Physically-Based Shading 101 with InvalidPointer, thanks for playing! :)

EDIT: Some further clarifications.

#4894550 How do I do this?

Posted by on 16 December 2011 - 11:33 AM

The correct answer is 'Blizzard has a really, really absurdly good art team,' as there's nothing super-super technical going on. You can achieve a very similar effect by drawing a little fire dragon mesh like that with additive blending and perhaps some basic shader black magic for coloration. The lightning arcs and little fire bits appear to be made using flipbook textures. The Unity/UDK resources linked above should have some more on that; it's mostly some very basic addition and multiplication applied to where to 'look' to read color data from a texture. If not, Google it and you should get like ten skillion hits. Other than that, it's really all in the texture quality.
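Since the flipbook bit really is just addition and multiplication (well, division), here's a minimal sketch of the lookup math. It assumes the frames are laid out left to right, top to bottom in a grid and that the animation loops; `Uv` and `flipbookUv` are hypothetical names.

```cpp
#include <cassert>

struct Uv { float u, v; };

// Maps a 0..1 coordinate within one frame to the matching coordinate in the
// full flipbook texture. Frame indices outside the grid wrap around.
Uv flipbookUv(Uv local, int frame, int columns, int rows) {
    assert(columns > 0 && rows > 0);
    const int frameCount = columns * rows;
    const int wrapped = ((frame % frameCount) + frameCount) % frameCount;
    const int column = wrapped % columns;   // which cell across
    const int row    = wrapped / columns;   // which cell down
    return { (column + local.u) / columns, (row + local.v) / rows };
}
```

In practice you'd run this in the vertex or pixel shader and derive `frame` from elapsed time multiplied by a frames-per-second value.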

From a 'how do I do any of this' standpoint, have a gander at oriented and view-facing billboarding, which should get you some of the basics. An understanding of matrix math would be very, very helpful for understanding why things are laid out in the way they are. That obviously won't handle some of the more 'boring' mesh stuff, but hopefully you'll already have some exposure to drawing that sort of thing.

EDIT: Rereading my post a bit, I think some bits came off overly technical for a total, start-from-scratch newcomer-- if I'm being a little too opaque feel free to ask for clarification.

#4890190 Kinect with DirectX11

Posted by on 03 December 2011 - 01:17 PM

UpdateSubresource() or Map() should do more or less exactly what you need.

#4888468 Shaders and dynamic branching : good practise ?

Posted by on 28 November 2011 - 09:59 AM

Thank you for your replies !

So I rewrote my shaders. They all use these 2 functions:

<snip for conciseness>

compilation defines :
CM = color map
AT = alpha testing color map
SM = specularity map
(CM and AT are mutually exclusive)

branching :
g_textureMask & 1 -> uses point sampling for color map
g_textureMask & (1<<1) -> uses point sampling for specularity map

I hope this is better

Not... quite. As was mentioned, you can ditch all the branching, etc. outright by just handling samplers yourself at a higher level. Instead of defining a linear and point sampler, reframe things so you have a 'diffuse map' sampler and a 'spec map' sampler. Then, instead of dinking around with the flag parameter, bind the actual samplers yourself via SetSamplerState() or PSSetSamplers() and eliminate the need for bitwise operations or dynamic branching for that case outright. You can do something slightly similar for alpha tests by reformulating your shader like so:

float4 getColor( in float2 uv )
{
#ifdef CM
	float4 color = g_ColorMap.Sample( samColor, uv );

#ifdef AT
	// this is a marginally less verbose method of killing pixels and is functionally identical to what you wrote originally
	clip( color.a - AT_EPSILON );
#endif

	return color;
#else
	// no color map: fall back to the flat material color
	return float4( g_vMaterialColor, 1 );
#endif
}

Now you can again pull this indirection up and out of the shader. Instead of changing samplers, you can now just pick an actual pixel shader directly based on your app-side 'features' bitfield. Notice how much cleaner the code is overall.
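App-side, 'pick a pixel shader based on the features bitfield' can be as simple as a table lookup. The sketch below is one hypothetical way to lay that out: the enum values, the `PixelShader` stand-in (which only records which defines it would be compiled with) and the function names are all assumptions, and actual shader compilation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits mirroring the CM / AT / SM defines.
enum MaterialFeature : std::uint32_t {
    kFeatureColorMap    = 1u << 0,   // CM
    kFeatureAlphaTest   = 1u << 1,   // AT
    kFeatureSpecularMap = 1u << 2    // SM
};
constexpr std::uint32_t kPermutationCount = 1u << 3;

// Stand-in for a compiled shader object; a real one would hold the API handle.
struct PixelShader { std::string defines; };

// Lists the preprocessor defines a given permutation would be compiled with.
std::string definesFor(std::uint32_t features) {
    std::string defines;
    if (features & kFeatureColorMap)    defines += "CM;";
    if (features & kFeatureAlphaTest)   defines += "AT;";
    if (features & kFeatureSpecularMap) defines += "SM;";
    return defines;
}

// Built once at load time: one entry per possible bitfield value.
std::vector<PixelShader> buildShaderTable() {
    std::vector<PixelShader> table(kPermutationCount);
    for (std::uint32_t f = 0; f < kPermutationCount; ++f)
        table[f].defines = definesFor(f);
    return table;
}

// At draw time the material's bitfield is simply the table index.
const PixelShader& selectShader(const std::vector<PixelShader>& table,
                                std::uint32_t features) {
    assert(features < table.size());
    return table[features];
}
```

Combinations you never use can simply be skipped when filling the table; the lookup itself stays a single array index with no branching in the shader.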

#4887282 Shaders and dynamic branching : good practise ?

Posted by on 24 November 2011 - 09:48 AM

Actually I'd suggest dinking around with the texture coordinates over sampler switches, basically snapping them to the center of a particular texel instead of feeding the 'raw' texture coordinate to the sample function. Of course, it can certainly make sense to take care of this at a higher level if you know you'll only need one kind of filtering per geometry batch (that is, you're not getting this mask value from another texture or a vertex attribute) via SetSamplerState() in D3D9 and PSSetSamplers() in D3D10+.
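The texel-center snap is one line of math per axis. Here's a sketch of it on the CPU, assuming normalized 0..1 coordinates, a single mip level and no wrapping; in the shader you'd apply the same expression to the float2 coordinate. The function name is made up.

```cpp
#include <cassert>
#include <cmath>

// Moves a normalized coordinate to the center of the texel it falls in.
// Sampling there with a linear filter returns that texel's value unblended,
// which mimics point sampling (ignoring mipmaps, an assumption of this sketch).
float snapToTexelCenter(float coord, int textureSize) {
    assert(textureSize > 0);
    return (std::floor(coord * textureSize) + 0.5f) / textureSize;
}
```

For a 4-texel-wide texture, a coordinate of 0.3 lands in texel 1 and snaps to 0.375.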

EDIT: And, to be honest, you're doing one of the most textbook cases of Premature Optimization™ I've seen in some time. Look into getting GPU ShaderAnalyzer or Parallel Nsight, depending on your GPU manufacturer, and get some hard data about the performance characteristics of what you're writing. Integer instructions are also somewhat slower AFAIK, but that knowledge could be a little outdated now; bitmasks like what you're doing are still something of a last-resort solution. Shader permutations aren't necessarily a problem, either, but can be something of a pain when you have 10,000 handwritten permutations to juggle. You can use conditional compilation and some sort of runtime indexing scheme without too much trouble, I'd wager, and stand to reduce the overall shader complexity to a considerable degree.

#4882272 C++ unordered vector erasure

Posted by on 09 November 2011 - 02:57 PM

Not sure if there's a standard algorithm for it, but it's only two lines of code:

std::swap( container.back(), *iterator );
container.resize( container.size()-1 );

Dumb question, but is there any reason why you aren't pop_back()-ing in the example?
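For reference, the pop_back() flavor of the same trick, wrapped in a helper. `unorderedErase` is a made-up name, and the sketch assumes you don't care about element order and that the iterator is valid and dereferenceable.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Removes the element at 'it' in constant time by swapping it with the last
// element and shrinking the vector by one. Element order is not preserved.
template <typename T>
void unorderedErase(std::vector<T>& container,
                    typename std::vector<T>::iterator it) {
    assert(!container.empty());
    std::swap(*it, container.back());   // harmless if 'it' is already the last element
    container.pop_back();
}
```

Unlike resize(), pop_back() doesn't require the element type to be default-constructible, which is one practical reason to prefer it here.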

#4880299 Precomputed Radiance Transfer Question

Posted by on 03 November 2011 - 03:45 PM

You're just writing out SH coefficients in RGBA, just like components in a normal vector-- granted you're going to need a few RTs for acceptable results since you need something like second-order coefficients to get results even approaching usable. This cubemap thing makes me feel like we're talking about entirely different concepts, though, so perhaps more explanation of what you're trying to store is in order.

I understand what I am rendering, but what I don't get is how. Do I create a cube map, and wrap that around the SH, or do I just pre-bake the vertices and render the textures?

All SH work is done with coefficients, look at what the DXSDK samples do. The confusion may stem from how the basis function values are precomputed and stored in a texture for later scaling in the shader; it's entirely possible to evaluate the Associated Legendre Polynomials directly via ALU and may actually be faster depending on hardware. I'm still a little lost as to what you're trying to store here, though. Is this like SH lighting calculated by way of some deferred shading whatsit? Visibility coefficients? SH lightmap?
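To show what 'evaluate the basis directly' looks like, here's a sketch of the first three bands (9 coefficients) of the real spherical harmonics basis written out as plain arithmetic. The constants are the commonly used ones for this basis, but sign and ordering conventions differ between references, so treat the layout as an assumption; `Sh9`, `evalShBasis` and `evalSh` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Sh9 = std::array<float, 9>;

// Evaluates the 9 basis functions for a unit-length direction (x, y, z).
Sh9 evalShBasis(float x, float y, float z) {
    assert(std::fabs(x * x + y * y + z * z - 1.0f) < 1e-3f);
    Sh9 basis = {{
        0.282095f,                            // band 0
        0.488603f * y,                        // band 1
        0.488603f * z,
        0.488603f * x,
        1.092548f * x * y,                    // band 2
        1.092548f * y * z,
        0.315392f * (3.0f * z * z - 1.0f),
        1.092548f * x * z,
        0.546274f * (x * x - y * y)
    }};
    return basis;
}

// Reconstructs the stored function in a given direction: a dot product of
// the coefficients with the basis values. Run once per color channel.
float evalSh(const Sh9& coeffs, float x, float y, float z) {
    const Sh9 basis = evalShBasis(x, y, z);
    float sum = 0.0f;
    for (std::size_t i = 0; i < basis.size(); ++i)
        sum += coeffs[i] * basis[i];
    return sum;
}
```

Whether this beats a precomputed texture lookup depends on the hardware, as noted above, so profile before committing to either.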

#4879464 Differences between Device9::ProcessVertices and Device9::DrawPrimitive?

Posted by on 01 November 2011 - 06:05 PM

You would still need to use DrawPrimitive(), but you can simplify the vertex shader.

What do you mean with "you can simplify the vertex shader"?

I'm sorry by my silly questions but I'm new in DirectX.

Also note that DrawPrimitive() will be significantly slower than DrawIndexedPrimitive() unless the number of primitives is very small, because the index buffer enables the graphics card to cache results from the vertex shader.

Yes I knew that I just use DrawPrimitive() as an example but thanks anyways.

Well remember that you've already done all the fancy skinning, etc. inside ProcessVertices() so you can make a very simple pass-through vertex shader that just shuttles things off to the rasterizer/pixel shader stage. There's no need to run your transformations, etc. again.

#4879254 Difference between Hardware vertex processing and Software vertex processing?

Posted by on 01 November 2011 - 08:54 AM

The only times you'll ever want to use software vertex processing are A) if you need a vertex shader model higher than what the card supports, or B) if you're using debug tools that require it. Otherwise, use hardware processing and, on top of that, prefer 'pure' hardware vertex processing for a performance boost. This does come with a slight programming effort cost, as IIRC you lose the ability to query some related device state, but in general it's worth the loss.