Adam_42

Member Since 03 Jul 2006

#5187042 max size for level using floats

Posted by Adam_42 on 14 October 2014 - 05:19 PM

The best option is to simply try it. Take some 3D movement and physics code and move the thing you're controlling further and further away from the origin until it stops working nicely. Make sure everything is the correct size and moves at the right speed while you do this.

 

One significant problem with being far from the origin is that movement starts to feel choppy at low speeds. To take an extreme example, if you're ten million units (meters) from the origin, adjacent float values are a whole unit apart, so 0.1 + 10000000 == 10000000 when using floats. Any per-frame step smaller than about half a meter is simply lost. At 60 frames per second, anything slower than roughly 30 m/s leaves you effectively stationary, and faster movement gets quantized to whole-meter steps. That may not be a problem if ship speeds are measured in km/s rather than m/s.
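To see the effect directly, here's a minimal C++ sketch. It assumes positions are 32-bit floats in meters and are integrated once per frame at 60 fps. `float_spacing` and `step_position` are hypothetical helper names, not from any library.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it: the smallest
// change a coordinate of that magnitude can make.
float float_spacing(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Advance a 1D position by one frame of movement, entirely in float,
// the way a simple per-frame integrator would.
float step_position(float position, float velocity, float dt) {
    assert(dt > 0.0f);
    return position + velocity * dt;
}
```

Near ten million meters, `float_spacing(10000000.0f)` is 1.0f. So `step_position(10000000.0f, 1.0f, 1.0f / 60.0f)` hands back 10000000.0f unchanged, while the same step near the origin moves the object as expected.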

 

There's a handy table at http://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/ showing how much precision you get over various ranges. Those values are somewhat idealistic, because you'll lose some precision when you do calculations, which is why you need to test it. For example, your physics library may stop working reliably well before movement becomes choppy.
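You can also generate a rough version of that table yourself. This C++ sketch reports the gap between adjacent floats at a few distances from the origin. It only shows the best case for a stored value, not the extra error that calculations add. The distances and the function names are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <vector>

// For each distance, return the gap to the next larger float. This is the
// best-case precision for a stored coordinate at that distance from the
// origin; calculations will lose more.
std::vector<float> precision_at(const std::vector<float>& distances) {
    std::vector<float> gaps;
    for (float d : distances) {
        assert(d >= 0.0f);
        gaps.push_back(std::nextafter(d, std::numeric_limits<float>::infinity()) - d);
    }
    return gaps;
}

// Print a small precision table, similar in spirit to the linked article.
void print_precision_table() {
    const std::vector<float> distances = {1.0f, 1000.0f, 100000.0f, 1000000.0f, 10000000.0f};
    const std::vector<float> gaps = precision_at(distances);
    for (std::size_t i = 0; i < distances.size(); ++i) {
        std::printf("%10.0f units from origin: precision %g\n", distances[i], gaps[i]);
    }
}
```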

 

Also note that because floats have a sign bit, you get more precision when the origin is in the middle of the world (i.e. the coordinates go from -500,000 to +500,000 instead of from 0 to 1,000,000). Halving the largest magnitude you need to store doubles the precision at the edges of the world.
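A small C++ sketch to check that claim for any coordinate range. `worst_case_spacing` is a hypothetical helper. It looks at the largest magnitude in the range, because that's where precision is worst, and derives the float spacing there from the exponent that `std::frexp` reports.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Spacing between adjacent floats at the largest magnitude a coordinate in
// [lo, hi] can reach. frexp gives x = m * 2^e with m in [0.5, 1), and a float
// has a 24-bit significand, so the spacing at that magnitude is 2^(e - 24).
float worst_case_spacing(float lo, float hi) {
    assert(lo <= hi);
    float max_magnitude = std::max(std::fabs(lo), std::fabs(hi));
    int exponent = 0;
    std::frexp(max_magnitude, &exponent);
    return std::ldexp(1.0f, exponent - 24);
}
```

For the two layouts in the post, the centred range gives 0.03125 at its edges and the 0 to 1,000,000 range gives 0.0625.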




#5186328 Find distinct colors in texture, using a Compute Shader

Posted by Adam_42 on 11 October 2014 - 03:12 AM

I can suggest a different scheme than keeping a list of colours. I'd try it on the CPU first, but you might persuade it to work on a GPU too.

 

Create a bit array with 256*256*256 entries, one bit for every possible 24-bit colour; that's 2 MB. For each colour in the image, just set the corresponding bit in the array.

 

At the end of processing you can convert that bit array into a more useful format by iterating through it.
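Here's a minimal CPU-side sketch of the bit array approach in C++. It assumes pixels are packed as 0xRRGGBB in the low 24 bits of a uint32_t, and any alpha byte is ignored. `ColourSet` and `distinct_colours` are hypothetical names. On the GPU, the equivalent of `insert` would be an atomic OR into a buffer of uints, e.g. with HLSL's InterlockedOr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per possible 24-bit RGB colour: 2^24 bits = 2 MB.
class ColourSet {
public:
    ColourSet() : bits_(kWords, 0) {}

    // Mark a colour as present. Only the low 24 bits (0xRRGGBB) are used.
    void insert(uint32_t rgb) {
        uint32_t index = rgb & 0xFFFFFFu;
        bits_[index >> 6] |= uint64_t{1} << (index & 63);
    }

    // Walk the bit array once and turn it into a list of distinct colours.
    std::vector<uint32_t> to_list() const {
        std::vector<uint32_t> colours;
        for (std::size_t word = 0; word < bits_.size(); ++word) {
            if (bits_[word] == 0) continue;  // skip 64 absent colours at once
            for (uint32_t bit = 0; bit < 64; ++bit) {
                if (bits_[word] & (uint64_t{1} << bit)) {
                    colours.push_back(static_cast<uint32_t>(word * 64 + bit));
                }
            }
        }
        return colours;
    }

private:
    static constexpr std::size_t kWords = (std::size_t{1} << 24) / 64;
    std::vector<uint64_t> bits_;
};

// Distinct colours of an image given as packed 0xRRGGBB pixels.
std::vector<uint32_t> distinct_colours(const std::vector<uint32_t>& pixels) {
    ColourSet set;
    for (uint32_t p : pixels) set.insert(p);
    return set.to_list();
}
```

Because the bits are visited in index order, the resulting list comes out sorted by colour value.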




#5181099 Intrinsics to improve performance of interpolation / mix functions

Posted by Adam_42 on 17 September 2014 - 01:52 PM

This is the most efficient option I could find. By rearranging it so you do the subtraction first, it takes two instructions to compute the value to test. Unless I've messed up somewhere, it should give you the same answer :)

    float test = dot(col2-col1, float3(1.0f / 3.0f, 1.0f / 3.0f, 1.0f / 3.0f));
    if (test > threshold) { ... }
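The original code isn't quoted here, so this assumes it compared the channel averages of the two colours, i.e. `dot(col2, w) - dot(col1, w)` with `w = (1/3, 1/3, 1/3)`. The dot product is linear, so subtracting first gives the same value up to float rounding. Here's a C++ check of that, with a hypothetical `Float3` type standing in for HLSL's float3.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };

Float3 operator-(Float3 a, Float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

const Float3 kThird = {1.0f / 3.0f, 1.0f / 3.0f, 1.0f / 3.0f};

// Assumed original form: two dot products, then a subtract.
float test_two_dots(Float3 col1, Float3 col2) {
    return dot(col2, kThird) - dot(col1, kThird);
}

// Rearranged form from the post: one subtract, then one dot product.
float test_subtract_first(Float3 col1, Float3 col2) {
    return dot(col2 - col1, kThird);
}
```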



#5178422 Intrinsics to improve performance of interpolation / mix functions

Posted by Adam_42 on 05 September 2014 - 06:09 PM

There's a standard trick to improve the performance of blurs: use the bilinear filtering hardware to halve the number of texture fetch instructions required.

 

http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/
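The core of the linear sampling trick is converting the discrete Gaussian weights into fewer, fractionally offset taps. Here's a C++ sketch of that precomputation. It assumes a symmetric kernel described by one half, centre weight first, with an odd number of entries so the non-centre texels pair up evenly. `Tap` and `linear_taps` are hypothetical names. Offsets are in texels; you'd mirror them for the negative side and divide by the texture size before using them in the shader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Tap { float offset; float weight; };

// Convert one half of a symmetric discrete blur kernel into bilinear taps.
// discrete[0] is the centre weight, discrete[i] the weight i texels away.
// The centre is fetched on its own. Each following pair of texels (i, i+1)
// becomes one fetch placed between them, so the hardware's linear filter
// blends them in the ratio discrete[i] : discrete[i+1].
std::vector<Tap> linear_taps(const std::vector<float>& discrete) {
    assert(!discrete.empty() && discrete.size() % 2 == 1);
    std::vector<Tap> taps{{0.0f, discrete[0]}};
    for (std::size_t i = 1; i + 1 < discrete.size(); i += 2) {
        float w = discrete[i] + discrete[i + 1];
        float offset = (static_cast<float>(i) * discrete[i] +
                        static_cast<float>(i + 1) * discrete[i + 1]) / w;
        taps.push_back({offset, w});
    }
    return taps;
}
```

For the 9-tap kernel in the linked article, the five weights 0.2270270270, 0.1945945946, 0.1216216216, 0.0540540541 and 0.0162162162 become three taps at offsets 0, about 1.3846 and about 3.2308.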




#5174005 copying back buffer not working

Posted by Adam_42 on 15 August 2014 - 04:57 PM

You can implement split screen without doing any copying of render targets.

 

The way to do that is to use viewports. The default viewport you get when you set a render target covers the whole target, but you can change it to cover just a portion of the render target, which is exactly what split screen needs.
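Here's a minimal sketch of computing the two viewports for a top/bottom split in C++. It uses a hypothetical `Viewport` struct that mirrors the fields of D3DVIEWPORT9, so it compiles without the D3D headers. In a real D3D9 renderer you'd fill in a D3DVIEWPORT9 and pass it to IDirect3DDevice9::SetViewport() before drawing each player's view.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the fields of D3DVIEWPORT9 so the sketch compiles without d3d9.h.
struct Viewport {
    uint32_t X, Y, Width, Height;
    float MinZ, MaxZ;
};

// Viewport for one player in a horizontal two-way split: player 0 gets the
// top half of the render target, player 1 the bottom half. An odd height
// gives the extra row to the bottom half so no pixels are left uncovered.
Viewport split_screen_viewport(uint32_t target_width, uint32_t target_height, int player) {
    assert(player == 0 || player == 1);
    uint32_t top_height = target_height / 2;
    Viewport vp{};
    vp.X = 0;
    vp.Width = target_width;
    vp.Y = (player == 0) ? 0 : top_height;
    vp.Height = (player == 0) ? top_height : target_height - top_height;
    vp.MinZ = 0.0f;
    vp.MaxZ = 1.0f;
    return vp;
}
```

In D3D9, setting a render target resets the viewport to the full target, so set the viewport afterwards. Build each player's projection matrix using that viewport's aspect ratio.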

 

The main advantage of that approach is that it will perform better than copying, especially if your copies go back and forth to system memory instead of staying on the GPU.

 

If you really need to copy parts of a render target around, I'd suggest using StretchRect(), with GetSurfaceLevel() to get the surfaces of the textures involved. That will avoid copying to and from system memory.




#5172469 cbuffer per object or share one cbuffer

Posted by Adam_42 on 09 August 2014 - 09:34 AM

Have a read of http://fgiesen.wordpress.com/2013/03/05/mopping-up/

 

At least in that case, a single dynamic constant buffer that you rewrite repeatedly was significantly faster than lots of individual constant buffers that are each rewritten once per frame.

 

However, the results may be different if you don't update all of them every frame, so you need to test it.




#5170458 Profiling results, GPU or CPU bound?

Posted by Adam_42 on 30 July 2014 - 04:48 PM

In general, profiling with vsync enabled isn't very useful, because time spent waiting for vsync tends to hide the real performance. I'd recommend doing all profiling with vsync off.

 

Enabling triple buffering can hide even more performance issues than vsync alone, but it will also generally give a better experience than double buffering with vsync when the game is running slower than the refresh rate.

 

Also note that you want to profile an optimized build that wasn't started with the debugger attached (starting with the debugger attached puts the Windows heap into a relatively slow debug mode).

 

What's the performance like with vsync off?




#5170180 Profiling results, GPU or CPU bound?

Posted by Adam_42 on 29 July 2014 - 04:11 PM

The standard way to find the bottleneck is to give the GPU less work to do, while still making the same set of draw calls, and see if it goes any faster. If it does, then you're not completely CPU bound. There are various ways to reduce the GPU's workload, including:

 

- Reduce the screen resolution or render target size.

- Set a tiny scissor rectangle.

- Replace pixel / vertex shaders with simple ones.

 

Having said that, if you're looking at single-frame spikes with D3D, a common cause is using something (shader / texture / vertex buffer / etc.) for the first time. This is because D3D drivers are generally lazy and only fully initialize things on first use. While this saves doing work for things you never use, in most cases it's really unhelpful. The workaround is to do a bunch of off-screen draw calls on the loading screen that make use of every texture and shader. That way you don't have to wait for the driver to, for example, upload a bunch of textures to video RAM when the end-of-level boss appears.




#5165392 Bad Performance On Intel HD

Posted by Adam_42 on 07 July 2014 - 05:08 PM

What you should always do first when you have a performance problem is try to work out what's slow, by a process of elimination or by using profilers. For example, does the frame rate go up significantly if you replace your pixel shader with one that only samples the diffuse texture?

 

Having said that, I'd bet that your pixel shader is the most expensive thing by far. One simple trick to optimize it is to create six different versions of it for different point light counts (zero to five lights). Pick the correct shader on the CPU based on how many lights actually affect the object (bounding sphere tests are cheap).
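Here's a sketch of the CPU-side selection in C++. It assumes each point light has a bounding sphere given by its position and range, and that each object has a bounding sphere too. `Sphere`, `lights_affecting` and `kMaxLights` are hypothetical names. When more than five lights touch an object, this simply keeps the first five it finds; a real renderer would probably keep the strongest or nearest ones instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sphere { float x, y, z, radius; };

// A point light affects an object only if the light's range sphere overlaps
// the object's bounding sphere. Compare squared distances to avoid a sqrt.
bool spheres_overlap(const Sphere& a, const Sphere& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float r = a.radius + b.radius;
    return dx * dx + dy * dy + dz * dz <= r * r;
}

constexpr std::size_t kMaxLights = 5;  // shader variants exist for 0..5 lights

// Indices of the lights touching the object, capped at kMaxLights.
std::vector<std::size_t> lights_affecting(const Sphere& object, const std::vector<Sphere>& lights) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < lights.size() && result.size() < kMaxLights; ++i) {
        if (spheres_overlap(object, lights[i])) result.push_back(i);
    }
    return result;
}
```

The size of the returned list (0 to 5) is the index of the shader variant to bind, and the listed lights are the ones whose parameters you'd upload for that draw.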

 

You can get a rough idea of how expensive a shader is by compiling it with the command line fxc.exe tool, and looking at the instruction count, although that won't take into account loops unless they get unrolled.




#5163113 DXT5 artifacts

Posted by Adam_42 on 26 June 2014 - 04:58 PM


Quote: "Renormalize the normals in the shader."

 

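They already come out normalized when you reconstruct Z from X and Y, because Z is computed as sqrt(1 - x*x - y*y), which makes the vector unit length by construction. There's no need to do any extra normalization.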

 

In fact you can probably drop the normalize on the last line of that shader.
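Here's the reconstruction written out in C++ to show why no extra normalize is needed. It assumes the two stored channels hold X and Y in the [0, 1] range; which channels those are (e.g. green and alpha) depends on how the DXT5 texture was encoded. `Normal` and `reconstruct_normal` are hypothetical names. In HLSL the same step is typically written with `sqrt(saturate(1 - dot(xy, xy)))`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Normal { float x, y, z; };

// Reconstruct a tangent-space normal from two stored channels in [0, 1].
Normal reconstruct_normal(float stored_x, float stored_y) {
    assert(stored_x >= 0.0f && stored_x <= 1.0f && stored_y >= 0.0f && stored_y <= 1.0f);
    float x = stored_x * 2.0f - 1.0f;  // expand [0, 1] to [-1, 1]
    float y = stored_y * 2.0f - 1.0f;
    // Choosing z this way makes x*x + y*y + z*z == 1. The clamp only matters
    // when compression error pushes x*x + y*y slightly above 1.
    float z = std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
    return {x, y, z};
}
```

The result only deviates from unit length when x*x + y*y exceeds 1. In that case the clamp sets z to 0 and the vector comes out slightly too long.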




#5159888 Problem with DirectX 11 - depthbuffer

Posted by Adam_42 on 11 June 2014 - 04:59 PM

My guess would be that you haven't set up the depth stencil state correctly. A near clip plane distance of zero can also break depth testing.

 

http://msdn.microsoft.com/en-gb/library/windows/desktop/ff476506%28v=vs.85%29.aspx

 

http://msdn.microsoft.com/en-us/library/windows/desktop/ff476463%28v=vs.85%29.aspx
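To see why a zero near plane is a problem, here's the depth value that a standard D3D-style perspective projection writes (near plane mapped to 0, far plane to 1), written as a small C++ function. `projected_depth` is a hypothetical helper name.

```cpp
#include <cassert>

// Depth stored for a point at view-space distance z, using the usual D3D
// perspective projection that maps the near plane to 0 and the far plane
// to 1: depth = f / (f - n) * (1 - n / z).
float projected_depth(float z, float near_plane, float far_plane) {
    assert(z > 0.0f && far_plane > near_plane && near_plane >= 0.0f);
    return far_plane / (far_plane - near_plane) * (1.0f - near_plane / z);
}
```

With near_plane = 0, both factors are exactly 1, so every point in the scene gets a depth of 1.0 and the depth test can't tell anything apart. Even a very small non-zero near plane wastes most of the depth buffer's precision close to the camera, so it's worth pushing it out as far as the scene allows.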




#5159075 Smeared edges pattern using HLSL for color space conversion

Posted by Adam_42 on 08 June 2014 - 07:58 AM

Texture.Load is essentially the same as Texture.Sample when you're using a point filter.

 

If you split the data into three separate textures then you can use Texture.Sample as normal, and whatever filtering you want. It also makes the pixel shader much simpler and faster as you don't need to mess about calculating weird texture coordinates.

 

It should also look better if you use bilinear filtering on the chroma channels, instead of making them blocky by using point sampling.




#5157963 Beginners problem

Posted by Adam_42 on 03 June 2014 - 05:18 PM

You should also make sure that you're using D3D11_CREATE_DEVICE_DEBUG, at least in debug builds, so that D3D will pick up obvious errors and tell you about them.




#5157303 Identical math code working differently on C# and C++

Posted by Adam_42 on 01 June 2014 - 05:09 AM

The main problem isn't passing values to and from C#.

 

The two languages translate calculations into machine code differently. Most notably, in both cases some of the intermediate calculations may be done at a higher precision than you've specified. There will also be small differences in the results of library functions like sin(). In addition, the C# JIT compiler may generate different code depending on which CPU it runs on, so the C# result almost certainly won't be identical across different PCs.

 

If you want reproducible results, integer maths makes that much easier.
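One common way to use integer maths for fractional values is fixed point. The post doesn't specify any particular format; this is a minimal 16.16 sketch in C++, where `Fixed`, `to_fixed` and `fixed_mul` are hypothetical names. Integer operations give bit-identical results on every conforming compiler and CPU as long as nothing overflows. Functions like sin() would need their own integer implementations (e.g. lookup tables) to stay reproducible.

```cpp
#include <cassert>
#include <cstdint>

// 16.16 fixed point: a value v is stored as the integer v * 65536.
using Fixed = int32_t;
constexpr int kFracBits = 16;

// Convert a small integer (|v| < 32768) to fixed point.
constexpr Fixed to_fixed(int v) {
    return static_cast<Fixed>(v) * (1 << kFracBits);
}

// Multiply two fixed-point values. Widening to 64 bits keeps the intermediate
// product from overflowing; integer division truncates toward zero, which is
// fully specified, so every platform produces the same bits.
constexpr Fixed fixed_mul(Fixed a, Fixed b) {
    return static_cast<Fixed>((static_cast<int64_t>(a) * b) / (int64_t{1} << kFracBits));
}
```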

 

C++ can also be easily disassembled. It's impossible to keep any calculation you do on a computer secret from the owner of that computer. If this is for something like license key validation, you should switch to something based on public key cryptography.




#5156012 Well I know the Holly Grail is not a background loading thread($#$@...

Posted by Adam_42 on 26 May 2014 - 08:19 AM

This is a whole lot simpler in D3D11. In D3D11, all functions on the device (e.g. the ones for creating textures) are thread safe. The functions on the device context, which are the ones used for rendering, are not thread safe. The only thing to watch out for is that Map() is a device context function, so you shouldn't use it on the loader thread during resource creation.

 

The documentation says that D3D10 simply takes a lock for every API call unless D3D10_CREATE_DEVICE_SINGLETHREADED is specified. So I think you still don't need your own locks. I'd try not using ID3D10Multithread on the loader thread at all.

 

If you do want to lock manually for some reason, don't hold the lock for more than a few milliseconds at a time off the main thread. You really don't want to hold it while doing a file read, for example, as that can easily take hundreds of milliseconds, which will block the render thread for several frames.
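If you do end up locking manually, the important part is the lock's scope. Here's a C++ sketch assuming a hypothetical `g_device_lock` mutex that the render thread also takes around its own device calls. The file reading and resource creation are passed in as callables, because the real functions depend on your engine.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical lock that the render thread also takes around its device calls.
std::mutex g_device_lock;

// read_file does the slow I/O; create_resource makes the GPU object from the
// loaded bytes. Only the (short) creation step runs while the lock is held.
template <typename ReadFn, typename CreateFn>
auto load_resource(ReadFn read_file, CreateFn create_resource) {
    auto bytes = read_file();  // may take hundreds of ms: no lock held here
    std::lock_guard<std::mutex> guard(g_device_lock);  // held only for the create call
    return create_resource(std::move(bytes));
}
```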





