Member Since 03 Jul 2006

#5303231 Multithreading Library Preference

Posted by Adam_42 on Today, 02:40 PM

Here are a few possibilities:


- The libraries weren't available when the game engine was created. For example PPL first appeared with Visual Studio 2010.

- The libraries may not support all the platforms that the game engine does.

- They may not support the features required for the game (OpenMP certainly falls short there).

- Cost. As far as I can tell, TBB isn't free for commercial use.

#5296730 DirectXMath - storing transform matrices

Posted by Adam_42 on 15 June 2016 - 03:19 PM

If you use the __vectorcall calling convention, it should be returned from the function in registers.

#5296537 DirectXMath - storing transform matrices

Posted by Adam_42 on 14 June 2016 - 05:15 PM

The simple option is to override the global operator new and call _aligned_malloc (or similar) from it. If you do that, you can make every heap allocation 16-byte aligned with only a few lines of code in one place.

Doing that also means you can do things like putting aligned types in a std::vector.

Note that there are several variants of operator new and delete that you'll need to replace - you want to change the non-throwing variants as well.


Also note that if you compile as x64 instead of x86 then you don't need to do any of that. The standard heap allocation alignment on 64-bit Windows is 16 bytes.

#5296197 whats the "Big O" of this algo?

Posted by Adam_42 on 12 June 2016 - 05:27 AM

For reference, std::sort() in the standard library is usually implemented as https://en.wikipedia.org/wiki/Introsort which has O(n log n) average- and worst-case performance, but it could also be something else.


https://en.wikipedia.org/wiki/Sorting_algorithm has a big table comparing performance and memory use of a variety of sort algorithms.

#5296030 Trivially trivial?

Posted by Adam_42 on 10 June 2016 - 05:32 PM

You could set up your own trait, called say IsRelocatable<T>; other people have done it that way - https://github.com/facebook/folly/blob/master/folly/docs/Traits.md


I found some discussions at https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/wphImiqfX7Y[1-25] which suggest that some possible implementations of virtual functions aren't compatible with memcpy(), but I don't know of any compilers where that's actually true.

#5295019 [SharpDX] DXGI Adapter Video Memory problem

Posted by Adam_42 on 04 June 2016 - 05:48 PM

Looking at the documentation at https://msdn.microsoft.com/en-us/library/windows/desktop/bb173058%28v=vs.85%29.aspx you can see the size is stored in a SIZE_T, which on a 32-bit PC is an unsigned 32-bit integer.


If you treat -1073741824 as an unsigned value, it's actually 3GB.


It's possible you will get a different result if you compile your program as 64-bit.

#5293449 DX9 Doubling Memory Usage in x64

Posted by Adam_42 on 25 May 2016 - 05:11 PM

I can guess at one possibility. Some time ago Microsoft optimized the address space usage of D3D9 on Vista - see https://support.microsoft.com/en-gb/kb/940105 - it's possible that optimization was only applied to x86, as you're not going to run out of address space on x64.


Is this extra memory usage actually causing a significant, measurable performance issue? If not, I wouldn't worry about it.


If you really want to investigate what's going on, I'd suggest creating the simplest possible test program that shows the memory usage difference, and using a tool like https://technet.microsoft.com/en-us/sysinternals/vmmap.aspx to investigate how memory gets allocated differently.

#5287244 Downsampling texture size importance

Posted by Adam_42 on 16 April 2016 - 07:21 PM

To render to part of a render target, what you usually want to do is to adjust the viewport. This allows you to render anything you want - it will handle the scaling and clipping for you.


The only downside is that when sampling from the render target, you can't clamp or wrap at the edge of the viewport, so it's easy to read outside of the viewport that you wrote to.

#5285988 D3D alternative for OpenGL gl_BaseInstanceARB

Posted by Adam_42 on 09 April 2016 - 05:31 AM

The standard technique to draw multiple copies of the same thing in D3D is called instancing.


There's a decent explanation of how to do that in D3D9 at: https://msdn.microsoft.com/en-us/library/windows/desktop/bb173349%28v=vs.85%29.aspx


You can do the same thing in D3D11, but the API is a bit different. There's some example code at: http://www.rastertek.com/dx11tut37.html

#5283954 Need help understanding line in code snippet for linked list queue please.

Posted by Adam_42 on 28 March 2016 - 05:49 PM

That code appears to be formatted to fit in limited vertical space for printing - that is, it's deliberately making the code less readable so that it fits on one page. It's also using three XORs to swap two variables, instead of using std::swap(), which is much more readable (and probably faster too).


The line of code in question would be easier to read if it was split up into two or three separate statements:

int operandCount = (n/2) - 1;
if ( count > operandCount )

I think the reasoning behind the test is that when only binary operators are available, there will always be a known ratio of operators to operands, and that's what it's testing for. It's not a very nice way to detect the end of the expression, though.


That is:

- With an input of length 3, you must have one operator and two operands.

- Input of length 4 is invalid.

- Input of length 5 will always have two operators.

- etc.

#5281760 casting double* to float*

Posted by Adam_42 on 17 March 2016 - 04:23 PM

The most significant performance hit for using doubles will probably come from the CPU cache misses and memory bandwidth caused by them taking up twice as much memory.


In addition SSE instructions can handle four floats at a time, but only two doubles at a time. So for code which uses them the performance hit can be significant.


For basic operations on values in registers, floats aren't significantly faster than doubles. For some more complex operations (like division) doubles will be slower than floats, as they have more precision, but overall you probably won't notice much difference.


For details on specific instructions look at Intel's optimization manual - http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html (for example you can compare the divsd and divss instructions there, to see the timings for float vs double division).

#5278428 How to find the cause of framedrops

Posted by Adam_42 on 27 February 2016 - 06:38 AM

One possible cause of stalls is that drivers tend to upload textures (and other resources like shaders) on first use.


This can cause a noticeable stall if a large number of things get used for the first time on one frame.



From: https://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/



Incidentally, this is also the reason why you’ll often see a delay the first time you use a new shader or resource; a lot of the creation/compilation work is deferred by the driver and only executed when it’s actually necessary (you wouldn’t believe how much unused crap some apps create!). Graphics programmers know the other side of the story – if you want to make sure something is actually created (as opposed to just having memory reserved), you need to issue a dummy draw call that uses it to “warm it up”. Ugly and annoying, but this has been the case since I first started using 3D hardware in 1999 – meaning, it’s pretty much a fact of life by this point, so get used to it. :)


#5275813 MSVC generating much slower code compared to GCC

Posted by Adam_42 on 15 February 2016 - 03:44 PM

There's one thing Visual Studio can do to trip up performance measurements if you're not aware of it, which has nothing to do with the compiler.


If you run a program by pressing F5, you will get the Windows debug heap enabled, which is much, much slower than the non-debug one.


The simple workaround is to launch it without the debugger attached by using Control+F5 if you're doing performance testing.
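If you do need the debugger attached while profiling, there's also an environment variable the debugger checks - this is the _NO_DEBUG_HEAP variable, set before launching Visual Studio (or the debuggee):

```shell
rem Keep the normal heap even when running under F5.
rem Assumption: your VS version still enables the debug heap by default;
rem newer releases (VS2015+) already leave it off.
set _NO_DEBUG_HEAP=1
```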

#5275681 Multithreading Nowadays

Posted by Adam_42 on 14 February 2016 - 06:18 PM

I think at least for games, another significant problem with hyperthreading is that many workloads don't scale perfectly with core count. See https://en.wikipedia.org/wiki/Amdahl's_law for one cause of poor scaling.


That is, even if you compare, say, two-core performance to four-core performance without any hyperthreading, the four cores probably won't be exactly double the speed of two cores. They might be, say, 1.8 times faster instead.


This means that the performance benefit from hyperthreading has to be higher than the overheads from using more threads, if it's going to actually improve performance.

#5275578 Do game developers still have any reason to support Direct3D 10 cards?

Posted by Adam_42 on 13 February 2016 - 03:21 PM

It's probably some combination of:


- The Xbox One and PS4 use D3D11 hardware, so for a cross platform game it makes sense to use the same feature set on all platforms.


- It would add more work to target D3D10 cards (which are missing some handy features that D3D11 has, especially if they are 10.0 instead of 10.1). Compute shader limitations spring to mind as an obvious example.


- D3D10 cards are older and therefore slower, for some games they wouldn't be fast enough even if they were supported.


- According to http://store.steampowered.com/hwsurvey only about 15% of Steam users have a PC that's limited to DX10; the other 85% are DX11 capable or better. Looking at the list of DX10 GPUs people have is also informative.