Member Since 03 Jul 2006

#5305722 lpCmdLine, open with and spaces

Posted by on 14 August 2016 - 05:22 AM

You have one small bug there - you're not freeing the memory correctly.


From the documentation:


CommandLineToArgvW allocates a block of contiguous memory for pointers to the argument strings, and for the argument strings themselves; the calling application must free the memory used by the argument list when it is no longer needed. To free the memory, use a single call to the LocalFree function.

#5305681 lpCmdLine, open with and spaces

Posted by on 13 August 2016 - 05:47 PM

The standard Windows function to parse the command line into the more usual argc and argv format is CommandLineToArgvW().


It does handle quoted arguments.

#5304761 For Loop Max Bug

Posted by on 08 August 2016 - 04:40 PM

However, this means, that when adding points to the volume, I either have to ADD ONE to the maxes when adding points, or add one to the maxes AFTER adding all the points. Adding one to the maxes during the add is a needless extra computation (when you are dealing with thousands of points). But it is possible to FORGET to add one to the maxes after adding the points, leading to subtle bugs, which should be something I am detecting at compile time, rather than relying on remembering to deal with.


I'd recommend writing code primarily for readability and correctness, unless it involves a big sacrifice of performance to do so. You can always try to optimize any bottlenecks found by your profiler later on.


In that particular case there's a good chance that the compiler will work out that it can move the +1 outside of the loop anyway.


It sounds like you might also benefit from some unit tests, so that you can find out about things that get broken at compile time (assuming you run the tests at the end of each successful build).
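
As a rough sketch of the readability-first approach, the bounds type below stores the inclusive maximum while points are added and derives the exclusive end on demand, so there is no separate "+1" step to forget. Range1D, Add, End and CountCells are hypothetical names made up for this example (the original code wasn't posted), and it only covers a single axis.

```cpp
#include <algorithm>
#include <limits>

// Hypothetical 1D bounds type; a real volume would keep one of these per axis.
struct Range1D {
    int min = std::numeric_limits<int>::max();
    int maxInclusive = std::numeric_limits<int>::min();

    // Adding a point never needs a +1.
    void Add(int p) {
        min = std::min(min, p);
        maxInclusive = std::max(maxInclusive, p);
    }

    bool Empty() const { return maxInclusive < min; }

    // Exclusive end for half-open loops.
    // Assumes maxInclusive is below INT_MAX, otherwise the +1 overflows.
    int End() const { return maxInclusive + 1; }
};

// Counts the cells that a half-open loop over the range visits.
int CountCells(const Range1D& r) {
    if (r.Empty()) {
        return 0;
    }
    int cells = 0;
    for (int i = r.min; i < r.End(); ++i) {
        ++cells;
    }
    return cells;
}
```

A unit test for this can be as small as adding a few points and checking the number of cells the loop visits, which would catch a missing or doubled +1 straight after the build.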

#5303231 Multithreading Library Preference

Posted by on 30 July 2016 - 02:40 PM

Here are a few possibilities:


- The libraries weren't available when the game engine was created. For example PPL first appeared with Visual Studio 2010.

- The libraries may not support all the platforms that the game engine does.

- They may not support the features required for the game (OpenMP certainly falls short there).

- Cost. As far as I can tell, TBB isn't free for commercial use.

#5296730 DirectXMath - storing transform matrices

Posted by on 15 June 2016 - 03:19 PM

If you use the "vectorcall" calling convention it should be returned from the function in registers.

#5296537 DirectXMath - storing transform matrices

Posted by on 14 June 2016 - 05:15 PM

The simple option is to override global operator new and call _aligned_malloc (or similar) from it. If you do that you can make every heap allocation 16-byte aligned, with only a few lines of code in one place.

Doing that also means you can do things like putting aligned types in a std::vector.

Note that there are several variants of operator new and delete that you'll need to replace - you want to change the non-throwing variants as well.


Also note that if you compile as x64 instead of x86 then you don't need to do any of that. The standard heap allocation alignment on 64-bit Windows is 16 bytes.
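
Here is a minimal sketch of what replacing the global operators could look like. It is an illustration under some assumptions rather than a drop-in implementation: it uses std::aligned_alloc from C++17 as a portable stand-in for _aligned_malloc (MSVC doesn't provide std::aligned_alloc, so there you would call _aligned_malloc and _aligned_free instead), AllocateAligned and kAlignment are names invented for the example, and it skips the new-handler retry loop that a complete operator new would have.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace {
constexpr std::size_t kAlignment = 16;  // assumed alignment for SIMD types

// Hypothetical helper: returns 16-byte aligned memory, or nullptr on failure.
void* AllocateAligned(std::size_t size) noexcept {
    if (size == 0) {
        size = 1;  // operator new must return a unique pointer even for size 0
    }
    if (size > static_cast<std::size_t>(-1) - (kAlignment - 1)) {
        return nullptr;  // rounding up would overflow
    }
    // std::aligned_alloc wants the size to be a multiple of the alignment.
    const std::size_t rounded = (size + kAlignment - 1) / kAlignment * kAlignment;
    return std::aligned_alloc(kAlignment, rounded);
}
}  // namespace

// Throwing variants.
void* operator new(std::size_t size) {
    if (void* p = AllocateAligned(size)) {
        return p;
    }
    throw std::bad_alloc();
}

void* operator new[](std::size_t size) {
    if (void* p = AllocateAligned(size)) {
        return p;
    }
    throw std::bad_alloc();
}

// Non-throwing variants return nullptr on failure.
void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    return AllocateAligned(size);
}

void* operator new[](std::size_t size, const std::nothrow_t&) noexcept {
    return AllocateAligned(size);
}

// Every delete variant must release memory the same way it was allocated.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }
void operator delete(void* p, const std::nothrow_t&) noexcept { std::free(p); }
void operator delete[](void* p, const std::nothrow_t&) noexcept { std::free(p); }
```

With these in one source file of the program, a plain new (and anything built on it, such as std::vector with the default allocator) should hand back 16-byte aligned memory.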

#5296197 whats the "Big O" of this algo?

Posted by on 12 June 2016 - 05:27 AM

For reference, the sort algorithm that the standard library implements as std::sort() is usually https://en.wikipedia.org/wiki/Introsort which has an O(n log n) average and worst case performance, but it could also be something else.


https://en.wikipedia.org/wiki/Sorting_algorithm has a big table comparing performance and memory use of a variety of sort algorithms.
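
If you want to see how your own standard library behaves, one simple experiment is to count the comparisons std::sort() makes for different input sizes. The helper below is just a sketch for that experiment; CountSortComparisons is a made-up name, and the exact counts depend on the library implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts a copy of the input and returns how many comparisons std::sort made.
std::size_t CountSortComparisons(std::vector<int> values) {
    std::size_t comparisons = 0;
    std::sort(values.begin(), values.end(), [&comparisons](int a, int b) {
        ++comparisons;
        return a < b;
    });
    return comparisons;
}
```

Doubling the input size should roughly double the count (plus a bit), rather than quadrupling it as it would for an O(n^2) sort.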

#5296030 Trivially trivial?

Posted by on 10 June 2016 - 05:32 PM

You could set up your own trait called, say, IsRelocatable<T>. Other people have done it that way: https://github.com/facebook/folly/blob/master/folly/docs/Traits.md


I found some discussions at https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/wphImiqfX7Y[1-25] which suggest that some possible implementations of virtual functions aren't compatible with memcpy() but I don't know of any compilers where that's actually true.

#5295019 [SharpDX] DXGI Adapter Video Memory problem

Posted by on 04 June 2016 - 05:48 PM

Looking at the documentation https://msdn.microsoft.com/en-us/library/windows/desktop/bb173058%28v=vs.85%29.aspx you can see the size is stored in a SIZE_T type, which in a 32-bit process will be an unsigned 32-bit integer.


If you treat -1073741824 as an unsigned value, it's actually 3GB.


It's possible you will get a different result if you compile your program as 64-bit.
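
The arithmetic is easy to check. The snippet below is C++ rather than C#, and AsUnsigned32 is just a name picked for the example, but the same reinterpretation applies to the value SharpDX hands back.

```cpp
#include <cstdint>

// Reinterprets a signed 32-bit value as the unsigned value with the same bits.
std::uint32_t AsUnsigned32(std::int32_t value) {
    return static_cast<std::uint32_t>(value);
}

// 3 GB expressed in bytes: 3 * 1024 * 1024 * 1024 = 3221225472.
constexpr std::uint32_t kThreeGigabytes = 3u * 1024u * 1024u * 1024u;
```

AsUnsigned32(-1073741824) comes out equal to kThreeGigabytes, because 2^32 - 2^30 is 3 * 2^30.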

#5293449 DX9 Doubling Memory Usage in x64

Posted by on 25 May 2016 - 05:11 PM

I can guess at one possibility. Some time ago Microsoft optimized the address space usage of D3D9 on Vista - see https://support.microsoft.com/en-gb/kb/940105. It's possible that the optimization was only applied to x86, as you're not going to run out of address space on x64.


Is this extra memory usage actually causing a significant, measurable performance issue? If not I wouldn't worry about it.


If you really want to investigate what's going on, I'd suggest creating the simplest possible test program that shows the memory usage difference, and using a tool like https://technet.microsoft.com/en-us/sysinternals/vmmap.aspx to investigate how memory gets allocated differently.

#5287244 Downsampling texture size importance

Posted by on 16 April 2016 - 07:21 PM

To render to part of a render target, what you usually want to do is to adjust the viewport. This allows you to render anything you want - it will handle the scaling and clipping for you.


The only downside is that when sampling from the render target, you can't clamp or wrap at the edge of the viewport, so it's easy to read outside of the viewport that you wrote to.
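
One common workaround is to clamp the texture coordinates yourself before sampling. The sketch below shows the math on the CPU side in C++; in practice it would live in the shader. Viewport, Float2 and ViewportUvToTargetUv are hypothetical names for this example, and it assumes the viewport is at least one pixel wide and high.

```cpp
#include <algorithm>

struct Viewport { float x, y, width, height; };  // in pixels of the render target
struct Float2 { float u, v; };

// Maps a coordinate that is 0..1 across the viewport to a UV on the whole
// render target, clamped to the centres of the viewport's edge texels so
// that bilinear filtering never reads outside the region that was written.
Float2 ViewportUvToTargetUv(Float2 local, const Viewport& vp,
                            float targetWidth, float targetHeight) {
    const float minU = (vp.x + 0.5f) / targetWidth;
    const float maxU = (vp.x + vp.width - 0.5f) / targetWidth;
    const float minV = (vp.y + 0.5f) / targetHeight;
    const float maxV = (vp.y + vp.height - 0.5f) / targetHeight;

    const float u = (vp.x + local.u * vp.width) / targetWidth;
    const float v = (vp.y + local.v * vp.height) / targetHeight;

    return {std::clamp(u, minU, maxU), std::clamp(v, minV, maxV)};
}
```

This only emulates clamp addressing. Emulating wrap would need the coordinate to be wrapped into the 0..1 range before the mapping instead.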

#5285988 D3D alternative for OpenGL gl_BaseInstanceARB

Posted by on 09 April 2016 - 05:31 AM

The standard technique to draw multiple copies of the same thing in D3D is called instancing.


There's a decent explanation of how to do that in D3D9 at: https://msdn.microsoft.com/en-us/library/windows/desktop/bb173349%28v=vs.85%29.aspx


You can do the same thing in D3D11, but the API is a bit different. There's some example code at: http://www.rastertek.com/dx11tut37.html

#5283954 Need help understanding line in code snippet for linked list queue please.

Posted by on 28 March 2016 - 05:49 PM

That code appears to be formatted to fit in limited vertical space for printing - that is, it's deliberately making the code less readable so that it fits on one page. It's also using three xors to swap two variables, instead of using std::swap(), which is much more readable (and probably faster too).
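
For comparison, here is the xor trick written out next to std::swap(). XorSwap and PlainSwap are names made up for this sketch. Apart from readability, the xor version has a trap: if both references refer to the same variable, it zeroes it.

```cpp
#include <utility>

// The three-xor swap. Works for two distinct ints, but sets the value to
// zero when a and b are the same object.
void XorSwap(int& a, int& b) {
    a ^= b;
    b ^= a;
    a ^= b;
}

// The readable version, which is also safe when a and b alias.
void PlainSwap(int& a, int& b) {
    std::swap(a, b);
}
```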


The line of code in question would be easier to read if it was split up into two or three separate statements:

int operandCount = (n/2) - 1;
if ( count > operandCount )

I think the reasoning behind the test is that when there are only binary operators available, there will always be a known ratio of operators to operands, and the code is testing for that. It's not a very nice way to detect the end of the expression though.


That is:

- With an input of length 3, you must have one operator and two operands.

- Input of length 4 is invalid.

- Input of length 5 will always have two operators.

- etc.
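
The pattern in the list above can be written down directly. This is only an illustration of the ratio, not the code from the book: ExpressionShape and ShapeForLength are hypothetical names, and it assumes every token is either one operand or one binary operator.

```cpp
#include <optional>

struct ExpressionShape {
    int operators;
    int operands;
};

// For an expression built only from binary operators, a valid input has an
// odd number of tokens, with exactly one more operand than operators.
std::optional<ExpressionShape> ShapeForLength(int tokenCount) {
    if (tokenCount < 1 || tokenCount % 2 == 0) {
        return std::nullopt;  // even or empty lengths can't be valid
    }
    return ExpressionShape{(tokenCount - 1) / 2, (tokenCount + 1) / 2};
}
```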

#5281760 casting double* to float*

Posted by on 17 March 2016 - 04:23 PM

The most significant performance hit for using doubles will probably come from the CPU cache misses and memory bandwidth caused by them taking up twice as much memory.


In addition, SSE instructions can handle four floats at a time, but only two doubles at a time. So for code which uses them, the performance hit can be significant.
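
Both points come down to element size, which a few lines can make concrete. The helper names here are invented for the example, and the static_assert records the assumption that float is 4 bytes and double is 8, as on mainstream x86 and x64 compilers.

```cpp
#include <cstddef>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "assumes IEEE 754 single and double precision");

// Number of elements of type T that fit in one 128-bit (16 byte) SSE register.
template <typename T>
constexpr std::size_t LanesPerSseRegister() {
    return 16 / sizeof(T);
}

// Bytes of memory (and so cache and bandwidth) that count elements occupy.
template <typename T>
constexpr std::size_t BytesFor(std::size_t count) {
    return count * sizeof(T);
}
```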


For basic operations on values in registers, floats aren't significantly faster than doubles. For some more complex operations (like division) doubles will be slower than floats, as they have more precision, but overall you probably won't notice much difference.


For details on specific instructions look at Intel's optimization manual - http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html (for example you can compare the divsd and divss instructions there, to see the timings for float vs double division).

#5278428 How to find the cause of framedrops

Posted by on 27 February 2016 - 06:38 AM

One possible cause of stalls is that drivers tend to upload textures (and other resources like shaders) on first use.


This can cause a noticeable stall if a large number of things get used for the first time on one frame.



From: https://fgiesen.wordpress.com/2011/07/01/a-trip-through-the-graphics-pipeline-2011-part-1/



Incidentally, this is also the reason why you’ll often see a delay the first time you use a new shader or resource; a lot of the creation/compilation work is deferred by the driver and only executed when it’s actually necessary (you wouldn’t believe how much unused crap some apps create!). Graphics programmers know the other side of the story – if you want to make sure something is actually created (as opposed to just having memory reserved), you need to issue a dummy draw call that uses it to “warm it up”. Ugly and annoying, but this has been the case since I first started using 3D hardware in 1999 – meaning, it’s pretty much a fact of life by this point, so get used to it. :)