Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 05 Mar 2004
Offline Last Active Dec 07 2014 02:50 AM

#5190633 Is optimization for performance bad or is optimizating too early bad?

Posted by iMalc on 01 November 2014 - 03:10 PM

I should note that there is sometimes a bit of a misconception about premature optimisation.

Avoiding premature optimisation doesn't mean that you ALWAYS just start with the simplest thing that could possibly work. For example, say you are starting out writing what is planned to be a very good chess AI.

You do your research and discover that other people have already studied this area and determined that the only way to get things fast enough is to use bit-boards. Starting out using bit-boards, which is more complex that a straightforward solution, is thus not a premature optimisation. It's an informed need for optimisation, and the only known way to get the required performance.


You either profile then optimise, or you research then optimise, or both.

#5171825 Interrupt handling very slow

Posted by iMalc on 06 August 2014 - 03:51 AM

Generally when the problem appears to be that the Windows message queue mechanism is too slow, that is usually not the case at all.

It can only deliver you the next message when you request it, and you normally only request it when you are finished dealing with the current message. So basically your own slow message processing slows down the receipt of further messages.


One mistake I've come across when processing Windows messages is when you only allow at most one Windows message to be processed each time around your main loop, seemingly giving maximum time to the rest of your drawing code. In reality what happens is that this slows the program down horribly. You should endeavour to process ideally all messages before doing any other stuff in your main loop. Although this may be counter-intuitive, it actually makes a world of difference. Give Windows more attention and promptly process its messages, and your program's performance will be greatly rewarded.


Those are not interrupts and no you do not need a mutex there. In fact the term "interrupts" doesn't have much applicability at all in a Windows program. The terms that you need to be familiar with are synchronous / asynchronous, blocking / non-blocking, or single-threaded / multi-threaded.

You can confirm this for yourself by placing a breakpoint within the HandleInterrupt function, and when the breakpoint is hit, examine the Call Stack window. When you can look back through the call stack of the same thread and find the original GetMessage or PeekMessage call, then what your application is doing is entirely synchronous, or blocking.

#5154160 Most efficient way to write identical data to 2 memory locations simultaneously

Posted by iMalc on 16 May 2014 - 06:40 PM

This sounds similar to the problem of drawing the screen bitmap in one thread and transferring that to the graphics card in another.

When you don't want either thread to have to wait for the other, then the general solution is to use triple-buffering.

Have a read of this and see if it might be what you are looking for:



#5153726 Karatsuba algorithm

Posted by iMalc on 15 May 2014 - 01:41 AM

I implemented that algorithm , but found that the numbers had to be pretty big before it was a win. About 700 bytes IIRC. It would be good to hear how well it works for you.

In case you want to compare notes, my three variations of bigint libraries are here: http://homepages.ihug.co.nz/~aurora76/Malc/Useful_Classes.htm

#5148485 Using C++

Posted by iMalc on 20 April 2014 - 11:50 PM

So I have recently watched several videos on C++. I understand the basics, but how can I apply this into developing a game? 



Let me help put that into perspective for you...


You've just watched several videos on how to use a hammer, and now want to build your own home.

#5148481 C++ std::move() vs std::memcpy()

Posted by iMalc on 20 April 2014 - 10:52 PM

The obvious example that comes to mind, is when the object contains a pointer to one of its other members. I have just been using production code that has exactly that.


The other thing you should not overlook is that memcpy-ing may do most of the job for some types, but it doesn't in itself suppress destruction of the object where the data came from, which you would need to also do to avoid double-deletion. std::move on the other hand, gets implemented to null out pointers from the source location as necessary, causing subsequent destruction to be safe.

#5143168 Random Arbitrary Precision Numbers

Posted by iMalc on 29 March 2014 - 07:13 PM

Okay, here is a rough draft of my new article, comments welcome:


#5142960 Random Arbitrary Precision Numbers

Posted by iMalc on 28 March 2014 - 05:44 PM


one that generates a random number modulo a specified number

Then it isn’t very random. Modulus is absolutely the wrong way to give a range of random numbers from 0 to N, as it favors lower numbers.

This explains why:

However do not use a while loop to generate an unbiased range of random numbers as he suggested; by definition that could be an infinite loop, or at least be severely (order of seconds) slow. Imagine that the range of rand() is 0-268,435,456 (which RAND_MAX is allowed to be by the standard) and you looped until it picked a number below 3.

The correct way is:
unsigned int Rand( unsigned int _uiMax ) {
    double dRand = static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    return static_cast<unsigned int>(dRand * (_uiMax - 1.0)); // Remove “ - 1.0” to be inclusive instead of exclusive.
L. Spiro


That doesn't produce an even distribution either. That still results in some numbers being more likely than others, but the bias isn't with all the numbers at one end. Rather it might result in every third value being more likely, for example.


It is absolutely not right to call that the "correct" way. It is a way sure, but it by no means a panacea.


The rejection method actually does completely remove bias, not just smear that bias around.

In fact if there were a correct way to do it, it would be the rejection method, which yes involves a while loop. For such a technique to cause an infinite loop, the PRNG you are using would have to be braindead useless, to the point that this attempt at removing bias is futile.


If you do some analysis on how many times a generator is expected to loop around when using the rejection method, you'll find that the number of iterations follows the power law, and realistically the chance of looping much at all is completely insignificant, especially if using a 32-bit generator or larger.

Here is a previous post I have made explaining the issue very clearly:



Ah what the heck, I'm finally starting to write that article on random number generation that I've been meaning to write for a long time. I shall explain everything fully in there, including precisely why the floating point technique used here is actually pretty darn awful. I'll link it in this thread when I'm done.

#5128352 Bit mask math

Posted by iMalc on 03 February 2014 - 02:00 AM

Surely this is more readable?!:

id &= ~(NX::COLLISION_ID_IsPickable | NX::COLLISION_ID_IsHighlightable);
if (id == 200)

#5127156 Generating random points in a sphere

Posted by iMalc on 29 January 2014 - 02:49 AM


Might also make sense to switch over to a spherical coordinate system.  Pick a random value from 0 - 360 for X, 0 - 360 for y, and a random value from 0 - radius for z. Then convert that to the cartesian coordinates.



It's just what I would do, you don't necessarily have to.


How do you pick those random values so the resulting distribution of points in the sphere is uniform? That's precisely what this thread is about. If you do the naive thing you just described, you'll end up with too many points near the center of the sphere and too many points near the poles, or something like that (not sure what "x" and "y" are in your post to know for sure).


Indeed, a spherical coordinate system would create clustering due to the effects of gimbal lock. The same solutions to that problem can be used to overcome this (vector algebra / quaternions)


But yes ultimately the simplest, and quite efficient solution is the rejection method where you try again if the point is outside of the radius. Do a squared-radius check of course, to avoid square roots inside the loop.

#5087534 Generic Datatypes --Are Templates the Way to Go?

Posted by iMalc on 20 August 2013 - 03:58 AM

LinkedList is able to add and remove elements, and more importantly, get an element from a specific index.

That's rather unfortunate that you have done that, because giving a linked-list the ability to access items by index is generally regarded as a design blunder.

It turns any innocent loop using such functionality into a Schlemiel the Painter's algorithm.


LinkedItem and LinkedList also also subclassed by DictionaryItem and Dictionary, respectively. DictionaryItem provides an additional "name" field, and when Dictionary attempts to add a DictionaryItem to the list, it'll check the name field against items currently in the list to make sure it's unique (checking duplicates can be turned off, though).

Generally this is a further mistake. using a linked-list for dictionary-based lookup causes every lookup to be O(n), which get slow very quickly as the number of items increases. It again leads to more occurrences of Schlemiel the Painter's algorithm.


There is a very good reason that the standard library does not do any of this. It totally kills performance for any half decent usage of the container. For containers where key-based lookups are required, you want such an operation to be at most O(sqrt n), and ideally O(1). Typically O(log n) is chosen.
As much as you have a personal attachment to these features that you believe are useful, or even needed, you do need to accept that your implementation will be vastly inferior to the standard C++ library. Your best bet is to spend more time understanding the standard library containers, and come to a better appreciation of just how brilliant they really are.


Would templates work for this situation?

Templates are the solution to making a container generic.

#5074404 Basic Concepts of Programming

Posted by iMalc on 01 July 2013 - 04:32 AM

Agreed. Programming can really only be learnt by doing.


It's just like how you don't learn maths by looking at the questions and then looking at the answers. You might start to get a feel for it, but you really only solidify that knowledge and actually learn it, by solving the problems yourself.


The process of learning is also all about making mistakes, and learning from those mistakes. Being all into the theory side of things might seem great, but the moment something doesn't work as expected, you wont have a clue what is wrong.

#5074399 Display screen remotely

Posted by iMalc on 01 July 2013 - 04:23 AM

Oh, so you modified the Microsoft supplied header files to try and get it to work.


Yeah that's a mistake sorry... You best reinstall what you changed, back to the unmodified versions, and solve the real problem.

#5072414 loop counters as class members? (memory usage)

Posted by iMalc on 24 June 2013 - 01:06 AM

Thanks all, my conclusions:

- making these variables mutable should work, but is tricky (as it is today akready), risk on faulty initializiation etc.
- since they're just simple ints and unsigned ints, no complex / higher level algorithms etc, I will create them all locally within the member functions and cleanup the ones in the class(es)

This also fits perfectly within the given principles.

Don't even look at it as if you have an option or choice there. Misusing a member variable for this purpose has many dire consequences:

  1. It inhibits some optimisation because at the very least, the final value must needlessly be written back into the member variable, and worst case may even cause the compiler to not use a register where it otherwise could.
  2. Stack allocation is cheap, very cheap. So cheap that adding an extra variable your stack frame normally has absolutely zero impact on performance.
  3. As a member it increases the amount of memory your class occupies. With many instances, and/or on smaller objects, this could bump up RAM usage very significantly.
  4. It simply will not work when you try to have nested functions using the same member variable as the for-loop variable.
  5. Making it mutable requires that you also wear the added runtime cost of making it threadsafe.
  6. Trying to use any functions which use the same member variable as a loop counter from multiple threads simply will not work.

Long story short, do not ever misuse use a member variable for the purpose of what should be a local loop variable.

#5071376 Efficiency question

Posted by iMalc on 20 June 2013 - 02:03 AM

You're asked about efficiency and cleanliness. These are two very distinct things.


For efficiency, you're asking about it from a too broad standpoint. It doesn't really make that much sense to talk about a program's overall efficiency. Programs do all sorts of tasks in response to various stimulants. Some of those tasks may get the job done as efficiently as possible, whilst others may burn thousands, millions, billions, or trillions more CPU cycles than are actually required to solve the task at hand.


The most important thing in regards to efficiency is Big-Oh notation. Often you want to be using the algorithms with the best, or close to the best Big-Oh notation. E.g. You avoid this sort of thing: http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm.

Sometimes though, the most efficient thing is the simplest thing, particularly when the numbers concerned are very low.


"Clean" is a very broad term, but it generally means well-designed, well written code, that is as simple as possible but not simpler, and adheres to good practices.

"Clean" is not always equal to "efficient", but the best programmers can pull the two quite close together.