Memory/premature-optimization question

12 comments, last by ZQJ 17 years, 1 month ago
I've often read that the int data type should be used instead of short/byte, and that it isn't worth it to try to save the extra space. However, I have a case where hundreds of thousands of variables must be kept track of... and none of them could ever get close to the max of int. The highest any of them could ever go is around 5120. I feel like it's a complete waste. I'm using C# 2.0 with the .NET Framework, if that makes a difference. Should I design from the bottom up with the closest-fitting data type, or should I use int? Does the .NET Framework automatically optimize the alignment of classes, or do I have to "order" the fields myself?
You can ignore this bit:
In C and C++, the 'int' type is supposed to map to whatever the native machine word size is on the target platform, and so it should be as fast as possible for general use.
In practice, 'int' has been assumed to be 32-bit by so many developers for so long that I wouldn't be surprised if 'int' remains 32-bit for a long time, even as 64-bit processors become the norm.
C# is different though. In C#, 'int' is defined to be 32-bit ('long' is defined to be 64-bit), so in abstract theoretical terms, there is no speed advantage to using 'int', since it isn't supposed to be 'the native integer type'. In practice though, 32-bit integers will often be one of the fastest types to work with.
Despite all of that - it doesn't matter one bit. Trust your compiler - it really does know more about the low-level speed properties of your computer than you do.
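Since those sizes are fixed by the C# language spec, a trivial check like this prints the same numbers on any platform (sizeof on the built-in primitives is allowed in safe code):

```csharp
using System;

class Sizes
{
    static void Main()
    {
        // Fixed by the C# spec, independent of the machine's word size:
        Console.WriteLine(sizeof(byte));  // 1
        Console.WriteLine(sizeof(short)); // 2
        Console.WriteLine(sizeof(int));   // 4
        Console.WriteLine(sizeof(long));  // 8
    }
}
```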

This is my actual response:
For the real answer: Profile your code. Estimate the expected and maximum number of data items, and estimate how much memory will be needed to store that number of items if you use int or if you use short. Decide what your target platform is.
There isn't a universal "right answer" to your question, but there probably is a "right answer" for your particular situation - you just have to put the question in context to see it.
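To put rough numbers on it (assuming, purely for illustration, that "hundreds of thousands" means about 500,000 items): 500,000 × 4 bytes ≈ 1.9 MB as ints, versus 500,000 × 2 bytes ≈ 0.95 MB as shorts. Whether saving roughly a megabyte matters is exactly the context question above.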

Now in my opinion, based on pure guesswork, if you're storing lots of data items, then picking the smallest data type that gives the range you need is probably worth it. It can increase the number of data items that can fit in a cache line, and it can decrease the number of pages needed to store your application's working set. Both of these effects will (again, this is pure guesswork) outweigh the minor speed hit that you might get from using a more processor-friendly data size.
The reduced working set will matter more if your program is dealing with so much data that it's pushing the limits of RAM, or if your program is running in an environment with lots of other processes running at the same time (ie, lots of competition for physical memory).
The increased number of data items in a cache line will matter more if your program has to traverse your entire collection of items in order. If you're accessing items out-of-order then it would be better to have data items that are sized to hit the juicy cache-alignment points.
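If you want to test the traversal case on your own machine, here's a minimal sketch (the class name and the 1,000,000-item count are my own assumptions, not from this thread) that times a sequential sum over an int[] versus a short[]:

```csharp
using System;
using System.Diagnostics;

class TraversalBench
{
    const int Count = 1000000; // hypothetical item count

    static void Main()
    {
        int[] ints = new int[Count];
        short[] shorts = new short[Count];
        for (int i = 0; i < Count; i++)   // fill (and page in) both arrays
        {
            ints[i] = i % 5120;
            shorts[i] = (short)(i % 5120);
        }

        Stopwatch sw = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < Count; i++) sum += ints[i];
        sw.Stop();
        Console.WriteLine("int[]:   sum={0}, {1} ticks", sum, sw.ElapsedTicks);

        sw = Stopwatch.StartNew();
        sum = 0;
        for (int i = 0; i < Count; i++) sum += shorts[i];
        sw.Stop();
        Console.WriteLine("short[]: sum={0}, {1} ticks", sum, sw.ElapsedTicks);
    }
}
```

Run it in Release mode and repeat a few times; on some machines the short version wins noticeably, on others the difference is lost in the noise.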

John B
The best thing about the internet is the way people with no experience or qualifications can pretend to be completely superior to other people who have no experience or qualifications.
If you use individual variables, it's probably better to use 'int'. If you have lots of values, such as in an array, I recommend using 'short/byte' instead, as the total memory saving becomes significant.

You will find that even if you use individual 'short/byte' declarations, the compiler will most likely widen them to the CPU's native integer size anyway (at the assembly level).
Latest project: Sideways Racing on the iPad
What exactly are you keeping in memory? 1 million ints is only about 3.8 MB (1,000,000 × 4 bytes). :) Have fun filling up your memory with variables, it's not gonna happen.
Quote:Original post by Daniel Miller
I've often read that the int data type should be used instead of short/byte, and that it isn't worth it to try to save the extra space. However, I have a case where hundreds of thousands of variables must be kept track of... and none of them could ever get close to the max of int. The highest that any of them could ever go is around 5120. I feel like it's a complete waste.

Depending on your exact requirements, the answer could be anything from "use all ints" to "use packed unsigned 13-bit integers". Do it different ways and see what space/speed tradeoff is most acceptable for your particular situation. There are no "always"es or "never"s here.
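In case "packed unsigned 13-bit integers" sounds abstract, here's a minimal illustrative sketch of the idea (the class name and layout are mine, not a real library, and production code would want bounds checks):

```csharp
using System;

// Illustrative only: stores unsigned 13-bit values (0..8191) in a uint array.
class Packed13
{
    const int Bits = 13;
    const uint Mask = (1u << Bits) - 1; // 0x1FFF
    readonly uint[] data;

    public Packed13(int length)
    {
        // 13 bits per value, rounded up to whole 32-bit words.
        data = new uint[(length * Bits + 31) / 32];
    }

    public uint Get(int index)
    {
        int bit = index * Bits;
        int word = bit >> 5;    // bit / 32
        int offset = bit & 31;  // bit % 32
        uint result = data[word] >> offset;
        if (offset + Bits > 32) // value straddles two words
            result |= data[word + 1] << (32 - offset);
        return result & Mask;
    }

    public void Set(int index, uint value)
    {
        value &= Mask;
        int bit = index * Bits;
        int word = bit >> 5;
        int offset = bit & 31;
        data[word] = (data[word] & ~(Mask << offset)) | (value << offset);
        if (offset + Bits > 32) // write the spilled high bits into the next word
        {
            int shift = 32 - offset;
            data[word + 1] = (data[word + 1] & ~(Mask >> shift)) | (value >> shift);
        }
    }
}
```

A `new Packed13(500000)` holds 500,000 values in about 0.78 MB, versus roughly 1.9 MB for an int[], at the cost of extra shifting and masking on every access.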
Quote:Original post by Sneftel
Depending on your exact requirements, the answer could be anything from "use all ints" to "use packed unsigned 13-bit integers". Do it different ways and see what space/speed tradeoff is most acceptable for your particular situation. There are no "always"es or "never"s here.


Even at that, using 13-bit over 32-bit isn't that much of a saving. 500 MB or 1 GB doesn't really make much difference.

And if we're talking about more data than that, then you need virtual memory/paging/external storage in the first place.

But if this is about 5 or 12 MB of memory, then it doesn't make any difference. The .NET Framework itself weighs more than that.

Keep in mind that saying "could never go" is an extremely risky proposition. If the application will be used twice and then deleted, memory usage isn't a concern.

But things could get a lot worse. The application could actually get used. Feature requests start coming in. And at some point, a value larger than that will appear. This is how maintenance programmers' nightmares begin.
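To make that failure mode concrete, here's a tiny sketch (the value 40000 is just a hypothetical out-of-range input) of how a narrowing cast in C# silently wraps by default:

```csharp
using System;

class OverflowDemo
{
    static void Main()
    {
        int newRequirement = 40000;          // hypothetical value from a later feature request
        short s = (short)newRequirement;     // unchecked by default: silently wraps
        Console.WriteLine(s);                // prints -25536

        try
        {
            s = checked((short)newRequirement); // a checked conversion throws instead
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow caught");
        }
    }
}
```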

The era of "640K is enough for everyone" is long gone (with the exception of embedded software).

Quote:Original post by Sneftel
Quote:Original post by Daniel Miller
I've often read that the int data type should be used instead of short/byte, and that it isn't worth it to try to save the extra space. However, I have a case where hundreds of thousands of variables must be kept track of... and none of them could ever get close to the max of int. The highest that any of them could ever go is around 5120. I feel like it's a complete waste.

Depending on your exact requirements, the answer could be anything from "use all ints" to "use packed unsigned 13-bit integers". Do it different ways and see what space/speed tradeoff is most acceptable for your particular situation. There are no "always"es or "never"s here.


I'm storing positions, actions, states, and paths for units/structures in an RTS game.

After thinking about it, it won't occupy more than a few MB, so I am being paranoid.
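For what it's worth, a compact per-unit record for that kind of data might look something like this (field names and sizes are illustrative guesses, not from this thread):

```csharp
// Hypothetical compact unit record. A max coordinate of ~5120 fits in a short.
struct UnitRecord
{
    public short X;         // 2 bytes
    public short Y;         // 2 bytes
    public byte Action;     // 1 byte: current action id
    public byte State;      // 1 byte: state-machine state
    public short PathIndex; // 2 bytes: index into a shared path table
}
// 8 bytes of fields per unit instead of 20 with five ints;
// 100,000 units: roughly 0.8 MB vs. 2 MB.
```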

Thanks for the responses everyone, sorry for the ridiculous question!
Quote:Original post by Antheus
Even at that, using 13-bit over 32-bit isn't that much of a saving. 500 MB or 1 GB doesn't really make much difference.

It sure does if you have 768 MB of RAM installed. Big-O notation will only get you so far.
If you have data in an array, use small data types; for sparse individual variables, use ints, since they will get aligned anyway. I personally think you are making a big problem out of a little one. The problem with starting off on high-level languages like C# is that you won't know the answer to basic questions like this one.
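On the alignment question from the original post: the CLR defaults to LayoutKind.Auto for classes, so the runtime is free to reorder fields itself and you normally don't have to "order" anything. If you do want explicit control, it looks roughly like this (the struct here is hypothetical):

```csharp
using System.Runtime.InteropServices;

// Classes default to LayoutKind.Auto (the runtime may reorder fields);
// structs default to LayoutKind.Sequential. Explicit control is mainly
// for interop, or for minimizing padding by hand:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct TightRecord
{
    public short Position; // 2 bytes
    public byte State;     // 1 byte
    public short PathId;   // 2 bytes; with Pack = 1 there's no padding: 5 bytes total
}
```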
Quote:Original post by Sneftel
Quote:Original post by Antheus
Even at that, using 13-bit over 32-bit isn't that much of a saving. 500 MB or 1 GB doesn't really make much difference.

It sure does if you have 768 MB of RAM installed. Big-O notation will only get you so far.

And similarly, 500 KB vs. 1 MB can make a lot of difference when you've got 512 KB of cache. You don't need a profiler to tell you that an array of hundreds of thousands of ints wastes a lot of memory compared to an array of shorts, or that cache misses are, in general, quite often a performance bottleneck. You might need one to tell you whether cache misses are a significant problem in your actual code; but it doesn't hurt to use short instead of int when you can guarantee the values will always fit (e.g. your game engine will never, ever support 32k×32k-tile worlds), and there's no reason to pick the less efficient option when it provides no benefit and has at least potential disadvantages.

