Sc4Freak

[C++] Alignment and arrays

This topic is 3127 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

There are a lot of CPU architectures out there that can't perform misaligned memory reads - older ARM and Motorola chips for example. On these platforms, performing a misaligned memory read will often raise a CPU exception and kill your program.

So compilers will often pad out structures to fit alignment requirements.

But what about arrays? While structures are allowed to be padded out for alignment, the standard guarantees that elements in an array will be stored contiguously - which means that there will never be padding between individual elements of an array.

How does this work? Assume that I'm on an architecture that requires memory reads to be aligned to a 4-byte boundary. Take for example the following:
char* arr = new char[n];
arr[0]; // 1
arr[1]; // 2
I believe operator new[] guarantees that the block of memory returned will be aligned correctly, which means that line 1 is fine. But because there can be no padding in between array elements, arr[1] will always be exactly one byte past arr[0]. So on an architecture that requires aligned memory reads, won't line 2 cause the program to explode?

The compiler should do the magic: read an aligned 32-bit word, then shift and mask away the redundant data. It can get worse, though. Imagine a two-byte value that straddles two words:
[ ][ ][ ][3] [4][ ][ ][ ]
To read the 34, first load the lower dword, extract the 3 and store it. Then load the second dword, extract the 4 and store it. Finally, combine both into the result.

On x86 unaligned access is not much of a problem, but on some architectures the penalty is really bad for this very reason.

Other ways are possible as well, such as padding the array elements. Also, some embedded hardware provides library-level functions to deal with memory, which is why C (or some subset thereof) is still dominant in that world.
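
The load, shift, mask and combine sequence described in this post can be sketched in C++ roughly as follows. This is only an illustration, not what any particular compiler emits: the function name and parameters are made up for the example, and it assumes little-endian byte numbering inside each 32-bit word (byte 0 is the lowest 8 bits).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads a 16-bit value that starts at byte `offset`
// inside an array of 32-bit words, using only whole-word (aligned) loads.
// Assumes little-endian byte numbering within each word.
std::uint16_t load_u16_via_aligned_words(const std::uint32_t* words, std::size_t offset)
{
    assert(words != nullptr);

    const std::size_t index = offset / 4;                         // word holding the first byte
    const unsigned shift = static_cast<unsigned>(offset % 4) * 8; // bit position of that byte

    // First aligned load: shift the wanted byte(s) down to bit 0.
    std::uint32_t value = words[index] >> shift;

    // If the first byte is the last byte of its word, the value straddles
    // two words, so a second aligned load supplies the high byte.
    if (offset % 4 == 3) {
        value |= words[index + 1] << 8;
    }

    // Mask away everything above the 16 bits we were asked for.
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}
```

With the two words from the diagram (the first ending in the byte 0x33, the second starting with the byte 0x44), a read at byte offset 3 needs both loads, which is why a straddling access costs roughly twice as much as an aligned one.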

Quote:
Original post by Sc4Freak
There are a lot of CPU architectures out there that can't perform misaligned memory reads - older ARM and Motorola chips for example. On these platforms, performing a misaligned memory read will often raise a CPU exception and kill your program.


This is also true for recent CPUs, and sometimes it is even worse because registers have gotten bigger (64-bit and SIMD). But the compiler will turn unaligned reads and writes into aligned reads and writes, as Antheus explained.

Since memory reads and writes are among the slowest operations a CPU performs, a lot of performance can be gained by paying attention to memory alignment.

Quote:
Original post by Sc4Freak
There are a lot of CPU architectures out there that can't perform misaligned memory reads - older ARM and Motorola chips for example. On these platforms, performing a misaligned memory read will often raise a CPU exception and kill your program.

So compilers will often pad out structures to fit alignment requirements.

But what about arrays? While structures are allowed to be padded out for alignment, the standard guarantees that elements in an array will be stored contiguously - which means that there will never be padding between individual elements of an array.

How does this work? Assume that I'm on an architecture that requires memory reads to be aligned to a 4-byte boundary. Take for example the following:
char* arr = new char[n];
arr[0]; // 1
arr[1]; // 2


I believe operator new[] guarantees that the block of memory returned will be aligned correctly, which means that line 1 is fine. But because there can be no padding in between array elements, arr[1] will always be exactly one byte past arr[0]. So on an architecture that requires aligned memory reads, won't line 2 cause the program to explode?

You may have misunderstood what an unaligned read is. Not every access has to be aligned to the same n-byte boundary; the alignment just has to be appropriate for the size of the read being performed.
An unaligned read is one whose address is not a multiple of the size of the read.

At the language level:
Since a char is one byte in size, and every address is a multiple of one, a read of a char can't be misaligned.
The smallest read that can be unaligned is usually a two-byte read. If the address is odd (i.e. not a multiple of two), then reading two bytes from it is unaligned. E.g.
char *arr = new char[n];
// Force an odd address (needs <cstdint> for std::uintptr_t).
short *hack = reinterpret_cast<short *>(reinterpret_cast<std::uintptr_t>(arr) | 1);
short boom = *hack; // misaligned 2-byte read: undefined behavior, may trap on strict-alignment CPUs

The next read size up is four bytes, and such a read is unaligned if the address is not a multiple of four. So when reading an int (typically 4 bytes), the read is unaligned if the address is odd or is an odd multiple of two (i.e. if either of the lowest two bits is set).
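
The rule above (a read is aligned when its address is a multiple of its size) can be checked with a small helper. This is a minimal sketch: is_aligned and read_int are hypothetical names, and it assumes that converting a pointer to std::uintptr_t preserves the low address bits, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the address `p` is a multiple of `size`.
// `size` must be a power of two (1, 2, 4, 8, ...).
bool is_aligned(const void* p, std::size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    // For a power-of-two size, "multiple of size" means the low bits are all zero.
    return (reinterpret_cast<std::uintptr_t>(p) & (size - 1)) == 0;
}

// Hypothetical helper: reads an int from any address, aligned or not.
// std::memcpy copies byte by byte as far as the language is concerned,
// so no misaligned int access is ever performed through a pointer cast.
int read_int(const char* p)
{
    int value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

For a char, is_aligned(p, 1) is true for every address, which is the reason arr[1] in the original question is safe. When data really does sit at an odd address, copying it out with std::memcpy is the usual portable alternative to the pointer cast shown above.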

Ah. I was under the impression that an aligned memory read meant a pointer had to be a multiple of four or something like that.

Your explanation makes a lot of sense, thanks.

No, that is the way it works.

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes). A compiler will take care of it if it works on that platform.

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.

Quote:
Original post by thatguyfromthething
No, that is the way it works.

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes). A compiler will take care of it if it works on that platform.

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.


If you turned each of these paragraphs into questions on a true-false test you'd mark F for each and every one. I'd recommend ignoring this post completely.

Quote:
Original post by SiCrane
Quote:
Original post by thatguyfromthething
No, that is the way it works.

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes). A compiler will take care of it if it works on that platform.

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.


If you turned each of these paragraphs into questions on a true-false test you'd mark F for each and every one. I'd recommend ignoring this post completely.


Between this and other threads it seems to be gamedev staff/moderators that are batting zero.

I will explain in simpler terms I guess.

Here is how it works: there's something called malloc. operator new is built on top of it and calls it (in the C++ standard library implementation). In MSVC++ (2008 at least), malloc then prepends the allocation size to every allocation.

Then, for new[] as opposed to plain old new, it has to know how many destructors to call. So how does it do that? It turns out there are two commonly used ways. One, it keeps a hash table. Two, it prepends the number of elements, after the allocation size. Otherwise, when it deletes, it has to figure out how big the chunk is, and keeping track in some other way is even worse.

So, if you ever overload operator new[] for a class, it will be asked for a chunk of 4 + numberOfItems*sizeOfItem bytes. If you hand back a pointer to location 100000, the new-expression passes on the address 100004 instead. It's easy enough to check this, and if you had ever overloaded operator new[] in MSVC++ you'd know this already. If you haven't, why are you attempting to correct someone who has, like a schoolteacher?
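
The element-count prefix discussed in this thread can be observed with a class-level operator new[]. The following is a rough sketch, not a description of any specific compiler: Tracked and cookie_bytes are hypothetical names, and both the size and the presence of the prefix are implementation details (the standard only says the request may include unspecified overhead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type with a user-provided destructor, so delete[] has to
// know how many elements to destroy.
struct Tracked {
    int value;
    ~Tracked() {}

    static std::size_t last_request; // bytes requested by the last new[]
    static void* last_block;         // pointer our operator new[] returned

    static void* operator new[](std::size_t bytes) {
        last_request = bytes;
        last_block = std::malloc(bytes);
        if (!last_block) throw std::bad_alloc();
        return last_block;
    }
    static void operator delete[](void* p) noexcept { std::free(p); }
};
std::size_t Tracked::last_request = 0;
void* Tracked::last_block = nullptr;

// Returns the distance in bytes between the block handed out by
// operator new[] and the first array element the program actually sees.
std::size_t cookie_bytes(std::size_t count)
{
    Tracked* arr = new Tracked[count];
    char* first = reinterpret_cast<char*>(arr);
    char* block = static_cast<char*>(Tracked::last_block);
    assert(first >= block);
    const std::size_t offset = static_cast<std::size_t>(first - block);
    delete[] arr;
    return offset;
}
```

On common compilers the returned offset equals the extra bytes in the request, and it is a multiple of the element's alignment, so the elements themselves stay correctly aligned for their own type. What the offset does not guarantee is a larger alignment such as 16 bytes, which is the separate SIMD concern raised later in the thread.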

You are right that the compiler prepends your allocation with additional data. You are wrong that this causes any misaligned reads or slowdown thereof.

malloc (in VC++ 2008 anyway, and most others) always returns a value aligned to 16 (or is it 8?) bytes. It has the allocation size prepended, but that is invisible to you even if you overload operator new[].

operator new[] sits on top of malloc.

So, after prepending the element count (you can't avoid this), the array always starts at whatever malloc returned, offset by sizeof(int).

That means an array never starts on a 16-byte-aligned boundary. Just make a couple of arrays with items sized 16 bytes, then use a SIMD instruction on them. It crashes every time unless you use a compiler directive to align them properly.

It's kind of a crappy implementation, but the alternatives have issues too.

The misalignment doesn't cause a crash on Intel except with SIMD. It just leads to lots of shuffling, which will cause some slowdown.
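
When a buffer has to start on a 16-byte boundary regardless of what the allocator or new[] does, one portable option is to over-allocate and align inside the buffer. This is a minimal sketch using std::align from <memory> (C++11); align_within is a hypothetical helper name, and it is not the compiler directive the post refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: returns a pointer inside `storage` that is aligned to
// `alignment` and has room for `bytes` bytes, or nullptr if it does not fit.
// `alignment` must be a power of two.
void* align_within(std::vector<unsigned char>& storage,
                   std::size_t alignment, std::size_t bytes)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    void* p = storage.data();
    std::size_t space = storage.size();
    // std::align bumps p forward to the next aligned address if `bytes`
    // still fit in the remaining space; otherwise it returns nullptr.
    return std::align(alignment, bytes, p, space);
}
```

Allocating alignment - 1 extra bytes guarantees that an aligned region of the requested size fits. In newer code, alignas(16) on the element type is usually simpler, and since C++17 a new-expression for such an over-aligned type requests suitably aligned storage by itself.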
