[C++] Alignment and arrays

This topic is 2840 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

There are a lot of CPU architectures out there that can't perform misaligned memory reads - older ARM and Motorola chips for example. On these platforms, performing a misaligned memory read will often raise a CPU exception and kill your program.

So compilers will often pad out structures to fit alignment requirements.

But what about arrays? While structures are allowed to be padded out for alignment, the standard guarantees that elements in an array will be stored contiguously - which means that there will never be padding between individual elements of an array.

How does this work? Assume that I'm on an architecture that requires memory reads to be aligned to a 4-byte boundary. Take for example the following:
char* arr = new char[n];
arr[0]; // 1
arr[1]; // 2
I believe operator new[] guarantees that the block of memory returned will be aligned correctly, which means that line 1 is fine. But because there can be no padding in between array elements, arr[1] will always be exactly one byte past arr[0]. So on an architecture that requires aligned memory reads, won't line 2 cause the program to explode?

The compiler should do the magic: read an aligned 32-bit word, then shift and mask away the redundant data. And it could be worse; imagine:
[ ][ ][ ][3] [4][ ][ ][ ]
To read the "34", first load the lower dword, mask out everything but the 3, and store it. Then load the second dword, mask out everything but the 4, and store it. Finally combine both into the result.
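As a rough sketch of that shift/mask idea (not what any particular compiler actually emits), here is how a byte read and a straddling 32-bit read could be built out of aligned 32-bit loads only. The helper names `load_byte` and `load_u32` are made up for this example, and bytes are numbered least-significant first within each word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fetch byte number `index` from memory that may only
// be read one aligned 32-bit word at a time.
inline std::uint8_t load_byte(const std::uint32_t* words, std::size_t index)
{
    assert(words != nullptr);
    const std::uint32_t word = words[index / 4];                  // aligned load
    const unsigned shift = static_cast<unsigned>(index % 4) * 8;  // bit position of the byte
    return static_cast<std::uint8_t>((word >> shift) & 0xFFu);    // shift down, mask the rest
}

// Hypothetical helper: fetch a 32-bit value that starts at any byte offset.
// When the value straddles two words, both are loaded and the halves combined.
inline std::uint32_t load_u32(const std::uint32_t* words, std::size_t byte_offset)
{
    assert(words != nullptr);
    const std::size_t first = byte_offset / 4;
    const unsigned shift = static_cast<unsigned>(byte_offset % 4) * 8;
    if (shift == 0)
        return words[first];                                      // already aligned
    const std::uint32_t low = words[first] >> shift;              // tail of the first word
    const std::uint32_t high = words[first + 1] << (32 - shift);  // head of the second word
    return low | high;
}
```

With the words `0x44332211` and `0x88776655`, `load_u32(words, 3)` picks up one byte from the first word and three from the second, which is the "two loads, two masks, one combine" case described above.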

On x86 unaligned access is not much of a problem, but on some architectures the penalty is really bad for this very reason.

Other approaches are possible as well, such as padding the array elements. Also, some embedded hardware provides library-level functions to deal with memory, which is why C (or some subset thereof) is still dominant in that world.

Quote:
Original post by Sc4Freak
There are a lot of CPU architectures out there that can't perform misaligned memory reads - older ARM and Motorola chips for example. On these platforms, performing a misaligned memory read will often raise a CPU exception and kill your program.


This is also true for recent CPUs, and sometimes it is even worse because the registers have gotten bigger (64-bit and SIMD), but the compiler will turn unaligned reads and writes into aligned reads and writes as Antheus explained.

Since memory reads and writes are the slowest operations on the CPU, a lot of performance can be gained by paying attention to memory alignment.

Quote:
Original post by Sc4Freak
There are a lot of CPU architectures out there that can't perform misaligned memory reads - older ARM and Motorola chips for example. On these platforms, performing a misaligned memory read will often raise a CPU exception and kill your program.

So compilers will often pad out structures to fit alignment requirements.

But what about arrays? While structures are allowed to be padded out for alignment, the standard guarantees that elements in an array will be stored contiguously - which means that there will never be padding between individual elements of an array.

How does this work? Assume that I'm on an architecture that requires memory reads to be aligned to a 4-byte boundary. Take for example the following:
char* arr = new char[n];
arr[0]; // 1
arr[1]; // 2


I believe operator new[] guarantees that the block of memory returned will be aligned correctly, which means that line 1 is fine. But because there can be no padding in between array elements, arr[1] will always be exactly one byte past arr[0]. So on an architecture that requires aligned memory reads, won't line 2 cause the program to explode?
You may have misunderstood what an unaligned read is. Not every access has to be aligned to n bytes; the alignment just has to be appropriate for the size of the read being performed.
An unaligned read is one whose address is not a multiple of the size of the read.

At the language level:
Since a char is one byte in size, and one goes into every address a whole number of times, a read of a char can't be misaligned.
The smallest read that can usually be unaligned is a two-byte read. If the address is odd (i.e. not a multiple of two) then reading two bytes from it is unaligned. E.g.
char *arr = new char[n];
// force the address to be odd (std::uintptr_t comes from <cstdint>)
short *hack = reinterpret_cast<short*>(reinterpret_cast<std::uintptr_t>(arr) | 1);
short boom = *hack; // explode!

The next size a read can be performed at is four bytes, which means that the read is unaligned if the address is not a multiple of four. So when reading an int (4 bytes), the read is unaligned if the address is odd, or is an odd multiple of two (i.e. either of the lowest two bits is set).
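To make the rule concrete, here is a minimal sketch of an alignment check together with a read that stays safe at any address. `is_aligned` and `read_short` are names invented for this example; the `std::memcpy` approach is the usual portable way to avoid ever dereferencing a misaligned `short*`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when `p` is a multiple of `alignment`.
// `alignment` must be a power of two (1, 2, 4, 8, ...).
inline bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: reads a short from any address, odd or even.
// The bytes are copied into a properly aligned local instead of being
// read through a possibly misaligned short*.
inline short read_short(const char* p)
{
    short value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

Every address passes `is_aligned(p, 1)`, which is the reason a char read can never be misaligned, while an odd address fails `is_aligned(p, 2)`.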

No, that is the way it works.

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes). A compiler will take care of it if it works on that platform.

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.

Quote:
Original post by thatguyfromthething
No, that is the way it works.

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes). A compiler will take care of it if it works on that platform.

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.


If you turned each of these paragraphs into questions on a true-false test you'd mark F for each and every one. I'd recommend ignoring this post completely.

Quote:
Original post by SiCrane
Quote:
Original post by thatguyfromthething
No, that is the way it works.

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes). A compiler will take care of it if it works on that platform.

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.


If you turned each of these paragraphs into questions on a true-false test you'd mark F for each and every one. I'd recommend ignoring this post completely.


Between this and other threads it seems to be gamedev staff/moderators that are batting zero.

I will explain in simpler terms I guess.

How it works is there's something called malloc. New is built on this and calls it (in c++ standard). For MSVC++ (in 2008 at least), it then prepends allocation size on any allocation.

Then, for new[] as opposed to plain old new, it has to know how many destructors to call. So, how's it do that? Turns out there's two ways commonly used. One, it has a hash table. Two, it prepends the number of elements, after the size of allocation. Otherwise, when it deletes it has to figure out what size the chunk is and keeping track in some other way is even worse.

So, if you ever overload operator new[] for a class, it will ask for a chunk of 4 + numberOfItems*sizeOfItem. So you pass it a pointer to location 100000, then it passes on the address 100004 instead. It's easy enough to check this and if you ever overloaded operator new[] for msvc++ you'd know this already. If you haven't, why are you attempting to correct someone who has like a schoolteacher?
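The claim about bookkeeping in front of the array is easy to check for yourself. The sketch below uses a made-up type `Tracked` with a class-level operator new[] that records what it was asked for; `array_overhead` is likewise a name invented here. How many extra bytes show up (4, 8, or none) depends on the compiler, the target, and whether the type has a non-trivial destructor, so treat the result as an observation about one implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type with a user-written destructor, so that delete[] has to
// know how many elements to destroy.
struct Tracked
{
    static std::size_t last_request;  // bytes requested by the latest new[]
    static void* last_block;          // pointer our operator new[] returned

    int value;
    Tracked() : value(0) {}
    ~Tracked() {}

    static void* operator new[](std::size_t bytes)
    {
        last_request = bytes;
        last_block = std::malloc(bytes);
        if (!last_block) throw std::bad_alloc();
        return last_block;
    }
    static void operator delete[](void* p) noexcept { std::free(p); }
};

std::size_t Tracked::last_request = 0;
void* Tracked::last_block = nullptr;

// Returns how many bytes sit between the block handed out by operator new[]
// and the first array element seen by the program.
inline std::size_t array_overhead(std::size_t count)
{
    assert(count > 0);
    Tracked* arr = new Tracked[count];
    const std::size_t overhead = static_cast<std::size_t>(
        reinterpret_cast<char*>(arr) - static_cast<char*>(Tracked::last_block));
    delete[] arr;
    return overhead;
}
```

Printing `array_overhead(10)` next to `Tracked::last_request` shows the size of any prepended count. Whether the first element then still lands on a 16-byte boundary is a separate question that depends on how large that overhead is.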

malloc (for vc++ 2008 anyway, and most others) always returns an aligned to 16 (or is it 8?) value. It has allocation size prepended but that is invisible to you even if you overload operator new[].

operator new[] is on top of malloc.

So, after prepending (you can't avoid this), the return value is always whatever malloc returned offset by sizeof(int).

That means you are never starting on a 16-byte aligned boundary for an array. Just make a couple of arrays with items sized 16 bytes, then use a SIMD instruction. It crashes every time unless you use a compiler directive to align them properly.

It's kind of crappy implementation, but the alternatives have issues too.

The alignment doesn't cause crash on intel except for simd. It just leads to lots of shuffling, which will cause some slowdown.

Quote:

Memory accesses on many platforms have to have addresses divisible by some set amount (like 16 bytes).

Yes, just about every computer has some addressing scheme that dictates how fine grained each memory access is. And due to how the hardware works, things like caches will change what actually gets read when you access something.

Quote:

A compiler will take care of it if it works on that platform.

Maybe. The compiler will use what it knows about to write code that works. It is entirely possible to dereference bad pointers and crash because the compiler didn't know you were doing something screwy.
On one platform I've been working on, we use aligned allocators for all our containers.
One possible parameter is the alignment, and feeding it a 1-byte alignment will cause the code to crash when accessing the allocated structure. The output assembly assumes that the structure was properly allocated and aligned on a 4-byte boundary for all non-char data types.

Quote:

On intel, the processor still has problems with misaligned memory access. It's much slower, but won't crash anything.

Yes x86 is mostly lenient. Even the SIMD operations have aligned and un-aligned instructions, with the unaligned ones being slower. In a sense though "unaligned" can mean a lot more if you take into account that there is word, cache-line, and cache level alignment you may need to take into account. See here

Quote:

For visual C++, you are pretty much guaranteed on intel to have your memory misaligned if you don't use some kind of compiler directive because it prepends the number of elements to any array allocation.

I'd say that is just about every compiler. malloc/new will give you a properly aligned pointer for normal use (as above, some platforms for instance can only read an int aligned on a 4-byte boundary). If you need special-use alignment, you have to ask for it with _aligned_malloc. Keep in mind the same is doubly true for all containers in the C++ Standard Library. They don't guarantee anything about alignment, and even when fed an aligned allocator, ones like std::list will still screw things up.
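For readers who are not on MSVC, here is a hedged, portable sketch of the same idea as _aligned_malloc: over-allocate with plain malloc and let std::align (C++11, from <memory>) pick the first suitably aligned address inside the block. `AlignedBlock`, `allocate_aligned` and `free_aligned` are names made up for this example and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical pair of pointers: `raw` is what malloc returned and is the one
// that must be freed; `aligned` is where the caller may store data.
struct AlignedBlock
{
    void* raw;
    void* aligned;
};

// Allocates `bytes` usable bytes whose start is a multiple of `alignment`
// (a power of two). Both pointers are null if malloc fails.
inline AlignedBlock allocate_aligned(std::size_t bytes, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t space = bytes + alignment;  // slack so an aligned start always fits
    void* raw = std::malloc(space);
    if (!raw)
        return {nullptr, nullptr};
    void* cursor = raw;
    void* aligned = std::align(alignment, bytes, cursor, space);
    return {raw, aligned};
}

inline void free_aligned(AlignedBlock block)
{
    std::free(block.raw);  // never free the aligned pointer itself
}
```

The cost of this approach is up to `alignment` wasted bytes per allocation plus having to carry the raw pointer around, which is roughly the trade-off a platform-specific aligned allocator hides from you.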

Quote:
Original post by thatguyfromthething
It's kind of crappy implementation, but the alternatives have issues too.


Let's be realistic now. 99.5% of active programmers today do not know what memory alignment is.

90% of programmers do not know what memory management is, and have never used anything but memory managed languages.

Disclaimer: Numbers pulled out of my ass.


The fact that default alignment does not fit certain incredibly niche requirements of some niche API is not really a surprise. All tutorials about SIMD make this point very clear. C++ does not have a concept of SSE, so expecting a general purpose language to meet some platform-specific demands of a potentially niche platform (some don't use x86) is not all that productive.

The point is, C and C++ allow such platform-specific quirks to be implemented, sometimes completely transparently. Conversely, most other languages do not even give that option.

But I honestly do not see a big deal here. Anyone who needs and understands the performance requirements will need to tweak things anyway. For everyone else, it simply doesn't matter.

Ruby is enough for most people. It introduced a renaissance to programming that wasn't seen since BASIC and home computing. And it's AST interpreted.

IDEs and programming tools are being given away for free. As such, demanding more is impractical, since these compilers target a large number of different types of users, and need to retain backwards compatibility.

But when using the Intel compiler, one could demand and expect more, and likely receive that much as well.

For non-SIMD code, when I last tested some 2 years ago, the difference was not worth writing home about. For SIMD code - it obviously needs to be aligned, which is what the special malloc is there for.

Quote:
Original post by ApochPiQ
As far as I am aware, there is no particular requirement on malloc to produce any given alignment.

From the C standard, referring to calloc(), malloc() and realloc():
Quote:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly freed or reallocated).
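In C++ terms, "suitably aligned for any type of object" is usually read as aligned for std::max_align_t. A small sketch for checking what a given implementation really hands back follows; `address_alignment` and `malloc_is_fundamentally_aligned` are names invented here, and the check only tells you about the one allocation it makes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: the largest power of two that divides the address,
// i.e. the strongest alignment the pointer happens to satisfy.
inline std::size_t address_alignment(const void* p)
{
    assert(p != nullptr);
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>(a & (~a + 1));  // isolates the lowest set bit
}

// Hypothetical helper: allocates `bytes` with malloc and reports whether the
// result is aligned at least as strictly as std::max_align_t requires.
inline bool malloc_is_fundamentally_aligned(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (!p)
        return false;
    const bool ok = address_alignment(p) >= alignof(std::max_align_t);
    std::free(p);
    return ok;
}
```

Applied to the two addresses posted in this thread, 0x00624E10 works out to 16-byte alignment and 0x00571068 to 8-byte alignment.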


@thatguyfromthething
You're guilty of something called overgeneralization. You take something that's true in a narrow context and then claim it's true in a broader context where it isn't true. For instance, processors using the x86 architecture are forgiving of unaligned memory access, but you overgeneralized that to all Intel processors. You take the fact that some compilers implement new on top of malloc() to claim that this behavior is part of the C++ standard whereas the standard explicitly makes it clear that new is not required to be built on malloc(). And, of course, you take the fact that MSVC generates unaligned data in one instance to claim "For visual C++, you are pretty much guaranteed on intel to have your memory misaligned". It's not clear to me if you are a sloppy thinker or just a sloppy communicator, but either way it renders most of your statements unreliable.

Quote:
Original post by thatguyfromthething
malloc (for vc++ 2008 anyway, and most others) always returns an aligned to 16 (or is it 8?) value. It has allocation size prepended but that is invisible to you even if you overload operator new[].

operator new[] is on top of malloc.

So, after prepending (you can't avoid this), the return value is always whatever malloc returned offset by sizeof(int).

That means you are never starting on a 16-byte aligned boundary for an array. Just make a couple of arrays with items sized 16 bytes, then use a SIMD instruction. It crashes every time unless you use a compiler directive to align them properly.



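#include <cstddef>
#include <iostream>

struct sixteen_bytes { float a; float b; float c; float d; };

int main(int argc, char** argv)
{
    sixteen_bytes* array = new sixteen_bytes[100];

    // Cast to size_t (pointer-sized on these targets) rather than int,
    // so the address is not truncated on 64-bit builds.
    std::cout
        << "Size: " << sizeof(sixteen_bytes) << "\n"
        << "Addr: 0x" << std::hex << array << "\n"
        << "Mod 16: " << std::dec << (reinterpret_cast<std::size_t>(array) % 16) << std::endl;

    delete[] array;
}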

Size: 16
Addr: 0x00624E10
Mod 16: 0

VC++ 2008

Quote:
Original post by Antheus
Quote:
Original post by thatguyfromthething
It's kind of crappy implementation, but the alternatives have issues too.


Let's be realistic now. 99.5% of active programmers today do not know what memory alignment is.

90% of programmers do not know what memory management is, and have never used anything but memory managed languages.

Disclaimer: Numbers pulled out of my ass.


The fact that default alignment does not fit certain incredibly niche requirements of some niche API is not really a surprise. All tutorials about SIMD make this point very clear. C++ does not have a concept of SSE, so expecting a general purpose language to meet some platform-specific demands of a potentially niche platform (some don't use x86) is not all that productive.

The point is, C and C++ allow such platform-specific quirks to be implemented, sometimes completely transparently. Conversely, most other languages do not even give that option.

But I honestly do not see a big deal here. Anyone who needs and understands the performance requirements will need to tweak things anyway. For everyone else, it simply doesn't matter.

Ruby is enough for most people. It introduced a renaissance to programming that wasn't seen since BASIC and home computing. And it's AST interpreted.

IDEs and programming tools are being given away for free. As such, demanding more is impractical, since these compilers target a large number of different types of users, and need to retain backwards compatibility.

But when using the Intel compiler, one could demand and expect more, and likely receive that much as well.

For non-SIMD code, when I last tested some 2 years ago, the difference was not worth writing home about. For SIMD code - it obviously needs to be aligned, which is what the special malloc is there for.


Those numbers are probably about right. Or more likely sort of know about it but have never had reason to care.

I wasn't lambasting vc++ or intel, just pointing out how it worked was like the OP originally thought, and I think it's very interesting how it does work.

It seems to me that if every single thing is aligned you get a good performance boost. If even occasional things are misaligned it seems to send you into shuffle hell, so it's kind of all or nothing, but getting every single thing aligned is torture.


Quote:
Original post by thatguyfromthething
It seems to me that if every single thing is aligned you get a good performance boost. If even occasional things are misaligned it seems to send you into shuffle hell, so it's kind of all or nothing, but getting every single thing aligned is torture.


There is no business case/market for such a change at compiler level (MS changing their operator new implementation, the way arrays are allocated, the way structures are padded, ...).

- Mission critical (defense, infrastructure, nuclear, ...) - changing default new would require audit of all existing code, potentially some that is 20 years old, partly in C. Many layers are built, some very hackish, some device-specific. Cost of such audit is prohibitive (change specification, process, testing, QA, deployment). We're literally talking billions. There is no gain - the code that currently runs is written to performance specs. Improving performance would have to *not* matter, or it might break the requirements (function X takes 3-9ms to execute, function y depends on x not taking less than 3 ms).

- Gamedevs and similar. They already use custom allocators, or third-party libraries (physics) which use custom containers and allocators as well. They know why, how and where this works, and one cannot sell generic new to them, since it would, by definition of generic, be worse, even if aligned.

- Business apps. Legacy and poor quality galore. Libraries with no sources anymore, being byte-patched. Third-party proprietary library vendors that went out of business a decade ago. Opaque systems. Lowest bidder development culture. Process-induced friction.

- Hobbyists. The "stick it to the man" mentality. If it's default, they won't use it. And they will improve on anything generic if they feel like it.

- Academia. Climategate and similar disasters. Ego-driven development that ironically defies scientific method scrutiny, or lowest bidder style development (aka interns/students). Paper and conference is where it begins and ends. Code doesn't even need to compile, that is intellectually inferior task.


So that leaves us with... There literally is nobody that could benefit from such a drastic change to a core operator, while everyone above has something, potentially a lot, to lose.

There simply is no viable business case (not money, but predominantly beneficial outcome) where one could benefit from such language-level improvement.

Quote:
Original post by thatguyfromthething
just pointing out how it worked was like the OP originally thought

Except that it doesn't work like that. See the posts before yours in this thread as to why that's the case.

Quote:
Original post by SiCrane
Quote:
Original post by thatguyfromthething
just pointing out how it worked was like the OP originally thought

Except that it doesn't work like that. See the posts before yours in this thread as to why that's the case.


Yes, I saw it. I ignored it because it's meaningless. Of course an aligned struct is going to line up, the compiler specifically aligns it. Which does not go against anything I said.

You are trying way too hard, dude. You came off making some strong, very wrong statements in a very negative manner, then turn around and start nitpicking instead (mostly wrong).

Then you continue to stir things up even though I ignored your last post. This is a subject you don't know as much about as I do and coming out and saying 'you are guilty of...blah blah blah' when you ought to be humbly apologizing is not a great way to carry yourself.

Now if you want to continue on, go ahead, but seriously if I ever thought I'd get so much of a flamefest from my statements I'd not have bothered.

So you honestly believe that reading the second character in an array of chars on architectures that have memory alignment exceptions will cause the program to crash? You don't think the small fact that any software at all runs on these platforms is a counter example?

Quote:
Original post by thatguyfromthething
Then you continue to stir things up even though I ignored your last post. This is a subject you don't know as much about as I do and coming out and saying 'you are guilty of...blah blah blah' when you ought to be humbly apologizing is not a great way to carry yourself.


Sorry, I think you are the one that doesn't know what he is talking about. The fact that a particular allocator prepends the block size just before the pointer it returns doesn't mean anything about the alignment of such pointer. How about you run some simple test and show us that indeed you get 4+something_round? I believe you'll find out that you were wrong.

I don't know of any architecture where you cannot access individual bytes anywhere. Almost by definition, a byte is the smallest size you can get a pointer to; at least that's the way I think about them, and I still haven't found a situation where this rule doesn't work. There are machines where 4-byte types have to be aligned to 4-byte boundaries and 8-byte types have to be aligned to 8-byte boundaries (e.g., Sparc), but I don't think there are any where you can't read whatever byte you want.
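This also answers the original question directly: padding lives inside an element (it is counted in sizeof), never between elements, and every type's size is a multiple of its alignment. A small sketch that checks this on whatever compiler you build it with; `Padded`, `element_stride` and `check_contiguous` are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// A char can live at any address, so arr[1] one byte past arr[0] is fine.
static_assert(alignof(char) == 1, "char has no alignment requirement");

// Hypothetical struct that typically gets tail padding after `c`.
struct Padded { int i; char c; };

// sizeof always includes the padding needed to keep the next element aligned.
static_assert(sizeof(Padded) % alignof(Padded) == 0,
              "size is a multiple of alignment");

// Distance in bytes between two consecutive elements of an array of T.
template <typename T>
std::size_t element_stride()
{
    T arr[2] = {};
    const char* first = reinterpret_cast<const char*>(&arr[0]);
    const char* second = reinterpret_cast<const char*>(&arr[1]);
    return static_cast<std::size_t>(second - first);
}

// Arrays are contiguous: the stride is exactly sizeof(T), nothing more.
template <typename T>
void check_contiguous()
{
    assert(element_stride<T>() == sizeof(T));
}
```

So for `char`, the stride is 1 and the alignment is 1, which is why reading `arr[1]` is always a legal, aligned access.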

You are spreading a lot of misinformation and should be more careful about your statements. The issue at hand is confusing enough without your help.

Quote:
Original post by thatguyfromthething
Then you continue to stir things up even though I ignored your last post. This is a subject you don't know as much about as I do and coming out and saying 'you are guilty of...blah blah blah' when you ought to be humbly apologizing is not a great way to carry yourself.


Considering that one of your statements re:memory alignment has been proven wrong above I think you have that backwards.

While I know that a mod has contacted you about your conduct in this thread I am also going to say here and now that any further posting in this thread on the subject without demonstrable proof (be it in code form or a 3rd party link to back up your claims) is going to lead to your posts being deleted and a 2 day suspension carried out.

Quote:
Original post by thatguyfromthething
Yes, I saw it. I ignored it because it's meaningless. Of course an aligned struct is going to line up, the compiler specifically aligns it. Which does not go against anything I said.


Does the following go against what you said? Or is this an error in my understanding of what you said?

Quote:
Original post by thatguyfromthething
malloc (for vc++ 2008 anyway, and most others) always returns an aligned to 16 (or is it 8?) value. It has allocation size prepended but that is invisible to you even if you overload operator new[].

operator new[] is on top of malloc.

So, after prepending (you can't avoid this), the return value is always whatever malloc returned offset by sizeof(int).

That means you are never starting on a 16-byte aligned boundary for an array.



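#include <cstddef>
#include <iostream>

struct sixteen_bytes { float a; float b; float c; float d; };

int main(int argc, char** argv)
{
    sixteen_bytes* array = new sixteen_bytes[100];

    // Cast to size_t (pointer-sized on these targets) rather than int,
    // so the address is not truncated on 64-bit builds.
    std::cout
        << "Size: " << sizeof(sixteen_bytes) << "\n"
        << "Addr: 0x" << std::hex << array << "\n"
        << "Mod 16: " << std::dec << (reinterpret_cast<std::size_t>(array) % 16) << std::endl;

    delete[] array;
}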

Size: 16
Addr: 0x00624E10
Mod 16: 0

VC++ 2008


Quote:
Original post by thatguyfromthething
Just make a couple of arrays with items sized 16 bytes, then use a SIMD instruction. It crashes every time unless you use a compiler directive to align them properly.


This is exactly what I did above. I can augment my code to use SIMD instructions if you like, but I think it should be sufficient to demonstrate that the pointer is aligned on a 16-byte boundary. Did I misread your requirements somehow? I'm being serious. You said that nothing went against what you said, so I feel like maybe I just didn't understand what you said. It really looks to me like you said that in VC++, when allocating an array of values whose size is 16 bytes, the returned pointer will never be 16-byte aligned. Please elaborate on how my example above does not conflict with this.

[Edited by - cache_hit on March 2, 2010 8:40:37 PM]


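#include <cstddef>
#include <cstdlib>
#include <iostream>

struct sixteen_bytes { float a; float b; float c; float d; };

int main(int argc, char** argv)
{
    for (int i = 0; i < 1000; i++) {
        // Not freed, so every iteration gets a fresh block.
        sixteen_bytes* array = new sixteen_bytes[std::rand() % 4096 + 1];

        // Cast to size_t (pointer-sized on these targets) rather than int,
        // so the address is not truncated on 64-bit builds.
        std::size_t m = reinterpret_cast<std::size_t>(array) % 16;
        if (m) {
            // Report the first allocation that is not 16-byte aligned.
            std::cout
                << "i: " << i << "\n"
                << "Size: " << sizeof(sixteen_bytes) << "\n"
                << "Addr: 0x" << std::hex << array << "\n"
                << "Mod 16: " << std::dec << m << std::endl;
            return 0;
        }
    }
}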

Fails for 32-bit build.
Quote:
i: 1
Size: 16
Addr: 0x00571068
Mod 16: 8

It doesn't fail for 64-bit.

It's hard to determine who is right, though, since the details of how an array is allocated are implementation-specific, and which alignment one should expect is also a matter of debate depending on whether SIMD is used or not.

From observations above it would appear that in 32 bit test application all allocations were on 8-byte boundary, while in 64-bit application they were 16-byte aligned (or 32, I didn't test).


Edit: Oh joy.
Quote:
malloc is required to return memory on a 16-byte boundary.

Now either 32-bit version is broken, or documentation is incorrect, or malloc isn't used by new, or arrays are allocated differently, ....
