memory alignment in c - atrocious?

I was curious about C structures and how they are laid out in memory. I wrote the following program to show me a few things about structures.


#include <stdlib.h>
#include <stdio.h>

int type_banana = 1;
int type_orange = 2;
int type_apple = 3;

struct object {
	int type;
	char* size;
};
typedef struct object object;

struct banana {
	int type;
	char* size;
	float softness;
	int length;
};
typedef struct banana banana;

struct orange {
	int type;
	char* size;
	unsigned char peeled;
};
typedef struct orange orange;

struct apple {
	int type;
	char* size;
	short weight;
	char* color;
};
typedef struct apple apple;

banana* make_banana(char* size, float softness, int length) {
	banana* b = (banana*)malloc(sizeof(banana));
	b->type = type_banana;
	b->size = size;
	b->softness = softness;
	b->length = length;
	return b;
}

orange* make_orange(char* size, unsigned char peeled) {
	orange* o = (orange*)malloc(sizeof(orange));
	o->type = type_orange;
	o->size = size;
	o->peeled = peeled;
	return o;
}

apple* make_apple(char* size, short weight, char* color) {
	apple* a = (apple*)malloc(sizeof(apple));
	a->type = type_apple;
	a->size = size;
	a->weight = weight;
	a->color = color;
	return a;
}

int main(int argc, char** argv) {
	apple* a = make_apple("small", 2, "red");
	banana* b = make_banana("big", 1.5, 5);

	/* Both structs share the same leading members, so a pointer to
	   either can be treated as a pointer to the common object header. */
	object* generic1 = (object*)a;
	object* generic2 = (object*)b;

	printf("generic1 size: %s\n", generic1->size);
	printf("generic2 size: %s\n", generic2->size);

	printf("\nLocations:\n");
	printf("apple:         %p\n", (void*)a);
	printf("apple->type:   %p\n", (void*)&a->type);
	printf("apple->size:   %p\n", (void*)&a->size);
	printf("apple->weight: %p\n", (void*)&a->weight);
	printf("apple->color:  %p\n", (void*)&a->color);

	printf("\nSizes:\n");
	printf("apple:         %zu\n", sizeof(apple));
	printf("apple->type:   %td\n", (char*)&a->size - (char*)&a->type);
	printf("apple->size:   %td\n", (char*)&a->weight - (char*)&a->size);
	printf("apple->weight: %td\n", (char*)&a->color - (char*)&a->weight);
	printf("apple->color:  %td\n", ((char*)a + sizeof(apple)) - (char*)&a->color);

	free(a);
	free(b);
	return 0;
}

The output is:

generic1 size: small
generic2 size: big

Locations:
apple:         0x100120
apple->type:   0x100120
apple->size:   0x100124
apple->weight: 0x100128
apple->color:  0x10012c

Sizes:
apple:         16
apple->type:   4
apple->size:   4
apple->weight: 4
apple->color:  4

I compiled this with Apple's gcc 4.0.1 without any options. Obviously the memory is 4-byte aligned. Is memory alignment something that can be depended on, or is it something so horrific between vendors/compilers/operating systems that you should never make any assumptions about it? If I made assumptions about memory alignment, I would have to ensure that all binaries (such as loadable libraries) were compiled the same way. Seems pretty scary to me - but there are some definite optimizations that could be made. I'm just curious to hear from anyone who has already felt this out.
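For what it's worth, member offsets can also be checked portably with offsetof from <stddef.h> instead of subtracting pointers by hand; a minimal sketch against the apple struct above:


#include <stddef.h>
#include <stdio.h>

struct apple {
	int type;
	char* size;
	short weight;
	char* color;
};

int main(void) {
	/* offsetof reports each member's byte offset, padding included. */
	printf("type:   %zu\n", offsetof(struct apple, type));
	printf("size:   %zu\n", offsetof(struct apple, size));
	printf("weight: %zu\n", offsetof(struct apple, weight));
	printf("color:  %zu\n", offsetof(struct apple, color));
	printf("total:  %zu\n", sizeof(struct apple));
	return 0;
}
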
You can't make any assumptions about padding afaik. It's a pure performance optimization. All compilers provide means of controlling the padding, though. MSVC uses #pragma pack; GCC supports #pragma pack or __attribute__((packed)) if I remember correctly.
As far as I'm aware, there's no guarantee about alignment, but compilers will usually align members on 4-byte boundaries in an x86 build (I haven't checked an x64 build, but ISTR that the default alignment is 8 bytes).
Generally, you'll have all of your modules compiled with the same compiler, so this isn't much of a concern. If you want proper safety, you can turn on packing for the structs you need (in MSVC that's #pragma pack(1); I believe GCC uses __attribute__((packed))).
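For instance, a rough sketch of both forms, using the orange struct from the post above (exact spellings vary between compiler versions, so check your compiler's docs):


/* MSVC-style packing (modern GCC accepts this form too): */
#pragma pack(push, 1)
struct packed_orange {
	int type;
	char* size;
	unsigned char peeled;   /* sizeof(struct packed_orange) == 9 on a
	                           32-bit target: no trailing padding */
};
#pragma pack(pop)

/* GCC's attribute syntax for the same thing: */
struct packed_orange_gcc {
	int type;
	char* size;
	unsigned char peeled;
} __attribute__((packed));
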
Quote:Original post by Evil Steve
Generally, you'll have all of your modules compiled with the same compiler, so this isn't much of a concern. If you want proper safety, you can turn on packing for the structs you need (in MSVC that's #pragma pack(1); I believe GCC uses __attribute__((packed))).


I'm glad to hear that it's not completely unreliable. Thanks.

I did some research on the reasons for memory alignment. The Wikipedia article seems like a good summary:

http://en.wikipedia.org/wiki/Memory_alignment

Essentially, a CPU which requires a 4-byte boundary is assuming that 4 bytes will hold the largest primitive datum, which would be your standard integer? Then all pages/caches/etc. deal with memory in 4-byte chunks, ensuring no datum is split across pages or memory boundaries.

Does that mean using `long long' variables (meaning 8-byte integers) is dramatically slower than 4-byte integers (on 32-bit architectures)?

I'm also curious about one of the statements in the above article:

"SSE2 instructions on x86 and x64 CPUs do require the data to be 128-bit (16-byte) aligned and there can be substantial performance advantages from using aligned data on these architectures."

16 bytes? Really?

(edit: clarifications)
Quote:Original post by okonomiyaki
"SSE2 instructions on x86 and x64 CPUs do require the data to be 128-bit (16-byte) aligned and there can be substantial performance advantages from using aligned data on these architectures."

16 bytes? Really?
Ja. SSE2 and Altivec both use 128-bit registers internally, meaning that you can operate on 4 floats or 2 doubles simultaneously.
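A minimal sketch of what that looks like with the SSE intrinsics (both GCC and MSVC ship <xmmintrin.h>; the alignment attribute below is GCC's spelling, MSVC uses __declspec(align(16))):


#include <xmmintrin.h>

/* 16-byte aligned arrays, as required by the aligned load/store forms. */
static float a[4] __attribute__((aligned(16))) = { 1, 2, 3, 4 };
static float b[4] __attribute__((aligned(16))) = { 5, 6, 7, 8 };
static float r[4] __attribute__((aligned(16)));

void add4(void) {
	__m128 va = _mm_load_ps(a);              /* movaps: faults if unaligned */
	__m128 vb = _mm_load_ps(b);
	_mm_store_ps(r, _mm_add_ps(va, vb));     /* four float adds at once */
}
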

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Adding more addressing modes and formats means that the individual instructions get more complicated (and therefore slower). Since SSE is focused primarily on performance, it makes sense to support only the fastest alignment/addressing method possible.
Quote:Original post by okonomiyaki
Essentially, a CPU which requires a 4-byte boundary is assuming that 4 bytes will hold the largest primitive datum, which would be your standard integer? Then all pages/caches/etc. deal with memory in 4-byte chunks, ensuring no datum is split across pages or memory boundaries.

Pages and caches don't really have anything to do with this. They have different alignment requirements. The x86 page size is 4 KB, and pages are always aligned to 4 KB boundaries in physical memory. Caches tend to be aligned to boundaries equal to the size of a cache line.

Some CPUs, such as ARM, do require that 32-bit entities are aligned to 4-byte boundaries. Unlike x86, where unaligned dword accesses result in multiple memory accesses, ARM simply cannot access unaligned 32-bit data directly; there are no such instructions. Of course, you can always read 8 bits at a time and shift-and-OR them into a 32-bit value. BTW, on ARM it's faster to copy memory between buffers that are aligned the same way for this reason. E.g. it takes more time to copy between buffers that start at addresses 0xBEEF0001 and 0xDEAD0002 than 0xBEEF0001 and 0xDEAD0001 (probably the same is true of x86 though).
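That shift-and-OR fallback looks roughly like this (a sketch, assuming little-endian byte order):


#include <stdint.h>

/* Read a 32-bit value from a possibly unaligned address, one byte at a
   time; works even on CPUs with no unaligned-load instructions. */
uint32_t read_u32_le(const unsigned char* p) {
	return (uint32_t)p[0]
	     | ((uint32_t)p[1] << 8)
	     | ((uint32_t)p[2] << 16)
	     | ((uint32_t)p[3] << 24);
}
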

Quote:
Does that mean using `long long' variables (meaning 8-byte integers) is dramatically slower than 4-byte integers (on 32-bit architectures)?

Yes, but not because of alignment issues.
There are no instructions for handling 64-bit integers in a single operation on 32-bit architectures (or at least not on x86 and ARM). For the CPU core, it wouldn't make any difference if the two 32-bit words weren't even adjacent in memory. (Caching and paging are another matter.)
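For example (a sketch; on a 32-bit x86 target a compiler typically turns this into an add/adc pair, i.e. two dependent 32-bit additions):


#include <stdint.h>

uint64_t add64(uint64_t x, uint64_t y) {
	/* No single 32-bit instruction covers all 64 bits, so the compiler
	   adds the low words, then the high words plus the carry. */
	return x + y;
}
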

Quote:
16 bytes? Really?

I can't remember for sure and didn't bother checking Intel's docs, but I believe so.
Quote:Since SSE is focused primarily on performance it makes sense to only support the fastest alignment/addressing method possible.

Note: there are two SSE instructions that exist solely for loading unaligned data. Depending on the microarchitecture, they may or may not carry a penalty relative to the normal (aligned) move instruction when the operand turns out to be aligned, and they definitely do when it is unaligned.
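A sketch of the unaligned form (the aligned _mm_load_ps would fault here if p weren't 16-byte aligned):


#include <xmmintrin.h>

/* Load four floats from an address with no alignment guarantee.
   _mm_loadu_ps compiles to movups, the unaligned move. */
__m128 load_any_alignment(const float* p) {
	return _mm_loadu_ps(p);
}
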

Quote:Pages and caches don't really have anything to do with this. They have different alignment requirements. X86 page size is 4k and they are always aligned with 4k boundaries in the physical memory. Caches tend to be aligned to boundaries equal to the size of cache lines.

Pages and caches are relevant here because it is much more expensive to additionally cross a page boundary than just a cache line boundary (which 'only' seems to cause an L1d miss). Cache lines are indeed by definition (rather than "tend to be") aligned to their size, due to the way they are addressed.
