Memory alignment question

5 comments, last by Bregma 12 years, 10 months ago
http://www.ibm.com/developerworks/library/pa-dalign/

Can anyone please explain why, according to this article, an x86 processor can't simply read 4 bytes from an odd address, and instead has to make two reads and then shift the bytes to assemble the correct value? I don't see why a processor can't read from an odd address just fine.
The designers didn't allow it because requiring alignment is simpler and probably faster. The fact that x86 allows misaligned access at all is unusual; most architectures disallow it.
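A software model of the "two reads and a shift" that the article describes may help. This is a sketch, not how any particular CPU is wired: it assumes memory is exposed as an array of aligned 32-bit words in little-endian byte order (as on x86), and `read_u32_via_aligned_words` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: memory is an array of aligned 32-bit words, little-endian,
   so byte k of a word is (word >> (8 * k)) & 0xFF. Only aligned word reads exist. */
static uint32_t read_u32_via_aligned_words(const uint32_t *mem, size_t addr)
{
    size_t word = addr / 4;                  /* aligned word holding the first byte */
    unsigned offset = (unsigned)(addr % 4);  /* byte offset inside that word */

    if (offset == 0)
        return mem[word];                    /* aligned: a single read is enough */

    uint32_t lo = mem[word];                 /* first aligned read */
    uint32_t hi = mem[word + 1];             /* second aligned read */
    unsigned shift = 8u * offset;            /* 8, 16 or 24 here, so no shift by 32 */

    /* Drop the bytes below addr from lo, then fill the top bytes from the low bytes of hi. */
    return (lo >> shift) | (hi << (32u - shift));
}
```

An aligned address costs one word read; any other offset costs two reads plus the shifts and the OR, which is the extra work the hardware has to do on your behalf.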

[quote name='nuclear123' timestamp='1306698262' post='4817216']
Can anyone please explain why, according to this article, an x86 processor can't simply read 4 bytes from an odd address, and instead has to make two reads and then shift the bytes to assemble the correct value? I don't see why a processor can't read from an odd address just fine.
[/quote]



This is a hardware design choice. The simple choice is to require all memory accesses to be aligned, and almost all CPUs out there do exactly that. Dropping the requirement takes significantly more effort in the CPU design, produces more complex and slower processors, and adds unnecessary processing. One reason many Computer Science programs require a course on computer engineering or architecture is to give you enough background to understand these decisions.


The early x86 family is one of the few processor families that didn't require alignment when loading and storing integers, and later floats. Instead, the engineers allowed misaligned access with a performance penalty. Many people believe it was a mistake to do so (I'm among them).

The new data types introduced since the early '90s do require proper alignment. If you attempt to load any of the SIMD types (for MMX, SSE, SSE2, SSE3, ...) from an improperly aligned address, the application will simply crash. Certain blocks of data transferred across newer hardware must also be properly aligned, or the transfer will crash the application too.
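On the C side, the standard C11 tools for getting the alignment SIMD types want are `alignas`/`alignof` from `<stdalign.h>` and `aligned_alloc` from `<stdlib.h>`. The helper names below (`is_aligned`, `vec4f`, `alloc_floats16`) are hypothetical, and the pointer-to-integer check assumes a flat address space, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of align; align must be a power of two. */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Four floats (one SSE register's worth) forced onto a 16-byte boundary. */
typedef struct {
    alignas(16) float v[4];
} vec4f;

/* Heap buffer of `count` floats aligned to 16 bytes. The byte size is rounded up
   to a multiple of 16 because C11 aligned_alloc expects size % alignment == 0. */
static float *alloc_floats16(size_t count)
{
    size_t bytes = (count * sizeof(float) + 15) & ~(size_t)15;
    return aligned_alloc(16, bytes);
}
```

Memory from `aligned_alloc` is released with plain `free`.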

[quote name='frob' timestamp='1306729743' post='4817368']
The early x86 family is one of the few processor families that didn't require alignment when loading and storing integers, and later floats. Instead, the engineers allowed misaligned access with a performance penalty. Many people believe it was a mistake to do so (I'm among them).
[/quote]


Why do you think it was a mistake? If you want better performance, you have the choice to align your data. Or does this allowance also introduce performance penalties even for aligned memory, compared to an architecture that didn't allow it?

[quote name='frob' timestamp='1306729743' post='4817368']
The new data types introduced since the early '90s do require proper alignment. If you attempt to load any of the SIMD types (for MMX, SSE, SSE2, SSE3, ...) from an improperly aligned address, the application will simply crash. Certain blocks of data transferred across newer hardware must also be properly aligned, or the transfer will crash the application too.
[/quote]

Actually, you CAN do unaligned loads and stores with SSE registers, but you have to choose the correct instruction. The unaligned versions are significantly slower on CPUs older than the Core i7, though; on the Core i7 this penalty has been largely eliminated.
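In intrinsic terms, `_mm_load_ps` corresponds to the aligned `movaps` instruction, which faults on an address that is not 16-byte aligned, while `_mm_loadu_ps` and `_mm_storeu_ps` correspond to `movups` and accept any address (`movdqa`/`movdqu` are the integer counterparts). A minimal sketch, assuming an x86 compiler that provides `<xmmintrin.h>`; the function name `add4_unaligned` is hypothetical, and a scalar loop stands in on other targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* dst[i] = a[i] + b[i] for i in 0..3; none of the pointers need 16-byte alignment. */
static void add4_unaligned(float *dst, const float *a, const float *b)
{
#if HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             /* movups: any address is fine */
    __m128 vb = _mm_loadu_ps(b);             /* _mm_load_ps here could fault if b is misaligned */
    _mm_storeu_ps(dst, _mm_add_ps(va, vb));  /* unaligned store back out */
#else
    for (int i = 0; i < 4; ++i)              /* scalar fallback for non-SSE targets */
        dst[i] = a[i] + b[i];
#endif
}
```

Choosing the `u` variant only when the data may be misaligned keeps the aligned path available for buffers you allocated yourself.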

@ hick18: When a CPU executes a read instruction, it actually fetches a whole cache line's worth of data. For example, if you want 4 bytes starting at address 5 and your cache has 64-byte lines, the CPU reads the first 64 bytes of memory (bytes 0-63) into a cache line, then copies the requested bytes (5-8) into the register. However, if you want bytes 63, 64, 65 and 66 (an unaligned access that crosses a cache-line boundary), the CPU has to fetch 2 cache lines (bytes 0-63 and bytes 64-127) from memory.

If all addresses are 4-byte aligned, the CPU 'knows' that a single read will never need 2 cache lines, so the associated checks and mechanisms can be left out and the hardware is simpler. Checking whether an address is 4-byte aligned (or aligned to any other power of two) is very simple: for 4-byte alignment, the two least significant bits of the address must be zero. So the CPU can just do this check, throw an exception at you and be done with it! Alignment can be handled transparently by the compiler without the programmer having to deal with it, unless one is programming in assembly language. I believe this is the main reason many people (like frob) think allowing unaligned memory accesses is a mistake.
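The two checks from this post can be written down directly. A sketch, assuming power-of-two alignments and line sizes; `is_aligned_addr` and `lines_touched` are illustrative names, and real hardware does this with wiring rather than division.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Alignment check: for a power-of-two alignment, the low bits must be zero.
   For align == 4 this is exactly (addr & 3) == 0. */
static bool is_aligned_addr(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Number of cache lines an access of `size` bytes (size >= 1) at `addr` touches. */
static unsigned lines_touched(uint64_t addr, uint64_t size, uint64_t line)
{
    assert(size != 0 && line != 0 && (line & (line - 1)) == 0);
    uint64_t first = addr / line;             /* line holding the first byte */
    uint64_t last = (addr + size - 1) / line; /* line holding the last byte */
    return (unsigned)(last - first + 1);
}
```

With 64-byte lines, a 4-byte access at address 5 touches one line and one at address 63 touches two, while every 4-byte-aligned 4-byte access stays within a single line.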

[quote name='frob' timestamp='1306729743' post='4817368']
The early x86 family is one of the few processor families that didn't require alignment when loading and storing integers, and later floats. Instead, the engineers allowed misaligned access with a performance penalty. Many people believe it was a mistake to do so (I'm among them).
Why do you think it was a mistake? If you want better performance, you have the choice to align your data. Or does this allowance also introduce performance penalties even for aligned memory, compared to an architecture that didn't allow it?
[/quote]
It makes the processor more complicated for little benefit (the benefit being that it lets programmers be lazy). The cost is that many more transistors are needed in the part of the CPU that handles load instructions. Before a load is executed, it has to be examined to see whether it's aligned (1 load) or unaligned (2 loads + extra work). Then, to optimize this, extra transistors are added to predict in advance which of the two a load will be and preemptively take the right path...

All of those transistors could instead go towards something useful, like another core, rather than cleaning up after sloppy programming (or a sloppy instruction set...)

Also, some other architectures support unaligned loads, but via a separate instruction -- this allows programmers/compilers to perform unaligned loads if they really want to, but keeps the CPU simpler as it doesn't have to spend transistors on figuring out which load is which.
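In portable C, the usual way to ask for an unaligned load explicitly is to `memcpy` the bytes into a local rather than dereference a misaligned `uint32_t *` (which is undefined behaviour). GCC and Clang typically turn this into a single load on targets that allow unaligned access, and into byte loads or the architecture's dedicated unaligned instructions elsewhere. A minimal sketch; `load_u32` and `load_u32_le` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 32-bit load in host byte order: copy the bytes instead of
   dereferencing a possibly misaligned uint32_t pointer. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Explicit little-endian decode; works at any alignment on any host byte order. */
static uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The caller states "this may be unaligned" once, in the helper, and the compiler picks the instruction, which mirrors the separate-instruction approach described above.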


Another reason the x86 "optionally aligned" support is bad is that it makes atomic operations ambiguous. If two cores simultaneously write a 32-bit int to the same memory location, the write is only atomic if that location is properly aligned. If it's unaligned, you can get a race condition where one core writes 2 bytes of the int while the other core writes the other 2 bytes, resulting in a completely corrupt value. In this case it would probably be better for your program to crash, so you at least know you've got a bug.
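In C11 the practical defence is to let the compiler place shared data: in practice, mainstream compilers lay out an `_Atomic` object with whatever alignment atomic operations on it need, so it cannot land at an odd offset the way an `int` written into a raw byte buffer can. This sketch does not reproduce the torn write itself; `struct shared_state`, `publish` and `read_published` are hypothetical names.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct shared_state {
    char flag;               /* if the struct were packed, the next member would sit at offset 1 */
    _Atomic uint32_t value;  /* the compiler pads before this so it stays aligned */
};

/* Documents the intent: the atomic member sits on its required alignment. */
_Static_assert(offsetof(struct shared_state, value) % alignof(_Atomic uint32_t) == 0,
               "atomic member must be aligned");

static void publish(struct shared_state *s, uint32_t v)
{
    atomic_store_explicit(&s->value, v, memory_order_release);
}

static uint32_t read_published(struct shared_state *s)
{
    return atomic_load_explicit(&s->value, memory_order_acquire);
}
```

Standard C makes no atomicity promise at all for a plain `int` at a misaligned address, so keeping shared values in `_Atomic` objects sidesteps the ambiguity described above.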


In console game dev, a lot of different alignments pop up, not just 4 bytes. In many cases 16-byte alignment is required, and in some cases even more restrictive values like 64 or 128 bytes, or even large values like 4 KB. All of these decisions are made to avoid over-complicating the hardware for little benefit.
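The larger alignments mentioned here (16, 64, 128 bytes, 4 KB) are usually obtained either with C11 `aligned_alloc` or by over-allocating and rounding the pointer up, which is roughly what custom allocators do. A sketch of the latter, assuming power-of-two alignments; `align_up` and `alloc_aligned_manual` are hypothetical names, and the caller must free the original `raw` pointer, not the aligned one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round x up to the next multiple of a power-of-two alignment. */
static uintptr_t align_up(uintptr_t x, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Return `size` usable bytes aligned to `align`, carved out of a plain malloc
   block that is over-allocated by align - 1 bytes. *raw receives the pointer
   that must later be passed to free(). */
static void *alloc_aligned_manual(size_t size, size_t align, void **raw)
{
    *raw = malloc(size + align - 1);
    if (*raw == NULL)
        return NULL;
    return (void *)align_up((uintptr_t)*raw, align);
}
```

The rounding skips at most `align - 1` bytes, so the aligned block always fits inside the over-allocation.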

If all physical addresses are guaranteed to be 4-byte aligned, you do not need the two least significant address lines. That saves 2 pins in the packaging, real estate on the various wafers, and drive current. Cheaper, faster and lower power consumption spread over millions of production devices means a better bottom line for the manufacturer and a better experience for the consumer. Win-win, unless a software developer who doesn't really know what he's doing tries to outsmart the system. Natural selection should take care of that.

Stephen M. Webb
Professional Free Software Developer
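The address-line argument can be modelled in a few lines. This is a toy model, not a real bus protocol: it assumes a bus that carries only address bits A31..A2, so a byte address reaches memory as `addr >> 2`. Real designs still need some way to select individual bytes for sub-word accesses, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value driven on a bus that has no A1/A0 lines: the word index. */
static uint32_t bus_word_address(uint32_t byte_addr)
{
    return byte_addr >> 2;
}

/* Byte address the memory side reconstructs from the bus value. */
static uint32_t reconstructed_byte_address(uint32_t word_addr)
{
    return word_addr << 2;
}

/* The round trip is lossless exactly when the two low bits are zero,
   i.e. when the address is 4-byte aligned. */
static bool survives_bus(uint32_t byte_addr)
{
    return reconstructed_byte_address(bus_word_address(byte_addr)) == byte_addr;
}
```

Any address with a nonzero low bit silently maps onto the aligned word below it, which is why such hardware has to either trap misaligned accesses or do extra work around them.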
