Data alignment on ARM processors

6 comments, last by swiftcoder 7 years, 11 months ago

Hello,

I recently ran into a problem with some C code on ARM processors, where data alignment matters. I typically have the following two scenarios:


float value=doLittleBigEndianConversion(((int*)(ucharPtr+offset))[0]);

and


((int*)(ucharPtr+offset))[0]=someIntValue;

Depending on the offset, I get some crashes. So the solution to scenario 1 is (for instance):


float value;
memcpy (&value,ucharPtr+offset,sizeof(float));
value=doLittleBigEndianConversion(value);

This solves the alignment problem. My question is now: how can I fix the scenario 2 code so that it also works on ARM processors?

Thanks for any insight!


On ARM processors, data alignment is important. The processor reads and writes much faster at aligned addresses: a 2-byte value should sit on an even address, and a 4-byte value on an address divisible by 4. Unaligned reads and writes are possible, but they need different instruction sequences that are longer and slower. The compiler has to work out which sequence to use; it generally does a very good job, but it occasionally fails.

These failed cases always contain a reinterpret_cast. In your case this is the C-style (int*) casting. That kind of casting is typically used in loading/saving data, or receiving/sending data through network. In other places it is typically just bad design, and should be avoided.

So I recommend concentrating all the code that uses reinterpret_cast into one or a few classes, and handling the problem there.

And you can create a function for this kind of reading/writing similar to your doLittleBigEndianConversion. Like this:

float value = doLittleBigEndianConversion(doUnalignedReading(reinterpret_cast<int*>(ucharPtr + offset)));

In that function just use the memcpy trick or read the data byte by byte and assemble it with |.
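As a rough sketch of the memcpy approach, covering both the read (scenario 1) and the write (scenario 2): the helper names readUnaligned32 and writeUnaligned32 are made up for illustration and do not come from the original code. The helpers take an unsigned char pointer rather than an int*, so no misaligned int* is ever formed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (names are assumptions, not from the thread). */

/* Read a 32-bit value from an address that may be misaligned. */
static uint32_t readUnaligned32(const unsigned char *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);   /* memcpy has no alignment requirement */
    return v;
}

/* Write a 32-bit value to an address that may be misaligned. */
static void writeUnaligned32(unsigned char *dst, uint32_t v)
{
    memcpy(dst, &v, sizeof v);
}
```

With something like this, scenario 2 could become writeUnaligned32(ucharPtr + offset, (uint32_t)someIntValue); compilers commonly turn such a small fixed-size memcpy into a few load/store instructions rather than an actual call.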

Very few chipsets allow placing values at the wrong alignment.

x86 is one of the few that allow integers to be placed at any alignment: accessing a 4-byte integer at any offset is allowed, but it carries a performance penalty that is not directly visible to you. On most other chipsets that crashes.

Note that with other data types, misaligned data can also cause crashes on x86, for example when trying to load into XMM, YMM, or other SIMD registers.

float value = doLittleBigEndianConversion(doUnalignedReading(reinterpret_cast<int*>(ucharPtr + offset)));

Icky. I disagree, because you are still operating on the wrong type, int*.

Pack and unpack data by creating an object of the correct type, such as an int32, float, double, or whatever, and then filling it with operations that do not rely on alignment, such as memcpy or single-byte accesses.

Often that means a packer and unpacker class that process your stream. Inside it you have a function similar to this:

float unpackFloat(unsigned char* offset) {
    // Other code and static assertions ensure float is four bytes and otherwise that our processor is supported
    float result;
    // Copy one byte at a time; note *(offset+n) reads the n-th byte, whereas *offset+n would add n to the first byte
    *((unsigned char*)(&result)) = *offset;
    *((unsigned char*)(&result)+1) = *(offset+1);
    *((unsigned char*)(&result)+2) = *(offset+2);
    *((unsigned char*)(&result)+3) = *(offset+3);
    return result;
}

Repeat with packing functions and unpacking functions for all types you care about.
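For the write direction, a packing counterpart to unpackFloat might look like the sketch below. The name packFloat is an assumption, the bytes are copied in the host's native order (no endian conversion), and the static assertion stands in for the "other code" mentioned in the comment above.

```c
/* Hypothetical counterpart to unpackFloat; the name is an assumption. */
_Static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

void packFloat(unsigned char *dst, float value)
{
    /* Accessing an object through an unsigned char pointer is permitted,
       and single-byte stores never have an alignment requirement. */
    const unsigned char *src = (const unsigned char *)&value;
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```

A call such as packFloat(ucharPtr + offset, value) then works at any offset.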

There are many serialization libraries that are already written and debugged that do this for you. No need to reinvent the wheel.

Thanks to both of you for the very clear explanations!

*((unsigned char*)(&result)) = *offset;

Watch out for compiler optimisations breaking code like this (called type punning). It breaks the language aliasing rules. See this link:

http://stackoverflow.com/questions/20922609/why-does-optimisation-kill-this-function/20956250#20956250

Except this is irrelevant since the char-types are explicitly allowed to do this.

The best way to solve this problem really depends on the details.

Given what you have posted: since you have to perform big/little endian reordering anyway, you can get the unaligned access handled for free if you use a macro instead of a function, as long as you write the macro to access the data byte by byte.

If the function adapts to the system/data then you need a corresponding no-swap macro that is hard-coded to move 4 bytes (that will be much faster than invoking memcpy for such a small amount of data).
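A minimal sketch of such byte-by-byte macros, assuming the stream holds big-endian 32-bit values; the macro names LOAD_U32_BE and STORE_U32_BE are invented for illustration. Because each byte is accessed individually, the pointer may have any alignment, and the result is the same on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Hypothetical macros (names are assumptions). p must be a pointer to
   unsigned char; it is evaluated several times, so avoid side effects
   such as p++ in the argument. */
#define LOAD_U32_BE(p)                 \
    (((uint32_t)(p)[0] << 24) |        \
     ((uint32_t)(p)[1] << 16) |        \
     ((uint32_t)(p)[2] << 8)  |        \
      (uint32_t)(p)[3])

#define STORE_U32_BE(p, v)                              \
    do {                                                \
        uint32_t v_ = (uint32_t)(v);                    \
        (p)[0] = (unsigned char)((v_ >> 24) & 0xFFu);   \
        (p)[1] = (unsigned char)((v_ >> 16) & 0xFFu);   \
        (p)[2] = (unsigned char)((v_ >> 8) & 0xFFu);    \
        (p)[3] = (unsigned char)(v_ & 0xFFu);           \
    } while (0)
```

Whether this beats a small memcpy in practice depends on the compiler and target, so it is worth measuring rather than assuming.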

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara

I'm curious, why are you performing endian conversions?

Just about every vendor I can think of runs their ARM chips in little-endian mode, and since the end of Apple's PowerPC era, the majority of applications don't even use network byte order either.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
