20 issues of porting C++ code on the 64-bit platform
10. Serialization and data exchange
An important point when porting a software solution to a new platform is continuity of the existing data exchange protocol. The program must read existing project formats, exchange data between 32-bit and 64-bit processes, and so on. Most errors of this kind come from serializing memsize types and using them in data exchange operations.
1) size_t PixelCount;
   fread(&PixelCount, sizeof(PixelCount), 1, inFile);
2) __int32 value_1;
   SSIZE_T value_2;
   inputStream >> value_1 >> value_2;
3) time_t time;
   PackToBuffer(MemoryBuf, &time, sizeof(time));
All of these examples contain errors of two kinds: the use of types of variable size in binary interfaces, and ignoring the byte order.
The use of types of variable size
It is unacceptable to use types whose size depends on the development environment in binary data exchange interfaces. In C++ the built-in types do not have fixed sizes, so they cannot be used directly for this purpose. That is why tool vendors and programmers themselves define data types of exact size, such as __int8, __int16, INT32, word64, etc. Using such types makes data portable between programs on different platforms, although it requires using such unusual types. The three examples above are written carelessly, and this shows up when some data types grow from 32 to 64 bits. Taking into account the need to support old data formats, the correction may look as follows.
1) size_t PixelCount;
   __uint32 tmp;
   fread(&tmp, sizeof(tmp), 1, inFile);
   PixelCount = static_cast<size_t>(tmp);
2) __int32 value_1;
   __int32 value_2;
   inputStream >> value_1 >> value_2;
3) time_t time;
   __uint32 tmp = static_cast<__uint32>(time);
   PackToBuffer(MemoryBuf, &tmp, sizeof(tmp));
But this correction is not necessarily the best one.
When ported to a 64-bit system, the program may process much larger amounts of data, and 32-bit types in the data format may become a serious obstacle. In that case you may keep the old code, with the types corrected, for compatibility with the old data format, and design a new binary format that takes the mistakes described above into account. Another option is to abandon binary formats in favor of a text format or other formats provided by various libraries.
Ignoring the byte order
Even after correcting the type sizes you may face incompatibility of binary formats. The reason is a different data representation, most often a different byte order. The byte order is the way the bytes of multibyte numbers are stored (see also picture 4). Little-endian order means that storage begins with the lowest byte and ends with the highest one. This order is used in the memory of PCs with x86 processors. Big-endian order means that storage begins with the highest byte and ends with the lowest one. This order is the standard for TCP/IP protocols, which is why big-endian byte order is often called network byte order. It is also used by Motorola 68000 and SPARC processors.
Picture 4. Byte order in a 64-bit type on little-endian and big-endian systems.
While developing a binary interface or data format you should keep the byte order in mind. If the 64-bit system to which you are porting a 32-bit application has a different byte order, you have to take this into account in your code. To convert between big-endian and little-endian byte order you may use functions such as htonl(), htons(), bswap_64, etc.
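Both problems of this section can be addressed in one place. The following is a minimal sketch, not the article's code: the helper names store_u32_le, load_u32_le, store_count and load_count are hypothetical, and the choice of a 32-bit little-endian field is an assumption standing in for a legacy format. Writing the value byte by byte fixes both the field size and the byte order, regardless of the host platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write a 32-bit value in little-endian order, one byte
// at a time, so the layout does not depend on the host byte order.
void store_u32_le(unsigned char* out, std::uint32_t value) {
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
}

// Hypothetical helper: read back a value written by store_u32_le.
std::uint32_t load_u32_le(const unsigned char* in) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}

// Store a memsize count in the (assumed) legacy 32-bit field. The assert
// documents the limit of the format instead of truncating silently.
void store_count(unsigned char* out, std::size_t count) {
    assert(count <= UINT32_MAX);
    store_u32_le(out, static_cast<std::uint32_t>(count));
}

// Read the 32-bit field back into a memsize variable.
std::size_t load_count(const unsigned char* in) {
    return static_cast<std::size_t>(load_u32_le(in));
}
```

A new format that must hold more than 4 billion items could use the same pattern with a 64-bit field and eight bytes.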
11. Bit fields
If you use bit fields, keep in mind that using memsize types changes the sizes and alignment of structures. For example, the structure shown below is 4 bytes in size on a 32-bit system and 8 bytes on a 64-bit one.
struct MyStruct {
size_t r : 5;
};
But bit fields need attention for more than that. Let’s look at a subtle example.
struct BitFieldStruct {
unsigned short a:15;
unsigned short b:13;
};
BitFieldStruct obj;
obj.a = 0x4000;
size_t addr = obj.a << 17; //Sign Extension
printf("addr 0x%Ix\n", addr);
//Output on 32-bit system: 0x80000000
//Output on 64-bit system: 0xffffffff80000000
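Note that when the example is compiled for a 64-bit system, a sign extension occurs in the "addr = obj.a << 17;" expression, even though both variables addr and obj.a are unsigned. The sign extension is caused by the type conversion rules, which are applied as follows (see also picture 5). First, the member obj.a is converted from a bit field of unsigned short type to int. Then the "obj.a << 17" expression, which has int type and the value 0x80000000 with the sign bit set, is widened to the 64-bit size_t before it is assigned to addr. Because the int value is negative, the upper 32 bits are filled with ones.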
Picture 5. Expression calculation on different systems.
So be careful when working with bit fields. To avoid the described effect in our example, we can simply convert obj.a explicitly to size_t type.
...
size_t addr = size_t(obj.a) << 17;
printf("addr 0x%Ix\n", addr);
//Output on 32-bit system: 0x80000000
//Output on 64-bit system: 0x80000000
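The effect can be reproduced on any platform by using fixed-width types in place of size_t. The sketch below is an illustration under stated assumptions: int is 32 bits wide, and the conversion of an out-of-range unsigned value to a signed type wraps (guaranteed since C++20, and the behavior of mainstream compilers before that). The function names are hypothetical. The shift is performed on an unsigned value and then reinterpreted as int, which mimics the result of "obj.a << 17" without relying on signed overflow.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

struct BitFieldStruct {
    unsigned short a : 15;
    unsigned short b : 13;
};

// Mimics "size_t addr = obj.a << 17" on a 64-bit target: the shift result
// has a 32-bit signed type, and only afterwards is it widened to 64 bits.
std::uint64_t shift_as_int(const BitFieldStruct& obj) {
    const std::uint32_t bits = static_cast<std::uint32_t>(obj.a) << 17;
    const std::int32_t as_int = static_cast<std::int32_t>(bits);
    return static_cast<std::uint64_t>(as_int);  // sign extension happens here
}

// The corrected form: widen first, then shift.
std::uint64_t shift_as_memsize(const BitFieldStruct& obj) {
    return static_cast<std::uint64_t>(obj.a) << 17;
}
```

With obj.a equal to 0x4000, the first function yields 0xffffffff80000000 and the second yields 0x80000000, matching the outputs quoted above.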
12. Pointer address arithmetic
The first example.
unsigned short a16, b16, c16;
char *pointer;
…
pointer += a16 * b16 * c16;
This example works correctly with pointers only if the value of the "a16 * b16 * c16" expression fits in the int type, that is, does not exceed INT_MAX (2 GB). Such code could always work correctly on a 32-bit platform because the program never allocated arrays that large. On a 64-bit architecture an array may well be larger. Suppose we would like to shift the pointer by 6,000,000,000 bytes, so the variables a16, b16 and c16 have the values 3000, 2000 and 1000 respectively. While calculating the "a16 * b16 * c16" expression, all the variables are converted to int type according to C++ rules, and only then are they multiplied. An overflow occurs during the multiplication. The incorrect result is then extended to ptrdiff_t type, and the pointer is calculated incorrectly. Take care to avoid possible overflows in pointer arithmetic. For this purpose it is better to use memsize types or explicit type conversions in expressions that involve pointers. Using explicit type conversions we can rewrite the code in the following way.
unsigned short a16, b16, c16;
char *pointer;
…
pointer += static_cast<ptrdiff_t>(a16) *
           static_cast<ptrdiff_t>(b16) *
           static_cast<ptrdiff_t>(c16);
If you think that only careless programs working with large data sizes face such troubles, we have to disappoint you. Let’s look at an interesting piece of code that works with an array of only 5 items. The second example works in the 32-bit version and fails in the 64-bit one.
int A = -2;
unsigned B = 1;
int array[5] = { 1, 2, 3, 4, 5 };
int *ptr = array + 3;
ptr = ptr + (A + B); //Invalid pointer value on 64-bit platform
printf("%i\n", *ptr); //Access violation on 64-bit platform
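Let’s follow how the "ptr + (A + B)" expression is calculated. According to C++ rules, the variable A of int type is converted to unsigned type, and the addition of A and B yields the value 0xFFFFFFFF of unsigned type. Then "ptr + 0xFFFFFFFFu" is calculated, and its result depends on the pointer size on the particular architecture. In a 32-bit program the expression is in practice equivalent to "ptr - 1", and the program prints 3.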
In a 64-bit program the value 0xFFFFFFFFu is added to the pointer as is, so the pointer ends up far outside the array, and accessing the item it points to causes trouble. To avoid this situation, as in the first case, we advise you to use only memsize types in pointer arithmetic. Here are two variants of the corrected code:
1) ptr = ptr + (ptrdiff_t(A) + ptrdiff_t(B));
2) ptrdiff_t A = -2;
   size_t B = 1;
   ...
   ptr = ptr + (A + B);
You may object and offer the following variant of the correction:
int A = -2;
int B = 1;
...
ptr = ptr + (A + B);
Yes, this code will work, but it is a poor fix: non-memsize types still take part in pointer arithmetic, so the error returns as soon as one of the variables becomes unsigned again or the offset stops fitting in 32 bits.
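The difference between the mixed-type offset and the memsize offset can be checked without touching any pointer. This is a small sketch under the assumption that unsigned int is 32 bits wide; std::int64_t stands in for a 64-bit ptrdiff_t, and the function names offset_mixed and offset_memsize are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(unsigned) == 4, "sketch assumes a 32-bit unsigned int");

// Offset computed as in the faulty code: int + unsigned is evaluated in
// unsigned, so -2 + 1 becomes 0xFFFFFFFF before it is widened to 64 bits.
std::int64_t offset_mixed(int a, unsigned b) {
    return static_cast<std::int64_t>(a + b);
}

// Offset computed with both operands widened first: -2 + 1 gives -1.
std::int64_t offset_memsize(int a, unsigned b) {
    return static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
}
```

For A = -2 and B = 1 the first function returns 4294967295 and the second returns -1, which is the offset the programmer intended.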
13. Arrays indexing
This kind of error is treated separately only for better structure, because indexing an array with square brackets is just another notation for the address arithmetic discussed above. In C and later in C++ it became common practice to use variables of int/unsigned types in constructions of the following kind:
unsigned Index = 0;
while (MyBigNumberField[Index] != id)
  Index++;
But time passes and everything changes. Now it is high time to say: do not do this anymore! Use memsize types for indexing (large) arrays. In a 64-bit program the code above cannot process an array containing more than UINT_MAX items. After the item with index UINT_MAX is accessed, the variable Index wraps around to zero and we may get an infinite loop. To convince you completely of the need to use only memsize types for indexing and in address arithmetic expressions, here is one last example.
class Region {
float *array;
int Width, Height, Depth;
float GetCell(int x, int y, int z) const;
...
};
float Region::GetCell(int x, int y, int z) const {
return array[x + y * Width + z * Width * Height];
}
The given code is taken from a real mathematical simulation program in which the amount of RAM is an important resource, and the ability to use more than 4 GB of memory on a 64-bit architecture greatly improves the calculation speed. In programs of this class, one-dimensional arrays are often used to save memory while being treated as three-dimensional arrays. For this purpose there are functions like GetCell that provide access to the necessary items. But this code works correctly only with arrays containing fewer than INT_MAX items. The reason is the use of the 32-bit int type to calculate the item index.
Programmers often make a mistake trying to correct the code in the following way:
float Region::GetCell(int x, int y, int z) const {
return array[static_cast<ptrdiff_t>(x) + y * Width +
z * Width * Height];
}
They know that, according to C++ rules, the index expression will have ptrdiff_t type, and they hope this avoids the overflow. But the overflow may still occur inside the sub-expressions "y * Width" or "z * Width * Height", because these are still calculated in int type.
If you want to correct the code without changing the types of the variables in the expression, you may explicitly convert every variable to a memsize type:
float Region::GetCell(int x, int y, int z) const {
return array[ptrdiff_t(x) +
ptrdiff_t(y) * ptrdiff_t(Width) +
ptrdiff_t(z) * ptrdiff_t(Width) *
ptrdiff_t(Height)];
}
Another solution is to replace types of variables with memsize type:
typedef ptrdiff_t TCoord;
class Region {
float *array;
TCoord Width, Height, Depth;
float GetCell(TCoord x, TCoord y, TCoord z) const;
...
};
float Region::GetCell(TCoord x, TCoord y, TCoord z) const {
return array[x + y * Width + z * Width * Height];
}
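To see the index calculation at work on sizes beyond INT_MAX, here is a minimal sketch of it as a free function. The name flat_index is hypothetical, and std::int64_t is used in place of ptrdiff_t only so that the example behaves the same on a 32-bit build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-function version of the index calculation in GetCell.
// The dimensions are widened once, so every product below has at least one
// 64-bit operand and cannot overflow in int.
std::int64_t flat_index(int x, int y, int z, int width, int height) {
    assert(x >= 0 && y >= 0 && z >= 0 && width > 0 && height > 0);
    const std::int64_t w = width;
    const std::int64_t h = height;
    return x + y * w + z * w * h;
}
```

For a 2000 x 2000 x 2000 region the last cell has index 7,999,999,999, which does not fit in a 32-bit int but is computed correctly here.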
14. Mixed use of simple integer types and memsize types
Mixed use of memsize and non-memsize types in expressions may produce incorrect results on 64-bit systems; this is related to the change in the range of input values. Let’s study some examples.
size_t Count = BigValue;
for (unsigned Index = 0; Index != Count; ++Index)
{ ... }
This is an example of an infinite loop if Count > UINT_MAX. Suppose this code worked on 32-bit systems with fewer than UINT_MAX iterations. A 64-bit version of the program may process more data and may require more iterations. Since the values of the variable Index lie in the range [0..UINT_MAX], the condition "Index != Count" is always true, and this causes the infinite loop.
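The loop itself cannot be run to completion, but the reason it never ends can be shown directly. The following sketch uses std::uint32_t to stand for a 32-bit unsigned index; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Value of a 32-bit index after one more increment past its maximum.
std::uint32_t next_after_max() {
    std::uint32_t index = std::numeric_limits<std::uint32_t>::max();
    ++index;  // unsigned arithmetic wraps modulo 2^32
    return index;
}

// True if a 32-bit index counting up from 0 can ever become equal to count,
// that is, if a loop with the condition "Index != Count" can terminate.
bool loop_terminates(std::uint64_t count) {
    return count <= std::numeric_limits<std::uint32_t>::max();
}
```

The index goes back to 0 after its maximum value, so a count of, say, 5,000,000,000 is never reached. Declaring the index as size_t removes the problem.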
Another frequent error is writing expressions of the following kind:
int x, y, z;
intptr_t SizeValue = x * y * z;
Similar examples were discussed earlier: an arithmetic overflow occurs while values are calculated with non-memsize types, and the final result is incorrect. Finding and correcting such code is made harder by the fact that compilers, as a rule, show no warnings for it. From the point of view of the C++ language this is an absolutely correct construction: several variables of int type are multiplied, and then the result is implicitly converted to intptr_t type and assigned. Here is a small piece of code that shows the danger of careless expressions with mixed types (the results were obtained with Microsoft Visual C++ 2005 in 64-bit compilation mode).
int x = 100000;
int y = 100000;
int z = 100000;
intptr_t size = 1;                  // Result:
intptr_t v1 = x * y * z;            // -1530494976
intptr_t v2 = intptr_t(x) * y * z;  // 1000000000000000
intptr_t v3 = x * y * intptr_t(z);  // 141006540800000
intptr_t v4 = size * x * y * z;     // 1000000000000000
intptr_t v5 = x * y * z * size;     // -1530494976
intptr_t v6 = size * (x * y * z);   // -1530494976
intptr_t v7 = size * (x * y) * z;   // 141006540800000
intptr_t v8 = ((size * x) * y) * z; // 1000000000000000
intptr_t v9 = size * (x * (y * z)); // -1530494976
All the operands in such expressions must be converted to the larger type in time. Remember that an expression of the kind
intptr_t v2 = intptr_t(x) * y * z;
gives the right result only because multiplication groups from left to right: it is evaluated as "(intptr_t(x) * y) * z", so each multiplication has an operand of intptr_t type. This correctness is fragile. As v3, v5, v6, v7 and v9 show, moving the conversion to another operand or adding brackets brings back multiplication in int type and the overflow with it. That is why, if the result of an expression must be of memsize type, it is safest to have only memsize types participate in the expression. The right variant:
intptr_t v2 = intptr_t(x) * intptr_t(y) * intptr_t(z); // OK!
Note. If you have a lot of integer calculations and control over overflows is an important task for you, we suggest paying attention to the SafeInt class, the implementation and description of which can be found in MSDN.
Mixed use of types may also show up as a change in the program logic.
ptrdiff_t val_1 = -1;
unsigned int val_2 = 1;
if (val_1 > val_2)
printf ("val_1 is greater than val_2\n");
else
printf ("val_1 is not greater than val_2\n");
//Output on 32-bit system: "val_1 is greater than val_2"
//Output on 64-bit system: "val_1 is not greater than val_2"
On the 32-bit system the variable val_1 was converted to unsigned int according to C++ rules and took the value 0xFFFFFFFFu. As a result, the condition "0xFFFFFFFFu > 1" was true. On the 64-bit system it is the other way round: the variable val_2 is extended to ptrdiff_t type, and the expression "-1 > 1" is checked. Picture 6 sketches the changes.
Picture 6. Changes occurring in the expression.
If you need to restore the previous behavior, you should change the type of the variable val_2.
ptrdiff_t val_1 = -1;
size_t val_2 = 1;
if (val_1 > val_2)
printf ("val_1 is greater than val_2\n");
else
printf ("val_1 is not greater than val_2\n");
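If the intent is instead to compare the two numbers by their mathematical values on every platform, the comparison can be written so that it does not depend on the implicit conversion at all. The helper below is a hypothetical sketch of that idea, not part of the original code; C++20 provides std::cmp_greater in <utility> for the same purpose.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compares a signed and an unsigned value by their
// mathematical values, independent of the platform's conversion rules.
bool greater_than(std::int64_t lhs, std::uint64_t rhs) {
    if (lhs < 0)
        return false;  // a negative number is never greater than an unsigned one
    return static_cast<std::uint64_t>(lhs) > rhs;
}
```

With this helper, greater_than(val_1, val_2) is false for -1 and 1 on both 32-bit and 64-bit systems.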