Development of Resource-intensive Applications in Visual C++
Additional ways of increasing the performance of program systems
In the last part of this article we would like to touch upon several more technologies that may be useful when developing resource-intensive software.
Intrinsic functions
Intrinsic functions are special system-dependent functions that perform actions which cannot be expressed at the level of C/C++ code, or that perform them much more efficiently. In effect, they let you avoid inline assembler, which is often impossible or undesirable to use. Programs can use intrinsic functions to produce faster code because there is no overhead of an ordinary function call. The price, of course, is that the code size becomes a bit larger. MSDN gives the list of functions that can be replaced with their intrinsic versions; memcpy and strcmp are examples. The Microsoft Visual C++ compiler has a special option, /Oi, that automatically replaces calls to some functions with their intrinsic analogs. Besides this automatic replacement of ordinary functions with intrinsic variants, we can call intrinsic functions explicitly in the code, most notably where inline assembler would otherwise be needed but is impossible or undesirable.
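As a hedged illustration of explicit intrinsic use, the sketch below adds two arrays of doubles with SSE2 intrinsics from emmintrin.h. The helper name add_arrays and the macro HAVE_SSE2_SKETCH are hypothetical and exist only for this example; the code assumes an x86/x64 compiler for the vector path and falls back to plain C++ elsewhere.

```cpp
#include <cstddef>

// Assumption: SSE2 is available on x64 MSVC builds and wherever GCC/Clang
// define __SSE2__. HAVE_SSE2_SKETCH is a hypothetical macro for this sketch.
#if defined(_M_X64) || defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1
#endif

// Hypothetical helper: out[i] = x[i] + y[i] for i in [0, n).
// The pointers need not be 16-byte aligned because unaligned loads are used.
void add_arrays(const double* x, const double* y, double* out, std::size_t n)
{
    std::size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    // Two doubles are processed per iteration in one 128-bit register.
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        _mm_storeu_pd(out + i, _mm_add_pd(vx, vy));
    }
#endif
    // Scalar tail (and the whole loop when SSE2 is not available).
    for (; i < n; ++i)
        out[i] = x[i] + y[i];
}
```

No registers are named anywhere in the source, so register allocation stays with the compiler, and the same code builds in both 32-bit and 64-bit modes.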
Data blocking and alignment
Let’s look at an example:
    struct foo_original { int a; void *b; int c; };
This structure takes 12 bytes in 32-bit mode, but in 64-bit mode it takes 24 bytes, because the 8-byte pointer must start on an 8-byte boundary and padding is added after each int field. To make the structure take the expected 16 bytes in 64-bit mode, change the order of the fields:
    struct foo_new { void *b; int a; int c; };
In some cases it is useful to help the compiler explicitly by specifying the alignment manually in order to improve performance. For example, SSE data should be aligned on a 16-byte boundary. You can do this in the following way:
    // 16-byte aligned data
    __declspec(align(16)) double init_val[2] = {3.14, 3.14};
    // SSE2 movapd instruction
    __m128d vector_var = _mm_load_pd(init_val);
The sources "Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft Windows" [19] and "Porting and Optimizing Applications on 64-bit Windows for AMD64 Architecture" [20] offer a detailed review of these problems.
Files mapped into memory
With the appearance of 64-bit systems, mapping files into memory has become more attractive because the window of addressable data has grown. It may be a very good addition for some applications, so don’t forget about it. Memory mapping of files is becoming less useful on 32-bit architectures, especially with the introduction of relatively cheap recordable DVD technology. A 4 GB file is no longer uncommon, and such large files cannot easily be memory-mapped on 32-bit architectures. Only a region of the file can be mapped into the address space, and to access such a file through memory mapping, regions have to be mapped into and out of the address space as needed. On 64-bit Windows the address space is much larger, so you can map the whole file at once.
Keyword __restrict
One of the most serious problems for a compiler is aliasing. When code both reads and writes memory, it is often impossible at compile time to determine whether more than one pointer can access the same memory, i.e. whether more than one pointer can be a "synonym" for one and the same memory location. That is why the compiler has to be very conservative inside a loop in which memory is both read and written when it decides whether to keep data in registers rather than in memory. This insufficient use of registers can affect performance considerably. The keyword __restrict makes this decision easier for the compiler. It allows the compiler not to consider the marked pointers aliased, i.e. referring to one and the same memory area, so the compiler can use registers more freely and optimize more efficiently. Let’s look at an example:
    int * __restrict a;
    int *b, *c;
    for (int i = 0; i < 100; i++)
    {
        *a += *b++ - *c++;  // no aliases exist
    }
In this code the compiler can safely keep the sum in a register associated with the variable "a", avoiding writes to memory. MSDN is a good source of information about the use of the __restrict keyword.
SSE instructions
Applications executed on 64-bit processors (regardless of the mode) will work more efficiently if they use SSE instructions instead of MMX/3DNow. This is related to the width of the data processed: SSE/SSE2 instructions operate on 128-bit data, while MMX/3DNow instructions operate only on 64-bit data. That is why it is better to rewrite code that uses MMX/3DNow so that it uses SSE. We won’t dwell on SSE in this article; interested readers can turn to the documentation written by the developers of the processor architectures.
Some particular rules of using language constructions
The 64-bit architecture gives new opportunities for optimizing code at the level of individual statements. These are the (by now traditional) methods of "rewriting" pieces of a program so that the compiler can optimize them better. Of course, we cannot recommend these methods for general use, but it may be useful to know about them.
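The __restrict hint discussed above is itself a small source-level aid of this kind. As a minimal sketch, the loop from the aliasing example can be packaged into a complete function. The name sum_differences is hypothetical, and the sketch assumes a compiler that accepts the __restrict spelling (Visual C++, GCC and Clang do).

```cpp
#include <cstddef>

// Hypothetical helper: adds (b[i] - c[i]) for i in [0, n) to *total.
// __restrict on each pointer is the caller's promise that the three
// memory areas do not overlap; the compiler does not verify it.
void sum_differences(int* __restrict total,
                     const int* __restrict b,
                     const int* __restrict c,
                     std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        *total += b[i] - c[i];   // *total may be kept in a register across the loop
}
```

If a caller breaks the promise, for example by pointing total into the array b, the behavior is undefined, so the keyword belongs only where the absence of overlap is guaranteed by design.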
First on the list of these optimizations is manual loop unrolling. The essence of this method is clear from the example:
    double a[100], sum, sum1, sum2, sum3, sum4;
    sum = sum1 = sum2 = sum3 = sum4 = 0.0;
    for (int i = 0; i < 100; i += 4)
    {
        sum1 += a[i];
        sum2 += a[i+1];
        sum3 += a[i+2];
        sum4 += a[i+3];
    }
    sum = sum1 + sum2 + sum3 + sum4;
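The fragment above assumes an array whose length is a multiple of four. A hedged sketch of the same idea for an arbitrary length might look as follows; the function name sum_unrolled is hypothetical, and the leftover loop is an addition that the original example does not need.

```cpp
#include <cstddef>

// Hypothetical helper: sums n doubles with the loop unrolled by four.
// The four accumulators do not depend on each other, so the additions
// of one iteration can overlap instead of waiting on a single sum.
double sum_unrolled(const double* a, std::size_t n)
{
    double sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum1 += a[i];
        sum2 += a[i + 1];
        sum3 += a[i + 2];
        sum4 += a[i + 3];
    }
    double sum = sum1 + sum2 + sum3 + sum4;
    // Leftover elements when n is not a multiple of four.
    for (; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Because floating-point addition is not associative, the unrolled version can differ from a simple loop in the last bits of the result. This is one reason a compiler will usually not make this transformation on its own under strict floating-point settings.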
Many such methods are described in "Software Optimization Guide for AMD64 Processors" [12].