passing a struct array in a function as reference... ?

25 comments, last by Zahlman 16 years, 9 months ago
Quote:Original post by SiCrane
Huh? Consider the functions:
int foo(int A[], int n) {
  return A[n];
}

int bar(int * A, int n) {
  return A[n];
}


Your two functions are identical: A is a pointer in both cases. I suspect that passing an array by reference will probably result in the same thing, since there is an additional indirection.

What would be interesting is testing the cases with an array and a pointer both on the stack (although the pointer will probably end up in a register), or a pointer and an array both being global, or an array and a pointer both members of a structure.
Quote:Original post by Alpha Brain
but without the vector, it isn't possible? (just want to understand the basic way first)


Well, arrays are the old way, not necessarily the basic way. A std::vector is a normal C++ object which behaves as one would expect at first glance (comparison, copy, assignment, size, pass-by-value, type definitions). By contrast, using C arrays would require that you actually learn new syntax and semantics that are unlike anything else in C++.
Quote:Original post by ToohrVyk
Your two functions are identical: A is a pointer in both cases. I suspect that passing an array by reference will probably result in the same thing, since there is an additional indirection.

No they aren't. In foo A is a reference to an array. In bar A is a pointer. In fact, under C++ overloading rules you could even have
int foo(int A[], int n);
int foo(int * A, int n);

and the compiler will be happy as a clam. However, in both cases you'll note the complete lack of the extra dereference that you claim happens with a pointer.

Quote:What would be interesting is testing the cases with an array and a pointer both on the stack (although the pointer will probably end up in a register), or a pointer and an array both being global, or an array and a pointer both members of a structure.

You have a compiler. Try it. You still won't see that extra dereference.
If you really want *an array* (i.e. you have a specific size in mind for it, for a good reason, which you know ahead of time, and don't want to change during the program execution), but you also want it to behave like an object (i.e. none of this pointer-decay nonsense - if you pass it by value, it gets passed by value), then what you want is boost::array.
Quote:Original post by SiCrane
No they aren't. In foo A is a reference to an array. In bar A is a pointer. In fact, under C++ overloading rules you could even have
int foo(int A[], int n);
int foo(int * A, int n);

and the compiler will be happy as a clam.


#include <iostream>

int fun(int a[])
{ // <--- line 4
  return sizeof(a);
}

int fun(int *a)
{ // <--- line 9
  return sizeof(a);
}

int main()
{
  int a[5];
  int *b = a;
  std::cout << fun(a) << fun(b) << std::endl;
}


g++ 3.4.0 (-W -Wall -ansi -pedantic) replies:
test.cpp: In function `int fun(int*)':
test.cpp:9: error: redefinition of `int fun(int*)'
test.cpp:4: error: `int fun(int*)' previously defined here


Visual C++ 2005 Express Edition (/W4 /Za) replies:
test.cpp(9) : error C2084: function 'int fun(int [])' already has a body
        test.cpp(3) : see previous definition of 'fun'


Besides, even if C++ did implement this as a pass-by-reference, you would still have problems with C compatibility (in C, the two definitions above are equivalent). Last but not least, the int a[] version behaves like a pointer, but not like an array: it has the same size (sizeof(int*)) regardless of the number of elements, it's a modifiable lvalue (it can be reassigned), and it cannot be passed by reference to a function which expects an array to be passed by reference.

Tested on both g++ and Visual C++ (the above versions).
#include <iostream>

int arr1(int a[])
{
   return sizeof(a); // 4
}

int arr2(int (&a)[5])
{
   return sizeof(a); // 20
}

int main()
{
  int a[5];
  std::cout << arr1(a) << std::endl << arr2(a) << std::endl;
}


Quote:However, in both cases you'll note the complete lack of the extra dereference that you claim happens with a pointer.


The marked line below (from your provided listing) reads the stack slot that holds A and loads its contents, the pointer value, into register ecx.
	mov	eax, DWORD PTR _n$[esp-4]
	mov	ecx, DWORD PTR _A$[esp-4]	; <-- this one
	mov	eax, DWORD PTR [ecx+eax*4]
	ret	0


Let us compare this with a naive stack-based array:
int arr(int N) {
	int a[5];
	return a[N];
}

	mov	eax, DWORD PTR _N$[esp-4]
	mov	eax, DWORD PTR _a$[esp+eax*4]
	ret	0


The above uses the base address of a (which is esp) instead of loading the value of the pointer into ecx. One could argue that the pointer was passed as an argument, while the array was present on the stack. It's true: you cannot compare an array to a pointer passed as an argument, because the two involve different mechanisms.

Quote:You have a compiler. Try it. You still won't see that extra dereference.


I didn't have a compiler this morning, and was on my way to work anyway. Now that I'm here, let's get started. I am using VC++ 2005 EE for generating the listings, with optimization /O2.

First, a comparison of the array and pointer being part of an object. I give the corresponding assembly listings as comments in the code.
struct st
{
	int *b;
	int a[5];
};

int arr(st & s, int N) {
	return s.a[N];
	//	mov	eax, DWORD PTR _N$[esp-4]
	//	mov	ecx, DWORD PTR _s$[esp-4]
	//	mov	eax, DWORD PTR [ecx+eax*4+4]
}

int ptr(st & s, int N) {
	return s.b[N];
	//	mov	eax, DWORD PTR _s$[esp-4]
	//	mov	ecx, DWORD PTR [eax]         <-- this one
	//	mov	edx, DWORD PTR _N$[esp-4]
	//	mov	eax, DWORD PTR [ecx+edx*4]
}


As visible above, there is an additional "load contents of pointer into register" step when using a pointer, whereas the array version simply uses the base array address to access the member.

This still happens if the structure is passed by value (which only removes the initial loading of the address of s from esp-4).

Now, using a global pointer and a global array:
int a[5] = { 1, 2, 3, 4, 5 };
int * b = a;

int arr(int N) {
	return a[N];
	//	mov	eax, DWORD PTR _N$[esp-4]
	//	mov	eax, DWORD PTR ?a@@3PAHA[eax*4]
}

int ptr(int N) {
	return b[N];
	//	mov	eax, DWORD PTR _N$[esp-4]
	//	mov	ecx, DWORD PTR ?b@@3PAHA      <--- this one
	//	mov	eax, DWORD PTR [ecx+eax*4]
}


Again, there's an additional dereferencing in the pointer version, but not in the array version.

One situation where the additional dereference does not occur is when working on the stack and there are enough registers to store the pointer address. Since the pointer is already in a register, there is no dereference necessary. A typical example is:

int ptr(int N) {
	int *b = new int[5];
	// Note: 'new' loads the pointer address into 'eax' automatically
	return b[N];
	//	mov	ecx, DWORD PTR _N$[esp]
	//	mov	eax, DWORD PTR [eax+ecx*4]
	//	add	esp, 4
}


int arr(int N) {
	int ar[5];
	int a = std::rand();
	int b = std::rand() % a;
	int c = std::rand() % b;
	int d = std::rand() % c;
	N = (N + a + b + c + d) % 5;
	// Note: This loads N into edx
	return ar[N];
	//	mov	eax, DWORD PTR _ar$[esp+edx*4+20]
}

int ptr(int N) {
	int *p = new int[5];
	int a = std::rand();
	//	mov	DWORD PTR _p$[esp+20], eax
	int b = std::rand() % a;
	int c = std::rand() % b;
	int d = std::rand() % c;
	N = (N + a + b + c + d) % 5;
	// Note: This loads N into edx
	return p[N];
	//	mov	eax, DWORD PTR _p$[esp+16] <--- here
	//	mov	eax, DWORD PTR [eax+edx*4]
}

As shown above, the sequence of register-occupying values forces the compiler to spill the pointer value from eax onto the stack, which in turn forces a load from memory back into a register before the access. By contrast, since the address of an array is implicitly encoded in the esp pointer (by means of a constant offset), filling the registers has no impact at all.

So, I stand by my assertion that using a pointer requires an additional dereference when compared to an array, which may be optimized out if the pointer is an auto variable and can be kept in a register.

An array reference, however, does indeed create the same dereference as a pointer (because, after all, it is a reference).

int arr(int N, int (&ar)[5]) {
	return ar[N];
	//	mov	eax, DWORD PTR _ar$[esp-4]
	//	mov	ecx, DWORD PTR _N$[esp-4]
	//	mov	eax, DWORD PTR [eax+ecx*4]
}
I concede the point. Interestingly, MSVC 2003 and 2005 generate different symbol mangling for the two types. The two foo()s would be ?foo@@YAHPAHH@Z and ?foo@@YAHQAHH@Z (note the P/Q difference) and link multiple definitions from different translation units just fine.
Quote:Original post by SiCrane
I concede the point. Interestingly, MSVC 2003 and 2005 generate different symbol mangling for the two types. The two foo()s would be ?foo@@YAHPAHH@Z and ?foo@@YAHQAHH@Z (note the P/Q difference) and link multiple definitions from different translation units just fine.


I perceive potential for great, unspeakable evil.

