C: Correct my basic thinking on arrays and pointers

Started by
7 comments, last by Aldacron 18 years, 4 months ago
This is for C. I'm working through the theory. Tell me if this is right or wrong, and feel free to add any helpful detail. Thanks.

If I statically create an array, say int a[5] = {1,2,3,4,5}; it exists on the stack and differs from a pointer ONLY in that the array address cannot be changed or a value assigned to it.

The actual pointer address itself (not the address contained in the pointer) is itself fixed on the stack and not movable. I can only alter what it references by changing the value (address) in the pointer.

If I dynamically allocate a pointer via malloc, it exists on the heap and is totally equivalent to an array. If I want an array on the heap, I have to create it via pointers. In essence, there are really only pointers on the heap.

Is this right or badly twisted and messed up?
Your thinking is almost correct.

You cannot change the address of an array; the address is fixed at the moment of allocation, regardless of whether the array is allocated on the stack or on the heap. If you allocate a pointer to an array, the address of the pointer itself is likewise not changeable. You can only change the "content" of the array or of the pointer. For example, you can assign the (also fixed) address of another array to the pointer, so that the content of the pointer then refers to the other array.

Accessing an array through a pointer means that the compiler generates code that takes the address of the pointer variable, fetches the content stored at that address, and interprets that content as the base address of the array. Following the stored address to the data it designates is what is generally called "pointer dereferencing".

If you allocate an array with malloc, then it is allocated on the heap, yes. You assign the address malloc returns to a pointer variable. That variable may itself live on the stack or on the heap; both are possible (e.g. you could malloc an array of pointers to arrays, so pointers must be able to exist on the heap as well).
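
To make the points above concrete, here is a minimal sketch. The function names (first_via_pointer, heap_pointer_table) are made up for illustration and are not from this thread. The first function re-aims a pointer at a second array, which is legal, whereas assigning to the array itself would not compile. The second stores pointers in a malloc'd block, so the pointers themselves live on the heap.

```c
#include <stdlib.h>

/* Hypothetical helper: aims p at one of two arrays and returns the first
   element it sees. Only the content of p changes; x and y never move. */
int first_via_pointer(int use_second)
{
    int x[3] = {1, 2, 3};
    int y[3] = {7, 8, 9};
    int *p = x;            /* p now holds the address of x[0] */

    if (use_second)
        p = y;             /* legal: a pointer can be re-aimed */
    /* x = y;                 would not compile: an array is not assignable */

    return p[0];
}

/* Hypothetical helper: a heap-allocated table of two pointers, each aimed
   at an array. Returns the sum of the two first elements, or -1 if the
   allocation fails. */
int heap_pointer_table(void)
{
    int x[3] = {1, 2, 3};
    int y[3] = {7, 8, 9};
    int **table = malloc(2 * sizeof *table);
    int sum;

    if (table == NULL)
        return -1;
    table[0] = x;          /* these two pointers are stored on the heap */
    table[1] = y;
    sum = table[0][0] + table[1][0];
    free(table);
    return sum;
}
```

Calling first_via_pointer(0) reads through x and first_via_pointer(1) reads through y; in both cases only the pointer's content differs.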

BTW: I'm not sure whether "static creation" is the right term for what you do there. You use C, I know, but I wanted to mention that at least in C++ "static" is a storage class of its own. A static array there would _not_ reside on the stack.
Further, considering the expression a[5] = {1,2,3,4,5}, a has no physical location of its own on the stack as, say, a strict pointer to a[0] would have. Though &a yields an address which is the same as a[0], the fact that a yields a different "value" than a[0] when evaluated demonstrates that a is more akin to a "virtual" pointer (i.e. one without its own location) than to anything else.

Yes or no?
Local variables exist on the stack, so that array will exist on the stack if it's declared as a local variable (that is, inside of a function). If it's declared as a global variable and initialized, then it will most likely be compiled into the data section of the executable file itself. If it's not initialized, it will most likely end up as part of the .bss section and the operating system's loader will supply the memory for it. If it's declared as a local variable but with the static keyword, it is stored like a global: if it's initialized, it will most likely end up like an initialized global variable; if it's not, like an uninitialized global variable.
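
As a rough sketch of those cases (the names global_init, global_zero and count_calls are invented here, and the section placement in the comments is typical rather than guaranteed by the C standard):

```c
int global_init[3] = {1, 2, 3};   /* initialized global: typically the data section */
int global_zero[3];               /* uninitialized global: typically .bss, zero-filled */

/* Hypothetical helper: returns how many times it has been called. */
int count_calls(void)
{
    static int calls;             /* static local: stored like a global and
                                     keeps its value between calls */
    int local[3] = {4, 5, 6};     /* automatic local: typically on the stack,
                                     created afresh on every call */

    calls += local[0] - 3;        /* adds 1 each time */
    return calls;
}
```

The static counter survives from one call to the next, while local is rebuilt each time the function runs.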

The heap is a block of memory doled out in chunks of various sizes. The value returned by malloc is a pointer. It points to a chunk of memory at least as long as the number of bytes requested. You can store whatever kind of data you want in this chunk: integers, floats, pointers, etc.

"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Quote:Original post by haegarr
...

Thanks for the reply.

Are there even "arrays" on the heap or just pointers for all practical purposes? I'm really trying to nail down the behavioral differences between these things on the stack vs. the heap.

Quote:Original post by Ned_K
Further, considering the expression a[5] = {1,2,3,4,5}, a has no physical location of its own on the stack as, say, a strict pointer to a[0] would have. Though &a yields an address which is the same as a[0], the fact that a yields a different "value" than a[0] when evaluated demonstrates that a is more akin to a "virtual" pointer (i.e. one without its own location) than to anything else.

If you instantiate an array "in place", then no explicit pointer to the array is needed. Accessing a in your example above can be understood as accessing the whole array. Accessing a[i] then means accessing the i-th element of the array denoted by a.

If a were a pointer, you have the same: accessing a means the whole array, so that a[i] still means the i-th element. However, you have the additional possibility to access a[0] by pointer dereferencing, so that a->b becomes the same as a[0].b! That is because in C/C++ there is no distinction between a pointer to a single instance and a pointer to an array of instances. In other words, C/C++ can be seen as handling a single instance as an array of size 1.
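
A small sketch of that equivalence, using a made-up struct point and a made-up function same_access (neither is from the thread):

```c
struct point { int x; int y; };

/* Hypothetical helper: returns 1 if the arrow, index and star forms all
   reach the same members. */
int same_access(void)
{
    struct point pts[2] = {{1, 2}, {3, 4}};
    struct point *p = pts;            /* pts converts to &pts[0] here */

    return p->x == p[0].x             /* p->x is the same as p[0].x       */
        && (*p).y == pts[0].y         /* (*p).y is the same as pts[0].y   */
        && (p + 1)->y == p[1].y;      /* stepping the pointer is indexing */
}
```

The pointer p never records whether it was aimed at one struct or at the first of many; the index simply selects how far to step from it.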
Quote:Original post by Ned_K
Further, considering the expression a[5] = {1,2,3,4,5}, a has no physical location of its own on the stack as, say, a strict pointer to a[0] would have. Though &a yields an address which is the same as a[0], the fact that a yields a different "value" than a[0] when evaluated demonstrates that a is more akin to a "virtual" pointer (i.e. one without its own location) than to anything else.

Yes or no?


The address of a is not the same thing as the value stored in the first element of a: &a != a[0], unless you had earlier assigned a[0] = &a, and that would produce a compiler warning because a[0] holds an int and &a is a pointer to an array of five ints.

By itself, "a" should yield the address where the memory holding the contents begins, that is, a == &a[0]

Experiment with an array and printf

printf("%p\n", (void *)a);
printf("%p %d %p %d\n", (void *)&a[0], a[0], (void *)(a+0), *(a+0));
printf("%p %d %p %d\n", (void *)&a[1], a[1], (void *)(a+1), *(a+1));
printf("%p %d %p %d\n", (void *)&a[2], a[2], (void *)(a+2), *(a+2));
printf("%p %d %p %d\n", (void *)&a[3], a[3], (void *)(a+3), *(a+3));
printf("%p %d %p %d\n", (void *)&a[4], a[4], (void *)(a+4), *(a+4));

You'll start to see that each element of the array is stored at a location sizeof(int) bytes (4 on most current platforms) greater than where the previous element was stored.
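
The same observations can be checked in code rather than read off the printed addresses. This is only a sketch; element_stride and same_address are names invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: byte distance between a[0] and a[1]. */
size_t element_stride(void)
{
    int a[5] = {1, 2, 3, 4, 5};

    return (size_t)((char *)&a[1] - (char *)&a[0]);
}

/* Hypothetical helper: returns 1 if a, &a[0] and &a all designate the same
   address. Their types differ (int * versus pointer-to-array), which is why
   they are compared as void * here. */
int same_address(void)
{
    int a[5] = {1, 2, 3, 4, 5};

    return (void *)a == (void *)&a[0]
        && (void *)&a == (void *)&a[0];
}
```

On any platform the stride equals sizeof(int), whatever that happens to be there.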


[Edited by - LessBread on November 29, 2005 5:12:35 AM]
Quote:Original post by Ned_K
Quote:Original post by haegarr
...

Thanks for the reply.

Are there even "arrays" on the heap or just pointers for all practical purposes? I'm really trying to nail down the behavioral differences between these things on the stack vs. the heap.

If you do a
int* myArray = (int*)malloc(...);
then you will have a variable myArray on the stack, and an array of ints on the heap. The variable myArray will have the address of the array as its content, so that dereferencing myArray yields the first element of the array, and myArray[i] yields the i-th.

Seen from the outside, there is no real difference whether you have the array on the stack or on the heap. However, the difference comes into play if you want to use the array for a longer time. Arrays on the stack will, like anything else on the stack, be destroyed when their scope (e.g. the function in which the array was allocated) ends. The array will remain only if it was allocated on the heap. However, you still have to take care that a pointer to the array survives, or your array gets lost (that is called a memory leak, since there is no way to get the memory back).

The other thing is that you should not allocate overly big arrays on the stack. The stack is meant to hold a small number of local variables, not a big array.

So whether to allocate an array on the stack or else on the heap depends on the particular intention.
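
For the "big array" case, a minimal sketch of the heap route might look like this. The helper name make_sequence is an assumption made for the example; the caller keeps the returned pointer and is responsible for calling free on it.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints on the heap and fills them with
   0, 1, ..., n-1. Returns NULL if the allocation fails. The array outlives
   this function; the caller must free() it. */
int *make_sequence(size_t n)
{
    int *arr = malloc(n * sizeof *arr);
    size_t i;

    if (arr == NULL)
        return NULL;
    for (i = 0; i < n; ++i)
        arr[i] = (int)i;
    return arr;
}
```

A call such as make_sequence(1000000) asks for a million ints, which would be a risky size for a local array on many systems. Losing the returned pointer without freeing it is the memory leak described above.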
What all of the above posts boil down to is that you have to get it clear in your head that the pointer to a chunk of memory and the data stored in that memory are two separate things. Pointers don't care at all about data stored in memory, or whether the memory pointed to is on the stack or the heap. The only things pointers care about are memory addresses.

To understand the difference between the stack and the heap, it's easiest to think of it in terms of management. In terms of allocations and deallocations, the system is responsible for managing the stack, but you are responsible for managing the heap.

#include <stdio.h>
#include <stdlib.h>

void myFunc(void)
{
   // the memory block that holds the array values of 'a' exists on the stack
   int a[5] = {1, 2, 3, 4, 5};

   // the memory block that holds the array values of 'b' exists on the heap
   int *b = (int*)malloc(sizeof(int) * 5);
   if(b == NULL)
      return;
   for(int i=0; i<5; ++i)
      b[i] = i + 1;

   // 'c', like 'b', is a pointer. Used in an expression, 'a' converts to a
   // pointer to its first element, so it can be used like one. The values
   // being passed around are memory addresses - they know nothing of the
   // array contents. The pointer values (addresses) exist, in this case, on
   // the stack. The following line initializes c to the address of a[0]:
   int *c = a;
   printf("a is %p\nc is %p\n", (void*)a, (void*)c);
   for(int i=0; i<5; ++i)
      printf("a[%d] = %d; c[%d] = %d\n", i, a[i], i, c[i]);

   c = b;
   printf("b is %p\nc is %p\n", (void*)b, (void*)c);
   for(int i=0; i<5; ++i)
      printf("b[%d] = %d; c[%d] = %d\n", i, b[i], i, c[i]);

   free(b);
}


Although a and b were created differently, stack vs heap, the semantics of using them are the same. In most expressions an array name in C is converted to a pointer to its first element, which is why a can be indexed and passed around just like b even though it isn't declared with pointer syntax (sizeof and unary & are the exceptions: they see the whole array). The word 'pointer' is just a fancy name for a variable that stores a memory address. An 'array' is a sequential series of elements of the same type; what you usually handle in code is the address of its first element.
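
One place where an array and a pointer visibly differ is sizeof. The sketch below uses two made-up function names, array_bytes and param_bytes, to show it.

```c
#include <stddef.h>

/* Hypothetical helper: applied to the array itself, sizeof reports the
   size of all five elements. */
size_t array_bytes(void)
{
    int a[5] = {1, 2, 3, 4, 5};

    return sizeof a;       /* 5 * sizeof(int) */
}

/* Hypothetical helper: once the array has been passed to a function, only
   a pointer arrives, so sizeof reports the size of the pointer. */
size_t param_bytes(int *p)
{
    return sizeof p;       /* sizeof(int *), whatever the array length was */
}
```

This is why functions that take an array usually take its length as a separate parameter.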

In the above function, the only difference between stack and heap is that you must explicitly free the heap allocated memory whereas the system will handle cleaning up the stack memory for you. But there is a gotcha here, as the following code illustrates:

int *myFuncA(void)
{
   int a[5] = {1, 2, 3, 4, 5};
   return a;   // address of a local array: no longer valid after the return
}

int *myFuncB(void)
{
   int *b = (int*)malloc(sizeof(int) * 5);
   if(b == NULL)
      return NULL;
   for(int i=0; i<5; ++i)
      b[i] = i + 1;
   return b;
}


This demonstrates the biggest difference between the stack and the heap. The pointer returned by myFuncB is guaranteed to be valid until it is freed later in the application. What I mean by valid is not just that the address never changes, but that the values stored in the memory pointed to will not change either, unless you overwrite them yourself. The pointer returned by myFuncA, on the other hand, is not guaranteed to be valid. The address will remain the same for as long as you hold on to it, but the values stored at that address can change at any time as the stack grows and shrinks. When myFuncA exits, the stack will shrink. In practice the values often still appear intact immediately after the call, but as far as the C standard is concerned any use of that pointer is already undefined behavior, and once you start calling other functions the memory is very likely to be overwritten. The usual symptom is unexpected results, although a crash is also possible.
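
If the goal is simply to hand five values back to the caller without touching the heap, one common alternative is to let the caller supply the storage. This is a sketch of that idea, not something proposed in the thread; fill_sequence is a made-up name.

```c
#include <stddef.h>

/* Hypothetical alternative to myFuncA: the caller owns the array, so
   nothing that lives on this function's stack frame escapes from it. */
void fill_sequence(int *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        out[i] = (int)i + 1;
}
```

The caller declares int a[5]; and calls fill_sequence(a, 5);. The array then lives in the caller's own scope and stays valid for as long as that scope does.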

The stack and the heap can also affect how you manipulate struct objects. I've knocked up this little example to demonstrate some of the differences to be aware of. I hope it helps shed some light on the topic (and that I'm tackling the right problem - if this isn't what you're after, oops!). You should be able to copy and paste this and compile it fine. GCC (MinGW, Cygwin) will give you at least one warning about returning a local address.

#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int a, b, c;
} MyStruct;

// returns a pointer to a heap allocated object
MyStruct *getMyStructA(void)
{
    MyStruct *ms = (MyStruct*)malloc(sizeof(MyStruct));
    ms->a = 1;
    ms->b = 2;
    ms->c = 3;
    return ms;
}

// returns a pointer to an object allocated on the local stack
MyStruct *getMyStructB(void)
{
    MyStruct ms = {4, 5, 6};
    return &ms;
}

// returns a copy of an object allocated on the local stack
MyStruct getMyStructC(void)
{
    MyStruct ms = {7, 8, 9};
    return ms;
}

int main(int argc, char **argv)
{
    // heap allocated object - the bytes which store the pointer msA exist
    // on the stack in main, but the memory that msA references exists on the heap
    MyStruct *msA = getMyStructA();
    printf("contents of object located at memory address (%p): %d %d %d\n", (void*)msA, msA->a, msA->b, msA->c);

    // stack allocated object - the bytes which store the pointer msB exist
    // on the stack in main, but the memory that msB references exists in a location
    // of stack memory that is beyond the stack pointer once getMyStructB returns.
    // The memory may still appear to hold the expected values if accessed
    // immediately, but reading it is undefined behavior, and once other functions
    // are called the values will be clobbered.
    MyStruct *msB = getMyStructB();
    printf("contents of object located at memory address (%p): %d %d %d\n", (void*)msB, msB->a, msB->b, msB->c);

    // stack allocated copy - msC is allocated on the stack locally, and the call
    // to getMyStructC returns a copy of the stack object from that function. Because
    // msC is a copy and exists locally, the values will be valid until they are
    // changed or the function exits, no matter what happens to the area of the
    // stack where the original object existed.
    MyStruct msC = getMyStructC();
    printf("contents of object located at memory address (%p): %d %d %d\n", (void*)&msC, msC.a, msC.b, msC.c);

    // the values of msB have changed because the contents of the memory the pointer
    // points to have changed, even though the pointer itself has not
    printf("contents of object located at memory address (%p): %d %d %d\n", (void*)msB, msB->a, msB->b, msB->c);

    // the values of msA have not changed since it is on the heap
    printf("contents of object located at memory address (%p): %d %d %d\n", (void*)msA, msA->a, msA->b, msA->c);

    // for kicks, allocate a new msA
    free(msA);
    msA = getMyStructA();
    printf("contents of object located at memory address (%p): %d %d %d\n", (void*)msA, msA->a, msA->b, msA->c);

    // for more kicks, print out the memory addresses of all fields in msA. Note that the address
    // of msA and the address of &msA->a are the same and that the other addresses follow sequentially.
    printf("msA starts at: (%p)\nmsA->a: (%p)\nmsA->b: (%p)\nmsA->c: (%p)\n", (void*)msA, (void*)&msA->a, (void*)&msA->b, (void*)&msA->c);

    // don't forget to free msA - even though it will be freed at app exit by the OS, it's good practice.
    free(msA);
    return 0;
}

