Can someone give me the exact definition/usage of Stack/Heap and Reference Types in C#?

9 comments, last by laztrezort 12 years, 6 months ago
I have gone through a million books, tutorials, and videos, and done my own research. And it's all still just giving me a headache.

I've read that a lot of books/etc don't give a GOOD representation of what is actually happening. Meaning I am just reading junk that isn't entirely true.

As for what I seemingly don't understand, it's the whole concept of stack/heap. All I know is that it's about memory management, that is it.

As for Reference Types, I was in the process of learning about this and then the whole stack/heap thing came into play and just confused the hell out of me.

http://pastebin.com/7mn2394j

I somewhat get that X is holding a reference to MyInt. It's just that the whole stack/heap thing has my head spinning.

Analogy time :-)

A stack is a way of storing data. It works exactly like a stack of plates at a buffet line. The main things you will do are "push" plates onto the top of the stack (i.e. add new plates) or "pop" plates off the stack (i.e. take some plates off the top). This is where the term comes from. However, just like a real stack of plates, you can pull one out of the middle if you really need to. More on that later.

Program stacks in most languages actually serve two purposes, and they typically interleave the data (plates) related to both purposes in the same stack space. The first purpose is the call stack. This is basically like a history of what functions you have called in the program, except instead of listing everything you've ever done, it just lists the sequence of calls that got to where you are. For instance, suppose function A calls function B, then B finishes, and after that, A calls C.

The call stack will look like this: (forgive me for making the stack horizontal instead of vertical!)

A is called: A
A calls B: A B
B finishes: A
A calls C: A C

You may also get deeper stacks; suppose C calls D which then calls E:

A C D E

But notice that B isn't on the stack here. That's because B is already done before C starts; again, the stack doesn't store a complete timeline of everything, just what is currently "active."
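To make the push/pop order concrete, here is a toy model of that call stack in C#. It is only a sketch: it keeps the function names in a Stack<string> and records a snapshot after every push or pop. The class and method names are made up for illustration, and the CLR's real frames hold much more than a name.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the call stack from the A/B/C/D/E example.
// It only mimics the push/pop order; the CLR's real frames hold far more.
public static class CallStackModel
{
    static readonly Stack<string> frames = new Stack<string>();

    // One snapshot per push or pop, written bottom-first, e.g. "A C D E".
    public static readonly List<string> Snapshots = new List<string>();

    static void Enter(string name) { frames.Push(name); Record(); } // call = push
    static void Exit() { frames.Pop(); Record(); }                 // return = pop

    static void Record()
    {
        // Stack<T> enumerates top-first, so reverse it to list bottom-first.
        var bottomFirst = new List<string>(frames);
        bottomFirst.Reverse();
        Snapshots.Add(string.Join(" ", bottomFirst));
    }

    public static void A() { Enter("A"); B(); C(); Exit(); }
    static void B() { Enter("B"); Exit(); }
    static void C() { Enter("C"); D(); Exit(); }
    static void D() { Enter("D"); E(); Exit(); }
    static void E() { Enter("E"); Exit(); }

    public static void Main()
    {
        A();
        foreach (var s in Snapshots) Console.WriteLine(s);
    }
}
```

Running it prints A, A B, A, A C, A C D, A C D E, and then the stack unwinds back down. B never appears alongside C, because B's frame was popped before C was called.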

Each letter (function) on the call stack is referred to as a frame.


Clear as mud? :-)


The next part is related to stack variables. A stack variable is, in a nutshell, a variable that lives precisely as long as the stack frame in which it is created.

That probably doesn't mean much yet, so let's look at our call stack from a minute ago: A C D E.

Suppose in function A we have a variable Foo. In function C we have variables Bar and Baz. D has Quux, and E has Xorph. Each of these variables is created ("comes into scope") essentially when its owning function gets called.

When E exits, its stack frame goes away. This means that Xorph will go away too. When D exits, the same happens to Quux. Bar and Baz will die when C returns, and when A finally gets done with its business, Foo will go away. So this means that we can't legitimately use the value in, say, Quux from, say, function A; because in function A, either we're before the call to C (and therefore before the call to D, and therefore before the creation of the stack frame where Quux lives), or we're after the call to C (and therefore, after the call to D finishes, and thus after the destruction of the stack frame where Quux lives).
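A small C# sketch of the "lives exactly as long as its frame" idea: D's local (called quux here, after the example) is created fresh on every call, so its old value never carries over, while a static field is not tied to any frame and keeps counting. The names are illustrative only, and this shows lifetime, not where the JIT physically keeps the value (it may well sit in a register).

```csharp
using System;

public static class LocalLifetime
{
    static int callCount;          // field: not tied to any stack frame

    // Stands in for function D from the example; quux is its local.
    public static int D()
    {
        int quux = 0;              // created fresh each time D is called
        quux++;
        callCount++;
        return quux;               // quux is gone once D returns
    }

    public static int CallCount => callCount;

    public static void Main()
    {
        Console.WriteLine(D());          // 1
        Console.WriteLine(D());          // 1 again: a brand-new quux
        Console.WriteLine(CallCount);    // 2: the field was never reset
    }
}
```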

However, C, D, or E can work with Foo if A hands it down to them (for example, as a "ref" parameter); that's safe because Foo "outlives" all of those functions on the stack. This is a partial peek at how "out" and "ref" work in C#, for instance.
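Here is a minimal C# sketch of that, assuming A simply passes Foo down the chain by ref. Each ref parameter is an alias for A's variable rather than a copy, so E's write lands in A's frame. The opposite direction is blocked by the compiler: C# will not let a method return a ref to one of its own locals, precisely because that local dies with the frame.

```csharp
using System;

public static class RefChain
{
    public static int A()
    {
        int foo = 1;        // lives in A's frame
        C(ref foo);         // safe: A's frame outlives C, D and E
        return foo;         // sees the change E made
    }

    static void C(ref int foo) => D(ref foo);
    static void D(ref int foo) => E(ref foo);
    static void E(ref int foo) => foo += 41;   // writes straight into A's foo

    public static void Main() => Console.WriteLine(A());   // 42
}
```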



So! Hopefully that clarifies the stack a bit. Now on to the heap.

You can think of the heap as a giant pile of stuff. It is separate from the stack, and therefore the existence of things in the heap has nothing to do with stack frames or plates or buffet lines. You can just throw stuff out there, and in C#, once nothing refers to it any more, it will eventually go away automatically. (This is what's referred to as "garbage collection.")

Why is that useful?

Well, suppose we want E to do something that A can work with. Either E has to return a value all the way back through D and then C and then up to A, or we just stuff it on the heap. Since objects on the heap can live for as long as we want to use them, we don't have to worry about the object disappearing out from under us when the stack frame of E is cleaned up.

More concretely: what if we want to do something in B and then use it from E?

Our options are limited. If we stick to doing everything on the stack, then we have to return the value from B, then pass it as a parameter to C, then D, then finally E. Obviously in a real program that's going to get messy really quick! So instead, we use the heap, and B can now share data with E without all the icky plumbing.
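A rough C# sketch of the idea, using a hypothetical Notes class: B allocates an object on the heap and returns a reference to it, and that object survives after B's frame is popped. E still needs some way to find the object (here the reference is just passed along), but only the small reference travels, never the object itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only.
public class Notes
{
    public List<string> Lines = new List<string>();
}

public static class HeapSharing
{
    public static Notes A()
    {
        Notes notes = B();   // B's frame is gone now, but the object is not
        C(notes);
        return notes;
    }

    static Notes B()
    {
        var notes = new Notes();           // allocated on the heap
        notes.Lines.Add("written in B");
        return notes;                      // hands back the reference, not a copy
    }

    static void C(Notes n) => D(n);
    static void D(Notes n) => E(n);
    static void E(Notes n) => n.Lines.Add("written in E");   // same object B made

    public static void Main()
    {
        foreach (var line in A().Lines) Console.WriteLine(line);
    }
}
```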


The final piece of the puzzle is reference types and value types. A value type is easier to think about, so we'll start there: every time you do something with a value type, you get a photocopy of it. Not the original, not instructions for how to find the original, but a copy. The copy is indistinguishable, but it's a distinct entity.

Reference types are like Post Office boxes. Instead of saying, "here's a giant package for you," we say, "here's the PO Box where you can find your package." Telling someone a PO Box number is a lot easier than photocopying the package all over the place!

In C#, class objects are always reference types. Structs, by contrast, are always value types.
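The photocopy vs. PO Box difference shows up as soon as you assign one variable to another. In this sketch, PointStruct and PointClass are hypothetical types with identical fields; the only difference is struct vs. class.

```csharp
using System;

public struct PointStruct { public int X; }   // value type
public class PointClass { public int X; }     // reference type

public static class CopyDemo
{
    public static (int structOriginal, int classOriginal) Run()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;        // photocopy: s2 is a separate value
        s2.X = 99;          // only the copy changes

        var c1 = new PointClass { X = 1 };
        var c2 = c1;        // copies the "PO Box number", not the package
        c2.X = 99;          // both variables see this change

        return (s1.X, c1.X);
    }

    public static void Main()
    {
        var (s, c) = Run();
        Console.WriteLine($"struct original: {s}");   // 1
        Console.WriteLine($"class original: {c}");    // 99
    }
}
```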


So how do value/reference types and stack/heap storage converge? The rule of thumb is that value types most often live on the stack, and reference types most often live on the heap. There are exceptions, but they only matter if you're really pedantic, so if your head is hurting at this point, don't worry about it for now ;-)



Hope that helps!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Postscript: I've cleaned out some unhelpful stuff from this thread. Please keep it respectful, polite, and on-topic.

Thanks all!

"The stack" is a section of a process's memory space that is used to hold local variables and information about the functions you are calling, such as their return address. It is so named because it operates like a traditional stack ADT. Local variables are pushed onto the stack when a function is called, and popped back off when the function returns. Memory accessed this way is relative to the currently executing function, and thus its address (or rather the relative offset) is known ahead of time. This becomes important later.

"The heap" is another section of memory that contains all other data allocated by the program. It is so named because some implementations use (or used) a heap data structure to organize the memory (though that's not required, which is why it is also commonly referred to as the "free store"). The precise address of an allocation is not known in advance, since there is no rigid structure to the heap and memory addresses can change each time the program runs, which means the memory can only be accessed indirectly (such as through a pointer or a reference).

In C# (and the CLR), there are two different kinds of types: reference types and value types. Value types, as you may expect from their name, act like values. They directly contain their data in a contiguous block of memory. When you have a local variable that's a value type, its data is (typically) placed directly onto the stack and the method's instructions operate on that memory directly. When they are a member of another type, their data is placed inline inside the memory block for that type.

Reference types, on the other hand, always store their data on the heap. Since accessing memory on the heap is done indirectly, a reference type variable only stores the address where the actual data can be found. A reference-type local variable takes up only enough stack memory to hold that address (and a reference-type member takes up only that much space inside its containing object); the address points to a block of heap memory that contains the object's actual data. If that data happens to include value types (such as integers, characters, or user-created value types), those will also end up located on the heap, since value type memory is accessed directly and is placed inline wherever it is declared.
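A short C# sketch of "value types are placed inline wherever they are declared", using hypothetical Vec2 and Sprite types. The Vec2 stored in a Sprite lives inside that Sprite's heap block, so two references to the Sprite see the same Vec2; copying the Vec2 out into a local, however, produces an independent copy.

```csharp
using System;

public struct Vec2 { public int X, Y; }          // value type
public class Sprite { public Vec2 Position; }    // Position lives inline in the object

public static class InlineDemo
{
    public static (int seenThroughA, int afterLocalEdit, int localCopy) Run()
    {
        var a = new Sprite();     // one heap object, with room for Position inside it
        var b = a;                // a second reference to that same object
        b.Position.X = 5;         // edits the Vec2 in place, inside the heap object
        int seen = a.Position.X;  // 5: a and b share one object

        Vec2 local = a.Position;  // copies the Vec2 out into a local variable
        local.X = 100;            // changes only the copy

        return (seen, a.Position.X, local.X);
    }

    public static void Main() => Console.WriteLine(Run());   // (5, 5, 100)
}
```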

There is one further concept to understand. Since .NET supports a unified type system, meaning all types derive from a common base type (System.Object) which happens to be a reference type, there must be a mechanism to allow value types to be treated as reference types for the purposes of polymorphism. This mechanism is called boxing, and works by internally creating a reference type, similar to the MyInt type in your example, that will hold the value type. This box can then be treated as a normal reference type, and later "unboxed" back into a value type by copying the memory indirectly referred to by the reference directly into a new block of value type memory, either on the stack as a local variable or on the heap as part of another reference type. Note that since this is a copy, the data in the box does not reflect any changes made to the unboxed value. They are in separate areas of memory.
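A minimal C# sketch of boxing and unboxing. Both directions are copies, so changing the original int or the unboxed int leaves the boxed value untouched.

```csharp
using System;

public static class BoxingDemo
{
    public static (int inBox, int unboxedCopy, int original) Run()
    {
        int value = 42;
        object box = value;       // boxing: 42 is copied into a new heap object
        value = 7;                // changes the local only; the box still holds 42

        int unboxed = (int)box;   // unboxing: the boxed data is copied back out
        unboxed += 1;             // changes the copy only; the box still holds 42

        return ((int)box, unboxed, value);
    }

    public static void Main() => Console.WriteLine(Run());   // (42, 43, 7)
}
```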
Mike Popoloski | Journal | SlimDX

I'll assume you're familiar with terminology such as method/function call, what a method/function is, and the concept of arguments.

Well, the stack and the heap are "places" where your computer stores data. Some of this data is created and manipulated by your code (variables!), while some of the data is taken care of by the .NET runtime. For the first part, I'm going to ignore the heap - it basically doesn't exist.

The stack stores function call information. The thing at the very, very bottom of the stack (in C#) is your application's entry point - the Main method. The reason the stack is called the stack is because new data is "stacked" on top of previous data. So if the only thing your Main method did is System.Console.WriteLine(), your stack would have two functions on it - Main, and on top of that, System.Console.WriteLine().

Now, let's say you're creating a Hello World program. So your Main method looks something like System.Console.WriteLine("Hello World!"). Your stack still has the functions on it, but it also has the argument - so at the bottom of the stack is the function Main, then the function System.Console.WriteLine, and at the very top, a reference to the string "Hello World!". (The string's characters themselves live on the heap, because string is a reference type; only the reference is passed along.)

This also works if you use more involved functions with more arguments - at the bottom of the stack is the function Main, then the function that you called, then all of the arguments one by one.

That's great, but most programs aren't just one function call inside Main, right? I mean, you declare variables and stuff. The thing about variables is that they also go on the stack - assuming you don't create any classes. Structs (including primitives such as int, float, double, byte, etc.) are generally allocated on the stack, but if they're member variables of a class they are allocated on the heap - I'll touch on that later. Note that string is not in that list: it is a class, so its data always lives on the heap.
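If you're ever unsure which category a type belongs to, System.Type has an IsValueType property you can check. This little sketch wraps it in a hypothetical IsValue helper; note that string reports False, because it is a class.

```csharp
using System;

public static class KindCheck
{
    // True for structs (including primitives and enums), false for classes.
    public static bool IsValue<T>() => typeof(T).IsValueType;

    public static void Main()
    {
        Console.WriteLine($"int:    {IsValue<int>()}");      // True
        Console.WriteLine($"double: {IsValue<double>()}");   // True
        Console.WriteLine($"byte:   {IsValue<byte>()}");     // True
        Console.WriteLine($"string: {IsValue<string>()}");   // False
        Console.WriteLine($"object: {IsValue<object>()}");   // False
    }
}
```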

There's one other thing about the stack: it grows and shrinks over time. So remember the Hello World example? When the program finishes writing text to the console, the stack will only contain the function Main - both the function System.Console.WriteLine and the string argument will have been popped off the stack. If the method you just called returned a value, that value gets added to the stack where the function call used to be.

One way of visualizing the stack is like a tower of books (or magazines or comics :P). You start with one book - let's say it's Learn C# in 30 Days, or some variant thereof. You start reading it, and partway through it refers to another book (calls a function) about game programming. So you go and check out the book from your local library. You stick a bookmark in Learn C# in 30 Days and put it on your floor. You start reading the game programming book, and partway through you find another interesting book about game AI (another function call!) and you go check that out from the library. So you take the game programming book, put a bookmark in it, and stack it on top of the Learn C# in 30 Days book. And so on. Eventually, you finish a book. So you take the topmost book off your stack of books - let's say it's the game AI one - and continue reading it. Eventually, you finish reading it. Then you get back to the game programming one, continue reading... and decide you want to learn physics (even more function calls). Back to the library! And your stack of books grows and shrinks over time, until at last you finish the last page of your Learn C# in 30 Days book (the Main method ends and your program exits).

Well, that's a quick overview of how the stack works. Now, onwards to the heap!

The heap is actually just a bunch of memory - that is, places where you can store data. Why do we need the heap when we have the stack? Because the stack is limited in size. In C#, the stack is (I believe) 1 megabyte per thread by default on Windows. By the way, that's why you get StackOverflowExceptions - there were just too many books in a pile and eventually you hit the ceiling so you couldn't pile on any more books. The heap is also limited in size, but it's much, much larger. Its ceiling depends mostly on the process's address space rather than on how much RAM you have installed: a 32-bit process typically gets about 2 gigabytes of user address space (more if it is marked large-address-aware), while a 64-bit process can go far beyond that, limited in practice by available RAM and page file.
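Because each thread's stack has a fixed size, deep recursion is what usually runs into that ceiling. One thing you can do in C# is ask for a bigger stack when you create a thread: the Thread(ThreadStart, int maxStackSize) constructor takes a size in bytes. This is a sketch only; the 64 MB figure and the recursion depth are arbitrary choices for illustration, and whether 100,000 frames would fit in the default stack depends on the platform and build.

```csharp
using System;
using System.Threading;

public static class StackSizeDemo
{
    // Not tail-recursive: every call pushes another frame on the thread's stack.
    static long Sum(int n) => n == 0 ? 0 : n + Sum(n - 1);

    public static long SumOnBigStack(int n)
    {
        long result = 0;
        // Second argument: maximum stack size, in bytes, for the new thread.
        var worker = new Thread(() => result = Sum(n), 64 * 1024 * 1024);
        worker.Start();
        worker.Join();     // wait for the worker to finish before reading result
        return result;
    }

    public static void Main() => Console.WriteLine(SumOnBigStack(100_000));   // 5000050000
}
```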

There's one other big difference between the stack and the heap: the heap grows, while the stack's capacity doesn't change (in C#). What this means is that if you're creating a program that manages half a megabyte of data, your heap will not be that big. On the other hand, if you have a gigabyte of art and assets, your heap is going to be fairly large. And it'll grow from its initially small size over time as you allocate more and more memory (use up more and more of the capacity) to meet your needs - at least until it runs into whatever limit the process has. When an allocation can't be satisfied, .NET throws an OutOfMemoryException.

You use the heap for reference types. So that really means all classes. In the example you posted, MyInt is a class, and therefore uses the heap. What happens is that you allocate the memory necessary for the object on the heap (on a 32-bit machine, 8 bytes of per-object runtime overhead plus 4 bytes for the actual data) and create a 4-byte reference to the instance on the stack (on a 64-bit machine, the overhead is 16 bytes, i.e. 2 times 64 bits, and the reference is 8 bytes, i.e. 64 bits). Then, whenever you access the member data, the code reads the reference from the stack and adds the field's offset to it to find the data on the heap. So that's where it gets confusing: the reference you're working with is on the stack, but the actual data you're manipulating is on the heap.
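Here is a small C# sketch of that. Since I can't reproduce the pastebin, the MyInt class below is a hypothetical stand-in with a single int field named Value. Copying x into y copies only the reference, and IntPtr.Size reports the pointer size of the running process, which is also the size of a reference.

```csharp
using System;

// Hypothetical stand-in for the MyInt class from the question.
public class MyInt
{
    public int Value;
}

public static class ReferenceDemo
{
    public static void Main()
    {
        MyInt x = new MyInt();   // object on the heap; x holds only its address
        MyInt y = x;             // copies the address, not the object
        y.Value = 10;            // follow the address, write at Value's offset

        Console.WriteLine(x.Value);                 // 10
        Console.WriteLine(ReferenceEquals(x, y));   // True
        // A reference is pointer-sized: 4 bytes in a 32-bit process, 8 in a 64-bit one.
        Console.WriteLine($"reference size: {IntPtr.Size} bytes");
    }
}
```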

There's quite a bit more you can learn about the stack and heap. Chiefly, speed concerns, how structs and classes behave as member variables in terms of memory and speed, the .NET garbage collector and the hardware caching of memory. Hopefully my (not-so-brief) summary of the stack and heap helped you a bit; if not, just mention what you're not clear on or what confuses you and I'll do my best to clear things up or provide links to alternative ways of explaining the concept.
Thanks for the detailed explanations guys. I am slowly starting to understand it now (I'm not fully THERE yet). But after a few more read-throughs and such I guarantee I will start to get it. (I have a bad habit of having something stump the heck out of me, I give it a day or more, I come back, and it hits me like a ton of bricks.)

Once again, thanks guys.

If I'm allowed to, I'd also like to thank the guys who took the time to write this all up -- this is a really clear explanation of the theory, and although I wasn't particularly searching for this topic, I learned something valuable today (and moreover, understood it!)

Thanks guys and OP for making this thread!
Disclaimer: Each of my posts is intended as an attempt at helping and/or bringing some meaningful insight to the topic at hand. Due to my nature, my good intentions will not always be plainly visible. I apologise in advance and assure you I mean no harm and do not intend to insult anyone, unless stated otherwise.

Homepage (Under Construction)

Check my profile for funny D&D/WH FRP quotes :)

This topic is closed to new replies.
