Can someone give me the exact definition/usage of Stack/Heap and Reference Types in C#?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

#1 Bill Fountaine   Members   -  Reputation: 193


Posted 27 October 2011 - 07:45 PM

I have gone through a million books, tutorials, videos, researched. And it's all still just giving me a headache.

I've read that a lot of books/etc don't give a GOOD representation of what is actually happening. Meaning I am just reading junk that isn't entirely true.

As for what I seemingly don't understand, it's the whole concept of stack/heap. All I know is that it's about memory management, that is it.

As for Reference Types, I was in the process of learning about this and then the whole stack/heap thing came into play and just confused the hell out of me.

http://pastebin.com/7mn2394j

I somewhat get that X is holding a reference to MyInt. It's just that the whole stack/heap thing has my head spinning.




#2 ApochPiQ   Moderators   -  Reputation: 16392


Posted 27 October 2011 - 08:10 PM

Analogy time :-)

A stack is a way of storing data. It works exactly like a stack of plates at a buffet line. The main things you will do are "push" plates onto the top of the stack (i.e. add new plates) or "pop" plates off the stack (i.e. take some plates off the top). This is where the term comes from. However, just like a real stack of plates, you can pull one out of the middle if you really need to. More on that later.

Program stacks in most languages actually serve two purposes, and they typically interleave the data (plates) related to both purposes in the same stack space. The first purpose is the call stack. This is basically like a history of what functions you have called in the program, except instead of listing everything you've ever done, it just lists the sequence of calls that got to where you are. For instance, suppose function A calls function B, then B finishes, and after that, A calls C.

The call stack will look like this: (forgive me for making the stack horizontal instead of vertical!)

A is called: A
A calls B: A B
B finishes: A
A calls C: A C

You may also get deeper stacks; suppose C calls D which then calls E:

A C D E

But notice that B isn't on the stack here. That's because B is already done before C starts; again, the stack doesn't store a complete timeline of everything, just what is currently "active."

Each letter (function) on the call stack is referred to as a frame.
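
If it helps to watch the frames come and go, here is a minimal sketch. The `Enter`/`Leave` helpers and the `Active` stack are hypothetical bookkeeping that only mimics what the runtime already does for you behind the scenes; it is not how the CLR tracks calls internally.

```csharp
using System;
using System.Collections.Generic;

static class CallStackDemo
{
    // Hypothetical bookkeeping that mirrors the real call stack.
    static readonly Stack<string> Active = new Stack<string>();
    public static readonly List<string> Log = new List<string>();

    static void Enter(string name) { Active.Push(name); Snapshot(); }
    static void Leave() { Active.Pop(); Snapshot(); }

    static void Snapshot()
    {
        // Stack<T> enumerates top-first, so reverse it to list the bottom frame first.
        string[] frames = Active.ToArray();
        Array.Reverse(frames);
        Log.Add(string.Join(" ", frames));
    }

    public static void A() { Enter("A"); B(); C(); Leave(); }
    static void B() { Enter("B"); Leave(); }
    static void C() { Enter("C"); D(); Leave(); }
    static void D() { Enter("D"); E(); Leave(); }
    static void E() { Enter("E"); Leave(); }

    static void Main()
    {
        A();
        foreach (string line in Log) Console.WriteLine(line);
    }
}
```

The log passes through "A B", back to "A", and later reaches "A C D E"; B never shows up alongside C because its frame is already gone by then.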


Clear as mud? :-)


The next part is related to stack variables. A stack variable is, in a nutshell, a variable that lives precisely as long as the stack frame in which it is created.

That probably doesn't mean much yet, so let's look at our call stack from a minute ago: A C D E.

Suppose in function A we have a variable Foo. In function C we have variables Bar and Baz. D has Quux, and E has Xorph. Each of these variables is created ("comes into scope") essentially when its owning function gets called.

When E exits, its stack frame goes away. This means that Xorph will go away too. When D exits, the same happens to Quux. Bar and Baz will die when C returns, and when A finally gets done with its business, Foo will go away. So this means that we can't legitimately use the value in, say, Quux from, say, function A; because in function A, either we're before the call to C (and therefore before the call to D, and therefore before the creation of the stack frame where Quux lives), or we're after the call to C (and therefore, after the call to D finishes, and thus after the destruction of the stack frame where Quux lives).

However, we could talk to Foo from any of C, D, or E; this is because Foo "outlives" all of those functions on the stack. This is a partial peek at how "out" and "ref" work in C#, for instance.
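
A rough sketch of that idea, reusing the function names from above (the arithmetic is made up purely so there is something to observe): A owns `foo`, and C, D and E each modify it through `ref` while A's frame is still alive.

```csharp
using System;

static class RefDemo
{
    public static int A()
    {
        int foo = 1;   // lives in A's stack frame
        C(ref foo);    // A's frame outlives the calls to C, D and E
        return foo;
    }

    // Each callee works on A's variable directly, not on a copy.
    static void C(ref int foo) { foo += 10; D(ref foo); }
    static void D(ref int foo) { foo += 100; E(ref foo); }
    static void E(ref int foo) { foo += 1000; }

    static void Main()
    {
        Console.WriteLine(A()); // prints 1111
    }
}
```

Going the other way is not possible: E cannot hand A a `ref` to one of E's own locals, because that local dies with E's frame.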



So! Hopefully that clarifies the stack a bit. Now on to the heap.

You can think of the heap as a giant pile of stuff. It is separate from the stack, and therefore the existence of things in the heap has nothing to do with stack frames or plates or buffet lines. You can just throw stuff out there, and in C#, when you're done using it, it will go away automatically. (This is what's referred to as "garbage collection.")

Why is that useful?

Well, suppose we want E to do something that A can work with. Either E has to return a value all the way back through D and then C and then up to A, or we just stuff it on the heap. Since objects on the heap can live for as long as we want to use them, we don't have to worry about the object disappearing out from under us when the stack frame of E is cleaned up.

More concretely: what if we want to do something in B and then use it from E?

Our options are limited. If we stick to doing everything on the stack, then we have to return the value from B, then pass it as a parameter to C, then D, then finally E. Obviously in a real program that's going to get messy really quick! So instead, we use the heap, and B can now share data with E without all the icky plumbing.
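
Here is one way that could look, as a sketch only. `Report` and the `current` field are hypothetical names; the point is just that the object created in B survives B's frame because something long-lived still refers to it.

```csharp
using System;

sealed class Report   // hypothetical class, allocated on the heap
{
    public string Text = "";
}

static class HeapShareDemo
{
    // A long-lived reference keeps the heap object reachable.
    static Report current = new Report();

    // B's frame disappears when B returns, but the Report it created does not.
    static void B() { current = new Report { Text = "made in B" }; }

    static string C() => D();
    static string D() => E();
    static string E() => current.Text + ", read in E";

    public static string A() { B(); return C(); }

    static void Main()
    {
        Console.WriteLine(A()); // prints "made in B, read in E"
    }
}
```

In real code you would usually pass the reference around or hold it in an object rather than a static field; the static is only there to keep the sketch short.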


The final piece of the puzzle is reference types and value types. A value type is easier to think about, so we'll start there: every time you do something with a value type, you get a photocopy of it. Not the original, not instructions for how to find the original, but a copy. The copy is indistinguishable, but it's a distinct entity.

Reference types are like Post Office boxes. Instead of saying, "here's a giant package for you," we say, "here's the PO Box where you can find your package." Telling someone a PO Box number is a lot easier than photocopying the package all over the place!

In C#, class objects are always reference types. Structs, by contrast, are always value types.
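
To make the photocopy vs. PO Box difference concrete, here is a small sketch with two made-up types, `PointValue` (a struct) and `PointRef` (a class):

```csharp
using System;

struct PointValue { public int X; }   // value type
class PointRef { public int X; }      // reference type

static class CopyDemo
{
    public static int ValueCopy()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // b is a photocopy of a
        b.X = 99;           // changes only the copy
        return a.X;         // still 1
    }

    public static int ReferenceCopy()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // b holds the same "PO Box number" as a
        b.X = 99;           // changes the one shared object
        return a.X;         // now 99
    }

    static void Main()
    {
        Console.WriteLine(ValueCopy());     // prints 1
        Console.WriteLine(ReferenceCopy()); // prints 99
    }
}
```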


So how do value/reference types and stack/heap types converge? The rule of thumb is that value types most often live on the stack, and reference types most often live on the heap. There are exceptions, but they only matter if you're really pedantic, so if your head is hurting at this point, don't worry about it for now ;-)



Hope that helps!

#3 ApochPiQ   Moderators   -  Reputation: 16392


Posted 27 October 2011 - 08:18 PM

Postscript: I've cleaned out some unhelpful stuff from this thread. Please keep it respectful, polite, and on-topic.

Thanks all!

#4 Mike.Popoloski   Crossbones+   -  Reputation: 2931


Posted 27 October 2011 - 08:34 PM

"The stack" is a section of a process's memory space that is used to hold local variables and information about the functions you are calling, such as their return address. It is so named because it operates like a traditional stack ADT. Local variables are pushed onto the stack when a function is called, and popped back off when the function returns. Memory accessed this way is relative to the currently executing function, and thus its address (or rather the relative offset) is known ahead of time. This becomes important later.

"The heap" is another section of memory that contains all other data allocated by the program. It is so named because some implementations use (or used) a heap data structure to organize the memory (though that's not required, which is why it is also commonly referred to as the "free store"). The precise address of the allocation is not known in advance, since there is no rigid structure to the heap and memory addresses can change each time the program runs, which means the memory can only be accessed indirectly (such as through a pointer or a reference).

In C# (and the CLR), there are two different kinds of types: reference types and value types. Value types, as you may expect from their name, act like values. They directly contain their data in a contiguous block of memory. When you have a local variable that's a value type, its data is placed directly onto the stack and the various function instructions operate on that memory directly. When they are a member of another type, their data is placed inline inside the memory block for that type.

Reference types, on the other hand, always store their memory on the heap. Since accessing memory on the heap is done indirectly, the actual reference type variable only stores an address to where the actual data can be found in memory. Local variables and members that are reference types only take up enough stack memory to hold that address, which will point to a block of heap memory that contains the actual type's data. If the data of the type happens to contain other value types (such as integers, characters, or user-created value types), those will also end up located on the heap, since value type memory is accessed directly and is placed inline wherever it is declared.
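
A small sketch of the "inline" point, using hypothetical `Vec2` and `Player` types: the struct field is part of the `Player` object's own memory block, so writes through any reference to that object see the same data, while reading the field into a local produces an independent copy.

```csharp
using System;

struct Vec2 { public int X, Y; }   // value type

class Player                       // reference type
{
    public Vec2 Position;          // stored inline inside the Player object
    public int Health;
}

static class InlineDemo
{
    public static (int viaAlias, int afterCopyEdit) Run()
    {
        Player p = new Player();
        Player alias = p;          // second reference to the same object

        p.Position.X = 5;          // writes into the object's memory block

        Vec2 copy = p.Position;    // copies the struct out into a local
        copy.X = 42;               // only the local copy changes

        return (alias.Position.X, p.Position.X);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints (5, 5)
    }
}
```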

There is one further concept to understand. Since .NET supports a unified type system, meaning all types derive from a common base type (System.Object) which happens to be a reference type, there must be a mechanism to allow value types to be treated as reference types for the purposes of polymorphism. This mechanism is called boxing, and works by internally creating a reference type, similar to the MyInt type in your example, that will hold the value type. This boxed type can then be treated as a normal reference type, and later "unboxed" back into a value type by copying the memory indirectly referred to by the reference type directly into a new block of value type memory, either on the stack as a local variable or on the heap as part of another reference type. Note that since this is a copy, the data in the box does not reflect any changes made to the unboxed value type. They are in separate areas of memory.
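
As a quick illustration of that last point (a sketch, with arbitrary numbers): boxing copies the value into a new object, and unboxing copies it back out, so neither side sees later changes to the other.

```csharp
using System;

static class BoxingDemo
{
    public static (int boxedValue, int local) Run()
    {
        int number = 10;
        object box = number;      // boxing: 10 is copied into a new heap object
        number = 20;              // changes the local only, not the box

        int unboxed = (int)box;   // unboxing: the boxed 10 is copied back out
        unboxed += 1;             // changes the new copy only

        return ((int)box, number);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints (10, 20)
    }
}
```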

#5 GGulati   Members   -  Reputation: 109


Posted 27 October 2011 - 09:19 PM

I'll assume you're familiar with terminology such as method/function call, what a method/function is, and the concept of arguments.

Well, the stack and the heap are "places" where your computer stores data. Some of this data is created and manipulated by your code (variables!), while some of the data is taken care of by the .NET runtime. For the first part, I'm going to ignore the heap - it basically doesn't exist.

The stack stores function call information. The thing at the very, very bottom of the stack (in C#) is your application's entry point - the Main method. The reason the stack is called the stack is because new data is "stacked" on top of previous data. So if the only thing your Main method did is System.Console.WriteLine(), your stack would have two functions on it - Main, and on top of that, System.Console.WriteLine().
[Image: part1.png]

Now, let's say you're creating a Hello World program. So your Main method looks something like System.Console.WriteLine("Hello World!"). Your stack still has the functions on it, but it also has the argument on the stack - so at the bottom of the stack is the function Main, then the function System.Console.WriteLine, and at the very top, the argument "Hello World!" (strictly speaking a reference to the string, since string is a reference type and its characters live on the heap).
[Image: part2.png]

This also works if you use more involved functions with more arguments - at the bottom of the stack is the function Main, then the function that you called, then all of the arguments one by one.

That's great, but most programs aren't just one function call inside Main, right? I mean, you declare variables and stuff. The thing about variables is that they also go on the stack - assuming you don't create any classes. Structs (including primitives such as int, float, double, byte, etc) are generally allocated on the stack, but if they're member variables of a class they are allocated on the heap - I'll touch on that later. (string is the odd one out here: it is a class, so only the reference to it sits on the stack.)

There's one other thing about the stack: it grows and shrinks over time. So remember the Hello World example? When the program finishes writing text to the console, the stack will only contain the function Main - both the function System.Console.WriteLine and the string argument will have been popped off the stack. If the method you just called returned a value, that value gets added to the stack where the function call used to be.
[Image: part3.png]

One way of visualizing the stack is like a tower of books (or magazines or comics :P). You start with one book - let's say it's Learn C# in 30 Days, or some variant thereof. You start reading it, and partway through it refers to another book (calls a function) about game programming. So you go and check out the book from your local library. You stick a bookmark in Learn C# in 30 Days and put it on your floor. You start reading the game programming book, and partway through you find another interesting book about game AI (another function call!) and you go check that out from the library. So you take the game programming book, put a bookmark in it, and stack it on top of the Learn C# in 30 Days book. And so on. Eventually, you finish a book. So you take the topmost book off your stack of books - let's say it's the game AI one - and continue reading it. Eventually, you finish reading it. Then you get back to the game programming one, continue reading... and decide you want to learn physics (even more function calls). Back to the library! And your stack of books grows and shrinks over time, until at last you finish the last page of your Learn C# in 30 Days book (the Main method ends and your program exits).

Well, that's a quick overview of how the stack works. Now, onwards to the heap!

The heap is actually just a bunch of memory - that is, places where you can store data. Why do we need the heap when we have the stack? Because the stack is limited in size. In C#, the stack is (I believe) 1 megabyte per thread. By the way, that's why you get StackOverflowExceptions - there were just too many books in a pile and eventually you hit the ceiling so you couldn't pile on any more books. The heap is also limited in size, but it's much, much larger. On a computer with 2 gigabytes of RAM, the heap might be up to 1 gigabyte of memory, and if your computer had 4 gigabytes of RAM the heap might be as large as 3 gigabytes.
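
If you want to poke at the stack limit without crashing your program, one option is to run deep recursion on a thread that you create with a larger stack. This is only a sketch: the 64 MB figure and the recursion depth are arbitrary choices, and the actual amount of stack each call uses depends on the runtime and build settings. (Deliberately triggering a StackOverflowException is not a good experiment, since the runtime terminates the process rather than letting you catch it.)

```csharp
using System;
using System.Threading;

static class StackSizeDemo
{
    // Not tail-recursive: every call keeps its frame until the inner call returns.
    static int Depth(int n) => n == 0 ? 0 : 1 + Depth(n - 1);

    public static int RunDeep(int depth, int stackBytes)
    {
        int result = -1;
        // Thread(ThreadStart, int maxStackSize) requests a specific stack size.
        var worker = new Thread(() => result = Depth(depth), stackBytes);
        worker.Start();
        worker.Join();
        return result;
    }

    static void Main()
    {
        // 100,000 nested calls on a thread with a 64 MB stack.
        Console.WriteLine(RunDeep(100_000, 64 * 1024 * 1024));
    }
}
```

The same recursion on a default-sized thread may or may not fit, which is exactly the "pile of books hitting the ceiling" situation described above.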

There's one other big difference between the stack and the heap: the heap grows, while the stack's capacity doesn't change (in C#). What this means is that if you're creating a program that manages half a megabyte of data, your heap will not be that big. On the other hand, if you have a gigabyte of art and assets, your heap is going to be fairly large. And it'll grow from its initially small size over time as you allocate more and more memory (use up more and more of the capacity) to meet your needs - at least until the .NET framework tells it that it's already used up all 3 gigabytes (or whatever). When the heap runs out of memory, .NET throws an OutOfMemoryException.

You use the heap for reference types. So that really means all classes. In the example you posted, MyInt is a class, and therefore uses the heap. What happens is that you allocate the memory necessary for the class on the heap (8 bytes of C#'s overhead and 4 bytes for the actual data) and create a 4-byte reference to an instance on the stack (if you're on a 64 bit machine, the overhead is 16 bytes aka 2 times 64 bits and the reference is 8 bytes aka 64 bits). Then, whenever you access the member data, the code is internally doing some pointer math... on the 4 byte reference on the stack. So that's where it gets confusing. You're doing pointer math on the stack, but the actual data you're manipulating is on the heap.
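
I can't see the pastebin anymore, so the `MyInt` below is only a guessed stand-in (a class with a single int field). The sketch shows two variables holding the same reference, and prints `IntPtr.Size`, which in practice matches the size of a reference (4 in a 32-bit process, 8 in a 64-bit one), though the language itself doesn't promise that.

```csharp
using System;

class MyInt            // hypothetical stand-in for the class in the original post
{
    public int Value;
}

static class ReferenceDemo
{
    public static bool SameObject()
    {
        MyInt x = new MyInt();   // object on the heap, reference in x
        MyInt y = x;             // copies the reference, not the object
        y.Value = 4;
        // Both variables point at one object, so x sees the change too.
        return ReferenceEquals(x, y) && x.Value == 4;
    }

    static void Main()
    {
        Console.WriteLine(SameObject());   // prints True
        Console.WriteLine(IntPtr.Size);    // pointer size of the current process
    }
}
```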

There's quite a bit more you can learn about the stack and heap. Chiefly, speed concerns, how structs and classes behave as member variables in terms of memory and speed, the .NET garbage collector and the hardware caching of memory. Hopefully my (not-so-brief) summary of the stack and heap helped you a bit; if not, just mention what you're not clear on or what confuses you and I'll do my best to clear things up or provide links to alternative ways of explaining the concept.

#6 Bill Fountaine   Members   -  Reputation: 193


Posted 27 October 2011 - 10:20 PM

Thanks for the detailed explanations guys. I am slowly starting to understand it now (I'm not fully THERE yet). But after a few more readthroughs and such I guarantee I will start to get it. (I have a bad habit of having something stump the heck out of me, I give it a day or more, I come back, and it hits me like a ton of bricks.)

Once again, thanks guys.



#7 Zethariel   Members   -  Reputation: 310


Posted 28 October 2011 - 02:29 AM

If I'm allowed to, I'd also like to thank the guys who took the time to write this all up -- this is a really clear explanation of the theory, and although I wasn't particularly searching for this topic, I learned something valuable today (and moreover, understood it!)

Thanks guys and OP for making this thread!

#8 way2lazy2care   Members   -  Reputation: 782


Posted 28 October 2011 - 07:44 AM

Just a side point, but the wikipedia articles on the stack and the heap are both pretty great, and they have pictures.

#9 Serapth   Crossbones+   -  Reputation: 5755


Posted 28 October 2011 - 10:45 AM

The stack, when it comes to C#, is an implementation detail. It's a thing the JIT'ter uses to optimize your code and not much more.

There are some hard and fast ( and wrong ) rules about what goes on the stack, but more specifically the rules are actually about what doesn't go on the stack. Namely, reference types and classes.

Value types *may* be created on the stack, as may structs, but there is actually no promise this will happen.
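
One case where a value type that looks like a stack local ends up on the heap is a local captured by a lambda. In this sketch (the names are made up), `count` has to outlive `MakeCounter`'s frame, so the compiler moves it into a heap-allocated closure object:

```csharp
using System;

static class ClosureDemo
{
    public static Func<int> MakeCounter()
    {
        int count = 0;           // looks like an ordinary local...
        return () => ++count;    // ...but it is captured, so it cannot live in this frame
    }

    public static int Run()
    {
        Func<int> next = MakeCounter();   // MakeCounter has already returned here
        next();
        next();
        return next();                    // the captured int is still alive: 3
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 3
    }
}
```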


As to the super simple stupid explanation.


Ever been to a cafeteria where the plates are all stacked and when you go through the buffet you grab the top one? Well essentially that is what a stack is in C#. It's a chunk of memory that is set aside for ( the JIT engine to ) optimize access to certain types of variables. One of the advantages of a stack is, you know exactly where everything is, so there are no spaces and gaps and since we are dealing with value types, no unpredictability about when they will expire. This essentially means no garbage collection and very predictable locations, so no performance loss to searching or garbage collecting. This means, generally, the stack is always fastest.


Always fastest, but not always faster. Theoretically the heap can be just as fast, it all really depends on what goes on with memory. In the case of heap, instead of a stack, think of it like a giant array. As you allocate memory, "cells" of that array are used up. Eventually you use up all memory and one of two things occurs, the array is grown or garbage collection occurs, possibly both. Now as you are filling this array with memory, your performance is pretty much the same as the stack, as you are generally assigning your memory to the next top most location available.

Thing is, in time, just like your computer hard drive, this memory gets fragmented. So while the heap is contiguous to start ( and therefore performing comparable to the stack ), as you allocate and free memory "holes" in the contiguous array of memory locations essentially become available, so new allocations re-use old locations, or possibly are split into various different locations that have been made available. Now, when accessing your memory, you essentially don't have just a straight lookup any more, you may have multiple lookups and a search involved. Once this happens, the performance advantage of the stack becomes obvious.


Of course, as Eric Lippert said, this is all an implementation detail. You as the developer have absolutely no real control over what happens, beyond knowing what types explicitly will NOT be created on the stack.


The easiest ( and almost correct ) way to look at it is, the heap is memory as you generally think of it. The stack is a reserved piece of continuous memory reserved by the JIT engine for optimizing types that meet a certain criteria.




For a much more detailed explanation ( on a subject with a lot of misinformation ) from the man who is probably the second most knowledgeable person on C# read here and here.


It will basically tell you everything you needed to know.


I think a source of much of the confusion is the stack in C++, where the programmer had much more implicit control. In C# to be honest, I don't really know why they even made the distinction. They should have probably ignored the concept completely and made it something only the compiler developers were really aware of. Worst case scenario, for those few edge cases where the developer needed to optimize for stack usage, it could have been exposed as an attribute. Actually, there is already the stackalloc keyword, so even this wouldn't be needed.
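
For reference, a minimal stackalloc sketch. It assumes a reasonably recent compiler and runtime (C# 7.2 or later with `Span<T>` available), which lets you use stackalloc without an unsafe block; the buffer size and the arithmetic are arbitrary.

```csharp
using System;

static class StackallocDemo
{
    public static int SumOfSquares()
    {
        // The four ints live in this method's stack frame, not on the heap.
        Span<int> squares = stackalloc int[4];
        for (int i = 0; i < squares.Length; i++)
        {
            squares[i] = i * i;
        }

        int sum = 0;
        foreach (int s in squares)
        {
            sum += s;
        }
        return sum;   // 0 + 1 + 4 + 9 = 14
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares()); // prints 14
    }
}
```

The buffer goes away when the method returns, so it is only suitable for small, short-lived scratch space.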

#10 Kobo   Members   -  Reputation: 128


Posted 28 October 2011 - 11:25 AM

I had to draw a picture of what the stack and the heap would look like after each line of assembly code (made by compiling some c code) was executed for a final one time.

Basically, the stack stuff goes away when the function goes out of scope - all of the variables it declares that don't use the static keyword disappear when it returns.

Stuff on the heap goes away when you release it (when you use delete or free()) or when the process gets judo chopped by the operating system.

Other than that there are some execution speed implications but the main rule is don't allocate and release stuff on the heap any more frequently than you have to (don't do it in loops or little functions that get called all the time).

#11 laztrezort   Members   -  Reputation: 972


Posted 28 October 2011 - 12:08 PM

I think a source of much of the confusion is the stack in C++, where the programmer had much more implicit control. In C# to be honest, I don't really know why they even made the distinction. They should have probably ignored the concept completely and made it something only the compiler developers were really aware of. Worst case scenario, for those few edge cases where the developer needed to optimize for stack usage, it could have been exposed as an attribute. Actually, there is already the stackalloc keyword, so even this wouldn't be needed.


Yes, this tripped me at first when moving from C++ (along with a few other memory related details) - the best thing for me (from a practical viewpoint) was to just forget about the hows and whys of memory in .net and trust the framework to take care of it all. I still get a twinge of, er, guilt maybe?, every now and then when tossing around a ton of instanced objects. Old habits, I suppose.

Not to say these things aren't important to know, and some of the wonderful replies here have filled in the rough spots of my understanding.






