Matt328

[C++] Understanding Pointers, References, etc


I run into this a lot: OK, the function wants a pointer, so do I use the * operator? No, that's not compiling, so maybe the ampersand. No, it's not having that either, so nothing? There we go. I don't know what that was all about, but it seems to be happy now, so move on. This one needs the ampersand to be happy, wtf? Oh well, move on. Some good advice I got: 'Don't just hack sh*t, figure out what's going on and do it right.' Can someone help me confirm I have this all straight?
void someFunction(Object* theObject);

...

Object myObject;
Object* myObjectPtr;
myObjectPtr = new Object();

someFunction(&myObject);
someFunction(myObjectPtr);

delete myObjectPtr;
Strangely enough, I know more about my second Object declaration than I do about the first. I know the second is a pointer type and stores only an address, and from the type we know that at that address we'll find an Object. I understand that the 'new' operator reserves enough memory somewhere for an Object to fit into and returns a pointer containing the address of that memory. The first declaration is performed on the stack. I know surprisingly little about the internal representation of these types of variables beyond what our parrot of a CS teacher in college wanted us to recite on written exams. As such, I don't really use them much, save for relatively small, temporary object declarations.

As for the function calls, the function is set up to receive a pointer to an Object. When you pass variables to a function, the function creates a local copy of the variable to modify. When you pass a straight object and the function modifies the object, it really modifies a copy of the object. So when the function exits, the memory containing the modified copy is popped from the stack, leaving the original object untouched.

The first function call in my source there creates an ad-hoc pointer to the object using the ampersand. Is this called a reference type? The second function call accomplishes the same thing, but just passes the pointer straight across. This allows the original object to be changed: the function can go ahead and make a copy of what's been passed, because it's still just a memory address. When we modify the object in the function we do so through a level of indirection, using the memory address (pointer) to get straight to our object sitting out there on the heap somewhere or, if the pointer was created from an object declared on the stack using the ampersand, on the stack somewhere.

I've been into C++ for quite some time now, and I've decided it's time I stop just 'hacking sh*t' and learn what's going on at a lower level than just what I need to get by.

Let me try and give an explanation.

You have objects and pointers to objects. Objects look like this:

Object object;



A pointer to an object is created like this:

Object *object;



Every piece of memory you create has a physical location, e.g. this piece of memory starts at the 512th byte of your RAM. This location, where the piece of memory you use is stored, is retrieved using &.

So, we just created an object. This object is stored on the RAM, but where? To get the address of anything, use the ampersand:

Object object;
std::cout << &object; // location of object on the RAM



Now what if we have a pointer to an object?

Object *object; // take a piece of memory and fill it with a location on the RAM
std::cout << &object; // take the _location_ of the memory that itself contains a location to somewhere on the RAM



When you have a location on the RAM and want to get the memory it points to, use the * operator:

Object object;
*object; // does not compile (unless Object overloads operator*): object doesn't hold a location, but the object itself!

Object *pointer; // take a piece of memory that can hold a location on the RAM; no location has been stored in it yet
*pointer; // compiles, but not wise either: the pointer was never set to a valid location, so this is undefined behavior
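
To make the two operators concrete, here is a minimal sketch in which the pointer is given a valid location before it is dereferenced. The Object struct with a single value member and the function name read_through_pointer are assumptions made for illustration; they do not come from the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for the thread's Object type.
struct Object {
    int value;
};

// Takes the address of a live object, then follows it back with * and ->.
int read_through_pointer() {
    Object object = {7};        // the object itself
    Object* pointer = &object;  // & yields the object's address
    assert(pointer != nullptr); // the address of a live object is never null
    (*pointer).value += 1;      // * follows the address back to the object
    pointer->value += 1;        // -> is shorthand for (*pointer).
    return object.value;        // both writes landed in the original object
}
```

Calling read_through_pointer() returns 9, because both increments went through the pointer into the one and only object.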




When modifying data and passing data around, it's often better to pass only the location of the piece of memory. For large objects, a location is much smaller to pass around than the data itself.

When a function accepts an Object *object, it wants a pointer (a location pointing to somewhere on the RAM). If it takes an Object object, it wants the actual data, not its location, and it receives its own copy. If it takes an Object &object, it also wants the actual data, but this time it's a little faster since internally it typically passes the pointer, not the actual data. This is transparent to the caller.
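
The three parameter forms can be compared side by side. The following is only a sketch: the Object struct and every function name in it are hypothetical, chosen to show what the caller observes after each kind of call.

```cpp
#include <cassert>

// Hypothetical stand-in for the thread's Object type.
struct Object {
    int value;
};

// Receives its own copy: the caller's object is not affected.
void by_value(Object object) { object.value = 1; }

// Receives a location: the caller passes &object or an existing Object*.
void by_pointer(Object* object) {
    assert(object != nullptr); // a pointer may be null, so check before use
    object->value = 2;
}

// Receives another name for the caller's object: no copy, no & at the call site.
void by_reference(Object& object) { object.value = 3; }

// Each helper reports what the caller sees after the call.
int after_by_value()     { Object o = {0}; by_value(o);     return o.value; }
int after_by_pointer()   { Object o = {0}; by_pointer(&o);  return o.value; }
int after_by_reference() { Object o = {0}; by_reference(o); return o.value; }
```

Here after_by_value() returns 0 (only the copy changed), while after_by_pointer() returns 2 and after_by_reference() returns 3, since both of those reach the caller's object.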

Quote:
When a function accepts an Object *object, it wants a pointer (a location pointing to somewhere on the RAM). If it takes an Object object, it wants the actual data, not its location, and it receives its own copy. If it takes an Object &object, it also wants the actual data, but this time it's a little faster since internally it typically passes the pointer, not the actual data. This is transparent to the caller.


I've got the second one you described straight (If it takes an Object object.)

How does the function prototype differ in the case of Object* object, and Object& object? Can you write:

void foo(Object& object);

as well as

void foo(Object* object);


and what is the difference between these? I know that with the latter you can pass &object where object is an Object, and also objectPtr where objectPtr is an Object*.

Object* and Object& are distinct types. The former is a pointer, the latter a reference. Both serve to provide referential semantics in C++ (that is, the ability to refer to something, to say "that house over there," without having to copy the entire object). The difference between a reference and a pointer in C++, besides the syntax required to access and manipulate them (direct '.' versus indirect '->' operators, for example), is that references cannot legally be null, nor can they change what they refer to after being created (and as such must always be initialized). Pointers can be null, and can be reseated once created. Pointers require more heavy-handed syntax, such as using * and & in certain contexts. References typically do not, making them more pleasant to use in many cases.
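
The reseating difference is easy to miss because assignment looks the same for both. A small sketch, using a hypothetical Pair struct and function names that are not from the thread, shows what each assignment actually does:

```cpp
#include <cassert>

// Hypothetical helper type used only to return both variables.
struct Pair {
    int a;
    int b;
};

Pair reseat_pointer() {
    int a = 1, b = 2;
    int* p = &a;      // p refers to a
    p = &b;           // reseated: p now refers to b, a is untouched
    assert(p == &b);
    *p = 20;          // writes to b
    return Pair{a, b};
}

Pair assign_through_reference() {
    int a = 1, b = 2;
    int& r = a;       // r is bound to a for its whole lifetime
    r = b;            // not a reseat: copies b's value into a
    assert(&r == &a); // r still names a
    r = 30;           // still writes to a
    return Pair{a, b};
}
```

reseat_pointer() leaves a at 1 and sets b to 20, whereas assign_through_reference() sets a to 30 and leaves b at 2.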

Decrius's examples are reasonable, however slightly misleading because C++ itself says nothing about physical memory whatsoever and actually has no concept of physical memory, making use entirely of its own abstract memory model concept. If you drop all his references to "physical" and replace "RAM" with "the memory your C++ program can access" you have something slightly more accurate. My point is really that while most of what he says is reasonable, you should not walk away from this with the idea that you are anywhere near operating on RAM directly. You should also not assume that pointers are integers -- they are addresses.

You may want to review this excellent and in-depth post by ToohrVyk, which I will requote here:
Quote:

Originally posted by ToohrVyk

Values



C++ programs manipulate values. Typical values include integers, such as 42, characters, such as 'x', objects of more complex types, such as std::string or std::fstream, and also instances of any class you define in your program.

Variables and References



Values need to be manipulated, which involves giving them a name. A reference is a name given to a value. Creating a reference is done like so:
Type & name = value;
Where value is usually the return value of a function or operator:
std::ostream &o = (std::cout << "Text.");
but might also be another name given to the same value:
std::ostream &o = std::cout;


Another way to create a reference is as a function argument:
void foo(Type & name)
This will cause the value which was passed as argument to be available under that name in the entire function body. This can be quite useful if you wish to modify that value.

Of course, almost any program needs to create values. In C++, values are always created using operator new (except for literals such as 42 or "Hello", which simply exist). operator new exists in several versions. The default version generates dynamic values: these values exist until delete is called:
int & value = * new int;
value = 10;
std::cout << value;
delete (&value);
This causes performance problems if used too often, because dynamic memory allocation is slow in a non-garbage-collected language like C++. This is why another, faster version exists: placement new creates a new value using existing memory. Of course, the lifetime of the value is then limited by the lifetime of the memory which was used to create it, and it must of course be manually destroyed by calling its destructor:
std::string & str = * new (existing_memory) std::string("Hello");
str += " world!";
std::cout << str;
str.~string();
// existing_memory dies somewhere after this line
The existence of placement new allows the programmer to use memory areas which are faster than the free store used by the default operator new: it may use stack memory, a specially reserved portion of global memory, or a subset of the memory used by another value, thereby increasing the speed of allocation. For instance, using stack memory (alloca is a non-standard but widely available function):
int & value = * new (alloca(sizeof(int))) int;
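
For readers who want to try placement new without alloca, here is a hedged sketch that swaps in a local, suitably aligned byte array as the existing memory. The function name build_in_place and the storage array are assumptions for illustration, and alignas requires C++11.

```cpp
#include <cassert>
#include <new>     // placement new
#include <string>

std::string build_in_place() {
    // Raw, suitably aligned storage with automatic lifetime (stands in for alloca).
    alignas(std::string) unsigned char storage[sizeof(std::string)];

    // Construct a string inside the existing memory instead of the free store.
    std::string* str = new (storage) std::string("Hello");
    assert(static_cast<void*>(str) == static_cast<void*>(storage));

    *str += " world!";
    std::string copy = *str; // copy the value out before destroying it

    using std::string;       // lets the destructor be named as ~string
    str->~string();          // manual destructor call; there is no delete here
    return copy;
}
```

build_in_place() returns "Hello world!". The value lived entirely inside storage, and it had to be destroyed by hand before that memory went away.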


Since allocating objects on the stack, in global memory or as a subset of an object is a very repetitive task which involves obtaining the memory, computing the size to be allocated, and then calling the destructor of the object right before the memory is gone, the C++ language provides shortcuts to do this. Take for example stack allocation of a string:
void sayhello() {
    std::string & str = * new (alloca(sizeof(std::string))) std::string("Hello");
    str += " world!";
    std::cout << str;
    str.~string();
}
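The repetitive tasks related to stack allocation here are obtaining the memory (alloca), computing the size to be allocated (sizeof(std::string)), and calling the destructor (str.~string()). The shortcut provided by C++ in this situation is as follows: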
void sayhello() {
    auto std::string str("Hello"); // C++98/03 only: since C++11, auto means type deduction and cannot precede a type
    str += " world!";
    std::cout << str;
}
The main differences here are the presence of the auto keyword, the absence of the & in the definition of the reference, and the absence of the str.~string() destructor call at the end of the function. The first two differences cause C++ to allocate a new block of memory on the stack (auto), which will be released when the function returns, and to initialize it with a new value from the string literal "Hello". The auto keyword also causes the compiler to generate destructor calls: the object will be automatically destroyed right before the function exits (since that is when the stack allocation is released). In this situation auto is aptly named a storage specifier. (This use of auto is specific to C++98 and C++03; since C++11 the keyword requests type deduction instead and can no longer be written in front of a type.)

The other storage specifier is static, which uses global memory: defining a local variable as static causes the compiler to allocate some memory for it as part of the global data span (at compile-time) and then initialize the value at that spot in memory the first time the variable's definition is encountered. Since auto is the default storage specifier, it is generally omitted from the definition, which leads to the typical variable definition that most C and C++ programmers are used to:
std::string str("Hello");
str += " world!";
std::cout << str;
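
The difference between static and automatic storage for a local variable can be observed directly. This is a small sketch with hypothetical function names (next_id, fresh_each_call, static_persists), not code from the original post:

```cpp
// Static storage: counter is initialized once, the first time control
// passes through its definition, and keeps its value between calls.
int next_id() {
    static int counter = 0;
    return ++counter;
}

// Automatic storage: counter is re-created and re-initialized on every call.
int fresh_each_call() {
    int counter = 0;
    return ++counter;
}

// True if the static counter advanced between two consecutive calls
// while the automatic one started over both times.
bool static_persists() {
    int first = next_id();
    int second = next_id();
    return second == first + 1 && fresh_each_call() == fresh_each_call();
}
```

fresh_each_call() always returns 1, while each call to next_id() returns one more than the previous call did.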


Variables at global scope automatically use global storage. Values in global storage are destroyed in the reverse order of their creation when the program exits. Global variables (as opposed to local static ones) are initialized in an unspecified order across translation units, and an implementation may defer their initialization, so they may remain uninitialized if nothing in their translation unit is used.

Member variables of structures or classes use the third type of allocation: they use a bit of reserved memory that was part of the value of which they are a member, and they are initialized there as part of that value's constructor. Their destructor is automatically called as part of the value's destructor, in their reverse order of initialization. Before:

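struct super
{
    std::string & str;
    // Raw, suitably aligned storage for one std::string (alignas requires C++11)
    alignas(std::string) unsigned char some_memory_buffer[sizeof(std::string)];
    super() : str(* new (some_memory_buffer) std::string) {}
    ~super() { str.~string(); }
};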
After:
struct super
{
    std::string str;
    // No explicit memory buffer
    // Automatic construction
    // Automatic destruction
};
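
The automatic construction and reverse-order destruction of members can be made visible with a logging type. In this sketch, Noisy, trace and run are hypothetical names introduced for illustration; only the struct name super is taken from the post.

```cpp
#include <string>

std::string trace; // hypothetical log of construction (+) and destruction (-) events

struct Noisy {
    char name;
    explicit Noisy(char n) : name(n) { trace += '+'; trace += name; }
    ~Noisy() { trace += '-'; trace += name; }
};

struct super {
    Noisy first;  // constructed first, destroyed last
    Noisy second; // constructed second, destroyed first
    super() : first('a'), second('b') {}
};

std::string run() {
    trace.clear();
    {
        super s;  // members are constructed as part of s
    }             // s dies here; members are destroyed in reverse order
    return trace;
}
```

run() returns "+a+b-b-a": the members are built in declaration order and torn down in the opposite order, without any explicit destructor calls.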


To summarize: references are names given to values. C++ provides special shortcuts to create a value and bind it to a reference at the same time, with special rules about when the created value is to be destroyed.

Iterators and Sequences



A sequence is, as its name hints at, a sequence of values. It could be, for instance, the sequence 1, 2, 3, 4. Or, it could be the set of opponents in a video game, in an arbitrary order. Or, it could be the sequence of characters being read from std::cin. In short, sequences are the fundamental concept used to represent groups of objects that are to be manipulated together.

A typical representation of sequences is through iterators, which are means of iterating over a sequence (accessing its elements in order). The basic operations required to iterate are the ability to read the "current value", the ability to move on to the "next value", and the ability to determine if the end of the sequence was reached and there are no values left. For instance, pseudocode to add together values of a sequence through iteration:
a = 0
while not end-of-sequence(iter)
    a += value(iter)
    iter = next(iter)


Various languages provide various forms of iterators, but the three operations outlined above are always present. For instance, Objective Caml uses (it = []), List.hd it and List.tl it as the end-of-sequence, value and next operations. C++ uses it == end, *it and ++it as the end-of-sequence, value and next operations. So, C++ code to add together the values of a sequence (represented by two iterators of type Iter representing the first iterator of the sequence and the first iterator past the sequence) would be:
template<typename Iter>
int sum(Iter begin, Iter end)
{
    int a = 0;
    while (begin != end)
    {
        a += *begin;
        ++begin;
    }
    return a;
}


The begin-end representation of sequences is standard. The end iterator is the first iterator after the sequence: as such, it has no associated value, and trying to obtain the iterator after end is invalid. So, you cannot *end or ++end. You can, however, compare end with another iterator to see if that other iterator has reached the end of the sequence.

The point of using a past-the-end iterator (one that isn't in the sequence, but is actually right after it) instead of a last-element iterator (one that corresponds to the last element in the sequence) is twofold. First, it makes code simpler, because the condition "I have reached the past-the-end iterator on this iteration" is easier to evaluate than "I was working on the last iterator on the previous iteration". Second, it allows representation of empty sequences (since the empty sequence has no last element and as such could not be represented using a last-iterator approach).
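
The empty-sequence case can be checked with the sum function from earlier in the post. The wrapper sum_of is a hypothetical helper added here so that the sketch works on a std::vector:

```cpp
#include <vector>

// The iterator-based sum from the post, repeated so the sketch is self-contained.
template <typename Iter>
int sum(Iter begin, Iter end) {
    int a = 0;
    while (begin != end) {
        a += *begin;
        ++begin;
    }
    return a;
}

// Hypothetical wrapper: sums every element of a vector.
int sum_of(const std::vector<int>& values) {
    // For an empty vector begin() == end(), so the loop body never runs
    // and no special case is needed.
    return sum(values.begin(), values.end());
}
```

With an empty vector sum_of returns 0, and with the elements 1, 2, 3, 4 it returns 10, using exactly the same loop.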

An iterator which supports only *it and ++it is called a forward iterator. Some iterators also support --it (which moves to the previous element), making it a bidirectional iterator. Some iterators also support it + n and it - n, allowing the iterator to move an arbitrary number of steps backward or forward in a single jump, which is called a random access iterator. Iterators may in turn be read-only (input iterator), write-only (output iterator) or support both read and write.
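
As a rough illustration of these categories, std::list offers bidirectional iterators and std::vector offers random access iterators. The function names below are hypothetical:

```cpp
#include <list>
#include <vector>

// Bidirectional: a list iterator can step backward with --it.
int last_of_list() {
    std::list<int> values = {10, 20, 30};
    std::list<int>::iterator it = values.end();
    --it;       // from past-the-end back to the last element
    return *it;
}

// Random access: a vector iterator can jump several steps with it + n.
int third_of_vector() {
    std::vector<int> values = {10, 20, 30, 40};
    std::vector<int>::iterator it = values.begin() + 2; // two steps in one jump
    return *it;
}
```

Both functions return 30. Writing values.begin() + 2 for the std::list would not compile, because its iterators do not support jumps of arbitrary size.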

Iterators are a very powerful concept: most of the things in C++ which look like sequences can be turned into iterators.

  • The sequence of elements in a vector is represented by vect.begin() and vect.end(), which are random access read-write iterators (or read-only, if the vector is constant).

  • The sequence of elements in a list is represented by list.begin() and list.end(), which are bidirectional read-write iterators (or read-only, if the list is constant).

  • The sequence of objects of type T read from an std::istream such as std::cin is represented by std::istream_iterator<T>(std::cin) and std::istream_iterator<T>() (past-the-end iterator) which are forward input iterators.

  • Adding a sequence of elements at the end of any non-associative standard library container c is represented by std::back_inserter(c), with no end iterator (this could be an infinite sequence) and is a forward output iterator.


The list goes on and on. This system is then combined with iterator-based algorithms, such as std::copy(src_begin, src_end, dest_begin), where src_begin and src_end are at least forward input iterators representing the input sequence to be copied, and dest_begin is at least a forward output iterator representing where the input sequence should be copied to. As such, reading as many integers as possible from standard input and placing them in a vector is as simple as:
std::vector<int> integers;
std::copy( std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::back_inserter(integers) );


Or, we could add together the values read on the standard input using our function above:
std::cout << sum( std::istream_iterator<int>(std::cin),
std::istream_iterator<int>() );
Just like we could add together the elements of a vector:
std::cout << sum( integers.begin(), integers.end() );
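
To experiment with the stream iterators without typing on standard input, one possible variation reads from a std::istringstream instead of std::cin. The function read_integers is a hypothetical helper, not part of the original post:

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads as many whitespace-separated integers as possible from text.
std::vector<int> read_integers(const std::string& text) {
    std::istringstream input(text);
    std::vector<int> integers;
    std::copy(std::istream_iterator<int>(input), // first iterator of the sequence
              std::istream_iterator<int>(),      // past-the-end iterator
              std::back_inserter(integers));     // appends each value to the vector
    return integers;
}
```

read_integers("1 2 3") yields three elements. Reading stops at the first token that is not an integer, so read_integers("4 5 x 6") yields only 4 and 5.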


Pointers



Pointers are a concept originally introduced in C. From a semantic point of view, pointers are random access read-write iterators. As iterators, they represent sequences of contiguous elements of the same type—such sequences are created either using a vector, or using a block allocation approach such as new[] or arrays.

Given a block of N objects of type T, a pointer representing that sequence is of type T*. Depending on the nature of the block, obtaining the first pointer begin in the sequence varies.
  • T *begin = new T[N]; will directly return the first pointer in the sequence.
  • If a is an array of N objects of type T then T *begin = a; will create the first pointer in the sequence of elements of a.

Otherwise, if t (of type T) is the first element of a sequence (as opposed to the first iterator of a sequence), then T *begin = &t; is the first iterator of the sequence.

The end iterator of a sequence is simply defined as T *end = begin + N;
int values[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
std::cout << sum(values, values + 10);


As with any other iterator, going beyond the bounds of a sequence (before begin or after end) results in undefined behavior, which will usually lead to random misbehavior of your program.

All of the above also holds when N = 1 (which means that a single object is also a one-object sequence, which can as such be manipulated using a pointer).
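
The N = 1 case can be tried with the same sum function. In this sketch sum_single and sum_array are hypothetical helpers; forming the pointer one past a single object is valid as long as it is only compared, never dereferenced.

```cpp
// The iterator-based sum from the post, repeated so the sketch is self-contained.
template <typename Iter>
int sum(Iter begin, Iter end) {
    int a = 0;
    while (begin != end) {
        a += *begin;
        ++begin;
    }
    return a;
}

// A single int treated as a one-element sequence.
int sum_single(int value) {
    int* begin = &value;  // first iterator of the sequence
    int* end = begin + 1; // past-the-end: may be compared, must not be dereferenced
    return sum(begin, end);
}

// An array treated as a four-element sequence.
int sum_array() {
    int values[4] = {1, 2, 3, 4};
    return sum(values, values + 4);
}
```

sum_single(42) returns 42 and sum_array() returns 10.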

The main problem with pointers is that this is not everything. If pointers were merely iterators, then people would have no more trouble with them than they would have with the other iterators in the standard library. However, pointers were introduced in the C language as a means to represent several other concepts.

The first of these concepts is reference semantics. You see, in C, there are no references per se: every time you create a variable, it also creates a value. This makes it difficult to handle dynamic memory (because every variable you create already has its own value, so how do you give your dynamic memory a name in order to use it?) and to allow functions to modify a variable in another function (because all the arguments to your function are, by default, brand new values that will disappear when the function returns, and you have no access to the values in the calling function). The solution adopted was to decide that, since pointers are read-write iterators, and every value is a one-element sequence by definition, pointers can be used to give indirect names to values.

For instance:
#include <math.h>

/* Solve aX² + bX + c = 0 (assumes a is not zero) */
int find_solutions(float a, float b, float c, float *x1, float *x2)
{
    float delta = b * b - 4.0f * a * c; /* discriminant */
    float sq;

    if (delta < 0.0f) return 0; /* no real solutions */
    sq = sqrt(delta);
    *x1 = (- b - sq) / (2 * a);
    *x2 = (- b + sq) / (2 * a);
    return 2;
}


This function needs to return the number of solutions (zero or two), but it also needs to send back the values of those solutions. In order to do so, it uses a trick which consists in being given two iterators to one-value sequences and modifying the unique value in each sequence. Since a copy of an iterator of a sequence is a new iterator of that same sequence, the iterator allows the function to modify the original sequence—and the values inside that sequence are the values from within the calling function.
int main()
{
    float x1, x2;
    float *px1 = &x1; // An iterator to the one-element sequence 'x1'
    float *px2 = &x2; // An iterator to the one-element sequence 'x2'

    find_solutions(1.0f, 0.0f, -1.0f, px1, px2);
    // The function call changed the contents of the sequences
    // iterated by px1 and px2, namely x1 and x2. As such, the
    // values of x1 and x2 have been changed.
}


Of course, C++ provides references, which is why pointers are rarely needed for pass-by-reference in C++ (references provide a far simpler representation of one-element sequences).
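
For comparison, the same solver could be written with reference parameters. This is a sketch of one possible C++ version, not the original author's code; the helpers count_solutions and second_root are hypothetical and exist only to show the call site.

```cpp
#include <cassert>
#include <cmath>

// Solves a*x^2 + b*x + c = 0; the results are written through references.
int find_solutions(float a, float b, float c, float& x1, float& x2) {
    assert(a != 0.0f);                  // assumption: a genuine quadratic
    float delta = b * b - 4.0f * a * c; // discriminant
    if (delta < 0.0f) return 0;         // no real solutions
    float sq = std::sqrt(delta);
    x1 = (-b - sq) / (2.0f * a);        // no * needed inside the function
    x2 = (-b + sq) / (2.0f * a);
    return 2;
}

// Hypothetical wrapper: reports only the number of real solutions.
int count_solutions(float a, float b, float c) {
    float x1 = 0.0f, x2 = 0.0f;
    return find_solutions(a, b, c, x1, x2); // plain variables, no & at the call site
}

// Hypothetical wrapper: returns the solution computed with +sqrt(delta).
float second_root(float a, float b, float c) {
    float x1 = 0.0f, x2 = 0.0f;
    int count = find_solutions(a, b, c, x1, x2);
    assert(count == 2);
    return x2;
}
```

For x² - 1 = 0, second_root(1.0f, 0.0f, -1.0f) returns 1, and for x² + 1 = 0, count_solutions(1.0f, 0.0f, 1.0f) returns 0.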

Another use of pointers, which remained in C++, is the representation of reseatable references. In C++, a reference is a name given to a value, and the value is bound to the name forever. It is not possible to bind the name to another value (though it is possible, of course, to bind several names to the same value). Reseating a reference means binding it to another value. While impossible with references, pointers are merely iterators, and it's possible to assign an iterator to another (thereby changing the sequence that the iterator is traversing, or the point in the sequence where the iterator is)—after all, this is the entire point of traversing a sequence with a single iterator!

Therefore, when it becomes useful to have a reference to a value, but the ability to change which value is referenced is also useful, people sometimes use a pointer (though, in modern C++, there are other types, such as shared_ptr or weak_ptr, which do a far better job). For instance, if you wish to reference an enemy's target (and enemies can change targets at will):
struct enemy
{
    object *target;
};


As such, an enemy's target can be changed by changing the pointer so that it corresponds to another value instead of the original one.
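
A short sketch of such a retargeting follows. The health member, the attack function and retarget_demo are assumptions added for illustration; only the enemy struct and its target pointer come from the post.

```cpp
#include <cassert>
#include <utility>

struct object {
    int health; // hypothetical member, so that attacks have a visible effect
};

struct enemy {
    object* target; // reseatable: can point to a different object later
};

// Damages whatever the enemy is currently targeting, if anything.
void attack(const enemy& e, int damage) {
    if (e.target != nullptr) e.target->health -= damage;
}

std::pair<int, int> retarget_demo() {
    object a = {100};
    object b = {100};
    enemy e = {&a};
    attack(e, 10);          // hits a
    e.target = &b;          // change target: only the pointer changes
    assert(e.target == &b);
    attack(e, 30);          // hits b
    return std::make_pair(a.health, b.health);
}
```

retarget_demo() returns the pair (90, 70): the first attack reached a, and after the pointer was reseated the second attack reached b.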

A third use of pointers, introduced in C, is as an option type. An option type allows the possibility for a value to be absent. For instance, one could decide that sqrt returns a float option: either its argument is non-negative and it returns a float, or it is negative and it returns nothing.

Instead of defining an option type distinct from an iterator type (as C++ does for all its iterators), C merged the two together, and this carried over to C++. A pointer can be constructed from the integer constant zero, and thus becomes the null pointer. The null pointer may not be used in any way, except to be compared for equality with another pointer, or to be tested as a boolean condition (a null pointer always evaluates to false, while other pointers evaluate to true).
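
The option-type use of the null pointer might look like the following sketch, in which a search returns either a pointer to the element it found or a null pointer for "nothing". All function names here are hypothetical.

```cpp
// Returns a pointer to the first negative element, or a null pointer if there is none.
const int* find_negative(const int* begin, const int* end) {
    for (; begin != end; ++begin) {
        if (*begin < 0) return begin;
    }
    return nullptr; // the "absent" case of the option type
}

// Unwraps the option: uses the value if present, the fallback otherwise.
int first_negative_or(const int* begin, const int* end, int fallback) {
    const int* found = find_negative(begin, end);
    if (found) return *found; // a non-null pointer tests as true
    return fallback;          // a null pointer tests as false and is never dereferenced
}

int demo_present() {
    const int values[] = {3, -4, 5};
    return first_negative_or(values, values + 3, 0);
}

int demo_absent() {
    const int values[] = {3, 4, 5};
    return first_negative_or(values, values + 3, 0);
}
```

demo_present() returns -4, and demo_absent() returns the fallback 0 because the search produced a null pointer.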

To summarize: pointers are random access read-write iterators, also used in C for reference semantics and reseatable reference semantics, as well as a clunky option type.

Don't worry, this is just C++'s way of saying 'hello' [smile]

Quote:
Original post by jpetrie
Quote:

Originally posted by ToohrVyk
**süüüper-sniiiiiiiiiiiiiiiiiip**


Uhm, was that a gamedev-exclusive post, or one from his blog?

Quote:
Original post by phresnel
Uhm, was that a gamedev-exclusive post, or one from his blog?


IIRC ToohrVyk originally posted it in a response on gamedev, then later reposted it on his blog.

Quote:
Original post by rip-off
Quote:
Original post by phresnel
Uhm, was that a gamedev-exclusive post, or one from his blog?


IIRC ToohrVyk originally posted it in a response on gamedev, then later reposted it on his blog.


I see. Thanks!

Thanks. Very informative. I was not even aware that references were a separate type. I thought the extent of them was using the ampersand to get the address of an object.
