Jump to content

  • Log In with Google      Sign In   
  • Create Account


Like
23Likes
Dislike

Data Structures for Pre-College Programmers: Arrays, Dynamic Arrays, and Linked Lists

By Bryan Wagstaff (frob) | Published Mar 29 2013 03:58 AM in General Programming
Peer Reviewed by (Alpha_ProgDes, Dragonsoulj, jbadams)


Introduction

Many of the beginners on the site are pre-college students. Often beginners will learn by reading tutorials on the Internet, copying code from books, and trying out things that they find interesting.

This article is part of a series that focuses on giving pre-college developers the basics they need to understand data structures.

This article is about the first three basic data structures: the array; the dynamic array; and the linked list.

The Array

When you need one object, you create one object. When you need several objects... well, you have several options.

I have seen many beginners with code like the following:
// High Scores
int score1 = 0;
int score2 = 0;
int score3 = 0;
int score4 = 0;
int score5 = 0;
This gives you five high score values. This method works just fine until you need fifty or a hundred or a thousand of something.

Rather than creating your objects individually you can use an array.
// High Scores
const int NUM_HIGH_SCORES = 5;
int highScore[NUM_HIGH_SCORES] = {0};

This will create a buffer of 5 items, like so:

Attached Image: ds_array.PNG

Note that the array index starts at zero. If you have five items, then they range from zero to four.

Drawbacks of a Simple Array


If all you want is a fixed number of objects an array can work just fine.

Let's say you want to add an item to the array. You can't do that with a simple array.

Let's say you want to remove an item from the array. You can't do that with with a simple array.

You are stuck with the same size of items for the duration of the object.

What we need is an array that can change size. This brings us to...

The Dynamic Array


A dynamic array is an array that can change sizes.

Major languages support dynamic arrays in their standard libraries. In C++ it is a vector. In Java it is an ArrayList. In C# it is a List. All of these are dynamic arrays.

Under the hood a dynamic array is a simple array, but it also has two extra pieces of data. It stores how big the simple array actually is, and it stores how much data the simple array can actually store.

A dynamic array might look something like this:

// Internals of a dynamic array class
sometype *internalArray;
unsigned int currentLength;
unsigned int maxCapacity;

The internalArray member points to a dynamically allocated buffer. That actual size of the buffer is stored in maxCapacity. The number of items actually used is represented with currentLength.

Adding to a Dynamic Array


When you add an object to a dynamic array several things happen.

The array class verifies that there is enough space. If currentLength < maxCapacity then there is room to add to the array. If there is not enough space then a larger internal array is allocated, and everything is copied over to th new internal array. The maxCapacity value is increased to the new larger size.

When there is enough space, the new item is added. Every element after the insertion point must be copied over one space in the internal array, and after the copies are made then the hole is filled with your new object, then the currentLength value is incremented.

Attached Image: ds_arrayAdd.PNG

Since every object after the insertion point must be moved, the best case is when you add to the end; zero objects must be moved (unless the internal array must be expanded). A dynamic array works best when you only add to the end rather than adding to the middle.

Note:  When you add an object to a dynamic array it is possible for every object to move in memory. For languages like C and C++, adding to a dynamic array means ALL pointers to array objects become invalid.



Removing from a Dynamic Array


Removing objects is less work than adding objects.

First, the object itself is destroyed. Second, every object after that location is moved over one space. Finally, the currentLength is decreased.

Attached Image: ds_arrayRemove.PNG

Just like adding to the end of the array, removing from the end of the array is the best case because zero objects need to be moved.

Also note that we don't need to resize the internal array to make it smaller. The space can hang around in case we add objects later.

Note:  Removing an object to a dynamic array causes everything after the removed item to move in memory. For languages like C and C++, removing from a dynamic array means pointers to everything after the removed object become invalid.



Drawbacks of Dynamic Arrays


Let's say your array is very large, and you need to add and remove objects frequently. This can cause objects to be frequently copied to new locations, and it can cause many pointers to be invalidated.

If you need to make frequent changes in the middle of a dynamic array, there is a better type of linear data structure for you...

Linked Lists


An array is a continuous block of memory, each item after the other. A linked list is a chain of objects.

Again, the major languages have linked lists in their standard libraries. In C++ it is called a list. In Java and C# it is called a LinkedList.

A linked list is made of a series of nodes. Each node is basically made like this:

// Linked list node
sometype data;
Node* next;

That generates this type of structure:

Attached Image: ds_linkedlist.PNG

Each node chains to the next.

Adding to a Linked List


When an object is added to a linked list it starts by creating a new node. Your data is copied inside the node.

Next, the insertion point is found. The new node's pointer to the next object is updated to point to the node that will follow it.

Finally, the node before the new node is modified to point to your new node.

Attached Image: ds_linkedlistAdd.PNG

Removing From a Linked List


When an object is removed from a linked list, the node before the removed node is found. It is altered to point to the removed object's next node. Then the removed object can be safely deleted.

Attached Image: ds_linkedlistRemove.PNG

Benefits of a Linked List


The biggest benefit of a linked list comes when adding and removing objects from the list. Making changes to the middle of the list is very quick. Remember that a dynamic array potentially caused every item to move, a linked list keeps every other object in place.

Drawbacks of a Linked List


Recall that a dynamic array is a continuous block of memory. If you need to get the 500th item in the array the code can simply look 500 spaces ahead. In a linked list the memory is chained together. If you need to find the 500th item, you must start at the beginning of the chain and follow its pointer to the next item. Then you need to follow the pointer in the next node. Then you need to follow the pointer in the next node. Repeat 500 times. Random access to a linked list is very slow.

The other major drawback of a linked list isn't quite as obvious. Each node requires a little extra space. How much space does it require? You might think it only requires the size of a pointer, but that isn't quite right.

There is a little bit of overhead every time an object is created dynamically. Some languages, like C++, work with pages of memory. Typically a page is 4 kilobytes. When you use the default new and delete operators a full page of memory will be allocated even if you will only use a single byte. Both Java and C# were designed differently and have special rules for small objects. These languages do not require a full 4kB memory page, but they do have a small amount of overhead.

You don't need to worry about the second drawback if you are using the standard libraries. They have been written in a way that minimizes the wasted space.

Conclusion


These three types, the array, dynamic array, and linked list, form the basis for nearly all the more complex data containers. When you get to college some of the first assignments in a data structures class will be to implement a dynamic array and a linked list class on your own.

These structures are fundamental to programming. It doesn't matter what language you learn, you will be using them to work with your data.

NEXT: Stacks and Queues

Updates
2013-03-29 - typos.



License


GDOL (Gamedev.net Open License)




Comments

I think it would be helpful if you added a code snippet of how a Vector looks like as a Class. At least a barebones version.

 

Something like:

 

class Vector
{
   public:
      /* some methods */
   
   private:
      sometype *internalArray;
      unsigned int currentLength;
      unsigned int maxCapacity;
}
 

Fine article but the title is really not fit. "pre-college"? Why would you call it that? Why not something like a "gentle introduction to datastructures"?

 

Also some people did take a college degree just not in cs, so should they not read this great article if they graduated with a major in business and minor IT and knew nothing about data structures?

 

So "after" and "pre" college are really not fit for the article here. Oh and that stuff was taught at my University as a part of an object oriented class in java the very first year for freshman students so at least in my country this is not pre-college stuf smile.png

 

cheers smile.png

 

Edit: I voted the article up :)

Fine article but the title is really not fit. "pre-college"? Why would you call it that? Why not something like a "gentle introduction to datastructures"?

 

Also some people did take a college degree just not in cs, so should they not read this great article if they graduated with a major in business and minor IT and knew nothing about data structures?

 

True enough.

 

Part of the motivation of this series was to write something my 12-year-old could understand, even though she isn't really interested in programming.  She was a great proofreader for the series.

 

 

My target audience was the common demographic we see in the For Beginners forum, which tends to be 12-18 year old individuals.  Hence the name.

 

 

I'll think about renaming the series.

I think it would be helpful if you added a code snippet of how a Vector looks like as a Class. At least a barebones version.

 

I thought about doing that, and in my earliest drafts I had some source code.

 

Then I remembered my target audience.

 

I want my 12-year-old who does not program to be able to read the article and understand the concepts.

 

She asked me about adding more examples.  I thought about including something, but which language?  C++? Java? C#? Something else?  

 

Then I asked her for more clarification on examples.  She didn't want source code.  After struggling with a definition, she described that she really wanted use cases.  

 

That's where the example of a high score list came from.

 

In her case, the extra bit of source code for the five scores was actually a distraction from the article rather than a contribution.  This is the only article in the series that includes source code for a use case.  (I don't know if you can see the other articles yet, the links to take me to an error page when I try to view them when not logged in.)

 

Perhaps that would be better served for the comments after the article rather than inside the article itself.

A small typo:

 

The numver of items actually used is represented with currentLength.

"Remember that a dynamic array potentially caused every item to move, a dynamic array keeps every other object in place."

 

I think you want to say "a Linked List" in the second place where you say "dynamic array"? In the Linked List Benefits section.

I agree that you should include more code. You said your daughter the proofreader is not interested in learning programming and that the code snippets distracted her, but surely your target audience is actually interested in learning programming and will want to see clearly how these structures are used?

 

Aside from this, excellent article.

Thanks for both corrections.  Fixed.

I like the article. I wouldn't call the items of the array as "objects". It might confuse the target audience (who might or might not know about what objects are, thus not knowing if you're talking about objects in the programming sense or not).

 

I'd call them "array elements" or "items".

 

Just IMHO.

The one thing I find missing in this is the suggestion of the double linked list.  I realize it is an "upgrade" but I think it should be stated or at least explained that the std::list is very different than your single link list explanation.  "Just in case" the reader knows std::list..

Nice article. I really like that it is largely language agnostic, which is especially appropriate for an article on algorithms, and I think it is pitched at just the right level.

"You can't do that with with a simple array."


"You are stuck with the same size of items for the duration of the object."
This seems a bit misleading to me.  I think 'size of items' kind of entails that the actual items, which would be integers in your example, are affected by the length of the array, rather than the number of items you can fit into the array being influenced.

 

Perhaps:

"You are stuck with the same amount of items for the duration of the array."

(Note also 'object' changed to 'array' for clarification.)

 

"What we need is an array that can change size."

Maybe 'size' should be changed to 'length' here?  It seems like 'length' is more frequently used for arrays.

 

"A dynamic array is an array that can change sizes."

Sorry if it seems like I'm nitpicking, but 'sizes' might be better off if it's changed to 'length'...or at least 'size' since there's only one size.

"and everything is copied over to th new internal array."
Just a little typo.

 

"Note:  Removing an object to a dynamic array"
I think that 'to' should be 'from'.

Uh I saw your debates in the comments so I decided to give my view point as a 16 years old "begginer" in programming.

 

So as a start I would like to say I am completly self taught, I learned HTML,CSS really young, when I was 4th grade ( that would mean I was 11 years old) and followed the road to javascript jquery php and finaly c++, after I covered the basics of C++ I started learning SDL from all the great articles arround the net. That not only  taught me how to use SDL but how basic games are structured and the essentials of how they work inside out, I started using classes, functions, pointers, timers and much more, everything I have put so many hours of learning was in use

And then I realised I didnt know how to use structures, so I decided to go back and learn it and this got me to here.

 

Now I knew the stuff in the article except one thing that shoked me, you said in the article:

"Again, the major languages have linked lists in their standard libraries. In C++ it is called a list "

I found out I didnt know structures at the point when I wanetd to create a 3d tile game and I found out I didnt knew how to structure it at all, after reading abit about it I found out I would like to use a linked list but when I turned my back at std:list It said that its not infact a link list and its not what I wanna use, so now I am confused about it.

 

As a whole the article is great its really helpful series even for someone that is abit older then your daughter but as I said I already new THAT much of the basics at the moment of this righting there are alot more tutorials of the series for me to read so I am sure I would learn alot from them. Thank you for your effort to put this series together.

 

P.S sorry for the wall of text.


Note: Please offer only positive, constructive comments - we are looking to promote a positive atmosphere where collaboration is valued above all else.




PARTNERS