How to start using Data Structures in your code

Started by
18 comments, last by cache_hit 14 years, 1 month ago
Hi, I've done a basic course on Data Structures And Algorithms where I studied about Data Structures like binary trees, Red Black Trees, Heaps, graphs, etc. My problem is, that I don't really know where to use these Data Structures in my programs. I've learned them , so I'm itching to actually use them in my code, but I don't know how to start using them. Any ideas where to start?
I blog here : http://lameness-prevails.com/
Advertisement
Data structures are tools, you shouldn't make an application just to use them. Most of them are also probably already implemented in the standard libraries of your preferred language, so you will rarely need to do it yourself.
If you have learned data structure, you have probably also learned when and how to use them. In a real application there are a lot of opportunity to use some kind of data structure, just start implementing one and you will be able to see it by yourself.
Quote:Original post by kvsingh
My problem is, that I don't really know where to use these Data Structures in my programs.
In general, you don't.

On a day-to-day basis you will use the implementations provided by the language developers (i.e. std::list, std::map, etc.). Only when solving complex and unusual problems will you need to implement these data structures yourself.

However, data structures are an essential part of a computer scientist's toolkit. It is important to understand the performance/memory behaviour of each data structure, and how that relates to the implementations you are using (for instance, std::map is typically implemented as a red-black tree), so that you can choose the right tool to solve a given problem.

That, and I hear that data structure implementation often shows up as 'gimme' questions on job interviews [smile]

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

You use them to solve the problems that they are useful in solving. Don't have the problem? Oh well. Don't artificially create a problem (except to practise, in a separate project). Don't know what problem is solved? Well, then you didn't really learn the material, hmm? ;)

Data structures don't really do anything on their own. They're just ways to... structure data. In the same way, cleaning up your room doesn't do anything by itself. It does, however, make it easier to look for things (apply an algorithm that is compatible with the data structure). Correspondingly, a heap, for example, makes it easy to find large (or small, or whatever other property) elements, which in turn makes it easy to sort them by size (or whatever other property) - i.e. HeapSort - or process them in order, without doing all of the sorting ahead of time.
Yeah,

In practice it's definitely rare that you actually create your own data structure; most modern languages provide common implementations for you. Your knowledge of Data Structures will allow you to know that (a) you need one and (b) which is the best candidate. The "Algorithms" part of that class is also very important. Being able to evaluate and reduce the complexity of your algorithms is a supremely important part of getting a game to run at framerate.

However, here's a couple of simpler examples off the top of my head:
Texture manager: hashmap of name -> textureID so you only have to load textures once
A* pathfinding: binary heap to store the open node list
AI behavior: stack to represent complex behaviors as compositions of simple ones
General: queue for various types of event passing/processing
Interviewing: you'll be asked to design a data structure for some random task in almost every technical interview.

But generally you'll encounter those interview-type situations commonly. You have some weird set of data that you need to access in very specific ways: design or choose a data structure and algorithm to most efficiently handle those conditions.

-me
Quote:Original post by kvsingh
Hi, I've done a basic course on Data Structures And Algorithms where I studied about Data Structures like binary trees, Red Black Trees, Heaps, graphs, etc.

My problem is, that I don't really know where to use these Data Structures in my programs. I've learned them , so I'm itching to actually use them in my code, but I don't know how to start using them.

Any ideas where to start?


There's plenty of places to use them in computer games, just think of what you want to do and keep in mind your tools in case you need them.

Quote:Original post by apatriarca
Data structures are tools, you shouldn't make an application just to use them. Most of them are also probably already implemented in the standard libraries of your preferred language, so you will rarely need to do it yourself.


Not for business/database/gui coding or scripting of any kind but for games in C++ using stl is a joke for a graphics engine or computation heavy tools or anything that has very specific needs (which seems to come up all the time).

This is my thread. There are many threads like it, but this one is mine.

Quote:Original post by thatguyfromthething
for games in C++ using stl is a joke for a graphics engine or computation heavy tools or anything that has very specific needs (which seems to come up all the time).
Unfortunately, that myth is still very widespread...
Maybe if you're using codewarrior or MSVC6 it's still good advice, but it's possible to get perfect performance out of the STL these days.

std::vector+std::sort over malloc+free+realloc+qsort+memset any day...
Quote:Original post by thatguyfromthething
Quote:Original post by apatriarca
Data structures are tools, you shouldn't make an application just to use them. Most of them are also probably already implemented in the standard libraries of your preferred language, so you will rarely need to do it yourself.


Not for business/database/gui coding or scripting of any kind but for games in C++ using stl is a joke for a graphics engine or computation heavy tools or anything that has very specific needs (which seems to come up all the time).

My comment was quite general, C++ isn't the only language he can be interested in and games aren't the only applications he can write. But as Hodgman said, STL can be used in games and if this is a problem in some part of code, there are a lot of already available libraries you can try before writing your own.
Quote:Original post by Hodgman
Quote:Original post by thatguyfromthething
for games in C++ using stl is a joke for a graphics engine or computation heavy tools or anything that has very specific needs (which seems to come up all the time).
Unfortunately, that myth is still very widespread...
Maybe if you're using codewarrior or MSVC6 it's still good advice, but it's possible to get perfect performance out of the STL these days.

std::vector+std::sort over malloc+free+realloc+qsort+memset any day...


Yeah, it's extremely disappointing that this advice is so widespread. At a previous company we had a few developers who refused to use STL and wrote their own containers because "STL was too slow". I did some performance tests and showed that in various cases, the STL was faster, sometimes considerably. There was not a single case where theirs were faster than STL. It's definitely possible to outperform STL, but not so simple and often not worthwhile. The myth gets on my nerves too. Working in C++ is frustrating from the point of view that this mentality is so rampant, when in 97% of cases, it's utterly unnecessary.
Quote:Original post by Rydinare
Quote:Original post by Hodgman
Quote:Original post by thatguyfromthething
for games in C++ using stl is a joke for a graphics engine or computation heavy tools or anything that has very specific needs (which seems to come up all the time).
Unfortunately, that myth is still very widespread...
Maybe if you're using codewarrior or MSVC6 it's still good advice, but it's possible to get perfect performance out of the STL these days.

std::vector+std::sort over malloc+free+realloc+qsort+memset any day...


Yeah, it's extremely disappointing that this advice is so widespread. At a previous company we had a few developers who refused to use STL and wrote their own containers because "STL was too slow". I did some performance tests and showed that in various cases, the STL was faster, sometimes considerably. There was not a single case where theirs were faster than STL. It's definitely possible to outperform STL, but not so simple and often not worthwhile. The myth gets on my nerves too. Working in C++ is frustrating from the point of view that this mentality is so rampant, when in 97% of cases, it's utterly unnecessary.


Sure, it will be faster than what someone who is a novice programmer writes, but how many times have you seen people using stl drop down to a straight c array? If you need to do that more than one in a thousand times you suffer all the penalty of if you didn't use it at all, plus the penalty of huge learning curve and million problems and quirks.

To write a good vector class as an example, it's really an exercise in memory management. That's too much for the average guy to tackle in a realistic timeframe, but the gains you can make are enormous even for vector class if you know what you are doing. Like if you might want your memory aligned properly by default or to have some SIMD get generated out (and no, this will never EVER happen automatically in your code).

And that's just easily measurable speed you can improve, not the insidious problems caused by code size bloat when you have everything defined as multiple templates defined in terms of each other, which template on multiple items.

Not to mention where is the stl kdtree, hashtable, octree, bsp, cover tree, ball tree, rtree, btree, rtree, spatial index etc. etc. etc.? I mean, not even a hash table? Seriously? So how far can you go with game programming with just a vector and a map?

Also not mentioning crossplatform compatibility. Linux? Who cares? Has anyone ever made a cent supporting linux? Mac support is nice I guess but then there's what? 5000 other platforms? Wouldn't you like to have code work on more than 1% of those platforms without additional tweaks? I have sat next to guys on multiple occasions who were trying to port stl over to other platforms, and for the time spent you may as well make your own - especially since there's not a hell of a lot in there anyway. Even if an implementation is lame you will be able to change it any time you want easily. I've gone through millions of lines of other people's code and hacked away but there's no way I could go into stlport source and figure out thing one of what's going on without searching around for hours.

A template library written almost entirely in macros! The irony of that is astounding. That's exactly what templates are there to replace, and if you need or want to resort to macros chances are you are seriously abusing templates and have a poor design, which I strongly believe is the case.

So we get this kind of lame vector class, some math utilities that are far from optimal, a sort that is slow and has a ridiculously cumbersome interface using function points. Iostreams? Probably worst implementation idea in history. Except iterators, of course, which is the other big 'selling point' that is supposedly 'object oriented'.

But, templates are the opposite of OOP. Instead of unifying your interface you are shattering it completely and making it impossible to tell what code is actually generated. WHAT GETS GENERATED? This is very hard to track down and is mostly left up to the people making the compiler, and for anything where you care about performance this is an issue. And if you care about more than one compiler, you get to track it down more than once, and you get things working great on one but not the other.

I have gotten surprised countless times at the stupid things compilers will do, especially with templates. Which is not to say not to use them at all because with C++ you are somewhat forced into it, but when you template on serveral items and do all the cute tricks you see in stl and boost, it is almost totally hopeless to ever track down what's really going on at a low level and you will see sudden performance drops for seemingly no reason. From the stupid comments I see all the time it's clear most people using stl and boost don't even know how templates really work or how the compiler works at all.

So obviously it's terrible if you really care about performance, and it doesn't have the data structures you would care about for making anything but some kind of business app without resorting to another API. And it's it's the antithesis of OOP because instead of simplifying interfaces and and types it exponentially grows them, and instead of decoupling interface and implementation it again exponentially increases the coupling effect between files.

So it's a terrible idea from either of the two important perspectives. If you are doing like a large business or database or general GUI app, you should care about the OOP aspect. If you are doing like a game or graphics tool then the performance aspect for most tasks and the OOP for things like events and GUI. If you don't care about either of these things, why are you using C++ and not perl or something?

So no, I am not spreading lies or misinformation. I used to think more like you, but years and years of struggling with C++ to make it behave properly have finally made me realize why this is such a poor approach.

This is my thread. There are many threads like it, but this one is mine.

This topic is closed to new replies.

Advertisement