Coldon

My Neural Network Tutorials

I read the whole thing

1. Overall a great tutorial
2. The graphs with Input1 and Input2 on the x/y axes are a little confusing. There should be a little more description of what they represent, and maybe some discussion of what they show. I also wasn't able to understand the one with the hidden layer, or what exactly is shown on the graph with the hyperplanes.
3. I was also a little unclear on the purpose of the bias boxes. Why is their weight -1? Why not 1? What is the significance of that -1? Is a network with bias boxes equivalent to one without? I mean... are they there for completeness, or for complexity?
4. I completely agree with you on the Object Orientation. My friends are software engineers and they have been spoon-fed that bullshit for years... we run into heated discussions about that very topic all the time. They always quote extensibility and shit like that, but I will always be done with a task before they can type "public static void xyz extends abc throws def {....". There are just some things where OO is not a good idea, and they REFUSE to understand that. It infuriates me.

My one remark, and it's on the use of Neural Nets in general, not the tutorial. I've worked through two different Neural Net projects and have invested a bit of thinking into it. I always come back to the idea that Neural Nets really do nothing... You need to make these gigantic assumptions, like "Hey, the data I'm trying to classify is distinguishable by the boundary of a certain curve." I took data mining, and it just seems like there is way too much human input for it to be truly interesting.

Another thing is the use of squared error when training a network (in gradient descent). It makes the math easy, but the error is unfairly weighted against data that is far from the correct point compared to data that is close. This always bothers me, even in something as simple as linear regression.
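
To make that weighting concrete, the standard sum-of-squares error (textbook form, not taken from the tutorial) is

E = \frac{1}{2} \sum_p (t_p - o_p)^2

where t_p is the target and o_p the network output for pattern p. Because the residual is squared, a point twice as far from its target contributes four times as much to E, so distant points dominate the gradient.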

Quote:
Original text from Coldon's tutorial
Now I’ve seen various implementations and wait for it… here comes an OO rant: I don’t understand why people feel the need to encapsulate everything in classes.
...
So below is how I structured my neural network and afaik it’s as efficient as possible. If anyone can further optimize my implementation please do!

That sounds like a challenge ;)
Can we OO-ify your code and retain the speed? I'll get back to you on that...
[edit]
On the topic of improving the speed - I'm sure some use of SIMD and multi-threading would work wonders here ;)
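
For what it's worth, a minimal sketch of the SIMD idea (my own illustration, assuming SSE, 16-byte aligned buffers, and an input count that's a multiple of 4):

#include <xmmintrin.h> // SSE intrinsics

// Weighted sum for one neuron, four input*weight products per instruction.
float weightedSum(const float* inputs, const float* weights, int count)
{
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < count; i += 4)
    {
        __m128 in = _mm_load_ps(inputs + i);      // load 4 inputs
        __m128 w  = _mm_load_ps(weights + i);     // load 4 weights
        acc = _mm_add_ps(acc, _mm_mul_ps(in, w)); // accumulate 4 products
    }
    float parts[4];
    _mm_storeu_ps(parts, acc);                    // spill the 4 partial sums
    return parts[0] + parts[1] + parts[2] + parts[3];
}

Each neuron's sum is independent, so the loop over neurons could also be split across threads.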


I took a look at the code and I think there are a lot of things that could be improved.

1. Use std::vector<> instead of arrays. Should the class neuralNetwork for some reason throw an exception during construction, it will leak memory like a sieve. Using vectors you don't need to write all those deletes in the destructor either, and vectors conveniently know their sizes. (A small sketch of this follows the list.)

2. Separate conceptually different things. There's no reason why the neuralNetwork class should contain all that code for running training. The code would be better organized if the neuralNetwork class only represented the network and had the propagation methods. Training should be handled by a separate class.

3. Keep headers small. Many of the methods in the neuralNetwork class are large enough to warrant placing in a .cpp file. This speeds up compilation when using your library and makes it easier to read the headers for documentation.

4. BPNs don't usually require weights to be initialized to random values. That's just an old 'superstition'. In any case, calling srand() in the bowels of library code is not a good idea. It should be left to the main application.

5. Well commented is not the same thing as having comments all over the place. It's more important to have useful comments. Comments like 'train the network' before a function named trainNetwork, or 'return results' before a return statement, are not really useful.
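
To illustrate point 1, a minimal sketch (the constructor parameters and the flattened weight layout here are hypothetical, not the tutorial's actual members):

#include <vector>

class neuralNetwork
{
    std::vector<double> weights; // frees itself, even if the constructor throws

public:
    neuralNetwork(int nInput, int nHidden, int nOutput)
        : weights((nInput + 1) * nHidden + (nHidden + 1) * nOutput, 0.0)
    {
        // anything that throws after this point still cleans up 'weights'
    }
    // no destructor needed: there is no raw new[]/delete[] to balance
};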

Quote:
Original post by SnotBob
I took a look at the code and I think there are a lot of things that could be improved.


I agree. To add to the other points,

1) I see you pass vector<dataEntry*> by value many times, which is going to create a copy of the entire vector every time. You should pass it as a "const vector<dataEntry*> &", or non-const only when it will be used to return something. (A sketch of points 1, 2 and 4 follows this list.)

2) If you're going to return internal variables (like a dataSet *), you should make them const.

3) It's bad form in C++ to place a 'using namespace' in a header file, since anything that includes it will get unexpected name conflicts.

4) There's no point using #define for constants. You can just place "const double LEARNING_RATE = 0.001;" in a header and it will work the same.
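
To illustrate, hypothetical signatures in the spirit of points 1, 2 and 4 (the names are invented, not taken from the tutorial's code):

#include <vector>

class dataEntry; // stand-in for the tutorial's type

// point 1: const reference avoids copying the whole vector on every call
void trainNetwork(const std::vector<dataEntry*>& trainingSet);

// point 2: const return stops callers from mutating internal state
const dataEntry* getEntry(int i);

// point 4: a typed, scoped constant instead of a #define
const double LEARNING_RATE = 0.001;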

Apart from that, I don't know why you had the huge rant about OO being slow and a waste of time. You seem to have encapsulated your code well enough for my liking. However, it is true that most university professors overdo it a bit, trying to teach dim-witted students the merits of good development practices :)

Quote:
Original post by SnotBob
I took a look at the code and I think there are a lot of things that could be improved.

1. Use std::vector<> instead of arrays. Should the class neuralNetwork for some reason throw an exception during construction, it will leak memory like a sieve. Using vectors you don't need to write all those deletes in the destructor either, and vectors conveniently know their sizes.

2. Separate conceptually different things. There's no reason why the neuralNetwork class should contain all that code for running training. The code would be better organized if the neuralNetwork class only represented the network and had the propagation methods. Training should be handled by a separate class.

3. Keep headers small. Many of the methods in the neuralNetwork class are large enough to warrant placing in a .cpp file. This speeds up compilation when using your library and makes it easier to read the headers for documentation.

4. BPNs don't usually require weights to be initialized to random values. That's just an old 'superstition'. In any case, calling srand() in the bowels of library code is not a good idea. It should be left to the main application.

5. Well commented is not the same thing as having comments all over the place. It's more important to have useful comments. Comments like 'train the network' before a function named trainNetwork, or 'return results' before a return statement, are not really useful.


In my implementation there is honestly no need for vectors; yes, I have extra deletes in my destructor. There is no performance gain in switching to vectors. As for exceptions, I don't really see your point.

Your idea would overcomplicate things. Why separate it into two classes when both are tied together? Not to mention all the linking you'd have to do between the classes; it just adds complexity where it's not necessary.

I don't like the separation into two files. AFAIK there is no major performance loss during compilation, and it makes debugging a lot easier. I wish I had my C++ optimization book here to check, but it's at the office; I'll take a look tomorrow and report back.

As for initializing the weights, you're completely mistaken. If you don't initialize the weights to random values, then running the same data through will always give the same result. That's the whole reason the weights are initialized randomly: even if the data is the same, it can produce different results, and for any dataset there can be multiple completely different weight sets that give the same accuracy.

For the comments I'll give you that one. I was extremely rushed when writing the NN, as I was busy developing a Dynamic Niche ES at the same time, so I'm sure there are some super dumb or redundant comments in there. I've started commenting basically out of habit now, or I'll comment something I was thinking. Another thing I tend to do is outline all the sections of the algorithm in comments before I code, i.e.:


//load data

//create network

//train

//run validation set

//return results


So I often leave those things in... it's a bad habit, I know...

Quote:
Original post by Hodgman
Quote:
Original text from Coldon's tutorial
Now I’ve seen various implementations and wait for it… here comes an OO rant: I don’t understand why people feel the need to encapsulate everything in classes.
...
So below is how I structured my neural network and afaik it’s as efficient as possible. If anyone can further optimize my implementation please do!

That sounds like a challenge ;)
Can we OO-ify your code and retain the speed? I'll get back to you on that...
[edit]
On the topic of improving the speed - I'm sure some use of SIMD and multi-threading would work wonders here ;)


OO can be fast, but as the guy mentioned, college professors and "software architects" drill in pointless practices, so you end up with code like this:

((BaseFitnessEvaluator) es.getElements()[i].getFitnessEvaluator()).getFitnessValue().toNumeric();

just to get a fitness value from an element. Sounds ridiculous? Check out CILIB (cilib.sourceforge.net). Most of the professors in my department are shoving OO down people's throats with no thought or reason...

Quote:
Original post by hh10k
Quote:
Original post by SnotBob
I took a look at the code and I think there are a lot of things that could be improved.


I agree. To add to the other points,

1) I see you pass vector<dataEntry*> by value many times, which is going to create a copy of the entire vector every time. You should pass it as a "const vector<dataEntry*> &", or non-const only when it will be used to return something.

2) If you're going to return internal variables (like a dataSet *), you should make them const.

3) It's bad form in C++ to place a 'using namespace' in a header file, since anything that includes it will get unexpected name conflicts.

4) There's no point using #define for constants. You can just place "const double LEARNING_RATE = 0.001;" in a header and it will work the same.

Apart from that, I don't know why you had the huge rant about OO being slow and a waste of time. You seem to have encapsulated your code well enough for my liking. However, it is true that most university professors overdo it a bit, trying to teach dim-witted students the merits of good development practices :)


1) 150% correct, I can't believe I was doing that. Like I said, I was rushed when writing it; a couple of people were hassling me about the code, so I had to code it quickly.

2) The const-ness is not a necessity; I guess it's more of a debugging tool. I should be specifying it for certain parts, but there won't be any performance impact. I agree my code is a little sloppy without it.

3) Again, a shortcut when coding. I make use of the namespace filtering in the other files (I know I shouldn't, but forgive me, I get lazy) :P

4) Just a matter of preference, I guess. I think I got it drilled into me by some idiot prof that global variables are bad. Ugh, I honestly think I had fewer bad habits entering college than I do leaving it...

As for the OO, as I said earlier, it can be fast and elegant when used properly. Unfortunately, if you've looked at college code, it's a nightmare, especially once they start teaching design patterns and OO together; it becomes a mess. A good example was a simple bank account model: the final program modeling a simple bank account (deposit, withdraw, check if limit reached) ended up with 4 classes and made use of abstraction, inheritance, etc.

Another NN implementation I saw, and the one my comment was directly aimed at, was one guy who had every neuron as a class, with its own activation function, weights, etc. Then he had a layer class for each layer, with lots of accessor methods for inter-layer communication. Then another large NN class... I'm sorry, but that's just ridiculous and pointless. All those classes compared to, like, 4 arrays? What about when you have a network like the one in my image recognition video surveillance system, with over 800 inputs? The overhead builds up...

Sorry for the multiple posts, and thanks for the criticism, I really appreciate it. I've been away from C++ for a while (had to do PHP and C# work to pay the bills) and I'm falling in love with it all over again, but I'm still a bit rusty...

Otherwise, apart from silly things, is the code quality okay?

Quote:
Original post by lemour9907
I read the whole thing

1. Overall a great tutorial
2. The graphs with Input1 and Input2 on the x/y axes are a little confusing. There should be a little more description of what they represent, and maybe some discussion of what they show. I also wasn't able to understand the one with the hidden layer, or what exactly is shown on the graph with the hyperplanes.
3. I was also a little unclear on the purpose of the bias boxes. Why is their weight -1? Why not 1? What is the significance of that -1? Is a network with bias boxes equivalent to one without? I mean... are they there for completeness, or for complexity?
4. I completely agree with you on the Object Orientation. My friends are software engineers and they have been spoon-fed that bullshit for years... we run into heated discussions about that very topic all the time. They always quote extensibility and shit like that, but I will always be done with a task before they can type "public static void xyz extends abc throws def {....". There are just some things where OO is not a good idea, and they REFUSE to understand that. It infuriates me.



Oooh, missed a post. Those graphs represent the search space of the problem; the dots are the data entries, their positions show where they fall in the search space, and their colors represent their categories. The hyperplanes are generated by the neural network to separate the search space into different categories. I used two dimensions to make it easy: the graphs don't represent the x/y planes but rather the values of input 1 and input 2 (the graphs are labelled as such, too).

The bias boxes need to be set to -1 to comply with the formula for calculating the activation level of the neuron; again, I think I explain their significance in the tutorial.
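
For anyone still puzzled by the -1, the usual textbook derivation (standard notation, not quoted from the tutorial) is that a neuron activates when its weighted input exceeds a threshold \theta:

\sum_i w_i x_i > \theta \iff \sum_i w_i x_i + \theta \cdot (-1) > 0

so a constant input of -1 whose weight is \theta turns the threshold into just another weight that training can adjust.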

Quote:
Original post by Coldon
In my implementation there is honestly no need for vectors; yes, I have extra deletes in my destructor. There is no performance gain in switching to vectors. As for exceptions, I don't really see your point.

It's not a performance issue. Ask yourself what happens if I were to modify initializeWeights(), called by the constructor, so that it might throw an exception. If you used vectors, your constructor would be exception safe and you'd have no need for the destructor. Resource Acquisition Is Initialization (RAII) is a standard C++ technique.

Quote:

Your idea would overcomplicate things. Why separate it into two classes when both are tied together? Not to mention all the linking you'd have to do between the classes; it just adds complexity where it's not necessary.

Actually, it would simplify the code, making it easier to read. The network class would be self-contained and a training class would make use of the public interface of the network class.
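
A hypothetical outline of that separation (invented names, just to show the shape):

#include <vector>

class dataEntry; // stand-in for the tutorial's type

class neuralNetwork
{
public:
    double* feedForward(const double* inputs); // propagation only
    void    setWeight(int layer, int from, int to, double value);
    double  getWeight(int layer, int from, int to) const;
};

class networkTrainer
{
public:
    explicit networkTrainer(neuralNetwork& net) : network(net) {}
    void trainEpoch(const std::vector<dataEntry*>& trainingSet);

private:
    neuralNetwork& network; // trainer only touches the public interface
};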

Quote:

I don't like the separation into two files. AFAIK there is no major performance loss during compilation, and it makes debugging a lot easier. I wish I had my C++ optimization book here to check, but it's at the office; I'll take a look tomorrow and report back.

How does it make debugging easier? Each time a header file is included it's compiled, and, far worse, each time a header is modified, all the files including it must be recompiled, which is a huge waste of time.

Quote:
... for any dataset there can be multiple completely different weight sets that give the same accuracy.

Which is exactly why the initialization is not usually needed. Any one of the possible network configurations will perform just as well as another.

Quote:

For the comments I'll give you that one

If there's one thing that you should always comment, it's the functions and classes in header files. A good practice is writing comments similar to the javadocs used to generate the JDK API documentation. If you need to, you can produce that kind of documentation nicely from C++ with Doxygen.

You have a point with the C++ comments. You'll kill me for this, but I disable exceptions on compilation! I'm actually going to try your suggestions, and first chance I get I'll separate the NN into two classes; I've been thinking about it, and I think you might be onto something. You can make your code exception safe with arrays too, it just takes a little more effort, i.e. a cleanup function call before you exit the catch block.

As for the header files (I'm lazy, and hate having to swap between files for definition and implementation), I didn't think of it from a compilation standpoint. I think you're right about the recompilation; like I said, I'll need to check because I don't want to talk out of my ass. My optimization book had a whole chapter on compile-time optimization; I'll scan through it, but chances are you're 100% right.

I have to disagree with the neural network part: how do you know your solution is good enough and that there isn't a better one? Often NNs tend to get stuck around local minima/maxima in the search space, so stuck that even using momentum cannot get them out.
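
For readers who haven't met the momentum term mentioned here, the standard backprop update with momentum (textbook form, not quoted from the tutorial) is

\Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1)

where \eta is the learning rate and \alpha the momentum coefficient. The second term carries over part of the previous step, which can roll the search through shallow local minima but, as noted above, not out of deep ones.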

It's happened to me before: I had a huge dataset of silhouettes from one of my cameras and ran it through the network, getting accuracies of around 70%, which in itself isn't too bad; but out of all the runs on the exact same dataset (perhaps a little reshuffled; I did about 40 of them), 2 managed around 85%. If, as you claim, the initialization is a myth, why is it still in almost every AI reference book? If you can, please point me to a reference for your statement. Most recent journal articles on neural networks still randomly initialize the weights; I can't imagine that all these researchers are so in the dark.

For the comments, javadoc/Doxygen is a little time consuming for me; I'd rather comment the code inside the functions than the functions themselves. I know for enterprise systems it's a necessity, but for something small like this it's a bit overboard (again, laziness; rather, my course load is crazy atm - I'm doing my honours currently, that's the first year of masters for you guys in the States).

Oh, and thanks for your comments, I've learnt a bit today :P

1. Sure, you can make array allocation exception safe, but vectors do that for you with a minimal amount of code.

2. I don't think that relying on random weights to avoid local minima is a very good way to deal with that problem. Detecting a local minimum and adding random values to the weights works much better. You're right about needing initialization though; I probably mis-associated SOM, for which random initialization of vectors is not very good, with BPN. Which reminds me to always read up on stuff I don't remember so well before commenting. Anyways, here's an article on the subject of initial value ranges. It seems like a good idea to give the application control over the range of initial weight values.

3. A good thing about documenting functions is that it makes you think about what the function is supposed to do and what the parameters are for. Also, for anyone reading the code, that's the first place they look, and they get annoyed if they don't see good comments up front. Doxygen comments are not that big, and I think you can get plugins for Visual Studio that automatically create Doxygen comment templates:

/// Brief description of the function.
/**
* Multi-line detailed description. Can use latex-style math and other
* clever things for very nice formatting.
* \param x This is the argument.
* \return Stuff done on x.
*/
int doSomeStuff(int x);

And since you're writing a tutorial, good commenting style is especially important.

Yeah, look at the gradient descent learning method: the random initialization is how you introduce diversity into the search. For other methods you'd probably not need random initialization.

As for initialization ranges, a good way to do it is to use a normal distribution scaled by the number of inputs. I just figured I didn't want to overcomplicate things.
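
A minimal sketch of that kind of initialization (my reading of the idea, using C++11 <random> for brevity; the 1/sqrt(fan-in) spread is one common choice, not necessarily the tutorial's):

#include <cmath>
#include <random>
#include <vector>

// Draw 'count' weights from a normal distribution whose spread shrinks
// as the number of inputs (fan-in) grows, keeping the summed input tame.
std::vector<double> initWeights(int fanIn, int count)
{
    std::mt19937 gen(std::random_device{}());
    std::normal_distribution<double> dist(0.0, 1.0 / std::sqrt(fanIn));

    std::vector<double> weights(count);
    for (double& w : weights)
        w = dist(gen);
    return weights;
}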

I read up on the whole header inclusion thing, and you are 100% correct, as I expected. The more I've been thinking about it, the more elegant it would be to have an NN class and an NN trainer class :P

I'll look for VS Doxygen plugins. Do you think anyone would be interested in PSO or ES tutorials? Not that they have that much use in game AI. :/

Thanks a ton for your input; it's really nice to meet people who know what they're doing. Code optimization is an art. It's been a nightmare here, where I have no one to turn to for advice or help. I practically got into image processing by accident, and now I'm the resident "expert" on it and I barely know anything :(

God I hate this country! The nice thing is I'm getting out of here in a few months...

I'd just like to comment on the college professor bit.

Yes, college programming education is not the best. I failed on my first try because my teacher created the class and coded the implementation right next to the methods, and the naming that she used was a disaster. Even looking at it now (after knowing how to code up classes and do inheritance and all) I still get confused.

I quit college, picked up programming books, and it was a cinch. I wonder if those profs ever ask themselves: is this code EASY to understand, and EASY to MAKE CONNECTIONS with code in SECTION A and SECTION B? Obviously this was never the case at my college. I flunked once she introduced classes.

How about we aim for the learning part first, and then tackle the performance and etiquette second? As someone who took a week or so to learn classes, I could teach someone classes now within 30 minutes.

Professors are so stupid... I work at Cablevision in New England doing phone tech support. One Ph.D. guy called me and said of his masters pupils: "After reading their reports, they're really quite frightening. And these are people who have masters upon masters. You'd think they'd be a bit brighter..."

Ugh. If you want to feel your IQ drop 30 points, go visit the Bronx or Brooklyn in New York. Just speaking to those people drops my IQ. Anyone want to start a non-profit organization that finds professors with intelligence? It's not hard to be booksmart. It's hard to be intelligent :P

To save time writing comments for every little thing, make the class, function, and variable names comment themselves. Self-documenting code makes life 1000000x easier, in that if you change something you don't have to update stale documentation. Our entire engine is like this, and adding a new member requires about an hour's worth of energy, compared to days or weeks before. Remember, a good IDE will have IntelliSense and will complete the names for you ;-)

class GruntAI
{
public:
    void MoveTo(const Vector3D& Destination);
};

As for const "correctness", I find that it makes life easier for debugging and code flow (reading, so you know what it should and shouldn't do).

As for separate files, I'll save you the lookup in your book: it is a lot faster to have most of the code (except inlined functions) in a separate .cpp file.
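
In case it helps anyone following along, the split looks like this (file and method names invented for illustration):

// neuralNetwork.h - declarations only; includers compile just this
class neuralNetwork
{
public:
    void feedForward(const double* inputs);
};

// neuralNetwork.cpp - the bodies live here, so editing them recompiles
// one .cpp instead of everything that includes the header
void neuralNetwork::feedForward(const double* inputs)
{
    /* ... */
}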

I've been playing around with headers and namespaces, and I'm a firm believer now; it actually cleans stuff up a lot.

I'm using more const-ness too; I was just lazy before, tbh.

As for colleges, I think they need to spend the first years going through the basics of programming languages: perhaps do a managed, an unmanaged, an interpreted, and maybe even a functional language.

Once that's done, they need to move on to pure theory regarding algorithms and techniques, and leave the language choice and programming techniques to the students themselves. Nothing pisses me off more than when they force me to use a specific language or OS.

This is how some of our final year courses are handled: you are run through the theory of the assignment, and the implementation is all up to you.

The old adage "those who can, do, and those who can't, teach" holds true quite often. I wouldn't trust 70% of my lecturers to code a hello world application.

Nice tutorial. Very readable, with some great figures. Probably the best of the online ANN tutorials I've read.

About the code, pretty much what was mentioned above:

- Get rid of the hand-managed arrays. In C++ there is no reason not to use vectors. There is no performance hit, since vectors have the same access speed as standard arrays, but they are much safer and result in shorter, more readable code.

- Separate the training from the ANN code. When solving a problem, an ANN is simply a mathematical object, like a matrix or a vector. The purpose of training is to find a suitable ANN specimen that solves the task at hand. It doesn't make sense imho to have the training code integrated into the ANN code.

Quote:
Original post by mpipe
I still don't see how to use NNs practically in games. Are there any tutorials out there on that?


Possibly because NNs aren't used practically in games for the most part. You can probably count the number of games with a NN on the fingers of one hand.

They could be, though; I really don't see why not. You'd have to think about the uses intelligently, and you'd probably have to combine them with another CI technique, like a PSO/EA learning approach, to get them trained quickly enough.
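
A rough sketch of that combination (my own illustration in C++11, not from the thread; evaluateFitness() is a hypothetical game-specific scoring hook):

#include <random>
#include <vector>

using Weights = std::vector<double>;

double evaluateFitness(const Weights& w); // hypothetical: run the agent, score it

// Toy (1+1) evolution strategy over a network's weight vector:
// mutate, and keep the child only when it scores better.
Weights evolveWeights(int nWeights, int generations)
{
    std::mt19937 gen(std::random_device{}());
    std::normal_distribution<double> noise(0.0, 0.1);

    Weights best(nWeights);
    for (double& w : best) w = noise(gen);       // random starting point
    double bestFit = evaluateFitness(best);

    for (int g = 0; g < generations; ++g)
    {
        Weights child = best;
        for (double& w : child) w += noise(gen); // mutate every weight
        double fit = evaluateFitness(child);
        if (fit > bestFit) { best = child; bestFit = fit; }
    }
    return best;
}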

