Jump to content

  • Log In with Google      Sign In   
  • Create Account


TD learning code example with function approxiamation


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
7 replies to this topic

#1 Druzil   Members   -  Reputation: 559

Like
0Likes
Like

Posted 21 March 2012 - 07:07 PM

I'm trying to implement a board game using features that are tuned using temporal difference learning.
I've read quite a few descriptions of the TD implementation but can't seem to find any clean code examples.

I'm specifically looking for c-like code (c# would be optimal, java next best) that demonstrates TD learning
using a function approximator (I'm not interested in q-states).

Particularly of interest is how weights of the function are updated and what values of alpha (refer the function approxiamation
part of http://www.scholarpe...erence_Learning) are reasonable - initial testing suggests less than 0.1
also does the value of alpha change over time

thanks

Sponsor:

#2 LorenzoGatti   Crossbones+   -  Reputation: 2458

Like
-1Likes
Like

Posted 23 March 2012 - 07:50 AM

I'm trying to implement a board game using features that are tuned using temporal difference learning.
I've read quite a few descriptions of the TD implementation but can't seem to find any clean code examples.

If you choose a peculiar framework, you have the burden of proving that it makes sense.
  • Do your "features" really change over time, or it's a clumsy way to account for different strategies at different stages of the game, or just "evolution"?
  • Why do you want to forget old examples with that alpha factor in the first place? If some examples are somehow worse than others you want to give them less importance than better examples, which has nothing to do with a meaningless sequential ordering.
  • Usually, machine learning in board games is applied to finding one good position evaluation function that, given a game state, matches the outcome of an exhaustive simulation from that state. How does the peculiar sequential structure of TD learning fit such an inherently memoryless problem?

Produci, consuma, crepa

#3 Druzil   Members   -  Reputation: 559

Like
0Likes
Like

Posted 25 March 2012 - 11:01 PM

I'm not looking for a framework, I'm looking for example code implementing TD for function approxiamation. The tuning of features is only done once, it is not an on-going exercise. I will define many features and I'm using the TD to provide a weighting for each of those features. Once the TD algorithm converges I could potentially hardcode those weights against each feature. The TD code would then have done its job and not be needed again. Because there are potentially hundreds of feaures I don't want to try and tune each of them by hand. I already have a comprehensive implementation of adversarial search using all the usual suspects for look ahead. (alpha/beta, principle variation, killer moves, transpotition tables, etc) The heuristic function used returning an evaluation of the position will be made up of the features and their weightings.

#4 LorenzoGatti   Crossbones+   -  Reputation: 2458

Like
0Likes
Like

Posted 26 March 2012 - 02:22 AM

Well, I'm arguing that TD learning doesn't seem a particularly suitable framework for your application. Since you don't seem too clueless about boardgame AI, is it some kind of homework in which you have to use TD learning? Or TD learning makes sense after all?
It isn't at all clear whether you have stumbling blocks in your implementation, or a complete implementation that performs badly.
Produci, consuma, crepa

#5 Druzil   Members   -  Reputation: 559

Like
0Likes
Like

Posted 26 March 2012 - 04:40 PM

The bigger picture is that I'm creating a general gaming playing bot. Part of this is taking in GDL (game description language) and creating features from ruleset using evolutionary strategies. The features that are created from the ruleset are mostly going to be junk so there needs to be a mechanism for tuning the features to see what their
worth is towards a heuristic and whether the feature gets used or chucked and also what its worth is towards evaluation. I've used genetic programming for this in the past and had very limited success with it. I've read a bit of literature (scientific journals) using TD to provide feature tuning and was hoping to emulate some of these techniques. I'm just a hobbyist that finds this stuff extrememly interesting :) Over the past few days I've managed a TD implementation using a linear combination of features and realised that I probably need to move to a ANN. As a simple example if you take checkers and a single feature that returns the difference in the number of men that you have, a linear evaluation doesn't really give a good representation of the feature (e.g 12 vs 11 men would eval the same as 2 vs 1 men). I was looking for code examples to vet my understanding and to see how decaying learning rates and other issues (that I'm not aware of) are handled. I've implemented AI on quite a few boards games successfully using adversarial search and hand tuned evaluations and now stepping up to more complicated problems.

#6 jefferytitan   Members   -  Reputation: 1595

Like
0Likes
Like

Posted 28 March 2012 - 03:05 PM

Good luck, sounds interesting although outside my area of expertise. One possibility that may help with either a genetic algorithm or a NN is offering some "pre-chewed" metrics, e.g. have one input which is the difference, one which is the difference ratio, one which is the log of the difference. That reduces the complexity that the NN or GA needs to represent, as the most appropriate metric will tend to be strengthened/selected for. Although I'm not sure whether more inputs will introduce other difficulties.

#7 Druzil   Members   -  Reputation: 559

Like
0Likes
Like

Posted 28 March 2012 - 04:44 PM

Interesting suggestion Jeff. The trade off would be possible more useful metrics to more metrics. The more inputs, the longer the NN/GA should take to converge, but the better metrics would shorten the converging. It would be an interesting test to see which option would work better

#8 jefferytitan   Members   -  Reputation: 1595

Like
0Likes
Like

Posted 28 March 2012 - 06:08 PM

I've heard of people using that approach successfully for specific tasks such as handwriting recognition. I think the success would partly be selecting potentially useful transformations and partly be a crapshoot. ;)




Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS