I'm trying to implement a board game whose evaluation features are tuned using temporal difference (TD) learning.
I've read quite a few descriptions of TD implementations but can't seem to find any clean code examples.
I'm specifically looking for C-like code (C# would be optimal, Java next best) that demonstrates TD learning
using a function approximator (I'm not interested in Q-states).
Of particular interest is how the weights of the function are updated and what values of alpha (see the function approximation
part of http://www.scholarpe...erence_Learning) are reasonable. My initial testing suggests a value below 0.1.
Also, should the value of alpha change over time?
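
To make the question concrete, here is a rough Java sketch of the kind of update being asked about. It assumes a linear function approximator, V(s) = w · φ(s), and plain TD(0) with no eligibility traces. The class name `LinearTd`, its methods, the feature vectors, and the `decayedAlpha` schedule are all hypothetical names chosen for illustration; they do not come from the linked article or from any library. The decay schedule is only one possible way alpha could shrink over time, not a recommendation.

```java
import java.util.Arrays;

/** Sketch only: TD(0) with a linear value function V(s) = w . phi(s). */
class LinearTd {
    private final double[] weights; // one weight per board feature
    private final double gamma;     // discount factor
    private double alpha;           // step size (learning rate)

    LinearTd(int numFeatures, double alpha, double gamma) {
        this.weights = new double[numFeatures]; // weights start at zero
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** V(s): dot product of the weights and the feature vector of s. */
    double value(double[] features) {
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) {
            v += weights[i] * features[i];
        }
        return v;
    }

    /**
     * One TD(0) step for a transition s -> s' with reward r.
     * Pass nextFeatures = null when s' is terminal (its value is taken as 0).
     * Returns the TD error: r + gamma * V(s') - V(s).
     */
    double update(double[] features, double reward, double[] nextFeatures) {
        double nextValue = (nextFeatures == null) ? 0.0 : value(nextFeatures);
        double tdError = reward + gamma * nextValue - value(features);
        // For a linear approximator the gradient of V(s) w.r.t. w is phi(s).
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * tdError * features[i];
        }
        return tdError;
    }

    void setAlpha(double alpha) {
        this.alpha = alpha;
    }

    /** Assumed schedule (illustrative): alpha0 / (1 + step / tau). */
    static double decayedAlpha(double alpha0, long step, double tau) {
        return alpha0 / (1.0 + step / tau);
    }

    public static void main(String[] args) {
        // Toy two-state episode: s0 -> s1 (reward 0), s1 -> terminal (reward 1).
        LinearTd td = new LinearTd(2, 0.05, 1.0);
        double[] s0 = {1.0, 0.0};
        double[] s1 = {0.0, 1.0};
        for (int episode = 0; episode < 1000; episode++) {
            td.setAlpha(decayedAlpha(0.05, episode, 500.0));
            td.update(s0, 0.0, s1);
            td.update(s1, 1.0, null);
        }
        System.out.println(Arrays.toString(td.weights));
    }
}
```

In a real board game the feature vectors would come from the position evaluator, and the reward would typically be nonzero only at the end of the game; both are left out here to keep the sketch short.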
Druzil, Member Since 27 Feb 2011