I'm trying to implement a board game using features that are tuned using temporal difference learning.
I've read quite a few descriptions of the TD implementation but can't seem to find any clean code examples.
I'm specifically looking for C-like code (C# would be optimal, Java next best) that demonstrates TD learning
using a function approximator (I'm not interested in Q-states).
Of particular interest is how the weights of the function are updated and what values of alpha (refer to the function approximation
part of http://www.scholarpe...erence_Learning) are reasonable; initial testing suggests less than 0.1.
Also, does the value of alpha change over time?
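
To make the question concrete, here is a minimal sketch in Java of the kind of update I mean: TD(0) with a linear approximator V(s) = w · phi(s), where each weight moves by alpha * tdError * feature. The class and method names (TdLinearSketch, update, alphaAt) and the decay constant tau are my own placeholders, not from any library, and the 1 / (1 + t / tau) schedule is just one commonly used way to shrink alpha over time, not necessarily the right one for a given game.

```java
// Sketch only: TD(0) with a linear value function over board features.
// All names and constants here are illustrative assumptions.
class TdLinearSketch {
    final double[] weights; // one weight per board feature, initialised to zero
    final double gamma;     // discount factor (assumed 1.0 for an episodic game)

    TdLinearSketch(int numFeatures, double gamma) {
        this.weights = new double[numFeatures];
        this.gamma = gamma;
    }

    // Linear approximator: dot product of the weights and the feature vector.
    double value(double[] features) {
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) {
            v += weights[i] * features[i];
        }
        return v;
    }

    // One TD(0) step for the transition s -> s'.
    // When terminal is true the next state's value is taken as 0
    // and nextFeatures is not read.
    double update(double[] features, double reward, double[] nextFeatures,
                  boolean terminal, double alpha) {
        double nextValue = terminal ? 0.0 : value(nextFeatures);
        double tdError = reward + gamma * nextValue - value(features);
        // For a linear approximator the gradient of V(s) with respect to
        // weight i is simply feature i.
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * tdError * features[i];
        }
        return tdError;
    }

    // Hypothetical decay schedule: alpha0 at step 0, alpha0 / 2 at step tau.
    static double alphaAt(double alpha0, long step, double tau) {
        return alpha0 / (1.0 + step / tau);
    }

    public static void main(String[] args) {
        // Toy two-position "game": A -> B (reward 0), B -> end (reward 1).
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};
        TdLinearSketch td = new TdLinearSketch(2, 1.0);
        for (long episode = 0; episode < 1000; episode++) {
            double alpha = alphaAt(0.1, episode, 1000.0);
            td.update(a, 0.0, b, false, alpha);
            td.update(b, 1.0, null, true, alpha);
        }
        System.out.println("V(A) = " + td.value(a) + ", V(B) = " + td.value(b));
    }
}
```

In a real game the feature vectors would come from the board evaluation, and the reward would typically be non-zero only on the final move. Whether a constant or a decaying alpha works better is exactly the part I'm unsure about.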
Druzil (Member Since 27 Feb 2011)