I've read quite a few descriptions of TD learning but can't seem to find any clean code examples.
I'm specifically looking for C-like code (C# would be ideal, Java the next best) that demonstrates TD learning
using a function approximator (I'm not interested in Q-states).
I'm particularly interested in how the weights of the function are updated and what values of alpha are reasonable (see the function approximation
part of http://www.scholarpe...erence_Learning). My initial testing suggests values below 0.1.
Also, should the value of alpha change over time?
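To make concrete what kind of thing I'm after, here is a rough Java sketch of how I currently understand the TD(0) weight update for a linear approximator, V(s) = w . phi(s). Everything in it is my own assumption rather than taken from a reference: the class name LinearTd, the decayedAlpha schedule alpha0 / (1 + t / tau), the two-feature encoding, and the 5-state random walk used as a toy problem. For a non-linear approximator I assume the feature vector in the update would be replaced by the gradient of V with respect to the weights.

```java
import java.util.Random;

/** Illustrative TD(0) learner with a linear value function V(s) = w . phi(s). */
final class LinearTd {
    private final double[] weights;
    private final double gamma; // discount factor

    LinearTd(int numFeatures, double gamma) {
        this.weights = new double[numFeatures]; // all weights start at zero
        this.gamma = gamma;
    }

    /** Current estimate V(s) as the dot product of weights and features. */
    double value(double[] features) {
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) {
            v += weights[i] * features[i];
        }
        return v;
    }

    /**
     * One semi-gradient TD(0) step: w += alpha * delta * phi(s),
     * where delta = r + gamma * V(s') - V(s) and V(s') = 0 at a terminal state.
     * nextFeatures is ignored when terminal is true. Returns the TD error delta.
     */
    double update(double[] features, double reward, double[] nextFeatures,
                  boolean terminal, double alpha) {
        double nextValue = terminal ? 0.0 : value(nextFeatures);
        double delta = reward + gamma * nextValue - value(features);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * delta * features[i];
        }
        return delta;
    }

    /** Hypothetical decay schedule (an assumption): alpha0 / (1 + t / tau). */
    static double decayedAlpha(double alpha0, int t, double tau) {
        return alpha0 / (1.0 + t / tau);
    }

    /** Hypothetical features for states 0..4: a bias term plus normalised position. */
    static double[] features(int state) {
        return new double[] {1.0, state / 4.0};
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        LinearTd td = new LinearTd(2, 1.0);
        for (int episode = 0; episode < 5000; episode++) {
            double alpha = decayedAlpha(0.1, episode, 1000.0);
            int s = 2; // every episode starts in the middle state
            while (true) {
                int next = s + (rng.nextBoolean() ? 1 : -1);
                boolean terminal = next < 0 || next > 4;
                double reward = next > 4 ? 1.0 : 0.0; // reward only off the right end
                td.update(features(s), reward,
                          terminal ? null : features(next), terminal, alpha);
                if (terminal) {
                    break;
                }
                s = next;
            }
        }
        for (int s = 0; s < 5; s++) {
            System.out.printf("V(%d) = %.3f%n", s, td.value(features(s)));
        }
    }
}
```

In this toy walk the true values are (s + 1) / 6, which is linear in the state index, so the two features above should be able to represent them. Whether a starting alpha of 0.1 and this kind of decay are sensible in general is exactly what I'm unsure about.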
Thanks.