If you choose a peculiar framework, you have the burden of proving that it makes sense.
I'm trying to implement a board game using features that are tuned using temporal difference learning.
I've read quite a few descriptions of the TD implementation but can't seem to find any clean code examples.
- Do your "features" really change over time, or is that a clumsy way to account for different strategies at different stages of the game, or just "evolution"?
- Why do you want to forget old examples with that alpha factor in the first place? If some examples are somehow worse than others, you should give them less importance than the better ones, which has nothing to do with an arbitrary sequential ordering.
- Usually, machine learning in board games is applied to finding a single good position evaluation function that, given a game state, predicts the outcome of an exhaustive simulation from that state. How does the peculiar sequential structure of TD learning fit such an inherently memoryless problem?
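For reference, the core TD(0) weight update for a linear evaluation function is only a few lines. The sketch below is a generic illustration, not code from any particular game: the feature vectors, reward convention (0 until a terminal win/loss payoff), and the `alpha`/`gamma` values are all assumptions chosen for the example.

```python
def td0_update(weights, features, next_features, reward, alpha=0.1, gamma=1.0):
    """One TD(0) step for a linear evaluation V(s) = w . phi(s).

    features / next_features: feature vectors phi(s) and phi(s') for the
    current and successor states (use an all-zero vector for a terminal
    successor, so V(s') = 0). reward: the payoff observed on this step.
    """
    v = sum(w * f for w, f in zip(weights, features))
    v_next = sum(w * f for w, f in zip(weights, next_features))
    delta = reward + gamma * v_next - v          # TD error
    # Move each weight along its feature's gradient, scaled by the TD error.
    return [w + alpha * delta * f for w, f in zip(weights, features)]

# Toy two-step episode with made-up feature vectors and a terminal reward of 1.
w = [0.0, 0.0]
w = td0_update(w, [1.0, 0.0], [0.0, 1.0], reward=0.0)  # non-terminal step
w = td0_update(w, [0.0, 1.0], [0.0, 0.0], reward=1.0)  # terminal step
```

The "sequential" part is only that each state's estimate is nudged toward the estimate of its successor rather than toward a full rollout result, which is what distinguishes TD from plain Monte Carlo fitting of the evaluation function.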