# What's the difference between Reinforcement Learning and Case-Based Reasoning?

## Recommended Posts

Icebone1000
I'm not understanding this very well.
Trial and error vs. past cases... but if you want to learn something from trial and error, you'll save the tries (which in my head can be seen as past cases, which in my head is the same thing as a state).

i.e.
When the player was at that position, I shot that, he shot that, I died, the player lived;
I learned that I shouldn't shoot that when the player is at that position with that weapon.
Save the state/case for reference the next time the same thing happens.

My knowledge of the subject is very limited, as you can see.

Emergent
I think there are two answers.
1. "Reinforcement learning" tends to involve a [url="http://en.wikipedia.org/wiki/Bellman_equation"]Dynamic Programming[/url] step that propagates information from neighboring states.
2. "Reinforcement learning" is typically "eager learning" (either of a policy, or of something that implies a policy, like a cost-to-go function or Q-table), whereas "case-based reasoning" is typically "[url="http://en.wikipedia.org/wiki/Lazy_learning"]lazy learning[/url]."

Emergent
I realize my reply may not have been very helpful, so let me be more specific by giving examples of how each method can work:

[b]Case-based reasoning[/b]
Save a state trajectory -- a "replay" -- of every game that you play to a database. Then, when the player is at state [i]x[/i], look in your database for saved state trajectories that pass near [i]x[/i]. Then, take whatever action was performed in the most successful trajectory (i.e., the one that has the best final state).
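To make the "replay database" idea concrete, here's a minimal sketch in Python. It assumes states are 2-D positions and a trajectory is a list of (state, action) pairs plus a score for how the game ended; the class and method names are hypothetical, not from any particular library.

```python
import math

class ReplayDatabase:
    def __init__(self):
        self.trajectories = []  # list of (steps, final_score)

    def save(self, steps, final_score):
        """steps: list of ((x, y), action) pairs from one full game."""
        self.trajectories.append((steps, final_score))

    def best_action_near(self, state, radius=1.0):
        """Among saved steps within `radius` of `state`, return the action
        taken in the trajectory whose game ended with the best score."""
        best_score, best_action = -math.inf, None
        for steps, score in self.trajectories:
            for (x, y), action in steps:
                if math.dist(state, (x, y)) <= radius and score > best_score:
                    best_score, best_action = score, action
        return best_action

db = ReplayDatabase()
db.save([((0.0, 0.0), "shoot"), ((1.0, 0.0), "duck")], final_score=-10)
db.save([((0.1, 0.0), "retreat"), ((2.0, 0.0), "shoot")], final_score=+5)
print(db.best_action_near((0.0, 0.0)))  # "retreat": that game ended better
```

Note that all the work happens at query time (scanning the database), which is exactly the "lazy learning" flavor mentioned in the previous reply.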

[b]Reinforcement learning[/b]
Store a "how good is this state" number for each state. This is called the "value" of a state. Then,

- To select an action during play, take a look at the different states you can get to, and pick the action that takes you to the one with the best value.

- The value of a state is tricky: it needs to encode the [i]future[/i] rewards that the player will get. In other words, it needs to account for the reward you'll get at the [i]next[/i] state, and the one after that, and so on. So to update the value of state x, move it closer to this quantity:

[the reward you just got at state x] + [the best value you can get from a neighboring state].

If you want to know the details, take a look at [url="http://en.wikipedia.org/wiki/Q-learning#Algorithm"]Q-learning[/url]; most "reinforcement learning" algorithms are variations on this theme.
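The update rule above can be sketched as tabular Q-learning on a toy problem. This is a hypothetical 5-cell corridor (move left or right, reward +1 only for reaching the last cell); the state space, rewards, and hyperparameter values are all made up for illustration.

```python
import random

N_STATES, ACTIONS = 5, (-1, +1)        # actions: step left or step right
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: usually take the best-valued action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # move Q(s, a) toward: [reward just received] +
        #                      [best discounted value reachable from next state]
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The greedy policy implied by Q: in every cell, move right (+1)
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)
```

Contrast this with the case-based sketch: here the learning is "eager" (the Q-table is updated during play), and at decision time the agent only does a cheap table lookup instead of scanning a database.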
