hatch22

Do you think this is do-able (long)?


hatch22    178
Greetings everyone. I wanted to ask what you all think of a project a friend of mine and I have started working on. If this description gets long, I apologize in advance; I simply want to make sure we are all on the same page as much as possible.

I have been studying reinforcement learning (particularly Temporal Difference algorithms), especially in conjunction with function approximators such as ANNs for solving the value function (see an example and code). I want to see if this strategy can actually be adapted to an FPS-style fighting game in real time. Both my friend and I are experienced programmers, and we are using existing engines to create a 3D environment for the agent. To keep things simple, the agent will be a companion to the player, and all other opponent AI will use an FSM. This way the expensive iterations through the ANN are kept to a minimum.

Most of the examples of reinforcement learning I have seen applied to a game environment (or most other environments, for that matter) learn completely on their own through sheer trial and error, improving gradually over a great many iterations. I am going to try to speed up the learning process in two ways. First and most importantly, in addition to the agent training the ANN by adjusting the weights based on reward, I also want the ANN to be updated by the player's behavior. The agent has the same interface to the game as the player (i.e. cursor position, movement controls, etc.), and when the player performs actions and receives a reward, the ANN is updated in the same manner as if the agent had performed the action. This allows good strategies performed by the player to be taught to the agent via the value function approximated by the ANN. Also, the player's avatar and the agent's avatar are very similar (though not identical), so that most of the states the player encounters are likely to be experienced by the agent as well.
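To make the shared value-function idea concrete, here is a rough Python sketch. It is a minimal illustration only: the linear approximator, the feature vectors, the class and parameter names, and all numbers are invented for the example rather than taken from our actual design, but it shows how a single TD(0) learner can be updated from both the player's and the agent's transitions.

```python
import numpy as np

# One TD(0) value learner whose weights are updated from BOTH the
# agent's and the player's (state, reward, next_state) transitions.
# The linear approximator stands in for the ANN described above.
class TDValueApproximator:
    def __init__(self, n_features, alpha=0.1, gamma=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=n_features)  # tiny random init
        self.alpha = alpha   # learning rate
        self.gamma = gamma   # discount factor

    def value(self, state):
        return float(self.w @ state)

    def update(self, state, reward, next_state, done=False):
        # TD(0) target: r + gamma * V(s'), or just r at terminal states
        target = reward if done else reward + self.gamma * self.value(next_state)
        td_error = target - self.value(state)
        self.w += self.alpha * td_error * state  # gradient step for linear V
        return td_error

# The same learner ingests transitions from either actor.
vf = TDValueApproximator(n_features=3)
goal = np.array([0.0, 0.0, 1.0])   # toy "goal" state features
start = np.array([1.0, 0.0, 0.0])  # toy "start" state features

# The player repeatedly reaches the goal and is rewarded; the shared
# weights move, so the agent inherits the player's experience through V.
for _ in range(50):
    vf.update(goal, reward=1.0, next_state=goal, done=True)
    vf.update(start, reward=0.0, next_state=goal)
```

After this, the start state has acquired value purely by bootstrapping from the goal state, which is exactly the mechanism that lets the agent benefit from rewards the player earned.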
In effect the agent learns by example, but retains the ability to experiment on its own and generalize from past experience using the ANN. You might think of it as a kind of real-time semi-supervised learning.

The second strategy in my design is to separate the AI into two parts. The low-level AI is the one described above, learning by directly imitating the player's keyboard and mouse input based on the state the player or agent is in. Since it would be difficult to learn any long-term strategies this way, there is a second layer to the AI that monitors the human player's behavior in terms of what path the player selects to follow given the player's state. This AI is aware of what parts of the level have been explored and what parts are currently visible, using a waypoint grid. The possible paths the player might take are determined using a combined version of Dynamic A* Lite (for search speed) and Tactical A*. The high-level AI observes what paths the player takes and what rewards are received for those choices, once again applying reinforcement learning and generalizing with an ANN. The agent's behavior is monitored in the same manner, and the low-level AI is rewarded by the high-level AI if it stays on the appropriate path selected by the high-level AI's ANN (more info here and here).

So at a more strategic level, the agent learns to mimic the player when the player performs actions that are beneficial, and to avoid behaviors that are detrimental. Since the goal of such an agent is not to be the most intelligent, but instead to appear human, I believe this approach of learning by example has merit. The agent will be able to find state-action paths that lead to reward more quickly by following the example of the human companion, but it will not be prevented from performing its own experimentation, and it will be able to learn on its own if the player is absent.
Breaking the AI into a hierarchy of high- and low-level reinforcement learning segments has been shown in other research to produce slightly suboptimal results in exchange for a large reduction in state space and run time. Since the goal is not optimal behavior but believable behavior, I believe this is an acceptable trade-off. The reinforcement learning AI will then consist of two smaller ANNs, one per AI level (rather than one huge network larger than the two combined), and four iterations of the same reinforcement learning algorithm (one for each AI level and one for each avatar). Iterations will also be staggered across frames so that the entire system does not run every frame and kill the frame rate. It is our hope that such a system, dealing with only one agent and one human player along with a small number of simpler opponents, will allow real-time performance.

That is the idea of what we are attempting. Do any of you see potential pitfalls or alternatives that I should explore? Do you think such a system is too complex to run in real time on a high-end machine? Do you have any questions regarding this approach, or need clarification? I would basically like any and all constructive feedback anyone wishes to give. I am fully aware that this approach may not work at all in practice, but do you think the theory is sound, or am I overlooking something important? Any suggestions for improvements? If anyone would like to see more sources, just ask and I will provide them. For an overview, check out the reinforcement learning survey. Thank you for taking the time to read all this.

[Edited by - hatch22 on October 15, 2004 3:45:41 PM]

To be honest, I had come up with an idea very similar to this for something I recently started putting together.

The first thing I see wrong is how you are using an action-leads-to-reward setup. At first this might seem like a good idea, and in some respects it is, but what if there isn't always a reward? What would happen if there was a goal, but that goal was without an immediate reward? Therefore I think the system should be set up to work towards a goal rather than a reward.

Now on to the human behavior.
One of the most dominating aspects of human behavior, at least as I see it, is emotion. Even playing games against a computer opponent, I tend to get excited at certain points during the game. People get angry or worried depending on the actions of their opponent and act in a certain way because of it. Playing games against other people, especially someone from some other part of the world over the internet, constantly changing emotions shape and affect how a person reacts to certain stimuli. Computers don't feel, and they don't react with emotion.
So the problem, then, is how do you create an intelligence that mimics human behavior while working around the fact that computers don't feel emotions?
Another important aspect I see is how you make it so that the computer AI isn't too good to play against. This may actually be the simpler part when compared to my previous example. I remember playing a copy of Counter-Strike: Condition Zero recently. One of the reasons... actually, the main reason I stopped playing was that I became frustrated with the fact that the computer had a reaction time of what seemed like half a second or less, and I got nailed before I even passed the edge of the wall. So there have to be limiting factors on what the computer can see, and delays in the reaction time as well. An enemy that can hit you from just inside the extreme limits of vision is not a good thing.

I understand how the combined pathfinding programs would work, and that does seem like a fairly good idea. Still, going back to my earlier argument, human emotion may be the key to the whole thing.
Computer opponents, more specifically in RTS games, sometimes try to fake out the player by launching a large assault and then immediately sending a second force while the first is still fighting. This strategy hardly ever works on the same person more than a few times, but it is also one of several that have been preprogrammed into the system and are seen as possibilities for victory. This example might be seen as irrelevant because it deals with a different format than what you are developing, but really it's not that far off. One strategy that works a good deal of the time against the computer enemy is to send a sizeable force into its base to mess things up a bit while keeping a large force just in front of your own base, creating choke points and deathtraps. After your initial assault has subsided, the computer often launches a massive counterattack on your base, because it assumes that you have just lost nearly all your troops in that failed assault. However, these assumptions are all designed directly into the system on an action-reaction basis.
These reactions and strategies from the computer are expected and usually countered and exploited rather easily. Sometimes you even see your own strategies being used by the computer.
That's the basic idea I've noticed behind the AI in RTS games, which at one point were all I played for quite some time.

But now what happens when the computer takes your strategy, notices a flaw or two, takes measures to correct that flaw, and then launches a similar strategy against you? At first you may think you know how to stop it, but then you notice the changes and find yourself hard-pressed to find an effective solution. Oftentimes this element of surprise is what leads to victory or defeat for either side. Surprise... something new... unexpected... unknown, even. A person using an unexpected strategy against his opponent instills a sense of fear, makes the opponent edgy and nervous, and nervous people don't think as they normally would. I think that first you'll have to be able to mimic this, and then recreate it somehow.

Get the AI to play mind games with the players. One trick after another that is not part of an individual objective, but is a series of steps leading to one ultimate goal.
So maybe, in a sense, all you really need to do is to make the players THINK, or maybe FEEL, like they're playing against a human. How can we do this? Manipulation and misdirection.
I've been typing this for about 45 minutes now, so I think I'll take a break. I'll be happy to answer any questions you have about my long-winded response to your ideas. And maybe after a few hours I'll even have more to say.

Lastly, for now. I haven't actually done any type of programming, but it has always been an interest of mine. I'm very interested in this project, and I might be able to bring one or two friends into this, if you would have us, of course. I recently started looking into various programming tools and texts, and over time I plan to teach myself a great deal. At the very least I have some ideas that you may be able to use, and I'm willing to work as a tester if need be.
I mentioned earlier that I just recently put together a project of my own. I won't go into any detail about it here other than saying that I plan to go farther, faster, longer, and better than any game has ever gone to date. Small steps of course.

So with all that having been said, perhaps we can make an arrangement of sorts? Hell even if you don't want an arrangement I'll still provide feedback just to see how this thing turns out.

Quak    206
I agree with SpiritualTempest that emotions are the key to a more believable AI. It might be a good idea to give your agent a precomputed character that may, of course, change over time. The agent should then be able to analyze the player's actions and decide which ones it may apply to itself and work together with its character and skills, and which it thinks are bad actions by the player, learning from the player's mistakes (which are relative to the agent's character) and deciding how to do things a better way. For example, if the player plays in a Rambo-like fashion, blowing up not only the bad guys but also civilians, and the agent thinks of that as a bad thing because of its sensitive character, it may try a "Sam Fisher style" or something, and if there are civilians in the way it may not fire but quickly search for cover and risk being shot.
These are just a few thoughts after I've read this interesting post. I am no professional, but you might find my ideas useful in some way :)

hatch22    178
SpiritualTempest, thank you for the thoughtful reply. I see you are a new member. Welcome aboard. You have raised a number of valid concerns, and I will try to address each of them to your satisfaction. I fear this may be another long post.
Quote:
Original post by SpiritualTempest
The first thing I see wrong is how you are using an action-leads-to-reward setup. At first this might seem like a good idea, and in some respects it is, but what if there isn't always a reward? What would happen if there was a goal, but that goal was without an immediate reward? Therefore I think the system should be set up to work towards a goal rather than a reward.
You have made the exact same misinterpretation that I made when I first started studying Reinforcement Learning (RL). The beauty of RL is that, if it is properly set up, it is goal oriented. What you have described is what is called a greedy policy: a plan that always selects an action based on greatest immediate reward. In RL, however, choices can be made not on immediate reward, but on perceived future reward. The expected future reward reachable from a given state-action pair is called the value of that pair.

When an RL system reaches some goal condition, it receives a reward for achieving the goal. The value of the previous state-action pair that led to the goal is then increased by some amount. Then the state-action pair that led to that state-action pair is increased by a lesser amount, and so on back through the path of state-action pairs the AI chose, until the amount by which the value is updated is close to zero, where the process stops. In this way, state-action pairs that lead to good rewards down the road will have higher value, even if they themselves don't yield any reward (or even yield a negative reward, or punishment). This allows the AI to discover longer-term plans that yield better rewards in the long run than a tempting short-term goal that leads to little or no reward in the future. See this description from Reinforcement Learning: An Introduction.
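The diminishing backward update I just described can be sketched in a few lines of Python. This toy version (a five-state corridor with invented numbers) is not the literal TD rule, but it conveys how the value bump shrinks the further back along the path you go:

```python
# Tabular sketch of backed-up credit assignment: when the goal is
# reached, every state on the path gets a value bump that shrinks
# the further back it is, like an eligibility-trace update.
gamma = 0.8   # how quickly the bump shrinks going backward
alpha = 0.5   # learning rate
values = {s: 0.0 for s in range(5)}  # states 0..4, goal at state 4

def backup(path, goal_reward):
    """Reward the final step fully, earlier steps by gamma^k less."""
    bump = goal_reward
    for state in reversed(path):
        values[state] += alpha * (bump - values[state])
        bump *= gamma

backup(path=[0, 1, 2, 3, 4], goal_reward=1.0)
# Earlier states received smaller bumps, so value decreases
# monotonically as you move away from the goal.
```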

Now on to the human behavior:
Quote:
Original post by SpiritualTempest
One of the most dominating aspects of human behavior, at least as I see it, is emotion. Even playing games against a computer opponent, I tend to get excited at certain points during the game. People get angry or worried depending on the actions of their opponent and act in a certain way because of it. Playing games against other people, especially someone from some other part of the world over the internet, constantly changing emotions shape and affect how a person reacts to certain stimuli. Computers don't feel, and they don't react with emotion.
Believe me, I have not forgotten such an important topic as emotional response (of the agent and the player). This can be achieved in a number of ways. I have not completely decided on the exact methods I will use, but two methods look promising.

First, emotion can be simulated by imitation. If the agent observes that the player often runs away in a panic in the face of overwhelming odds, it will do likewise. If at other times the player, angry at being ambushed, charges right in, the AI will observe this and imitate, choosing whether to fight or flee on the basis of probability, and paying special attention to exactly what state the player was in when making such drastically different choices (with differing long-term rewards and punishments). Both decisions may lead to reward or to lack of punishment (e.g. survival vs. killing lots of enemies). This is what the ANN is for. If a slight difference in state (such as player health level) makes a large difference in behavior, the RL system will adjust the ANN's weights so that, when the agent faces overwhelming odds, the small difference (such as health) will matter more in deciding whether the AI fights to the death or flees. Mimicking behavior can often convey an emotion.

Another way to handle less obvious emotions than panic or rage is to apply emotional states as part of the factors involved in decision making. Emotional states can limit how likely certain actions are to be performed in a given physical state. For more details and a lot of other helpful information see Programming Believable Characters for Computer Games and the GDC 2002 presentation on Halo AI.

There are other approaches as well that I am looking into.
Quote:
Original post by SpiritualTempest
Another important aspect I see is how you make it so that the computer AI isn't too good to play against. This may actually be the simpler part when compared to my previous example. I remember playing a copy of Counter-Strike: Condition Zero recently. One of the reasons... actually, the main reason I stopped playing was that I became frustrated with the fact that the computer had a reaction time of what seemed like half a second or less, and I got nailed before I even passed the edge of the wall. So there have to be limiting factors on what the computer can see, and delays in the reaction time as well. An enemy that can hit you from just inside the extreme limits of vision is not a good thing.
You are absolutely right. This is why noise is introduced into both the input and the output. The agent is unable to perceive its environment perfectly, and aim and other factors are influenced by controlled random error. Also, the speed of reaction is regulated to be no faster than a human's. If a mouse cursor won't move any faster, then the agent won't be able to aim any faster either. Between noise and speed restrictions, perfect reactions can be avoided.
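As an illustration, the noise-plus-delay limiting might look something like this sketch. The noise level, delay length, and all names here are placeholder guesses rather than tuned values from our design:

```python
import random
from collections import deque

# "Humanizing" limits: Gaussian noise on the perceived target position
# plus a fixed reaction delay before the agent acts on what it saw.
class HumanizedAim:
    def __init__(self, noise_std=5.0, delay_frames=15, seed=1):
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        # Queue of past observations; the agent only acts on the oldest.
        self.buffer = deque(maxlen=delay_frames)

    def observe(self, target_xy):
        x, y = target_xy
        noisy = (x + self.rng.gauss(0, self.noise_std),
                 y + self.rng.gauss(0, self.noise_std))
        self.buffer.append(noisy)

    def aim_point(self):
        # None until the reaction delay has elapsed.
        return self.buffer[0] if len(self.buffer) == self.buffer.maxlen else None

aim = HumanizedAim()
for frame in range(20):
    aim.observe((100.0 + frame, 50.0))  # target drifting to the right
# The agent aims where the target WAS delay_frames ago, plus noise,
# so it can never snap perfectly onto a moving target.
```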

Also, the choice the agent makes will not always be the one with the highest value (note I said value, not reward). The probability of making a choice will mostly depend on how effective that choice was in the past, but it can also be changed by an extra factor called the experimentation factor. This allows other actions to be tried out, and it also allows for mistakes. If the experimentation factor is made flexible as learning progresses, the AI can avoid making extremely stupid decisions while still being given the opportunity to experiment and learn from trial and error as well as from the human ally. See my link to Reinforcement Learning: An Introduction above for more information.
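Here is a minimal sketch of what such an experimentation factor could look like, using a simple epsilon-greedy rule with decaying epsilon. The action names and all numbers are made up for illustration:

```python
import random

def choose_action(action_values, epsilon, rng):
    """Pick the highest-value action, except with probability epsilon
    pick a random one to keep experimenting."""
    if rng.random() < epsilon:
        return rng.choice(list(action_values))
    return max(action_values, key=action_values.get)

rng = random.Random(42)
values = {"advance": 0.7, "flank": 0.4, "retreat": 0.1}

epsilon = 0.5  # a young agent experiments half the time
counts = {a: 0 for a in values}
for step in range(1000):
    a = choose_action(values, epsilon, rng)
    counts[a] += 1
    epsilon *= 0.995  # experiment less as learning progresses
```

With the decay, early decisions are exploratory and occasionally "mistaken", while later decisions almost always follow the learned values, which is roughly the behavior curve described above.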
Quote:
Original post by SpiritualTempest
One strategy that works a good deal of the time against the computer enemy is to send a sizeable force into its base to mess things up a bit while keeping a large force just in front of your own base, creating choke points and deathtraps. After your initial assault has subsided, the computer often launches a massive counterattack on your base, because it assumes that you have just lost nearly all your troops in that failed assault. However, these assumptions are all designed directly into the system on an action-reaction basis.
These reactions and strategies from the computer are expected and usually countered and exploited rather easily. Sometimes you even see your own strategies being used by the computer.
This kind of predictable behavior is avoided because the RL system is constantly learning from experience. If a strategy doesn't work, the value of that state-action path is reduced and a more effective strategy's value is raised.
Quote:
Original post by SpiritualTempest
But now what happens when the computer takes your strategy, notices a flaw or two, takes measures to correct that flaw, and then launches a similar strategy against you? At first you may think you know how to stop it, but then you notice the changes and find yourself hard-pressed to find an effective solution. Oftentimes this element of surprise is what leads to victory or defeat for either side. Surprise... something new... unexpected... unknown, even. A person using an unexpected strategy against his opponent instills a sense of fear, makes the opponent edgy and nervous, and nervous people don't think as they normally would.
First, I am operating under the assumption that for any given situation, there is a finite number of good overall strategies that sufficiently address it. No matter how surprising, some strategies are just plain stupid. I'm not talking about strategies based on emotional response. If you can intimidate your opponent into a rout with a mad charge, then a strategy that might have gotten you killed (e.g. if your opponent's units had stood their ground and mowed down your charge) was instead effective. I'm saying that one of the strategies that might occur to a learning AI is "I think I'll tell my units to amble about aimlessly during an ambush." This would lead to a large negative reward, discouraging similar behavior in the future.
Quote:
Original post by SpiritualTempest
I think that first you'll have to be able to mimic this, and then recreate it somehow.
Exactly. The agent can learn from the human's behavior, including some of the human's more surprising but effective strategies. With enough observations of different strategies used in different conditions and situations, and a way to evaluate their effectiveness (reward), the agent can develop a large enough pool of effective state-action paths (strategies) that an opponent won't know which proven strategy the agent will choose for a given situation. As long as the agent has observed or experienced more than one good way to handle a situation, it can surprise opponents by being unpredictable but still intelligent. Also, thanks to the ANN, it can generalize from past experience when it faces a situation it has never seen before, based on similar situations it has experienced in the past. You don't know which strategy it's going to pull on you, and it's doubtful you can handle every possibility with a given counter-strategy. Even if you could, that would give the AI incentive to adapt and try something else in the future.
Quote:
Original post by SpiritualTempest
So maybe, in a sense, all you really need to do is to make the players THINK, or maybe FEEL, like they're playing against a human. How can we do this? Manipulation and misdirection.
As for manipulation and misdirection (have you seen Swordfish?), this requires the ability to anticipate. This is possible, but it first requires that the AI be sufficiently experienced to avoid making extremely stupid decisions. Keep in mind that the human player in my system starts off as the AI's ally and teacher. Once its human guide feels the agent can do well enough on its own, it is considered trained. At that point, the AI would no longer learn by directly monitoring the behavior of the human player. This does not mean the AI would stop learning, or stop learning from the player. Rather, the agent will use external information about the player that it can perceive (what gun the player has, where the player is in relation to itself and the other players or agents it is aware of, what the player is doing, etc.) to monitor the player's response to the agent's own actions.

The agent will then be able to use its mind to predict the player's actions. It will think something to the effect of "If I were the player, knowing what I know about the player's state, what would I do if I saw myself performing the action I am about to perform?" However, this can get expensive processor-wise, so I do not know if direct anticipation will be possible in real time. Even so, learning patterns of behavior from previous experience can help the agent choose actions that are effective against the player even when it is not trying to predict the player's exact behavior. It can think, "Last time the player was doing that and I did this, I entered a state that had high value (found a path that led to high reward), so maybe I should try it again?"
Quote:
Original post by SpiritualTempest
Lastly, for now. I haven't actually done any type of programming, but it has always been an interest of mine. I'm very interested in this project, and I might be able to bring one or two friends into this, if you would have us, of course. I recently started looking into various programming tools and texts, and over time I plan to teach myself a great deal. At the very least I have some ideas that you may be able to use, and I'm willing to work as a tester if need be.
Thank you for the offer of help. At this point, my friend and I are still working on the design document and overall structure of our project, so we don't need much help yet, but any ideas you have would be appreciated. The more input and suggestions we have, the more options we will have when finalizing the design. When we actually get started on coding, we could use the extra manpower, as long as we feel the coding skills of you and your friends are up to the challenge. I'm not trying to slight your abilities; I'm just trying to be realistic.
Quote:
Original post by SpiritualTempest
I mentioned earlier that I just recently put together a project of my own. I won't go into any detail about it here other than saying that I plan to go farther, faster, longer, and better than any game has ever gone to date. Small steps of course.

If you would like to share the details of your project with us, give me an email. As for going farther, faster, longer, and better than any game has ever gone, that is the goal of every game programmer trying something new. Good luck, and as you say, take small steps.
Quote:
Original post by SpiritualTempest
So with all that having been said, perhaps we can make an arrangement of sorts? Hell even if you don't want an arrangement I'll still provide feedback just to see how this thing turns out.
At this point, we are not ready to work with anyone else until we get our design done. However, if you are interested in helping us out, or would like us to consider helping you out or doing a joint project, send me an email. In my experience, projects done over the web are a little tricky, but still do-able. Feedback and testing are always appreciated.

[Edited by - hatch22 on October 16, 2004 12:37:47 PM]

hatch22    178
Quak, you replied to my post as I was busy replying to SpiritualTempest. I appreciate your suggestion. That is indeed the kind of emotional system I have in mind. Programming Believable Characters for Computer Games has an entire chapter devoted to emotional systems. As I said above, I haven't completely decided which model to adopt, but I am definitely looking into it. I feel as you both do that emotional characters are critical to believable and fun AI.

Any more thoughts?

Guest Anonymous Poster   
First, I just have to say that the bit about killer Counter Strike AI cracked me up, because I felt the same way about those guys playing night and day in internet bars in Beijing. I swear they can see through walls... Not to mention I can't understand all their voice chat. [wink]

Anyway, the path-finding AI is a pretty good idea, as well as the goal-learning bit. I'm not sure about a frame-by-frame state/decision AI, though. I can't see what kind of useful state information you'd feed the AI each frame to let it choose a sensible action, or to stick to a single path instead of wavering indecisively. It may be a better idea to make things more abstract, by recording the player's actions over time and programmatically interpreting them into state and behaviour data you can feed into the AI, such as "surrounded by enemies" leading to "run for health." Then the frame-by-frame actions would be simple interpolations based on what behaviour the AI has chosen. Also, this way, you don't have to run things through the AI so often. Maybe you can tell me why I'm wrong though... I haven't read your RL reference yet, though I plan to.
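Something like this sketch is what I mean; the labels, thresholds, and the hand-written fallback policy are all invented on the spot for illustration:

```python
# Interpret raw per-frame observations into coarse state labels,
# and only make a new decision when the label actually changes.
def abstract_state(health, nearby_enemies, ammo):
    if health < 25:
        return "critical"
    if nearby_enemies >= 3:
        return "surrounded"
    if ammo == 0:
        return "unarmed"
    return "roaming"

def behaviour_for(state):
    # Hand-written lookup; in the real system this would be replaced
    # by the learned value function choosing among behaviours.
    return {"critical": "run_for_health",
            "surrounded": "fall_back",
            "unarmed": "find_weapon",
            "roaming": "patrol"}[state]

decisions = []
last_state = None
# Four frames of (health, nearby_enemies, ammo) observations:
for frame_obs in [(90, 0, 30), (88, 1, 28), (80, 3, 20), (15, 3, 5)]:
    s = abstract_state(*frame_obs)
    if s != last_state:  # decide only on state change, not every frame
        decisions.append(behaviour_for(s))
        last_state = s
```

Note that the second frame produces no new decision at all, since the abstract state hasn't changed, which is exactly how you avoid running the AI every frame.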

The reason this idea interests me is I used to have this multiplayer FPS game, a mod to Duke Nukem that I made years ago. We used to play it for four hours straight (it was that awesome, but not all my architecture) and then sit and watch the replays for another four hours. I really got good insight into a player's behaviour in that type of game. The thing I found most interesting is that we'd often end up on auto-pilot: just running around until we saw someone to shoot at. Pass the stairwell, hop through the window and get the super health. Pass by the book room, run in and get the pistol ammo (even if we already had full pistol ammo!). It would be the easiest thing for an AI to imitate that kind of behaviour, and then just break into the experimental strategies when you open the cafeteria door and find your opponent there laying tripwires!

This leads me into my point that, although emotion is an important part of the gameplay in these games, I'm not sure it necessarily shows through in your playstyle or strategy. However, what Quak described seems more like personality, than momentary emotion. For this, I had a solution worked out. It was designed to work in a more relaxed environment, a police detective game, but could be adapted for anything. It used a system of desires or needs, similar to The Sims, but was based on the real psychological indicators of the MMPI psych test. The idea was you could feed in the results of this test (or just tweaked scale values) and out pops an AI of the appropriate behaviour. Unfortunately I had to take the extremely detailed page I had about this off the web, but if you're interested, I could post more here, including source. It would have to be modified for a straight FPS though.

hatch22    178
Tubular, I'm glad you brought up the speed of execution issue.

I briefly mentioned this before, but I didn't go into much detail (don't worry, I'm not going to write another book of a post). Considering how much content my original post contained I'm not at all surprised if people didn't even see it.

I originally said "Iterations will also be staggered across frames so that the entire system does not run every frame and kill the frame rate." What this means is that the different parts of the AI (high level and low level, player monitoring and agent monitoring) will never execute in the same frame unless the computer can clearly handle that much load without a problem. What I didn't make clear was the rate of decision making. The state of the player and the agent is monitored every frame (or perhaps every few frames), but new decisions are made only when the state changes due to a change in perceptions. So decisions are not being made every frame. Also, the high level strategic AI makes decisions far less often than the low level tactical/reflex AI, so the amount of AI code running every frame is less than it may have first appeared. For some frames, there may not be any AI decision code running. After all, how many people do you know that can make decisions 60 times a second or faster? The strategic AI basically sets a movement goal for the tactical AI to achieve, and rewards it when it does. Unless a change in the environment prompts the strategic AI to change its goal, it won't run again until the goal is reached.
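Here is a round-robin sketch of the staggering I have in mind. The task names are placeholders, and the real scheduler would also skip passes whose state hasn't changed, but this shows how at most one AI pass runs in any given frame:

```python
# Stagger the four AI passes (high/low level x player/agent
# monitoring) across frames so only one runs per frame.
class StaggeredScheduler:
    def __init__(self, tasks):
        self.tasks = tasks      # list of callables, one per AI pass
        self.next_index = 0

    def tick(self):
        """Run exactly one AI pass this frame, round-robin."""
        task = self.tasks[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.tasks)
        return task()

log = []
sched = StaggeredScheduler([
    lambda: log.append("high_level_player"),
    lambda: log.append("high_level_agent"),
    lambda: log.append("low_level_player"),
    lambda: log.append("low_level_agent"),
])
for frame in range(8):
    sched.tick()
# Over 8 frames each pass ran twice, and no frame ran two passes.
```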

As to whether a system of desires and needs counts as emotion, it all depends on your definition of emotion (and there are many). The Programming Believable Characters book I've mentioned does examine a desires-and-needs system as one way a realistic emotional response can be achieved. Using the MMPI psych test as the basis for such a system is an interesting idea, though. If you want to post more on the topic, feel free, or if you would rather not, you can email me the info. Thanks for mentioning it.

Also let me say that while the testbed for this system is an FPS, there is no reason why this kind of AI system could not be adapted to a variety of genres. I hope to make it flexible enough to use in different types of games without too much work; its self-adaptability certainly helps with such a task. Just change the kinds of inputs it receives and the kinds of actions it can take to suit the new genre.

Quak    206
That's true Tubular, I described the personality of an agent.
But momentary emotions are also based a great deal on personality, and the advantage of having such a precomputed personality is that the agent cannot simply select the best of the player's tactics and behaviors and copy them, because it is an individual. It has to analyze and adapt them so that they work together with its personality. This way the agent acts more human, because humans are also limited by their personality. Think about yourself: you can't adopt just any style or behavior from another person, because you are different and have to do things in your own way. Of course you may not need such a complex system for a simple FPS bot, but for important characters in a game I think this would be a good idea. The example in my last post was a rather bad one and might not have made my point clear, so I hope this post does :)

(And excuse my mistakes in grammar and spelling; English is not my native language, but I'm working on it. :)

Alright. First off, thanks for the kind words, Hatch. I clearly understand your points about extra help and wanting to have the necessary skills.

I'm not sure how to quote other people's posts, so I'll try to keep this as orderly as possible, following the order of concerns in your last post.

I see your point about my misinterpretation of the RL system. With your explanation it seems very effective.
I'll have to think a little more about your response on emotion. But I'm also wondering if I misunderstood my own idea. Perhaps instead of mimicking EMOTION, you should focus on mimicking BEHAVIOR. By having the computer basically ask itself "what should I do in this situation?" and base the decision on learned behavior, you can focus on the overall reaction to the situation, and then add the limitation of emotions on top of the already existing reaction sets.

Noise is one factor that I feel needs to be introduced. However, you didn't mention anything about sight being a factor. I think you already realize how important line of sight and distance can be; I just wanted to point it out for everybody else to acknowledge as a factor.
You also have to take into consideration that mouse speed can differ greatly depending on the individual player. So a limitation based on mouse speed can only provide the minimum and maximum turning and targeting speeds; you'll still have to account for variation in speed based on the current surroundings and the current situation.

Now I do believe you have a slight misconception about my meaning of misdirection and manipulation. (By the way, Swordfish is one of my favorite movies.) All it really takes for manipulation and misdirection to work lies not in the ability to anticipate what will happen, but in the ability to determine several possibilities of what MAY happen in response to the first part of a series of actions.
Suppose a plan has eight steps, where a "step" means a small plan or action leading toward a single collective goal, and say there are four possible outcomes for each step. Each step works independently of the others, but is also connected to them through its outcome. With a system like this, the steps don't have to run in a fixed 1, 2, 3, 4 order; instead, the order emerges from each individual outcome in the series. This gives the ability to predict what may happen and to plan an appropriate response for each outcome. The system would then be focused on learning the possible outcomes of each particular situation.
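One way to picture such a plan is as a small table of steps where each observed outcome selects the next step, so the order is driven by outcomes rather than a fixed sequence. Here is a minimal sketch; the step and outcome names are invented for illustration:

```python
# Hypothetical contingency plan: each step lists its possible outcomes,
# and each outcome names the step to attempt next, so the plan's order
# emerges from outcomes rather than a fixed 1-2-3-4 sequence.

plan = {
    "approach":  {"spotted": "take_cover", "unseen": "flank"},
    "take_cover": {"suppressed": "retreat", "clear": "flank"},
    "flank":     {"blocked": "approach", "open": "strike"},
    "retreat":   {"safe": "approach", "cornered": "strike"},
    "strike":    {},  # goal step: plan complete
}

def run_plan(plan, start, observe, max_steps=20):
    """Follow the plan, letting observed outcomes pick each next step."""
    step, trace = start, []
    for _ in range(max_steps):
        trace.append(step)
        outcomes = plan[step]
        if not outcomes:  # goal reached
            break
        outcome = observe(step, list(outcomes))  # which outcome occurred?
        step = outcomes[outcome]
    return trace
```

The learning problem then becomes estimating which outcome each step is likely to produce in the current situation, which is exactly the "learn the possible outcomes" focus described above.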

Anyways, sometime within the next week, I'll definitely find the time to feed you an email about my own project.

Also later on I'll be likely to respond to other's posts on this topic to give more of my own opinion and ideas.

hatch22    178
ST: It appears we have had yet another (perfectly reasonable) misunderstanding. When I said noise, I was not referring to sound (as opposed to sight), but to the more mathematical sense of the word: a random, persistent disturbance that obscures or reduces the clarity of a signal. Basically, the farther out you go, the more the precision of ALL your senses degrades (including sight). I apologize for momentarily slipping into math and physics lingo. I am an engineer, and despite my training in avoiding jargon unless it is accompanied by sufficient explanation, I do make mistakes. My bad.

The mouse speed issue actually occurred to me after finishing my last post. It is a difficult problem to tackle because different players prefer different mouse sensitivities. Suffice it to say that the agent will not be able to aim any faster than would be possible with the mouse sensitivity turned up all the way. This would at least give it an upper limit. The use of a noise signal applied to the aiming direction (so that it shakes slightly) should avoid perfect aim at long range as well. More complex solutions may become apparent later.
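Both ideas from this paragraph, the sensitivity-derived upper limit on turn speed and distance-scaled noise on the aim direction, can be sketched in a few lines. The parameter values here are purely illustrative:

```python
import random

# Sketch of the two aiming limits described above (all numbers are
# illustrative assumptions, not tuned values):
#  1) aim jitter whose spread grows with target distance, and
#  2) a per-frame cap on turn speed, matching maximum mouse sensitivity.

def noisy_aim(yaw, pitch, distance, base_jitter=0.2, per_unit=0.01, rng=random):
    """Return a jittered (yaw, pitch) in degrees; spread grows with distance."""
    sigma = base_jitter + per_unit * distance
    return (yaw + rng.gauss(0.0, sigma), pitch + rng.gauss(0.0, sigma))

def clamp_turn(desired_delta, max_deg_per_frame=15.0):
    """Cap how far the aim can move in one frame (max mouse sensitivity)."""
    return max(-max_deg_per_frame, min(max_deg_per_frame, desired_delta))
```

Applying `clamp_turn` to the agent's desired aim change each frame enforces the upper limit, while `noisy_aim` makes long-range shots progressively less precise, as described.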

I did indeed misinterpret what you meant by misdirection and manipulation, and your clarification is noted. However, I don't think I was too far off the mark, since I demonstrated that the agent is unable to do anything more than try to predict what might happen. Prediction with certainty is quite impossible with incomplete knowledge of the environment. Thankfully, it is in situations with incomplete knowledge of the environment that RL performs better than other learning algorithms.

Also, for a really good explanation and demonstration of reinforcement learning that I just found, go here. The cat and mouse Java applet is fun to play with. If anyone is interested, my particular approach to reinforcement learning is a Sarsa algorithm using a softmax policy. If you don't understand what I just said, check out that site.
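For anyone who wants to see what "Sarsa with a softmax policy" looks like in code, here is a minimal tabular sketch. The real system would replace the Q table with an ANN value-function approximator, and the same `update` call could be fed with the player's own state-action transitions to teach the agent from the player's behavior:

```python
import math
import random
from collections import defaultdict

# Minimal tabular Sarsa with a softmax (Boltzmann) action policy.
# This is a sketch of the named algorithm, not the project's actual code.

class SarsaAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, tau=1.0, rng=random):
        self.Q = defaultdict(float)  # Q[(state, action)] -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.tau = alpha, gamma, tau
        self.rng = rng

    def policy(self, state):
        """Softmax: prefer high-value actions but still explore the rest."""
        prefs = [math.exp(self.Q[(state, a)] / self.tau) for a in self.actions]
        r, acc = self.rng.random() * sum(prefs), 0.0
        for a, p in zip(self.actions, prefs):
            acc += p
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s2, a2):
        """Sarsa backup: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
        td_error = reward + self.gamma * self.Q[(s2, a2)] - self.Q[(s, a)]
        self.Q[(s, a)] += self.alpha * td_error
```

The `tau` temperature controls exploration: high values make the policy nearly uniform, while low values make it nearly greedy.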
