azhar_r

R&D Exploration vs exploitation problem


I have the following question and wanted to know if my answer is more or less correct (makes sense):

 

Quote

Suppose a player can choose between five actions in all states of a game. And assume that the player has executed each action a different number of times in state 27, noting how valuable each action is in terms of the utility of the states reached after each action. Explain how the player should choose which action to execute next time s/he reaches state 27. Demonstrate your understanding of the exploration versus exploitation dilemma in your answer [2 marks]

My answer:

Quote

Upon reaching state 27, the player will already know the utility values of each action thus there won't be any need to explore any other actions. Therefore, the player can choose the action (exploitation) with the utility value that will return the highest reward.

Does this answer make sense? If not, what needs to be added or changed?

10 hours ago, azhar_r said:

Does this answer make sense? If not, what needs to be added or changed?

Your answer seems logical at first glance, but also almost too straightforward to be worth 2 points. I don't know the material that you're covering at all (like zero), but I don't get a sense of a clear understanding of the exploitation vs. exploration dilemma that the question asks you to demonstrate.

Assuming that the question has been worded carefully and that state 27 is an arbitrary thing... I'd say that the player, upon returning to state 27, may or may not have taken the same actions as before. Thus, if the value of their current situation is better than before, they may be inclined to explore, as they are already doing well enough, and there might be better ways to increase value. If the current situation is worse than the previous time, they would most likely take an exploitative action to try and catch up to their previous value, as exploring gives unknown results.

I tend to overanalyze, but to address the "dilemma", your answer would have to give credence to either choice... otherwise, there is no dilemma. Your answer doesn't address any dilemma whatsoever, so you must describe a situation that favors exploration as well: 1 point for exploitation and 1 point for exploration.

My biggest pet peeve with tests is when a teacher glosses over the influence certain words can have over the direction of a question. Every word in a question must have purpose and many teachers fail their own tests with the quality of the writing in their questions.

I hope I'm not doing your homework for you, azhar_r, but I found this question too interesting to resist. 😉

 


Thank you for your answer. Since it was already stated in the question that all five actions have been carried out before and the utility values are known, I thought the player would only exploit a particular action with the highest utility value.

But I understand your point about coming back to state 27 via other actions with "better" utility values which would then affect the actions being taken from state 27.

Thank you for your response. And no, this is directly from a test, for which I unfortunately don't have the memo to cross-check my answers.


No, I don't think you understand the issue at all. The result of the action is somewhat random, and the utility is assigned not to the actions, but to the individual outcomes. In other words, every time you take a particular action some utility is observed, but this is only a sample from a random variable, whose distribution is not known.
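To make the "sample from a random variable" point concrete, here is a minimal Python sketch. The utility numbers are made up for illustration and `summarize` is a hypothetical helper, not something from the test question.

```python
import math
from statistics import mean, stdev

def summarize(samples):
    """Return (sample mean, standard error) for one action's observed utilities."""
    n = len(samples)
    avg = mean(samples)
    # The standard error shrinks like 1/sqrt(n), so few samples give a shaky estimate.
    se = stdev(samples) / math.sqrt(n) if n > 1 else float("inf")
    return avg, se

# Hypothetical utilities observed after taking two of the actions in state 27.
observed = {
    "a1": [4.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0, 5.0],  # tried 8 times
    "a2": [9.0, 1.0],                                # tried only twice
}

for action, samples in observed.items():
    avg, se = summarize(samples)
    print(f"{action}: mean={avg:.2f}, standard error={se:.2f}")
```

Both actions average 5.0, but the second estimate rests on two samples and has a standard error of 4.0, against roughly 0.27 for the first. The averages alone do not tell you which action is really better.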

You want to pick actions with high expected utility (exploitation), but you only have a noisy estimate of this expected utility, so over time you want to try every action enough times that you discover which action that is (exploration).

Further reading here: https://en.wikipedia.org/wiki/Multi-armed_bandit
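One standard way to balance the two, taken from the bandit literature rather than from the test question, is the UCB1 rule: add a bonus to each action's average that grows when the action has been tried rarely. The sketch below assumes utilities are scaled to the range 0 to 1; the counts and averages are invented for illustration, and `ucb1_choice` and `greedy_choice` are hypothetical names.

```python
import math

def greedy_choice(means):
    """Pure exploitation: pick the action with the best average so far."""
    return max(range(len(means)), key=means.__getitem__)

def ucb1_choice(counts, means):
    """Pick an action index with the UCB1 rule.

    counts[i] is how often action i was tried, means[i] its average utility.
    """
    # An action that was never tried is explored first.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    # Average plus an uncertainty bonus that is large for rarely tried actions.
    scores = [m + math.sqrt(2.0 * math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical record for the five actions in state 27.
counts = [50, 40, 30, 20, 2]
means = [0.60, 0.55, 0.50, 0.45, 0.50]

print(greedy_choice(means))        # index 0: best average so far
print(ucb1_choice(counts, means))  # index 4: tried only twice, so still uncertain
```

Here the greedy rule keeps picking the first action, while UCB1 picks the last one because two trials are not enough to rule it out. As the counts even out, the bonus shrinks and the rule settles on whichever action has the best average.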

 

8 minutes ago, azhar_r said:

Since it was already stated in the question that all five actions have been carried out before and the utility values are known, I thought the player would only exploit a particular action with the highest utility value.

Yeah, obviously I don't know your course material, but I kind of read the question as saying that a player can choose one of five things to do in each state, and I assumed that the 27th action would naturally lead to state 27. However, one part of the question bothered me a lot...

"And assume that the player has executed each action a different number of times in state 27,"

...and this wording sounds as if the player can do an infinite number of actions per state, which confuses me a bit because then a state is almost meaningless. And then right after, in the same exact sentence...

"noting how valuable each action is in terms of the utility of the states reached after each action."

...which arguably contradicts what was said prior in the sentence as each single action leads to a supposed new state. I absolutely detest poorly worded questions, especially by academic minds. They should not only know better, but do better. This question is a bit confusing when under scrutiny.

 

12 minutes ago, alvaro said:

Yeah, I agree with you, alvaro, but I think this question was intended to be more simplistic to achieve an undoubtedly concise, correct answer... or how would you have written the answer to the question for 2 marks?


I think the question is perfectly clear, although I don't know the context in which it is being posed. The OP was asked to demonstrate his understanding of the exploration versus exploitation dilemma, and he demonstrated that he only understands the exploitation side.

 

8 minutes ago, alvaro said:

I think the question is perfectly clear, although I don't know the context in which it is being posed. The OP was asked to demonstrate his understanding of the exploration versus exploitation dilemma, and he demonstrated that he only understands the exploitation side.

I don't mean to sound argumentative. I just like how you explained exploration as an unknown in the pursuit of discovering possible exploits (it makes sense) and was curious how your answer might differ from mine. I don't think there is any context to the question though. Just face value.

Also, the fact that you feel there might be additional context required kind of supports my position that the question is not worded very well. Curious, what part of the question do you feel needs context?


I mean that I don't know where this question was found. See Lactose's question. I don't know what a "test paper" is. I don't usually encounter questions worth "2 marks" (whatever that is) in the wild.

 
