# Quiz: Reinforcement Learning

For the following action-selection method, indicate which option describes it best.

With probability p, select argmax_a Q(s,a). With probability 1 − p, select a random action. Let p = 0.99.
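The selection rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_action_with_p`, the dictionary layout of `q_values`, and the choice of a uniform distribution for the "random action" are assumptions, not part of the quiz.

```python
import random


def select_action_with_p(q_values, p=0.99, rng=random):
    """With probability p return argmax_a Q(s, a), otherwise a random action.

    q_values: dict mapping each action to Q(s, a) for the current state
              (hypothetical layout chosen for this sketch).
    """
    actions = list(q_values)
    if rng.random() < p:
        # Greedy branch: the action with the highest Q-value.
        return max(actions, key=q_values.get)
    # Random branch: assumed here to be uniform over all actions.
    return rng.choice(actions)


q = {"north": 0.2, "south": 1.5, "east": -0.3}
print(select_action_with_p(q))
```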

For the following action-selection method, indicate which option describes it best.

Select action a with probability

$$P(a \mid s) = \frac{e^{Q(s,a)/\tau}}{\sum_{a'} e^{Q(s,a')/\tau}}$$

where τ is a temperature parameter that is decreased over time.
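A minimal sketch of temperature-based selection follows, assuming the probability referred to is the usual Boltzmann (softmax) distribution over Q-values, proportional to exp(Q(s,a)/τ). The function names and the dictionary layout of `q_values` are hypothetical.

```python
import math
import random


def boltzmann_probabilities(q_values, tau):
    """Return a dict action -> probability proportional to exp(Q(s, a) / tau)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtracting the largest Q-value before exponentiating avoids overflow
    # and leaves the normalized probabilities unchanged.
    q_max = max(q_values.values())
    weights = {a: math.exp((q - q_max) / tau) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action_boltzmann(q_values, tau, rng=random):
    """Sample one action according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, tau)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


q = {"north": 0.2, "south": 1.5, "east": -0.3}
for tau in (10.0, 1.0, 0.1):  # an assumed, made-up cooling schedule
    print(tau, boltzmann_probabilities(q, tau))
```

Printing the probabilities for several values of `tau` shows how the temperature changes the distribution.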

For the following action-selection method, indicate which option describes it best.

Always select a random action.

What model would be learned from the above observed episodes?

Note: T(s,a,s’) represents the transition probability from state s to state s’ under action a. Use a period as a decimal separator.

T(A, south, C) =

What model would be learned from the above observed episodes?

Note: T(s,a,s’) represents the transition probability from state s to state s’ under action a. Use a period as a decimal separator.

T(B, east, C) =

What model would be learned from the above observed episodes?

Note: T(s,a,s’) represents the transition probability from state s to state s’ under action a. Use a period as a decimal separator.

T(C, south, E) =

What model would be learned from the above observed episodes?

Note: T(s,a,s’) represents the transition probability from state s to state s’ under action a. Use a period as a decimal separator.

T(C, south, D) =
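One common way to learn such a model is to estimate each T(s,a,s’) as the fraction of times that taking action a in state s led to s’. The sketch below assumes that count-based estimate. The function name, the `(s, a, s_next)` tuple format, and the episodes themselves are made up for illustration and are not the episodes the quiz refers to.

```python
from collections import Counter, defaultdict


def estimate_transitions(episodes):
    """Estimate T(s, a, s') as count(s, a, s') / count(s, a).

    episodes: iterable of episodes, each a list of (s, a, s_next) tuples
              (hypothetical format chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for episode in episodes:
        for s, a, s_next in episode:
            counts[(s, a)][s_next] += 1

    model = {}
    for (s, a), next_counts in counts.items():
        total = sum(next_counts.values())
        for s_next, n in next_counts.items():
            model[(s, a, s_next)] = n / total
    return model


# Made-up episodes for illustration only; NOT the quiz's observed episodes.
example_episodes = [
    [("X", "right", "Y"), ("Y", "right", "Z")],
    [("X", "right", "Y"), ("Y", "right", "W")],
    [("X", "right", "W")],
]
model = estimate_transitions(example_episodes)
print(model[("Y", "right", "Z")])  # 1 of the 2 observed (Y, right) steps
```

Transitions that never appear in the data have no entry in `model`; treating them as probability 0 is a further assumption of this estimate.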

In the ε-greedy approach to action selection in reinforcement learning, which of the following values of ε makes the approach identical to a purely greedy approach?