Markov Decision Process, State value function, Action value function, Optimal policy, Bellman equation

Markov Decision Process - Decision: a sequence of actions. - S1: it absorbs S0 and a0 and determines a1. - a1: it is given by S1. If only S1 is given, a1 is determined regardless of S0 and a0. https://youtu.be/DbbcaspZATg?si=KgUq5CdJKzHj9QOJ Policy: the probability of each action at time t in state S_t, that is, the distribution over which action to take in a particular state. The policy determines the action..
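
The two statements in this excerpt can be written compactly. The notation below is an assumption for illustration (the symbol \pi for the policy does not appear in the excerpt): the first line says that a1 depends only on S1, and the second defines the policy as a distribution over actions in a given state.

```latex
% Markov property for the action: once S_1 is known, S_0 and a_0 add nothing.
P(a_1 \mid S_0, a_0, S_1) = P(a_1 \mid S_1)

% Policy: probability of taking action a at time t when in state s.
\pi(a \mid s) = P(a_t = a \mid S_t = s), \qquad \sum_{a} \pi(a \mid s) = 1
```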

Q-Learning, Greedy action, Q-Value, Exploration, ϵ-greedy, epsilon-greedy, Exploitation, Discount factor, Q-update

Q-Learning: the process of finding a famous restaurant by moving up, down, left, and right from a starting point on a map. In other words, it is a process that learns the fastest route to the restaurant, and it does so by learning Q-values. Episode: a sequence of interactions between an agent and its environment, starting from an initial state and ending at a terminal state. In other wo..
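
The grid search described in this excerpt can be sketched as tabular Q-learning with an ϵ-greedy policy. This is a minimal illustration, not the post's own code: the 4x4 grid, the goal position (standing in for the restaurant), the reward of 1 at the goal, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for the example.

```python
import random

# Assumed toy setup: 4x4 grid, start in one corner, "restaurant" in the opposite corner.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
SIZE = 4
START, GOAL = (0, 0), (3, 3)


def step(state, action):
    """Move one cell; bumping into a wall leaves the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    next_state = (min(max(row + d_row, 0), SIZE - 1),
                  min(max(col + d_col, 0), SIZE - 1))
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # assumed reward: 1 only on reaching the goal
    return next_state, reward, done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn a Q-table; each episode runs from START until GOAL is reached."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                # Exploration: pick a random action.
                action = rng.randrange(len(ACTIONS))
            else:
                # Exploitation: pick a greedy action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(len(ACTIONS)) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy action from START and return the visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, _ = step(state, action)
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

The discount factor `gamma` is what makes shorter routes preferable: a reward reached in fewer steps is discounted less, so the greedy path over the learned Q-values tends toward the fastest route.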