A group of McGill students has created a brilliant short video introducing the basic ideas behind Reinforcement Learning (RL) and one of its most popular algorithms, Q-learning. Using a hypothetical bartender robot named Shaker, the video explains how an agent learns to act through interactions with his environment and a reward/punishment system.
At first, such an agent thinks that all actions are equally good, because he has no prior experience to guide his choice. As a result, the agent chooses actions at random in every situation (situations are also known as states). After each action, the agent receives either a reward or a punishment from the environment. A reward indicates that the action was a good choice for that particular situation, while a punishment indicates the opposite. Continuing in this fashion, the agent learns over time which strategies, i.e., sequences of actions, help him maximize rewards and minimize punishments. This experience is summarized in a value function, an estimate of the reward the agent can expect, and the learned value function allows him to act rationally in all situations.
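To make the starting point concrete, here is a minimal sketch of an agent with no experience acting at random and collecting rewards and punishments. The toy world (two customer orders, two drinks) and every name in it (`STATES`, `ACTIONS`, `reward`, `run_random_agent`) are my own assumptions for illustration; they are not taken from the video.

```python
import random

# Hypothetical toy world: states are customer orders, actions are drinks.
STATES = ["wants_beer", "wants_wine"]
ACTIONS = ["serve_beer", "serve_wine"]

def reward(state, action):
    """Environment feedback: +1 (reward) for the right drink, -1 (punishment) otherwise."""
    return 1 if state.split("_")[1] == action.split("_")[1] else -1

def run_random_agent(steps, seed=0):
    """Act with no prior experience: every action looks equally good, so pick at random."""
    rng = random.Random(seed)
    total = 0
    history = []
    for _ in range(steps):
        state = rng.choice(STATES)    # the environment presents a situation
        action = rng.choice(ACTIONS)  # the agent has no reason to prefer any action
        r = reward(state, action)     # the environment answers with +1 or -1
        history.append((state, action, r))
        total += r
    return total, history

total, history = run_random_agent(steps=100)
print("total reward after 100 random actions:", total)
```

An agent like this never improves, because it throws away the feedback it receives. Learning starts once the rewards in `history` are used to prefer some actions over others.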
One of the simplest Reinforcement Learning algorithms is known as Q-learning. It is a model-free method: the agent is not given a model of the world a priori and does not need to build one. The Q-function that the agent learns by interacting with his environment gives a value for each possible combination of situation and action. Although Q-learning is simple and yet powerful, in its basic form it is often not a practical algorithm. The number of state-action combinations for which the Q-function must be learned is often large, if not infinite. An agent will often fail to explore all cases unless a considerable amount of time is made available to him; by the time your average robot learns to act using this method, you and everyone else on Earth will most likely be living on Mars. Researchers have invented many more powerful algorithms over the years that tackle some of the above issues; I will discuss some of them in future posts. For now, remember that we still have a long way to go before we have robots that learn efficiently from experience, but progress is continuing at a fast pace.
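For readers who prefer code, below is a rough sketch of tabular Q-learning on a tiny made-up problem: a corridor of five states in which the agent is rewarded only for reaching the rightmost one. The environment, the function names (`step`, `greedy`, `train`) and the parameter values (`alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration, not something taken from the video.

```python
import random
from collections import defaultdict

N_STATES = 5      # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = [0, 1]  # 0 = step left, 1 = step right

def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Best-known action for a state, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], all zero at the start
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit what is known, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, r, done = step(state, action)
            # Q-learning update: move Q toward reward + discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
for s in range(N_STATES - 1):
    print(s, "left:", round(q[(s, 0)], 3), "right:", round(q[(s, 1)], 3))
```

The table here has only 5 x 2 entries, so a few hundred episodes are enough. The same table for a real robot would need one entry for every state-action combination, which is where the practicality problem described above comes from.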
If you found the above textual introduction to Reinforcement Learning boring or difficult to follow, then the video below, which is also the main point of this post, might clear things up. Enjoy!