Reinforcement learning (RL) is a field of artificial intelligence concerned with enabling agents to learn new skills through interaction with their environment. A reinforcement learning agent continuously tracks the state of its environment and chooses actions to maximize the total reward it receives over long periods of time. Rewards can be positive or negative; they are the feedback the agent uses to assess the value of its actions and to construct a strategy, i.e., compute a sequence of actions, for completing a specific task.
This type of learning is common in people and animals. Using the proper reinforcement signal, we can train a puppy to do tricks for us; this is the method a dog trainer uses. He rewards his dog with a tasty cookie if the dog follows his command and performs a specific trick. The dog, who wants to eat as many cookies as possible (maximize total future reward), quickly learns that he can achieve this goal by performing the trick when asked. In a similar way, people learn a great deal throughout their lives through reinforcement. You only need to burn yourself once to realize that touching fire with your bare hands is the wrong thing to do.
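The loop described above, in which an agent observes a state, picks an action, and receives a reward, can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the five-cell corridor environment, the `step` and `train` functions, and all parameter values are assumptions made up for this example. The learning rule used is tabular Q-learning, one common RL algorithm among many; the reward of +1 at the end of the corridor plays the role of the cookie.

```python
import random

def step(state, action, n_states=5):
    """Hypothetical corridor world: action 0 moves left, action 1 moves right.
    Reaching the rightmost cell gives reward +1 and ends the episode;
    every other move costs a small penalty of -0.01."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done

def train(episodes=500, n_states=5, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    # q[state][action] estimates the total discounted future reward.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random action
            else:
                best = max(q[state])       # exploit: pick a best-known action
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy strategy for the four non-terminal cells (1 means "move right").
policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
print(policy)
```

Nobody tells the agent that moving right is correct; it discovers this purely from the rewards it collects, much like the dog in the example.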
Reinforcement learning is different from supervised learning and learning by demonstration, both popular machine learning methods. In supervised learning, an external agent, the supervisor, supplies the learner with a set of examples of the correct actions for a number of different situations (or states). The agent uses these examples to infer the proper strategy for any situation it may encounter in the future. In learning by demonstration, a teacher shows the learning agent the proper sequence of actions for a specific task. The agent records this sequence and repeats it if it faces the same situation again. Both supervised learning and learning by demonstration have advantages and disadvantages, and I will discuss them separately in the future.
Intelligent agents have not yet been able to take full advantage of reinforcement learning. It turns out that it is very difficult to specify a reward function that works well in most situations. Moreover, because agents are often equipped with noisy sensors, they can never know exactly what the true state of the world is. Instead, they can only maintain probability distributions, also known as beliefs, over what the true state might be. This usually makes learning intractable; in other words, it is nearly impossible to find optimal solutions even for problems of moderate size. In addition, current learning techniques require too much data to find an adequate solution. The ability of people and animals to learn quickly from very few examples (often just one) still eludes intelligent machines.
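To make the idea of a belief concrete, here is a minimal sketch of how an agent with a noisy sensor might update a probability distribution over states using Bayes' rule. The two states (`door_open`, `door_closed`), the `update_belief` function, and the 80% sensor accuracy are hypothetical choices made only for illustration.

```python
def update_belief(belief, observation, sensor_accuracy=0.8):
    """One Bayes-rule update of a belief (a dict mapping state -> probability)
    after a noisy sensor reports `observation`.
    Assumption: the sensor names the true state with probability
    `sensor_accuracy` and the other state otherwise."""
    unnormalized = {}
    for state, prior in belief.items():
        likelihood = sensor_accuracy if observation == state else 1.0 - sensor_accuracy
        unnormalized[state] = likelihood * prior
    total = sum(unnormalized.values())
    # Normalize so the probabilities sum to one again.
    return {state: p / total for state, p in unnormalized.items()}

# Start fully uncertain, then fold in three sensor readings one at a time.
belief = {"door_open": 0.5, "door_closed": 0.5}
for reading in ["door_open", "door_open", "door_closed"]:
    belief = update_belief(belief, reading)
print(belief)
```

Even in this two-state toy problem the agent never becomes certain about the door; it only shifts probability between the two possibilities. With many states, the space of possible beliefs grows very quickly, which is one source of the intractability mentioned above.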
If you are interested in reading more about reinforcement learning in artificial intelligence, I strongly recommend the aptly titled book “Reinforcement Learning: An Introduction” by Dr. Richard Sutton and Dr. Andrew Barto. You can buy a copy of the book if you want, but you can also read it online on Dr. Sutton’s webpage. If you are looking for just a quick introduction to RL, you might want to start with the RL FAQ, which is also hosted by Dr. Sutton.