Skip to content

Latest commit

 

History

History
executable file
·
7 lines (5 loc) · 1.98 KB

reinforcement_learning.md

File metadata and controls

executable file
·
7 lines (5 loc) · 1.98 KB

REINFORCMENT LEARNING

RL is a paradigm for learning by trial-and-error inspired by the way humans learn new tasks. In a typical RL setup, an agent is tasked with observing its current state in a digital environment and taking actions that maximise accrual of a long-term reward it has been set. The agent receives feedback from the environment as a result of each action such that it knows whether the action promoted or hindered its progress. An RL agent must therefore balance the exploration of its environment to find optimal strategies of accruing reward with exploiting the best strategy it has found to achieve the desired goal. This approach was made popular by Google DeepMind in their work on Atari games and Go. An example of RL working in the real world is the task of optimising energy efficiency for cooling Google data centers. Here, an RL system achieved a 40% reduction in cooling costs. An important native advantage of using RL agents in environments that can be simulated (e.g. video games) is that training data can be generated in troves and at very low cost. This is in stark contrast to supervised deep learning tasks that often require training data that is expensive and difficult to procure from the real world.

Applications: Multiple agents learning in their own instance of an environment with a shared model or by interacting and learning from one another in the same environment, learning to navigate 3D environments like mazes or city streets for autonomous driving, inverse reinforcement learning to recapitulate observed behaviours by learning the goal of a task (e.g. learning to drive or endowing non-player video game characters with human-like behaviours). Principal Researchers: Pieter Abbeel (OpenAI), David Silver, Nando de Freitas, Raia Hadsell, Marc Bellemare (Google DeepMind), Carl Rasmussen (Cambridge), Rich Sutton (Alberta), John Shawe-Taylor (UCL) and others. Companies: Google DeepMind, Prowler.io, Osaro, MicroPSI, Maluuba/Microsoft, NVIDIA, Mobileye, OpenAI.