2016

Game AI and deep reinforcement learning

Reinforcement learning
- Markov decision process: {States, Actions, Transition probabilities, Rewards}
- Q-Learning in video games
  - goal: learn a function Q(state, action) that estimates future reward, then pick the best-scoring action in each state
  - Convnets playing Atari games (DeepMind)
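
The Q-learning idea above can be sketched without any neural network. This is a minimal tabular version on a made-up environment (a 5-cell corridor with a reward of 1 at the right end); the environment, the function name `q_learning` and all hyperparameters are assumptions for illustration, not the DeepMind Atari setup, which replaces the table with a convnet.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in state 0 and gets reward 1 only when it reaches
    the rightmost state, which ends the episode.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy choice; ties are broken randomly so the
            # untrained agent does not get stuck walking left
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            # (the goal row is never updated, so its max stays 0)
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

After training, "step right" should score higher than "step left" in every non-terminal state, which is exactly the "advise an action given a state" function.
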
Challenges
- time matters
- reward can be delayed
  - discount factor
- exploration vs. exploitation
  - multi-armed bandits
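
Two of the ideas in this list fit in a few lines each. The sketch below shows a discounted return (how a discount factor shrinks delayed rewards) and an epsilon-greedy player on a multi-armed bandit (one simple way to trade exploration against exploitation). The function names, the Bernoulli arms and the parameter values are assumptions chosen for illustration.

```python
import random

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence, computed back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def epsilon_greedy_bandit(probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit whose arm i pays 1 with probability probs[i].

    With probability epsilon a random arm is explored; otherwise the arm
    with the best running average is exploited.
    """
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)  # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(probs))
        else:
            arm = max(range(len(probs)), key=lambda i: values[i])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values

if __name__ == "__main__":
    print(discounted_return([0, 0, 1], gamma=0.5))
    print(epsilon_greedy_bandit([0.2, 0.8]))
```

A reward of 1 that arrives two steps late is worth only `gamma**2` now, which is why a delayed reward is harder to learn from than an immediate one.
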
Applications
- maneuver a body (robotics)
- manage something (investment portfolio, power stations, industrial applications)
- play games :)
  - arms race? AI vs. impossible Super Mario levels?

Tic-Tac-Toe
- searching the full game tree (minimax)
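
Tic-Tac-Toe is small enough that the whole game tree can be searched. A minimal minimax sketch, assuming the board is a 9-character string of "X", "O" and " " read row by row (this representation and the helper names are choices made here for illustration):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play(board, i, player):
    """Return a new board with `player` placed on cell i."""
    return board[:i] + player + board[i + 1:]

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value from X's point of view (+1 X wins, 0 draw, -1 O wins)
    when `player` is to move and both sides play perfectly."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    values = [minimax(play(board, i, player), other)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)

def best_move(board, player):
    """Pick an empty cell with the best minimax value for `player`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    score = lambda i: minimax(play(board, i, player), other)
    return max(moves, key=score) if player == "X" else min(moves, key=score)

if __name__ == "__main__":
    print(minimax(" " * 9, "X"))       # value of the empty board
    print(best_move("XX OO    ", "X"))  # X completes the top row
```

The cache matters: many move orders reach the same position, so memoizing positions keeps the search to a few thousand distinct boards.
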
Chess
- alpha-beta (minimax) search
- handcrafted value function
- Deep Blue vs. Garry Kasparov
Go
- AlphaGo Nature paper
- policy + value convnets
- self-play to train value network
- MCTS + policy + value
- AlphaGo vs. Lee Se-dol
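
How "MCTS + policy + value" fits together can be hinted at with the selection step alone. In the search described in the AlphaGo paper, each move is scored by its current value estimate Q plus an exploration bonus that is proportional to the policy network's prior P and shrinks as the move's visit count N grows. The sketch below follows that general form; the function name, the flat-list representation and the constant `c_puct=1.0` are assumptions made here, and the rest of the search (expansion, evaluation, backup) is left out.

```python
import math

def select_action(q, prior, visits, c_puct=1.0):
    """Pick the index of the move maximizing Q + exploration bonus.

    q[a]      : current mean value estimate of move a
    prior[a]  : prior probability of move a (e.g. from a policy network)
    visits[a] : how often move a has been tried from this position
    """
    total = sum(visits)

    def score(a):
        bonus = c_puct * prior[a] * math.sqrt(total) / (1 + visits[a])
        return q[a] + bonus

    return max(range(len(q)), key=score)

if __name__ == "__main__":
    # Equal values and priors: the rarely visited move gets the larger bonus.
    print(select_action(q=[0.1, 0.1], prior=[0.5, 0.5], visits=[100, 1]))
```

Early in the search the prior dominates, so moves the policy network likes are tried first; as visits accumulate, the value estimates take over.
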
It gets waaaay harder (like Doom)