ITP-NYU :: 4/28/2016
Game AI and deep reinforcement learning
- The whole class "in 10 minutes" (14:33)
- Autoencoders (43:03)
- Generative adversarial networks (53:58)
- Game AI + reinforcement learning (1:04:59)
- Convnets mastering Atari games (1:14:51)
- States, actions, and rewards, Q-learning (1:28:52)
- Super Mario craziness, computer tic-tac-toe (1:40:48)
- Computer chess: how Deep Blue works (1:49:59)
- Computer Go: how AlphaGo works (2:04:05)
Class notes
News / admin
- ML amplifies privilege
- OpenAI reinforcement learning gym
- incredible style transfer video with temporal loss term
- alt-AI exhibition
Review: the whole class "in 10 minutes"
- Especially review convnets and their applications, and RNNs
- repetition = mental backpropagation
Generative Models
- Autoencoders
- Generative adversarial networks
Video game AI + reinforcement learning
- Reinforcement learning
- Markov decision process + {States, Actions, Rewards}
- Q-Learning in video games
- goal: function which advises an action given a state
- Convnets playing Atari games (DeepMind)
- challenges
- time matters
- reward can be delayed
- exploration vs. exploitation
- multi-armed bandits
- discount factor
- Applications
- maneuver a body (robotics)
- manage something (investment portfolio, power stations, industrial applications)
- play games :)
- arms race? AI vs. impossible Super Mario levels?
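
Sketch: tabular Q-learning on a toy problem. This is not the convnet-based Atari setup from the lecture, only a minimal illustration of the pieces listed above: states, actions, rewards, a delayed reward, the discount factor (gamma), and exploration vs. exploitation (epsilon-greedy). The 5-state corridor environment, the function names, and all hyperparameter values are assumptions made up for this example.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(state, action) with the standard Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Target = immediate reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# The learned "function which advises an action given a state":
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy should advise stepping right in every non-goal state, and the Q-values should shrink by roughly a factor of gamma per step away from the goal, which is how the delayed reward propagates backwards.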
How AlphaGo works
- Tic-Tac-Toe
- parsing game trees
- Chess
- Monte Carlo Tree Search
- handcrafted value function
- Deep Blue vs. Garry Kasparov
- Go
- AlphaGo Nature paper
- policy + value convnets
- self-play to train value network
- MCTS + policy + value
- AlphaGo vs. Lee Se-dol
- It gets waaaay harder (like Doom)
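
Sketch: exhaustive game-tree search for tic-tac-toe. The tree is small enough to search completely with plain minimax, which is why tic-tac-toe is the easy starting point before chess and Go. The board encoding (a 9-character string of "X", "O", and spaces) and the function names are assumptions chosen for this example, not something taken from the lecture.

```python
from functools import lru_cache

# All eight winning lines, as index triples into the 9-character board string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (value, move) from X's point of view: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won is not None:
        return (1 if won == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # board full, nobody won: draw
    results = []
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        value, _ = minimax(child, "O" if player == "X" else "X")
        results.append((value, m))
    # X maximizes the value, O minimizes it.
    pick = max if player == "X" else min
    return pick(results, key=lambda r: r[0])


value, move = minimax(" " * 9, "X")   # perfect play from the empty board
```

From the empty board the value is 0, meaning perfect play by both sides ends in a draw. The same idea breaks down for chess and Go because the tree is far too large to search to the end, which is where handcrafted or learned value functions come in.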