OpenAI Research Overview
Category: Technical
By Jeremy Nixon [[email protected]]. Nov 2017.
Categories indicate the domain in which each paper's innovation is novel.
- Reinforcement Learning
  - Multi-Agent
  - Exploration
  - Imitation Learning
- Deep Learning
- Memory
- Program Learning
- Representation Learning
- Variational Inference
- Generative Models
- Evolution
- Applications
  - Security / Safety
  - Robotics
- Environments
Reinforcement Learning
- Multi-Agent
- Learning with Opponent-Learning Awareness
- https://arxiv.org/abs/1709.04326
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- https://arxiv.org/abs/1706.02275
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- https://arxiv.org/abs/1703.04908
- Exploration
- Parameter Space Noise for Exploration
- https://arxiv.org/abs/1706.01905
- UCB and InfoGain Exploration via Q-Ensembles
- https://arxiv.org/abs/1706.01502
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- https://arxiv.org/abs/1611.04717
- VIME: Variational Information Maximizing Exploration
- https://arxiv.org/abs/1605.09674
- Imitation Learning
- Third-Person Imitation Learning
- https://arxiv.org/abs/1703.01703
- One-Shot Imitation Learning
- https://arxiv.org/abs/1703.07326
- RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
- https://arxiv.org/abs/1611.02779
- Teacher-Student Curriculum Learning
- https://arxiv.org/abs/1707.00183
- Equivalence Between Policy Gradients and Soft Q-Learning
- https://arxiv.org/abs/1704.06440
- Prediction and Control with Temporal Segment Models
- https://arxiv.org/abs/1703.04070
Deep Learning
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
- https://arxiv.org/abs/1602.07868
Memory
- Hindsight Experience Replay [Also, Reinforcement Learning]
- https://arxiv.org/pdf/1707.01495.pdf
Program Learning
- Extensions and Limitations of the Neural GPU
- https://arxiv.org/abs/1611.00736
Representation Learning
- Variational Lossy Autoencoder
- https://arxiv.org/abs/1611.02731
Variational Inference
- Improving Variational Inference with Inverse Autoregressive Flow
- https://arxiv.org/abs/1606.04934
Generative Models
- Generative Adversarial Networks
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [Also, Representation Learning]
- https://arxiv.org/abs/1606.03657
- Improved Techniques for Training GANs
- https://arxiv.org/abs/1606.03498
- On the Quantitative Analysis of Decoder-Based Generative Models
- https://arxiv.org/abs/1611.04273
- A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy Based Models [Also Reinforcement Learning]
- https://arxiv.org/pdf/1611.03852.pdf
- PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
- https://arxiv.org/abs/1701.05517
- Learning to Generate Reviews and Discovering Sentiment
- https://arxiv.org/abs/1704.01444
Evolution
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- https://arxiv.org/abs/1703.03864
Applications
- Security / Safety
- Deep Reinforcement Learning from Human Preferences
- https://arxiv.org/abs/1706.03741
- Concrete Problems in AI Safety
- https://arxiv.org/abs/1606.06565
- Adversarial Attacks on Neural Network Policies
- https://arxiv.org/abs/1702.02284
- Adversarial Training Methods for Semi-Supervised Text Classification
- https://arxiv.org/abs/1605.07725
- Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data
- https://arxiv.org/abs/1610.05755
- AI Safety via Debate
- https://arxiv.org/pdf/1805.00899.pdf
- Robotics
- Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
- https://arxiv.org/abs/1703.06907
- Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- https://arxiv.org/abs/1610.03518
Environments
- Infrastructure for Deep Learning
- https://blog.openai.com/infrastructure-for-deep-learning/
- Universe
- https://blog.openai.com/universe/
- OpenAI Gym
- https://arxiv.org/abs/1606.01540
OpenAI Researchers
- Paul Christiano
- Ryan Lowe
- Jean Harb
- Pieter Abbeel
- Igor Mordatch
- Matthias Plappert
- Rein Houthooft
- Prafulla Dhariwal
- Szymon Sidor
- Richard Y. Chen
- Xi Chen
- Marcin Andrychowicz
- John Schulman
- Alec Radford
- Rafal Jozefowicz
- Yan Duan
- Bradly C. Stadie
- Jonathan Ho
- Jonas Schneider
- Ilya Sutskever
- Wojciech Zaremba
- Rachel Fong
- Josh Tobin
- Alex Ray
- Nikhil Mishra
- Ian Goodfellow
- Tim Salimans
- Diederik P. Kingma
- Andrej Karpathy
- Yuri Burda
- Zain Shah
- Trevor Blackwell
- Vicki Cheung
Salaries of top employees [Pg. 28]; hours and salaries of top employees [Pg. 7]. OpenAI spent $11 million in 2016, $7 million of it on salaries. For comparison, DeepMind spent $138 million in 2016.
Source: Original Google Doc