This video shows how the performance of DQN improves over the course of training (after 100, 200, 400, and 600 episodes). After 600 episodes, DQN finds and exploits the optimal strategy in this game: dig a tunnel along the side of the wall, then let the ball bounce behind the wall and hit the blocks from there. Note: the score is displayed at the top left of the screen (the maximum for clearing one screen is 448 points), the number of remaining lives is shown in the middle (the game starts with 5 lives), and the "1" at the top right indicates that this is a 1-player game.