During the weekend, there was much hype on the internet about OpenAI's latest research project: a "bot" that can play the popular Dota 2 computer game. I quote from OpenAI's announcement:
We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules. The bot learned the game from scratch by self-play, and does not use imitation learning or tree search.
The hype, and misunderstanding of the nature of Dota, is perhaps best exemplified by Elon Musk's own tweet announcement (keep in mind here that Musk is one of the founders of OpenAI):
OpenAI first ever to defeat world's best players in competitive eSports. Vastly more complex than traditional board games like chess & Go.
Another example is an article from The Verge headlined:
The world’s best Dota 2 players just got destroyed by a killer AI from Elon Musk’s startup
This would be a revolutionary step forward in artificial intelligence (AI) if it were true. Naturally, the announcement was promptly followed by speculation about strong AI and the imminent arrival of our robot overlords, sublimely underscored by the bot's inclination to play the Shadow Fiend hero -- an evil spirit imprisoning the human souls that it consumes.
Should we be afraid? The short answer is 'no', because within a day, the bot had been beaten more than 50 times by hobby players picked from the audience of the ongoing Dota championships. The truth is that the bot shows impressive technical skills, but it was easily fooled by human players playing mind games with it. Needless to say, this was not reported by the media.
I think some background is necessary to put OpenAI's achievement in the right context. During the weekend, the yearly Dota 2 World Championships were held in Seattle. Dota 2 is a highly competitive computer game played by two teams of five players each. The event may pale in media coverage compared to, say, the concurrent IAAF World Championships in London, but the prize pool is close to 25 million dollars, with an 11 million dollar windfall to the winning team, showing that professional Dota is serious business. It is in the same league as winning a Grand Slam tennis championship such as Wimbledon (total prize money of 32 million pounds), and the top professional players can make a good living out of playing Dota 2. This by itself motivates some research into Dota bots, because they could be used as sparring partners for training purposes.
What caused a stir (besides Team Liquid's crushing victory over Team Newbee in the grand finals) was OpenAI's announcement on the main stage of their latest research project: a Dota bot trained using state-of-the-art machine learning techniques, with the aim of surpassing human players in skill. Dota is considered a rather complex game that requires long-term planning, creativity, and dealing with unknown states (to a much larger extent than chess or Go), so it represents in many ways the next frontier in AI research. The news that OpenAI's bot supposedly could beat professional players understandably generated a lot of hype and hopes of a breakthrough discovery in machine learning. Unfortunately, that does not seem to be the case. Below, I explain why.
Firstly, the experimental setup that OpenAI used when pitting the bot against human players was not representative of the real game of Dota. The most important difference is that the bot plays 1v1 against a human, not 5v5 in a team. Dota players would describe the setup in game lingo as "1v1 Shadow Fiend mid", which is usually considered a test of dexterity and mechanical skill. For people not familiar with the game, it is like comparing penalty kicks with the full game of soccer: there is simply no comparison in complexity. A game of Dota is traditionally divided into four stages: the pre-game drafting of player characters (think of it as a card game inside the main game), the initial game (laning phase), the middle game, and the late game. Pure 1v1 play is only relevant in the initial phase of the game. The rest of the game is focused on cooperative team play, something the bot cannot do yet.
Secondly, it appears that the bot did not discover game techniques such as last hitting, fake attacks, and creep blocking entirely by itself during training. My understanding is that OpenAI used reinforcement learning, which means that you have to define the desired behaviors (state transitions) and the reward functions in advance; in other words, the bot was explicitly rewarded during training for actions such as creep blocking. OpenAI has not yet published a paper describing the details of the training, so we will not know for sure until then, but my impression from the developers and second-hand sources is that they indirectly acknowledged that some of the behaviors were hard-coded.
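To make the reward-shaping point concrete, here is a minimal sketch in Python of how such explicit rewards might be encoded. All field names and weights are my own illustrative assumptions; OpenAI has not published their actual reward function:

```python
# Hypothetical reward shaping for a 1v1 Dota bot. All fields and
# weights are illustrative assumptions, not OpenAI's actual design.
from dataclasses import dataclass

@dataclass
class GameState:
    last_hits: int           # creeps killed by the bot
    denies: int              # own creeps denied to the opponent
    creep_block_time: float  # seconds the bot delayed the creep wave
    hero_damage_dealt: float
    hero_damage_taken: float
    won_game: bool

def shaped_reward(prev: GameState, curr: GameState) -> float:
    """Dense reward: the sparse win/loss signal alone would make
    learning extremely slow, so intermediate behaviors such as last
    hitting and creep blocking are rewarded directly."""
    reward = 0.0
    reward += 0.5 * (curr.last_hits - prev.last_hits)
    reward += 0.3 * (curr.denies - prev.denies)
    reward += 0.1 * (curr.creep_block_time - prev.creep_block_time)
    reward += 0.01 * (curr.hero_damage_dealt - prev.hero_damage_dealt)
    reward -= 0.01 * (curr.hero_damage_taken - prev.hero_damage_taken)
    if curr.won_game:
        reward += 10.0  # sparse terminal reward
    return reward

# Example time step: one last hit plus 100 damage dealt.
prev = GameState(10, 2, 3.0, 500.0, 400.0, False)
curr = GameState(11, 2, 3.0, 600.0, 400.0, False)
print(shaped_reward(prev, curr))  # 0.5 (last hit) + 1.0 (damage) = 1.5
```

The point is simply that whatever shows up as a weighted term here is a behavior the designers chose to reward, not one the bot invented from the win/loss signal alone.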
Thirdly, the bot is not playing against humans on a level playing field. It accesses game information through the game API, which provides a more detailed and exact game state than a human player could get from just observing the pixels on the monitor. If the bot had to do real-time image analysis of the frame buffer (which I am sure it could do in practice) and input game commands with the same reaction time and precision as a human, it would likely not fare as well. It would make tiny mistakes here and there and have to work with some margin of error. Currently, it moves its virtual "mouse" with pixel-perfect accuracy and reacts "instantly" to any change in the game state, according to Dendi (the player who faced the bot on stage).
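As a thought experiment, one could imagine handicapping the bot's outputs to human levels. The sketch below shows what such a handicap layer might look like; the reaction-time and mouse-precision numbers are rough assumptions on my part, not anything OpenAI has implemented:

```python
import random

# Hypothetical handicap layer: delays and perturbs the bot's actions
# to approximate human reaction time and mouse precision. The numbers
# are rough assumptions, not measured values.
HUMAN_REACTION_MEAN_S = 0.25    # ~250 ms typical visual reaction time
HUMAN_REACTION_JITTER_S = 0.05
MOUSE_NOISE_PX = 4.0            # std. deviation of click error in pixels

def humanize_action(x: float, y: float) -> tuple[float, float, float]:
    """Return a perturbed click position and the delay before it fires."""
    delay = max(0.0, random.gauss(HUMAN_REACTION_MEAN_S,
                                  HUMAN_REACTION_JITTER_S))
    noisy_x = random.gauss(x, MOUSE_NOISE_PX)
    noisy_y = random.gauss(y, MOUSE_NOISE_PX)
    return noisy_x, noisy_y, delay

# Example: a click aimed at (640, 360) lands a few pixels off,
# a quarter of a second late.
x, y, delay = humanize_action(640.0, 360.0)
print(f"click at ({x:.1f}, {y:.1f}) after {delay * 1000:.0f} ms")
```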
In summary, you could say that the bot learned perfect mechanical skills (or what I would call unconscious Dota skills) through established machine learning methods, presumably involving neural networks. This is an interesting result, but it does not currently advance the state of the art very much, neither in machine learning itself nor in the applied case of making Dota bots. It was already known that bots can be made very strong in the initial laning phase through traditional rule-based programming (for example, the infamous built-in Viper bot in Dota 2). The fact that this can be replicated with neural nets bodes well for further applications to higher-level game concepts that we would think of as naturally rule-based, such as player positioning in team fights or item builds. The challenge, as I see it, is the magnitude of training that would be required to optimize a multi-player scenario. If it took three weeks of full-time training to reach professional level in a very restricted 1v1 scenario, I would imagine that full 5v5 games with different player characters would require orders of magnitude more training time, which probably makes the problem insurmountable unless more refined machine learning methods are developed. I suppose that is what the research project is really about.
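A crude back-of-envelope calculation hints at why the jump to 5v5 is so much larger. The demo fixed both players to a single hero, whereas a full game starts with each team drafting five heroes from a pool of roughly 113 (the approximate hero count at the time); the draft space alone explodes combinatorially:

```python
# Back-of-envelope: size of the draft space in full Dota 2 versus the
# single fixed matchup used in the 1v1 demo. The hero count is an
# approximation for 2017; the rest is straightforward combinatorics.
from math import comb

heroes = 113                           # approximate hero pool in 2017
drafts_team_a = comb(heroes, 5)        # unordered 5-hero picks, one team
drafts_team_b = comb(heroes - 5, 5)    # heroes are unique across teams

print(f"drafts for one team:  {drafts_team_a:,}")          # ~140 million
print(f"full draft matchups:  {drafts_team_a * drafts_team_b:.2e}")

# The 1v1 demo had exactly one matchup (a Shadow Fiend mirror), so even
# before considering cooperation among 10 interacting agents, the space
# the bot must generalize over grows by many orders of magnitude.
```

This ignores the far harder part, namely that the ten heroes interact, so it should be read as a loose lower bound on how much bigger the full problem is.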