Machine Learning and the Future of Video Games

The rapid progress in deep reinforcement learning (RL) over the last few years holds the promise of fixing the shortcomings of computer opponents in video games and of unlocking entirely new regions in game design space. However, the exorbitant engineering effort and hardware investments required to train neural networks that master complex real-time strategy games might lead to the impression that the commercial viability of deep RL is still far in the future. To the contrary, I will argue in Part 1 of this essay that these techniques are imminently practical and may see widespread adoption within the next decade. Part 2 presents a case study in which I use deep RL to improve the design of a real-time strategy game. Finally, in Part 3, I speculate about the many ways in which machine learning will impact video games in the years to come.



The year 2019 saw an important milestone for large-scale RL systems with the defeat of top human players in Dota 2​1,2​ and StarCraft II​3​. These systems still have many limitations and there is an ongoing debate on how to make matches between computers and humans fair. Nonetheless, they serve as a compelling demonstration of the fact that we are now capable of creating AIs for complex video games that can beat the vast majority of human players. One apparent obstacle to the widespread adoption of these methods is their high computational cost, which has been estimated at tens of millions of dollars​4​. However, considering current trends in hardware cost, training efficiency, and the diminishing returns of scale, we can expect that the cost of comparable systems will decline by three orders of magnitude or more over the next decade.

Declining Hardware Cost

Machine learning is embarrassingly parallel and directly benefits from increasing transistor counts. While Moore’s law​5​ has started to slow down, several die shrinks are currently underway​6–10​ and we can expect costs to decline for at least a few more years​11,12​. On top of generic improvements to semiconductor manufacturing technology, there are currently several startups and established companies working on new chips specifically designed for neural networks​13​. While the success of these ventures remains to be determined, top-of-the-line NVIDIA GPUs are increasingly focused on accelerating machine learning applications. The most performant GPU available in 2019 was the V100, which has since been superseded by the 2x-3x faster A100​14–16​. Unless the rate at which GPU price per FLOPS is decreasing slows down dramatically, we will see at least an 8x improvement over the V100 baseline within the next three GPU generations​17​. In addition, both cloud providers and particularly NVIDIA currently extract substantial margins on enterprise-line GPUs, which exhibit much lower performance/$ than custom workstations based on consumer-grade GPUs​18​. Once a viable competitor to NVIDIA emerges, we may see costs decline further.

The CPUs running simulations can also amount to a substantial fraction of training costs (reported as 30% for OpenAI Five​1​). However, I contend that this is largely due to inefficient simulators that were not optimized to run many parallel game instances in headless mode at speeds much faster than real-time. Even without any further advances in hardware, teams that know how to write efficient code should be capable of driving this part of the cost down to a small fraction for most games with only a moderate increase in development effort.
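To make this concrete, here is a minimal sketch of the kind of batched, headless simulation loop I have in mind. The game logic below is a toy stand-in (nothing here is CodeCraft's actual simulation), but the structure — many instances stepped in lockstep, with no rendering or frame pacing — is what lets a single CPU run simulations far faster than real-time:

```python
class HeadlessGame:
    """Toy stand-in for a game simulation with no rendering or frame pacing."""

    def __init__(self, seed: int):
        self.state = seed
        self.ticks = 0

    def step(self, action: int) -> int:
        # A real game would update physics, unit AI, and game logic here.
        self.state = (self.state * 6364136223846793005 + action + 1) % 2**64
        self.ticks += 1
        return self.state


def rollout(num_envs: int, num_ticks: int) -> int:
    """Step many game instances in lockstep, decoupled from real time."""
    envs = [HeadlessGame(seed=i) for i in range(num_envs)]
    for tick in range(num_ticks):
        for env in envs:
            env.step(action=tick % 4)
    return sum(env.ticks for env in envs)


print(rollout(num_envs=64, num_ticks=1000))  # 64000 simulated ticks
```

Since each instance is independent, such a loop also parallelizes trivially across cores and machines.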

Growing Training Efficiency

The training efficiency of machine learning methods is improving rapidly. For example, the compute required to reach the same performance as AlexNet on the ImageNet object recognition dataset has declined by 44x over a period of seven years​19​:

Total amount of compute in teraflop/s-days used to train to AlexNet-level performance.​19​

The field of deep reinforcement learning is still relatively nascent and there are likely many untapped opportunities for optimization on both algorithmic and engineering fronts. Case in point: OpenAI Rerun achieved a 98% win rate over OpenAI Five with only 20% of the computational resources and no technical innovations​1​. Even if progress is much slower than for image recognition, an efficiency gain of 10x-100x within the decade is highly plausible. Some generally applicable recent results include mixed-precision training (~2x speedup)​20​, algorithmic improvements like Phasic Policy Gradient (2x higher sample efficiency on many problems)​21​, and the burgeoning field of imitation learning (several orders-of-magnitude speedups when additional data is available)​22–25​.
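As a rough sanity check on these numbers, the 44x improvement over seven years corresponds to training efficiency doubling about every 16 months; extrapolating that rate over a decade (a strong assumption, but this is only back-of-the-envelope) would yield a gain well above the 10x-100x assumed here:

```python
import math

gain = 44.0   # efficiency improvement measured over the period
years = 7.0   # length of the period

doublings = math.log2(gain)                    # ~5.5 doublings
months_per_doubling = years * 12 / doublings   # ~15.4 months

# Extrapolating the same exponential rate over another decade:
projected_gain = gain ** (10 / years)

print(round(months_per_doubling, 1))  # 15.4
print(round(projected_gain))          # 223
```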

Diminishing Returns of Scale

The OpenAI Five and AlphaStar flagship projects both aimed to beat the very best human players on more-or-less even footing. While this is very neat, reaching such an extreme level of capability is entirely unnecessary for the vast majority of practical applications. Many games do not offer more than 100 hours of content, and the level of skill that most humans can reach within that amount of time is much more easily surpassed. The following diagram from the OpenAI paper is instructive and shows that agents reached a skill level comparable to that of fairly strong players after only 7% of training:

TrueSkill over the course of training for OpenAI Five.​1​

In addition, most games are less complex than even the restricted version of Dota 2 that OpenAI Five played on, and game developers have many options for compensating for deficiencies in subpar AIs through clever level design.
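Multiplying together the three factors discussed so far — cheaper hardware (8x over three GPU generations), more efficient training (10x-100x), and stopping at a strong-but-not-superhuman skill level (roughly 7% of the compute, i.e. about a 14x saving) — shows how they compound to the three or more orders of magnitude claimed at the outset:

```python
hardware_gain = 8            # GPU price/FLOPS over three generations
algo_gain_low, algo_gain_high = 10, 100   # plausible training-efficiency gains
scale_gain = 1 / 0.07        # stop at "strong player" level, ~7% of full training

low = hardware_gain * algo_gain_low * scale_gain
high = hardware_gain * algo_gain_high * scale_gain
print(f"{low:.0f}x to {high:.0f}x")  # 1143x to 11429x
```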

Expertise and Engineering Effort

Another obstacle to the adoption of deep RL techniques is the expertise and engineering effort required to build out and operate efficient training systems. This challenge is still substantial, but just as we do not have to keep reimplementing databases and physics engines, most of this engineering effort and know-how can and will be codified in reusable frameworks as the ecosystem matures. While much work remains to be done on this front, projects like Stable Baselines3​26​ and Unity ML-Agents​27​ have already made deep RL much more accessible and keep advancing in great strides. Since ML is a somewhat leaky abstraction, some expertise will still be required. However, much of the dark knowledge that goes into building AI systems can be distilled into simple heuristics and repeatable processes​28,29​ that aren’t any more difficult to master than many of the other highly advanced techniques routinely used by game developers to efficiently approximate reality.

A large majority of the engineering effort that currently goes into creating an RL system for a new game is incidental, caused by existing games not being designed for that purpose. While the development overhead from interfacing with deep RL frameworks will remain nonzero, it can be significantly reduced by architecting games from the beginning in a way that makes them amenable to machine learning. A practical discussion of the key considerations for creating video games amenable to modern machine learning techniques can be found in Appendix A.
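As an illustration, here is a sketch of the kind of interface such a game might expose: the reset/step contract familiar from RL libraries, returning a structured observation rather than rendered pixels. All names and game rules below are hypothetical, chosen only to show the shape of the contract:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    """Structured game state handed to the agent each tick (no pixels needed)."""
    resources: float
    unit_healths: List[float]


class GameEnv:
    """Hypothetical wrapper exposing a game through a reset/step contract."""

    def reset(self, seed: int = 0) -> Observation:
        self._resources = 100.0
        self._units = [10.0]  # start with a single mothership-like unit
        return self._observe()

    def step(self, action: int) -> Tuple[Observation, float, bool]:
        # Action 0 harvests resources, action 1 builds a unit if affordable.
        if action == 0:
            self._resources += 5.0
        elif action == 1 and self._resources >= 50.0:
            self._resources -= 50.0
            self._units.append(10.0)
        reward = float(len(self._units))  # toy reward signal
        done = len(self._units) >= 3      # toy termination condition
        return self._observe(), reward, done

    def _observe(self) -> Observation:
        return Observation(self._resources, list(self._units))


env = GameEnv()
obs = env.reset()
done = False
for action in [0] * 20 + [1, 1]:
    obs, reward, done = env.step(action)
print(len(obs.unit_healths), done)  # 3 True
```

A game built around this kind of boundary from day one can be plugged into off-the-shelf training frameworks with little extra glue.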

Vision of a Shiny Future

The year is 2028. You are a developer at a small game studio which just invested in a 16-GPU server at a total cost of ownership of $150,000 over three years. The server is equivalent to 128 V100s in terms of raw FLOPS, and equivalent to 1024 V100s running inefficient algorithms from 2019. The new real-time strategy game you are developing has already been architected to take advantage of AI, and so it takes you only a few days to plug it into a powerful open-source training framework. Immediately, the AI discovers myriads of ways to crash your game and you spend several days fixing bugs. After a few more weeks of designing curricula, implementing domain randomizations, and tweaking hyperparameters, the resulting AI handily beats you and everyone else in the office and has discovered unexpectedly powerful combinations of game mechanics. Fine-tuning the AI to adapt it to a modified version of the game takes less than an hour, allowing for rapid iteration on new ideas. When you launch, your game is virtually free of bugs and ships with hundreds of AIs at different difficulty levels. While the player base is too small for multiplayer to be viable, your AIs allow for hundreds of hours of engaging game-play across many different game modes. Data collected from human players allows the AI to become acquainted with artisanal cheese and to emulate and become robust to a diverse set of strategies. As development on the game slows down, your server is put to work over multiple months to train a final series of AIs that will give even seasoned players a run for their money for years to come.

Case Study: Enhancing CodeCraft

One potential application of ML to game development is as an automated playtester that helps balance game mechanics by finding good strategies much faster and cheaper than humans could. As luck would have it, I recently built a system that can train AIs for the CodeCraft real-time strategy game in less than a day. To explore how AI could assist in game development, I used my training system to create a more balanced version of CodeCraft with more strategic depth and interesting gameplay.

A more detailed overview of CodeCraft can be found here, but in a nutshell, it is a typical real-time strategy game in which each of the two players starts with a single mothership unit that can move across the map, collect resources, and build additional combat and production units. The goal is to eliminate all units of the opponent. In principle, CodeCraft implements a wide range of different units capable of both combat and production. Nonetheless, on the original version of the game my AIs invariably learned an aggressive strategy that completely eschewed any economic development in favor of quickly building an army consisting of only one or two different unit types:

Best strategy found by my AIs on the original version of CodeCraft.

The first issue I wanted to tackle was to make resource-collecting and drone-building units viable. Simple enough, I tripled the speed at which units are built, along with some smaller changes such as allowing units to move while building to make them less vulnerable to attack. To prevent exponential growth in the number of units, I also changed maps from containing many small resource patches to a small number of large resources (each of which can only be harvested by one unit at a time). As hoped, AIs trained on the new version of the game were now making frequent use of drone-producing units:

After increasing production speed, the AI now makes frequent use of drone-producing units.

The AIs still weren’t utilizing resource-collecting units though. To address this, I doubled the rate at which units can harvest resources while halving the number of resource-collecting modules on the mothership to keep the initial production capacity constant. In exchange, I added two extra engine modules to the mothership, allowing it to move much faster and speeding up the initial phase of the game. Additionally, I nerfed the greatly overpowered shield modules by reducing their hitpoints from 7 to 4, but also allowing them to regenerate 3 times more quickly. Resource collector units now saw frequent usage:

Doubling the harvesting speed of resource-collecting units makes them an important part of the game.
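Incidentally, whether the shield change above is a net nerf depends on the rate of incoming damage, which a little arithmetic makes clear. The hitpoint values and the 3x regeneration factor are the ones from the change; the base regeneration rate and damage figures are assumed purely for illustration:

```python
def time_to_break(hp: float, regen: float, dps: float) -> float:
    """Seconds of sustained fire needed to deplete a regenerating shield."""
    return hp / (dps - regen) if dps > regen else float("inf")


BASE_REGEN = 1.0  # hp/s; assumed value, not taken from the game


def old_shield(dps: float) -> float:
    return time_to_break(7.0, BASE_REGEN, dps)      # 7 hp, slow regen


def new_shield(dps: float) -> float:
    return time_to_break(4.0, 3 * BASE_REGEN, dps)  # 4 hp, 3x faster regen


print(old_shield(10.0), new_shield(10.0))  # heavy fire: new shield breaks sooner
print(old_shield(5.0), new_shield(5.0))    # light fire: new shield lasts longer
print(new_shield(2.0))                     # trickle damage: never breaks (inf)
```

In other words, the change nerfs shields against concentrated fire while leaving them strong (or even stronger) against chip damage — exactly the kind of interaction that is hard to eyeball and easy for trained AIs to expose.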

While the economic part of the game was looking a lot better now and players were able to field much larger armies, AIs were still relying on the same small set of combat units despite the nerfed shields. After a few more iterations of smaller tweaks to the game mechanics, AIs started to use a third type of combat unit, and engagements were looking a bit more dynamic:

Gameplay is a lot more varied on the rebalanced version of CodeCraft.

Now, the above video may not look like much to you, but after having watched my AIs play for hundreds of hours while developing my training system I have become somewhat of a connoisseur, and I can assure you that this is a truly exquisite game of CodeCraft. One unexpected result was that on this new version of CodeCraft I was able to profitably train AIs four times longer before learning progress plateaued. There are many possible explanations for this, but the one I like best is that this balanced version of CodeCraft allows for strategies that are objectively several times more complex than what we started with.

With AIs now utilizing most of the existing game mechanics, it was time to add something new. I settled on a weapons module with a longer range but slower firing rate than the existing weapon, which I figured would allow for more interesting tactics. My first attempt at implementing this turned out to be a complete disaster: I had accidentally set the lifetime of the missiles too short, which caused AIs to adopt a rather silly strategy of only briefly moving units into lock-on range to fire their missiles, and then quickly moving them back again to outrun any potential enemy projectiles:

The AI quickly reveals a flaw in my first implementation of long-range missiles.

Even after fixing the missile lifetimes, it took a few more iterations to balance out the new module. It turned out that, despite the low firing rate, the combination of fast movement speed and long-range weapons was hopelessly overpowered, so I ended up removing all unit types that were both fast and equipped with long-range missiles. This was still insufficient: the lower firing rate of the new module was more than compensated for by the fact that units using it got the first shot, making it very difficult to approach groups of units with long-range missiles once they reached a critical mass. I ultimately addressed this issue by requiring long-range missiles to “charge up” before firing, giving other unit types just enough time to close the gap and trade favorably even against larger groups of long-range units. For good measure, I also added two additional strong but expensive “late game” units. After all these changes, my AIs were utilizing 7 different unit types (not counting the mothership) and choosing different unit compositions depending on the map and phase of the game. On small maps with limited resources, slow but powerful and weak but cost-effective units rule the day:

Small maps favor cheaper and slower units and rely heavily on mothership micro.

On larger maps, movement speed becomes a lot more important, and games go on long enough to allow for investment into more powerful units:

On larger maps, a wide variety of units is viable, including weak but fast-moving units that can slowly build up an advantage by briefly diving into the enemy army to fire a few shots, and then quickly retreating to regenerate their shields.

While this enhanced version of CodeCraft is still a far cry from StarCraft II, it is perhaps not that far away from games like Tooth & Tail. Especially on larger maps, I have a hard time following the game and figuring out why a particular player ends up winning:

Lots of stuff happening.

A few additional matches can be found in Appendix B. For a limited time, I will also be hosting a fully automated stream of CodeCraft on Twitch.

Originally I was going to add a second new module type, but by now I was starting to realize that game design is actually kind of difficult and decided to call it a day. In any case, the goal was to explore RL’s potential for game development. With just a few days’ work (and several GPU-weeks of compute) I was able to create and validate a much more complex version of CodeCraft. In the process, my AIs taught me about many unanticipated tactics and interactions of different game mechanics that would have been difficult to predict and laborious to discover. Game design will remain hard for the foreseeable future, but methods like the one demonstrated here are going to enable much faster iteration cycles that quickly surface imbalances and other issues and ultimately allow for the creation of compelling and balanced games with unprecedented levels of strategic depth.

Emerging Possibilities

If current trends are to continue, hardware and engineering costs will soon cease to be a major obstacle to applying machine learning methods to video games. The question remains, will these techniques actually help us create better games? And in what ways can we expect game development to change? There is still much uncertainty around when and how to use these techniques​30​ and fit them into existing workflows​31​. Likely, much of this friction is caused by the fact that decades of experience have gone into optimizing existing game archetypes and practices around the strength and weaknesses of classical AI. While the optimal ways of creating games that take full advantage of deep RL are going to take much time and effort to discover, we can already speculate about the myriads of potential applications that are now on the horizon.

AI Opponents

Many of the world’s most popular games rely on a competitive multiplayer mode that matches players with intelligent opponents of similar strength who can keep each other challenged and engaged for hundreds of hours. But only a few games can sustain a sufficiently large player base; as the number of participants dwindles, the game starts to suffer from long wait times and poor matchmaking, and eventually dies. Even for popular games, multiplayer brings its own challenges of engineering and infrastructure complexity, and cheating. Multiplayer also simply doesn’t work well with many game mechanics, such as highly asymmetric game-play or long matches.

The possibility of easily creating a large pool of challenging and varied AI opponents promises the best of both worlds, eliminating wait times, allowing for fine-grained adjustment of difficulty, and enabling a larger variety of customizable game modes that could even include all past versions of the game long after the community has moved on. The poor quality of AIs is one of the most common complaints about mainstays such as the Civilization series, and better ways of creating compelling AIs will greatly enhance many single-player and cooperative games​32​. RL could also be used as a form of automatic content creation that significantly reduces the effort required to fill a game world with large swaths of unique and interesting non-player characters. Indeed, that is the premise of Source of Madness, a rogue-lite that uses AI to animate millions of different procedurally generated enemies.

Can AI ever truly replace human opponents, or is there something special about playing against a real person that makes games feel inherently more meaningful? The popularity of online chess and the reactions to Fortnite stuffing lobbies with bots suggest that humanity is not in any danger of being replaced anytime soon. Still, compared to hard-coded rules or classical AI, neural network agents seem to have a more natural “feel” to them and enable a broader space of possibilities. This new generation of AIs, capable of making intelligent decisions that can surprise the player, and perhaps imbued with distinctive character and playstyles, will certainly close the gap to real humans and may become increasingly difficult to distinguish​33​.

Automated Playtesting

Extensions to competitive games such as StarCraft II and Hearthstone go through many months of extensive design and playtesting to add new game mechanics in a balanced way. In addition to being costly and time-consuming, this process more often than not fails to produce the desired result, necessitating hot patches to restore balance, and on occasion introducing deep flaws that cause unfun strategies to dominate and ruin the metagame for months to come. Far from being an indictment of the abilities of game designers, this is an unavoidable consequence of the fact that any game that is complex enough to be interesting is necessarily difficult to reason about.

RL systems that explore millions of sequences in parallel and quickly home in on optimal strategies that would otherwise only be discovered after many weeks of experimentation by a large playerbase offer a way out of this quandary. Game developers will be able to observe the ultimate consequences of different design choices within hours or even minutes. As demonstrated in the previous section, such rapid experimentation allows for creating more complex and balanced games with a fraction of the time and effort required today. Automated playtesting is actively researched​34,35​, and its applications to game design have previously been explored in the context of chess​36​, the Chimera card game​37​, and the 0 A.D. real-time strategy game​38​. Already, Unity and other companies are building commercial products for automated playtesting and the creation of AI opponents.
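The core loop of such a system can be sketched in a few lines: sweep a candidate balance knob, estimate each variant's win rate over many simulated matches, and keep the value closest to parity. The combat model below is a deliberately crude toy (in a real pipeline, each win-rate estimate would come from matches between trained agents, not a closed-form skirmish):

```python
import random


def simulate_match(long_range_damage: float, rng: random.Random) -> bool:
    """Toy match: long-range army vs. melee army; True if long-range wins."""
    long_hp, melee_hp = 100.0, 100.0
    # Long-range units get a few free volleys before the melee army closes in.
    melee_hp -= 3 * long_range_damage
    while long_hp > 0 and melee_hp > 0:
        melee_hp -= long_range_damage * rng.uniform(0.8, 1.2)
        long_hp -= 10.0 * rng.uniform(0.8, 1.2)
    return melee_hp <= 0


def win_rate(damage: float, matches: int = 2000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(simulate_match(damage, rng) for _ in range(matches)) / matches


# Sweep the balance knob and keep the value closest to a 50% win rate.
candidates = [6.0, 7.0, 8.0, 9.0, 10.0]
best = min(candidates, key=lambda d: abs(win_rate(d) - 0.5))
print(best)
```

Even this toy version exhibits the characteristic cliff: small changes to the damage value swing the win rate from near 0% to near 100%, which is why manual balancing is so treacherous.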

Computational Creativity

A somewhat different area of AI with emerging applications for games is the fascinating field of computational creativity​39​. Steerable generative models like DALL-E​40,41​ may significantly reduce the cost of creating game assets or generate context-appropriate music on the fly​42​. Beyond creating individual assets, ML can be used to amplify the abilities of level designers by partially automating the placement of objects based on examples​43​, automatically generate challenging yet solvable levels from scratch​44,45​, populate landscapes in Minecraft with settlements​46​, and process large-scale datasets to create 3D models of the entire Earth​47​.

Your Buddy, the AI

Many games have repetitive aspects that quickly become boring. Strategy games in particular tend to accumulate complexity over the course of a playthrough, eventually leaving the player to spend most of their time on tedious micromanagement where each individual action has negligible impact on the game. More intelligent AIs could make it possible to delegate the grind, allowing the player to move up through several abstraction layers over the course of a game and remain focused on impactful strategic decisions without being bogged down by minutiae. For example, one might imagine a 4X game where you start out by micromanaging individual villagers, but as time progresses you delegate management of first individual cities and then whole provinces to intelligent AI governors. Eventually, you only set the high-level policy objectives for your sprawling empire, which are then faithfully carried out by a large administration composed of dozens of AIs. Or, perhaps, not so faithfully, as corrupt AI governors pursue divergent objectives for their own personal gain.

Automating micromanagement is one particular possibility in which AIs might assist the player, and there are many other design points for AIs to help through coaching or taking over execution​48​. More intelligent AIs that behave in interesting ways may also enable a new generation of simulation and god games where players provide minimal input beyond setting the initial starting conditions and occasional high-level interactions with the game world. At the extreme end of automation, we might even see games that are designed primarily to generate broadcasts of matches played between AIs and which are never intended to be played directly by humans.

Adverse Impacts

While I expect the overall impact of machine learning techniques on games to be positive, particularly for single-player and cooperative games, it will undoubtedly also bring new challenges. Cheating has already become rampant in many games, and new AI techniques will keep making cheating easier and enable the creation of more human-like bots that are difficult to detect​49–52​. For many multiplayer games, the most enjoyable phase is in the weeks after a new release, during which the metagame is actively evolving and pro players can snatch surprise wins with innovative new strategies. AI systems that can quickly determine the optimal set of strategies threaten this process of discovery, though perhaps differential access to compute and efficient simulators will mitigate this problem for quite some time. In the professional Go world, AlphaGo has reportedly led to an “enormous sense of loss” for those who see mastery of Go as an intrinsically valuable intellectual pursuit; the effectiveness of learning from and imitating the AI has reduced the variety of play styles and led to a large reduction in teaching opportunities and a concomitant loss of livelihood​53​. Finally, unscrupulous companies will soon (or, more likely, already do) apply RL to directly optimize addictive game mechanics that exploit flaws in the human reward system for maximum profit​54–62​, though there is also hope that more sophisticated RL techniques will allow us to move beyond shallow engagement metrics and optimize for something more akin to well-being​63,64​.

Vast Virtual Worlds

For more than a decade, it’s been a personal dream of mine to create virtual worlds inhabited by tens of thousands of autonomous agents all pursuing their own goals in a way that allows for the emergence of complex economies and societies. Rather than dusty museums inhabited by mindless automata following predictable schedules or reenacting scripted stories, players would be able to explore vast dynamic worlds that feel truly alive, where nothing is set in stone, choices actually matter, and actions can have non-obvious and far-reaching consequences on how the world ultimately evolves. While more work remains to be done, I’m quite confident that the latest wave of deep RL advancements now squarely places this in the realm of the possible.


Simulated worlds sidestep many of the problems that plague the integration of ML with the real world. This makes video games a particularly attractive testbed for research and a likely candidate for the initial deployment of powerful new techniques. If current trends are to continue, the range of viable applications of ML in the domain of video games is about to undergo explosive growth. Beyond just reducing the cost of creating high-quality games and enhancing existing game mechanics, ML opens up entirely new areas of game-design space, and I believe that some of the most compelling experiences to be created in the next decade will come out of the creative use of machine learning.


Thanks to Batu Aytemiz, Anssi Kanervisto, Benedikt Winter, Christopher Berner, Paul Christiano, and Jeremy Salwen for giving helpful feedback on drafts of this article.


  1. OpenAI, Berner C, et al. Dota 2 with Large Scale Deep Reinforcement Learning. Published online 2019.
  2. OpenAI. OpenAI Five Defeats Dota 2 World Champions. OpenAI. Published online 2020.
  3. Vinyals O, Babuschkin I, Czarnecki WM, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature. 2019;575:350–354.
  4. Huang D. OpenAI Five: Thoughts. Dansplaining.
  5. Moore’s law. Wikipedia. Published online 2021.
  6. Shilov A. Samsung’s aggressive EUV plans: 6nm production in H2, 5nm & 4nm on track. AnandTech. Published online 2019.
  7. Shilov A. TSMC: 5nm on track for Q2 2020 HVM, will ramp faster than 7nm. AnandTech. Published online 2019.
  8. Taiwan Semiconductor Manufacturing Company. TSMC and OIP ecosystem partners deliver industry’s first complete design infrastructure for 5nm process technology. Published online 2020.
  9. Armasu L. Samsung plans mass production of 3nm GAAFET chips in 2021. Tom’s Hardware. Published online 2019.
  10. IBM unveils world’s first 2 nanometer chip technology, opening a new frontier for semiconductors. IBM Newsroom. Published online 2021.
  11. Martin E. Moore’s law is alive and well. Medium. Published online 2020.
  12. Cheng G. Moore’s law is not dead. TSMC Blog. Published online 2019.
  13. Tang S. AI-Chip – A list of ICs and IPs for AI, Machine Learning and Deep Learning. AI-Chip.
  14. Balaban M. A100 vs V100 deep learning benchmarks. Lambda Blog. Published online 2021.
  15. Dettmers T. The best GPUs for deep learning in 2020 – an in-depth analysis. Tim Dettmers. Published online 2020.
  16. NVIDIA data center deep learning product performance. NVIDIA Developer. Published online 2021.
  17. AI Impacts. 2019 recent trends in GPU price per FLOPS. AI Impacts. Published online 2020.
  18. Winter C. Building the Ultimate Deep Learning Workstation. Clemens’ Blog. Published online 2021.
  19. Hernandez D, Brown TB. Measuring the Algorithmic Efficiency of Neural Networks. Published online 2020.
  20. Bjorck J, Chen X, De Sa C, Gomes CP, Weinberger KQ. Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision. Published online 2021.
  21. Cobbe K, Hilton J, Klimov O, Schulman J. Phasic Policy Gradient. Published online 2020.
  22. Patil VP, Hofmarcher M, Dinu M-C, et al. Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution. Published online 2020.
  23. Pearce T, Zhu J. Counter-Strike Deathmatch with Large-Scale Behavioural Cloning. Published online 2021.
  24. Wang X, Song J, Qi P, et al. SCC: an Efficient Deep Reinforcement Learning Agent Mastering the Game of StarCraft II. Published online 2021.
  25. Guss WH, Ynocente Castro M, Devlin S, et al. The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors. Published online 2021.
  26. Raffin A, Hill A, Ernestus M, Gleave A, Kanervisto A, Dormann N. Stable Baselines3. GitHub repository. Published online 2019.
  27. Juliani A, Berges V-P, Teng E, et al. Unity: A General Platform for Intelligent Agents. Published online 2020.
  28. Winter C. My Reinforcement Learning Learnings. Clemens’ Blog. Published online 2021.
  29. Kanervisto A, Scheller C, Schraner Y, Hautamäki V. Distilling Reinforcement Learning Tricks for Video Games. Published online 2021.
  30. Jacob M, Devlin S, Hofmann K. “It’s Unwieldy and It Takes a Lot of Time.” Challenges and Opportunities for Creating Agents in Commercial Games. In: 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Association for the Advancement of Artificial Intelligence (AAAI); 2020.
  31. Aytemiz B, Devlin S, Hofmann K. Designer-centered reinforcement learning. Microsoft Research. Published online 2021.
  32. Yang W. Self-play Reinforcement Learning for Hearthstone AI. Microsoft Research. Published online 2021.
  33. Ramuz J. Battlefield 2042 bots are going to be “really hard” to spot. EarlyGame. Published online 2021.
  34. Bergdahl J, Gordillo C, Tollmar K, Gisslén L. Augmenting Automated Game Testing with Deep Reinforcement Learning. Published online 2021.
  35. Gordillo C, Bergdahl J, Tollmar K, Gisslén L. Improving Playtesting Coverage via Curiosity Driven Reinforcement Learning Agents. Published online 2021.
  36. Tomašev N, Paquet U, Hassabis D, Kramnik V. Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess. Published online 2020.
  37. Kim JH, Wu R. Leveraging Machine Learning for Game Development. Google AI Blog. Published online 2021.
  38. Broll B. Game Balancing Example in 0 A.D. GitHub. Published online 2021.
  39. Riedl M. Making Things that Make Things: Computational Creativity for Games and Worlds. Microsoft Research. Published online 2021.
  40. Ramesh A, Pavlov M, Goh G. DALL·E: Creating Images from Text. OpenAI. Published online 2021.
  41. Ramesh A, Pavlov M, Goh G, et al. Zero-Shot Text-to-Image Generation. Published online 2021.
  42. Ackerman M. Musical AI Landscape & Opportunities in Gaming. Microsoft Research. Published online 2021.
  43. Opara A. Place Like I do! Example-based Procedural Placement. Microsoft Research. Published online 2021.
  44. Dennis M, Jaques N, Vinitsky E, et al. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. Published online 2021.
  45. Gisslén L, Eakins A, Gordillo C, Bergdahl J, Tollmar K. Adversarial Reinforcement Learning for Procedural Content Generation. Published online 2021.
  46. Salge C. Generative Design in Minecraft: The AI Settlement Generation Challenge. Microsoft Research. Published online 2021.
  47. Neumann J. Creating the 4D Digital Twin in Microsoft Flight Simulator with AI. Microsoft Research. Published online 2021.
  48. Aytemiz B, Shu X, Hu E, Smith A. Your Buddy, the Grandmaster: Repurposing the Game-Playing AI Surplus for Inclusivity. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2020;16:17-23.
  49. Bland A. Chess’s cheating crisis: “paranoia has become the culture.” The Guardian. Published online 2020.
  50. Schormann C. Can online chess overcome cheating? ChessTech News. Published online 2020.
  51. Humphries M. Machine Learning is now being used to cheat in multiplayer games. PCMAG. Published online 2021.
  52. Warren T. The World’s Biggest PC Games Are Fighting a New Surge of Cheaters and Hackers. The Verge. Published online 2020.
  53. Lee H. Impact of Go AI on the professional Go world. Medium. Published online 2020.
  54. Tassi P. Why it’s scary when 0.15% of mobile gamers bring in 50% of the revenue. Forbes. Published online 2014.
  55. Stuart K. My favourite waste of time: why Candy Crush and Angry Birds are so compulsive. The Guardian. Published online 2014.
  56. 56.
    Wang Q, Ren H, Long J, Liu Y, Liu T. Research progress and debates on gaming disorder. General Psychiatry. 2019;32. doi:10.1136/gpsych-2019-100071
  57. 57.
    Dreier M, Wölfling K, Duven E, Giralt S, Beutel ME, Müller KW. Free-to-play: About addicted Whales, at risk Dolphins and healthy Minnows. Monetarization design and Internet Gaming Disorder. Addict Behav. 2017;64:328–333.
  58. 58.
    Rose M. Chasing the whale: Examining the ethics of free-to-play games. Gamasutra. Published online 2013.
  59. 59.
    Wiltshire A. Behind the addictive psychology and seductive art of loot boxes. pcgamer. Published online 2017.
  60. 60.
    Zendle D, Meyer R, Over H. Adolescents and loot boxes: Links with problem gambling and motivations for purchase. Royal Society Open Science. Published online 2019. doi:10.31234/
  61. 61.
    Breeze M. A quiet killer: Why games are so addictive. TNW | Insider. Published online 2021.
  62. 62.
    Mann A. Pseudo-Gambling and Whaling: How Loot Boxes Pray on Vulnerable Populations and How to Curtail Future Predatory Behavior. Washington Journal of Law, Technology & Arts . 2020;15:200.
  63. 63.
    Stray J. Beyond engagement: Aligning algorithmic recommendations with prosocial goals. The Partnership on AI. Published online 2021.
  64. 64.
    Stiennon N, Ouyang L, Wu J, et al. Learning to summarize from human feedback. Published online 2020.
  65. 65.
    Brockman G, Cheung V, Pettersson L, et al. OpenAI Gym. Published online 2016.
  66. 66.
    Terry JK, Black B, Grammel N, et al. PettingZoo: Gym for Multi-Agent Reinforcement Learning. arXiv preprint arXiv:200914471. Published online 2020.


A Architecting games for machine learning

This section gives an overview of how to create games that can be easily adapted by AI researchers or plugged into future training frameworks. While the field and library ecosystem are still evolving, we can nonetheless identify a number of high-level considerations and fundamental constraints. The predominant API for bridging environments (games) and reinforcement learning algorithms is the OpenAI Gym library​65​. Gym defines a standard interface for environments that makes it possible to create libraries like Stable Baselines3​26​, which implements state-of-the-art reinforcement learning algorithms that work with any environment conforming to the Gym interface.

import gym

env = gym.make("MyAwesomeGame")  # placeholder id; use any registered environment
observation = env.reset()
for _ in range(1000):
  # rl_agent stands in for a trained policy; env.action_space.sample()
  # gives a random baseline.
  action = rl_agent.act(observation)
  observation, reward, done, info = env.step(action)
  if done:
    observation = env.reset()

Example code for creating a gym environment and running it for 1000 timesteps.

Most reinforcement learning implementations run many (64 or more) game instances in parallel to maximize GPU utilization by batching multiple requests on each timestep. From the point of view of RL training, an ideal environment has low enough memory usage to allow for many parallel instances, can be efficiently single-stepped much faster than real time, and allows game instances to be created and reset with low overhead. Since Python’s concurrency support is rather poor, it can be useful to implement a vectorized version of the API that allows many game instances to be created/stepped/observed with a single function call.
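A minimal sketch of such a vectorized wrapper (the `VecEnv` class and its method names are illustrative, not part of Gym itself; `make_env` is assumed to return any object with Gym-style `reset()`/`step()` methods):

```python
import numpy as np

class VecEnv:
    """Steps many game instances with a single function call."""

    def __init__(self, make_env, num_envs):
        self.envs = [make_env() for _ in range(num_envs)]

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, _ = env.step(action)
            if done:
                # Reset finished instances immediately so every slot
                # always contains a running game.
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return np.stack(observations), np.array(rewards), np.array(dones)
```

A real implementation would run the instances in subprocesses or inside the game engine itself rather than in a Python loop, but the interface stays the same: batched observations in, batched actions out.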

The Gym interface is still too restrictive to allow it to be easily used with the majority of real games. For example, it does not have any provisions for handling environments that contain multiple independent agents. PettingZoo​66​ is a recent attempt at fixing this shortcoming while still closely mirroring the original Gym API.
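PettingZoo structures multi-agent play as an agent-environment cycle: one agent acts per `step()`, and the environment tells you whose turn it is. The toy environment below mimics that control flow in pure Python so the pattern is visible without any dependencies (the `TurnBasedEnv` class is hypothetical, but `agent_iter()`/`last()`/`step()` mirror PettingZoo's loop):

```python
class TurnBasedEnv:
    """Toy two-agent environment with a PettingZoo-style turn loop."""

    def __init__(self, num_turns=6):
        self.agents = ["player_0", "player_1"]
        self.num_turns = num_turns

    def reset(self):
        self.turn = 0

    def agent_iter(self):
        # Yield whichever agent is due to act, until the game ends.
        while self.turn < self.num_turns:
            yield self.agents[self.turn % len(self.agents)]

    def last(self):
        # Observation, reward, done flag, and info for the acting agent.
        observation = {"turn": self.turn}
        done = self.turn >= self.num_turns - 1
        return observation, 0.0, done, {}

    def step(self, action):
        self.turn += 1


env = TurnBasedEnv()
env.reset()
history = []
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    history.append(agent)
    env.step(0)  # a real policy would choose the action here
```

Because only one agent acts per step, the same loop handles turn-based games, simultaneous-move games (by cycling through all agents each tick), and games where agents join or leave mid-episode.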

Another limitation that is yet to be lifted is that both the observations and actions of game agents are required to be fixed-size vectors/matrices. The state of most games cannot be easily represented in this way. Using images as observations provides a very general interface, but such a low-level representation easily increases training costs by orders of magnitude and is therefore not a practical solution. A better API, and a much more natural fit for most games, would be based on variable-length lists of objects that directly correspond to the game state while still allowing for efficient RL implementations. Users of the environment would specify a shape that lists the kinds of objects and properties they want to query from the environment:

shape = {
  'global': ['level', 'difficulty', 'remaining_time'],
  'hero': ['x_pos', 'y_pos', 'health', 'max_health', 'attack_cooldown'],
  'platform': ['x_pos', 'y_pos'],
  'enemy': ['x_pos', 'y_pos', 'type'],
}

Observations could then simply contain all the corresponding lists of objects:

observation = {
  # Features not associated with any particular object
  'global': array([13.0, 2., 0.2183]),
  # Features of the hero and currently visible platforms and enemies
  'hero': array([[100., 200., 60., 100., 0.3]]),
  'platform': array([
    [100., 180.],
    [200., 230.],
  ]),
  'enemy': array([
    [103., 200., 1.],
    [130., 0., 0.],
    [153., 0., 0.],
  ]),
}

While there are currently no reusable RL frameworks or network architectures that can operate directly on observations structured in this way, I expect this to change in due time.
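The basic recipe such an architecture would follow is already clear, though: embed each object with a per-type projection, pool over the variable-length lists, and concatenate the results into a fixed-size vector that a standard policy network can consume. A minimal NumPy sketch, with randomly initialized weights standing in for learned parameters and mean pooling chosen purely for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared embedding size per object type

# One (weights, bias) pair per object type, matching the shape spec above;
# in a real system these would be learned parameters.
feature_counts = {'global': 3, 'hero': 5, 'platform': 2, 'enemy': 3}
params = {
    t: (rng.normal(size=(n, D)), np.zeros(D))
    for t, n in feature_counts.items()
}

def encode(observation):
    """Map variable-length lists of objects to one fixed-size vector."""
    parts = []
    for obj_type, (w, b) in params.items():
        features = np.atleast_2d(observation[obj_type])
        embeddings = features @ w + b          # (num_objects, D)
        parts.append(embeddings.mean(axis=0))  # pool over objects
    return np.concatenate(parts)               # (len(params) * D,)
```

The output size is independent of how many platforms or enemies are currently visible. In practice, attention over entities (as used by agents like AlphaStar) is more expressive than the mean pooling shown here, but the fixed-size-output contract is the same.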


While developing an AI, it is very useful to have an option to enable graphical output or to record and watch replays of games. Even better is an API that also allows adding custom visualizations at each time step, such as rendering debug text, placing marker objects in the game world, or highlighting different game objects.
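One lightweight way to expose such an API is for the game to collect draw commands during each step and hand them to the renderer or replay viewer. A sketch of what that could look like (every name here is illustrative, not taken from any existing engine):

```python
class DebugDraw:
    """Collects per-timestep debug overlay commands for the renderer."""

    def __init__(self):
        self.commands = []

    def text(self, x, y, message):
        self.commands.append(('text', x, y, message))

    def marker(self, x, y, color='red'):
        self.commands.append(('marker', x, y, color))

    def highlight(self, object_id, color='yellow'):
        self.commands.append(('highlight', object_id, color))

    def flush(self):
        # Hand the collected commands to the renderer, then start fresh.
        commands, self.commands = self.commands, []
        return commands
```

Agent code can then annotate its decisions (e.g. `debug.text(10, 20, 'value estimate: 0.7')` or a marker at a planned destination), and the same commands can be baked into replays so a training run can be inspected after the fact.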


Existing reinforcement learning techniques still have many limitations, particularly with regard to exploration. One of the most effective ways of bypassing these is to construct curricula and randomize the game mechanics. To make this possible, the game should allow whoever creates a game instance to set the value of any numerical constant (health, movement speed, attack damage, cooldowns, item costs, …) and to exercise fine-grained control over all aspects of map generation.
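Concretely, this can be as simple as sampling a config when each parallel instance is created. The constant names, ranges, and the `difficulty` knob below are made up for illustration:

```python
import random

# Plausible ranges for constants a trainer might want to randomize.
RANDOMIZATION_RANGES = {
    'hero_health': (50, 200),
    'hero_move_speed': (0.5, 2.0),
    'enemy_attack_damage': (1, 25),
    'item_cost_multiplier': (0.25, 4.0),
}

def sample_game_config(difficulty=1.0, rng=random):
    """Draw one set of game constants; `difficulty` scales enemy damage,
    so a curriculum can start easy and ramp up over training."""
    config = {}
    for name, (low, high) in RANDOMIZATION_RANGES.items():
        value = rng.uniform(low, high)
        if name == 'enemy_attack_damage':
            value *= difficulty
        config[name] = value
    return config

# Each parallel game instance gets its own sampled constants, e.g.:
# env = make_game(**sample_game_config(difficulty=0.1))
```

The same hook doubles as a game-balancing tool: sweeping a constant across instances and comparing the resulting win rates is exactly the kind of experiment trained agents make cheap.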

B Videos of CodeCraft Enhanced
