
The AI made a “mental map” of the world to collect the game’s most sought-after material.
My nephew couldn’t stop playing Minecraft when he was seven years old.
One of the most popular games ever, Minecraft is an open-world game in which players explore and reshape terrain and craft various items and tools. No one showed him how to navigate the game, but over time he learned the basics through trial and error, eventually figuring out how to build intricate designs, such as theme parks and entire working cities and towns. First, though, he had to gather materials, some of which (diamonds in particular) are difficult to collect.
Now, a new DeepMind AI can do the same.
Without access to any human gameplay as an example, the AI taught itself the rules, physics, and complex maneuvers needed to mine diamonds. “Applied out of the box, Dreamer is, to our knowledge, the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula,” wrote study author Danijar Hafner in a blog post.
But playing Minecraft isn’t the point. AI scientists have long been after general algorithms that can solve tasks across a wide range of problems, not just the ones they’re trained on. Although some of today’s models can generalize a skill across similar problems, they struggle to transfer those skills to more complex tasks requiring multiple steps.
In the limited world of Minecraft, Dreamer seemed to have that flexibility. After learning a model of its environment, it could “imagine” future scenarios to improve its decision-making at each step, ultimately allowing it to collect that elusive diamond.
The work “is about training a single algorithm to perform well across diverse…tasks,” Harvard’s Keyon Vafa, who was not involved in the study, told Nature. “This is a notoriously hard problem and the results are fantastic.”
Learning From Experience
Children naturally soak up their environment. Through trial and error, they quickly learn to avoid touching a hot stove and, by extension, a recently used toaster oven. Dubbed reinforcement learning, this process incorporates experiences—such as “yikes, that hurt”—into a model of how the world works.
A mental model makes it easier to imagine or predict consequences and generalize previous experiences to other scenarios. And when decisions don’t work out, the brain updates its model of the consequences of actions—“I dropped a gallon of milk because it was too heavy for me”—so that kids eventually learn not to repeat the same behavior.
Scientists have adopted the same principles for AI, essentially raising algorithms like children. OpenAI previously developed reinforcement learning algorithms that learned to play the fast-paced multiplayer video game Dota 2 with minimal training. Other such algorithms have learned to control robots capable of solving multiple tasks or to beat the hardest Atari games.
Learning from mistakes and wins sounds easy. But we live in a complex world, and even simple tasks, like, say, making a peanut butter and jelly sandwich, involve multiple steps. And if the final sandwich turns into an overloaded, soggy abomination, which step went wrong?
That’s the problem of sparse rewards. We don’t get immediate feedback on every step and action. Reinforcement learning in AI struggles with a similar problem: How can algorithms figure out which of their decisions went right or wrong?
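To make that credit-assignment problem concrete, here is a minimal, hypothetical sketch in Python (not code from the study): a toy “crafting chain” in which the agent must take five correct steps in a row but only receives a reward at the very end. The chain length, learning rate, and other numbers are illustrative assumptions.

```python
import random

N_STEPS = 5            # length of the toy crafting chain (hypothetical)
ACTIONS = [0, 1]       # at each step, only action 1 advances the chain
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q[state][action]: the agent's learned estimate of future reward.
Q = [[0.0, 0.0] for _ in range(N_STEPS + 1)]

def run_episode():
    state = 0
    done = False
    while not done:
        # Explore occasionally; otherwise pick the action that currently looks best.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])

        if action == 1:
            next_state = state + 1
            done = next_state == N_STEPS
            reward = 1.0 if done else 0.0   # feedback arrives only at the very end
        else:
            next_state = state
            done = True                     # a wrong move ends the run with no reward at all
            reward = 0.0

        # Standard Q-learning update: nudge the estimate toward reward + discounted future value.
        target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

for _ in range(5000):
    run_episode()

print("Learned value of the correct action at each step:",
      [round(Q[s][1], 2) for s in range(N_STEPS)])
```

Early on, every step looks equally worthless because the reward is so rare; only after many episodes does the final payoff trickle back to the first decision. Scaling that difficulty up to a task with thousands of steps is exactly what makes Minecraft’s diamonds so hard for algorithms that learn purely from trial and error.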
World of Minecraft
Minecraft is a perfect AI training ground.
Players freely explore the game’s vast terrain—farmland, mountains, swamps, and deserts—and harvest specialized materials as they go. In most modes, players use these materials to build intricate structures—from chicken coops to the Eiffel Tower—craft objects like swords and fences, or start a farm.
The game also resets: Every time a player joins a new game, the world map is different, so remembering a previous strategy or place to mine materials doesn’t help. Instead, the player has to more generally learn the world’s physics and how to accomplish goals—say, mining a diamond.
These quirks make the game an especially useful test for AI that can generalize, and the AI community has focused on collecting diamonds as the ultimate challenge. This requires players to complete multiple tasks, from chopping down trees to making pickaxes and carrying water to an underground lava flow.
Kids can learn how to collect diamonds from a 10-minute YouTube video. But in a 2019 competition, AI struggled even after up to four days of training on roughly 1,000 hours of footage from human gameplay.
Algorithms that mimicked gamer behavior fared better than those learning purely through reinforcement learning. At the time, one of the competition’s organizers commented that the latter wouldn’t stand a chance on their own.
Dreamer the Explorer
Rather than relying on human gameplay, Dreamer explored the game by itself, learning through experimentation to collect a diamond from scratch.
The AI comprises three main neural networks. The first models the Minecraft world, building an internal “understanding” of its physics and how actions work. The second is basically a parent that judges the outcome of the AI’s actions. Was that really the right move? The last network then decides the best next step to collect a diamond.
All three components were simultaneously trained using data from the AI’s previous tries—a bit like a gamer playing again and again as they aim for the perfect run.
World modeling is the key to Dreamer’s success, Hafner told Nature. This component mimics the way human players see the game and allows the AI to predict how its actions could change the future—and whether that future comes with a reward.
“The world model really equips the AI system with the ability to imagine the future,” said Hafner.
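As a rough illustration of that idea, here is a simplified sketch under my own assumptions, not DeepMind’s actual DreamerV3 code: a learned world model “imagines” short futures, a critic scores them, and an actor picks the action whose imagined future looks most rewarding. Each component below is a stand-in with random weights so the control flow is easy to follow; in the real system all three are neural networks trained jointly on replayed gameplay.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HORIZON = 8, 4, 15   # illustrative sizes, not the paper's

# World model: predicts the next latent state and the reward for a given action.
W_dynamics = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM, STATE_DIM))
w_reward = rng.normal(scale=0.1, size=STATE_DIM)

def world_model(state, action):
    next_state = np.tanh(W_dynamics[action] @ state)   # imagined next state
    reward = float(w_reward @ next_state)               # imagined reward
    return next_state, reward

# Critic: estimates how much future reward a state is ultimately worth.
w_value = rng.normal(scale=0.1, size=STATE_DIM)
def critic(state):
    return float(w_value @ state)

# Actor: here, a brute-force stand-in that imagines one short rollout per candidate action.
def actor(state, gamma=0.99):
    best_action, best_score = None, -np.inf
    for action in range(N_ACTIONS):
        s, score, discount = state, 0.0, 1.0
        # Roll the world model forward without touching the real game.
        for step in range(HORIZON):
            chosen = action if step == 0 else int(rng.integers(N_ACTIONS))
            s, r = world_model(s, chosen)
            score += discount * r
            discount *= gamma
        score += discount * critic(s)   # the critic summarizes rewards beyond the horizon
        if score > best_score:
            best_action, best_score = action, score
    return best_action

state = rng.normal(size=STATE_DIM)
print("Imagined best action:", actor(state))
```

In DreamerV3, the actor is itself a learned network optimized through these imagined rollouts rather than a brute-force search, but the division of labor is the same: the world model dreams up futures the agent never has to live through, and the critic tells it which dreams are worth chasing.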
To evaluate Dreamer, the team pitted it against several state-of-the-art single-purpose algorithms on over 150 tasks. Some tested the AI’s ability to make decisions over longer horizons. Others gave either constant or sparse feedback to see how the programs fared in 2D and 3D worlds.
“Dreamer matches or exceeds the best [AI] experts,” wrote the team.
They then turned to a far harder task: collecting diamonds, which requires a dozen steps. Intermediate rewards helped Dreamer pick the next move with the largest chance of success. As an extra challenge, the team reset the game every half hour to ensure the AI didn’t form and remember a specific strategy.
Dreamer collected a diamond after roughly nine days of continuous gameplay. That’s far slower than expert human players, who need just 20 minutes or so. However, the AI wasn’t specifically trained on the task. It taught itself how to mine one of the game’s most coveted items.
The AI “paves the way for future research directions, including teaching agents world knowledge from internet videos and learning a single world model” so they can increasingly accumulate a general understanding of our world, wrote the team.
“Dreamer marks a significant step towards general AI systems,” said Hafner.
* This article was originally published at Singularity Hub