“Our sneak peek into Google’s new robotics model, RT-2,
which melds artificial intelligence technology with robots.
A one-armed robot stood in front of a table. On the table
sat three plastic figurines: a lion, a whale and a dinosaur.
An engineer gave the robot an instruction: “Pick up the
extinct animal.”
The robot whirred for a moment, then its arm extended and
its claw opened and descended. It grabbed the dinosaur.
Until very recently, this demonstration, which I witnessed
during a podcast interview at Google’s robotics division in Mountain View,
Calif., last week, would have been impossible. Robots weren’t able to reliably
manipulate objects they had never seen before, and they certainly weren’t
capable of making the logical leap from “extinct animal” to “plastic dinosaur.”
But a quiet revolution is underway in robotics, one that
piggybacks on recent advances in so-called large language models — the same
type of artificial intelligence system that powers ChatGPT, Bard and other
chatbots.
Google has recently begun plugging state-of-the-art language
models into its robots, giving them the equivalent of artificial brains. The
secretive project has made the robots far smarter and given them new powers of
understanding and problem-solving.
I got a glimpse of that progress during a private
demonstration of Google’s latest robotics model, called RT-2. The model, which
is being unveiled on Friday, amounts to a first step toward what Google executives
described as a major leap in the way robots are built and programmed.
“We’ve had to reconsider our entire research program as a
result of this change,” said Vincent Vanhoucke, Google DeepMind’s head of
robotics. “A lot of the things that we were working on before have been
entirely invalidated.”
Robots still fall short of human-level dexterity and fail at
some basic tasks, but Google’s use of A.I. language models to give robots new
skills of reasoning and improvisation represents a promising breakthrough, said
Ken Goldberg, a robotics professor at the University of California, Berkeley.
“What’s very impressive is how it links semantics with
robots,” he said. “That’s very exciting for robotics.”
To understand the magnitude of this shift, it helps to know a
little about how robots have conventionally been built.
For years, the way engineers at Google and other companies
trained robots to do a mechanical task — flipping a burger, for example — was
by programming them with a specific list of instructions. (Lower the spatula
6.5 inches, slide it forward until it encounters resistance, raise it 4.2
inches, rotate it 180 degrees, and so on.) Robots would then practice the task
again and again, with engineers tweaking the instructions each time until they
got it right.
This approach worked for certain, limited uses. But training
robots this way is slow and labor-intensive. It requires collecting lots of
data from real-world tests. And if you wanted to teach a robot to do something
new — to flip a pancake instead of a burger, say — you usually had to reprogram
it from scratch.
Partly because of these limitations, hardware robots have
improved less quickly than their software-based siblings. OpenAI, the maker of
ChatGPT, disbanded its robotics team in 2021, citing slow progress and a lack
of high-quality training data. In 2017, Google’s parent company, Alphabet, sold
Boston Dynamics, a robotics company it had acquired, to the Japanese tech
conglomerate SoftBank. (Boston Dynamics is now owned by Hyundai and seems to
exist mainly to produce viral videos of humanoid robots performing terrifying
feats of agility.)
In recent years, researchers at Google had an idea. What if,
instead of being programmed for specific tasks one by one, robots could use an
A.I. language model — one that had been trained on vast swaths of internet text
— to learn new skills for themselves?
“We started playing with these language models around two
years ago, and then we realized that they have a lot of knowledge in them,”
said Karol Hausman, a Google research scientist. “So we started connecting them
to robots.”
Google’s first attempt to join language models and physical
robots was a research project called PaLM-SayCan, which was revealed last year.
It drew some attention, but its usefulness was limited. The robots lacked the
ability to interpret images — a crucial skill, if you want them to be able to
navigate the world. They could write out step-by-step instructions for
different tasks, but they couldn’t turn those steps into actions.
Google’s new robotics model, RT-2, can do just that. It’s
what the company calls a “vision-language-action” model, or an A.I. system that
has the ability not just to see and analyze the world around it, but to tell a
robot how to move.
It does so by translating the robot’s movements into a
series of numbers — a process called tokenizing — and incorporating those tokens
into the same training data as the language model. Eventually, just as ChatGPT
or Bard learns to guess what words should come next in a poem or a history
essay, RT-2 can learn to guess how a robot’s arm should move to pick up a ball
or throw an empty soda can into the recycling bin.
“In other words, this model can learn to speak robot,” Mr.
Hausman said.
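
The tokenizing step described above can be pictured with a rough sketch. The article does not say how RT-2 encodes movements, so everything here is an illustrative assumption: the function names, the 256-bin resolution and the -1 to 1 range for each movement value are hypothetical, chosen only to show how a continuous arm movement could become a short list of whole numbers that a language model can treat like words.

```python
from typing import List, Sequence

# Assumed resolution per movement dimension; the article gives no figure.
NUM_BINS = 256


def tokenize_action(action: Sequence[float], low: float = -1.0,
                    high: float = 1.0, num_bins: int = NUM_BINS) -> List[int]:
    """Turn each continuous movement value into an integer token (0..num_bins-1)."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep the value inside the range
        fraction = (clipped - low) / (high - low)  # rescale to 0..1
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def detokenize_action(tokens: Sequence[int], low: float = -1.0,
                      high: float = 1.0, num_bins: int = NUM_BINS) -> List[float]:
    """Turn tokens back into movement values, using the center of each bin."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]


if __name__ == "__main__":
    # Hypothetical arm command: small moves along x, y, z plus a gripper value.
    command = [0.3, -0.7, 0.0, 1.0]
    tokens = tokenize_action(command)
    print(tokens)
    print(detokenize_action(tokens))
```

Because each value is rounded into a bin, the round trip loses a little precision. In this sketch the error is at most half a bin width, which is the trade-off for letting a model predict movements one token at a time, the same way it predicts words.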
In an hourlong demonstration, which took place in a Google
office kitchen littered with objects from a dollar store, my podcast co-host
and I saw RT-2 perform a number of impressive tasks. One was successfully
following complex instructions like “move the Volkswagen to the German flag,”
which RT-2 did by finding and snagging a model VW Bus and setting it down on a
miniature German flag several feet away.
It also proved capable of following instructions in
languages other than English, and even making abstract connections between
related concepts. Once, when I wanted RT-2 to pick up a soccer ball, I
instructed it to “pick up Lionel Messi.” RT-2 got it right on the first try.
The robot wasn’t perfect. It incorrectly identified the
flavor of a can of LaCroix placed on the table in front of it. (The can was
lemon; RT-2 guessed orange.) Another time, when it was asked what kind of fruit
was on a table, the robot simply answered “white.” (It was a banana.) A Google
spokeswoman said the robot had used a cached answer to a previous tester’s
question because its Wi-Fi had briefly gone out.
Google has no immediate plans to sell RT-2 robots or release
them more widely, but its researchers believe these new language-equipped
machines will eventually be useful for more than just parlor tricks. Robots
with built-in language models could be put into warehouses, used in medicine or
even deployed as household assistants — folding laundry, unloading the
dishwasher, picking up around the house, they said.
“This really opens up using robots in environments where
people are,” Mr. Vanhoucke said. “In office environments, in home environments,
in all the places where there are a lot of physical tasks that need to be
done.”
Of course, moving objects around in the messy, chaotic
physical world is harder than doing it in a controlled lab. And given that A.I.
language models frequently make mistakes or invent nonsensical answers — which
researchers call hallucination or confabulation — using them as the brains of
robots could introduce new risks.
But Mr. Goldberg, the Berkeley robotics professor, said
those risks were still remote.
“We’re not talking about letting these things run loose,” he
said. “In these lab environments, they’re just trying to push some objects
around on a table.”
Google, for its part, said RT-2 was equipped with plenty of
safety features. In addition to a big red button on the back of every robot —
which stops the robot in its tracks when pressed — the system uses sensors to
avoid bumping into people or objects.
The A.I. software built into RT-2 has its own safeguards,
which it can use to prevent the robot from doing anything harmful. One benign
example: Google’s robots can be trained not to pick up containers with water in
them, because water can damage their hardware if it spills.
If you’re the kind of person who worries about A.I. going
rogue — and Hollywood has given us plenty of reasons to fear that scenario,
from the original “Terminator” to last year’s “M3gan” — the idea of making
robots that can reason, plan and improvise on the fly probably strikes you as a
terrible idea.
But at Google, it’s the kind of idea researchers are
celebrating. After years in the wilderness, hardware robots are back — and they
have their chatbot brains to thank.”