Training an artificial intelligence agent to do something like navigate a complex 3D world is computationally expensive and time-consuming. To build these potentially useful systems more efficiently, Facebook engineers have achieved enormous efficiency gains by, essentially, leaving the slowest of the pack behind.
It is part of the company's new focus on “embodied AI,” i.e., machine learning systems that interact intelligently with their surroundings. That could mean lots of things: responding to a voice command using conversational context, for example, but also more subtle things, such as a robot that knows it has wandered into the wrong room of a house. Exactly why Facebook is so interested in this I'll leave to your own speculation, but the fact is that it has recruited and funded serious researchers to investigate this and related domains of AI work.
To create such “embodied” systems, you need to train them using a reasonable facsimile of the real world. You can't expect an AI that has never seen an actual hallway to know what walls and doors are. And given how slowly real robots move in real life, you can't expect them to learn their lessons there. That is what led Facebook to create Habitat, a set of simulated real-world environments so photorealistic that what an AI learns by navigating them can also be applied to the real world.
Such simulators, which are common in robotics and AI training, are also useful because you can run many instances of them simultaneously; for simple environments, that can mean thousands at a time, each with an agent trying to solve a problem and eventually reporting its findings back to the central system that dispatched it.
Unfortunately, photorealistic 3D environments demand a lot of computation compared with simpler virtual ones, meaning researchers are limited to a handful of simultaneous instances, which slows learning to a comparative crawl.
The Facebook researchers, led by Dhruv Batra and Erik Wijmans, the former a professor and the latter a PhD student at Georgia Tech, have found a way to speed up this process by an order of magnitude or more. The result is an AI system that can navigate a 3D environment from a starting point to a goal with a 99.9% success rate and few mistakes.
Simple navigation is foundational to a working “embodied AI” or robot, which is why the team chose to pursue it without adding any further difficulties.
“It's the first task. Forget question answering, forget context: can you just get from point A to point B? When the agent has a map, this is easy, but with no map it's an open problem,” says Batra. “Failing at navigation means whatever stack is built on top of it is going to come tumbling down.”
The problem, they found, was that the training systems were spending too much time waiting on slowpokes. Perhaps it's unfair to call them that: these are AI agents that, for whatever reason, simply can't finish their tasks as quickly as the others.
“It's not necessarily that they learn slowly,” Wijmans explains. “But if you're simulating navigation in a one-room apartment, that's much easier to do than in a 10-bedroom townhouse.”
The central system is designed to wait for all deployed agents to complete their virtual tasks and report back. If a single agent takes 10 times longer than the rest, a huge amount of time is wasted while the system waits on it before it can update its information and send out a new batch.
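To put a rough number on that waste, here is a small Python sketch, with made-up timings purely for illustration, that models one fully synchronous round: the round lasts as long as the slowest worker, and every other worker sits idle for the difference. The function name and inputs are hypothetical and are not part of Habitat or the team's code.

```python
def synchronous_round(worker_times):
    """Return (round_time, idle_fraction) for one fully synchronous round.

    worker_times: seconds each worker needs to finish its rollout
    (hypothetical values for illustration).
    """
    round_time = max(worker_times)  # everyone waits for the slowest worker
    busy = sum(worker_times)        # total time spent actually working
    capacity = round_time * len(worker_times)
    idle_fraction = 1 - busy / capacity
    return round_time, idle_fraction


# Seven workers in one-room apartments, one in a sprawling townhouse.
round_time, idle = synchronous_round([1.0] * 7 + [10.0])
print(round_time, round(idle, 2))  # 10.0 0.79
```

In this toy case, roughly 79% of the available worker time is spent idling, all because of a single straggler.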
The Facebook team's innovation is to intelligently cut off these unfortunate laggards before they've finished. After a certain amount of time in the simulation, they're done, and whatever data they've collected is added to the pool.
“You have all these workers running, and they're all doing their thing, and they all talk to each other,” said Wijmans. “One will tell the others, ‘okay, I'm almost done,’ and they'll all report in on their progress. Any that see they're lagging behind the rest will reduce the amount of work they do before the big synchronization happens.”
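A minimal sketch of the cutoff idea, under stated assumptions: each worker tries to collect a fixed number of simulation steps per round, and once a chosen fraction of workers (the hypothetical `threshold` parameter below) has finished, the rest stop and keep whatever steps they have. Timings are in integer milliseconds to keep the arithmetic exact; the names and numbers are illustrative and not taken from the team's implementation.

```python
import math


def preempted_round(step_ms, rollout_len, threshold):
    """Simulate one collection round in which stragglers are cut off early.

    step_ms: milliseconds per simulation step for each worker (hypothetical).
    rollout_len: steps each worker tries to collect per round.
    threshold: fraction of workers that must finish before the rest are
        preempted (an assumed parameter, chosen for illustration).
    Returns (round_ms, steps_per_worker, sync_round_ms).
    """
    finish = sorted(t * rollout_len for t in step_ms)
    sync_round_ms = finish[-1]  # fully synchronous: wait for the slowest
    # The k-th finisher triggers preemption of everyone still running.
    k = max(1, math.ceil(threshold * len(step_ms)))
    round_ms = finish[k - 1]
    # Each worker keeps every step it completed before the cutoff.
    steps = [min(rollout_len, round_ms // t) for t in step_ms]
    return round_ms, steps, sync_round_ms


# Six workers in small apartments, two in larger, slower houses.
round_ms, steps, sync_ms = preempted_round([10] * 6 + [20, 100], 128, 0.6)
print(round_ms, sync_ms)  # 1280 12800
print(steps)              # [128, 128, 128, 128, 128, 128, 64, 12]
```

Here the cutoff makes the round ten times shorter than waiting for everyone, while the two slow workers still contribute 76 steps between them instead of being discarded.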
If a machine learning agent could feel bad, I'm sure it would when cut off like this, and indeed that agent does get “punished” by the system, in that it doesn't get as much virtual “reinforcement” as the others. The anthropomorphic terms make this sound more human than it is: essentially, inefficient algorithms, or ones placed in difficult circumstances, get downgraded in importance. But their contributions are still valuable.
“We use all the experience that workers gain, regardless of whether it is a success or a failure – we are still learning from it,” Wijmans explains.
This means there are no wasted cycles in which some workers sit waiting for others to finish. Bringing back more experience on the task at hand in a timely fashion means the next batch of slightly better workers goes out much sooner, a self-reinforcing cycle that produces serious gains.
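One plausible way to give a cut-off worker “less reinforcement” without throwing its data away is to weight each worker's gradient by how many steps it collected before averaging. The Python sketch below assumes exactly that; whether DD-PPO combines contributions this way is an assumption here, and in a real multi-machine system the sums would be computed with an all-reduce across workers rather than in a single process.

```python
def weighted_average(grads, steps):
    """Average per-worker gradient vectors, weighting each by its step count.

    grads: one gradient vector (list of floats) per worker.
    steps: number of environment steps each worker collected this round.
    """
    total_steps = sum(steps)
    dim = len(grads[0])
    return [
        sum(g[i] * n for g, n in zip(grads, steps)) / total_steps
        for i in range(dim)
    ]


# A full worker (3 steps) and a preempted one (1 step), toy 2-D gradients.
print(weighted_average([[1.0, 0.0], [0.0, 1.0]], [3, 1]))  # [0.75, 0.25]
```

The preempted worker's gradient still moves the result, just with a quarter of the weight in this toy example.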
In their experiments, the researchers found that the system, with the catchy name Decentralized Distributed Proximal Policy Optimization (DD-PPO), appeared to scale almost ideally, with performance increasing nearly linearly with the computing power dedicated to the task. That is, increasing the computing power 10x yielded nearly 10x the results. Standard algorithms, by contrast, scaled very poorly: 10x or 100x the computing power yielded only a small increase in results, because of the way these sophisticated simulators hamstring themselves.
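The “almost ideal” scaling claim can be made concrete as scaling efficiency: the measured speedup divided by the increase in compute, where 1.0 means perfectly linear. The throughput figures in this Python example are invented solely to show the arithmetic and are not results from the paper.

```python
def scaling_efficiency(base_throughput, base_gpus, throughput, gpus):
    """Speedup divided by the growth in compute; 1.0 means perfectly linear."""
    speedup = throughput / base_throughput
    compute_growth = gpus / base_gpus
    return speedup / compute_growth


# Hypothetical numbers: 10x the GPUs giving 9.5x vs. 2x the throughput.
print(scaling_efficiency(1000, 1, 9500, 10))  # 0.95
print(scaling_efficiency(1000, 1, 2000, 10))  # 0.2
```

A near-linear system stays close to 1.0 as compute grows, while a system bottlenecked by its slowest simulators drifts toward zero.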
These efficient methods let the Facebook researchers produce agents that can solve a point-to-point navigation task in a virtual environment within their allotted time with 99.9% reliability. The agents even proved robust to mistakes, quickly recognizing when they had taken a wrong turn and heading back the other way.
The researchers speculated that the agents had learned to “exploit structural regularities,” a phrase that in some circumstances means the AI has figured out how to cheat. But Wijmans clarified that it's more likely the environments they used simply follow a number of real-world layout rules.
“These are real houses that we digitized, so they're learning things about how Western-style houses tend to be laid out,” he said. Just as you wouldn't expect a kitchen to open directly into a bedroom, the AI has learned to recognize other patterns and make other “assumptions.”
The next goal is to find a way for these agents to accomplish their tasks with fewer resources. Each agent navigated with a virtual camera that provided ordinary and depth images, but also with an infallible coordinate system that tracked where it had traveled and a compass that always pointed toward the goal. If only it were always that easy! Yet before this experiment, even with those resources, the success rate was considerably lower, even with far more training time.
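As a rough illustration of what the coordinate system and compass give an agent, this Python sketch converts a 2D agent position, heading, and goal position into a distance and a relative bearing to the goal. The function and its conventions (heading in radians counterclockwise from the +x axis, positive bearing meaning “goal to the left”) are assumptions for illustration, not Habitat's actual sensor API.

```python
import math


def goal_in_agent_frame(agent_xy, agent_heading, goal_xy):
    """Return (distance, bearing) from the agent to the goal.

    agent_heading: radians, counterclockwise from the +x axis.
    bearing: radians wrapped to [-pi, pi); positive means the goal is to
        the agent's left, negative means to its right.
    """
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - agent_heading
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi  # wrap angle
    return distance, bearing


# Agent at the origin facing +y; goal 1 m along +x, i.e. to its right.
dist, bearing = goal_in_agent_frame((0.0, 0.0), math.pi / 2, (1.0, 0.0))
print(dist, round(math.degrees(bearing)))  # 1.0 -90
```

With a perfect reading like this at every step, the agent always knows how far away the goal is and which way to turn, which is exactly the crutch the team hopes to remove.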
Habitat itself is also getting a fresh coat of paint, with some added interactivity and customizability.
“Before these improvements, Habitat was a static universe,” Wijmans explains. “The agent can move and bump into walls, but it can't open a drawer or knock over a table. We built it this way because we wanted fast, large-scale simulation, but if you want to solve tasks like ‘go pick up my laptop from my desk,’ you'd better be able to actually pick up that laptop.”
That is why Habitat now lets users add objects to rooms, apply forces to those objects, check for collisions, and so on. After all, there's more to real life than gliding disembodied around a frictionless 3D construct.
The improvements should make Habitat a more robust platform for experimentation, and should also make it possible for agents trained in it to transfer their learning directly to the real world, something the team has already begun working on and plans to publish a paper about.