Facebook spends a lot of time and money on augmented reality, including building its own AR glasses with Ray-Ban. Right now, these gadgets can only capture and share pictures, but what does the company think such devices will be used for in the future?
A new research project led by Facebook’s AI team suggests the scale of the company’s ambitions. It imagines AI systems that constantly analyze people’s lives using first-person video, recording what they see, do, and hear in order to help them with everyday tasks. Facebook’s researchers have outlined a series of skills they want these systems to develop, including “episodic memory” (answering questions like “where did I leave my keys?”) and “audio-visual diarization” (remembering who said what when).
“There’s the possibility that we could be leveraging this kind of research down the line”
Right now, the tasks described above cannot be performed reliably by any AI system, and Facebook stresses that this is a research project rather than a commercial product. However, it is clear that the company sees these sorts of capabilities as the future of AR computing. “Certainly, thinking about augmented reality and what we’d like to be able to do with it, there’s the possibility that we could be leveraging this kind of research,” Facebook AI research scientist Kristen Grauman told The Verge.
Such ambitions have huge privacy implications. Privacy experts are already worried about how Facebook’s AR glasses allow wearers to covertly record members of the public. Those concerns will only be exacerbated if future versions of the hardware not only record footage but also analyze and transcribe it, turning wearers into walking surveillance machines.
Facebook’s first pair of commercial AR glasses can only record and share videos and images, not analyze them. Photo by Amanda Lopez for The Verge
Facebook’s research project is named Ego4D, which refers to the analysis of first-person, or “egocentric,” video. It consists of two main components: an open dataset of egocentric video and a series of benchmarks that Facebook thinks AI systems should be able to tackle in the future.
Facebook helped collect 3,205 hours of first-person footage from around the world
The dataset is the biggest of its kind ever created, and Facebook partnered with 13 universities around the world to collect it. In total, some 3,205 hours of footage were recorded by 855 participants living in nine different countries. The universities, rather than Facebook, were responsible for collecting the data. Participants, some of whom were paid, wore GoPro cameras and AR glasses to record video of unscripted activity, ranging from construction work and baking to playing with pets and socializing with friends. All footage was de-identified by the universities, which included blurring the faces of bystanders and removing any personally identifiable information.
Grauman says the dataset is “the first of its kind in both scale and diversity.” The nearest comparable project, she says, contains 100 hours of first-person footage shot entirely in kitchens. “We’ve opened the eyes of these AI systems to more than just kitchens in the UK and Sicily, but [to footage from] Saudi Arabia, Tokyo, Los Angeles, and Colombia.”
The second component of Ego4D is a series of benchmarks, or tasks, that Facebook wants researchers around the world to try to solve using AI systems trained on its dataset. The company describes them as:
Episodic memory: What happened when? (e.g., “Where did I leave my keys?”)
Forecasting: What am I likely to do next? (e.g., “Wait, you’ve already added salt to this recipe”)
Hand and object manipulation: What am I doing? (e.g., “Teach me how to play the drums”)
Audio-visual diarization: Who said what when? (e.g., “What was the main topic during class?”)
Social interaction: Who is interacting with whom? (e.g., “Help me better hear the person talking to me at this noisy restaurant”)
Right now, AI systems would find tackling any of these problems incredibly difficult, but creating datasets and benchmarks is a proven method of spurring development in AI.
Indeed, the creation of one particular dataset and an associated annual competition, known as ImageNet, is often credited with kickstarting the recent AI boom. The ImageNet dataset consists of pictures of a huge variety of objects which researchers trained AI systems to identify. In 2012, the winning entry in the competition used a particular method of deep learning to blow past rivals, ushering in the current era of research.
Facebook’s Ego4D dataset is intended to help spur research into AI systems that can analyze first-person data. Image: Facebook
Facebook hopes its Ego4D project will have similar effects for the world of augmented reality. The company says systems trained on Ego4D might one day be used not only in wearable cameras but also in home assistant robots, which likewise rely on first-person cameras to navigate the world around them.
“The project has the chance to really catalyze work in this field in a way that hasn’t really been possible yet,” says Grauman. “To move our field from the ability to analyze piles of photos and videos taken by humans for a very special purpose, to this fluid, ongoing first-person visual stream that AR systems, robots, need to understand in the context of ongoing activity.”
Facebook’s development of AI surveillance systems will be of great concern
Although the tasks Facebook describes certainly seem practical, the company’s interest in this area will worry many. Facebook’s record on privacy is abysmal, spanning data leaks and a $5 billion fine from the FTC. It has also been shown repeatedly that the company values growth and engagement over users’ well-being in many domains. With this in mind, it is worrying that the benchmarks in this Ego4D project do not include prominent privacy safeguards. For example, the “audio-visual diarization” task (transcribing what different people say) never mentions removing data about people who do not want to be recorded.
Asked about these issues, a spokesperson for Facebook told The Verge that it expected privacy safeguards to be introduced further down the line. “We expect that to the extent companies use this dataset and benchmark to develop commercial applications, they will develop safeguards for such applications,” said the spokesperson. “For example, before AR glasses can enhance someone’s voice, there could be a protocol in place that they follow to ask someone else’s glasses for permission, or they could limit the range of the device so it can only pick up sounds from people already having a conversation with the wearer or who are in the wearer’s immediate vicinity.”
For now, such safeguards are only hypothetical.