Facebook wants to build artificial intelligence that learns to understand the world like humans—by watching our every move.

The tech giant has announced plans to teach AI to ‘understand and interact with the world like we do’ in the first person. It hopes to do this by using video and audio from augmented reality (AR) glasses like its new high-tech Ray-Bans.

“AI typically learns from photos and videos captured in third-person, but next-generation AI will need to learn from videos that show the world from the center of the action,” the company said.

It went on: “AI that understands the world from this point of view could unlock a new era of immersive experiences.”

For the Ego4D project, Facebook gathered 2,200 hours of first-person video from 700 people going about their daily lives in order to begin training its AI assistants. It says it wants to teach AI to:

  • remember things, so we can ask it ‘what happened when’
  • predict human actions and try to anticipate our needs
  • understand how we use our hands to manipulate objects, in order to learn new skills
  • keep a video ‘diary’ of everyday life and recall specific moments
  • learn and understand social interaction

No AI system can perform these tasks yet, but they could play a central role in Facebook’s plans to build the ‘metaverse’: a digital 3D overlay of reality created using VR and AR.

Facebook’s new Ray-Ban smart glasses are fitted with tiny cameras that can film our every move (Image: Evan Blass/Ray-Ban)