New Development Will Give AI a Window into Complex Human Activity – Technology Org

More than 1,400 hours of footage of humans performing tasks, filmed simultaneously from their own point of view and from external cameras, will help give AI models an understanding of how humans carry out activities.


Image showing participant with bike. Image credit: University of Bristol

Building on the work that two years ago led to the release of Egocentric 4D Live Perception, the world’s most diverse egocentric dataset, the Ego4D consortium has drastically expanded the reach and ambition of their research with the newly published Ego-Exo4D – a foundational dataset to support research on video learning and multimodal perception.

A University of Bristol research team led by Professor Dima Damen at the School of Computer Science is part of an international consortium of 13 universities in partnership with Meta that is driving research in computer vision through collecting joint egocentric and exocentric datasets of human skilled activities.

The result of a two-year effort by Meta’s FAIR (Fundamental Artificial Intelligence Research), Project Aria, and the Ego4D consortium of 13 university partners, Ego-Exo4D is a first-of-its-kind large-scale multimodal multiview dataset and benchmark suite.

Its defining feature is its simultaneous capture of both first-person ‘egocentric’ views from a participant’s wearable camera and multiple ‘exocentric’ views from cameras surrounding the participant.

Together, these two perspectives will give AI models a new window into complex skilled human activity, allowing them to learn how skilled participants perform tasks such as dancing and playing music, as well as procedures such as maintaining a bicycle.
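To make that pairing concrete, here is a minimal Python sketch of how a single joint capture might be represented: one egocentric stream alongside several synchronized exocentric streams. All names here (VideoStream, EgoExoCapture, take_id, the file paths) are illustrative assumptions, not the actual Ego-Exo4D schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class VideoStream:
    """One time-synchronized camera stream within a capture."""
    camera_id: str
    view: str          # "ego" (wearable camera) or "exo" (stationary camera)
    path: str          # location of the video file
    fps: float = 30.0

@dataclass
class EgoExoCapture:
    """A single recorded activity ('take'): one ego stream plus several exo streams.

    Illustrative only: field names and file layout are assumptions,
    not the actual Ego-Exo4D schema.
    """
    take_id: str
    activity: str      # e.g. "bike repair", "dance"
    ego: VideoStream
    exo: list[VideoStream] = field(default_factory=list)

    def all_streams(self) -> list[VideoStream]:
        """Every view of the take, ego first."""
        return [self.ego, *self.exo]

# A bike-maintenance take filmed from one wearable camera and
# three surrounding cameras, all sharing a common clock.
take = EgoExoCapture(
    take_id="take_0001",
    activity="bike repair",
    ego=VideoStream("aria_01", "ego", "takes/take_0001/ego.mp4"),
    exo=[
        VideoStream(f"cam_{i:02d}", "exo", f"takes/take_0001/exo_{i:02d}.mp4")
        for i in range(3)
    ],
)
print(len(take.all_streams()), "synchronized views")
```

The assumption doing the real work here is the shared clock: because every stream in a take is captured simultaneously, a moment in the egocentric view can be looked up directly in each exocentric view.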

Reflecting on the work of the consortium and her team’s contributions, Professor Damen remarked: “We are thrilled to be part of this international consortium releasing the Ego-Exo4D dataset. Today marks the outcome of a two-year research collaboration that continues to push the egocentric research community to new ground.”

Co-leading the project at Bristol, Dr Michael Wray is particularly interested in the interplay between skilled activities and descriptive language. “At Bristol, we proposed the Act-and-Narrate recordings, that is, capturing the person’s internal state: why they perform tasks in a particular manner.

“The Ego-Exo4D project also innovates by providing Expert Commentary narrations, in which domain experts watch the videos and provide rich feedback on the performance.

“Beyond the multiple views in vision, we also have multiple views in language: the ‘ego’ language of the participant and the ‘exo’ language of an expert observer, offering rich new insights into the very important research topic of how large language models interplay with assistive technology.”
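As an illustration of how such paired language might be handled, the sketch below aligns a participant’s ‘ego’ narration with an expert’s ‘exo’ commentary by time overlap. The Narration class, its fields, and the example sentences are all hypothetical; the real dataset defines its own annotation format.

```python
from dataclasses import dataclass

@dataclass
class Narration:
    """A timestamped piece of language attached to a take (hypothetical schema)."""
    t_start: float   # seconds from the start of the take
    t_end: float
    speaker: str     # "ego" (the participant) or "exo" (an expert commentator)
    text: str

def exo_commentary_for(narrations: list[Narration], t0: float, t1: float) -> list[Narration]:
    """Return all expert ('exo') commentary overlapping the window [t0, t1]."""
    return [
        n for n in narrations
        if n.speaker == "exo" and n.t_start < t1 and n.t_end > t0
    ]

# The participant narrates an action; the expert's commentary on the
# same moment is retrieved by time overlap. Text is invented for illustration.
narrations = [
    Narration(12.0, 15.5, "ego", "I'm loosening the brake cable first."),
    Narration(11.0, 18.0, "exo", "Good sequencing: releasing cable tension first avoids bending the caliper arm."),
]
for n in exo_commentary_for(narrations, 12.0, 15.5):
    print(n.text)
```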

The Ego4D consortium is a long-running collaboration between FAIR and more than a dozen universities around the world. The consortium members and FAIR researchers collaborated on all aspects of the project, from developing the dataset scope, to collecting the data, to formulating the benchmark tasks.

This project also marks the largest ever deployment of the Aria glasses in the academic research community, with partners at 12 different sites using them.

Professor Damen’s group is a leading research group in egocentric vision internationally, and their expertise has been instrumental in the consortium’s work since its inception.

“Starting from EPIC-KITCHENS in 2018 and continuing through the massive-scale Ego4D, this new addition, Ego-Exo4D, continues to place the University of Bristol as a key leader in egocentric vision internationally and the only UK research group in this key futuristic area,” Professor Damen commented.

In addition to the captured footage, annotations for novel benchmark tasks and baseline models for ego-exo understanding are being made available to researchers. The dataset will be publicly available in December of this year to researchers who sign Ego4D’s data use agreement.
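For a rough sense of what working with the release might look like once the data is obtained, here is a hedged Python sketch that reads per-take annotation records from a local JSON file. The file path and schema are assumptions for illustration only; the actual release ships its own formats and tooling.

```python
import json
from pathlib import Path

# Hypothetical local layout; the real release defines its own schema and
# tooling, and the data itself requires signing Ego4D's data use agreement.
ANNOTATION_FILE = Path("ego_exo4d/annotations/takes.json")

def load_takes(path: Path) -> list:
    """Load per-take annotation records from a local JSON file."""
    with path.open() as f:
        return json.load(f)

if ANNOTATION_FILE.exists():
    takes = load_takes(ANNOTATION_FILE)
    print(f"{len(takes)} annotated takes")
else:
    print("Dataset not found locally; download requires the data use agreement.")
```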

The data was collected following rigorous privacy and ethics standards, including formal review processes at each institution to establish the standards for collection, management, and informed consent, as well as a license agreement prescribing proper use of the data.

With this release, the Ego4D consortium aims to provide the tools the broader research community needs to explore ego-exo video, multimodal activity recognition, and beyond. 

Source: University of Bristol