Researchers from Intel Labs, in collaboration with academic and industry experts, have introduced a technique for generating realistic and directable human motion from sparse, multi-modal inputs. Their work, presented at the European Conference on Computer Vision (ECCV 2024), addresses the challenge of generating natural, physically based behaviors for high-dimensional simulated humanoid characters. This research is part of Intel Labs’ broader initiative to advance computer vision and machine learning.
Intel Labs and its partners recently presented six cutting-edge papers at ECCV 2024, a premier conference organized by the European Computer Vision Association (ECVA).
The six papers showcased innovations including a novel defense strategy for protecting text-to-image models from prompt-based red teaming attacks and a large-scale dataset designed to improve spatial consistency in these models. Among these contributions, the paper Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs reflects Intel’s dedication to advancing generative modeling while prioritizing responsible AI practices.
Generating Realistic Human Motions Using Multi-Modal Inputs
Intel’s Masked Humanoid Controller (MHC) is a system designed to generate human-like motion for characters in physics-based simulation. Unlike traditional methods that rely heavily on complete, fully detailed motion capture data, the MHC is built to handle sparse, incomplete, or partial input from a variety of sources. These can include VR controllers, which may track only the head and hands; joystick inputs, which provide only high-level navigation commands; video tracking, in which some body parts may be occluded; and even abstract instructions derived from text prompts.
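To make the idea of sparse inputs concrete, here is a minimal Python sketch of one way a partial directive could be encoded: a full-body target plus a per-joint mask recording which joints the input source actually specifies. The six-joint skeleton, the `MaskedTarget` class, and the `from_vr` and `from_joystick` helpers are all hypothetical and chosen for illustration; this is not MHC's actual input format, only an instance of the general masking idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified skeleton; a real humanoid has many more joints.
JOINTS = ("root", "head", "left_hand", "right_hand", "left_foot", "right_foot")

Vec3 = Tuple[float, float, float]


@dataclass
class MaskedTarget:
    """A full-body target in which only some joints are specified.

    positions[j] holds a target position for joint j; mask[j] is True only
    when the input source actually provided that joint.
    """
    positions: Dict[str, Vec3] = field(
        default_factory=lambda: {j: (0.0, 0.0, 0.0) for j in JOINTS})
    mask: Dict[str, bool] = field(
        default_factory=lambda: {j: False for j in JOINTS})

    def specify(self, joint: str, pos: Vec3) -> None:
        """Set a target for one joint and mark it as observed."""
        if joint not in self.positions:
            raise KeyError(f"unknown joint: {joint}")
        self.positions[joint] = pos
        self.mask[joint] = True

    def specified(self) -> List[str]:
        """Joints constrained by the input, in skeleton order."""
        return [j for j in JOINTS if self.mask[j]]


def from_vr(head: Vec3, left_hand: Vec3, right_hand: Vec3) -> MaskedTarget:
    # A headset plus two hand controllers constrains only the head and hands.
    target = MaskedTarget()
    target.specify("head", head)
    target.specify("left_hand", left_hand)
    target.specify("right_hand", right_hand)
    return target


def from_joystick(direction: Tuple[float, float], speed: float) -> MaskedTarget:
    # A joystick supplies only a heading; encode it as a planar root displacement.
    dx, dz = direction
    target = MaskedTarget()
    target.specify("root", (dx * speed, 0.0, dz * speed))
    return target


vr = from_vr((0.0, 1.7, 0.0), (-0.3, 1.2, 0.3), (0.3, 1.2, 0.3))
print(vr.specified())  # ['head', 'left_hand', 'right_hand']
```

One appeal of a masked formulation like this is that the target keeps the same shape regardless of the input device, so a single controller could in principle consume every modality while only the mask changes.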
The technology’s innovation lies in its ability to interpret and fill in the gaps where data is missing or incomplete. It achieves this through what Intel terms the Catch-up, Combine, and Complete (CCC) capabilities:
- Catch-up: This capability lets the MHC recover and resynchronize with the requested motion after a disruption, for example when the simulated humanoid starts in a failed state such as having fallen to the ground. The controller quickly corrects its movements and resumes natural motion without retraining or manual adjustment.
- Combine: MHC can blend different motion sequences together, such as merging upper body movements from one action (e.g., waving) with lower body actions from another (e.g., walking). This flexibility allows for the generation of entirely new behaviors from existing motion data.
- Complete: When given sparse inputs, such as partial body movement data or vague high-level directives, the MHC can intelligently infer and generate the missing parts of the motion. For example, if only arm movements are specified, the MHC can autonomously generate corresponding leg motions to maintain physical balance and realism.
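The Combine and Complete capabilities can be illustrated at the level of input targets with a short Python sketch. It assumes, for illustration only, that a frame is a mapping from joint name to position with `None` marking an unspecified joint, and that the body splits into the hypothetical `UPPER_BODY` and `LOWER_BODY` sets below. The sketch only shows how such targets might be assembled and which joints would be left for the controller to infer; the inference itself, and Catch-up, happen inside the trained policy and the physics simulation and are not modeled here.

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
# A frame maps joint name -> target position; None marks an unspecified joint.
Frame = Dict[str, Optional[Vec3]]

# Hypothetical body-part split, for illustration only.
UPPER_BODY = {"head", "left_hand", "right_hand"}
LOWER_BODY = {"root", "left_foot", "right_foot"}


def combine(upper_src: Frame, lower_src: Frame) -> Frame:
    """'Combine': upper-body targets from one clip, lower-body targets from another."""
    out: Frame = {}
    for joint in UPPER_BODY:
        out[joint] = upper_src.get(joint)
    for joint in LOWER_BODY:
        out[joint] = lower_src.get(joint)
    return out


def joints_to_complete(frame: Frame) -> List[str]:
    """'Complete': joints with no target, which the controller would have to infer."""
    return sorted(j for j in UPPER_BODY | LOWER_BODY if frame.get(j) is None)


# Only the arms are specified, so the head, root, and feet must be generated.
arms_only: Frame = {"left_hand": (-0.3, 1.2, 0.3), "right_hand": (0.3, 1.2, 0.3)}
print(joints_to_complete(arms_only))  # ['head', 'left_foot', 'right_foot', 'root']
```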
The result is an adaptable motion generation system that produces smooth, realistic, and physically plausible movements, even from incomplete or under-specified directives. This makes the MHC well suited to gaming, robotics, virtual reality, and other settings where high-quality human-like motion is needed but input data is limited.
The Impact of MHC on Generative Motion Models
The Masked Humanoid Controller (MHC) is part of a broader effort by Intel Labs and its collaborators to build generative models responsibly, including models for text-to-image and 3D generation tasks. As discussed at ECCV 2024, this approach has significant implications for industries such as robotics, virtual reality, gaming, and simulation, where realistic human motion is crucial. By accepting multi-modal inputs and letting the controller transition smoothly between motions, the MHC can cope with real-world conditions in which sensor data may be noisy or incomplete.
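Noisy or incomplete sensor data fits naturally into a masked setup: an unreliable reading can simply be treated as missing. The Python sketch below shows one plausible pre-processing step, assuming a video tracker that reports a confidence score per joint. The function name `keypoints_to_targets`, the input format, and the default threshold of 0.5 are hypothetical and not taken from the paper.

```python
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def keypoints_to_targets(
    keypoints: Dict[str, Tuple[Vec3, float]],
    min_confidence: float = 0.5,
) -> Dict[str, Optional[Vec3]]:
    """Convert tracked keypoints with confidence scores into masked targets.

    Keypoints below min_confidence (for example, occluded limbs) become None,
    i.e. masked out, so a masked controller would fill them in rather than
    track an unreliable estimate.
    """
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be in [0, 1]")
    return {
        joint: (pos if conf >= min_confidence else None)
        for joint, (pos, conf) in keypoints.items()
    }


tracked = {
    "head": ((0.0, 1.7, 0.0), 0.95),
    "left_hand": ((-0.3, 1.2, 0.3), 0.20),  # occluded, low confidence
}
print(keypoints_to_targets(tracked))  # {'head': (0.0, 1.7, 0.0), 'left_hand': None}
```

Masking rather than passing the raw estimate keeps a bad reading from pulling the character toward an implausible pose; the trade-off is that the threshold must be tuned per tracker.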
This work by Intel Labs stands alongside other advanced research presented at ECCV 2024, such as their novel defense for text-to-image models and the development of techniques for improving spatial consistency in image generation. Together, these advancements showcase Intel’s leadership in the field of computer vision, with a focus on developing secure, scalable, and responsible AI technologies.
Conclusion
The Masked Humanoid Controller (MHC), developed by Intel Labs and academic collaborators, represents a critical step forward in the field of human motion generation. By tackling the complex control problem of generating realistic movements from multi-modal inputs, the MHC paves the way for new applications in VR, gaming, robotics, and simulation. This research, featured at ECCV 2024, demonstrates Intel’s commitment to advancing responsible AI and generative modeling, contributing to safer and more adaptive technologies across various domains.