Figure's Helix is a Vision-Language-Action (VLA) model for humanoid robot control that unifies perception, language understanding, and learned motor skills, allowing the robot to interpret natural-language instructions and carry them out directly. Helix uses a dual-system architecture: System 2 (S2) reasons over the scene and the instruction at a low rate to produce high-level semantic goals, while System 1 (S1) runs a fast visuomotor policy that turns those goals into precise, real-time motor commands.
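
To make the dual-rate design concrete, the sketch below wires a slow semantic model and a fast visuomotor policy together as an asynchronous producer/consumer loop. The class names, the `camera`/`robot` interfaces, the 8 Hz and 200 Hz rates, the 512-dimensional latent, and the 35-DoF action vector are all illustrative assumptions, not Figure's implementation.

```python
# Minimal sketch of a dual-rate S2/S1 control loop (illustrative only).
# Class names, rates, and tensor shapes are assumptions, not Figure's API.
import time
import threading
import numpy as np


class SlowSemanticModel:
    """Stand-in for S2: a vision-language model producing a latent goal vector."""

    def infer(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # A real S2 would run a VLM; here we return a dummy latent.
        return np.random.randn(512).astype(np.float32)


class FastVisuomotorPolicy:
    """Stand-in for S1: a small policy mapping (image, latent) -> joint targets."""

    def act(self, image: np.ndarray, latent: np.ndarray) -> np.ndarray:
        # A real S1 would emit full upper-body targets; dummy 35-DoF vector here.
        return np.zeros(35, dtype=np.float32)


def control_loop(camera, robot, instruction: str,
                 s2_hz: float = 8.0, s1_hz: float = 200.0) -> None:
    """camera.read() and robot.command() are assumed interfaces for the example."""
    s2, s1 = SlowSemanticModel(), FastVisuomotorPolicy()
    latest_latent = s2.infer(camera.read(), instruction)
    lock = threading.Lock()

    def s2_worker():
        nonlocal latest_latent
        while True:
            latent = s2.infer(camera.read(), instruction)  # slow semantic reasoning
            with lock:
                latest_latent = latent
            time.sleep(1.0 / s2_hz)

    threading.Thread(target=s2_worker, daemon=True).start()
    while True:
        with lock:
            latent = latest_latent
        robot.command(s1.act(camera.read(), latent))  # fast, reactive control
        time.sleep(1.0 / s1_hz)
```

The property this sketch tries to capture is that S1 never blocks on S2: the fast loop always acts on the most recent latent goal, staying reactive while the slow loop continues to reason.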

A key feature of Helix is its capability for full upper-body control, encompassing the coordinated movement of wrists, torso, head, and individual fingers. This dexterity is achieved through a single set of neural network weights, enabling the robot to learn a wide range of behaviors—such as picking and placing items, operating household appliances, and engaging in collaborative tasks—without the need for task-specific fine-tuning. Additionally, Helix supports multi-robot collaboration, allowing two robots to work together on shared tasks involving unfamiliar objects, thereby expanding the potential for complex, coordinated operations.
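
As a rough illustration of what a single action interface for the full upper body might look like, the sketch below packs wrist, finger, torso, and head targets into one continuous vector. The field names and joint counts are assumptions made for the example, not Figure's published action space.

```python
# Illustrative layout of a full upper-body action vector; field names and
# joint counts below are assumptions, not Figure's published action space.
from dataclasses import dataclass
import numpy as np


@dataclass
class UpperBodyAction:
    left_wrist_pose: np.ndarray    # 6-DoF target (position + orientation)
    right_wrist_pose: np.ndarray   # 6-DoF target
    left_fingers: np.ndarray       # per-finger joint targets, e.g. shape (5,)
    right_fingers: np.ndarray      # per-finger joint targets
    torso: np.ndarray              # e.g. pitch/yaw/lean, shape (3,)
    head: np.ndarray               # gaze target, e.g. pan/tilt, shape (2,)

    def as_vector(self) -> np.ndarray:
        """Flatten into the single continuous vector a policy would emit."""
        return np.concatenate([
            self.left_wrist_pose, self.right_wrist_pose,
            self.left_fingers, self.right_fingers,
            self.torso, self.head,
        ])
```

One network emitting a vector of this shape is what allows the same weights to drive hands, torso, and head together rather than relying on separate task-specific controllers.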

In practical applications, Helix has demonstrated proficiency in logistics, particularly in package handling and sorting. The system's implicit stereo vision provides a rich 3D understanding, enabling precise, depth-aware movements essential for manipulating packages of varying sizes and shapes. Moreover, Helix's learned visual proprioception allows for seamless cross-robot transfer, as each robot can self-calibrate to accommodate hardware variations. A notable enhancement, termed "Sport Mode," permits the system to execute tasks at speeds surpassing those of human demonstrations while maintaining high success rates and dexterity, thereby improving operational efficiency in dynamic environments.
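
Figure has not published how "Sport Mode" is implemented; as one simple illustration of executing a learned trajectory faster than the demonstration timing, the sketch below temporally re-samples an action chunk so it plays back at a higher speed. The `speedup` factor, chunk length, and 35-DoF width are assumptions for the example.

```python
# One simple way to execute a predicted action trajectory faster than the
# demonstration timing: temporal re-sampling. The speedup factor and the
# linear interpolation scheme are illustrative assumptions, not Figure's method.
import numpy as np


def resample_trajectory(actions: np.ndarray, speedup: float) -> np.ndarray:
    """Re-sample an (T, dof) action sequence onto a shorter horizon.

    actions: one row per control tick at demonstration speed.
    speedup: e.g. 1.5 means the trajectory is executed in 1/1.5 of the time.
    """
    T, dof = actions.shape
    new_T = max(2, int(round(T / speedup)))
    old_t = np.linspace(0.0, 1.0, T)
    new_t = np.linspace(0.0, 1.0, new_T)
    return np.stack(
        [np.interp(new_t, old_t, actions[:, d]) for d in range(dof)], axis=1
    )


if __name__ == "__main__":
    demo = np.cumsum(np.random.randn(400, 35), axis=0)  # fake 2 s chunk at 200 Hz
    fast = resample_trajectory(demo, speedup=1.5)
    print(demo.shape, "->", fast.shape)  # (400, 35) -> (267, 35)
```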

Robot specifications: height 1.68 m; payload 20 kg.