The emerging field of physical AI is facing a critical bottleneck: a severe shortage of high‑quality training data for robots. Two developments this week highlight how the industry is racing to solve that problem. Startup XDOF emerged from stealth with a $70 million funding round to build the data pipelines that frontier AI labs need. Meanwhile, Nvidia, Carnegie Mellon University, and UC Berkeley released ENPIRE, a framework that lets AI coding agents train robots without human supervision.
XDOF’s data ecosystem
XDOF, founded by UC Berkeley robotics researchers Philippe Wu, Fred Shentu, and Nemo Jin, aims to create a full data‑feedback loop for teaching machines physical skills. The company already counts 20 customers, including several unnamed frontier AI labs, and raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo. Wu explained that while language models can lean on vast public text datasets, robot training data capturing skills such as grasping, folding, or insertion barely exists. XDOF plans to work across three tiers: teleoperation data collected directly on deployment robots, general‑purpose teleoperation data, and "egocentric" data gathered by human operators wearing custom sensors. As a launchpad, the company is releasing the ABC dataset in partnership with UC Berkeley's AI Research lab. XDOF calls it the largest collection of high‑quality robot manipulation data ever assembled, comprising 130,000 trajectories, 300 hours of simulation, and 100 hours of evaluations.
ENPIRE: self‑training robot fleets
In a separate paper, Nvidia's GEAR lab demonstrated ENPIRE, a system in which AI coding agents (OpenAI's Codex, Anthropic's Claude Code, and Moonshot's Kimi Code) run the full loop of writing, testing, and rewriting code directly on physical robots. Humans stepped in only after the experiment was over. An eight‑robot fleet achieved a 99% success rate on tasks such as pin insertion, GPU seating, and zip‑tie cutting. Scaling from one robot to eight cut training time by more than half, though token consumption rose more steeply than training time fell. The work extends Nvidia's Eureka project from 2023, moving the self‑improvement loop out of simulation and into the real world, where unpredictable factors such as friction and lighting matter.
Together, the announcements underscore a shift: as labs from OpenAI to Alibaba reignite robotics programs, the winners may be those who master the data infrastructure, not just model architectures. XDOF's bet on teleoperation armies and ENPIRE's hands‑off training approach the problem from opposite directions, one scaling human demonstrations and the other letting machines teach robots, but both point toward a future where high‑quality data, not just compute, becomes the new moat.