PROVENANCE DECIDES WHAT YOU CAN BUILD ON.
As physical AI scales, the question is no longer how much data you have. It is whether you can prove where it came from.
Either you build clean rights in at the moment of capture, or you do not have them at all. There is no retrofitting consent onto footage that was gathered without it. That single constraint decides who gets to sell to a serious lab.
Training data is now a legal question as much as a technical one. A model inherits the rights — and the liabilities — of everything it learns from. For physical data, captured from real people inside real premises, consent and ownership are not a formality at the end of the process. A dataset with an unclear chain of rights is a risk that compounds silently, invisible right up to the day a buyer's lawyers open it and start asking who agreed to what.
This is the part the cheapest operators skip, because it is the expensive part. A worker can consent to being filmed, but if the premises owner never did, the footage is unusable the moment anyone examines it. And a factory floor, a clinic, a retail back room each carries its own consent requirements, its own jurisdiction, its own rules on how the resulting data may be pooled and licensed. Get the worker's signature and miss the premises owner's, and you have built a liability, not an asset. These things have to be established up front and preserved hour by hour, capture by capture, or they are gone.
We treat the chain of rights as part of the product, not paperwork wrapped around it. Every hour we deliver is consented, cleared, and traceable — built to satisfy the legal, compliance, and procurement teams who increasingly decide what a serious lab is allowed to train on. In a field that will be defined by trust, provenance is not a feature we added. It is the ground the whole thing stands on.