Chapter 3: Deployment

Core Questions

Why is robotics data uniquely difficult for VLMs compared to pure vision/text?
How do we leverage VLMs for auto-labeling at scale?
How do we distill foundation model reasoning into real-time controllers?

Topics

3.1 Data Diversity & The Long Tail

The unique challenges of robotics data
Multi-modal sensor streams (vision, depth, proprioception)
Edge cases and safety-critical scenarios
Why internet-scale pre-training isn't enough

3.2 Semantic Supervision

Using VLMs to auto-label petabytes of sensor data
Bootstrapping low-level policy training
Quality validation for automated labeling
Human-in-the-loop verification strategies
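
A minimal sketch of how quality validation and human-in-the-loop verification might fit together: auto-labels below a confidence threshold go to a human review queue, and a fixed fraction of confident labels is audited as a spot check. The names `AutoLabel` and `route_labels`, the thresholds, and the audit rate are all hypothetical choices for illustration, not part of any specific labeling pipeline.

```python
from dataclasses import dataclass


@dataclass
class AutoLabel:
    frame_id: str
    label: str
    confidence: float  # assumed: model-reported score in [0, 1]


def route_labels(labels, accept_threshold=0.9, audit_every=10):
    """Split auto-labels into (accepted, needs_review).

    Low-confidence labels always go to review. If audit_every > 0,
    every audit_every-th label (by position) is also sent to review
    even when confident, as a spot check on the labeler itself.
    """
    accepted, needs_review = [], []
    for index, item in enumerate(labels):
        if item.confidence < accept_threshold:
            needs_review.append(item)
        elif audit_every > 0 and index % audit_every == 0:
            needs_review.append(item)
        else:
            accepted.append(item)
    return accepted, needs_review


batch = [
    AutoLabel("f0", "pedestrian", 0.97),
    AutoLabel("f1", "cyclist", 0.55),
    AutoLabel("f2", "vehicle", 0.93),
]
accepted, needs_review = route_labels(batch, accept_threshold=0.9, audit_every=0)
```

With auditing disabled as above, only the low-confidence frame is queued for review. In practice the threshold would need calibration against human-verified samples, since model-reported confidence is not guaranteed to track label accuracy.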

3.3 Policy Distillation

Transferring high-level reasoning to edge hardware
Methods for compressing foundation models
Real-time constraints and latency budgets
Maintaining safety guarantees during distillation
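
One common way to transfer a large model's behavior to a smaller one is temperature-scaled knowledge distillation, where the student is trained to match the teacher's softened output distribution. The sketch below shows only that loss term in plain Python; it is an illustration of the general technique, not necessarily the method this chapter covers, and the function names and default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits over three discrete actions.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's logits exactly and positive otherwise. It says nothing about safety on its own: a student can match the teacher on average and still diverge on rare inputs, which is why the distilled policy needs separate validation.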

3.4 Safety-Critical Scaling

The LISS framework for autonomous fleets
Formalizing "Similar Miles" validation
Hardware-in-the-loop at scale
Out-of-distribution (OOD) detection and graceful degradation in production
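
As a rough illustration of graceful degradation, the sketch below compares a scalar novelty score against statistics gathered on in-distribution data and maps the deviation to an operating mode. The class `OODMonitor`, the z-score thresholds, and the mode names are hypothetical; a production system would typically use a learned OOD score and a validated fallback policy.

```python
import statistics


class OODMonitor:
    """Maps a scalar novelty score to an operating mode via z-score."""

    def __init__(self, reference_scores, caution_z=2.0, stop_z=4.0):
        # Statistics of the score on data assumed to be in-distribution.
        self.mean = statistics.fmean(reference_scores)
        # Guard against zero spread so the z-score stays finite.
        self.std = statistics.pstdev(reference_scores) or 1e-9
        self.caution_z = caution_z
        self.stop_z = stop_z

    def mode(self, score):
        z = abs(score - self.mean) / self.std
        if z >= self.stop_z:
            return "minimal_risk_stop"
        if z >= self.caution_z:
            return "reduced_speed"
        return "nominal"


monitor = OODMonitor(reference_scores=[0.9, 1.0, 1.1, 1.0])
modes = [monitor.mode(s) for s in (1.05, 1.2, 2.0)]
```

The two-threshold design degrades in stages rather than switching straight from full autonomy to a stop, so mildly unusual inputs trigger a cautious mode instead of an abrupt halt.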