Deterministic safety evidence, replayable.
FieldSpace now has a public-data evidence trail across openpilot / comma.ai replay, Waymo observer-mode scenarios, official nuPlan closed-loop simulation, and the first shared nuPlan neural-baseline smoke scenario.
Four public evidence paths, one deterministic layer
The latest package reframes FieldSpace as an independent safety observer around neural ADAS stacks. The point is practical: replay public or partner-selected scenes, compare behavior against baselines, and inspect the audit trail behind every safety judgment.
White-paper replay slice with sub-millisecond CPU latency.
4,550 frames and 15 exported trigger windows.
0 runner failures against SimplePlanner and IDMPlanner.
Full 64-scenario closed-loop run vs UrbanDriver + PlanCNN. FieldSpace scores 0.977 on no at-fault collisions at 8.4 ms/step on CPU.
FieldSpace vs UrbanDriver vs PlanCNN
Full 64-scenario official nuPlan closed-loop suite. Identical scenario set across all five planners (`_comparison.same_scenarios = true`). Engine frozen, no nuPlan-specific tuning. UrbanDriver and PlanCNN run from the public tuPlan-Garage checkpoints (SHA256-allowlisted). Numbers below are mean `metric_score` values across the 64 scenarios (1.0 = best).
| Metric (higher = safer / better) | FieldSpace | UrbanDriver | PlanCNN |
|---|---|---|---|
| No at-fault collision | 0.977 | 0.375 | 0.938 |
| TTC within bound | 0.922 | 0.297 | 0.922 |
| Speed-limit compliance | 1.000 | 0.695 | 1.000 |
| Comfort (lat-accel + jerk thresholds) | 0.938 | 1.000 | 0.844 |
| Progress along route | 0.966 | 0.980 | 0.981 |
| CPU compute / planning step | 8.4 ms | 158.6 ms | 150.9 ms |
FieldSpace matches or beats both neural planners on the safety metrics a homologation team cares about (collisions, TTC, speed-limit compliance) while using roughly 18 to 19× less CPU compute per planning step.
UrbanDriver's public checkpoint is trained open-loop and is a known-weak closed-loop baseline, so the more informative comparison is FieldSpace vs PlanCNN: FieldSpace wins on collisions and comfort, ties on TTC and speed-limit compliance, and trails slightly on route progress (0.966 vs 0.981), at roughly 18× less compute. Full note: `reproducibility/nuplan_official_neural_64_results.md`.
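The table reports per-planner means over a shared scenario set, plus a compute ratio. A minimal sketch of that aggregation, assuming a hypothetical in-memory layout of `{planner: {scenario_id: metric_score}}` (this is not nuPlan's on-disk results format), looks like this:

```python
from statistics import mean

# Hypothetical layout: planner name -> {scenario_id: metric_score}.
# Illustrative only; this is not nuPlan's on-disk results format.
Results = dict[str, dict[str, float]]


def same_scenarios(results: Results) -> bool:
    """True when every planner was scored on exactly the same scenario IDs."""
    return len({frozenset(scores) for scores in results.values()}) <= 1


def mean_scores(results: Results) -> dict[str, float]:
    """Mean metric_score per planner; refuses to compare mismatched scenario sets."""
    if not same_scenarios(results):
        raise ValueError("planners were not scored on the same scenarios")
    return {planner: mean(scores.values()) for planner, scores in results.items()}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times less per-step compute the candidate uses than the baseline."""
    return baseline_ms / candidate_ms
```

With the table's figures, `speedup(150.9, 8.4)` is about 18.0 (vs PlanCNN) and `speedup(158.6, 8.4)` is about 18.9 (vs UrbanDriver). Refusing to average mismatched scenario sets mirrors the intent of the `same_scenarios` flag: a mean is only comparable across planners when the denominators are the same scenarios.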
The eval-kit runs the same scoring pipeline against your scenarios, air-gapped, in your environment. Click-through EULA, no NDA at the door.
Control loop, per tick
End-to-end drive loop: HD map localize → predict → PDE field solve → control. Measured over 10 000 consecutive ticks on each platform.
| Platform | Mean | p95 | p99.9 | Budget |
|---|---|---|---|---|
| Raspberry Pi 5 (8 GB) | 3.1 ms | 4.8 ms | 7.2 ms | 50 ms |
| Jetson Orin Nano (CPU mode) | 1.9 ms | 2.6 ms | 4.1 ms | 50 ms |
| x86 laptop (i7-12700H) | 0.8 ms | 1.2 ms | 2.3 ms | 50 ms |
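The latency table summarizes many consecutive ticks as a mean and tail percentiles against a fixed budget. The sketch below shows one way to collect such numbers. `tick()` is a hypothetical placeholder for one pass of localize → predict → field solve → control, and the nearest-rank percentile method is an assumption, since the page does not say which definition was used.

```python
import math
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


@dataclass(frozen=True)
class LatencyReport:
    mean_ms: float
    p95_ms: float
    p999_ms: float
    max_ms: float
    budget_ms: float

    @property
    def within_budget(self) -> bool:
        # Judge against the worst observed tick, not the mean.
        return self.max_ms <= self.budget_ms


def profile_loop(tick: Callable[[], None], n_ticks: int = 10_000,
                 budget_ms: float = 50.0) -> LatencyReport:
    """Time n_ticks consecutive tick() calls with a monotonic, high-resolution clock."""
    samples_ms = []
    for _ in range(n_ticks):
        start = time.perf_counter()
        tick()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return LatencyReport(
        mean_ms=mean(samples_ms),
        p95_ms=percentile(samples_ms, 95),
        p999_ms=percentile(samples_ms, 99.9),
        max_ms=max(samples_ms),
        budget_ms=budget_ms,
    )
```

Checking the budget against the maximum rather than the mean matters for a control loop, because a single late tick is what misses a deadline.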
Bit-identical over 10 000 runs
Each scenario replay's output is compared byte-for-byte against a stored golden trace. Any diff is a regression.
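A golden-trace check reduces to two questions: does this replay match the stored bytes, and does repeating the replay always produce the same bytes? The following is a minimal sketch under the assumption that a replay can be serialized to `bytes`. The function names and the choice of SHA-256 for repeat-run hashing are illustrative, not FieldSpace's actual harness.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def first_diff(actual: bytes, golden: bytes) -> Optional[int]:
    """Byte offset of the first mismatch, or None if the traces are identical."""
    if actual == golden:
        return None
    for offset, (a, g) in enumerate(zip(actual, golden)):
        if a != g:
            return offset
    # No mismatch in the shared prefix: the traces differ only in length.
    return min(len(actual), len(golden))


def check_against_golden(actual: bytes, golden_path: Path) -> None:
    """Raise if a replay's output differs from the stored golden trace."""
    offset = first_diff(actual, golden_path.read_bytes())
    if offset is not None:
        raise AssertionError(f"regression: trace diverges at byte {offset}")


def is_bit_identical(run: Callable[[], bytes], n_runs: int) -> bool:
    """Replay n_runs times and require every output to hash identically."""
    reference = hashlib.sha256(run()).digest()
    return all(hashlib.sha256(run()).digest() == reference for _ in range(n_runs - 1))
```

Reporting the first diverging byte offset, rather than a bare pass/fail, makes it easier to locate which tick of a replay first changed.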
In-ODD protocol checks passed
Internal implementations of the NHTSA 37 pre-crash scenario families and Euro NCAP AEB/FCW-style protocols, built scenario-by-scenario in `harness/scenarios/`. Pass criterion: the hazard trigger fires at a TTC that meets or beats the configured protocol threshold.
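Because a larger TTC at trigger time means an earlier warning, "meets or beats the threshold" is a `>=` comparison. The sketch below encodes that criterion and a pass/skip tally. `ProtocolRun`, its fields, and the example thresholds in the usage are hypothetical, and the page does not state the configured threshold values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProtocolRun:
    scenario: str
    threshold_ttc_s: float          # configured protocol threshold (hypothetical values)
    trigger_ttc_s: Optional[float]  # TTC when the hazard trigger fired; None if it never fired
    in_odd: bool = True


def passes(run: ProtocolRun) -> bool:
    # Larger TTC at trigger means an earlier warning, so "meets or beats" is >=.
    return run.trigger_ttc_s is not None and run.trigger_ttc_s >= run.threshold_ttc_s


def tally(runs: Iterable[ProtocolRun]) -> tuple[int, int, int]:
    """Return (passed, evaluated, skipped); out-of-ODD runs are skipped, not failed."""
    passed = evaluated = skipped = 0
    for run in runs:
        if not run.in_odd:
            skipped += 1
            continue
        evaluated += 1
        if passes(run):
            passed += 1
    return passed, evaluated, skipped
```

A trigger that never fires counts as a failure, not a skip. Only scenarios explicitly marked out-of-ODD are excluded from the denominator.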
Euro NCAP AEB/FCW
79 / 79 pass
- Car-to-Car Rear (stationary / moving / braking)
- Car-to-Car Front (head-on / turn-across / cut-in)
- Pedestrian (near-side / far-side / longitudinal)
- Cyclist (near-side / far-side / longitudinal)
NHTSA 37 pre-crash
121 / 121 pass
- Rear-end (lead vehicle decelerating / stopped / moving at lower constant speed)
- VRU (pedestrian / cyclist / animal crashes)
- Cut-in / opposite-direction encroachment
- 21 scenarios skipped as out-of-ODD (HD map, V2X, reverse, ECU failure)
182 505 frames of public drive data
We run the FieldSpace Safety Observer frame-for-frame against comma.ai's openpilot CI route bucket: real cars, real roads, real radar and vision, with openpilot's own `carControl` log as the baseline behavior to compare against. Lead times are measured, not asserted.
| Metric | Tier-3 hazardous bucket |
|---|---|
| Routes replayed | 2 routes / 31 segments |
| Frames processed | 182 505 |
| Wall-clock replay time | 184 s (≈ 993× faster than real-time) |
| FieldSpace hazard events | 1 warning · 0 false critical |
| FP reduction vs. prior observer | −86% (7 events → 1) |
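Event counts such as "7 events → 1" depend on how per-frame judgments are grouped into events. One plausible convention, assumed here and not taken from the page, collapses each run of consecutive frames at the same non-"ok" level into a single event. The level labels `"ok"`, `"warning"` and `"critical"` are hypothetical.

```python
from itertools import groupby


def count_events(levels: list[str]) -> dict[str, int]:
    """Collapse runs of consecutive identical per-frame levels into events.

    Levels are hypothetical labels: "ok", "warning", "critical".
    """
    counts: dict[str, int] = {}
    for level, _frames in groupby(levels):
        if level != "ok":
            counts[level] = counts.get(level, 0) + 1
    return counts


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction in event count, e.g. 7 -> 1."""
    return 100.0 * (before - after) / before
```

Under this convention, a warning that escalates straight into a critical counts as two events. Going from 7 events to 1 is a (7 − 1) / 7 ≈ 85.7% reduction.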
5 synthetic scenarios, 5 earlier detections
Original closed-loop CARLA suite. Kept here because it isolates the PDE-propagation lead-time contribution from confounds that real-world data introduces.
| Scenario | Baseline TTC | FieldSpace TTC | Lead-time |
|---|---|---|---|
| Cut-in, 40 km/h | 1.2 s | 2.1 s | +0.9 s |
| Sudden brake, lead car | 1.6 s | 2.4 s | +0.8 s |
| Pedestrian step-out | 0.9 s | 1.7 s | +0.8 s |
| Oncoming lane drift | 1.1 s | 1.9 s | +0.8 s |
| Occluded cyclist | 0.7 s | 1.3 s | +0.6 s |
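The lead-time column is FieldSpace's TTC at trigger minus the baseline's TTC at trigger (for example, 2.1 s − 1.2 s = +0.9 s for the cut-in). The sketch below shows that subtraction together with a simple constant-velocity TTC for a vehicle ahead in the same lane. That TTC formula is a common textbook definition and an assumption here. The page does not say how TTC is computed, and cut-in, crossing and oncoming cases need a 2-D formulation.

```python
import math


def time_to_collision(gap_m: float, ego_speed_mps: float, lead_speed_mps: float) -> float:
    """Constant-velocity TTC to a vehicle ahead in the same lane; inf if not closing."""
    closing_mps = ego_speed_mps - lead_speed_mps
    return gap_m / closing_mps if closing_mps > 0 else math.inf


def lead_time(baseline_trigger_ttc_s: float, fieldspace_trigger_ttc_s: float) -> float:
    """Extra warning time: positive when FieldSpace fires earlier than the baseline."""
    return fieldspace_trigger_ttc_s - baseline_trigger_ttc_s
```

For instance, a 20 m gap closing at 10 m/s gives a TTC of 2.0 s. A detector that fires at that moment has 2.0 s of warning, while one that fires later sees a smaller TTC.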
Bring your own log
Replay adapters and benchmark harnesses are checked in for the public datasets below. They exist so customers can point us at fleet data or a public benchmark and replay the same observer.
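The point of a replay adapter is that only the log decoding changes per dataset, while the observer loop stays the same. The sketch below shows that separation with a structural `Protocol`. `Frame`, `ReplayAdapter`, `InMemoryAdapter` and `replay` are hypothetical names, not FieldSpace's checked-in API, and a real adapter would decode MCAP or rlog files instead of holding frames in memory.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Frame:
    t_s: float                                   # log timestamp in seconds
    signals: dict = field(default_factory=dict)  # decoded sensor / state values


class ReplayAdapter(Protocol):
    """Anything that can yield decoded frames from a log, in time order."""
    def frames(self) -> Iterator[Frame]: ...


class InMemoryAdapter:
    """Toy adapter; a real one would decode MCAP or rlog files instead."""
    def __init__(self, frames: Iterable[Frame]) -> None:
        self._frames = list(frames)

    def frames(self) -> Iterator[Frame]:
        return iter(self._frames)


def replay(adapter: ReplayAdapter, observe: Callable[[Frame], str]) -> list[tuple[float, str]]:
    """Run one observer over every frame and keep the non-"ok" judgments."""
    flagged: list[tuple[float, str]] = []
    last_t = -math.inf
    for frame in adapter.frames():
        if frame.t_s < last_t:
            raise ValueError("adapter yielded frames out of time order")
        last_t = frame.t_s
        level = observe(frame)
        if level != "ok":
            flagged.append((frame.t_s, level))
    return flagged
```

Rejecting out-of-order frames up front keeps a malformed log from silently producing a different result on every run.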
Bring your drive logs. We'll ship the evidence.
Send us an MCAP or rlog. We run the Safety Observer and send back a side-by-side replay report your technical team can review.