Run FieldSpace on your own data.
One Docker command. Your scenarios stay on your machine. No telemetry, no network calls, air-gapped. Output uses the same format as our public nuPlan benchmark, so your numbers drop straight into the same comparison framework.
See the public numbers first: FieldSpace vs UrbanDriver vs PlanCNN on 64 nuPlan scenarios. The eval-kit reproduces that exact metric pipeline on your data.
Two tiers. Friction-free door, protected pilot.
Click-through, no NDA
- 50-scenario cap per license
- 30-day expiry
- Watermarked outputs (metric values unaffected)
- Air-gapped, zero telemetry
- Internal evaluation only
- Email turnaround on a license token, typically same day
Mutual NDA + commercial license
- Production-relevant volume, no scenario cap
- Annual term, renewable
- Direct technical engagement
- Same air-gapped binary; legal scope differs
- Optional adapter help for your data format
The demo tier exists to break the friction that kills tier-1 evaluations: NDAs take weeks, by which time the conversation is over. Click through, see the numbers on your own data, then we have the pilot conversation.
How it works
1. Convert your logs to the canonical JSON schema.
Three nouns per frame: ego state, list of obstacles, optional recorded next state for side-by-side comparison. SI units. See the schema doc; most teams convert in a day.
2. Pull the Docker image and your license token.
One image, one signed token. The token carries your scenario cap and expiry; the binary refuses to run without a valid one.
3. Run, air-gapped, on your hardware.
docker run --rm --network=none \
  -e FIELDSPACE_LICENSE="<token>" \
  -e FIELDSPACE_EULA_ACCEPT=1 \
  -v "$PWD/data:/data:ro" -v "$PWD/out:/out" \
  ghcr.io/jackalkahwati/fieldspace-eval:demo \
  --input /data --output /out
4. Read the report.
Per-scenario metric rows plus a compact summary JSON in the same format as our public nuPlan benchmark: collision avoidance, time-to-collision, comfort, progress, speed-limit compliance, and per-step compute latency. SHA256 manifest, git commit, and watermark included.
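The conversion step above can be sketched roughly as follows. This is only an illustration: every key name here (`ego`, `obstacles`, `recorded_next`, `x_m`, `heading_rad`, `speed_mps`, and the raw-log keys) is a hypothetical placeholder, and the raw log is assumed to store speed in km/h and heading in degrees. The schema doc defines the real field names.

```python
import json
import math


def to_frame(raw: dict) -> dict:
    """Convert one raw log record into an ego / obstacles / next-state frame.

    All key names are illustrative placeholders, not the real schema.
    """
    def state(rec: dict) -> dict:
        # Normalize to SI units: metres, radians, metres per second.
        return {
            "x_m": rec["x"],
            "y_m": rec["y"],
            "heading_rad": math.radians(rec["heading_deg"]),
            "speed_mps": rec["speed_kph"] / 3.6,
        }

    frame = {
        "ego": state(raw["ego"]),
        "obstacles": [state(o) for o in raw.get("objects", [])],
    }
    # The recorded next state is optional; emit it only when the log has one.
    if "next_ego" in raw:
        frame["recorded_next"] = state(raw["next_ego"])
    return frame


# Hypothetical raw record in an in-house log format.
raw_record = {
    "ego": {"x": 0.0, "y": 0.0, "heading_deg": 90.0, "speed_kph": 36.0},
    "objects": [{"x": 12.0, "y": 1.5, "heading_deg": 90.0, "speed_kph": 18.0}],
    "next_ego": {"x": 0.0, "y": 1.0, "heading_deg": 90.0, "speed_kph": 36.0},
}

frame = to_frame(raw_record)
print(json.dumps(frame, indent=2))
```

Keeping the unit conversion in a single helper means the ego, the obstacles, and the optional next state all pass through the same SI normalization.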
Your data stays yours. Our source stays ours.
- Your scenario data never leaves your environment.
- Container runs with --network=none. Zero outbound calls.
- Report output is yours, on your filesystem.
- You decide what (if anything) you share back with us.
- Compiled Rust binary in a minimal image. No source files.
- Stripped release build, no debug symbols.
- Output watermarked with your license ID (deterministic; metric values unaffected).
- EULA prohibits reverse engineering and use of outputs to train competing planners.
The eval-kit's whole point is friction-free evaluation. The protections above are the mechanical floor; the EULA is the legal floor. See EULA-demo.md in the image, displayed on first run.
What the report contains
{
  "fieldspace": {
    "runner": {
      "rows": 50, "succeeded": 50, "scenario_names": [...],
      "mean_compute_trajectory_runtime_s": 0.0084
    },
    "metric_means": {
      "no_ego_at_fault_collisions": { "metric_score": 0.977 },
      "time_to_collision_within_bound": { "metric_score": 0.922 },
      "speed_limit_compliance": { "metric_score": 1.000 },
      "ego_is_comfortable": { "metric_score": 0.938 },
      "ego_progress_along_expert_route": { "metric_score": 0.966 }
    }
  },
  "_meta": {
    "tool_version": "0.1.0",
    "git_commit": "7b64da7...",
    "license_id": "<your license id>",
    "oem_id": "<your org>",
    "tier": "demo",
    "watermark": "<sha256 fingerprint>"
  }
}

Same key set as reproducibility/nuplan_official_neural_64_summary_compact.json. Drop your output into any tool that already reads the public benchmark; everything lines up.
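A minimal sketch of reading the summary, assuming the structure shown above. The helper name `metric_scores` is hypothetical, and the sample is inlined with the elided `scenario_names` list left out; in practice you would `json.load` the summary file from your output directory (its file name is not stated here).

```python
import json


def metric_scores(summary: dict, planner: str = "fieldspace") -> dict:
    """Return {metric_name: score} for one planner block in a compact summary."""
    means = summary[planner]["metric_means"]
    return {name: entry["metric_score"] for name, entry in means.items()}


# Inline sample mirroring the report structure; values copied from the example.
sample = json.loads("""
{
  "fieldspace": {
    "runner": {"rows": 50, "succeeded": 50,
               "mean_compute_trajectory_runtime_s": 0.0084},
    "metric_means": {
      "no_ego_at_fault_collisions": {"metric_score": 0.977},
      "time_to_collision_within_bound": {"metric_score": 0.922},
      "speed_limit_compliance": {"metric_score": 1.000},
      "ego_is_comfortable": {"metric_score": 0.938},
      "ego_progress_along_expert_route": {"metric_score": 0.966}
    }
  },
  "_meta": {"tool_version": "0.1.0", "tier": "demo"}
}
""")

scores = metric_scores(sample)
for name, score in sorted(scores.items()):
    print(f"{name:40s} {score:.3f}")

# Lowest-scoring metric, a natural first place to look.
worst = min(scores, key=scores.get)
print("lowest:", worst)
```

Because the key set is shared with the public benchmark summary, the same helper should apply to that file as well, provided its planner blocks follow the same layout.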
Request a demo license
Click-through EULA, no signed NDA. We respond within one business day with a license token and the Docker image link. The token expires in 30 days and is capped at 50 scenarios. A pilot license removes the scenario cap and runs on an annual term.
Email eval-kit@fieldspacetech.com with:
- Your name and organization
- An OEM identifier we should bind the license to (we'll bake it into the token)
- A one-line description of what data you plan to evaluate against
We do not store, log, or ask for your evaluation data, only the metadata above. The eval-kit runs entirely in your environment.
Already convinced you want a pilot? Email the same address with "pilot" in the subject line and we'll send the NDA template.
This is the v0.1 demo tier. Production JWT issuance pipeline, tcpdump-clean CI gating, robust output watermarking, and additional adapter coverage are tracked as follow-ons. The eval-kit's mathematical core and metric format are stable.