EVAL-KIT v0.1 — DEMO TIER

Run FieldSpace on your own data.

One Docker command. Your scenarios stay on your machine: the container runs air-gapped, with no telemetry and no network calls. Output uses the same format as our public nuPlan benchmark, so your numbers drop straight into the same comparison framework.

See the public numbers first: FieldSpace vs UrbanDriver vs PlanCNN on 64 nuPlan scenarios. The eval-kit reproduces that exact metric pipeline on your data.

Request a demo license · How it works

air-gapped · click-through EULA · SHA256 pinned · no source exposed

Two tiers. Friction-free door, protected pilot.

DEMO TIER

Click-through, no NDA

• 50-scenario cap per license
• 30-day expiry
• Watermarked outputs (metric values unaffected)
• Air-gapped, zero telemetry
• Internal evaluation only
• Email turnaround on a license token, typically same day

PILOT TIER

Mutual NDA + commercial license

• Production-relevant volume, no scenario cap
• Annual term, renewable
• Direct technical engagement
• Same air-gapped binary; legal scope differs
• Optional adapter help for your data format

The demo tier exists to break the friction that kills tier-1 evaluations: NDAs take weeks, by which time the conversation is over. Click through, see the numbers on your own data, then we have the pilot conversation.

How it works

1
Convert your logs to the canonical JSON schema.
Three nouns per frame: ego state, list of obstacles, optional recorded next state for side-by-side comparison. SI units. See the schema doc; most teams convert in a day.
2
Pull the Docker image and your license token.
One image, one signed token. The token carries your scenario cap and expiry; the binary refuses to run without a valid, unexpired token.

3
Run, air-gapped, on your hardware.

docker run --rm --network=none \
  -e FIELDSPACE_LICENSE="<token>" \
  -e FIELDSPACE_EULA_ACCEPT=1 \
  -v "$PWD/data:/data:ro" -v "$PWD/out:/out" \
  ghcr.io/jackalkahwati/fieldspace-eval:demo \
  --input /data --output /out

4
Read the report.
Per-scenario metric rows plus a compact summary JSON in the same format as our public nuPlan benchmark: collision avoidance, time-to-collision, comfort, progress, speed-limit compliance, and per-step compute latency. A SHA256 manifest, the git commit, and the watermark are included.
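
For step 1, a conversion script might look like the sketch below. The canonical schema lives in the schema doc, not on this page, so every field name here (`ego`, `obstacles`, `recorded_next`, `x_m`, `speed_mps`, and the raw-log keys) is a hypothetical placeholder. Only the three-part frame shape and the use of SI units come from the description above.

```python
import json
import math

KMH_TO_MPS = 1000.0 / 3600.0  # kilometres per hour -> metres per second


def to_frame(raw):
    """Map one raw log record to a frame with the three parts named in step 1.

    All key names are hypothetical; substitute the ones from the schema doc.
    """
    frame = {
        # Ego state, converted to SI units (metres, radians, metres per second).
        "ego": {
            "x_m": raw["ego_x"],
            "y_m": raw["ego_y"],
            "heading_rad": math.radians(raw["ego_heading_deg"]),
            "speed_mps": raw["ego_speed_kmh"] * KMH_TO_MPS,
        },
        # List of obstacles; an empty list is valid when nothing was detected.
        "obstacles": [
            {"x_m": o["x"], "y_m": o["y"], "speed_mps": o["speed_kmh"] * KMH_TO_MPS}
            for o in raw.get("objects", [])
        ],
    }
    # The recorded next state is optional; include it only when the log has one.
    if "next_ego" in raw:
        nxt = raw["next_ego"]
        frame["recorded_next"] = {"x_m": nxt["x"], "y_m": nxt["y"]}
    return frame


# A made-up raw record in a made-up source format, for illustration only.
raw_record = {
    "ego_x": 12.0, "ego_y": -3.5, "ego_heading_deg": 90.0, "ego_speed_kmh": 36.0,
    "objects": [{"x": 20.0, "y": -3.0, "speed_kmh": 18.0}],
}
frame = to_frame(raw_record)
print(json.dumps(frame, indent=2))
```

Doing the unit conversion in one place, at the boundary between your log format and the frame, keeps the rest of the pipeline free of mixed units.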

Your data stays yours. Our source stays ours.

YOUR SIDE

• Your scenario data never leaves your environment.
• Container runs with --network=none. Zero outbound calls.
• Report output is yours, on your filesystem.
• You decide what (if anything) you share back with us.

OUR SIDE

• Compiled Rust binary in a minimal image. No source files.
• Stripped release build, no debug symbols.
• Output watermarked with your license ID (deterministic; metric values unaffected).
• EULA prohibits reverse engineering and use of outputs to train competing planners.

The eval-kit's whole point is friction-free evaluation. The protections above are the mechanical floor; the EULA is the legal floor. See EULA-demo.md in the image, displayed on first run.

What the report contains

{
  "fieldspace": {
    "runner": { "rows": 50, "succeeded": 50, "scenario_names": [...],
                "mean_compute_trajectory_runtime_s": 0.0084 },
    "metric_means": {
      "no_ego_at_fault_collisions":      { "metric_score": 0.977 },
      "time_to_collision_within_bound":  { "metric_score": 0.922 },
      "speed_limit_compliance":          { "metric_score": 1.000 },
      "ego_is_comfortable":              { "metric_score": 0.938 },
      "ego_progress_along_expert_route": { "metric_score": 0.966 }
    }
  },
  "_meta": {
    "tool_version": "0.1.0",
    "git_commit":   "7b64da7...",
    "license_id":   "<your license id>",
    "oem_id":       "<your org>",
    "tier":         "demo",
    "watermark":    "<sha256 fingerprint>"
  }
}

Same key set as reproducibility/nuplan_official_neural_64_summary_compact.json. Drop your output into any tool that already reads the public benchmark; everything lines up.
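
As a rough illustration of consuming the summary, the sketch below flattens `metric_means` into a name-to-score mapping and checks that every scenario ran. The key layout and values are copied from the example above (with `scenario_names` omitted, since it is elided there). The report's file name under `/out` is not stated on this page, so the sample is inlined rather than read from disk, and the helper name `flatten_scores` is ours.

```python
import json

# Inline sample using the key layout and values from the example report above.
report_text = """
{
  "fieldspace": {
    "runner": {"rows": 50, "succeeded": 50,
               "mean_compute_trajectory_runtime_s": 0.0084},
    "metric_means": {
      "no_ego_at_fault_collisions":      {"metric_score": 0.977},
      "time_to_collision_within_bound":  {"metric_score": 0.922},
      "speed_limit_compliance":          {"metric_score": 1.000},
      "ego_is_comfortable":              {"metric_score": 0.938},
      "ego_progress_along_expert_route": {"metric_score": 0.966}
    }
  },
  "_meta": {"tool_version": "0.1.0", "tier": "demo"}
}
"""


def flatten_scores(report, planner="fieldspace"):
    """Return {metric_name: metric_score} for one planner block."""
    means = report[planner]["metric_means"]
    return {name: entry["metric_score"] for name, entry in means.items()}


report = json.loads(report_text)
scores = flatten_scores(report)

runner = report["fieldspace"]["runner"]
all_ran = runner["succeeded"] == runner["rows"]

for name, score in sorted(scores.items()):
    print(f"{name:35s} {score:.3f}")
print("all scenarios succeeded:", all_ran)
```

Because the public benchmark summary uses the same key set, the same helper should apply to it unchanged, which makes a side-by-side comparison a matter of calling it on both files.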

Request a demo license

Click-through EULA, no signed NDA. We respond within one business day with a license token and the Docker image link. The token expires in 30 days and is capped at 50 scenarios. A pilot license lifts both limits: no scenario cap and an annual, renewable term.

Email eval-kit@fieldspacetech.com with:

• Your name and organization
• An OEM identifier we should bind the license to (we'll bake it into the token)
• A one-line description of what data you plan to evaluate against

We do not store, log, or ask for your evaluation data — only the metadata above. The eval-kit runs entirely in your environment.

Already convinced you want a pilot? Email the same address with pilot in the subject line and we'll send the NDA template.

This is the v0.1 demo tier. Production JWT issuance pipeline, tcpdump-clean CI gating, robust output watermarking, and additional adapter coverage are tracked as follow-ons. The eval-kit's mathematical core and metric format are stable.