EVAL-KIT v0.1 — DEMO TIER

Run FieldSpace on your own data.

One Docker command. Your scenarios stay on your machine: no telemetry, no network calls, fully air-gapped. Output uses the same format as our public nuPlan benchmark, so your numbers drop straight into the same comparison framework.

See the public numbers first: FieldSpace vs UrbanDriver vs PlanCNN on 64 nuPlan scenarios. The eval-kit reproduces that exact metric pipeline on your data.

Air-gapped · Click-through EULA · SHA256 pinned · No source exposed

Two tiers. Friction-free door, protected pilot.

DEMO TIER

Click-through, no NDA

  • 50-scenario cap per license
  • 30-day expiry
  • Watermarked outputs (metric values unaffected)
  • Air-gapped, zero telemetry
  • Internal evaluation only
  • License token by email, typically the same day

PILOT TIER

Mutual NDA + commercial license

  • Production-relevant volume, no scenario cap
  • Annual term, renewable
  • Direct technical engagement
  • Same air-gapped binary; legal scope differs
  • Optional adapter help for your data format

The demo tier exists to remove the friction that stalls tier-1 evaluations: an NDA can take weeks, and by the time it is signed the conversation is often over. Click through, see the numbers on your own data, then we have the pilot conversation.

How it works

  1. Convert your logs to the canonical JSON schema.

    Each frame has three parts: the ego state, a list of obstacles, and an optional recorded next state for side-by-side comparison. All values are in SI units. See the schema doc; most teams finish the conversion in a day.

  2. Pull the Docker image and your license token.

    One image, one signed token. The token carries your scenario cap and expiry, and the binary refuses to run without a valid token.

  3. Run, air-gapped, on your hardware.
    docker run --rm --network=none \
      -e FIELDSPACE_LICENSE="<token>" \
      -e FIELDSPACE_EULA_ACCEPT=1 \
      -v "$PWD/data":/data:ro -v "$PWD/out":/out \
      ghcr.io/jackalkahwati/fieldspace-eval:demo \
      --input /data --output /out

  4. Read the report.

    Per-scenario metric rows plus a compact summary JSON in the same format as our public nuPlan benchmark, covering collision avoidance, time-to-collision, comfort, progress along the expert route, speed-limit compliance, and per-step compute latency. The report also includes a SHA256 manifest, the tool's git commit, and your watermark.
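Step 1 is mostly a mapping exercise. The Python sketch below shows the shape of that mapping for a single frame, including the SI conversions most in-house logs need. It is a sketch under loud assumptions: the input log format (degrees, km/h) and every output key (`ego`, `obstacles`, `recorded_next_ego`, `speed_mps`, and so on) are hypothetical placeholders, not the canonical schema, so take the real key names from the schema doc.

```python
import json
import math


def kmh_to_ms(speed_kmh: float) -> float:
    """Convert km/h to m/s (SI)."""
    return speed_kmh / 3.6


def make_frame(t_s, ego, obstacles, recorded_next=None):
    """Map one frame of a hypothetical in-house log into a canonical-style dict.

    The input log is assumed to store positions in metres, heading in degrees
    and speed in km/h; heading and speed are converted to SI here. All output
    key names are placeholders, not the real schema.
    """
    frame = {
        "timestamp_s": t_s,
        # 1. Ego state
        "ego": {
            "x_m": ego["x"],
            "y_m": ego["y"],
            "heading_rad": math.radians(ego["heading_deg"]),
            "speed_mps": kmh_to_ms(ego["speed_kmh"]),
        },
        # 2. List of obstacles (may be empty)
        "obstacles": [
            {
                "id": str(o["id"]),
                "x_m": o["x"],
                "y_m": o["y"],
                "speed_mps": kmh_to_ms(o["speed_kmh"]),
            }
            for o in obstacles
        ],
    }
    # 3. Optional recorded next ego state, used only for side-by-side comparison
    if recorded_next is not None:
        frame["recorded_next_ego"] = {
            "x_m": recorded_next["x"],
            "y_m": recorded_next["y"],
        }
    return frame


if __name__ == "__main__":
    frame = make_frame(
        0.0,
        {"x": 1.0, "y": 2.0, "heading_deg": 90.0, "speed_kmh": 36.0},
        [{"id": 7, "x": 12.0, "y": 2.0, "speed_kmh": 18.0}],
    )
    print(json.dumps(frame, indent=2))
```

Keeping unit conversion in one small helper makes it easy to audit that nothing non-SI reaches the output file.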

Your data stays yours. Our source stays ours.

YOUR SIDE
  • Your scenario data never leaves your environment.
  • Container runs with --network=none. Zero outbound calls.
  • Report output is yours, on your filesystem.
  • You decide what (if anything) you share back with us.

OUR SIDE
  • Compiled Rust binary in a minimal image. No source files.
  • Stripped release build, no debug symbols.
  • Output watermarked with your license ID (deterministic; metric values unaffected).
  • EULA prohibits reverse engineering and use of outputs to train competing planners.

The whole point of the eval-kit is friction-free evaluation. The protections above are the mechanical floor; the EULA is the legal floor. The full EULA ships in the image as EULA-demo.md and is displayed on first run.

What the report contains

{
  "fieldspace": {
    "runner": { "rows": 50, "succeeded": 50, "scenario_names": [...],
                "mean_compute_trajectory_runtime_s": 0.0084 },
    "metric_means": {
      "no_ego_at_fault_collisions":      { "metric_score": 0.977 },
      "time_to_collision_within_bound":  { "metric_score": 0.922 },
      "speed_limit_compliance":          { "metric_score": 1.000 },
      "ego_is_comfortable":              { "metric_score": 0.938 },
      "ego_progress_along_expert_route": { "metric_score": 0.966 }
    }
  },
  "_meta": {
    "tool_version": "0.1.0",
    "git_commit":   "7b64da7...",
    "license_id":   "<your license id>",
    "oem_id":       "<your org>",
    "tier":         "demo",
    "watermark":    "<sha256 fingerprint>"
  }
}

The summary uses the same key set as reproducibility/nuplan_official_neural_64_summary_compact.json, so any tool that already reads the public benchmark file can read your output without changes.
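Because the key sets match, lining your run up against the public numbers is a few dictionary lookups. A minimal Python sketch, assuming both files nest scores as `<planner>.metric_means.<metric>.metric_score` the way the example above does and that the public file has a `fieldspace` block; the helper names are hypothetical, and the path of your own summary inside the output directory is not specified here.

```python
import json


def load_summary(path):
    """Read a compact summary JSON (yours or the public benchmark file)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def metric_scores(summary, planner="fieldspace"):
    """Flatten metric_means into {metric_name: score} for one planner block."""
    means = summary[planner]["metric_means"]
    return {name: entry["metric_score"] for name, entry in means.items()}


def key_set_diff(ours, public, planner="fieldspace"):
    """Return (missing_from_ours, extra_in_ours); both empty when key sets match."""
    a = set(ours[planner]["metric_means"])
    b = set(public[planner]["metric_means"])
    return sorted(b - a), sorted(a - b)


def side_by_side(ours, public, planner="fieldspace"):
    """{metric: (your score, public score)} for every metric both reports share."""
    a = metric_scores(ours, planner)
    b = metric_scores(public, planner)
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys())}


def print_comparison(our_path, public_path, planner="fieldspace"):
    """Print one row per shared metric; paths are supplied by the caller."""
    ours, public = load_summary(our_path), load_summary(public_path)
    missing, extra = key_set_diff(ours, public, planner)
    if missing or extra:
        print(f"key mismatch: missing={missing} extra={extra}")
    for metric, (mine, theirs) in side_by_side(ours, public, planner).items():
        print(f"{metric:35s} yours={mine:.3f} public={theirs:.3f}")
```

Checking the key sets first turns a silent format drift into an explicit message instead of a missing row.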

Request a demo license

Click-through EULA, no signed NDA. We respond within one business day with a license token and the Docker image link. The token expires after 30 days and covers up to 50 scenarios. A pilot license removes the scenario cap and runs on an annual, renewable term.
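Since the token enforces both limits, it can save a wasted run to check a batch locally before starting it. A minimal Python sketch, assuming one `*.json` file per scenario in the input directory and an issue date you note when the token arrives; neither the file layout nor these helper names come from the eval-kit, and the signed token remains the authority on both limits.

```python
from datetime import date, timedelta
from pathlib import Path

# Demo-tier limits as stated above; the signed token remains authoritative.
DEMO_SCENARIO_CAP = 50
DEMO_TERM_DAYS = 30


def count_scenarios(input_dir):
    """Count scenarios, assuming one *.json file per scenario (hypothetical layout)."""
    return sum(1 for p in Path(input_dir).glob("*.json") if p.is_file())


def preflight(n_scenarios, issued_on, today,
              cap=DEMO_SCENARIO_CAP, term_days=DEMO_TERM_DAYS):
    """Return a list of problems; an empty list means the batch fits the demo limits.

    Expiry is treated as issued_on + term_days, exclusive. This is an
    assumption about how the 30-day window is counted.
    """
    problems = []
    if n_scenarios > cap:
        problems.append(f"{n_scenarios} scenarios exceeds the cap of {cap}")
    expires_on = issued_on + timedelta(days=term_days)
    if today >= expires_on:
        problems.append(f"license expired on {expires_on.isoformat()}")
    return problems
```

A typical call would be `preflight(count_scenarios("data"), issued_on, date.today())`, where `issued_on` is the date you received the token.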

Email eval-kit@fieldspacetech.com with:

  • Your name and organization
  • An OEM identifier we should bind the license to (we'll bake it into the token)
  • A one-line description of what data you plan to evaluate against

We do not store, log, or ask for your evaluation data — only the metadata above. The eval-kit runs entirely in your environment.

Already convinced you want a pilot? Email the same address with "pilot" in the subject line and we'll send the NDA template.

This is the v0.1 demo tier. A production JWT issuance pipeline, CI gating that verifies runs are tcpdump-clean, robust output watermarking, and broader adapter coverage are tracked as follow-ons. The eval-kit's mathematical core and metric format are stable.