filepacks

The open artifact layer for agent-generated work.
Portable evidence for humans. Deterministic artifacts for agents.

filepacks is an open-source CLI, format, and core library that turns agent runs, eval outputs, and CI files into deterministic .fpk artifacts that stay portable, inspectable, and comparable across workflows.


Runs become artifacts. Artifacts become reviewable evidence.

filepacks gives agent workflows an output layer: deterministic .fpk files that are portable enough for handoff, inspectable enough for review, and stable enough for repeated automation.

agent-run-42/
report.md
results.json
trace.log
eval.csv
↓ pack
run-42.fpk · 4 files · sha256

Give every run a stable artifact identity.

Pack the files an agent, eval, or CI job produced into one .fpk with canonical ordering and per-file digests. Same logical input, same artifact bytes.
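As a rough illustration of what "canonical ordering and per-file digests" can mean in practice, here is a minimal Python sketch. It is not the filepacks implementation or its manifest schema: the `build_manifest` helper, the `files` key, and the entry fields are assumptions made for this example. It only shows why sorting paths and fixing the JSON encoding gives the same bytes for the same logical input.

```python
import hashlib
import json


def build_manifest(files: dict[str, bytes]) -> bytes:
    """Return byte-stable manifest JSON for a mapping of relative path -> content.

    Hypothetical layout for illustration; the real .fpk manifest may differ.
    """
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(files[path]).hexdigest(),
            "size": len(files[path]),
        }
        for path in sorted(files)  # canonical ordering: sort entries by path
    ]
    # sort_keys and fixed separators remove encoding differences between runs
    return json.dumps(
        {"files": entries}, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


# Same files, collected in a different order
run_a = {"report.md": b"# Report\n", "results.json": b"{}\n"}
run_b = {"results.json": b"{}\n", "report.md": b"# Report\n"}

assert build_manifest(run_a) == build_manifest(run_b)
```

Because the manifest bytes do not depend on collection order, a digest over them can serve as a stable identity for the run.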

manifest.json
report.md     a3f2…c1d8
results.json  9b1e…44fa
trace.log     d07c…8821
eval.csv      fc40…2b6e
✓ verified

Keep review grounded in the actual output.

The manifest is structured enough for agents. The payload stays readable for humans. Review starts from the artifact itself instead of logs, screenshots, or summary text.

baseline.fpk
latest.fpk
report.md
+ eval.csv
~ results.json
- trace.log

Treat baseline vs. latest as a first-class workflow.

Diff artifacts across prompt, model, tool, or code changes. Surface real output drift before behavior reaches users or becomes the next accepted baseline.

report.md
results.json
trace.log
run.fpk
sealed · sha2…

Turn loose output into one portable review unit.

One command turns a directory into a self-contained .fpk with per-file SHA-256 digests, ready for local review, CI handoff, or downstream automation.

baseline.fpk
latest.fpk
diff
+ report.md
~ results.js…

See exactly what changed between runs.

Use structural diffs as a reusable review primitive for agents, eval harnesses, and release checks instead of ad hoc directory comparisons.
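To make the idea of a structural diff concrete, here is a small Python sketch that compares two path-to-digest mappings and reports added, removed, and changed files. The `diff_manifests` function and the placeholder digest strings are assumptions for illustration, not the filepacks API or its output format.

```python
def diff_manifests(
    baseline: dict[str, str], latest: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two path -> digest mappings (hypothetical helper)."""
    added = sorted(latest.keys() - baseline.keys())
    removed = sorted(baseline.keys() - latest.keys())
    # A file counts as changed when it exists on both sides with different digests
    changed = sorted(
        path
        for path in baseline.keys() & latest.keys()
        if baseline[path] != latest[path]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder digests; real ones would be SHA-256 hex strings
baseline = {"report.md": "aaa", "results.json": "bbb", "trace.log": "ccc"}
latest = {"report.md": "aaa", "results.json": "bbb2", "eval.csv": "ddd"}

result = diff_manifests(baseline, latest)
for path in result["added"]:
    print("+", path)
for path in result["changed"]:
    print("~", path)
for path in result["removed"]:
    print("-", path)
```

Comparing digests rather than walking two directory trees keeps the result independent of timestamps, file order, and where the artifacts were unpacked.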

One CLI. Four public commands.

pack · inspect · verify · compare: a narrow open-source surface for creating, checking, and diffing deterministic .fpk artifacts.

pack

npx filepacks pack ./run --output run.fpk

Seal a run into a portable artifact.

Turn an agent, eval, or CI output directory into a deterministic .fpk with a canonical manifest and preserved payload files.

inspect

Inspect
name=run
files=42

Read what was packaged before you trust it.

Inspect manifest identity, scale, and packaged files without unpacking into a working tree or relying on a separate dashboard.

verify

Verify
ok=true
files_checked=42

Check that the evidence still matches the manifest.

Validate archive structure, payload hashes, and manifest integrity before using an artifact as a baseline or handing it to another step.
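The payload-hash part of that check can be sketched in a few lines of Python. This is an assumption-laden illustration, not the filepacks verifier: the `verify` function, its in-memory inputs, and the `problems` field are hypothetical, and the real command also validates archive structure, which is not shown here. The `ok` and `files_checked` names simply mirror the sample output above.

```python
import hashlib


def verify(manifest: dict[str, str], payload: dict[str, bytes]) -> dict:
    """Check payload bytes against a path -> sha256 hex mapping (hypothetical helper)."""
    problems = []
    for path in sorted(manifest):
        if path not in payload:
            problems.append(f"missing: {path}")
        elif hashlib.sha256(payload[path]).hexdigest() != manifest[path]:
            problems.append(f"digest mismatch: {path}")
    # Files present in the payload but absent from the manifest are also suspect
    for path in sorted(payload.keys() - manifest.keys()):
        problems.append(f"unlisted: {path}")
    return {"ok": not problems, "files_checked": len(manifest), "problems": problems}


payload = {"report.md": b"# Report\n"}
manifest = {"report.md": hashlib.sha256(b"# Report\n").hexdigest()}

clean = verify(manifest, payload)
tampered = verify(manifest, {"report.md": b"# Edited\n"})
```

Here `clean["ok"]` is true, while `tampered` reports a digest mismatch for `report.md`, which is the signal to stop before promoting an artifact to a baseline.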

compare

- baseline/report.md
+ latest/report.md
~ eval/results.json

Diff baseline versus candidate output.

Surface added, removed, and changed files across two artifacts with a stable exit code that fits CI gates and agent review loops.

Use cases. For teams shipping with agents.

Capture the run, verify the artifact, compare the next one, and hand the same evidence to humans or automation.

baseline.fpk
  │
  ├─ prompt
  ├─ output
  └─ scores

latest.fpk
  │
  └─ Δ output

Keep comparable baselines across prompts, models, and tools.

Store each eval run as a deterministic artifact so output drift is reviewable, diffable, and reusable across experiments.

agent run
   │
review diff
   │
   ├─ approve
   └─ hold

Review agent-generated output before it reaches users.

Compare a candidate artifact against the accepted run and decide from the actual files, not a polished summary of what the agent claims it did.

ci run
  │
pack .fpk
  │
  ├─ verify ✓
  └─ compare Δ

Build review pipelines on an open artifact format.

Use .fpk as a shared unit across local tooling, CI, and downstream automation without locking artifact handling into a single vendor surface.

Start with one run. Keep the evidence.

Run the open-source CLI on one output directory, inspect the artifact, then compare the next run against it. That is how reviewable, reusable agent workflows start.