# filepacks

Deterministic artifact packaging for AI agent runs, eval outputs, CI evidence, and structural diff workflows.

filepacks turns a directory of generated files into a portable `.fpk` artifact that you can inspect, verify, compare, and hand off as durable evidence. It is built for agent systems, evaluation pipelines, and CI workflows that need stable artifacts instead of disposable logs.

[Human view](/) · [Agent markdown](/agent.md) · [Plain text](/agent.txt) · [Docs](/docs) · [GitHub](https://github.com/robertonevarez/filepacks-oss)

---

## What filepacks is

filepacks packages run outputs into portable `.fpk` bundles. A `.fpk` file is a POSIX tar archive containing a canonical `manifest.json` and a `payload/` directory of normalized files.

The manifest records every file path, size, and hash. The payload stores the actual output files. The archive bytes are deterministic: the same input tree produces the same artifact every time.
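To make the determinism claim concrete, here is a minimal sketch of how manifest entries could be derived so that they depend only on file paths and contents. This is not the filepacks implementation: `buildFileEntries`, the `FileEntry` type, and the `sha256:<hex>` hash encoding are assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape mirroring the path, size, and hash fields the
// manifest is described as recording. The "sha256:<hex>" encoding is assumed.
interface FileEntry {
  path: string;
  size: number;
  hash: string;
}

// Build entries that depend only on paths and contents: no timestamps,
// no ownership data, no filesystem iteration order.
function buildFileEntries(files: Map<string, Uint8Array>): FileEntry[] {
  // Compare by code unit instead of localeCompare so the order does not
  // vary with the machine's locale.
  const sorted = Array.from(files.entries()).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  return sorted.map(([path, bytes]) => {
    const hex = createHash("sha256").update(bytes).digest("hex");
    return { path, size: bytes.length, hash: `sha256:${hex}` };
  });
}
```

Two maps holding the same files in a different insertion order produce identical entries, which is the property a byte-for-byte reproducible archive needs from its metadata.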

You can use filepacks to:

- capture the output of an agent run or eval pipeline in one portable file
- verify that an artifact is well-formed and untampered before trusting it
- compare two artifacts structurally to detect regressions, drift, or changes
- hand off evidence to a human reviewer or another agent without hidden state

---

## Who it is for

- AI platform teams building agent or evaluation infrastructure
- eval and prompt engineering teams comparing outputs across prompts, models, and tools
- agent product teams reviewing baseline vs latest behavior before release
- developers building CI or regression workflows around LLM-generated files and reports

---

## Problems it solves

- agent outputs are difficult to audit when evidence lives in logs, traces, or temporary storage
- eval results drift across prompt, model, tool, or code changes without a stable baseline
- CI artifacts are often unstructured and hard to compare across runs
- humans and agents need shared evidence they can both inspect without a separate dashboard

---

## Core concepts

- **deterministic artifacts** — the same input tree yields the same packaged evidence every time
- **structural diffs** — compare files and metadata across runs, not only narrative summaries
- **regression detection** — surface output drift before release or deployment
- **baseline vs latest** — compare new evidence against a known-good reference artifact
- **portable evidence** — one file instead of a loose directory tree or ephemeral log stream

---

## Artifact format

A `.fpk` file is a POSIX tar archive with this structure:

```
artifact.fpk
├── manifest.json     canonical metadata document
└── payload/          normalized output files
    ├── outputs/
    ├── traces/
    └── ...
```

The manifest records:

- `name` — artifact identifier
- `version` — format version
- `digest` — sha256 digest of the archive
- `files` — array of entries, each with `path`, `size`, and `hash`

The manifest is the verification and comparison primitive. Verification checks that the archive is well-formed and that every payload file's hash matches its manifest entry. Comparison diffs two manifests to identify added, removed, and changed files.
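The comparison step can be sketched as a diff over two file lists keyed by path. The `Manifest`, `FileEntry`, and `ManifestDiff` types and the `diffManifests` function below are hypothetical, modeled on the fields listed above; the real schema and comparison rules may differ.

```typescript
// Hypothetical shapes based on the manifest fields listed above.
interface FileEntry {
  path: string;
  size: number;
  hash: string;
}

interface Manifest {
  name: string;
  version: string;
  digest: string;
  files: FileEntry[];
}

interface ManifestDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Classify every path as added, removed, or changed between two manifests.
// A file counts as changed when its hash or size differs.
function diffManifests(baseline: Manifest, candidate: Manifest): ManifestDiff {
  const before = new Map<string, FileEntry>();
  for (const file of baseline.files) before.set(file.path, file);
  const afterPaths = new Set<string>(candidate.files.map((file) => file.path));

  const diff: ManifestDiff = { added: [], removed: [], changed: [] };
  for (const file of candidate.files) {
    const previous = before.get(file.path);
    if (previous === undefined) {
      diff.added.push(file.path);
    } else if (previous.hash !== file.hash || previous.size !== file.size) {
      diff.changed.push(file.path);
    }
  }
  for (const file of baseline.files) {
    if (!afterPaths.has(file.path)) diff.removed.push(file.path);
  }

  // Sort so the result does not depend on the order of the input lists.
  diff.added.sort();
  diff.removed.sort();
  diff.changed.sort();
  return diff;
}
```

Because the diff only reads manifests, a sketch like this never needs to unpack payload bytes to decide whether two artifacts differ.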

---

## CLI interface

```text
filepacks pack <input> --output <file>
# Package a directory into a deterministic .fpk artifact.

filepacks inspect <file>
# Read manifest metadata and print a compact summary.

filepacks verify <file>
# Validate archive structure, manifest, and payload hashes.

filepacks compare <baseline> <candidate>
# Compute a structural diff between two artifacts.
```

### Exit codes

| Command   | Exit 0         | Exit 1 | Exit 20        |
| --------- | -------------- | ------ | --------------- |
| pack      | success        | error | —               |
| inspect   | success        | error | —               |
| verify    | valid          | error | —               |
| compare   | identical      | error | changed         |

Exit code `20` from `compare` means the artifacts differ. This is a stable, scriptable signal for CI gates and agent decision loops.
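A CI gate or agent loop might branch on these codes roughly as follows. This is a sketch under assumptions: `interpretCompareExit`, `runCompare`, and the `CompareOutcome` type are hypothetical names, and the wrapper assumes a `filepacks` executable is on `PATH`.

```typescript
import { spawnSync } from "node:child_process";

type CompareOutcome = "identical" | "changed" | "error";

// Map a process exit status to the outcomes in the table above.
// Anything other than 0 or 20 is treated as an error, including null,
// which Node reports when the process was killed by a signal or could
// not be started.
function interpretCompareExit(status: number | null): CompareOutcome {
  if (status === 0) return "identical";
  if (status === 20) return "changed";
  return "error";
}

// Run `filepacks compare` and classify the result.
// Assumes `filepacks` is installed and resolvable on PATH.
function runCompare(baseline: string, candidate: string): CompareOutcome {
  const result = spawnSync("filepacks", ["compare", baseline, candidate], {
    encoding: "utf8",
  });
  return interpretCompareExit(result.status);
}
```

Keeping the mapping in a pure function makes the decision logic easy to test without invoking the CLI; a gate would typically fail on `"error"` and route `"changed"` to review.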

### Output format

Commands print machine-readable key=value lines:

```text
# pack / inspect
input=./run
output=run.fpk
name=run
digest=sha256:...
files=12
bytes=40960

# verify
ok=true
files_checked=12

# compare
ok=false
added=1
removed=0
changed=2
added_file=payload/traces/run.log
changed_file=payload/outputs/summary.json
changed_file=payload/metrics.json
```
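Since keys such as `changed_file` can appear more than once, a parser for this output should collect values per key rather than overwrite them. The sketch below shows one way to do that; `parseKeyValueLines` is a hypothetical helper, and skipping lines that start with `#` is an assumption in case the section labels shown above appear in captured output.

```typescript
// Parse key=value lines into a map of key -> list of values. Values are
// kept as arrays because keys such as changed_file can repeat.
function parseKeyValueLines(output: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const rawLine of output.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and "#" label lines (assumed not to carry data).
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", so not a key=value line
    const key = line.slice(0, eq);
    const value = line.slice(eq + 1); // keep any later "=" inside the value
    const values = result.get(key);
    if (values !== undefined) {
      values.push(value);
    } else {
      result.set(key, [value]);
    }
  }
  return result;
}

// Example: read the compare output shown above.
const sampleOutput = [
  "ok=false",
  "added=1",
  "removed=0",
  "changed=2",
  "added_file=payload/traces/run.log",
  "changed_file=payload/outputs/summary.json",
  "changed_file=payload/metrics.json",
].join("\n");

const parsedSample = parseKeyValueLines(sampleOutput);
const changedFiles = parsedSample.get("changed_file") ?? [];
console.log(`changed files: ${changedFiles.join(", ")}`);
```

Splitting on the first `=` only means values that themselves contain `=` survive intact.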

---

## Typical workflows

**1. Capture agent run output**

```sh
filepacks pack ./run-output --output run-$(date +%s).fpk
```

**2. Verify before trusting**

```sh
filepacks verify run.fpk
# exits 0 if the artifact is valid, 1 on error
```

**3. Compare against accepted baseline**

```sh
filepacks compare baseline.fpk latest.fpk
# exits 0 if identical, 20 if changed
```

**4. Surface a structural diff for review**

```sh
filepacks compare eval-v1.fpk eval-v2.fpk
# prints per-file lines such as added_file= and changed_file= for review
```

**5. Programmatic use via `@filepacks/core`**

```ts
import { pack, inspect, verify, compare } from "@filepacks/core"

// Package the run directory into a deterministic .fpk artifact.
const artifact = await pack({ input: "./run", output: "run.fpk" })

// Compare the new artifact against the accepted baseline.
const result = await compare("baseline.fpk", "run.fpk")
if (result.changed) {
  // surface diff to reviewer or fail CI gate
}
```

---

## Human and agent review

Agents operate on artifacts directly through `pack`, `inspect`, `verify`, and `compare`. Humans inspect the same manifest and payload files through the same commands. There is no separate dashboard state, no hidden context, and no out-of-band summary that the other party cannot see.

Review stays grounded in shared, portable evidence.

---

## What filepacks is not

- not generic file sharing or cloud storage
- not a full MLOps platform or experiment tracker
- not a streaming log or observability tool
- not a hosted review UI (the web app is only a review surface; the artifact remains the source of truth)
- not a registry or remote artifact store (v0 is local and portable; cloud features are not in the OSS surface)

---

## Current status

- `.fpk` format is stable as of v0.1.0
- CLI commands `pack`, `inspect`, `verify`, and `compare` are in the public OSS surface
- Determinism is a core guarantee: same input tree, same archive bytes
- `@filepacks/core` exposes the same operations as a programmatic API
- Not yet in scope for v0: registries, remote storage, sync, tags, aliases, typed eval artifacts, hosted dashboards

---

## Suggested agent tasks

If you are an agent evaluating filepacks, here are useful things to try:

1. Run `npx filepacks pack ./some-dir --output test.fpk` and inspect the resulting archive.
2. Run `npx filepacks inspect test.fpk` and parse the key=value output.
3. Run `npx filepacks verify test.fpk` and confirm exit code 0.
4. Create a second pack from a modified directory and run `npx filepacks compare test.fpk test2.fpk`.
5. Check that exit code 20 is returned when files differ.
6. Use `@filepacks/core` to integrate pack/compare into an agent harness.

---

## Links

- [Docs](/docs)
- [CLI reference](/docs/cli)
- [Quickstart](/docs/quickstart)
- [GitHub](https://github.com/robertonevarez/filepacks-oss)
- [Human view](/)
- [Agent markdown](/agent.md)
- [Plain text](/agent.txt)
