What is the difference between Eve and Flue?

Eve and Flue are both TypeScript frameworks where an agent is a directory of files, tools and skills are files, and sessions are durable. The difference is posture. Eve is Vercel's framework: it scaffolds an ordinary Vercel project and deploys with vercel deploy, leaning on Vercel Sandbox and Vercel Connect when hosted there. Flue is the Astro team's framework: it is built to run any model and deploy anywhere, from Node and Cloudflare Workers to GitHub Actions, and its 1.0 beta announcement describes zero lock-in. Pick by where you deploy and how much vendor coupling you will accept, not by feature count.

Can I run Flue on any LLM and any host?

That is Flue's stated goal. Its 1.0 beta announcement says to connect any LLM, build your agent, and deploy it anywhere, and lists Node.js, Cloudflare Workers, GitHub Actions, GitLab CI/CD, Daytona, and Render as deploy targets. The model is a field on the agent, so swapping providers is a configuration change at the harness rather than a rewrite. Flue is built on the Pi agent core and ships its own sandbox, persistence, and observability adapters, so the runtime does not assume a single host.

What does agent-as-a-directory mean?

Agent-as-a-directory means the agent's behavior lives in files on disk in conventional locations rather than in one large configuration object. In Eve, an agent directory holds instructions.md for the system prompt, a tools folder of typed functions, a skills folder of markdown procedures, channels, and schedules. Flue takes the same shape: tools are TypeScript modules and skills are SKILL.md markdown files imported into the agent. The payoff is that the project is inspectable and diffable in version control, and a coding agent can read and extend it the way it reads any codebase.

Eve vs Flue: Which TypeScript Agent Framework?

Is Vercel Eve locked to Vercel?

Not strictly. Eve deploys to Vercel at launch and is smoothest there, where Vercel Sandbox and Vercel Connect handle isolation and OAuth for you. But the sandbox is adapter-based: locally it runs on Docker, microsandbox, or just-bash, and Vercel's docs invite you to write an adapter for any other provider. The npm description even calls it a framework for agents that run anywhere. The realistic lock-in is gravitational, not contractual. The defaults pull you toward Vercel platform primitives, and re-creating them elsewhere is work you would own.

Are Eve and Flue production-ready?

Both are young and labeled pre-release, so treat either as a bet rather than a settled standard. Eve is in public preview under Vercel's beta terms at version 0.11.4, with APIs that may change before general availability, though Vercel says it runs more than a hundred agents in production internally. Flue is at 1.0.0-beta.1, published in mid-June 2026, with a public repository that has existed since February 2026. Pin exact versions, expect breaking changes, and keep your own evals and rollback path regardless of which you choose.

If you deploy on Vercel and want the platform to carry durability, sandboxing, and channels, reach for Eve. If running on any model and any host without vendor coupling is a hard requirement, reach for Flue. Both frameworks model an agent the same way, as a directory of files, so the decision is not really about features. It is about deployment gravity and how much lock-in you will accept.

Eve and Flue both reached public milestones as TypeScript agent frameworks in mid-June 2026. Eve is Vercel's: it went public on 16 June 2026 as a preview under Vercel's beta terms, at version 0.11.4. Flue is the Astro team's: its repository has been public since February 2026, and it reached its 1.0 beta on 16 June 2026. Both are Apache-2.0 licensed. This article targets Eve 0.11.x and Flue 1.0.0-beta.1, and because both are pre-release, pin exact versions and expect APIs to move.

TL;DR: Eve or Flue?

Choose by deployment posture and lock-in tolerance, not by a feature checklist, because the feature lists overlap almost completely. Both give you an agent that is a directory, tools and skills as files, durable sessions that survive crashes, a built-in sandbox, subagents, human-in-the-loop approvals, and channel integrations for Slack, GitHub, and others. Eve is the platform option: it scaffolds an ordinary Vercel project, deploys with vercel deploy, and is at its smoothest when Vercel Sandbox and Vercel Connect handle isolation and auth for you. Flue is the runtime-agnostic option: it is built on the Pi agent core to connect any model and deploy anywhere, from Node to Cloudflare Workers to GitHub Actions, and its own announcement frames it as zero lock-in. If your infrastructure already lives on Vercel, Eve is the shortest path to production. If avoiding runtime and model lock-in is a first-order requirement, the concern this blog keeps returning to in its piece on the model-agnostic fallback layer, Flue is the safer bet. Eve's public preview (0.11.x) and Flue's 1.0 beta are both only days old, so neither is a settled standard yet. And if you have not yet decided whether you need a framework at all rather than the Vercel AI SDK, start with "Do you need an agent framework yet?" and come back here once you do.

A side-by-side comparison of Eve and Flue

Before the prose, here is the shape of the decision in one table. Every cell below is drawn from each project's README, official docs, or launch post, verified in mid-June 2026.

| Dimension | Eve (Vercel) | Flue (Astro team) |
| --- | --- | --- |
| Positioning | "The framework for building agents," filesystem-first | "The sandbox agent framework," a programmable harness |
| Core API | `defineAgent`, `defineTool`, `defineSchedule`, `defineChannel` | `createAgent(() => ({...}))` |
| Skills | Markdown files in a `skills/` directory | `SKILL.md` markdown imported with `with { type: 'skill' }` |
| Scaffold | `npx eve@latest init my-agent` | `npx flue init --target node` |
| Deploy target | Vercel at launch, adapter-based | Node, Cloudflare Workers, GitHub Actions, GitLab CI/CD, Daytona, Render |
| Sandbox | Vercel Sandbox when hosted; Docker, microsandbox, or just-bash locally | Built-in virtual sandbox; local or remote container options |
| Durability | Durable execution via the open-source Workflow SDK | Durable execution in `@flue/runtime` |
| Model | A field on the agent (example: `anthropic/claude-sonnet-4.6`) | A field on the agent (example: `anthropic/claude-sonnet-4-6`) |
| Channels | Slack, Discord, Teams, Telegram, Twilio, GitHub, Linear | Slack, Teams, Discord, GitHub, and more |
| Persistence | Vercel platform primitives | `@flue/postgres` adapter, plus pluggable stores |
| Observability | Built-in tracing and evals | OpenTelemetry, Braintrust, Sentry, or a custom observer |
| Built on | Vercel AI SDK, Workflow SDK, Vercel Sandbox | Pi agent core (`@earendil-works/pi-*`) |
| Maturity (mid-2026) | Public preview, v0.11.4, repo days old | 1.0.0-beta.1, repo public since February 2026 |
| License | Apache-2.0 | Apache-2.0 |

The table makes the convergence obvious. Read across most rows and the two frameworks are doing the same job with different vocabulary. The rows that actually separate them are deploy target, sandbox backend, and what each one is built on. Those three are where the recommendation comes from, so the rest of the article digs into them.

For years, the fastest way to build an agent was to make a series of raw model calls in a request handler, wire up your own tool dispatch, and store conversation state in whatever database was nearest. That works for a scripted chatbot. It falls apart the moment the agent needs to run for minutes, survive a deploy, pause for a human, or be understood by a teammate six weeks later. The plumbing becomes the project, and every team rebuilds the same plumbing.

Eve and Flue both answer that with the same structural idea: the agent's behavior lives in files, in conventional places, so the project is inspectable and diffable rather than buried in one configuration blob. Vercel's docs call it filesystem-first. In Eve, a typical agent looks like this.

my-agent/agent/

my-agent/
└── agent/
    ├── agent.ts            # Optional: model and runtime config
    ├── instructions.md     # Required: the always-on system prompt
    ├── tools/              # Optional: typed functions the model can call
    │   └── get_weather.ts
    ├── skills/             # Optional: procedures loaded on demand
    │   └── plan_a_trip.md
    ├── channels/           # Optional: message channels (HTTP, Slack, Discord)
    │   └── slack.ts
    └── schedules/          # Optional: recurring cron jobs
        └── weekly_recap.ts

The directory is the authoring interface. A tool is a file whose name becomes the tool name, a skill is a markdown file, the system prompt is instructions.md, and a schedule is a file that exports a cron job. There is no central registry to keep in sync, because the filesystem is the registry. This is the same instinct that made file-based routing stick in web frameworks: the layout on disk is the source of truth, so the structure is legible to both humans and the coding agents that increasingly write these projects.
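To make "the filesystem is the registry" concrete, here is a minimal sketch of how a loader could derive capability names from paths alone. The function `toRegistry` and its types are hypothetical illustrations, not Eve's internals; how Eve actually discovers files is not shown in its README.

```typescript
// Hypothetical sketch: build a registry from file paths alone.
// This is NOT Eve's loader; it only illustrates the naming convention.

type Kind = "tools" | "skills" | "channels" | "schedules";

interface RegistryEntry {
  kind: Kind;
  name: string; // file name without its extension
  path: string;
}

const KINDS: readonly Kind[] = ["tools", "skills", "channels", "schedules"];

function toRegistry(paths: string[]): RegistryEntry[] {
  const entries: RegistryEntry[] = [];
  for (const path of paths) {
    const parts = path.split("/");
    const file = parts[parts.length - 1];
    const folder = parts[parts.length - 2];
    const kind = KINDS.find((k) => k === folder);
    if (!kind || !file) continue; // not inside a conventional folder: skip
    const dot = file.lastIndexOf(".");
    const name = dot > 0 ? file.slice(0, dot) : file;
    entries.push({ kind, name, path });
  }
  return entries;
}

// instructions.md is skipped here because it is not inside a capability folder.
const registry = toRegistry([
  "agent/instructions.md",
  "agent/tools/get_weather.ts",
  "agent/skills/plan_a_trip.md",
  "agent/schedules/weekly_recap.ts",
]);
```

Renaming get_weather.ts renames the tool, and deleting the file removes it. That is the whole appeal: there is no second list that can drift out of sync.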

Flue lands on the same shape from the other direction. It does not prescribe a directory tree in its README the way Eve does, but the building blocks are identical in spirit: tools are TypeScript modules, and skills are SKILL.md markdown files you import into the agent. Here the boundary between the two starts to show. Eve hands you a fixed, opinionated layout out of the scaffold. Flue hands you composable primitives and lets you assemble the harness in code. Same philosophy, different default on how much structure ships with it.

The practical upshot of agent-as-a-directory is the same one we get from any architecture that favors plain files over hidden state. You can read the whole agent in a pull request, you can grep it, and an AI coding tool can extend it without a special integration, which matters more now that onboarding to a codebase with AI tools is part of how teams actually work. The disagreement is not about whether files are the interface. It is about everything that happens when you press deploy.

Eve: the Vercel-first platform

Eve's pitch is that an agent should be as boring to ship as any other backend. The framework is filesystem-first, but the part Vercel leans on hardest is that an Eve agent is an ordinary Vercel project that deploys the way a Next.js app does. That framing, in which the platform carries the hard parts, is why people are calling it "Next.js for agents," and it is the strongest reason to pick it.

defineAgent, defineTool, and tools as files

Let's look at the smallest useful Eve agent. You scaffold it with one command, which creates the directory, installs dependencies, initializes Git, and starts the terminal UI.

npx eve@latest init my-agent

A tool is a single file. The filename becomes the tool name, the inputSchema validates the model's arguments with Zod, and execute is the function the model calls.

agent/tools/get_weather.ts

import { defineTool } from "eve/tools";
import { z } from "zod";

export default defineTool({
  description: "Return mock weather data for a city.",
  inputSchema: z.object({ city: z.string().min(1) }),
  async execute({ city }) {
    return { city, condition: "Sunny", temperatureF: 72 };
  },
});

The model and runtime config live in their own file, separate from the tools and the prompt. Notice that the model is a string, not a hardcoded SDK client, which keeps the same separation between intent and provider that a resilient AI layer depends on.

agent/agent.ts

import { defineAgent } from "eve";

export default defineAgent({
  model: "anthropic/claude-sonnet-4.6",
});

Schedules follow the same one-file-per-thing rule. The README does not show the API, but it is real: defineSchedule ships in the eve/schedules module, documented in Vercel's launch post, and pairs a cron expression with a handler so an agent can run on its own clock and, for example, post a weekly summary to a channel.

agent/schedules/monday-summary.ts

import { defineSchedule } from "eve/schedules";
import slack from "../channels/slack.js";

export default defineSchedule({
  cron: "0 9 * * 1",
  async run(agent) {
    const summary = await agent.run("Summarize last week's incidents.");
    await slack.send(summary);
  },
});

The run(agent) body above is illustrative of the shape rather than copied from the docs, so treat the handler internals as a sketch and the import and cron field as the verified parts. What the file demonstrates is the pattern that runs through all of Eve: a capability is a typed default export in a conventional folder, and the framework wires it in. You do not register the schedule anywhere. You drop the file in schedules/ and it exists.
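The cron string is the part of a schedule file that is easiest to misread, so here is a small sketch that names the five standard cron fields for "0 9 * * 1". The helper `parseCron` is hypothetical and independent of Eve. Which time zone Eve applies to a schedule is not stated in the sources used here, so that is left open.

```typescript
// Hypothetical helper: label the five fields of a standard cron expression.
interface CronFields {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string; // 0 = Sunday, 1 = Monday, ... 6 = Saturday
}

function parseCron(expr: string): CronFields {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${parts.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts as [
    string,
    string,
    string,
    string,
    string,
  ];
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

// Minute 0, hour 9, any day of month, any month, weekday 1:
// 09:00 every Monday.
const weekly = parseCron("0 9 * * 1");
```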

What you get for staying on Vercel, and what you owe it

The real return on Eve is not the API surface. It is what the platform does once the agent is deployed. Durable execution comes from the open-source Workflow SDK, so an agent can checkpoint, pause, and resume across crashes and redeploys without you building a state machine. Human-in-the-loop approvals let the agent pause for a person before an expensive query or a destructive write. Subagents let a root agent delegate to specialists. Tracing and evals are built in, and Vercel's own framing is that if an eval regresses, you can roll a deployment back instantly. Vercel says it runs more than a hundred agents in production internally on this, which is a meaningful signal even discounted as vendor marketing.
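Durable execution is easier to reason about with a toy model. The sketch below is not the Workflow SDK and does not use its API. It assumes synchronous steps and an in-memory Map standing in for durable storage, and every name in it (`Step`, `runDurable`, `CheckpointStore`) is hypothetical. It shows only the core idea: record each step's result, and on resume replay recorded results instead of re-running the work.

```typescript
// Hypothetical sketch of checkpointed execution. Not the Workflow SDK.

type Step = { id: string; run: () => string };

// Stand-in for durable storage; a real system would persist this.
type CheckpointStore = Map<string, string>;

function runDurable(steps: Step[], store: CheckpointStore): string[] {
  const results: string[] = [];
  for (const step of steps) {
    const saved = store.get(step.id);
    if (saved !== undefined) {
      results.push(saved); // finished in an earlier attempt: replay it
      continue;
    }
    const value = step.run(); // may throw, which simulates a crash
    store.set(step.id, value); // checkpoint before moving on
    results.push(value);
  }
  return results;
}

// Usage: the second step crashes once, then succeeds on resume.
const store: CheckpointStore = new Map();
let fetchCalls = 0;
let failOnce = true;

const steps: Step[] = [
  {
    id: "fetch",
    run: () => {
      fetchCalls += 1;
      return "incidents";
    },
  },
  {
    id: "summarize",
    run: () => {
      if (failOnce) {
        failOnce = false;
        throw new Error("simulated crash");
      }
      return "summary";
    },
  },
];

let firstAttemptFailed = false;
try {
  runDurable(steps, store);
} catch {
  firstAttemptFailed = true;
}

// The resumed run skips "fetch" because its result was checkpointed.
const resumed = runDurable(steps, store);
```

The detail that matters is that the fetch step runs once across both attempts. For an agent, that is the difference between resuming a task and repeating a paid model call or a side effect.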

Eve's sandbox deserves a closer look, because it is where the lock-in story is more nuanced than "Eve traps you on Vercel." Eve isolates the code an agent writes and runs, and the backend behind that sandbox is an adapter. When deployed, it runs on Vercel Sandbox. Locally, it runs on Docker, microsandbox, or just-bash, and Vercel's docs explicitly invite you to write an adapter for any other provider. Channels work the same way: the HTTP API is on by default, with Slack, Discord, Teams, Telegram, Twilio, GitHub, and Linear included, and defineChannel for custom ones. Auth to external systems goes through Vercel Connect, which brokers OAuth so the model never sees raw credentials.
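The adapter idea behind the sandbox can be sketched in a few lines. The interface below is an assumption made for illustration: `SandboxAdapter`, `fakeSandbox`, and `pickSandbox` are hypothetical names, and Eve's real adapter contract is not reproduced here. The point is only that agent code talks to one interface while the backend changes by environment.

```typescript
// Hypothetical adapter interface; the names are illustrative, not Eve's API.
interface SandboxAdapter {
  name: string;
  exec(command: string): { stdout: string; exitCode: number };
}

// A fake backend so the sketch runs without Docker or a cloud account.
function fakeSandbox(name: string): SandboxAdapter {
  return {
    name,
    exec: (command) => ({ stdout: `[${name}] ${command}`, exitCode: 0 }),
  };
}

type RuntimeEnv = "deployed" | "local";

// The caller never branches on the backend; it only picks by environment.
function pickSandbox(
  env: RuntimeEnv,
  adapters: Record<RuntimeEnv, SandboxAdapter>,
): SandboxAdapter {
  return adapters[env];
}

const adapters: Record<RuntimeEnv, SandboxAdapter> = {
  deployed: fakeSandbox("vercel-sandbox"),
  local: fakeSandbox("docker"),
};

const localRun = pickSandbox("local", adapters).exec("ls");
const deployedRun = pickSandbox("deployed", adapters).exec("ls");
```

Moving to another provider would mean supplying one more object that satisfies the interface. Writing and maintaining that object is the adapter work you take on when you leave the default host.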

Here is the trade-off stated plainly. Everything is smoothest on Vercel, and the defaults assume Vercel. The npm package describes Eve as a framework for agents that "run anywhere," and the adapter design is genuine, so this is not a contractual cage. The cost is gravitational. The managed sandbox, the brokered auth, the instant rollback, and the durable execution are the reasons to choose Eve, and re-creating that stack on another host means writing and maintaining adapters yourself. You are trading portability for a platform that carries the operational weight. That is a fine trade when you are already on Vercel and a worse one when you are not, which is exactly the kind of toolchain-coupling calculus behind the SpaceX and Cursor vendor-risk piece.

Flue: the runtime-agnostic harness

Flue starts from the opposite premise. Its README opens with "Not another SDK," and its GitHub description is blunter still: "the sandbox agent framework." Where Eve hands you a platform, Flue hands you a programmable harness and assumes you will decide where it runs. The 1.0 beta announcement is explicit about the goal: connect any LLM, build your agent, and deploy it anywhere, with zero lock-in. For teams that treat the host and the model as things they must be able to change, that sentence is the entire pitch.

createAgent and skills as SKILL.md

Flue's core API is a function, not a directory convention. You compose the harness in code with createAgent, passing the model, the tools, the skills, the sandbox, and the instructions. This is the verified shape from the README, and it is createAgent(() => ({...})), not defineAgent.

agents/triage.ts

import { createAgent, type AgentRouteHandler } from '@flue/runtime';
import { local } from '@flue/runtime/node';
import triage from '../skills/triage/SKILL.md' with { type: 'skill' };
import verify from '../skills/verify/SKILL.md' with { type: 'skill' };
import * as githubTools from '../tools/github.ts';

const instructions = `
Triage a bug report end-to-end: reproduce the bug,
diagnose the root cause, verify whether the behavior is
intentional, and attempt a fix.
`;

export const route: AgentRouteHandler = async (_c, next) => next();

export default createAgent(() => ({
  model: 'anthropic/claude-sonnet-4-6',
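  // A module namespace object is not iterable, so collect its exports explicitly.
  // Assumes every export of ../tools/github.ts is a tool.
  tools: Object.values(githubTools),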
  skills: [triage, verify],
  sandbox: local(),
  instructions,
}));

Two details carry Flue's character. First, skills are markdown files imported as modules with import attributes, import triage from '../skills/triage/SKILL.md' with { type: 'skill' }, so reusable expertise is authored in markdown but pulled into the agent through normal TypeScript imports. Second, the sandbox is a value you pass, sandbox: local() here, which makes the runtime environment an explicit, swappable choice rather than a platform assumption. That is the harness philosophy in one line: the agent is assembled from parts you can see and replace.

You install the runtime and the CLI as ordinary dependencies, then scaffold a project for your chosen target.

npm install @flue/runtime
npm install --save-dev @flue/cli
npx flue init --target node

The packages are split by concern: @flue/runtime is the harness, sessions, tools, and sandbox; @flue/cli is the flue binary and dev tooling; @flue/sdk is the client for consuming deployed agents; @flue/opentelemetry and @flue/postgres are the tracing and persistence adapters. A worthwhile caveat on the client side: a @flue/react package is published to npm at the same beta version, but it is not listed in the README's package table, so treat it as undocumented rather than a stable part of the surface and do not build a UI on it expecting stability yet. Eve, for its part, ships eve/react, eve/vue, and eve/svelte client entry points, so both frameworks have front-end bindings in progress and neither has made them the headline.

Deploy anywhere, any model, and the lock-in story

The reason to choose Flue is the inverse of the reason to choose Eve. Nothing in the runtime assumes a single host. The deploy targets in the README are Node.js, Cloudflare Workers, GitHub Actions, GitLab CI/CD, Daytona, and Render, which is a deliberately broad spread across serverless, CI, and container hosts. Persistence is an adapter you choose, with @flue/postgres shipped and the store pluggable. Observability is an adapter too: OpenTelemetry, Braintrust, Sentry, or your own observer. The model is a field, so moving from one provider to another is a configuration change at the harness, not a rewrite of your call sites.
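Because the model is a plain string field, the swap can live in configuration rather than code. A minimal sketch, assuming a hypothetical `AGENT_MODEL` environment variable and a hypothetical `resolveModel` helper; neither comes from Flue, and the override value is a placeholder rather than a real model ID.

```typescript
// Default taken from the createAgent example earlier in this article.
const DEFAULT_MODEL = "anthropic/claude-sonnet-4-6";

// AGENT_MODEL is a hypothetical variable name chosen for this sketch.
function resolveModel(env: Record<string, string | undefined>): string {
  const value = env["AGENT_MODEL"]?.trim();
  return value ? value : DEFAULT_MODEL; // empty or missing falls back
}

const fromDefault = resolveModel({});
const fromBlank = resolveModel({ AGENT_MODEL: "   " });
const fromOverride = resolveModel({
  AGENT_MODEL: "example-provider/example-model", // placeholder ID
});
```

The resolved string would then be passed as the model field when the agent is created. Swapping providers still calls for re-running your evals, since prompts and outputs can drift between models.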

Flue's sandbox is its namesake and worth getting right, because the easy assumption that it is "in-memory bash, no Docker" is not quite what the docs say. Flue ships a built-in virtual sandbox that the docs describe as suitable for many agentic workloads, and the README mentions virtual, local, or remote container sandboxes. So the default is a lightweight virtual environment, with local and remote container options when a workload needs real isolation. It is fair to say Flue does not force a heavyweight container on you for simple cases, but not fair to say it cannot do containers at all.

There is one honest dependency to name. Flue is built on the Pi agent core, visible in its @earendil-works/pi-* dependencies and stated in the 1.0 beta post. That is not lock-in in the platform sense, since Pi is the engine rather than a hosting bill, but it is a foundational dependency whose health matters to Flue's health. "Zero lock-in" is the deployment and model story, and it holds. It does not mean the framework has no foundations of its own. The same model-agnostic reasoning that makes Flue attractive is exactly the discipline argued for in the model-agnostic fallback layer: keep the model and the runtime as things you can swap, and you keep your options when a provider or a platform changes the deal.

How to choose between Eve and Flue

The honest framing is that you are choosing a posture, and the posture should match where your infrastructure already has gravity. Feature-by-feature, these frameworks will keep converging, so a checklist comparison will mislead you. The questions that actually decide it are about deployment and coupling.

Reach for Eve when you are already on Vercel, or planning to be, and you want the platform to carry durability, sandboxing, brokered auth, and channels so your team writes agent behavior instead of agent infrastructure. The payoff is real and immediate: vercel deploy, a managed sandbox, instant rollback on a failed eval, and a hundred-plus internal agents' worth of road-testing behind it. The price you accept is gravitational coupling to Vercel's primitives, mitigated but not erased by the adapter design.

Reach for Flue when running on any model and any host is a requirement you can state out loud, when you self-host or deploy across mixed environments like Cloudflare Workers and CI runners, or when avoiding vendor lock-in is a first-order constraint rather than a nice-to-have. You trade Eve's managed conveniences for adapters you assemble and own, and in return nothing about your agent assumes a single provider. For a team whose threat model includes a platform changing its pricing or a model disappearing, that control is the whole point.

A few specifics can tip a close call. If you need a polished terminal-to-deploy story with the least wiring, Eve's scaffold and Vercel pipeline are ahead. If your persistence and observability stack is already Postgres and OpenTelemetry, Flue's shipped adapters slot in with less friction than re-creating them on platform primitives. If your agents must run inside GitHub Actions or GitLab CI as part of an existing pipeline, Flue lists those as supported deploy targets and Eve does not. And if maturity-by-internal-usage reassures you more than an earlier public repository, Eve's "more than a hundred agents in production" claim weighs against Flue's longer time in the open. Neither answer is wrong. They optimize for different futures.

One thing I would not do is choose based on which API looks prettier. defineAgent versus createAgent is a syntax preference, and both express the same idea. The directory-versus-composition difference is more interesting, but it is a question of how much structure you want shipped by default, not a question of capability. Let deployment posture and lock-in tolerance make the call, and treat the API ergonomics as a tiebreaker at most.
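The decision rules in this section are simple enough to write down as a function. This is a sketch of the reasoning above, not a scoring tool: the field names are hypothetical, and the ordering of the checks (hard constraints first, existing gravity second) is one reading of the argument.

```typescript
// Hypothetical encoding of this section's decision rules.
interface Situation {
  onVercel: boolean; // infrastructure already lives on Vercel
  mustAvoidLockIn: boolean; // any-model, any-host is a hard requirement
  runsInCi: boolean; // agent must run inside GitHub Actions or GitLab CI
}

type Recommendation = "Eve" | "Flue" | "either";

function recommend(s: Situation): Recommendation {
  // Hard constraints win over convenience.
  if (s.mustAvoidLockIn || s.runsInCi) return "Flue";
  // Otherwise follow where the infrastructure already has gravity.
  if (s.onVercel) return "Eve";
  // No constraint and no gravity: API ergonomics can break the tie.
  return "either";
}

const vercelShop = recommend({ onVercel: true, mustAvoidLockIn: false, runsInCi: false });
const ciPipeline = recommend({ onVercel: true, mustAvoidLockIn: false, runsInCi: true });
const greenfield = recommend({ onVercel: false, mustAvoidLockIn: false, runsInCi: false });
```

Note that nothing in the input is a feature. That is deliberate, and it mirrors the argument that the feature lists overlap too much to decide anything.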

What neither Eve nor Flue solves yet

Picking a framework does not retire the hard problems, so be clear-eyed about what stays on your plate with either one. Both are pre-release. Eve is a public preview under Vercel's beta terms at 0.11.4, with documentation that says the APIs may change before general availability. Flue is at 1.0.0-beta.1, published only in mid-June 2026. Pin exact versions, read the changelogs, and expect breakage. Adopting either is a bet on a fast-moving project, not a settled standard, and that is fine as long as you size the bet accordingly.
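"Pin exact versions" is easy to say and easy to get wrong, because a caret range or a dist-tag looks pinned at a glance. Here is a small sketch of a check you could run in CI. The helpers `isExactPin` and `unpinned` are hypothetical, the dependency map is an example rather than a recommended manifest, and the pattern accepts only a plain major.minor.patch version with optional prerelease or build suffix.

```typescript
// True only for an exact version such as 0.11.4 or 1.0.0-beta.1.
// Ranges (^, ~, >=), wildcards, and dist-tags like "latest" are rejected.
function isExactPin(range: string): boolean {
  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/.test(range);
}

// Return the names of dependencies that are not pinned exactly.
function unpinned(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, range]) => !isExactPin(range))
    .map(([name]) => name);
}

// Example input only; the versions are the ones this article targets.
const loose = unpinned({
  eve: "^0.11.4", // caret range: would float to newer 0.11.x releases
  "@flue/runtime": "1.0.0-beta.1", // exact
  "@flue/cli": "latest", // dist-tag: floats on every install
});
```

Under semver's rules for 0.x versions, a caret range on 0.11.4 still admits later 0.11.x patches, which is exactly where a preview-stage API can shift.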

Evaluation depth is the gap that bites quietly. Eve ships evals and Flue integrates Braintrust, but a built-in eval harness does not write your evals for you, and an agent that takes real actions needs more than a chatbot's pass-or-fail check. You still have to define what a good outcome is for your domain, build the cases, and gate deploys on them. It is the same discipline as testing AI-generated code, applied to a system that acts rather than one that only emits text. Neither framework removes that work.
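What "gate deploys on them" can look like is sketched below. This uses neither Eve's eval API nor Braintrust's. The types, the `evalGate` function, the case names, and the 0.7 threshold are all assumptions for illustration. The design choice worth noticing is that a critical case blocks the deploy even when the overall pass rate clears the bar.

```typescript
// Hypothetical eval gate; not tied to any framework's eval harness.
interface EvalCase {
  name: string;
  passed: boolean;
  critical: boolean; // a failure here blocks the deploy on its own
}

interface GateResult {
  deploy: boolean;
  passRate: number;
  reasons: string[];
}

function evalGate(cases: EvalCase[], minPassRate: number): GateResult {
  if (cases.length === 0) {
    return { deploy: false, passRate: 0, reasons: ["no eval cases defined"] };
  }
  const reasons: string[] = [];
  const passRate = cases.filter((c) => c.passed).length / cases.length;
  for (const c of cases) {
    if (c.critical && !c.passed) reasons.push(`critical case failed: ${c.name}`);
  }
  if (passRate < minPassRate) {
    reasons.push(`pass rate ${passRate.toFixed(2)} is below ${minPassRate.toFixed(2)}`);
  }
  return { deploy: reasons.length === 0, passRate, reasons };
}

// Three of four cases pass (0.75 >= 0.7), yet the gate still blocks.
const gate = evalGate(
  [
    { name: "refund stays under limit", passed: true, critical: true },
    { name: "never deletes production rows", passed: false, critical: true },
    { name: "summary mentions ticket id", passed: true, critical: false },
    { name: "tone is polite", passed: true, critical: false },
  ],
  0.7,
);
```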

Then there is the operational surface you own no matter what. Cost control across tool calls and subagent fan-out, prompt and output drift when you swap models, rate limits, secrets handling, and the blast radius of an agent with write access to your systems are all yours to manage. A sandbox isolates the code an agent runs. It does not decide whether the agent should have been allowed to run that code at all. The framework gives you primitives. Wiring them into something safe to put in front of users is the same hardening job described in hardening an AI-generated app for production, and it does not get easier because the agent lives in a tidy directory.
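The gap between "the sandbox isolates the code" and "the agent should be allowed to run it" is a policy decision you write yourself. A minimal sketch follows, with hypothetical types and thresholds. Neither framework's approval API is used here, and the tool names are examples only.

```typescript
// Hypothetical pre-execution policy check for tool calls.
type Effect = "read" | "write" | "destructive";

interface ToolCall {
  tool: string;
  effect: Effect;
  estimatedCostUsd: number;
}

interface Policy {
  maxCostUsd: number; // above this, a person must approve
  denyTools: string[]; // never allowed, with or without approval
}

type Decision = "allow" | "needs_approval" | "deny";

function decide(call: ToolCall, policy: Policy): Decision {
  if (policy.denyTools.includes(call.tool)) return "deny";
  if (call.effect === "destructive") return "needs_approval";
  if (call.estimatedCostUsd > policy.maxCostUsd) return "needs_approval";
  return "allow";
}

const policy: Policy = { maxCostUsd: 1, denyTools: ["drop_database"] };

const decisions: Decision[] = [
  decide({ tool: "get_weather", effect: "read", estimatedCostUsd: 0.01 }, policy),
  decide({ tool: "delete_branch", effect: "destructive", estimatedCostUsd: 0.01 }, policy),
  decide({ tool: "run_report", effect: "read", estimatedCostUsd: 5 }, policy),
  decide({ tool: "drop_database", effect: "destructive", estimatedCostUsd: 0 }, policy),
];
```

A "needs_approval" result is where a framework's human-in-the-loop pause would plug in. Deciding which calls land there is the part that stays with you.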

Conclusion

Eve and Flue converged on the same good idea, the agent as a directory of files, and then diverged on the question that actually matters: where does this run, and who owns the operational weight. Eve answers with a platform. It is the fastest path to a durable, sandboxed, channel-connected agent if your gravity is already Vercel, and its adapter design means "Vercel-first" is a default rather than a prison. Flue answers with a harness. It is the runtime-agnostic, any-model option for teams who need to keep the host and the model swappable, with the trade-off that you assemble and maintain the conveniences Eve manages for you.

If you ship one decision from this comparison, make it the posture, not the API. Match the framework to where your infrastructure lives and how much vendor coupling you can tolerate, because that is the choice you will still be living with after the syntax stops feeling novel. Both are young enough that the responsible move is to pin versions, keep your own evals, and stay ready to move. The agent-as-a-directory shape is here to stay. Which gravity you pick is the part worth thinking hard about.


Written by Thomas Findlay.