Feedback in, PR out: a tour of the Maestrio stack

A user submits a feature request, or reports a bug. Before you've made coffee, the PR is open and ready to merge. Everything between those two events is Maestrio.

I want to walk through how it's built, because the architecture is most of the story. The interesting decisions are not always where you'd expect them to be.

The front door

Feedback comes in through an SDK widget that customers drop into their app. It's a small thing, about sixteen kilobytes of JavaScript before gzip, and it does one job: capture what the user said and POST it to us. There is no clever transport, no batching, no websocket. Just an HTTPS call to a single endpoint.
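Stripped to its essentials, the widget's one job might look like the sketch below. The `/api/feedback` path, the `x-api-key` header and the payload fields are hypothetical placeholders, not Maestrio's actual SDK contract.

```typescript
// Hypothetical payload shape; the real SDK's fields may differ.
interface FeedbackPayload {
  kind: "bug" | "feature";
  message: string;
  pageUrl: string;
}

// Plain description of the request, so the sketch doesn't depend on DOM typings.
interface FeedbackRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the single HTTPS call the widget makes: no batching, no websocket.
// "/api/feedback" and "x-api-key" are assumed names for illustration only.
function buildFeedbackRequest(
  baseUrl: string,
  apiKey: string,
  payload: FeedbackPayload,
): FeedbackRequest {
  if (!baseUrl.startsWith("https://")) {
    throw new Error("feedback must be sent over HTTPS");
  }
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/feedback", // strip trailing slashes
    method: "POST",
    headers: { "content-type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify(payload),
  };
}
```

In the browser the widget would hand these fields straight to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.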

That endpoint lives on the dashboard. It is a Next.js API route and it does the boring, unsexy work that has to happen before anything interesting can: validate the API key, check that the calling origin is on the project's allowlist, count the request against the monthly quota, and persist a row in Postgres so we have something durable to refer back to. None of that work wants to be at the edge. All of it wants to be next to the database. So we left it next to the database.
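The ordering of those checks can be sketched as a pure function that runs before anything is written. The `Project` shape, the status codes and the quota rule are illustrative assumptions; loading the project and inserting the Postgres row are left out.

```typescript
// Hypothetical project record, as the route might load it from Postgres.
interface Project {
  apiKey: string;
  allowedOrigins: string[];
  monthlyQuota: number;
}

type Admission =
  | { ok: true }
  | { ok: false; status: 401 | 403 | 429; reason: string };

// Cheapest, most fundamental check first: who are you, where are you calling
// from, and have you used up the month. Only an admitted request gets persisted.
function admit(
  project: Project | undefined,
  presentedKey: string,
  origin: string | null,
  usedThisMonth: number,
): Admission {
  // A production version would compare hashed keys in constant time.
  if (!project || project.apiKey !== presentedKey) {
    return { ok: false, status: 401, reason: "invalid API key" };
  }
  if (origin === null || !project.allowedOrigins.includes(origin)) {
    return { ok: false, status: 403, reason: "origin not on allowlist" };
  }
  if (usedThisMonth >= project.monthlyQuota) {
    return { ok: false, status: 429, reason: "monthly quota exceeded" };
  }
  return { ok: true };
}
```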

Once the row is written, the route hands the payload off over HTTPS to a Cloudflare Worker, which is where the real pipeline begins.

Triage on Cloudflare

The triage Worker is the first piece of the system that runs on Cloudflare. It takes the HTTP handoff from the dashboard, writes the job onto a Cloudflare Queue, and acknowledges. Now the dashboard is off the hook. The user has a response, the request is persisted, and the rest of the work can take as long as it takes.
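In Workers terms, the HTTP half of the triage Worker is a handler that enqueues and returns immediately. This sketch assumes a queue binding named `TRIAGE_QUEUE` and a hypothetical `TriageJob` shape, and declares a minimal `QueueLike` interface in place of the `Queue` type from `@cloudflare/workers-types` so it stands alone.

```typescript
// Minimal stand-in for a Cloudflare Queue producer binding, which exposes send(body).
interface QueueLike<T> {
  send(body: T): Promise<void>;
}

// Hypothetical job handed over by the dashboard after it has persisted the row.
interface TriageJob {
  requestId: string;
  projectId: string;
  text: string;
}

interface Env {
  TRIAGE_QUEUE: QueueLike<TriageJob>; // binding name is an assumption
}

// Narrow an untrusted JSON body to a TriageJob, or return null.
function parseJob(body: unknown): TriageJob | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Record<string, unknown>;
  if (typeof b.requestId !== "string" || typeof b.projectId !== "string" || typeof b.text !== "string") {
    return null;
  }
  return { requestId: b.requestId, projectId: b.projectId, text: b.text };
}

// Enqueue and acknowledge. 202 Accepted means "received, not yet processed",
// which is the whole contract: once this returns, the dashboard is off the hook.
async function handleHandoff(body: unknown, env: Env): Promise<{ status: number }> {
  const job = parseJob(body);
  if (!job) return { status: 400 };
  await env.TRIAGE_QUEUE.send(job);
  return { status: 202 };
}
```

Inside a real Worker this would sit in `export default { async fetch(request, env) { ... } }`, reading the body with `await request.json()` and returning `new Response(null, { status })`.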

The same Worker is also the consumer of that queue. When it picks the job back up, it asks the obvious question: have we seen this before? Bug reports especially tend to cluster. If three customers report the same broken dropdown, you don't want three pull requests trying to fix it; you want one. So before we do anything else, we check.
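On the consumer side, Cloudflare delivers messages to the Worker's `queue(batch, env, ctx)` handler in batches, and each message can be acknowledged or retried on its own. The sketch mirrors that shape with local interfaces; `checkForDuplicate`, `onVerdict` and the `Verdict` type are hypothetical stand-ins for the similarity check and what follows it.

```typescript
// Local mirrors of the Cloudflare Queues consumer shapes (Message / MessageBatch).
interface MessageLike<T> {
  readonly body: T;
  ack(): void;
  retry(): void;
}
interface BatchLike<T> {
  readonly messages: readonly MessageLike<T>[];
}

interface TriageJob {
  requestId: string;
  text: string;
}

type Verdict = { kind: "duplicate"; originalId: string } | { kind: "new" };

// Handle each message independently: a failure on one job retries only that
// job. By default, an uncaught throw would redeliver the whole batch.
async function consume(
  batch: BatchLike<TriageJob>,
  checkForDuplicate: (job: TriageJob) => Promise<Verdict>, // hypothetical
  onVerdict: (job: TriageJob, verdict: Verdict) => Promise<void>, // link dupe or create card
): Promise<void> {
  for (const msg of batch.messages) {
    try {
      const verdict = await checkForDuplicate(msg.body);
      await onVerdict(msg.body, verdict);
      msg.ack();
    } catch {
      msg.retry();
    }
  }
}
```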

The check itself is cheap. We run a vector search against prior requests in Postgres, using pgvector with an HNSW index, and pull the closest neighbours. When the answer is obvious (cosine similarity above an upper threshold means it's clearly a duplicate, below a lower floor means it's clearly new), we just decide. The borderline cases in between go to an LLM with full context, and it makes the call. Cheap first, smart second. We pay for intelligence only where intelligence is actually doing work. Dupes get linked to the original. New requests get a row of their own and land on the board.
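The cheap-first, smart-second split reduces to one nearest-neighbour query plus a three-way decision. The query uses pgvector's `<=>` cosine-distance operator (so similarity is `1 - distance`), which an HNSW index built with `vector_cosine_ops` can serve. The table and column names and both cut-offs are hypothetical placeholders, since the real values aren't given, and scoping to a single project is left out for brevity.

```typescript
// Hypothetical table/column names. pgvector's <=> is cosine distance,
// so similarity = 1 - distance. An index such as
//   CREATE INDEX ON feedback_requests USING hnsw (embedding vector_cosine_ops);
// lets Postgres answer this ORDER BY ... LIMIT approximately instead of scanning.
const NEAREST_NEIGHBOURS_SQL = `
  SELECT id, 1 - (embedding <=> $1) AS similarity
  FROM feedback_requests
  ORDER BY embedding <=> $1
  LIMIT 5`;

interface Neighbour {
  id: string;
  similarity: number; // cosine similarity, higher means closer
}

type Decision =
  | { kind: "duplicate"; originalId: string } // link to the original
  | { kind: "new" } // gets its own row and card
  | { kind: "ask_llm"; candidates: Neighbour[] }; // borderline: let the model decide

// Placeholder cut-offs, not the values Maestrio uses.
const DUPLICATE_ABOVE = 0.92;
const NEW_BELOW = 0.75;

// Decide directly when the best match is clearly in or out; only the band
// between the two cut-offs costs an LLM call.
function decide(neighbours: Neighbour[]): Decision {
  if (neighbours.length === 0) return { kind: "new" };
  const best = neighbours.reduce((a, b) => (b.similarity > a.similarity ? b : a));
  if (best.similarity >= DUPLICATE_ABOVE) return { kind: "duplicate", originalId: best.id };
  if (best.similarity < NEW_BELOW) return { kind: "new" };
  return { kind: "ask_llm", candidates: neighbours.filter((n) => n.similarity >= NEW_BELOW) };
}
```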

The board

The dashboard is a Next.js app. It's a kanban board with three columns (triage, in progress, done), and it updates live. When a request moves between columns, we push the change over Server-Sent Events, so the user doesn't have to refresh the tab. The board is the cockpit. It's where you see what came in this morning, what the agent is currently working on, and what's already been merged.
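Server-Sent Events are plain text over a long-lived HTTP response with `Content-Type: text/event-stream`: each event is a few `field: value` lines ended by a blank line. A minimal formatter for a card move, with a hypothetical `card-moved` event name and payload, might look like this.

```typescript
// Hypothetical payload for a card changing columns.
interface CardMoved {
  cardId: string;
  to: "triage" | "in_progress" | "done";
}

// Serialize one SSE event. The format requires each payload line to carry its
// own "data:" prefix and a blank line to end the event. Compact JSON never
// contains raw newlines, so the split is defensive here.
function formatSseEvent(event: string, data: object, id?: string): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`); // lets a reconnecting client resume
  lines.push(`event: ${event}`);
  for (const line of JSON.stringify(data).split("\n")) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}
```

On the client, the browser's built-in `EventSource` consumes these frames: `new EventSource(url).addEventListener("card-moved", (e) => ...)`, where `e.data` holds the JSON string.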

The one place we put real frontend effort is making the kanban feel instant. When you drag a card, the move happens on the client immediately and the API call goes out in the background. If the server rejects it, we roll back. It sounds like a small thing, but if you're staring at this board for an hour a day, the difference between snappy and laggy is the difference between a tool you like and a tool you tolerate.
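The optimistic move with rollback can be sketched independently of any UI framework. `persist` is a hypothetical callback standing in for the background API call; the design choice worth noting is that the rollback restores only the moved card, so changes that arrived while the request was in flight survive.

```typescript
type Column = "triage" | "in_progress" | "done";
type Board = Readonly<Record<string, Column>>; // cardId -> column

// Hypothetical: resolves true if the server accepted the move.
type PersistMove = (cardId: string, to: Column) => Promise<boolean>;

// Apply the move locally first, then confirm with the server in the background.
// On rejection (or a network error) restore only this card's previous column.
async function moveCardOptimistically(
  getBoard: () => Board,
  setBoard: (next: Board) => void,
  cardId: string,
  to: Column,
  persist: PersistMove,
): Promise<boolean> {
  const from: Column | undefined = getBoard()[cardId];
  if (from === undefined) return false; // unknown card: nothing to move
  if (from === to) return true; // already there
  // Runs synchronously, before the first await, so the UI sees the move at once.
  setBoard({ ...getBoard(), [cardId]: to });
  let accepted = false;
  try {
    accepted = await persist(cardId, to);
  } catch {
    accepted = false;
  }
  if (!accepted) setBoard({ ...getBoard(), [cardId]: from });
  return accepted;
}
```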

The coding agent

This is the part of the system that does the actual work, and it's the part where we had to make the most interesting trade-off.

The agent does not run in a Worker. Workers are great at fan-out: one request, a thousand small parallel pieces of work. They are terrible at thirty-minute tasks. They have wall-clock limits, they get killed and restarted, and they assume statelessness in ways that get painful when your job has internal state worth preserving. A coding agent is the opposite of a Worker's happy path. It runs for a long time, it has a lot of state, and if it dies halfway through you want to either resume cleanly or abort cleanly.

So each coding run is fronted by a Durable Object, which proxies into a container. The DO owns the lifecycle. It boots the container, streams output back to the dashboard so you can watch the agent work, and is the single source of truth if the user hits cancel halfway through. The container itself does the real work: clones the repo, lets the agent make the change, runs the tests and the linter, and opens the PR against your default branch. When the job finishes, the DO tears the container down and tears itself down with it. There is no idle pool waiting for the next request. There is no orchestration plane to maintain.
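One way to picture the Durable Object as the single source of truth is a small state machine it keeps for each run. The states and events below are hypothetical, and the sketch deliberately leaves out the actual Durable Object and container APIs; it only shows why routing every signal through one owner makes cancel unambiguous.

```typescript
// Hypothetical lifecycle of one coding run, as its Durable Object might record it.
type RunState = "idle" | "booting" | "running" | "cancelling" | "succeeded" | "failed" | "cancelled";
type RunEvent = "start" | "container_ready" | "cancel" | "exited_ok" | "exited_error";

// Allowed transitions. Anything not listed is ignored, so a late "cancel" after
// the PR is open, or a duplicate exit signal, cannot corrupt the final state.
const TRANSITIONS: Partial<Record<RunState, Partial<Record<RunEvent, RunState>>>> = {
  idle: { start: "booting" },
  booting: { container_ready: "running", cancel: "cancelling", exited_error: "failed" },
  running: { cancel: "cancelling", exited_ok: "succeeded", exited_error: "failed" },
  // Once cancel is requested, however the container exits, the run counts as cancelled.
  cancelling: { exited_ok: "cancelled", exited_error: "cancelled" },
};

const TERMINAL: ReadonlySet<RunState> = new Set<RunState>(["succeeded", "failed", "cancelled"]);

function step(state: RunState, event: RunEvent): RunState {
  return TRANSITIONS[state]?.[event] ?? state;
}

// Reaching a terminal state is the cue to stop the container; nothing waits around.
function shouldTearDown(state: RunState): boolean {
  return TERMINAL.has(state);
}
```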

Why this shape

The expensive stateful work, the actual coding, is isolated inside the Durable Object plus container pair. Everything else stays cheap and stateless. That split is the whole point. The hot part of the system is small and contained. The cold part is most of the system, and it costs us almost nothing to run.

We run Postgres for the things that need ACID, Cloudflare for everything async, and a single Next.js app for the dashboard and the API surface in front of the database. I won't pretend this is some clever new architecture pattern. It's just that every layer we don't own is a layer we don't get paged on at three in the morning. Every queue we don't run ourselves is a queue that can't drift out of sync, that can't fill up because somebody forgot to add a consumer, that doesn't need a dashboard and an on-call rotation of its own. The fewer moving parts, the fewer Sunday afternoons spent debugging them.

The whole stack is built around one constraint: the time between a user complaining and the PR landing should be measured in minutes, not sprints.

Close the loop

That's the bet. Feedback to feature, automatically, in the time it takes to make coffee. The infrastructure is interesting because the problem is interesting, and the fix is what we're shipping.