Two weeks ago I had a collection of tools. A Hugo blog. An Odoo integration. Some CLI wrappers for Google APIs. A browser automation framework. They existed in separate repos, worked independently, and shared nothing.

Today they’re an ecosystem. Not because I wrote some grand unification layer, but because I found a tool that thinks the way Unix thinks: everything is a file, every program does one thing, and | connects them.

This is the story of how pi — a minimal terminal coding agent — became my operating system for building things.

What pi actually is

Pi is not an IDE. It’s not Cursor, not Copilot, not any of the AI-enhanced editors fighting for your attention. It’s closer to bash: a harness that gives an LLM four tools (read, write, edit, bash) and gets out of the way.

The magic isn’t in what pi ships with. It’s in what pi lets you add:

  • Skills — markdown files that teach the model how to use specific tools. Like man pages, but for AI.
  • Extensions — TypeScript modules that add new tool calls. Like adding grep to your PATH.
  • Agents — named configurations with a persona, a model, and a set of skills. Like shell aliases with opinions.
  • Chains — sequences of agents, where each step’s output flows into the next. Like pipes.

None of these concepts are revolutionary. They’re borrowed from the oldest playbook in computing: small pieces, loosely joined.
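
To make the extension idea concrete, here is a minimal sketch, not pi's actual extension API: a TypeScript module that exports one tool call, in this case a hypothetical ripgrep wrapper (the ToolCall shape and the tool itself are assumptions for illustration).

```ts
// A sketch only: pi's real extension interface may look different.
// The point is the shape: a TypeScript module that adds one tool call.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

export interface ToolCall {
  name: string;
  description: string;
  run(args: Record<string, string>): Promise<string>;
}

// "Like adding grep to your PATH": a hypothetical ripgrep tool.
export const ripgrepTool: ToolCall = {
  name: "rg",
  description: "Search the repository with ripgrep and return matching lines.",
  async run({ pattern, path = "." }) {
    try {
      const { stdout } = await exec("rg", ["--line-number", pattern, path]);
      return stdout;
    } catch {
      return ""; // ripgrep exits non-zero when nothing matches
    }
  },
};
```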

The workbench, two weeks in

Here’s what I’ve built on top of pi in 14 days:

18 skills across two layers. The shared ones are tool-specific and agent-agnostic — Azure CLI, Google Cloud, PM2 process management, web search, spreadsheet modeling, terminal recording, pre-release workflows. The personal ones handle my specific workflows — Bitwarden credential retrieval, browser launch sequences, document generation (Word, Excel, PowerPoint, PDF), multi-model design reviews.

8 agents with different personalities. Four are reviewers that I run in parallel during design sprints — two architecture-focused, two implementation-focused. One is a synthesizer that merges their feedback. One is a scout that explores codebases. One delegates to OpenAI’s Codex for pure coding tasks. One is a researcher.

2 extensions that add persistent state. A TODO system that survives across sessions — the agent claims tasks, journals progress, marks them done. And a knowledge graph (“brain”) that extracts signals from emails, notes, and ERP records, clusters them into entities, and lets the agent resolve “who is this phone number?” against accumulated context.
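
For flavor, here is a stripped-down sketch of the TODO idea. It is not the real extension; it only shows that "survives across sessions" can be as simple as a JSON file the agent reads, updates, and writes back. The .pi/todos.json path and the helper names are made up.

```ts
// A minimal sketch of the persistent-TODO idea, not the real extension code.
// "Survives across sessions" just means state on disk; a JSON file is enough.
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

type TodoStatus = "open" | "claimed" | "done";

interface Todo {
  id: string;
  title: string;
  status: TodoStatus;
  journal: string[]; // progress notes the agent appends as it works
}

const TODO_FILE = ".pi/todos.json"; // hypothetical location

function loadTodos(): Todo[] {
  return existsSync(TODO_FILE)
    ? (JSON.parse(readFileSync(TODO_FILE, "utf8")) as Todo[])
    : [];
}

function saveTodos(todos: Todo[]): void {
  mkdirSync(dirname(TODO_FILE), { recursive: true });
  writeFileSync(TODO_FILE, JSON.stringify(todos, null, 2));
}

// The agent claims a task and journals progress; a later session picks it up.
export function claimTodo(id: string, note: string): void {
  const todos = loadTodos();
  const todo = todos.find((t) => t.id === id);
  if (!todo) throw new Error(`unknown todo: ${id}`);
  todo.status = "claimed";
  todo.journal.push(note);
  saveTodos(todos);
}
```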

23 experiments — design documents that went through structured review cycles before any code was written. Each one followed the same pattern: capture the vision, run parallel reviewers across different models, synthesize, iterate.

11 memoria entries — stable facts the agent records when it discovers something non-obvious. Unix sockets don’t work on Windows (use named pipes). npm Trusted Publishing requires Node 24. PGLite date intervals produce timestamps, not dates. Each entry saves 2 hours of rediscovery.
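
The first of those lessons translates directly into code. A hedged sketch, not holdpty's implementation: Node's net module will listen on a Unix socket path on Linux and macOS, but on Windows the path has to be a named pipe of the form \\.\pipe\<name>.

```ts
// Sketch of the lesson, not holdpty's code: the same net.Server works on both
// platforms as long as the "path" is a named pipe on Windows.
import net from "node:net";
import os from "node:os";
import path from "node:path";

function ipcPath(name: string): string {
  return process.platform === "win32"
    ? `\\\\.\\pipe\\${name}` // expands to \\.\pipe\<name>, a Windows named pipe
    : path.join(os.tmpdir(), `${name}.sock`); // Unix domain socket elsewhere
}

// "demo" is an illustrative name, not the path scheme holdpty actually uses.
const server = net.createServer((conn) => conn.end("hello\n"));
server.listen(ipcPath("demo"), () => {
  console.log("listening on", ipcPath("demo"));
});
```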

The pattern that makes it work

The Unix philosophy says: write programs that do one thing and do it well, write programs that work together, write programs that handle text streams.

Skills do one thing. The odoo skill teaches the model to query an Odoo ERP — field exploration, record operations, accounting queries. It knows nothing about email. The go-easy skill teaches the model Gmail, Drive, Calendar, and Tasks. It knows nothing about ERPs. The sheet-model skill teaches the model to build spreadsheets with live formulas. It knows nothing about where the data comes from.

But they compose. “Read last month’s invoices from Odoo, build a ratio model, email the CFO” works — not because someone designed an invoice-to-spreadsheet-to-email pipeline, but because each skill handles its piece and the agent holds the thread.

That’s the thing about composability: the interesting behaviors are the ones nobody planned for.

Beyond code: the business design sprint

Here’s where it gets interesting. The same pattern that reviews software architecture also reviews business strategy.

I run a group of professional services companies — legal, accounting, property management. When we needed to prepare for a bank meeting with a client, I didn’t just crunch numbers. I wrote a briefing document with the financial position, the adjustments we’d planned, and what we needed from the bank. Then I ran four reviewers:

4 reviewers in parallel:
  - Fractional CFO         ×  Gemini 2.5 Pro
  - Bank relationship mgr  ×  Gemini 2.5 Pro
  - Fractional CFO         ×  Claude Sonnet
  - Bank relationship mgr  ×  Claude Sonnet
        │
        ▼
  Synthesizer merges all 4 reviews

The CFO personas dissected the client’s EBITDA, found €45K–€80K of legitimate adjustments we hadn’t considered — capitalizing internal development under Spanish GAAP, correcting accrual mismatches, recognizing unbilled revenue for Q4 work. The banking personas explained exactly how the bank’s internal risk tool weights ratios, what the real thresholds are for SME credit in Spain, and why we should change the presentation order.

None of these reviewers are “coding agents.” They’re business personas — a CFO who knows Spanish accounting, a bank analyst who knows how BBVA’s risk engine works. The same chain infrastructure that runs architecture reviews runs financial strategy. The skill system doesn’t care what domain the knowledge is in.

More recently, when a client was forced into a strategic decision — hire a replacement or double down on outsourcing — I wrote a vision document for transforming the business model. Four reviewers again, but this time: a fractional CFO, a COO with outsourcing experience, a strategy consultant, and a regulatory compliance specialist. The strategy reviewer’s verdict was blunt: “this isn’t a strategic transformation, it’s an operational optimization dressed up as one — where’s your moat?” The compliance reviewer mapped every regulatory risk across four professional service areas. The COO said the vision was missing the entire “how” of daily operations.

That feedback reshaped the plan. And it cost us an hour of writing the vision document, plus five minutes of compute time.

The design sprint for code

The same workflow applies to software. When I have an idea — say, a Google Chat gateway for the company’s AI assistant — I write a vision document: problem, goal, architecture, constraints, risks, open questions. Then four reviewers, this time architecture and implementation focused, across two different models. A synthesizer merges their feedback.

23 experiments went through this process. Some survived. Some got killed by the reviews — which is exactly the point. Killing a bad idea at the vision stage costs 10 minutes. Killing it after implementation costs a week.

The key insight: the reviewers are just markdown files with a persona. Creating a CFO reviewer or a banking analyst takes 5 minutes. Creating a regulatory compliance specialist for Spanish professional services takes 10. The chain infrastructure — parallel execution, output aggregation, synthesis — is generic. The domain knowledge lives in the personas.
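
Here is a sketch of what that generic infrastructure amounts to, assuming a runAgent function that stands in for however the harness invokes a single agent; pi's real chain definitions are not shown here.

```ts
// A sketch of the chain shape, not pi's actual chain config. `runAgent` is a
// stand-in for however the harness invokes a single agent (persona + model).
interface Reviewer {
  persona: string; // the markdown file with the persona, e.g. "fractional-cfo"
  model: string;   // e.g. "gemini-2.5-pro" or "claude-sonnet"
}

type RunAgent = (persona: string, model: string, input: string) => Promise<string>;

export async function reviewCycle(
  runAgent: RunAgent,
  visionDoc: string,
  reviewers: Reviewer[],
): Promise<string> {
  // Parallel execution: every reviewer sees the same vision document.
  const reviews = await Promise.all(
    reviewers.map((r) => runAgent(r.persona, r.model, visionDoc)),
  );

  // Aggregation + synthesis: one more agent merges the independent reviews.
  return runAgent("synthesizer", "default-model", reviews.join("\n\n---\n\n"));
}
```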

What I actually built with it

This isn’t theoretical. In two weeks, working evenings and weekends:

  • go-easy — A unified Google API library for AI agents. Gmail, Drive, Calendar, Tasks. OAuth flows, two-phase auth, CLI tools. Published to npm.
  • odoo-toolbox — Odoo ERP integration for AI agents. CI runs against a real Odoo instance. Published to npm.
  • holdpty — Cross-platform detached terminal sessions. Named pipes on Windows, Unix sockets everywhere else. Published to npm.
  • This website — Rebuilt from two legacy Hugo sites, merged content from two languages, deployed with GitHub Actions. The agent did the migration; I made the design decisions.

16 published npm packages, a rebuilt website, 23 design documents, and a knowledge graph. None of it written by the AI. All of it written with the AI — the agent handled the scaffolding, the CI/CD, the release mechanics, the tedious migrations, while I focused on what matters: what to build and why.

We even built a test harness for the agent so it can verify its own modifications: pi-test-harness.

The carpenter’s workshop

I keep coming back to this metaphor: a carpenter’s workshop.

A good workshop isn’t about having the fanciest tools. It’s about having tools you know intimately, arranged the way you think, adapted to the work you do. The table saw is bolted where you need it. The jigs are purpose-built. The wood rack is sorted by project, not by species.

My pi setup is my workshop. The skills are my jigs — purpose-built for specific operations. The agents are my apprentices — each trained for a particular kind of work. The memoria is the notebook on the wall — hard-won lessons I don’t want them to relearn. The experiments are my sketches — ideas tested cheaply before committing wood.

None of this is a product. It’s not meant to be installed by anyone else. It’s a personal workbench, built incrementally over two weeks, adapted to how I think and what I build.

And that’s the point. The best tools are the ones you build for yourself.


This post is part of the AI Workbench series, where I document the evolving setup. Next up: the Unix philosophy angle — why skills, not prompts, are the right abstraction for AI agents.