The first version had six commands. It was called quantcli.py and it fit in about 80 lines.

The commands were: backtest, papertrade, train-xgb, train-ppo, train-loop, and info. Each one just called a Python script with the right flags. It was a thin wrapper — something to type instead of remembering the exact invocation for train_xgb_gpu.py.
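As a rough sketch of that kind of thin wrapper (not the original file): the six command names come from the list above, but train_xgb_gpu.py is the only script name actually mentioned, so every other filename in the table is a placeholder.

```python
import subprocess
import sys

# Command names are from the post. Only train_xgb_gpu.py is a real
# filename; the rest are hypothetical placeholders.
SCRIPTS = {
    "backtest": "backtest.py",
    "papertrade": "papertrade.py",
    "train-xgb": "train_xgb_gpu.py",
    "train-ppo": "train_ppo.py",
    "train-loop": "train_loop.py",
    "info": "info.py",
}


def build_command(argv):
    """Turn ['train-xgb', <flags...>] into the full interpreter invocation."""
    if not argv or argv[0] not in SCRIPTS:
        usage = "usage: quantcli {" + "|".join(SCRIPTS) + "} [script flags...]"
        raise SystemExit(usage)
    command, *passthrough = argv
    # Everything after the command name is handed to the script untouched.
    return [sys.executable, SCRIPTS[command], *passthrough]


def main(argv):
    """Run the chosen script and return its exit code."""
    return subprocess.call(build_command(argv))
```

Calling main(sys.argv[1:]) from the bottom of the file is all the entry point needs. The wrapper never interprets the flags itself, which is what keeps it thin.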

That was December 2025. By February it had grown into something I hadn't planned.

What it became

The current version — kaleo_cli.py — is 2,015 lines. It has a Rich terminal UI with a live status bar showing the current model, context window usage, and whether bypass permissions mode is active. It routes prompts across five different LLMs: Qwen3 14B (self-hosted on Hostinger, free), Claude Sonnet, Claude Opus, GPT-4, and Gemini. You switch with /model qwen or /model opus.

The quant trading commands grew from the original six to about 25 slash commands, including /quant:morning-report, /quant:circuit-breaker, /quant:drawdown-monitor, and /quant:vps-manager. Each one fires a specific prompt at the right model and formats the output.
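One way to picture "a specific prompt at the right model" is a small registry keyed by command name. This is a sketch under assumptions: the command names are real, but the model assigned to each, the prompt text, and the send callable are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlashCommand:
    model: str   # which model receives the prompt
    prompt: str  # prompt template; may contain an {args} placeholder


# Command names are from the post; models and prompt text are hypothetical.
COMMANDS = {
    "/quant:morning-report": SlashCommand(
        "qwen", "Summarize overnight PnL and open positions."),
    "/quant:drawdown-monitor": SlashCommand(
        "qwen", "Report current drawdown per strategy."),
    "/quant:circuit-breaker": SlashCommand(
        "opus", "Assess whether trading should halt. {args}"),
}


def dispatch(line, send):
    """Look up a slash command and pass its prompt to send(model, prompt)."""
    name, _, args = line.strip().partition(" ")
    cmd = COMMANDS.get(name)
    if cmd is None:
        return f"unknown command: {name}"
    prompt = cmd.prompt.format(args=args).strip()
    return send(cmd.model, prompt)
```

Passing send in as a parameter keeps the registry independent of any particular LLM client, so adding the 26th command is one more dictionary entry.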

There's a heartbeat loop that pings shared state every few minutes, and a memory compression system that archives old context and keeps the active window from blowing out. Shift+Tab+Tab enters plan mode. Ctrl+C stops mid-generation without killing the whole session.
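The memory compression idea can be sketched as a split between archived and active messages. This is an assumption about the general technique, not the code in kaleo_cli.py: the four-characters-per-token estimate is crude, and a real system might summarize old messages instead of just setting them aside.

```python
def estimate_tokens(text):
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def compact(messages, budget):
    """Split messages into (archived, active) so active fits the token budget.

    Walks from newest to oldest and keeps messages until the next one
    would exceed the budget; everything older is archived.
    """
    active = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        active.append(msg)
        used += cost
    active.reverse()  # restore chronological order
    archived = messages[: len(messages) - len(active)]
    return archived, active
```

Keeping the newest messages and archiving from the oldest end means the active window always holds the most recent context, at the cost of losing early details unless the archive is consulted.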

Why it grew

The trading system is spread across three machines: my Mac, Hostinger, and InterServer. The state is in .kaleo/SHARED_STATE.json. The logs are on the VPS. The code is local. The only way to actually see what's happening across all of it is to SSH around and grep things.

The CLI started absorbing that friction. Each time I found myself typing the same thing — pulling a PnL summary, checking which strategies were active, syncing state from the VPS — it became a command. The LLM routing came later, when I wanted to be able to ask questions without switching context to a browser tab.

The model selection matters more than it might look. Qwen3 14B is self-hosted. It's free, it's fast enough for most things, and there's no usage limit. For anything that needs genuine reasoning — architecture decisions, code reviews, complex trading analysis — you switch to Opus. The CLI tracks which model you're on so you don't accidentally run 50 Opus queries against a backlog of routine checks.
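A minimal sketch of that tracking, assuming a session object that handles /model lines and counts prompts per model. The aliases qwen and opus appear above; the other aliases, the idea that everything except Qwen is billed, and the status bar format are assumptions.

```python
from collections import Counter

# "qwen" and "opus" are the aliases shown in the post; the rest are assumed.
MODELS = {"qwen", "sonnet", "opus", "gpt4", "gemini"}
# Assumption: every model except the self-hosted one costs money per query.
PAID = MODELS - {"qwen"}


class Session:
    def __init__(self, default="qwen"):
        self.model = default
        self.usage = Counter()

    def handle(self, line):
        """Return a status string for /model lines, None for ordinary prompts."""
        parts = line.split()
        if parts and parts[0] == "/model":
            if len(parts) != 2 or parts[1] not in MODELS:
                return "usage: /model {" + "|".join(sorted(MODELS)) + "}"
            self.model = parts[1]
            return f"model: {self.model}"
        # Ordinary prompt: count it against whichever model is active.
        self.usage[self.model] += 1
        return None

    def status_bar(self):
        paid = sum(n for m, n in self.usage.items() if m in PAID)
        return f"[{self.model}] paid queries this session: {paid}"
```

Showing the active model and a running paid-query count in the same place is the point: the expensive model is always visible before the next prompt goes out.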

The pattern

It started as a trading tool and became a general developer terminal. That wasn't the plan. The quantcli commands (backtesting, training) are still in there, buried under two months of accretion. The original 80 lines are somewhere inside the 2,015.

The lesson isn't "build a CLI from the start." The lesson is that friction accumulates. Every time you find yourself typing the same thing twice, that's a future command. Every time you open a browser tab to ask a question you could ask from the terminal, that's a future integration. The CLI didn't grow because I sat down and designed a developer tool. It grew because I kept removing friction one piece at a time.

The best tools I've built weren't designed. They were accumulated.