Statistical arbitrage assumes two things: that the relationship between two assets is stable enough to measure, and that when it deviates, it mean-reverts. Both assumptions can be wrong.

I built Angelfish to test whether they hold for liquid crypto pairs — specifically, correlated assets on Hyperliquid where I could execute both legs without too much slippage.

Finding the pairs

The cointegration scanner runs three tests: Engle-Granger cointegration, the Johansen trace test, and an augmented Dickey-Fuller (ADF) test on the spread. A pair qualifies if it passes all three and the half-life of mean reversion is between 4 and 60 hours: fast enough to be tradeable, slow enough not to be noise.
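
A minimal sketch of the half-life filter, assuming the spread is sampled hourly and that the half-life is estimated from an AR(1)-style regression of the spread's change on its lagged level. That is a common estimator, not necessarily the one Angelfish uses, and the function names are hypothetical.

```python
import math

def half_life(spread):
    """Estimate the mean-reversion half-life, in samples.

    Fits delta_t = a + b * s_{t-1} by ordinary least squares, where
    delta_t = s_t - s_{t-1}. A slope b in (-1, 0) means the spread decays
    toward its mean, and the half-life is -ln(2) / ln(1 + b).
    """
    if len(spread) < 3:
        return math.inf
    lagged = spread[:-1]
    delta = [cur - prev for prev, cur in zip(spread[:-1], spread[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(delta) / n
    var_x = sum((x - mean_x) ** 2 for x in lagged)
    if var_x == 0:
        return math.inf  # flat spread: nothing to estimate
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, delta))
    slope = cov_xy / var_x
    if slope >= 0 or slope <= -1:
        # No pull toward the mean (or an oscillating fit): treat as no usable half-life.
        return math.inf
    return -math.log(2) / math.log(1 + slope)

def half_life_ok(spread_hourly, lo=4.0, hi=60.0):
    """Apply the 4-to-60-hour filter; assumes one observation per hour."""
    return lo <= half_life(spread_hourly) <= hi
```

A spread that decays by 10% of its level per hour has a half-life of roughly 6.6 hours and passes the filter. A flat or steadily trending spread returns infinity and fails.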

The scanner runs weekly. Not all pairs that passed last week pass this week. Relationships break down, especially in crypto where correlations can shift rapidly during regime changes. Any pair that fails two consecutive scans gets retired from the live set.
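
The retirement rule amounts to a per-pair count of consecutive failed scans. A minimal sketch, assuming each weekly scan returns a pass/fail flag for every pair in the live set. The function name, the dictionary layout, and the pair labels in the usage note are hypothetical.

```python
def update_live_set(fail_streaks, scan_results, max_fails=2):
    """Update consecutive-failure counts after one weekly scan.

    fail_streaks: dict of pair -> consecutive failed scans so far
    scan_results: dict of pair -> True if the pair passed this week's scan
                  (assumed to cover every pair currently in the live set)
    Returns (new_streaks, retired), where retired lists pairs dropped this week.
    """
    new_streaks = {}
    retired = []
    for pair, passed in scan_results.items():
        # A pass resets the streak; a fail extends it.
        streak = 0 if passed else fail_streaks.get(pair, 0) + 1
        if streak >= max_fails:
            retired.append(pair)
        else:
            new_streaks[pair] = streak
    return new_streaks, retired
```

For example, a pair labeled "A/B" that fails in week one stays live with a streak of 1. If it fails again in week two it is retired, while a pass in week two would reset the streak to 0.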

Entry and exit logic

The spread is z-scored over a rolling 30-day window. An entry triggers when the z-score crosses ±2. The position exits on whichever comes first: the z-score reverts back through ±0.5, the position has been open for 48 hours, or the z-score keeps diverging past ±3. That last case means something is broken, so the position is closed immediately.
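
The thresholds above can be sketched as a small decision function. This is an illustration under assumptions, not the live code: it checks levels rather than detecting the crossing itself, it refuses new entries once the z-score is already past the stop level, and the names and return labels are hypothetical.

```python
from statistics import mean, stdev

ENTRY_Z, EXIT_Z, STOP_Z, MAX_HOURS = 2.0, 0.5, 3.0, 48

def zscore(window, value):
    """Z-score of `value` against a trailing window of spread observations.

    With hourly bars (an assumption), a 30-day window is 720 observations.
    """
    sd = stdev(window)
    return 0.0 if sd == 0 else (value - mean(window)) / sd

def next_action(z, side, hours_open):
    """Decide the next step for one pair.

    side: 0 if flat, +1 if long the spread, -1 if short the spread.
    Returns "enter_long", "enter_short", "exit", "time_exit", "stop", or "hold".
    """
    if side == 0:
        # Assumption: skip entries when the spread is already past the stop level.
        if ENTRY_Z <= abs(z) < STOP_Z:
            # Rich spread (z > 0) is sold; cheap spread (z < 0) is bought.
            return "enter_short" if z > 0 else "enter_long"
        return "hold"
    # How far the spread is still stretched in the direction that triggered entry.
    stretch = -z * side
    if stretch >= STOP_Z:
        return "stop"       # kept diverging: close immediately
    if stretch <= EXIT_Z:
        return "exit"       # reverted far enough: take the trade off
    if hours_open >= MAX_HOURS:
        return "time_exit"  # open too long without reverting
    return "hold"
```

Checking the stop before the time limit means a position that is both stale and diverging is reported as a stop, which keeps the broken-relationship case visible in logs.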

Position sizing uses the spread's volatility to calibrate leg sizes so the portfolio is dollar-neutral. The long and short legs are placed as limit orders at mid-price with a small offset. Getting both legs filled at the entry spread is the hardest part at small account size — one leg fills and the other misses, and you're suddenly just long or short one asset.
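
One way to read the sizing rule is sketched below: give both legs the same dollar notional, and shrink that notional as spread volatility rises. The risk budget, the notional cap, and the definition of spread volatility as a fraction of leg notional are all assumptions made for illustration, not details from the live system.

```python
def leg_sizes(price_long, price_short, spread_vol, risk_usd, max_notional):
    """Return (qty_long, qty_short, notional) for a dollar-neutral pair.

    spread_vol:   std dev of the spread's return as a fraction of leg
                  notional, e.g. 0.02 for 2% (assumed definition)
    risk_usd:     dollar P&L swing accepted for a one-sigma spread move
    max_notional: hard cap on dollars per leg
    """
    if spread_vol <= 0 or price_long <= 0 or price_short <= 0:
        raise ValueError("prices and spread_vol must be positive")
    # Higher spread volatility -> smaller notional for the same dollar risk.
    notional = min(risk_usd / spread_vol, max_notional)
    # Equal notional on both legs keeps the pair dollar-neutral at entry.
    return notional / price_long, notional / price_short, notional
```

With a 2% spread volatility and a 20 USD risk budget, each leg gets about 1,000 USD of notional. At prices of 100 and 50 that is roughly 10 units long and 20 units short. Dollar neutrality only holds if both legs actually fill, which is the failure mode described above.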

What the backtests showed

On validated data (truth gate running), the best pairs had Sharpe ratios between 0.8 and 1.4 over 12-month windows. The worst pairs — ones that looked cointegrated but had a slow structural break — had Sharpe ratios that degraded from positive to negative across walk-forward windows. The scanner caught most of these, but not all.
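
The degradation pattern can be checked by computing a Sharpe ratio per walk-forward window and watching for a sign change. A minimal sketch, assuming per-period strategy returns, a zero risk-free rate, and non-overlapping windows. The function names and the annualization factor are placeholders for illustration.

```python
import math
from statistics import mean, stdev

def sharpe(returns, periods_per_year):
    """Annualized Sharpe ratio of per-period returns (risk-free rate taken as 0)."""
    if len(returns) < 2:
        return 0.0
    sd = stdev(returns)
    if sd == 0:
        return 0.0
    return mean(returns) / sd * math.sqrt(periods_per_year)

def window_sharpes(returns, window, periods_per_year):
    """Sharpe ratio for each consecutive, non-overlapping window of returns."""
    return [
        sharpe(returns[i:i + window], periods_per_year)
        for i in range(0, len(returns) - window + 1, window)
    ]

def sign_flipped(sharpes):
    """True if the first window is positive and the last is negative."""
    return len(sharpes) >= 2 and sharpes[0] > 0 and sharpes[-1] < 0
```

A pair whose first window is profitable and whose last window loses money is flagged by `sign_flipped`, which is the signature of the slow structural breaks described above.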

Live trading started at small size. The theoretical edge is real. The execution friction is higher than backtesting suggests, particularly for the second leg. Still tuning.