Blog

Notes from the routing layer.

Benchmarks, post-mortems, and the occasional opinion. Written by engineers who run the platform.

Featured · Engineering

We routed 12 billion requests last quarter. Here's what we learned about the spread.

Three observations: provider price differentials are widening, not narrowing; eval scores diverge most on long-tail prompts; and your cache hit rate is almost entirely a function of how you prompt, not your traffic.

May 22, 2026 · 14 min read · by the Inferly Eng team

MAY 14 · ENGINEERING

Building a semantic cache that doesn't lie

Cosine similarity is a sharp tool. We share the thresholds, the failure modes, and the eval suite we use to keep ours honest.

APR 30 · BENCHMARKS

The 2026 model price index

Quarterly benchmark across 40 frontier and open models. Cost per useful token. Methodology open-sourced.

APR 17 · POST-MORTEM

The 23-minute incident we caught and you didn't

Provider quota anomaly. Failover behaviour. What we changed in the rerouter.

APR 03 · OPINION

Routing is the new networking

Why the comparison to the Cisco era is more apt than the AWS one.

MAR 20 · ENGINEERING

OpenTelemetry for LLM workloads, the missing manual

Spans, attributes, sampling. The conventions we wish had existed.

MAR 06 · CUSTOMER

How Northwind cut support inference cost by 64%

The full breakdown — prompt, policy, model picks, eval drift, and what they did next.

