Building a semantic cache that doesn't lie
Cosine similarity is a sharp tool. We share the thresholds, the failure modes, and the eval suite we use to keep ours honest.
Read →Benchmarks, post-mortems, and the occasional opinion. Written by engineers who run the platform.
Three observations: provider price differentials are widening, not narrowing; eval scores diverge most on long-tail prompts; and your cache hit rate is almost entirely a function of how you prompt, not your traffic.
Cosine similarity is a sharp tool. We share the thresholds, the failure modes, and the eval suite we use to keep ours honest.
Read →Quarterly benchmark across 40 frontier and open models. Cost per useful token. Methodology open-sourced.
Read →Provider quota anomaly. Failover behaviour. What we changed in the rerouter.
Read →Why the comparison to the Cisco era is more apt than the AWS one.
Read →Spans, attributes, sampling. The conventions we wish had existed.
Read →The full breakdown — prompt, policy, model picks, eval drift, and what they did next.
Read →Engineering-grade writing. No marketing emails.