Inference is the new networking. We built the router.
Inferly is a small team of infrastructure engineers, ex-researchers and platform veterans building the control plane every AI-native company will eventually need.
Models are a market. Markets need infrastructure.
There used to be one cloud per workload. Now there are ten GPU clouds, a hundred frontier and open models, and prices that move daily. The default of "pick one provider and pray" is no longer rational. Routing is the workload — and that workload deserves a platform.
How we build.
Numbers over adjectives
We publish our latency. Our costs. Our eval scores. Marketing copy that can't be measured doesn't ship.
Boring infrastructure
Routing is a control plane. Control planes must be reliable, observable and dull. Excitement belongs in the product, not the pager.
OpenAI-compatible by default
Adoption requires zero new abstractions. The shortest line between a prototype and Inferly is a base URL change.
Customer evals are the spec
What the customer's golden set says is true is what is true. Our roadmap is shaped by where we lose those scores.
Composable, not bundled
Use the router without the cache. The evals without the router. Adopt one piece. Stay for the system.
Trust is a feature
SOC 2, audit logs, residency, RBAC — at every tier. Compliance is not a paywall.
Building in public.
Founded
Three engineers, one observation: GPU prices had a 10× spread and nobody was routing.
Seed — $7M
Led by Benchmark Foundry. Joined by infra angels from Stripe, Vercel and Anthropic.
Public beta
First 200 design partners. 4M requests/day routed across 40+ models.
Series A — $32M
Led by Index Ventures. Doubled the engineering team. Opened EU residency.
1.2B routed requests / week
SOC 2 Type II. HIPAA-ready. Customers in 23 countries.