About

Inference is the new networking. We built the router.

Inferly is a small team of infrastructure engineers, ex-researchers and platform veterans building the control plane every AI-native company will eventually need.

Thesis

Models are a market. Markets need infrastructure.

There used to be one cloud per workload. Now there are ten GPU clouds, a hundred frontier and open models, and prices that move daily. The default of "pick one provider and pray" is no longer rational. Routing is the workload — and that workload deserves a platform.

Principles

How we build.

Numbers over adjectives

We publish our latency. Our costs. Our eval scores. Marketing copy that can't be measured doesn't ship.

Boring infrastructure

Routing is a control plane. Control planes must be reliable, observable and dull. Excitement belongs in the product, not the pager.

OpenAI-compatible by default

Adoption requires zero new abstractions. The shortest line between a prototype and Inferly is a base URL change.

Customer evals are the spec

What the customer's golden set says is true is what is true. Our roadmap is shaped by where we lose those scores.

Composable, not bundled

Use the router without the cache. The evals without the router. Adopt one piece. Stay for the system.

Trust is a feature

SOC 2, audit logs, residency, RBAC — at every tier. Compliance is not a paywall.

Timeline

Building in public.

Mar 2024

Founded

Three engineers, one observation: GPU prices had a 10× spread and nobody was routing.

Sep 2024

Seed — $7M

Led by Benchmark Foundry. Joined by infra angels from Stripe, Vercel and Anthropic.

Feb 2025

Public beta

First 200 design partners. 4M requests/day routed across 40+ models.

Sep 2025

Series A — $32M

Led by Index Ventures. Doubled the engineering team. Opened EU residency.

Today

1.2B routed requests / week

SOC 2 Type II. HIPAA-ready. Customers in 23 countries.

Backed by

Benchmark

Index

Founders Fund

South Park Commons

Conviction

Angels

Want to build the routing layer of AI?

We're hiring across systems, ML and design.

See roles Say hello