Pricing
Pay for routing. Save on inference.
A flat platform fee on top of provider-passthrough costs. The savings from routing typically exceed the fee by 10–20×.
Starter
For teams shipping their first AI product.
$99/month
- 100k routed requests / mo
- Routing across 100+ models
- Semantic caching
- 7-day trace retention
- Community support
Growth
For teams running AI in production.
$2,500/month
- 5M routed requests / mo
- Continuous evals + golden sets
- SSO, SCIM, RBAC
- 90-day trace retention
- Slack support, 4h SLA
- EU + US residency
Enterprise
For regulated, high-volume deployments.
Custom
- Unlimited routed requests
- VPC / on-prem deployment
- Custom fine-tunes
- HIPAA, FedRAMP, GxP packs
- 24×7 support, 99.99% SLA
- Dedicated FDE
Compare
All plans, side-by-side.
| Feature | Starter | Growth | Enterprise |
|---|---|---|---|
| Routed requests / mo | 100k | 5M | Unlimited |
| Models supported | 100+ | 100+ | 100+ & custom |
| Semantic caching | ✓ | ✓ | ✓ |
| Continuous evals | — | ✓ | ✓ |
| SSO / SCIM / RBAC | — | ✓ | ✓ |
| Trace retention | 7 days | 90 days | 365+ days |
| Data residency | US | EU + US | Any region |
| VPC / on-prem | — | — | ✓ |
| Compliance packs (HIPAA, FedRAMP) | — | — | ✓ |
| Support | Community | Slack, 4h | 24×7, 30min |
| Uptime SLA | 99.9% | 99.95% | 99.99% |
Questions
Pricing FAQ.
Yes — provider token costs pass through at-cost. Inferly's fee is the platform charge. In practice the routing savings exceed our fee by 10–20×.
Yes. 1,000 routed requests per day, free, forever — no credit card. Plenty for prototyping and small side projects.
Overages are billed at $0.0001 per routed request beyond plan caps. You can set hard caps to block instead.
15% off Growth on annual, 20–25% off Enterprise on multi-year. Contact sales.
Yes — 60-day paid pilots at 50% list, convertible to annual.
Try it on your traffic for 30 days.
No procurement cycle. Drop in. See the routing savings on your actual workload.