Pricing

Pay for routing. Save on inference.

A flat platform fee on top of provider-passthrough costs. The savings from routing typically exceed the fee by 10–20×.

Starter

For teams shipping their first AI product.

$99/month
  • 100k routed requests / mo
  • Routing across 100+ models
  • Semantic caching
  • 7-day trace retention
  • Community support
Start free

Enterprise

For regulated, high-volume deployments.

Custom
  • Unlimited routed requests
  • VPC / on-prem deployment
  • Custom fine-tunes
  • HIPAA, FedRAMP, GxP packs
  • 24×7 support, 99.99% SLA
  • Dedicated FDE
Talk to sales
Compare

All plans, side-by-side.

FeatureStarterGrowthEnterprise
Routed requests / mo100k5MUnlimited
Models supported100+100+100+ & custom
Semantic caching
Continuous evals
SSO / SCIM / RBAC
Trace retention7 days90 days365+ days
Data residencyUSEU + USAny region
VPC / on-prem
Compliance packs (HIPAA, FedRAMP)
SupportCommunitySlack, 4h24×7, 30min
Uptime SLA99.9%99.95%99.99%
Questions

Pricing FAQ.

Yes — provider token costs pass through at-cost. Inferly's fee is the platform charge. In practice the routing savings exceed our fee by 10–20×.
Yes. 1,000 routed requests per day, free, forever — no credit card. Plenty for prototyping and small side projects.
Overages are billed at $0.0001 per routed request beyond plan caps. You can set hard caps to block instead.
15% off Growth on annual, 20–25% off Enterprise on multi-year. Contact sales.
Yes — 60-day paid pilots at 50% list, convertible to annual.

Try it on your traffic for 30 days.

No procurement cycle. Drop in. See the routing savings on your actual workload.

Start freeBook a call