What shipped, week by week.
We push to production every Tuesday. The big things land here. The small ones land in the diff.
Latency-aware routing, generally available
Our reranker now incorporates rolling p50/p95/p99 per provider, per region, per model. In internal benchmarks, average end-to-end latency dropped 11% with no change in cost.
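A minimal sketch of what latency-aware reranking can look like. It assumes a bounded window of recent latency samples per (provider, region, model) and a hypothetical weighted blend of p50/p95/p99. The class name, window size, and weights are illustrative assumptions, not Inferly's internals.

```python
import math
from collections import defaultdict, deque


def nearest_rank(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct / 100 * len(sorted_samples))
    return sorted_samples[max(rank, 1) - 1]


class LatencyReranker:
    # Hypothetical weights: how much p50, p95 and p99 each contribute to the score.
    WEIGHTS = {50: 0.2, 95: 0.5, 99: 0.3}

    def __init__(self, window=1000):
        # One bounded window of recent samples per (provider, region, model) key;
        # old samples fall off the left as new ones arrive, so stats stay "rolling".
        self.windows = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider, region, model, latency_ms):
        self.windows[(provider, region, model)].append(latency_ms)

    def stats(self, key):
        data = sorted(self.windows[key])
        return {p: nearest_rank(data, p) for p in self.WEIGHTS}

    def score(self, key):
        # Lower is better: a weighted blend of median and tail latency.
        s = self.stats(key)
        return sum(w * s[p] for p, w in self.WEIGHTS.items())

    def rank(self, candidates):
        """Order candidate keys fastest-first; keys with no samples go last."""
        seen = [c for c in candidates if self.windows.get(c)]
        unseen = [c for c in candidates if not self.windows.get(c)]
        return sorted(seen, key=self.score) + unseen
```

For example, a provider with samples of 100, 110, 120 and 300 ms has the lower median but a 300 ms tail, so a steadier provider at 140 to 160 ms ranks ahead of it. That is the point of weighting p95/p99 rather than ranking on averages alone.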
- New slo_ms field on policy
- SLO breaches now surface in the trace UI with the chosen alternative
- OpenTelemetry attribute inferly.slo.met emitted on every span
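A rough sketch of how a policy's slo_ms could drive the routing decision and the inferly.slo.met attribute. Only the names slo_ms and inferly.slo.met come from this changelog. The Policy shape, the preference-order fallback, and the "breach note" dict standing in for what the trace UI shows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical policy shape: only the slo_ms field name comes from the changelog.
    slo_ms: float
    providers: list[str]  # preference order, most preferred first


def choose_provider(policy, predicted_p95_ms):
    """Return (chosen provider, breach note or None)."""
    preferred = policy.providers[0]
    if predicted_p95_ms[preferred] <= policy.slo_ms:
        return preferred, None
    # Preferred provider is predicted to breach: take the next one in preference
    # order that fits the SLO, else fall back to whichever is predicted fastest.
    fits = [p for p in policy.providers[1:] if predicted_p95_ms[p] <= policy.slo_ms]
    chosen = fits[0] if fits else min(policy.providers, key=lambda p: predicted_p95_ms[p])
    # Hypothetical record of the breach and the chosen alternative, for the trace.
    note = {"preferred": preferred,
            "predicted_ms": predicted_p95_ms[preferred],
            "alternative": chosen}
    return chosen, note


def slo_attributes(policy, observed_ms):
    """Span attributes; the key matches the changelog, comparing observed latency is assumed."""
    return {"inferly.slo.met": observed_ms <= policy.slo_ms}
```

With the OpenTelemetry Python API, the dict returned by slo_attributes could be attached to the active span with span.set_attributes(...).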
Semantic cache: per-workflow thresholds
Tune the cosine-similarity threshold per workflow. The default drops from 0.98 to 0.97, based on the last quarter of customer data.
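A minimal sketch of a semantic-cache lookup with per-workflow thresholds. It assumes cache entries are scoped to the workflow that wrote them and uses a linear scan for clarity; the class and method names are hypothetical. Only the 0.97 default comes from the changelog.

```python
import math

DEFAULT_THRESHOLD = 0.97  # new default per the changelog


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, thresholds=None):
        # Hypothetical per-workflow overrides, e.g. {"legal-review": 0.995}.
        self.thresholds = thresholds or {}
        self.entries = []  # (workflow, embedding, response)

    def threshold(self, workflow):
        return self.thresholds.get(workflow, DEFAULT_THRESHOLD)

    def put(self, workflow, embedding, response):
        self.entries.append((workflow, embedding, response))

    def get(self, workflow, embedding):
        """Return the closest cached response in this workflow if it clears the threshold."""
        best, best_sim = None, -1.0
        for wf, emb, resp in self.entries:
            if wf != workflow:
                continue  # assumed: entries never leak across workflows
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold(workflow) else None
```

A query vector of [0.98, 0.2] against a cached [1.0, 0.0] scores about 0.9798. That is a hit under the new 0.97 default but would have missed at 0.98, and a workflow pinned to 0.995 still treats it as a miss.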
EU residency, regional routing controls
Pin workloads to eu-west or eu-central, auditable per request. The DPA has been refreshed to cover the new region.
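One way region pinning with a per-request audit trail could work, as a sketch. The region names come from the changelog. The function name, the (provider, region) candidate tuples, the audit record fields, and the fail-closed behavior are assumptions for illustration.

```python
from datetime import datetime, timezone

EU_REGIONS = frozenset({"eu-west", "eu-central"})  # region names from the changelog


class ResidencyError(Exception):
    """Raised when no candidate deployment sits inside the pinned regions."""


def route_pinned(request_id, candidates, allowed_regions, audit_log):
    """Pick the first (provider, region) candidate inside allowed_regions.

    Fails closed: if nothing qualifies, raise instead of routing elsewhere.
    Every decision, allowed or refused, appends one audit record.
    """
    chosen = next(((p, r) for p, r in candidates if r in allowed_regions), None)
    audit_log.append({
        "request_id": request_id,
        "allowed_regions": sorted(allowed_regions),
        "chosen": chosen,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if chosen is None:
        raise ResidencyError(f"no candidate in {sorted(allowed_regions)} for {request_id}")
    return chosen
```

Failing closed is the conservative choice for residency: a request that cannot be served in-region errors out rather than silently leaving the EU, and the refusal is still logged.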
Continuous evals: CI hook
Webhook your CI and block deploys on golden-set regressions. Ships with a built-in GitHub Action.
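The gating logic behind "block deploys on golden-set regression" can be sketched as a score comparison that yields a CI exit code. This is not the built-in GitHub Action. The per-case score dicts, the 0.01 tolerance, and treating a missing case as a regression are assumptions.

```python
import sys


def find_regressions(baseline, current, tolerance=0.01):
    """Golden-set cases whose score dropped by more than tolerance, or went missing."""
    regressions = {}
    for case, base_score in baseline.items():
        new_score = current.get(case)
        if new_score is None or base_score - new_score > tolerance:
            regressions[case] = (base_score, new_score)
    return regressions


def gate(baseline, current, tolerance=0.01):
    """Exit code for CI: 0 allows the deploy, 1 blocks it."""
    regressions = find_regressions(baseline, current, tolerance)
    for case, (old, new) in sorted(regressions.items()):
        print(f"REGRESSION {case}: {old} -> {new}", file=sys.stderr)
    return 1 if regressions else 0
```

A CI step would end with sys.exit(gate(baseline, current)). Most CI systems, GitHub Actions included, fail a step on a nonzero exit code, which is what blocks the deploy.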
Inferly 2.0
New dashboard. Workflow primitive. First-class evals. Migration is a one-line base URL change for existing customers.
10 new providers
Added DeepSeek V3, Qwen 2.5, xAI Grok 3, Cerebras, plus 6 long-tail OSS hosts.