Changelog

What shipped, week by week.

We push to production every Tuesday. The big things land here. The small ones land in the diff.

May 27, 2026
v2.4

Latency-aware routing, generally available

Our reranker now incorporates rolling p50/p95/p99 latency per provider, per region, and per model. In internal benchmarks, average end-to-end latency dropped 11% with no change in cost.

  • New slo_ms field on policy
  • SLO breaches now surface in the trace UI with the chosen alternative
  • OpenTelemetry attribute inferly.slo.met emitted on every span
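
For readers who want a feel for the idea, here is a minimal sketch of latency-aware route selection. It is not Inferly's reranker: the RollingLatency class, the pick_route function, the window size, and the choice of p95 as the number compared against slo_ms are all assumptions made for illustration.

```python
from collections import deque
from statistics import quantiles


class RollingLatency:
    """Fixed-size window of recent latencies (ms) for one provider/region/model key."""

    def __init__(self, window=200):
        # Hypothetical window size; old samples fall off automatically.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def percentile(self, p):
        # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
        if len(self.samples) < 2:
            return None
        return quantiles(self.samples, n=100, method="inclusive")[p - 1]


def pick_route(stats, costs, slo_ms):
    """Return (key, slo_met).

    Assumed policy: choose the cheapest route whose rolling p95 is within
    slo_ms; if none qualifies, fall back to the fastest route and report
    the SLO as missed.
    """
    measured = {k: s.percentile(95) for k, s in stats.items()}
    measured = {k: v for k, v in measured.items() if v is not None}
    if not measured:
        raise ValueError("no latency data yet")
    within = [k for k, v in measured.items() if v <= slo_ms]
    if within:
        return min(within, key=lambda k: costs[k]), True
    return min(measured, key=measured.get), False
```

In this sketch the second return value plays the role that an attribute such as inferly.slo.met would play on a span: it records whether the chosen route satisfied the policy's SLO.
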
May 13, 2026
v2.3

Semantic cache: per-workflow thresholds

Tune the cosine similarity threshold per workflow. Default lowered from 0.98 to 0.97 based on the last quarter of customer data.
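
A minimal sketch of what a per-workflow threshold means in practice. Only the 0.97 default comes from this release; the workflow names, their override values, and the cache_lookup function are hypothetical and do not reflect the actual configuration surface.

```python
import math

DEFAULT_THRESHOLD = 0.97  # default stated in this release

# Hypothetical per-workflow overrides, for illustration only.
WORKFLOW_THRESHOLDS = {"legal-summaries": 0.99, "faq-bot": 0.95}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cache_lookup(workflow, query_vec, entries):
    """Return the best cached response at or above the workflow's threshold.

    entries is a list of (embedding, cached_response) pairs. Returns None
    on a cache miss.
    """
    threshold = WORKFLOW_THRESHOLDS.get(workflow, DEFAULT_THRESHOLD)
    best_score, best_response = 0.0, None
    for embedding, response in entries:
        score = cosine_similarity(query_vec, embedding)
        if score >= threshold and score > best_score:
            best_score, best_response = score, response
    return best_response
```

A lower threshold trades precision for hit rate: a query at similarity 0.96 is served from cache for a workflow set to 0.95 but misses under the 0.97 default.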

Apr 29, 2026
v2.2

EU residency, regional routing controls

Pin workloads to eu-west or eu-central. Auditable per request. DPA refreshed to support the new region.

Apr 15, 2026
v2.1

Continuous evals: CI hook

Webhook your CI. Block deploys on golden set regression. Built-in GitHub Action.
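
A rough sketch of the gating logic a CI step could run once eval scores are in hand. The score format (case id mapped to a number), the tolerance, and the regressions and gate functions are assumptions; the real webhook payload and the GitHub Action's inputs are not shown here.

```python
def regressions(baseline, candidate, tolerance=0.01):
    """Return golden-set case ids that regressed.

    A case regresses if its score dropped by more than tolerance or if it
    is missing from the candidate run. Both dicts map case id to score.
    """
    failed = []
    for case_id, base_score in baseline.items():
        new_score = candidate.get(case_id)
        if new_score is None or base_score - new_score > tolerance:
            failed.append(case_id)
    return sorted(failed)


def gate(baseline, candidate, tolerance=0.01):
    """Return a CI exit code: 0 to allow the deploy, 1 to block it."""
    failed = regressions(baseline, candidate, tolerance)
    for case_id in failed:
        print(f"regression: {case_id}")
    return 1 if failed else 0
```

In a pipeline, the step would end with sys.exit(gate(baseline, candidate)) so that a non-zero code fails the job and blocks the deploy.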

Apr 1, 2026
v2.0

Inferly 2.0

New dashboard. Workflow primitive. First-class evals. Migration is a one-line base URL change for existing customers.

Mar 11, 2026
v1.18

10 new providers

Added DeepSeek V3, Qwen 2.5, xAI Grok 3, Cerebras, plus 6 long-tail OSS hosts.
