Integrations

Everything plugged in. Nothing locked in.

100+ models. 10+ GPU clouds. Every meaningful SDK. Inferly is the universal adapter for the AI stack.

Model providers

OpenAI

GPT-4o, o3, embeddings

Anthropic

Claude 3.5 family, tools

Google

Gemini 2.0, embeddings

Mistral

Large, Codestral, Embed

DeepSeek

V3, Coder, Reasoner

Meta

Llama 3.3, Llama 4

Cohere

Command R+, Rerank

Qwen / AI21 / xAI

and 90+ more

GPU clouds

Together

Open models · serverless

Groq

LPU · sub-100ms

Fireworks

Fine-tunes · embeddings

Replicate

Long-tail OSS models

fal.ai

Image · video

Cerebras

Wafer-scale

Lambda / CoreWeave

Dedicated GPUs

On-prem

vLLM, TGI, Triton

SDKs & frameworks

OpenAI SDK

Drop-in base URL

Anthropic SDK

Compatible endpoint

LangChain

First-class provider

LlamaIndex

Native

Vercel AI SDK

Streaming · tools

Mastra / Agno

Agent frameworks

Pydantic AI

Structured output

MCP

Model Context Protocol

Observability & data

Datadog

Traces · metrics

Honeycomb

OTel native

Snowflake

Spend warehouse

BigQuery

Hourly exports

S3 / GCS

Trace dumps

PagerDuty

SLO breaches

Slack

Spend alerts

Webhooks

Everything else

Missing your stack? We'll wire it up.

Most integrations ship in a sprint. Enterprise customers get them prioritised.

Request an integration

Browse docs