One endpoint. Every model. Zero downtime.
Orbyt is a self-hosted LLM gateway. Route, retry, and observe every request to every major AI provider through a single OpenAI-compatible endpoint.

Routes to every major model
Built for production
Three pillars: interoperability, reliability, transparency.
Every part of the gateway exists to remove friction between your application and the model. Nothing more.
When one provider blinks, the next takes over.
Define a primary model and a fallback chain. The decision engine cascades on rate limits, 5xx errors, and timeouts — your client never sees the failure.
- Configurable retry policy with exponential backoff (sketched after this list)
- Per-request fallback_models override
- Strategies: cheap, fast, reliable, or provider-locked
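The backoff itself is a standard pattern. As a rough sketch only, not Orbyt's internals (the `RETRYABLE` set and `withBackoff` name are illustrative), a retry policy like the one above might look like this:

```ts
// Generic exponential backoff with full jitter. Illustrative only:
// this is not Orbyt's decision engine, and all names are hypothetical.
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

async function withBackoff(
  attempt: () => Promise<Response>,
  maxRetries = 3,
  baseDelayMs = 250,
): Promise<Response> {
  for (let i = 0; ; i++) {
    const res = await attempt();
    // Return on success, on a non-retryable status, or once retries are spent.
    if (!RETRYABLE.has(res.status) || i === maxRetries) return res;
    // Exponential backoff with full jitter, capped at 8s: ~250ms, ~500ms, ~1s, ...
    const cap = Math.min(baseDelayMs * 2 ** i, 8_000);
    await new Promise((r) => setTimeout(r, Math.random() * cap));
  }
}
```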
Resilience built into the network layer.
A Redis-leased key pool, a global rate limiter, and a decision engine triage every failure mode before it reaches your application. SSE streaming is normalized across providers.
- Health-ranked API keys leased per request (see the sketch after this list)
- Unified streaming format across every provider
- Hard errors triaged: provider-exhausted vs. model-exhausted
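As a rough illustration of the leasing pattern, assuming an ioredis client and a per-provider sorted set whose scores track key health (none of these key names or helpers are Orbyt's), a lease might work like this:

```ts
// Sketch of the leasing pattern, not Orbyt's code. Assumes ioredis and a
// sorted set "keys:<provider>" whose scores rank each API key's health.
import Redis from "ioredis";

const redis = new Redis();

async function leaseKey(provider: string, ttlMs = 30_000): Promise<string | null> {
  // Healthiest key first (highest score), skipping keys already on lease.
  const ranked = await redis.zrevrange(`keys:${provider}`, 0, -1);
  for (const key of ranked) {
    // NX + PX gives an atomic lease that expires on its own if never released.
    const leased = await redis.set(`lease:${key}`, "1", "PX", ttlMs, "NX");
    if (leased === "OK") return key;
  }
  return null; // pool exhausted: the decision engine falls back or retries
}

async function releaseKey(key: string): Promise<void> {
  await redis.del(`lease:${key}`);
}
```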
Your keys. Your data. Your infra.
Orbyt runs on your infrastructure. Bearer tokens are scoped per project, telemetry persists to your Postgres, and you control rotation, revocation, and audit logs end to end.
- Scoped Bearer tokens with instant revocation
- Telemetry persisted to your own Postgres
- OpenAI-compatible — switch with two lines of config
On the roadmap
Shipping next — track progress in the changelog.
Models
Route to every major model.
Define your primary, configure fallbacks, and let the engine handle the rest. New providers ship behind the same endpoint.
Two lines to switch from OpenAI.
Point your existing OpenAI SDK at the Orbyt gateway, then add the extra block to declare fallbacks, routing strategy, and retry policy per request.
```ts
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://openrouter-clone-api-gateway.onrender.com/v1",
  apiKey: "gateway-sk-12345",
});

const response = await openai.chat.completions.create({
  model: "google/gemini-3.1-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of Germany?" },
  ],
  temperature: 0.7,
  // Orbyt extensions
  extra: {
    fallback_models: [
      "anthropic/claude-3-haiku",
      "google/gemini-2.5-flash",
    ],
    provider: "cheap",
    retry: 3,
  },
});
```
Pipeline
The lifecycle of every request.
Deterministic flow. Every layer isolates faults at its origin so transient errors never propagate to the client. A simplified sketch of the whole flow follows the four steps.
Rate limit
A global limiter enforces traffic bounds before a request enters the routing pipeline.
Select provider
The provider selector evaluates strategy (cheap, fast, reliable) and your fallback chain.
Lease + execute
A health-ranked API key is leased from the Redis pool and the request hits the provider.
Normalize & stream
Provider chunks are normalized to a single SSE format. Telemetry persists asynchronously.
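Taken together, the four steps reduce to a short control flow. The sketch below is a simplified rendering under assumed names; every declared helper stands in for the layer it is named after, and none of this is Orbyt's source:

```ts
// Simplified lifecycle sketch. Every declaration here is a hypothetical
// stand-in for the corresponding pipeline layer, not Orbyt's actual API.
type ChatRequest = {
  projectToken: string;
  model: string;
  extra?: { fallback_models?: string[]; provider?: string };
};
declare function checkRateLimit(token: string): Promise<void>; // throws when over budget
declare function selectProvider(model: string, strategy?: string): string;
declare function leaseKey(provider: string): Promise<string | null>;
declare function releaseKey(key: string): Promise<void>;
declare function callProvider(
  provider: string, model: string, key: string, req: ChatRequest,
): Promise<Response>;
declare function toUnifiedSSE(upstream: Response): ReadableStream;
declare function isTransient(err: unknown): boolean; // rate limit / 5xx / timeout

async function handleRequest(req: ChatRequest): Promise<ReadableStream> {
  // 1. Rate limit: enforce traffic bounds before any routing work.
  await checkRateLimit(req.projectToken);

  // 2. Select provider: walk the primary, then the fallback chain.
  for (const model of [req.model, ...(req.extra?.fallback_models ?? [])]) {
    const provider = selectProvider(model, req.extra?.provider);

    // 3. Lease + execute: take a health-ranked key from the Redis pool.
    const key = await leaseKey(provider);
    if (key === null) continue; // provider-exhausted: try the next model

    try {
      const upstream = await callProvider(provider, model, key, req);
      // 4. Normalize & stream: one SSE shape; telemetry persists async.
      return toUnifiedSSE(upstream);
    } catch (err) {
      if (!isTransient(err)) throw err; // hard error: surface it immediately
      // transient failure: cascade to the next model in the chain
    } finally {
      await releaseKey(key);
    }
  }
  throw new Error("model-exhausted: every model in the chain failed");
}
```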
Tracing
See every request, end to end.
DevTools shows you live status, payloads, latency, and the exact routing decision the engine made — for every request.
Open DevTools
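To make "end to end" concrete, here is one plausible shape for the trace record behind each DevTools row, written to your Postgres; every field name is an assumption rather than Orbyt's schema:

```ts
// One plausible trace-row shape. All field names are illustrative
// assumptions, not Orbyt's actual schema.
interface RequestTrace {
  id: string;               // request id surfaced live in DevTools
  projectToken: string;     // which scoped Bearer token made the call
  requestedModel: string;   // what the client asked for
  servedModel: string;      // what answered after any fallbacks
  routingDecision: string;  // e.g. "primary 429 -> fallback[0]"
  status: number;           // final HTTP status returned to the client
  latencyMs: number;        // end to end, including retries
  attempts: number;         // provider calls made before settling
  createdAt: Date;          // written asynchronously to Postgres
}
```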
Ship LLM features without shipping the chaos.
Self-host Orbyt in minutes. Point your existing OpenAI client at it. Sleep through the next provider outage.