Now self-hostable · v0.1

One endpoint. Every model. Zero downtime.

Orbyt is a self-hosted LLM gateway. Route, retry, and observe every request to every major AI provider through a single OpenAI-compatible endpoint.

OpenAI-compatible · Self-hosted · MIT licensed
Orbyt dashboard — API keys management

Routes to every major model

OpenAI · Anthropic · Google · Meta · Mistral · DeepSeek · xAI · Cohere · Groq · Perplexity

Built for production

Three pillars: interoperability, reliability, transparency.

Every part of the gateway exists to remove friction between your application and the model. Nothing more.

Routing

When one provider blinks, the next takes over.

Define a primary model and a fallback chain. The decision engine cascades on rate limits, 5xx errors, and timeouts — your client never sees the failure.

  • Configurable retry policy with exponential backoff
  • Per-request fallback_models override
  • Strategies: cheap, fast, reliable, or provider-locked
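The cascade above can be sketched as a flat attempt plan: each model in the chain gets `retries` tries, with exponential backoff between tries on the same model. A minimal sketch; `planAttempts`, the 250 ms base delay, and the `Attempt` shape are illustrative assumptions, not Orbyt's internals.

```typescript
type Attempt = { model: string; delayMs: number };

// Expand a primary model + fallback chain into the ordered attempts the
// engine would make: `retries` tries per model, exponential backoff
// (doubling from `baseDelayMs`) between tries on the same model.
function planAttempts(
  primary: string,
  fallbacks: string[],
  retries: number,
  baseDelayMs = 250,
): Attempt[] {
  const attempts: Attempt[] = [];
  for (const model of [primary, ...fallbacks]) {
    for (let i = 0; i < retries; i++) {
      attempts.push({
        model,
        delayMs: i === 0 ? 0 : baseDelayMs * 2 ** (i - 1),
      });
    }
  }
  return attempts;
}
```

With `retries: 3`, a primary plus one fallback yields six attempts: three on the primary (delays 0, 250, 500 ms), then three on the fallback.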
routing-engine · strategy: cheap
incoming request (model: auto · strategy: cheap · retries: 3)
openai/gpt-4o → anthropic/claude-3.5-sonnet → google/gemini-2.0-pro
Resilience

Resilience built into the network layer.

A Redis-leased key pool, a global rate limiter, and a decision engine triage every failure mode before it reaches your application. SSE streaming is normalized across providers.

  • Health-ranked API keys leased per request
  • Unified streaming format across every provider
  • Hard errors triaged: provider-exhausted vs. model-exhausted
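Health-ranked leasing can be sketched as a filter-and-sort over the pool. The field names below mirror the dashboard, not Orbyt's actual schema, and `leaseKey` is an assumed name:

```typescript
type ApiKey = {
  id: string;
  provider: string;
  health: number;      // 0-100, rolling success rate
  rpmUsed: number;
  rpmLimit: number;
  status: "idle" | "cooldown" | "exhausted";
};

// Lease the healthiest idle key for a provider that still has RPM
// headroom; return null when the provider has no usable keys left.
function leaseKey(pool: ApiKey[], provider: string): ApiKey | null {
  const candidates = pool
    .filter(k => k.provider === provider && k.status === "idle" && k.rpmUsed < k.rpmLimit)
    .sort((a, b) => b.health - a.health);
  return candidates[0] ?? null;
}
```

A `null` result is what the decision engine would triage as provider-exhausted, triggering the fallback chain.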
key-pool · rate-limiter · Redis
Requests: 1,847 · Active keys: 4 / 6 · Cooldown: 1 · Exhausted: 1

Key     Provider   Health  RPM       Status
key_01  openai     98%     812/1000  idle
key_02  openai     72%     720/1000  idle
key_03  anthropic  95%     420/500   idle
key_04  anthropic  45%     499/500   cooldown
key_05  google     88%     380/600   idle
key_06  google     12%     600/600   exhausted
Self-hosted

Your keys. Your data. Your infra.

Orbyt runs on your infrastructure. Bearer tokens are scoped per project, telemetry persists to your Postgres, and you control rotation, revocation, and audit logs end to end.

  • Scoped Bearer tokens with instant revocation
  • Telemetry persisted to your own Postgres
  • OpenAI-compatible — switch with two lines of config
key-management · encrypted
Postgres: your-infra.internal:5432 · Encryption: AES-256-GCM

Name        Key                 Scope        Status
prod-api    sk-or-v1-8f3•••9fb  all models   active
staging     sk-or-v1-947•••533  gpt-4o only  active
deprecated  sk-or-v1-2a1•••f40  all models   revoked

Audit log
14:22:08  key_01 leased → openai/gpt-4o
14:22:06  key_03 leased → anthropic/claude-3.5
14:21:58  key_06 exhausted — cooldown 120s
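Scoped tokens with instant revocation reduce to a small registry check on every request. A sketch under assumed names (`TokenStore`, `authorize`); Orbyt's real store persists to Postgres rather than memory:

```typescript
type Token = { key: string; scope: "all" | string; revoked: boolean };

class TokenStore {
  private tokens = new Map<string, Token>();

  issue(key: string, scope: "all" | string): void {
    this.tokens.set(key, { key, scope, revoked: false });
  }

  revoke(key: string): void {
    const t = this.tokens.get(key);
    if (t) t.revoked = true; // takes effect on the very next request
  }

  // A request passes only if the token exists, isn't revoked, and its
  // scope covers the requested model.
  authorize(key: string, model: string): boolean {
    const t = this.tokens.get(key);
    if (!t || t.revoked) return false;
    return t.scope === "all" || t.scope === model;
  }
}
```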

On the roadmap

Shipping next — track progress in the changelog.

Q2 Presets · Q2 Budget Limits · Q3 Tool Calling · Q3 Multimodality · Q3 Zero Insurance · Q4 BYOK

Models

Route to every major model.

Define your primary, configure fallbacks, and let the engine handle the rest. New providers ship behind the same endpoint.

Provider   Status   Model              Context  Price   Fallbacks
OpenAI     Live     gpt-5              256K     $15/M   gpt-4o, claude-3.5
Anthropic  Live     claude-3.5-sonnet  200K     $15/M   gpt-4o, haiku
Google     Live     gemini-2.0-pro     1000K    $10/M   gemini-flash
Meta       Live     llama-3.3-70b      128K     $0.9/M  llama-70b, mistral-l
Mistral    Live     mistral-large-2    128K     $6/M    mixtral, llama-70b
DeepSeek   Live     deepseek-r1        128K     $2/M    gpt-4o-mini
xAI        Preview  grok-3             128K     $5/M    gpt-4o
Cohere     Live     command-r-plus     128K     $2.5/M  gpt-4o-mini
Groq       Live     llama-3.3-70b      128K     $0.6/M  together
Drop-in

Two lines to switch from OpenAI.

Point your existing OpenAI SDK at the Orbyt gateway. Add an extra block to declare fallbacks, routing strategy, and retry policy per request.

OpenAI-compatible. Existing libraries and frameworks work unchanged.
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://openrouter-clone-api-gateway.onrender.com/v1",
  apiKey: "gateway-sk-12345",
});

const response = await openai.chat.completions.create({
  model: "google/gemini-3.1-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of Germany?" }
  ],
  temperature: 0.7,
  // Orbyt extensions
  extra: {
    fallback_models: [
      "anthropic/claude-3-haiku",
      "google/gemini-2.5-flash"
    ],
    provider: "cheap",
    retry: 3
  }
});

Pipeline

The lifecycle of every request.

Deterministic flow. Every layer isolates faults at its origin so transient errors never propagate to the client.

01

Rate limit

A global limiter enforces traffic bounds before a request enters the routing pipeline.

429 if exceeded
02

Select provider

The provider selector evaluates strategy (cheap, fast, reliable) and your fallback chain.

extra.provider
03

Lease + execute

A health-ranked API key is leased from the Redis pool and the request hits the provider.

redis key pool
04

Normalize & stream

Provider chunks are normalized to a single SSE format. Telemetry persists asynchronously.

sse · postgres
Hard error? → decision engine → retry → fallback model → next provider
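The triage step above can be sketched as a pure classifier over the failure mode. The error shape and verdict names below are assumptions for illustration; the real decision engine is internal to Orbyt:

```typescript
type ProviderError = { status: number; timedOut?: boolean };

type Verdict =
  | "retry-same-key"   // transient: back off and retry this key
  | "retry-next-key"   // this key is rate-limited, lease another
  | "fallback-model";  // provider-level failure, cascade onward

function triage(err: ProviderError, keysLeftForProvider: number): Verdict {
  if (err.status === 429) {
    // Rate-limited: the key is hot, not necessarily the provider.
    return keysLeftForProvider > 0 ? "retry-next-key" : "fallback-model";
  }
  if (err.timedOut || err.status >= 500) {
    // 5xx / timeout: treat as transient and retry with backoff.
    return "retry-same-key";
  }
  return "fallback-model"; // hard 4xx: retrying the same request won't help
}
```

This is where provider-exhausted (no keys left, cascade to the next provider) diverges from a per-key rate limit (lease another key and stay on the same model).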

Tracing

See every request, end to end.

DevTools shows you live status, payloads, latency, and the exact routing decision the engine made — for every request.

Open DevTools

Ship LLM features
without shipping the chaos.

Self-host Orbyt in minutes. Point your existing OpenAI client at it. Sleep through the next provider outage.