FreeAI // Orchestrator Control

└──▶

PROVIDER ROSTER

Six free-tier endpoints stand by. Drop in API keys, weight your favorites, and FreeAI routes every prompt to whoever has the right capability and the most headroom left in their daily quota.

└──▶

ROUTING STRATEGY

The orchestrator picks a provider for every request. Auto reads each prompt and decides; the others lock to a single criterion. You can define your own strategies — tags are matched against each provider's capability list.

FALLBACK CHAIN

If the chosen provider hits a rate limit or returns an error, the orchestrator automatically walks down the ranking until one provider succeeds.
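The walk-down can be pictured roughly as follows. This is a minimal sketch, not FreeAI's actual implementation: `ProviderError`, `call_with_fallback`, and the `send` callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider is rate-limited or fails."""


def call_with_fallback(
    ranked: Sequence[str],
    send: Callable[[str], str],
    fallback: bool = True,
) -> Tuple[str, List[str]]:
    """Try providers in ranked order; return (reply, providers_tried)."""
    tried: List[str] = []
    last_error: Optional[ProviderError] = None
    # With fallback off, only the top-ranked provider is attempted.
    candidates = ranked if fallback else ranked[:1]
    for name in candidates:
        tried.append(name)
        try:
            return send(name), tried
        except ProviderError as exc:
            last_error = exc  # remember the failure and move down the ranking
    raise ProviderError(f"all providers failed: {tried}") from last_error
```

With the toggle off, the same function attempts only the first candidate and raises as soon as it fails.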

ON / OFF

└──▶

LIVE PLAYGROUND

Send a prompt to the orchestrator and watch which provider it picks, how long it took, and the full fallback trail.

SYSTEM PROMPT (optional) · USER PROMPT

STRATEGY · FORCE PROVIDER · TEMP · STREAM ON / OFF

AUDIO TRANSCRIPTION

Upload an audio file to transcribe via Groq Whisper. Requires Groq provider configured.

AUDIO FILE · LANGUAGE (optional)

└──▶

ANALYTICS

Telemetry from every dispatched completion. These numbers come from the usage_events table — they persist across restarts and scale with your data.

CALLS OVER TIME

BY PROVIDER

BY OUTCOME

BY STRATEGY

BY CLIENT

ERRORS BY KIND

BY MODEL

FALLBACK CHAIN

TOKENS IN / OUT

HOURLY PATTERN (last 7d)

└──▶ HISTORICAL aggregated from daily rollups — up to 2 years

DAILY CALLS

BY PROVIDER (window)

BY MODEL (window)

└──▶

CLIENT KEYS

Issue API keys for the apps that will call FreeAI. Each client has its own per-minute rate limit. Without at least one client, the server stays in bootstrap mode and anyone can hit /v1/chat/completions.

CLIENT NAME · RPM LIMIT

└──▶

USER MANAGEMENT

Manage user accounts (max 5). Each user has their own provider keys and API clients.

ACTIVITY BY USER calls/day · last 7 days

USAGE BY USER success vs failed · last 7 days

└──▶

API REFERENCE

OpenAI-compatible endpoints. Point any client at your FreeAI instance and it works.
Auth: Authorization: Bearer <fai_...> for /v1/* (created in the Users tab or via POST /api/clients); X-Admin-Token or a JWT cookie for admin routes. For the full reference with edge cases, see docs/API.md.

Error shape: provider-level failures come back as {"detail": {"provider", "kind", "message"}} where kind is one of auth, rate_limited, client_error, server_error, network, parsing, empty_response, content_filtered, unknown. Branch on kind, not on message strings. empty_response and content_filtered normally trigger internal fallback and never reach the client — they only surface if every provider fails.
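A client can branch on kind along these lines. This is a sketch under assumptions: the `RETRYABLE` set is a client-side policy invented for illustration, not something the reference prescribes, and `classify_error` is a hypothetical helper name.

```python
import json

# Kinds listed in the error-shape description above.
KNOWN_KINDS = {
    "auth", "rate_limited", "client_error", "server_error", "network",
    "parsing", "empty_response", "content_filtered", "unknown",
}

# Assumption: which kinds are worth retrying is the caller's own policy.
RETRYABLE = {"rate_limited", "server_error", "network", "empty_response"}


def classify_error(body: str) -> tuple:
    """Return (provider, kind, retryable) from an error response body."""
    detail = json.loads(body).get("detail")
    if not isinstance(detail, dict):
        # Not the provider-level shape; treat it as unknown.
        return (None, "unknown", False)
    kind = detail.get("kind", "unknown")
    if kind not in KNOWN_KINDS:
        kind = "unknown"
    return (detail.get("provider"), kind, kind in RETRYABLE)
```

The message field is deliberately never inspected, so a reworded upstream error does not change the client's behaviour.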

CLIENT ROUTES Authorization: Bearer fai_… 04 / 12

POST /v1/chat/completions CLIENT

REQUEST BODY

{
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "model": "freeai-fast",
  "strategy": "auto",
  "preferred_provider": null,
  "temperature": 0.7,
  "max_tokens": 512,
  "stream": false,
  "fallback": true
}

OpenAI-compatible chat with multi-provider fallback. strategy controls routing (auto, fastest, best_quality, coding, custom…). preferred_provider boosts a specific provider; fallback: true tries the next candidate on failure.

Virtual models: put one of these in "model" to pick a strategy directly without setting strategy:

freeai-auto      freeai-cheap
freeai-fast      freeai-vision
freeai-quality   freeai-long
freeai-code      freeai-reasoning

The response echoes the virtual name in model and the upstream's real model in real_model.
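As an illustration of selecting a strategy through a virtual model, the request could be assembled with the Python standard library as below. The base URL and the helper names `build_chat_request` and `describe_reply` are assumptions made for this sketch; only the path, the header, and the body fields come from the reference above.

```python
import json
import urllib.request

# Assumption: your FreeAI instance is reachable at this address.
BASE_URL = "http://localhost:8000"


def build_chat_request(api_key: str, prompt: str, model: str = "freeai-fast") -> urllib.request.Request:
    """Build a POST /v1/chat/completions request that picks a strategy via the virtual model."""
    payload = {
        "model": model,  # virtual model name, so no explicit "strategy" field
        "messages": [{"role": "user", "content": prompt}],
        "fallback": True,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def describe_reply(response: dict) -> str:
    """Show the echoed virtual model next to the upstream's real model."""
    return f'{response["model"]} -> {response["real_model"]}'
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`.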

Vision: multimodal blocks route to vision providers automatically:

{"role": "user", "content": [
  {"type": "text", "text": "Describe this"},
  {"type": "image_url", "image_url":
    {"url": "data:image/png;base64,..."}}
]}

Tool calling: full OpenAI histories round-trip. Assistant turns may use content: null with tool_calls, and role: "tool" messages carry their tool_call_id. Top-level tools, tool_choice, response_format, seed, top_p, stop, presence_penalty, frequency_penalty, logit_bias, user, n are accepted and forwarded to OpenAI-compatible providers.

Fallback chain: response includes "fallback_chain": ["mistral", "groq"] so you can tell whether a request needed failover. fallback_position per attempt is logged to usage_events.
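A response can be checked for failover with a few lines. The sketch assumes that fallback_chain lists providers in attempt order and that the last entry is the one that served the request; the reference does not state this explicitly, and `failover_summary` is a hypothetical helper.

```python
def failover_summary(response: dict) -> dict:
    """Summarise the fallback_chain field of a chat completion response."""
    chain = response.get("fallback_chain") or []
    return {
        # Assumption: the last provider in the chain produced the answer.
        "served_by": chain[-1] if chain else None,
        "failed_over": len(chain) > 1,
        # Providers that were attempted before the one that answered.
        "skipped": chain[:-1],
    }
```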

Streaming: set stream: true. Returns text/event-stream with chunks mirroring OpenAI's format plus provider on every frame. The orchestrator enforces a per-chunk idle timeout (default 45 s, from app_config.stream_idle_timeout_s); a stalled upstream falls back only if no bytes were flushed yet.
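Consuming the stream might look like the sketch below. It assumes the frames follow OpenAI's conventions in detail (lines prefixed with `data:`, a final `[DONE]` sentinel, text under `choices[].delta.content`); the reference only says the chunks mirror OpenAI's format, so treat those details as assumptions. Both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_frames(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed JSON frame from each data line of an event stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        yield json.loads(data)


def collect_text(frames: Iterable[dict]) -> tuple:
    """Join the delta content and list the providers seen on the frames."""
    parts, providers = [], []
    for frame in frames:
        provider = frame.get("provider")
        if provider and provider not in providers:
            providers.append(provider)
        for choice in frame.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts), providers
```

Feeding `iter_stream_frames` the decoded lines of the HTTP response body gives the text as it arrives, along with the provider named on each frame.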

POST /v1/embeddings CLIENT

REQUEST / RESPONSE

// Request:
{
  "input": ["first doc", "second doc"],
  "model": "mistral-embed",
  "preferred_provider": "mistral",
  "fallback": true
}

// Response:
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [...]},
    {"object": "embedding", "index": 1, "embedding": [...]}
  ],
  "model": "mistral-embed",
  "provider": "mistral",
  "usage": {"prompt_tokens": 14, "total_tokens": 14},
  "fallback_position": 1
}

OpenAI-compatible embeddings with Mistral → Gemini fallback. input accepts a string or a list; vectors are aligned with the list order.

⚠ Only native model names are valid: mistral-embed (1024-dim) or text-embedding-004 (Gemini, 768-dim). OpenAI names like text-embedding-3-small are passed through verbatim, and the upstream returns 400. Omit model to use each provider's default safely.

Vectors from different models are not comparable; tag every stored vector with its provider+model. For production RAG, pin preferred_provider and set fallback: false so a silent provider switch doesn't corrupt your index.
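Tagging could be done when the response is stored, roughly as follows. The record layout and the helper names `tag_vectors` and `comparable` are assumptions for this sketch; the response fields are those shown in the sample above.

```python
def tag_vectors(texts: list, response: dict) -> list:
    """Pair each input text with its vector and the provider/model that produced it."""
    records = []
    # Sort by index so records line up with the input list order.
    for item in sorted(response["data"], key=lambda d: d["index"]):
        records.append({
            "text": texts[item["index"]],
            "embedding": item["embedding"],
            "provider": response["provider"],
            "model": response["model"],
        })
    return records


def comparable(a: dict, b: dict) -> bool:
    """Two stored vectors are comparable only if provider and model both match."""
    return (a["provider"], a["model"]) == (b["provider"], b["model"])
```

Checking `comparable` before computing a similarity score keeps vectors from a fallback provider out of results built on the primary one.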

POST /v1/audio/transcriptions CLIENT

MULTIPART / RESPONSE

// multipart/form-data
file: <audio file>
model: "whisper-1"    // optional, ignored
language: "en"        // optional (ISO 639-1)

// Response:
{
  "text": "Hello world...",
  "provider": "groq",
  "model": "whisper-large-v3-turbo",
  "latency_ms": 1230,
  "fallback_position": 1
}

Audio transcription with fallback: Groq Whisper → Gemini. Accepts mp3, wav, ogg, flac, webm, m4a, aac. Maximum file size is 20 MB when the audio is sent inline to Gemini. Same auth as chat completions.
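A client-side pre-flight check before uploading could look like this. It is a sketch under assumptions: "20 MB" is read as 20 MiB, the limit is treated as a warning about the Gemini fallback rather than a hard rejection, and `preflight` is a hypothetical helper.

```python
from pathlib import Path

# Formats listed in the endpoint description.
ACCEPTED = {".mp3", ".wav", ".ogg", ".flac", ".webm", ".m4a", ".aac"}
# Assumption: "20 MB" interpreted as 20 MiB.
GEMINI_INLINE_MAX = 20 * 1024 * 1024


def preflight(filename: str, size_bytes: int) -> list:
    """Return a list of warnings; an empty list means the upload looks fine."""
    warnings = []
    if Path(filename).suffix.lower() not in ACCEPTED:
        warnings.append("unsupported format")
    if size_bytes > GEMINI_INLINE_MAX:
        warnings.append("too large for Gemini inline fallback")
    return warnings
```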

GET /v1/models PUBLIC

RESPONSE

{
  "object": "list",
  "data": [
    {"id": "freeai-auto",    "object": "model", "owned_by": "freeai"},
    {"id": "freeai-fast",    "object": "model", "owned_by": "freeai"},
    {"id": "freeai-quality", "object": "model", "owned_by": "freeai"},
    {"id": "freeai-code",    "object": "model", "owned_by": "freeai"},
    ...
  ]
}

OpenAI-compatible model list. Public — no auth required. Returns the 8 virtual models (strategies) exposed for chat completions. Intended for SDK model-picker UIs.

USER ROUTES JWT cookie · multi-user scope 03 / 12

GET /api/me/providers USER

RESPONSE

[
  {
    "provider_name": "groq",
    "has_key": true,
    "key_preview": "gsk_***abc",
    "enabled": true,
    "rpm_limit": 30,
    "rpd_limit": 14400,
    "tpd_limit": 500000,
    "weight": 1.0,
    "tags": ["fast", "cheap"],
    "default_model": "llama-3.3-70b-versatile",
    "max_retries": null
  }, ...
]

Per-user provider credentials and overrides. Every field (except tags and provider_name) can override the catalog default. max_retries overrides app_config.provider_max_retries for this user+provider; null = use the global.

PATCH /api/me/providers/{name} USER

REQUEST BODY

{
  "api_key": "sk-...",          // empty string to remove
  "enabled": true,
  "rpm_limit": 30,
  "rpd_limit": 14400,
  "tpd_limit": 500000,
  "weight": 1.2,
  "default_model": "llama-3.1-8b-instant",
  "max_retries": 2              // override global retry budget
}

Upsert your credentials for a provider. All fields optional — send only what you want to change. The raw key is encrypted at rest (Fernet).

POST /api/clients USER

REQUEST / RESPONSE

// Request:
{ "name": "my-app", "rpm_limit": 60 }

// Response:
{
  "name": "my-app",
  "api_key": "fai_EzjQ2OcPp_...",
  "key_hash": "a1b2c3...",
  "rpm_limit": 60
}

Issue a client API key for /v1/*. The raw key is shown only once — save it immediately. Clients use Authorization: Bearer fai_.... Keys are scoped to the issuing user; each user sees only their own.

ADMIN ROUTES X-Admin-Token or JWT (admin role) 03 / 12

GET /api/providers ADMIN

RESPONSE

[{
  "name": "groq",
  "enabled": true,
  "has_key": true,
  "healthy": true,
  "requests_today": 42,
  "requests_this_minute": 3,
  "rpm_limit": 30,
  "rpd_limit": 14400,
  "tpd_limit": 500000,
  "tokens_today": 84200,
  "weight": 1.0,
  "last_latency_ms": 420,
  "latency_ema_ms": 385.2,
  "tags": ["fast", "cheap", "audio"],
  "default_model": "llama-3.3-70b-versatile"
}]

Admin only. Live health and rate-limit status for every provider (for the current user). latency_ema_ms is an exponential moving average of latency, so it is smoother than the single-sample last_latency_ms; the ranker uses it for scoring.
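The remaining quota of a provider can be derived from one entry of this response. The sketch assumes rpm_limit, rpd_limit, and tpd_limit mean requests per minute, requests per day, and tokens per day (inferred from the field names), and it is not a description of how the ranker itself scores providers. `headroom` is a hypothetical helper.

```python
def headroom(status: dict) -> dict:
    """Fraction of each quota still unused, from one /api/providers entry."""

    def left(used, limit):
        if not limit:
            return None  # no limit configured, so headroom is undefined
        return max(0.0, 1.0 - used / limit)

    return {
        "minute": left(status["requests_this_minute"], status["rpm_limit"]),
        "day": left(status["requests_today"], status["rpd_limit"]),
        "tokens": left(status["tokens_today"], status["tpd_limit"]),
    }
```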

GET /api/strategies ADMIN

RESPONSE

[
  {
    "name": "fastest",
    "description": "Lowest observed latency",
    "is_builtin": true,
    "definition": {
      "prefer": [
        {"when": "latency_ema_ms < 500", "weight": 30}
      ]
    }
  }, ...
]

Admin only. Lists every routing strategy. Built-ins can be edited but not deleted. Custom strategies go via POST /api/strategies. DSL reference: docs/STRATEGY_DSL.md.
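To make the prefer rule in the sample concrete, here is a toy evaluator. It assumes every when clause has the form "field operator number" and that matching weights are simply summed; the real grammar and scoring are defined in docs/STRATEGY_DSL.md and may differ. `rule_matches` and `strategy_bonus` are hypothetical names.

```python
import operator

# Assumption: only simple numeric comparisons are handled in this sketch.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq,
}


def rule_matches(when: str, provider: dict) -> bool:
    """Evaluate a 'field op number' condition against a provider's stats."""
    field, op, value = when.split()
    return OPS[op](provider[field], float(value))


def strategy_bonus(definition: dict, provider: dict) -> float:
    """Sum the weights of all prefer rules that match this provider."""
    return sum(
        rule["weight"]
        for rule in definition.get("prefer", [])
        if rule_matches(rule["when"], provider)
    )
```

Under these assumptions, a provider with latency_ema_ms of 385.2 would pick up the 30-point bonus from the fastest strategy shown above.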

GET /api/analytics ADMIN

QUERY / RESPONSE

GET /api/analytics?window_seconds=86400&bucket_count=24

{
  "total_calls": 142,
  "success_rate": 0.9577,
  "by_provider": [...],
  "by_strategy": [...],
  "by_outcome": [...],
  "by_client": [...],
  "time_buckets": [...]
}

Admin only. Aggregated telemetry with breakdowns by provider, strategy, outcome, and client. window_seconds: 60–604800 (7 d). bucket_count: 1–168. Windows > 30 d are served from usage_daily_rollup so they stay fast.
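A small helper can keep both query parameters inside the documented ranges before the request is sent. The helper name `analytics_url` and the example base URL are assumptions; the bounds are the ones quoted above.

```python
from urllib.parse import urlencode


def analytics_url(base_url: str, window_seconds: int = 86400, bucket_count: int = 24) -> str:
    """Build a GET /api/analytics URL with parameters clamped to the documented ranges."""
    window = min(max(window_seconds, 60), 604800)  # 60 s to 7 d
    buckets = min(max(bucket_count, 1), 168)       # 1 to 168 buckets
    query = urlencode({"window_seconds": window, "bucket_count": buckets})
    return f"{base_url.rstrip('/')}/api/analytics?{query}"
```

The request itself still needs admin credentials (X-Admin-Token or an admin JWT cookie), which this sketch leaves to the caller.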

PUBLIC / HEALTH No auth · probes & metrics 02 / 12

GET /api/health PUBLIC

RESPONSE

{ "status": "ok" }

Public. Liveness probe, no auth. Intentionally minimal so an unauthenticated scanner can't fingerprint the deployment. Use /api/setup/status and /api/auth/status for the frontend bootstrap flow, and the admin routes /api/providers and /api/analytics for fleet state.

GET /metrics PUBLIC

PROMETHEUS EXPOSITION

# HELP freeai_provider_calls_total ...
# TYPE freeai_provider_calls_total counter
freeai_provider_calls_total{provider="groq",outcome="success"} 1847
freeai_provider_circuit_breaker_trips_total{provider="mistral"} 2
freeai_orchestrator_fallbacks_total{from_provider="mistral",to_provider="groq"} 14
...

Public. Prometheus exposition format. See docs/OPERATIONS.md § 3.2 for the full metric inventory and the Grafana dashboard bundled under the observability docker-compose profile.
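For quick scripting without a Prometheus server, the sample lines can be parsed with a small regular expression. This is a simplified sketch: it skips comment lines and ignores any line it cannot match, and it does not handle escaped quotes in label values or trailing timestamps. `parse_samples` is a hypothetical helper.

```python
import re

# name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list:
    """Return (metric_name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # anything this simplified pattern cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For production monitoring, a Prometheus scraper remains the better tool.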