[ FREE.AI ]
ORCHESTRATOR CONTROL v0.5.0 // 

Orchestrating free intelligence across providers.

UPLINK
CONNECTING
PROVIDERS ONLINE
— / —
STRATEGY
auto
SAVED VS GPT-4o
$0.00
└──▶

PROVIDER ROSTER

Six free-tier endpoints stand by. Drop in API keys, weight your favorites, and FreeAI routes every prompt to whoever has the right capability and the most headroom left in their daily quota.
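The weight- and headroom-aware routing described above can be sketched roughly like this. The field names mirror the /api/providers payload in the API reference below, but the scoring formula and `pick_provider` itself are illustrative assumptions, not FreeAI internals:

```python
import random

# Hypothetical sketch: filter providers by health and capability tag, then
# sample proportionally to weight x remaining daily quota. The exact scoring
# rule is an assumption for illustration.
def pick_provider(providers, need_tag=None):
    candidates = []
    for p in providers:
        if not (p["enabled"] and p["has_key"] and p["healthy"]):
            continue
        if need_tag and need_tag not in p["tags"]:
            continue
        headroom = 1.0 - p["requests_today"] / p["rpd_limit"]  # daily quota left
        if headroom <= 0:
            continue  # quota exhausted for today
        candidates.append((p["name"], p["weight"] * headroom))
    if not candidates:
        return None
    names = [name for name, _ in candidates]
    scores = [score for _, score in candidates]
    return random.choices(names, weights=scores, k=1)[0]
```

A provider that has burned through its daily quota drops out of the pool entirely, so traffic naturally drifts toward whoever has headroom left.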

└──▶

ROUTING STRATEGY

The orchestrator picks a provider for every request. Auto reads each prompt and decides; the others lock to a single criterion. You can define your own strategies — tags are matched against each provider's capability list.

Click a strategy to make it the default. Hover custom ones to edit or delete.

FALLBACK CHAIN

If the chosen provider trips a rate limit or errors out, the orchestrator automatically walks down the ranking until one succeeds.
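The fallback walk amounts to a loop over the ranked providers; a minimal sketch, where `call_provider` is a stand-in for the actual dispatch and not a FreeAI API:

```python
# Try providers in ranked order until one succeeds; record why each
# earlier attempt failed. fallback_position matches the field returned
# by the transcription endpoint below.
def complete_with_fallback(ranked, call_provider):
    errors = {}
    for position, name in enumerate(ranked, start=1):
        try:
            return {"provider": name,
                    "fallback_position": position,
                    "text": call_provider(name)}
        except Exception as exc:  # rate limit or provider error
            errors[name] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```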

└──▶

LIVE PLAYGROUND

Send a prompt to the orchestrator and watch which provider it picks, how long it takes, and the full fallback trail.

   ┌──────────────────────────┐
   │   awaiting transmission  │
   └──────────────────────────┘
          

AUDIO TRANSCRIPTION

Upload an audio file to transcribe via Groq Whisper. Requires the Groq provider to be configured.

   ┌──────────────────────────┐
   │   awaiting audio file    │
   └──────────────────────────┘
            
└──▶

ANALYTICS

Telemetry from every dispatched completion. These numbers come from the usage_events table — they persist across restarts and scale with your data.

CALLS OVER TIME

BY PROVIDER

BY OUTCOME

BY STRATEGY

BY CLIENT

└──▶

CLIENT KEYS

Issue API keys for the apps that will call FreeAI. Each client has its own per-minute rate limit. Without at least one client, the server stays in bootstrap mode and anyone can hit /v1/chat/completions.

└──▶

API REFERENCE

OpenAI-compatible endpoints. Point any client at your FreeAI instance and it works. Admin routes use the X-Admin-Token header; client routes use an Authorization: Bearer <key> header.

POST/v1/chat/completions
{
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "strategy": "auto",
  "preferred_provider": "groq",
  "temperature": 0.7,
  "stream": false,
  "fallback": true
}

OpenAI-compatible chat. strategy controls routing (auto, fast, quality, or custom). preferred_provider boosts a specific provider. fallback: true tries the next provider on failure. Auth: Bearer <client_key> or admin token. Open in bootstrap mode (i.e., when no clients have been issued).

Virtual models: set "model" to freeai-auto, freeai-fast, freeai-quality, freeai-code, freeai-reasoning, freeai-cheap, freeai-vision, or freeai-long to pick a strategy directly.
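Putting the above together, a minimal stdlib call sketch; BASE_URL and the bearer key are placeholders for your own instance, and the payload fields come straight from the reference:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; point at your deploy

# Using a virtual model instead of the "strategy" field picks the
# strategy directly, per the note above.
payload = {
    "model": "freeai-fast",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"},
    ],
    "fallback": True,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer fai_your_key",
             "Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment against a live instance
```

Because the endpoint is OpenAI-compatible, any OpenAI client pointed at BASE_URL should work the same way.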

Vision: send images via multimodal content blocks:

{"role": "user", "content": [
  {"type": "text", "text": "Describe this"},
  {"type": "image_url", "image_url":
    {"url": "data:image/png;base64,..."}}
]}

The orchestrator auto-detects images and routes to vision providers (Gemini, OpenRouter). Works with strategy: "auto" or model: "freeai-vision".
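Building the multimodal block above from a local image is a one-liner with base64; the `image_part` helper name is illustrative, not part of FreeAI:

```python
import base64

# Wrap raw image bytes as a data-URL content part, matching the
# multimodal block shown in the reference above.
def image_part(image_bytes, mime="image/png"):
    b64 = base64.b64encode(image_bytes).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

message = {"role": "user", "content": [
    {"type": "text", "text": "Describe this"},
    image_part(b"\x89PNG..."),  # real PNG bytes in practice
]}
```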

POST/v1/audio/transcriptions
// multipart/form-data
file: <audio file>
model: "whisper-1"    // optional, ignored
language: "en"        // optional (ISO 639-1)

// Response:
{
  "text": "Hello world...",
  "provider": "groq",
  "model": "whisper-large-v3-turbo",
  "latency_ms": 1230,
  "fallback_position": 1
}

Audio transcription with fallback: Groq Whisper → Gemini. Accepts mp3, wav, ogg, flac, webm, m4a, aac. Max 20MB for Gemini inline. Same auth as chat completions.
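A stdlib-only sketch of the multipart upload this endpoint expects; the base URL, key, and `multipart` helper are placeholders and assumptions, not FreeAI code:

```python
import io
import urllib.request
import uuid

# Hand-rolled multipart/form-data encoder (illustrative; a library such
# as requests does this for you).
def multipart(fields, file_field, filename, file_bytes):
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\nContent-Disposition: form-data; "
                  f"name=\"{name}\"\r\n\r\n{value}\r\n".encode())
    buf.write(f"--{boundary}\r\nContent-Disposition: form-data; "
              f"name=\"{file_field}\"; filename=\"{filename}\"\r\n"
              f"Content-Type: application/octet-stream\r\n\r\n".encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

boundary, body = multipart({"language": "en"}, "file", "clip.mp3", b"ID3...")
req = urllib.request.Request(
    "http://localhost:8000/v1/audio/transcriptions",  # assumed base URL
    data=body,
    headers={"Authorization": "Bearer fai_your_key",
             "Content-Type": f"multipart/form-data; boundary={boundary}"},
)
# urllib.request.urlopen(req)  # uncomment against a live instance
```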

GET/api/providers
[{
  "name": "groq",
  "enabled": true,
  "has_key": true,
  "healthy": true,
  "requests_today": 42,
  "requests_this_minute": 3,
  "rpm_limit": 30,
  "rpd_limit": 14400,
  "tpd_limit": 500000,
  "tokens_today": 84200,
  "weight": 1.0,
  "last_latency_ms": 420,
  "latency_ema_ms": 385.2,
  "tags": ["fast", "cheap", "audio"],
  "default_model": "llama-3.3-70b-versatile"
}]

Live status of every provider including rate limits, token usage, and health. latency_ema_ms is an exponential moving average (smoother than last_latency_ms).

PATCH/api/providers/{name}
{
  "api_key": "sk-...",
  "enabled": true,
  "weight": 1.5,
  "default_model": "llama-3.1-8b-instant",
  "rpm_limit": 30,
  "rpd_limit": 14400,
  "tpd_limit": 500000,
  "tags": ["fast", "cheap"]
}

Update any provider field. All fields optional — only send what you want to change. Resets health/quarantine on save.
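Since all fields are optional, a PATCH body carries only the changes; a stdlib sketch with a placeholder base URL and admin token:

```python
import json
import urllib.request

# Change only the weight and tags of the "groq" provider; everything
# else on the provider is left untouched.
patch = {"weight": 1.5, "tags": ["fast", "cheap"]}
req = urllib.request.Request(
    "http://localhost:8000/api/providers/groq",  # assumed base URL
    data=json.dumps(patch).encode(),
    headers={"X-Admin-Token": "your-admin-token",
             "Content-Type": "application/json"},
    method="PATCH",
)
# urllib.request.urlopen(req)  # uncomment against a live instance
```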

POST/api/clients
// Request:
{ "name": "my-app", "rpm_limit": 60 }

// Response:
{
  "api_key": "fai_xxxx_...",
  "key_hash": "a1b2c3...",
  "name": "my-app"
}

Issue a client API key. The raw key is shown only once — save it immediately. Clients use Authorization: Bearer <key> to call /v1/* endpoints.

GET/api/analytics
GET /api/analytics?window_seconds=86400&bucket_count=24

{
  "total_calls": 142,
  "success_rate": 0.9577,
  "by_provider": [...],
  "by_strategy": [...],
  "by_outcome": [...],
  "by_client": [...],
  "time_buckets": [...]
}

Aggregated usage telemetry. Includes breakdowns by provider, strategy, outcome, and client. window_seconds: 60–604800. bucket_count: 1–168.
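Assembling the query string with the documented bounds checked client-side; a small sketch using only the parameters named above:

```python
from urllib.parse import urlencode

# window_seconds must be 60-604800 and bucket_count 1-168 per the
# reference; validate before building the URL.
params = {"window_seconds": 86400, "bucket_count": 24}
assert 60 <= params["window_seconds"] <= 604800
assert 1 <= params["bucket_count"] <= 168
url = "/api/analytics?" + urlencode(params)
```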