OpenAI-compatible endpoints. Point any client at your FreeAI instance and it works.
Admin routes use X-Admin-Token header; client routes use Authorization: Bearer <key>.
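As a sketch, the two auth styles can be captured in one helper (the function name is illustrative; the header names are exactly the ones above):

```python
def auth_headers(token: str, admin: bool = False) -> dict:
    """Build FreeAI request headers.

    Admin routes expect X-Admin-Token; client routes expect
    Authorization: Bearer <key>.
    """
    if admin:
        return {"X-Admin-Token": token}
    return {"Authorization": f"Bearer {token}"}
```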
POST /v1/chat/completions
{
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
  ],
  "strategy": "auto",
  "preferred_provider": "groq",
  "temperature": 0.7,
  "stream": false,
  "fallback": true
}
OpenAI-compatible chat. strategy controls routing (auto, fast, quality, or custom).
preferred_provider boosts a specific provider. fallback: true tries the next provider on failure.
Auth: Bearer <client_key> or the admin token. While no client keys exist yet (bootstrap mode), the endpoint is open.
Virtual models: set "model" to freeai-auto, freeai-fast,
freeai-quality, freeai-code, freeai-reasoning,
freeai-cheap, freeai-vision, or freeai-long to pick a strategy directly.
Vision: send images via multimodal content blocks:
{"role": "user", "content": [
{"type": "text", "text": "Describe this"},
{"type": "image_url", "image_url":
{"url": "data:image/png;base64,..."}}
]}
The orchestrator auto-detects images and routes to vision providers (Gemini, OpenRouter).
Works with strategy: "auto" or model: "freeai-vision".
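Building the multimodal block by hand is error-prone; a small helper (name and defaults are this sketch's) might look like:

```python
import base64

def vision_message(text: str, image_bytes: bytes,
                   mime: str = "image/png") -> dict:
    """Build a user message with text plus an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }
```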
POST /v1/audio/transcriptions
// multipart/form-data
file: <audio file>
model: "whisper-1" // optional, ignored
language: "en" // optional (ISO 639-1)
// Response:
{
  "text": "Hello world...",
  "provider": "groq",
  "model": "whisper-large-v3-turbo",
  "latency_ms": 1230,
  "fallback_position": 1
}
Audio transcription with fallback: Groq Whisper → Gemini.
Accepts mp3, wav, ogg, flac, webm, m4a, aac. Max 20MB for Gemini inline.
Same auth as chat completions.
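A client-side pre-check against the accepted formats and the 20MB inline limit could be sketched as follows (constants mirror the list above; the server remains the source of truth):

```python
ACCEPTED_FORMATS = {"mp3", "wav", "ogg", "flac", "webm", "m4a", "aac"}
GEMINI_INLINE_MAX = 20 * 1024 * 1024  # 20MB Gemini inline limit

def check_audio(filename: str, size_bytes: int) -> None:
    """Raise ValueError if an upload would clearly be rejected."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in ACCEPTED_FORMATS:
        raise ValueError(f"unsupported format: .{ext}")
    if size_bytes > GEMINI_INLINE_MAX:
        raise ValueError("exceeds 20MB Gemini inline limit")
```

The upload itself is an ordinary multipart POST, e.g. with `requests`: `requests.post(url, headers=..., files={"file": f}, data={"language": "en"})`.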
[{
  "name": "groq",
  "enabled": true,
  "has_key": true,
  "healthy": true,
  "requests_today": 42,
  "requests_this_minute": 3,
  "rpm_limit": 30,
  "rpd_limit": 14400,
  "tpd_limit": 500000,
  "tokens_today": 84200,
  "weight": 1.0,
  "last_latency_ms": 420,
  "latency_ema_ms": 385.2,
  "tags": ["fast", "cheap", "audio"],
  "default_model": "llama-3.3-70b-versatile"
}]
Live status of every provider including rate limits, token usage, and health.
latency_ema_ms is an exponential moving average (smoother than last_latency_ms).
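A caller might compute remaining headroom from one status entry. The field names come from the response above; the helper itself is an assumption:

```python
def headroom(status: dict) -> dict:
    """Remaining request/token budget for one provider entry."""
    return {
        "rpm_left": status["rpm_limit"] - status["requests_this_minute"],
        "rpd_left": status["rpd_limit"] - status["requests_today"],
        "tpd_left": status["tpd_limit"] - status["tokens_today"],
    }
```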
PATCH /api/providers/{name}
{
  "api_key": "sk-...",
  "enabled": true,
  "weight": 1.5,
  "default_model": "llama-3.1-8b-instant",
  "rpm_limit": 30,
  "rpd_limit": 14400,
  "tpd_limit": 500000,
  "tags": ["fast", "cheap"]
}
Update any provider field. All fields are optional; send only the ones you want to change.
Saving resets the provider's health and quarantine state.
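Since PATCH is partial, a payload builder that drops unset fields keeps requests minimal (the field list is copied from the example above; rejecting unknown fields is this sketch's choice, not documented server behavior):

```python
PATCHABLE_FIELDS = {"api_key", "enabled", "weight", "default_model",
                    "rpm_limit", "rpd_limit", "tpd_limit", "tags"}

def provider_patch(**fields) -> dict:
    """Build a PATCH /api/providers/{name} body with only the changes."""
    unknown = set(fields) - PATCHABLE_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {k: v for k, v in fields.items() if v is not None}
```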
// Request:
{ "name": "my-app", "rpm_limit": 60 }
// Response:
{
  "api_key": "fai_xxxx_...",
  "key_hash": "a1b2c3...",
  "name": "my-app"
}
Issue a client API key. The raw key is shown only once — save it immediately.
Clients use Authorization: Bearer <key> to call /v1/* endpoints.
GET /api/analytics?window_seconds=86400&bucket_count=24
{
  "total_calls": 142,
  "success_rate": 0.9577,
  "by_provider": [...],
  "by_strategy": [...],
  "by_outcome": [...],
  "by_client": [...],
  "time_buckets": [...]
}
Aggregated usage telemetry. Includes breakdowns by provider, strategy, outcome, and client.
window_seconds: 60–604800. bucket_count: 1–168.
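Client code might clamp the query parameters to the documented ranges before calling (how the server treats out-of-range values is not specified here; clamping client-side is this sketch's choice):

```python
def analytics_params(window_seconds: int = 86400,
                     bucket_count: int = 24) -> dict:
    """Clamp /api/analytics query params to the documented ranges."""
    return {
        "window_seconds": min(max(window_seconds, 60), 604800),
        "bucket_count": min(max(bucket_count, 1), 168),
    }
```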