Platform
A complete AI infrastructure
Six pillars that cover every layer of the stack, from model serving to enterprise compliance.
Inference
High-performance model serving with OpenAI compatibility
- OpenAI-compatible API
- 10+ open-source models (Llama, Mistral, DeepSeek, Qwen)
- Streaming & non-streaming responses
- Structured outputs with JSON Schema constrained decoding
- Vision / multimodal inputs (JPEG, PNG, GIF, WebP)
- Extended thinking with configurable token budgets
- Responses API — agentic multi-turn tool orchestration
- Prompt prefix caching with 90% token discount
RAG & Knowledge
End-to-end retrieval-augmented generation pipelines
- Knowledge Base management
- Hybrid search (semantic + BM25)
- Citations with source references
- Multiple chunking strategies
- Data source connectors (S3, databases, URLs)
Security & Auth
Enterprise-grade security from day one
- Email/password authentication
- Google & GitHub OAuth
- SAML 2.0 SSO (Okta, Azure AD, Google Workspace)
- SCIM user provisioning
- IP allowlisting
- API key scopes
- Audit logging
- Content moderation with per-org guardrail policies
- Webhook events for all async operations (14 event types)
Billing & Usage
Transparent pricing with full visibility into spend
- Pay-as-you-go credit system
- Transparent per-model pricing
- Stripe-powered payments
- Usage analytics & dashboards
- Spending limits & alerts
- Thinking tokens (extended reasoning) billed at 50% of the standard output token rate
Developer Experience
First-class tooling for every stack
- Python & Node.js SDKs
- OpenAI SDK compatible (just change base URL)
- LangChain, LlamaIndex, Haystack, DSPy, CrewAI integrations
- Prompt playground
- API explorer
Enterprise
Built for teams with demanding requirements
- SAML SSO + SCIM provisioning
- Multi-tenant isolation
- Custom rate limits
- Dedicated support & SLA
- Fine-tuning & A/B testing
Compare Plans
Feature comparison
See exactly what is included in every plan.
| Feature | Free | Developer | Pro | Enterprise |
|---|---|---|---|---|
| Open-source models | Community | All models | All models | All + custom |
| Rate limit (RPM) | 30 | 600 | 3,000 | 10,000+ |
| Knowledge Bases | 1 | 10 | 50 | Unlimited |
| Vector storage | 100 MB | 10 GB / KB | 50 GB / KB | Unlimited |
| Document storage | 100 MB | 50 GB | 200 GB | Unlimited |
| Streaming & tool calling | ||||
| Prompt caching | ||||
| Structured Outputs (JSON Schema) | ||||
| Extended Thinking / Reasoning | ||||
| Vision / Multimodal Inputs | ||||
| Content Moderation & Guardrails | ||||
| Responses API (Agentic) | ||||
| Webhook Events | ||||
| Google & GitHub OAuth | ||||
| SAML SSO | ||||
| SCIM provisioning | ||||
| IP allowlisting | ||||
| Audit logging | ||||
| Usage analytics | Basic | Full | Full | Full + export |
| Spending limits & alerts | ||||
| Support | Community | Email + Discord | Priority email | Dedicated + SLA |
| Uptime SLA | 99.9% | 99.95% | 99.99% | |
| Fine-tuning |
