Endpoints
Instantly usable serverless inference endpoints with OpenAI-compatible APIs for OpenCode, KiloCode, Dify, OpenWebUI, and other agent-native workloads. Built for fast integration and subsecond cold start workflows.
Browse published endpoints here. Log in to get a tenant-scoped endpoint URL and inference API key.
Browse the published endpoints first, then log in when you want a tenant-scoped URL, API key, and runnable integration flow.
1. Choose an endpoint
Filter by provider, tag, or use case, then open the endpoint that matches your workload.
2. Copy setup values
Use the API base URL, model name, and API key to configure Dify, OpenWebUI, Continue, OpenCode, or plain curl.
3. Validate the integration
Open the endpoint detail page to copy a sample REST call or run the playground before wiring it into your app.
Public pages show the published endpoint metadata and model identifiers. Sign in to get the real tenant path and API key.
| Name | Brief Intro | GPU Count | Context Length | Concurrency | Integration |
|---|---|---|---|---|---|
| Qwen3-Coder-Next-FP8 | A code-specialized LLM optimized for: Code generation & completion Debugging & refactoring Agent-style tool use (coding workflows) | 2 | 256144 | 6.43x |
|
| Qwen3.5-122B-A10B-NVFP4 | Qwen3.5-122B-A10B-NVFP4 | 2 | 256144 | 6.40x |
|
| Qwen3.5-9B-Mem | A 9B general-purpose model designed for strong reasoning, coding, and chat performance | 1 | 262000 | — |
|
| Qwen3.6-27B-FP8 | Usage Complex Coding Agents: Handling massive contexts in tools like OpenCode. Enterprise RAG: Searching across thousands of documents. | 1 | 262144 | 2.29x |
|
| Qwen3.6-35B-A3B-FP8 | Qwen3.6-35B-A3B-FP8 This MoE (Mixture-of-Experts) model strikes the ultimate balance between "large-model intelligence" and "small-model speed." | 1 | 262000 | 4.80x |
|
| Qwen3.6-35B-A3B-FP8-no-think | Qwen3.6-35B-A3B-FP8 (officially released on April 16, 2026) is the first natively quantized FP8 variant of the Qwen3.6 series. | 1 | 262000 | — |
|
| gemma-4-31B-it-fp8 | Gemma-4-31B-it-FP8 is a state-of-the-art, instruction-tuned dense model from Google, optimized for high-performance inference | 1 | 262144 | 2.91x |
|