Hippo Token
Developer-friendly platform with the latest open-source models — built for development and production, with maximum privacy and no vendor lock-in.
30 seconds to set up · No credit card needed
What can you run?
Cutting-edge models, ready now
Production-grade models hosted in Germany. Catalogue updated continuously.
openai/gpt-oss-120b
OpenAI's 120B open-source model. Optimized for tool use and ready for agentic setups: ideal for autonomous workflows that act, decide, and execute.
Extract text, tables, and structured data from images and PDFs with high accuracy. Ideal for document automation.
Qwen/Qwen3.6-27B
Qwen's 27B open-source model. Exceptional coding performance on par with Claude Sonnet, with strong output quality across a broad range of tasks.
How do you get started?
Change one line. That's it.
Point your existing OpenAI SDK to Hippo Token. No new dependencies, no rewriting — just a different base URL.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openhippo.io/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
Works with your stack?
Compatible with thousands of apps.
Your existing tools and workflows stay intact. No migration project, no retraining — we handle the technical compatibility.
What does it cost?
Pay per token, no minimums
No seats, no commitments. Scale from prototype to production without renegotiating.
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| BAAI/bge-reranker-base | €0.02 | — |
| mistralai/Ministral-3-3B-Instruct-2512 | €0.03 | €0.10 |
| nvidia/Nemotron-3-Nano-Omni-30B-A3B | €0.05 | €0.20 |
| openai/gpt-oss-120b | €0.15 | €0.60 |
| Qwen/Qwen3-Embedding-0.6B | €0.02 | — |
| Qwen/Qwen3.6-27B | €0.32 | €3.20 |
| zai-org/GLM-OCR | €0.02 | €0.03 |
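
To see what per-token pricing means for a concrete workload, here is a minimal sketch that turns token counts into an estimated bill using the prices listed above. The function name `estimate_cost_eur` is illustrative, and the sketch assumes that a dash in the output column means the model has no output-token charge. It is an estimate only, not an official billing calculation.

```python
from decimal import Decimal

# Prices in EUR per 1M tokens, copied from the pricing table above.
# None marks models whose output column shows a dash (assumed: no output charge).
PRICES_EUR_PER_MILLION = {
    "BAAI/bge-reranker-base": (Decimal("0.02"), None),
    "mistralai/Ministral-3-3B-Instruct-2512": (Decimal("0.03"), Decimal("0.10")),
    "nvidia/Nemotron-3-Nano-Omni-30B-A3B": (Decimal("0.05"), Decimal("0.20")),
    "openai/gpt-oss-120b": (Decimal("0.15"), Decimal("0.60")),
    "Qwen/Qwen3-Embedding-0.6B": (Decimal("0.02"), None),
    "Qwen/Qwen3.6-27B": (Decimal("0.32"), Decimal("3.20")),
    "zai-org/GLM-OCR": (Decimal("0.02"), Decimal("0.03")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_eur(model: str, input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Estimate the cost in EUR for a given number of input and output tokens."""
    input_price, output_price = PRICES_EUR_PER_MILLION[model]
    cost = input_price * input_tokens / TOKENS_PER_MILLION
    if output_tokens:
        if output_price is None:
            # Hypothetical guard: models priced on input only produce no billed output.
            raise ValueError(f"{model} has no output-token price listed")
        cost += output_price * output_tokens / TOKENS_PER_MILLION
    return cost


# 2M input tokens and 500k output tokens on openai/gpt-oss-120b:
# 2 x 0.15 + 0.5 x 0.60 = 0.60
cost = estimate_cost_eur("openai/gpt-oss-120b", 2_000_000, 500_000)
print(f"EUR {cost:.2f}")  # EUR 0.60
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.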
What if you want to leave?
Export the exact stack we run.
No lock-in. Every component is open source — take it, self-host it, run it on your own hardware.

vLLM
High-throughput LLM serving with PagedAttention. The same engine powering our API — deploy it on your own GPU in minutes.

LiteLLM
OpenAI-compatible proxy across all models. Swap our endpoint for your own and every integration keeps working.
Hugging Face
Every model we serve is on Hugging Face. Download the weights, run fully offline — no permission needed.
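
One way to keep the exit door open in your own code is to read the endpoint from configuration instead of hard-coding it. The sketch below assumes two hypothetical environment variables, `LLM_BASE_URL` and `LLM_API_KEY`, and falls back to the hosted base URL from the quick-start. The local address in the example is an assumed placeholder for wherever your own OpenAI-compatible server listens.

```python
import os

# Hosted endpoint, as used in the quick-start snippet.
HOSTED_BASE_URL = "https://api.openhippo.io/v1"


def resolve_endpoint(env=None):
    """Return keyword arguments for the OpenAI SDK client.

    LLM_BASE_URL and LLM_API_KEY are hypothetical variable names chosen
    for this sketch; rename them to match your own configuration.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("LLM_BASE_URL", HOSTED_BASE_URL),
        "api_key": env.get("LLM_API_KEY", "YOUR_API_KEY"),
    }


# With nothing set, requests go to the hosted API.
print(resolve_endpoint({})["base_url"])

# Pointing at a self-hosted server (assumed address) needs no code change.
self_hosted = {"LLM_BASE_URL": "http://localhost:8000/v1", "LLM_API_KEY": "local-key"}
print(resolve_endpoint(self_hosted)["base_url"])

# Usage with the OpenAI SDK (left as a comment so the sketch has no dependencies):
#   from openai import OpenAI
#   client = OpenAI(**resolve_endpoint())
```

With this in place, moving between the hosted API and your own deployment is a configuration change rather than a code change.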

Where do our managed services actually run?
The LEW Green Data Center — where our AI lives.
Our infrastructure runs in the LEW Green Data Center, powered by 100% renewable energy from the river Lech. This isn't just a hosting choice — it's a statement. Powerful AI doesn't have to cost the planet.
By operating our own hardware in a sovereign, GDPR-compliant facility, we guarantee that your data never leaves European soil. No US cloud dependencies, no compromises on privacy — just clean energy and full control.
See what runs here
Spin up a key in 30 seconds
Move off whenever you want — same SDK, same models, your own infrastructure.



