Engineering
On-device inference: running LLMs locally with the desktop bridge
Cloud LLMs are powerful, but for many workflows the best place to run a model is right next to your code. Atelier ships a local runtime that does exactly that: accelerated, on-device inference wired into the same agent loop you use everywhere else.
A native runtime, not a wrapper
The desktop bridge connects Studio to a native inference engine on your machine. On Apple Silicon it uses Metal-accelerated backends; on NVIDIA hardware it uses CUDA. Either way, generation streams token-by-token into the same chat surface, with full support for tool calls and reasoning.
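To make the backend choice concrete, here is a minimal sketch of how a host could be mapped to an accelerated backend. It is not Atelier's implementation. The names `pick_backend` and `detect_backend`, the `nvidia-smi` check, and the `"cpu"` fallback are all assumptions made for illustration.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_nvidia: bool) -> str:
    """Map a host description to a backend label (hypothetical labels)."""
    # Apple Silicon reports Darwin/arm64 when Python runs natively.
    if system == "Darwin" and machine == "arm64":
        return "metal"
    # Assumption: an NVIDIA GPU implies a usable CUDA backend.
    if has_nvidia:
        return "cuda"
    # Assumption: a CPU fallback exists; the article does not mention one.
    return "cpu"


def detect_backend() -> str:
    """Probe the current machine using only the standard library."""
    # Heuristic: treat the presence of the nvidia-smi tool as "has NVIDIA GPU".
    has_nvidia = shutil.which("nvidia-smi") is not None
    return pick_backend(platform.system(), platform.machine(), has_nvidia)


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the decision in a pure function (`pick_backend`) separate from the probing (`detect_backend`) makes the rule easy to test without the hardware present. A real runtime would likely query the GPU driver directly rather than look for a command-line tool.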
Why local matters
- Your prompts and code never leave your machine for inference
- No per-token cost - the model runs on hardware you already own
- No network round trip, which keeps latency low for tight, iterative loops
- Works offline once a model is downloaded
Pair it with a curated model catalog that matches recommendations to your hardware, and you get a fast, private inference path with no per-token cost that complements, rather than replaces, frontier cloud models.
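One way to picture hardware-matched recommendations is as a simple filter over the catalog by available memory. The sketch below assumes that framing. `CatalogEntry`, `recommend`, the entry names, and the memory figures are invented placeholders, not Atelier's catalog or its actual matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str             # placeholder model name
    min_memory_gb: float  # placeholder memory requirement


def recommend(catalog: list[CatalogEntry], available_memory_gb: float) -> list[CatalogEntry]:
    """Return entries that fit in memory, most demanding first."""
    fits = [m for m in catalog if m.min_memory_gb <= available_memory_gb]
    return sorted(fits, key=lambda m: m.min_memory_gb, reverse=True)


# Illustrative data only; these are not real models or measured requirements.
EXAMPLE_CATALOG = [
    CatalogEntry("small-example", 4.0),
    CatalogEntry("medium-example", 12.0),
    CatalogEntry("large-example", 40.0),
]

if __name__ == "__main__":
    for entry in recommend(EXAMPLE_CATALOG, available_memory_gb=16.0):
        print(entry.name)
```

A production matcher would presumably weigh more than memory, such as the accelerator type and quantization format, but the filter-then-rank shape stays the same.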