
On-device inference: running LLMs locally with the desktop bridge

June 6, 2026 · 6 min read · The Atelier Team

Cloud LLMs are powerful, but for many workflows the best place to run a model is right next to your code. Atelier ships a local runtime that does exactly that: accelerated, on-device inference wired into the same agent loop you use everywhere else.

A native runtime, not a wrapper

The desktop bridge connects Studio to a native inference engine on your machine. On Apple Silicon it uses Metal-accelerated backends; on NVIDIA hardware it uses CUDA. Either way, generation streams token-by-token into the same chat surface, with full support for tool calls and reasoning.
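To make the backend split concrete, here is a minimal sketch of how a host might be mapped to an accelerated backend. It is not Atelier's implementation: the function names, the "metal"/"cuda"/"cpu" labels, the CPU fallback, and the `nvidia-smi` heuristic are all assumptions made for illustration.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_nvidia: bool) -> str:
    """Map a host description to a backend label.

    The labels ("metal", "cuda", "cpu") are hypothetical, not Atelier identifiers.
    """
    if system == "Darwin" and machine == "arm64":
        return "metal"  # Apple Silicon host
    if has_nvidia:
        return "cuda"  # NVIDIA GPU assumed to be present
    return "cpu"  # assumed fallback; the article does not describe one


def detect_backend() -> str:
    """Describe the current host and pick a backend for it."""
    # Heuristic only: a discoverable `nvidia-smi` binary suggests an NVIDIA driver.
    has_nvidia = shutil.which("nvidia-smi") is not None
    return pick_backend(platform.system(), platform.machine(), has_nvidia)


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the decision in a pure function (`pick_backend`) separate from host probing (`detect_backend`) makes the rule easy to test without the hardware in question. Note that a Python process running under Rosetta reports `x86_64`, so this check would miss Apple Silicon in that case.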

Why local matters

Your prompts and code never leave your machine for inference
No per-token cost - the model runs on hardware you already own
Low latency for tight, iterative loops
Works offline once a model is downloaded

Pair it with a curated model catalog that matches recommendations to your hardware, and you get a fast, private inference path with no per-token cost that complements, rather than replaces, frontier cloud models.
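One way to picture hardware-matched recommendations is as a filter over catalog entries by available memory. The sketch below assumes that model; the `CatalogEntry` type, the `recommend` function, the placeholder model names, and the memory figures are all hypothetical and do not describe Atelier's actual catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str              # placeholder name, not a real model
    min_memory_gb: float   # assumed memory needed to load and run the model


def recommend(catalog, available_memory_gb):
    """Return the entries that fit in memory, largest requirement first."""
    fitting = [m for m in catalog if m.min_memory_gb <= available_memory_gb]
    return sorted(fitting, key=lambda m: m.min_memory_gb, reverse=True)


# Illustrative catalog; the numbers are made up for the example.
CATALOG = [
    CatalogEntry("example-small", 4.0),
    CatalogEntry("example-medium", 10.0),
    CatalogEntry("example-large", 40.0),
]

if __name__ == "__main__":
    for entry in recommend(CATALOG, available_memory_gb=16.0):
        print(entry.name)
```

With 16 GB available, this lists `example-medium` and then `example-small`, and leaves out `example-large`. A real catalog would likely weigh more than memory, such as quantization level or GPU capability, but the filter-then-rank shape stays the same.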
