LLM
Build AI chat into your app with one SDK call. Atelier routes between your dev BYOS tunnel, Atelier credits, and end-user-supplied keys automatically. Streaming, tool use, structured output, vision, and RLS-aware RAG are all built in.
What you don’t have to build
- Streaming SSE wrapper
- Provider key management for end-users
- Conversation persistence schema
- RLS-aware vector retrieval (the user’s permissions filter their RAG context automatically)
- Token usage metering
- Tool-call orchestration loop
- Cost routing (free during dev, credit/BYOS in prod)
That’s the work @atelier/sdk handles.
Model aliases — never hardcode a model name
Models move fast. Your code should declare intent; the Base console maps that intent to a concrete model. When Claude 5 ships or you want to A/B test, you change one row in the console, with no redeploy.
Two axes of aliases — pick whichever reads better in your code.
Tier — performance vs cost:
| Alias | What it means |
|---|---|
| high | Best quality. Slowest, priciest. |
| medium | Balanced. Default for most calls. |
| low | Fast and cheap. Good for high-volume, low-stakes work. |
Intent — purpose-driven:
| Alias | What it means |
|---|---|
| reasoning | Complex inference, planning, chain-of-thought. |
| coding | Code generation, review, refactoring. |
| creative | Writing, design, ideation. |
| fast | Snappy interactive UX. |
| multimodal | Vision, audio, mixed input. |
atelier.llm.chat({ model: 'medium' }); // tier
atelier.llm.chat({ model: 'reasoning' }); // intent
atelier.llm.chat({ model: 'fast' }); // intent
You can still pin to an exact model if you need to:
atelier.llm.chat({ model: 'claude-4-7-sonnet' });
But that’s the exception. Default to aliases.
Console mapping
In the Base console (Settings → Models), set what each alias resolves to:
| Alias | Resolves to |
|---|---|
| high | claude-4-7-opus |
| medium | claude-4-7-sonnet |
| low | gemini-2-flash |
| reasoning | claude-4-7-opus |
| coding | claude-4-7-sonnet |
| creative | claude-4-7-sonnet |
| fast | gemini-2-flash |
| multimodal | claude-4-7-sonnet |
You can also configure traffic splits for evals:
| Alias | Split |
|---|---|
| medium | 80% claude-4-7-sonnet, 20% gpt-5 |
The SDK reports back per-alias metrics — quality, latency, cost — so you can see whether the split is worth keeping.
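To make the split concrete, here is a minimal sketch of how a weighted alias could resolve to a concrete model. The `AliasTable` type and the `resolveAlias` function are hypothetical and not part of @atelier/sdk; the real resolution happens in the Base console, and its mechanism is not described here. The sketch assumes that any name missing from the table is a pinned concrete model.

```typescript
// Hypothetical sketch: weighted alias resolution. Not the SDK's API.
type Target = { model: string; weight: number };
type AliasTable = Record<string, Target[]>;

const aliasTable: AliasTable = {
  high: [{ model: 'claude-4-7-opus', weight: 100 }],
  medium: [
    { model: 'claude-4-7-sonnet', weight: 80 },
    { model: 'gpt-5', weight: 20 },
  ],
  low: [{ model: 'gemini-2-flash', weight: 100 }],
};

// `rand` is injectable so the pick is reproducible in tests.
function resolveAlias(
  alias: string,
  table: AliasTable,
  rand: () => number = Math.random,
): string {
  const targets = table[alias];
  // Assumption: unknown names are pinned concrete models and pass through.
  if (!targets || targets.length === 0) return alias;

  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let roll = rand() * total;
  for (const t of targets) {
    roll -= t.weight;
    if (roll < 0) return t.model;
  }
  // Guards against floating-point edge cases when rand() is close to 1.
  return targets[targets.length - 1].model;
}
```

Passing a fixed `rand` makes the choice deterministic, which is handy when an eval needs to replay the same assignment.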
Quick chat
import { atelier } from '@atelier/sdk';
const text = await atelier.llm.generate({
  model: 'medium',
  prompt: 'Summarize this document: ' + doc,
});
Streaming
const stream = await atelier.llm.chat({
  model: 'medium',
  messages: [
    { role: 'system', content: 'You are concise.' },
    { role: 'user', content: question },
  ],
});
for await (const chunk of stream) {
  if (chunk.type === 'text') process.stdout.write(chunk.delta);
}
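Often you also want the full text once the stream ends, for logging or storage. The sketch below collects text deltas from any async iterable of chunks. The `{ type: 'text', delta }` shape follows the loop above; the `'other'` chunk variant, `collectText`, and `fakeStream` are assumptions made for illustration, not SDK exports.

```typescript
// Chunk shape follows the loop above; the non-text variant is an assumption.
type Chunk =
  | { type: 'text'; delta: string }
  | { type: 'other'; payload: unknown };

// Accumulate text deltas, optionally forwarding each one as it arrives.
async function collectText(
  stream: AsyncIterable<Chunk>,
  onDelta?: (delta: string) => void,
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    if (chunk.type !== 'text') continue; // skip non-text chunks
    full += chunk.delta;
    if (onDelta) onDelta(chunk.delta);
  }
  return full;
}

// Stand-in for the stream returned by atelier.llm.chat(), for local testing.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { type: 'text', delta: 'Hello' };
  yield { type: 'other', payload: null };
  yield { type: 'text', delta: ', world' };
}
```

Because `collectText` only depends on `AsyncIterable`, it can be exercised with `fakeStream()` without any network call.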
Structured output
Pass a Zod schema; you get a typed object back.
import { z } from 'zod';
const todo = await atelier.llm.object({
  model: 'medium',
  schema: z.object({
    title: z.string(),
    priority: z.number().min(1).max(5),
    tags: z.array(z.string()),
  }),
  prompt: 'Create a todo from: ' + userInput,
});
// todo.title is typed string
Tool use
Define each tool as a plain object with a description, a Zod input schema, and a run function. The SDK runs the tool-call loop until the model returns text.
const result = await atelier.llm.chat({
  model: 'medium',
  messages,
  tools: {
    searchTodos: {
      description: 'Search todos for the current user.',
      input: z.object({ query: z.string() }),
      run: async ({ query }) => {
        const todos = await atelier.from('todos')
          .select('id, title')
          .ilike('title', `%${query}%`);
        return todos;
      },
    },
    createTodo: {
      description: 'Create a new todo.',
      input: z.object({ title: z.string() }),
      run: async ({ title }) => atelier.from('todos').insert({ title }),
    },
  },
});
Tool calls automatically thread the calling user’s identity through to DB queries — RLS holds.
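For a sense of what the tool-call loop replaces, here is a minimal sketch of such a loop written by hand. Everything in it is hypothetical: the `ModelTurn`, `Model`, and `Tool` types, the `runToolLoop` name, and the `maxSteps` cap are illustrative assumptions, not the SDK's internals.

```typescript
// Hypothetical sketch of a tool-call loop. Types and names are illustrative.
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type ModelTurn =
  | { kind: 'text'; text: string }
  | { kind: 'tool_call'; name: string; args: unknown };
type Model = (messages: Message[]) => Promise<ModelTurn>;
type Tool = { description: string; run: (args: any) => Promise<unknown> };

async function runToolLoop(
  model: Model,
  tools: Record<string, Tool>,
  messages: Message[],
  maxSteps = 8, // assumption: cap so a confused model cannot loop forever
): Promise<string> {
  const history = messages.slice(); // do not mutate the caller's array
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.kind === 'text') return turn.text; // a text answer ends the loop

    const tool = tools[turn.name];
    // Report failures back to the model instead of throwing, so it can recover.
    let result: unknown;
    if (!tool) {
      result = { error: 'unknown tool: ' + turn.name };
    } else {
      try {
        result = await tool.run(turn.args);
      } catch (err) {
        result = { error: String(err) };
      }
    }
    history.push({ role: 'assistant', content: JSON.stringify(turn) });
    history.push({ role: 'tool', content: JSON.stringify(result) });
  }
  throw new Error('tool loop exceeded ' + maxSteps + ' steps');
}
```

The sketch leaves out input validation against the Zod schema and identity propagation, both of which the section above says the SDK handles.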
React hook
The chat hook bundles streaming, persistence, RAG, and tool calling into a single call.
import { atelier } from '@atelier/sdk/react';
// searchTodos and createTodo are tool definitions shaped like those in Tool use.
function Chat({ conversationId }: { conversationId: string }) {
  const { messages, sendMessage, isStreaming, error } = atelier.useChat({
    model: 'medium',
    persist: { table: 'conversations', id: conversationId },
    rag: { table: 'docs', limit: 5 },
    tools: { searchTodos, createTodo },
  });
  return (
    <>
      {messages.map((m) => <Message key={m.id} role={m.role} text={m.text} />)}
      <input
        disabled={isStreaming}
        onKeyDown={(e) => e.key === 'Enter' && sendMessage(e.currentTarget.value)}
      />
    </>
  );
}
The hook automatically:
- Streams SSE tokens to the UI
- Persists every message to your conversations table (RLS scoped to the user)
- Runs vector similarity search before each turn and injects results as context (RLS scoped)
- Handles tool calls in a loop
- Recovers from interrupted streams on reconnect
RLS-aware RAG
This is the part no other backend does. When you pass rag, the SDK:
- Embeds the user’s message via your BYOS provider
- Calls atelier.from('docs').similar('embedding', query, ...) as the signed-in user
- Your RLS policies on docs filter what the user can retrieve
- Only permitted docs become LLM context
Multi-tenant SaaS where every team has their own data: solved without writing any retrieval code. The model literally can’t answer using documents the user couldn’t read directly.
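The ordering matters: the permission filter runs before similarity ranking, so a forbidden document can never reach the top results. The in-memory sketch below illustrates that ordering only. In the real system the database enforces RLS; the `canRead` predicate, the `teamId` field, and the `retrieve` function are hypothetical stand-ins.

```typescript
// In-memory illustration only. Real RLS is enforced by the database;
// `canRead` stands in for a policy and is hypothetical.
type Doc = { id: string; teamId: string; embedding: number[]; text: string };
type User = { id: string; teamId: string };

// Cosine similarity; assumes both vectors have the same dimension.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const canRead = (user: User, doc: Doc): boolean => doc.teamId === user.teamId;

function retrieve(user: User, query: number[], docs: Doc[], limit: number): Doc[] {
  return docs
    .filter((d) => canRead(user, d)) // permission filter runs before ranking
    .map((d) => ({ doc: d, score: cosine(query, d.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, limit)
    .map((x) => x.doc);
}
```

A document from another team is dropped even when its embedding matches the query exactly, which is the property the paragraph above describes.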
Cost routing
Routing is per-alias, not per-concrete-model. Define the rules once; concrete model swaps in the console don’t touch your code.
// atelier.config.ts
export default {
  llm: {
    devProvider: 'desktop-byos', // dev — your keychain, free
    routes: {
      high: 'end-user-byos', // premium tier → end-user supplies key
      medium: 'atelier-credit', // default → project owner pays
      low: 'atelier-credit',
      reasoning: 'end-user-byos',
      coding: 'atelier-credit',
      multimodal: 'atelier-credit',
    },
  },
};
Three routes:
| Route | Who pays | When to use |
|---|---|---|
| desktop-byos | Atelier builder (you) | Dev / personal apps |
| atelier-credit | Project owner (your subscription) | Apps where you absorb LLM cost |
| end-user-byos | End-user of your deployed app | SaaS where users plug in their own key |
end-user-byos ships with a <KeyVault> component the user can configure once.
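As a rough mental model of route selection, the sketch below picks a route from a config shaped like the one above. The `pickRoute` function, the `env` parameter, and the choice to throw on an unrouted alias are all assumptions; how the SDK detects the environment or handles a missing route is not stated in this section.

```typescript
// Hypothetical sketch of route selection; not the SDK's implementation.
type Route = 'desktop-byos' | 'atelier-credit' | 'end-user-byos';
type LlmConfig = { devProvider: Route; routes: Record<string, Route> };

const routingConfig: LlmConfig = {
  devProvider: 'desktop-byos',
  routes: {
    high: 'end-user-byos',
    medium: 'atelier-credit',
    low: 'atelier-credit',
    reasoning: 'end-user-byos',
    coding: 'atelier-credit',
    multimodal: 'atelier-credit',
  },
};

function pickRoute(cfg: LlmConfig, alias: string, env: 'dev' | 'prod'): Route {
  // In dev, every alias goes through the builder's own keychain.
  if (env === 'dev') return cfg.devProvider;
  const route = cfg.routes[alias];
  // Assumption: an alias with no route is an error, not a silent default.
  if (!route) throw new Error('no route configured for alias: ' + alias);
  return route;
}
```

The example config has no entry for creative or fast, so the sketch would throw for those aliases in production; failing loudly is one defensible choice, but it is a choice of this sketch rather than documented behavior.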
Embeddings
const embedding = await atelier.llm.embed({
  model: 'text-embedding-3-small',
  input: text,
});
Embeddings flow through whichever cost route the calling identity uses — same routing logic as chat. See Vector Search for storing and querying.
Multi-modal
await atelier.llm.chat({
  model: 'medium',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image', url: atelier.storage.url(fileId) },
    ],
  }],
});
Vision works with any model that supports it. Audio goes through the same surface: atelier.llm.transcribe() for transcription and atelier.llm.speak() for speech.
Inside Functions
ctx.llm is the server-side equivalent. Useful when the response shouldn’t reach the browser directly (PII redaction, content moderation, agent loops).
export default async function agent(req, ctx) {
  const stream = await ctx.llm.chat({
    model: 'medium',
    messages,
    tools: {
      queryDb: async ({ sql }) => ctx.db.query(sql),
      sendEmail: async ({ to, body }) => ctx.email.send(...),
    },
  });
  return new Response(stream.toReadableStream());
}
Compared to
| | Atelier llm | Vercel AI SDK | OpenAI SDK + custom backend |
|---|---|---|---|
| Streaming | Built in | Built in | DIY SSE |
| Tool calling loop | Built in | Built in | DIY |
| Structured output (Zod) | Built in | Built in | DIY |
| Multi-provider | Built in | Built in | DIY per provider |
| Conversation persistence | Built in | DIY | DIY |
| RLS-aware RAG | Built in | DIY | DIY |
| Cost routing (dev BYOS / credit / end-user BYOS) | Built in | DIY | DIY |
| <KeyVault> for end-user BYOS | Built in | DIY | DIY |
The first four are table stakes (and Vercel AI SDK is excellent at them). The last four are why you’d want this on top of a Base that already knows your user, your tables, and your RLS.