LLM

Build AI chat into your app with one SDK call. Atelier routes between your dev BYOS tunnel, Atelier credits, and end-user-supplied keys automatically. Streaming, tool use, structured output, vision, and RLS-aware RAG are all built in.

What you don’t have to build

Streaming SSE wrapper
Provider key management for end-users
Conversation persistence schema
RLS-aware vector retrieval (the user’s permissions filter their RAG context automatically)
Token usage metering
Tool-call orchestration loop
Cost routing (free during dev, credit/BYOS in prod)

That’s the work @atelier/sdk handles.

Model aliases — never hardcode a model name

Models move fast. Your code should declare intent, and the Base console maps that intent to a concrete model. When Claude 5 ships or you want to A/B test, you change one row in the console, with no redeploy.

Two axes of aliases — pick whichever reads better in your code.

Tier — performance vs cost:

Alias	What it means
`high`	Best quality. Slowest, priciest.
`medium`	Balanced. Default for most calls.
`low`	Fast and cheap. Good for high-volume, low-stakes work.

Intent — purpose-driven:

Alias	What it means
`reasoning`	Complex inference, planning, chain-of-thought.
`coding`	Code generation, review, refactoring.
`creative`	Writing, design, ideation.
`fast`	Snappy interactive UX.
`multimodal`	Vision, audio, mixed input.

atelier.llm.chat({ model: 'medium' });      // tier
atelier.llm.chat({ model: 'reasoning' });   // intent
atelier.llm.chat({ model: 'fast' });        // intent

You can still pin to an exact model if you need to:

atelier.llm.chat({ model: 'claude-4-7-sonnet' });

But that’s the exception. Default to aliases.

Console mapping

In the Base console (Settings → Models), set what each alias resolves to:

Alias	Resolves to
`high`	`claude-4-7-opus`
`medium`	`claude-4-7-sonnet`
`low`	`gemini-2-flash`
`reasoning`	`claude-4-7-opus`
`coding`	`claude-4-7-sonnet`
`creative`	`claude-4-7-sonnet`
`fast`	`gemini-2-flash`
`multimodal`	`claude-4-7-sonnet`

You can also configure traffic splits for evals:

Alias	Split
`medium`	80% `claude-4-7-sonnet`, 20% `gpt-5`

The SDK reports back per-alias metrics — quality, latency, cost — so you can see whether the split is worth keeping.
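
To make the mapping concrete, here is a minimal sketch of how an alias table with a traffic split could be resolved to one model per call. It is not Atelier’s implementation: `AliasTable`, `resolveAlias`, and the injectable `rand` parameter are hypothetical names, and weighted random selection is only an assumed split mechanism. Passing unknown names through unchanged is likewise an assumption, modeled on the ability to pin an exact model.

```typescript
// Hypothetical sketch: resolving an alias to a concrete model with a weighted split.
// None of these names come from @atelier/sdk; they only illustrate the idea.

type Split = { model: string; weight: number }; // weights are relative, e.g. 80 and 20

type AliasTable = Record<string, Split[]>;

const table: AliasTable = {
  high: [{ model: 'claude-4-7-opus', weight: 100 }],
  medium: [
    { model: 'claude-4-7-sonnet', weight: 80 },
    { model: 'gpt-5', weight: 20 },
  ],
  low: [{ model: 'gemini-2-flash', weight: 100 }],
};

// `rand` is injectable so the choice can be made deterministic in tests.
function resolveAlias(
  alias: string,
  aliases: AliasTable,
  rand: () => number = Math.random,
): string {
  const splits = aliases[alias];
  // Assumption: names that are not aliases are pinned models and pass through.
  if (!splits || splits.length === 0) return alias;

  const total = splits.reduce((sum, s) => sum + s.weight, 0);
  let point = rand() * total; // a point in [0, total)
  for (const s of splits) {
    if (point < s.weight) return s.model;
    point -= s.weight;
  }
  return splits[splits.length - 1].model; // guard against floating-point edge cases
}
```

Because application code only ever names the alias, changing the weights or the target models in a table like this requires no code change, which is the property the console mapping relies on.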

Quick chat

import { atelier } from '@atelier/sdk';
 
const text = await atelier.llm.generate({
  model: 'medium',
  prompt: 'Summarize this document: ' + doc,
});

Streaming

const stream = await atelier.llm.chat({
  model: 'medium',
  messages: [
    { role: 'system', content: 'You are concise.' },
    { role: 'user', content: question },
  ],
});
 
for await (const chunk of stream) {
  if (chunk.type === 'text') process.stdout.write(chunk.delta);
}
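
If you need the full reply as one string rather than printing deltas as they arrive, the same loop can accumulate them. The sketch below is self-contained and uses a stand-in stream: the `{ type: 'text', delta }` shape comes from the example above, while the `'tool-call'` chunk variant, `fakeStream`, and `collectText` are hypothetical and not part of the SDK.

```typescript
// Sketch: collecting text deltas from a chunk stream into one string.
// `Chunk` mirrors the { type, delta } shape used above; the 'tool-call' variant
// and `fakeStream` are hypothetical stand-ins, not part of @atelier/sdk.

type Chunk =
  | { type: 'text'; delta: string }
  | { type: 'tool-call'; name: string };

// Stand-in for a model stream: two text deltas with a non-text chunk between them.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { type: 'text', delta: 'Hello' };
  yield { type: 'tool-call', name: 'searchTodos' };
  yield { type: 'text', delta: ', world' };
}

async function collectText(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    if (chunk.type === 'text') text += chunk.delta; // non-text chunks are skipped
  }
  return text;
}
```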

Structured output

Pass a Zod schema; you get a typed object back.

import { z } from 'zod';
 
const todo = await atelier.llm.object({
  model: 'medium',
  schema: z.object({
    title: z.string(),
    priority: z.number().min(1).max(5),
    tags: z.array(z.string()),
  }),
  prompt: 'Create a todo from: ' + userInput,
});
 
// todo.title is typed as string

Tool use

Define each tool as an object with a description, an input schema, and a run function. The SDK runs the tool-call loop until the model returns text.

const result = await atelier.llm.chat({
  model: 'medium',
  messages,
  tools: {
    searchTodos: {
      description: 'Search todos for the current user.',
      input: z.object({ query: z.string() }),
      run: async ({ query }) => {
        const todos = await atelier.from('todos')
          .select('id, title')
          .ilike('title', `%${query}%`);
        return todos;
      },
    },
    createTodo: {
      description: 'Create a new todo.',
      input: z.object({ title: z.string() }),
      run: async ({ title }) => atelier.from('todos').insert({ title }),
    },
  },
});

Tool calls automatically thread the calling user’s identity through to DB queries — RLS holds.
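
For intuition about what "runs the tool-call loop" means, here is a rough, self-contained sketch of such a loop driven by a stub model. It is not the SDK’s code: `ModelFn`, `Tool`, `runToolLoop`, `stubModel`, and the `maxSteps` cap are all hypothetical, and the message format is simplified.

```typescript
// Sketch of a tool-call loop: call the model, run any requested tool, feed the
// result back, and stop once the model answers with text.
// Every name here is hypothetical and only illustrates the control flow.

type Message = { role: 'user' | 'assistant' | 'tool'; content: string };

type ModelReply =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; name: string; args: unknown };

type ModelFn = (messages: Message[]) => Promise<ModelReply>;

type Tool = { description: string; run: (args: unknown) => Promise<unknown> };

async function runToolLoop(
  model: ModelFn,
  tools: Record<string, Tool>,
  messages: Message[],
  maxSteps = 8, // assumed safety cap so a misbehaving model cannot loop forever
): Promise<string> {
  const history = [...messages];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(history);
    if (reply.type === 'text') return reply.text;

    // The model asked for a tool: run it and append the result to the history.
    const tool = tools[reply.name];
    const result = tool
      ? await tool.run(reply.args)
      : { error: `unknown tool: ${reply.name}` };
    history.push({
      role: 'tool',
      content: JSON.stringify({ name: reply.name, result }),
    });
  }
  throw new Error(`no text reply after ${maxSteps} steps`);
}

// Stub model: requests one tool call, then answers using the tool result.
const stubModel: ModelFn = async (history) => {
  const last = history[history.length - 1];
  if (!last || last.role !== 'tool') {
    return { type: 'tool-call', name: 'searchTodos', args: { query: 'milk' } };
  }
  return { type: 'text', text: `Tool said: ${last.content}` };
};

// Stub tool standing in for a database query.
const todoTools: Record<string, Tool> = {
  searchTodos: {
    description: 'Search todos for the current user.',
    run: async (args) => {
      const { query } = args as { query: string };
      return [{ id: 1, title: `Buy ${query}` }];
    },
  },
};
```

Calling `runToolLoop(stubModel, todoTools, [{ role: 'user', content: 'find milk' }])` performs one tool round trip and then resolves with the model’s text reply.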

React hook

The useChat hook combines streaming, persistence, RAG, and tool calling in one call.

import { atelier } from '@atelier/sdk/react';
 
function Chat({ conversationId }: { conversationId: string }) {
  const { messages, sendMessage, isStreaming, error } = atelier.useChat({
    model: 'medium',
    persist: { table: 'conversations', id: conversationId },
    rag: { table: 'docs', limit: 5 },
    tools: { searchTodos, createTodo },
  });
 
  return (
    <>
      {messages.map((m) => <Message key={m.id} role={m.role} text={m.text} />)}
      <input
        disabled={isStreaming}
        onKeyDown={(e) => e.key === 'Enter' && sendMessage(e.currentTarget.value)}
      />
    </>
  );
}

The hook automatically:

Streams SSE tokens to the UI
Persists every message to your conversations table (RLS scoped to the user)
Runs vector similarity search before each turn and injects results as context (RLS scoped)
Handles tool calls in a loop
Recovers from interrupted streams on reconnect

RLS-aware RAG

This is the part no other backend does. When you pass rag, the SDK:

Embeds the user’s message via your BYOS provider
Calls atelier.from('docs').similar('embedding', query, ...) — as the signed-in user
Applies your RLS policies on docs, which filter what the user can retrieve
Passes only the permitted docs to the model as context

For multi-tenant SaaS where every team has its own data, this works without writing any retrieval code. Documents the user couldn’t read directly never reach the model as context.
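
The effect of running retrieval as the signed-in user can be sketched in memory. In the sketch below a plain predicate stands in for an RLS policy and a cosine score stands in for the vector search. `Doc`, `User`, `canRead`, and `retrieveContext` are hypothetical names, and real RLS is enforced by the database, not by application code like this.

```typescript
// Sketch: permission filtering applied before similarity ranking, in memory.
// In a real database the filter would be an RLS policy; all names here are
// hypothetical and used only for illustration.

type Doc = { id: string; teamId: string; text: string; embedding: number[] };
type User = { id: string; teamId: string };

// Stand-in for an RLS policy: a user may read only their own team's docs.
const canRead = (user: User, doc: Doc): boolean => doc.teamId === user.teamId;

// Cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function retrieveContext(user: User, query: number[], docs: Doc[], limit: number): Doc[] {
  return docs
    .filter((doc) => canRead(user, doc)) // the policy runs before ranking
    .map((doc) => ({ doc, score: cosine(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score) // best match first
    .slice(0, limit)
    .map((entry) => entry.doc);
}

const docs: Doc[] = [
  { id: 'a1', teamId: 'team-a', text: 'Team A roadmap', embedding: [1, 0] },
  { id: 'a2', teamId: 'team-a', text: 'Team A budget', embedding: [0, 1] },
  { id: 'b1', teamId: 'team-b', text: 'Team B roadmap', embedding: [1, 0] },
];
```

Filtering before ranking matters: a document the user cannot read never competes for one of the `limit` slots, so it cannot displace a permitted document or leak into the prompt.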

Cost routing

Routing is per-alias, not per-concrete-model. Define the rules once; concrete model swaps in the console don’t touch your code.

// atelier.config.ts
export default {
  llm: {
    devProvider: 'desktop-byos',  // dev — your keychain, free
    routes: {
      high:       'end-user-byos',   // premium tier → end-user supplies key
      medium:     'atelier-credit',  // default → project owner pays
      low:        'atelier-credit',
      reasoning:  'end-user-byos',
      coding:     'atelier-credit',
      multimodal: 'atelier-credit',
    },
  },
};

Three routes:

Route	Who pays	When to use
`desktop-byos`	Atelier builder (you)	Dev / personal apps
`atelier-credit`	Project owner (your subscription)	Apps where you absorb LLM cost
`end-user-byos`	End-user of your deployed app	SaaS where users plug in their own key

end-user-byos ships with a <KeyVault> component the user can configure once.
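
As a rough mental model of how a config like the one above could be applied, the sketch below picks a route from the alias and the environment. `pickRoute` and its `env` parameter are hypothetical, and the fallback to `atelier-credit` for aliases missing from `routes` (such as `creative` and `fast` above) is an assumption, not documented behavior.

```typescript
// Sketch: picking a cost route from a config shaped like atelier.config.ts.
// `pickRoute`, the `env` parameter, and the fallback for unlisted aliases are
// assumptions, not documented SDK behavior.

type Route = 'desktop-byos' | 'atelier-credit' | 'end-user-byos';

type LlmConfig = {
  devProvider: Route;
  routes: Record<string, Route>;
};

const config: LlmConfig = {
  devProvider: 'desktop-byos',
  routes: {
    high: 'end-user-byos',
    medium: 'atelier-credit',
    low: 'atelier-credit',
    reasoning: 'end-user-byos',
    coding: 'atelier-credit',
    multimodal: 'atelier-credit',
  },
};

function pickRoute(alias: string, env: 'dev' | 'prod', cfg: LlmConfig): Route {
  // In dev, every alias goes through the dev provider.
  if (env === 'dev') return cfg.devProvider;
  // In prod, the alias decides; unlisted aliases use an assumed default.
  return cfg.routes[alias] ?? 'atelier-credit';
}
```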

Embeddings

const embedding = await atelier.llm.embed({
  model: 'text-embedding-3-small',
  input: text,
});

Embeddings flow through whichever cost route the calling identity uses — same routing logic as chat. See Vector Search for storing and querying.

Vision

await atelier.llm.chat({
  model: 'medium',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image', url: atelier.storage.url(fileId) },
    ],
  }],
});

Vision works with any model that supports it. Audio uses the same surface: atelier.llm.transcribe() for transcription and atelier.llm.speak() for speech.

Inside Functions

ctx.llm is the server-side equivalent. Useful when the response shouldn’t reach the browser directly (PII redaction, content moderation, agent loops).

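export default async function agent(req, ctx) {
  // Assumes the request body is JSON carrying the conversation so far.
  const { messages } = await req.json();

  const stream = await ctx.llm.chat({
    model: 'medium',
    messages,
    tools: {
      queryDb: async ({ sql }) => ctx.db.query(sql),
      sendEmail: async ({ to, body }) => ctx.email.send(...),
    },
  });
  // Stream the reply back to the caller as it is generated.
  return new Response(stream.toReadableStream());
}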

Compared to

	Atelier `llm`	Vercel AI SDK	OpenAI SDK + custom backend
Streaming	Built in	Built in	DIY SSE
Tool calling loop	Built in	Built in	DIY
Structured output (Zod)	Built in	Built in	DIY
Multi-provider	Built in	Built in	DIY per provider
Conversation persistence	Built in	DIY	DIY
RLS-aware RAG	Built in	DIY	DIY
Cost routing (dev BYOS / credit / end-user BYOS)	Built in	DIY	DIY
`<KeyVault>` for end-user BYOS	Built in	DIY	DIY

The first four are table stakes (and Vercel AI SDK is excellent at them). The last four are why you’d want this on top of a Base that already knows your user, your tables, and your RLS.
