Atelier

LLM

Build AI chat into your app with one SDK call. Atelier routes between your dev BYOS tunnel, Atelier credits, and end-user-supplied keys automatically. Streaming, tool use, structured output, vision, and RLS-aware RAG are all built in.

What you don’t have to build

  • Streaming SSE wrapper
  • Provider key management for end-users
  • Conversation persistence schema
  • RLS-aware vector retrieval (the user’s permissions filter their RAG context automatically)
  • Token usage metering
  • Tool-call orchestration loop
  • Cost routing (free during dev, credit/BYOS in prod)

That’s the work @atelier/sdk handles.

Model aliases — never hardcode a model name

Models move fast. Your code should declare intent, and the Base console maps that intent to a concrete model. When Claude 5 ships or you want to A/B test, you change one row in the console, with no redeploy.

Two axes of aliases — pick whichever reads better in your code.

Tier — performance vs cost:

Alias     What it means
high      Best quality. Slowest, priciest.
medium    Balanced. Default for most calls.
low       Fast and cheap. Good for high-volume, low-stakes work.

Intent — purpose-driven:

Alias       What it means
reasoning   Complex inference, planning, chain-of-thought.
coding      Code generation, review, refactoring.
creative    Writing, design, ideation.
fast        Snappy interactive UX.
multimodal  Vision, audio, mixed input.

atelier.llm.chat({ model: 'medium' });      // tier
atelier.llm.chat({ model: 'reasoning' });   // intent
atelier.llm.chat({ model: 'fast' });        // intent

You can still pin to an exact model if you need to:

atelier.llm.chat({ model: 'claude-4-7-sonnet' });

But that’s the exception. Default to aliases.

Console mapping

In the Base console (Settings → Models), set what each alias resolves to:

Alias       Resolves to
high        claude-4-7-opus
medium      claude-4-7-sonnet
low         gemini-2-flash
reasoning   claude-4-7-opus
coding      claude-4-7-sonnet
creative    claude-4-7-sonnet
fast        gemini-2-flash
multimodal  claude-4-7-sonnet
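
The mapping above can be pictured as a lookup table consulted at call time. The sketch below is a hypothetical illustration, not Atelier's implementation: `AliasTable` and `resolveModel` are names invented here, and the pass-through rule for pinned model names is an assumption based on the pinning example earlier.

```typescript
// Hypothetical sketch: alias resolution as a plain lookup table.
// `AliasTable` and `resolveModel` are illustrative names, not SDK exports.
type AliasTable = Record<string, string>;

// Values copied from the console mapping table above.
const consoleMapping: AliasTable = {
  high: 'claude-4-7-opus',
  medium: 'claude-4-7-sonnet',
  low: 'gemini-2-flash',
  reasoning: 'claude-4-7-opus',
  coding: 'claude-4-7-sonnet',
  creative: 'claude-4-7-sonnet',
  fast: 'gemini-2-flash',
  multimodal: 'claude-4-7-sonnet',
};

// Assumption: anything that is not a known alias is treated as a pinned,
// concrete model name and passed through unchanged.
function resolveModel(requested: string, table: AliasTable): string {
  return Object.prototype.hasOwnProperty.call(table, requested)
    ? table[requested]
    : requested;
}

// Remapping an alias is a data change; call sites keep saying 'medium'.
const remapped: AliasTable = { ...consoleMapping, medium: 'gpt-5' };
const before = resolveModel('medium', consoleMapping); // 'claude-4-7-sonnet'
const after = resolveModel('medium', remapped);        // 'gpt-5'
```

The point of the indirection is that the second table could be loaded from the console at runtime, so the swap needs no code change.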

You can also configure traffic splits for evals:

Alias   Split
medium  80% claude-4-7-sonnet, 20% gpt-5

The SDK reports back per-alias metrics — quality, latency, cost — so you can see whether the split is worth keeping.
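
A traffic split like the one above amounts to a weighted choice per request. As a rough sketch (the `Split` shape and `pickModel` are hypothetical, and how Atelier actually assigns requests to each arm is not described here), passing the random draw in as a parameter keeps the choice testable:

```typescript
// Hypothetical sketch of a weighted traffic split; not the SDK's internals.
interface Split {
  model: string;
  weight: number; // relative weight, e.g. 80 and 20
}

// `draw` is a number in [0, 1), typically Math.random(). Taking it as a
// parameter makes the function deterministic under test.
function pickModel(splits: Split[], draw: number): string {
  if (splits.length === 0) throw new Error('split needs at least one arm');
  const total = splits.reduce((sum, s) => sum + s.weight, 0);
  let threshold = draw * total;
  for (const s of splits) {
    if (threshold < s.weight) return s.model;
    threshold -= s.weight;
  }
  // Floating-point edge case: fall back to the last arm.
  return splits[splits.length - 1].model;
}

const mediumSplit: Split[] = [
  { model: 'claude-4-7-sonnet', weight: 80 },
  { model: 'gpt-5', weight: 20 },
];

const chosen = pickModel(mediumSplit, Math.random());
```

For an eval you would probably want sticky assignment (for example, deriving the draw from a hash of the user ID) so a conversation does not switch models mid-thread. That is a design suggestion for this sketch, not documented behavior.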

Quick chat

import { atelier } from '@atelier/sdk';
 
const text = await atelier.llm.generate({
  model: 'medium',
  prompt: 'Summarize this document: ' + doc,
});

Streaming

const stream = await atelier.llm.chat({
  model: 'medium',
  messages: [
    { role: 'system', content: 'You are concise.' },
    { role: 'user', content: question },
  ],
});
 
for await (const chunk of stream) {
  if (chunk.type === 'text') process.stdout.write(chunk.delta);
}
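
To make the chunk loop concrete, here is a self-contained sketch that consumes a stream of the same shape without calling the SDK. The `Chunk` union, `fakeStream`, and `collectText` are stand-ins invented for illustration; only the `type: 'text'` and `delta` fields come from the example above, and the `done` chunk is an assumption.

```typescript
// Stand-in chunk shape. Only `type: 'text'` and `delta` appear in the
// example above; the 'done' variant is an assumption for illustration.
type Chunk =
  | { type: 'text'; delta: string }
  | { type: 'done' };

// Fake stream that yields a few chunks, standing in for atelier.llm.chat().
async function* fakeStream(parts: string[]): AsyncGenerator<Chunk> {
  for (const part of parts) {
    yield { type: 'text', delta: part };
  }
  yield { type: 'done' };
}

// Accumulate text deltas into the full reply, calling `onDelta` as each
// piece arrives so a UI can render incrementally.
async function collectText(
  stream: AsyncIterable<Chunk>,
  onDelta: (delta: string) => void = () => {},
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    if (chunk.type === 'text') {
      full += chunk.delta;
      onDelta(chunk.delta);
    }
  }
  return full;
}
```

Checking `chunk.type` before reading `delta` matters because a stream may interleave non-text chunks, which this loop skips.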

Structured output

Pass a Zod schema; you get a typed object back.

import { z } from 'zod';
 
const todo = await atelier.llm.object({
  model: 'medium',
  schema: z.object({
    title: z.string(),
    priority: z.number().min(1).max(5),
    tags: z.array(z.string()),
  }),
  prompt: 'Create a todo from: ' + userInput,
});
 
// todo.title is typed string

Tool use

Define each tool with a description, a Zod input schema, and a plain async run function. The SDK runs the tool-call loop until the model returns text.

const result = await atelier.llm.chat({
  model: 'medium',
  messages,
  tools: {
    searchTodos: {
      description: 'Search todos for the current user.',
      input: z.object({ query: z.string() }),
      run: async ({ query }) => {
        const todos = await atelier.from('todos')
          .select('id, title')
          .ilike('title', `%${query}%`);
        return todos;
      },
    },
    createTodo: {
      description: 'Create a new todo.',
      input: z.object({ title: z.string() }),
      run: async ({ title }) => atelier.from('todos').insert({ title }),
    },
  },
});

Tool calls automatically thread the calling user’s identity through to DB queries — RLS holds.
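
The loop the SDK runs can be sketched roughly as follows. Everything here is a simplified stand-in: `ModelTurn`, `Tool`, `runToolLoop`, and the scripted `fakeModel` are hypothetical names, the model is replaced by a function, and the `maxSteps` guard is a design choice of this sketch, not a documented SDK option.

```typescript
// Hypothetical sketch of a tool-call loop; not the SDK's implementation.
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };

// A model turn is either final text or a request to call one tool.
type ModelTurn =
  | { type: 'text'; text: string }
  | { type: 'tool_call'; name: string; args: Record<string, unknown> };

type Tool = {
  description: string;
  run: (args: Record<string, unknown>) => Promise<unknown>;
};

type Model = (history: Message[]) => Promise<ModelTurn>;

async function runToolLoop(
  model: Model,
  tools: Record<string, Tool>,
  messages: Message[],
  maxSteps = 8, // guard against a model that never stops calling tools
): Promise<string> {
  const history = messages.slice();
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.type === 'text') return turn.text; // loop ends on plain text

    const tool = tools[turn.name];
    // Report unknown tools back to the model instead of throwing.
    const result = tool
      ? await tool.run(turn.args)
      : { error: `unknown tool: ${turn.name}` };
    history.push({ role: 'tool', content: JSON.stringify(result) });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}

// Scripted stand-in for a model: asks for one search, then answers.
const fakeModel: Model = async (history) => {
  const last = history[history.length - 1];
  return last.role === 'tool'
    ? { type: 'text', text: `Found: ${last.content}` }
    : { type: 'tool_call', name: 'searchTodos', args: { query: 'milk' } };
};

const demoTools: Record<string, Tool> = {
  searchTodos: {
    description: 'Search todos for the current user.',
    run: async ({ query }) => [{ id: 1, title: `Buy ${String(query)}` }],
  },
};
```

Calling `runToolLoop(fakeModel, demoTools, [{ role: 'user', content: 'find milk' }])` goes around the loop twice: one tool call, then the final text.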

React hook

The useChat hook bundles streaming, persistence, RAG, and tool calling in a single call.

import { atelier } from '@atelier/sdk/react';
 
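// conversationId comes from the parent; searchTodos and createTodo are tool
// definitions shaped like the ones in the Tool use section, and Message is
// your own presentational component.
function Chat({ conversationId }: { conversationId: string }) {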
  const { messages, sendMessage, isStreaming, error } = atelier.useChat({
    model: 'medium',
    persist: { table: 'conversations', id: conversationId },
    rag: { table: 'docs', limit: 5 },
    tools: { searchTodos, createTodo },
  });
 
  return (
    <>
      {messages.map((m) => <Message key={m.id} role={m.role} text={m.text} />)}
      <input
        disabled={isStreaming}
        onKeyDown={(e) => e.key === 'Enter' && sendMessage(e.currentTarget.value)}
      />
    </>
  );
}

The hook automatically:

  • Streams SSE tokens to the UI
  • Persists every message to your conversations table (RLS scoped to the user)
  • Runs vector similarity search before each turn and injects results as context (RLS scoped)
  • Handles tool calls in a loop
  • Recovers from interrupted streams on reconnect

RLS-aware RAG

This is the part no other backend does. When you pass rag, the SDK:

  1. Embeds the user’s message via your BYOS provider
  2. Calls atelier.from('docs').similar('embedding', query, ...) as the signed-in user
  3. Your RLS policies on docs filter what the user can retrieve
  4. Only permitted docs become LLM context

Multi-tenant SaaS where every team has its own data is handled without writing any retrieval code. The model cannot answer from documents the user couldn’t read directly, because those documents never enter its context.
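
The ordering of those steps is what gives the guarantee: permission filtering happens before ranking, so forbidden rows are never candidates. The sketch below mimics that in memory. It is an illustration only: `Doc`, `User`, `canRead`, and `retrieveContext` are hypothetical, the team-based rule stands in for whatever RLS policy you define, and in Atelier the filtering is done by the database, not by application code.

```typescript
// In-memory illustration of permission-filtered retrieval. Hypothetical
// names throughout; real RLS is enforced by the database.
interface Doc {
  id: string;
  teamId: string;
  text: string;
  embedding: number[];
}

interface User {
  id: string;
  teamId: string;
}

// Stand-in for an RLS policy: users read only their own team's docs.
function canRead(user: User, doc: Doc): boolean {
  return doc.teamId === user.teamId;
}

// Cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Filter by permission first, then rank by similarity and keep `limit`.
function retrieveContext(
  user: User,
  queryEmbedding: number[],
  docs: Doc[],
  limit: number,
): Doc[] {
  return docs
    .filter((doc) => canRead(user, doc))
    .map((doc) => ({ doc, score: cosine(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.doc);
}
```

Even a perfect-match document from another team cannot appear in the result, because it is dropped before any score is computed.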

Cost routing

Routing is per-alias, not per-concrete-model. Define the rules once; concrete model swaps in the console don’t touch your code.

// atelier.config.ts
export default {
  llm: {
    devProvider: 'desktop-byos',  // dev — your keychain, free
    routes: {
      high:       'end-user-byos',   // premium tier → end-user supplies key
      medium:     'atelier-credit',  // default → project owner pays
      low:        'atelier-credit',
      reasoning:  'end-user-byos',
      coding:     'atelier-credit',
      multimodal: 'atelier-credit',
    },
  },
};

Three routes:

Route           Who pays                           When to use
desktop-byos    Atelier builder (you)              Dev / personal apps
atelier-credit  Project owner (your subscription)  Apps where you absorb LLM cost
end-user-byos   End-user of your deployed app      SaaS where users plug in their own key

end-user-byos ships with a <KeyVault> component the user can configure once.
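
Read together, the config and the table suggest a resolution rule along these lines. This is a guess at the logic, for illustration: `resolveRoute` and the `Environment` type are hypothetical, and the fallback for aliases without an entry (the config above omits `creative` and `fast`) is an assumption, since the text does not say what happens in that case.

```typescript
// Hypothetical sketch of per-alias route resolution; not the SDK's code.
type Route = 'desktop-byos' | 'atelier-credit' | 'end-user-byos';
type Environment = 'dev' | 'prod';

interface LlmConfig {
  devProvider: Route;
  routes: Record<string, Route>;
}

// Same values as the atelier.config.ts example above.
const llmConfig: LlmConfig = {
  devProvider: 'desktop-byos',
  routes: {
    high: 'end-user-byos',
    medium: 'atelier-credit',
    low: 'atelier-credit',
    reasoning: 'end-user-byos',
    coding: 'atelier-credit',
    multimodal: 'atelier-credit',
  },
};

function resolveRoute(
  alias: string,
  env: Environment,
  config: LlmConfig,
  // Assumption: unlisted aliases fall back to this route.
  fallback: Route = 'atelier-credit',
): Route {
  // In development every call goes through the dev provider.
  if (env === 'dev') return config.devProvider;
  const route = config.routes[alias];
  return route !== undefined ? route : fallback;
}
```

Because the lookup is keyed by alias, remapping `high` to a different concrete model in the console leaves the routing decision untouched.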

Embeddings

const embedding = await atelier.llm.embed({
  model: 'text-embedding-3-small',
  input: text,
});

Embeddings flow through whichever cost route the calling identity uses — same routing logic as chat. See Vector Search for storing and querying.

Multi-modal

await atelier.llm.chat({
  model: 'medium',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image', url: atelier.storage.url(fileId) },
    ],
  }],
});

Vision works with any model that supports it. Audio (transcription and speech) is exposed on the same surface through atelier.llm.transcribe() and atelier.llm.speak().

Inside Functions

ctx.llm is the server-side equivalent. Useful when the response shouldn’t reach the browser directly (PII redaction, content moderation, agent loops).

export default async function agent(req, ctx) {
  // Assumes req is a standard Request whose JSON body carries the messages.
  const { messages } = await req.json();

  const stream = await ctx.llm.chat({
    model: 'medium',
    messages,
    tools: {
      queryDb: async ({ sql }) => ctx.db.query(sql),
      sendEmail: async ({ to, body }) => ctx.email.send(...),
    },
  });
  return new Response(stream.toReadableStream());
}
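
As one example of why you might keep the response server-side, the sketch below buffers a streamed reply and masks email addresses before anything is returned. It is deliberately simplistic: `redactEmails` and `bufferAndRedact` are hypothetical helpers, the regex is a rough approximation and not a complete email matcher, and buffering the whole reply trades streaming latency for not having to handle matches split across chunks.

```typescript
// Hypothetical helpers; not part of ctx.llm.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, '[redacted email]');
}

// Buffer all text deltas, then redact once. An address split across two
// chunks is still caught because matching runs on the joined text.
async function bufferAndRedact(
  stream: AsyncIterable<{ type: string; delta?: string }>,
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    if (chunk.type === 'text' && chunk.delta !== undefined) {
      full += chunk.delta;
    }
  }
  return redactEmails(full);
}
```

Inside a Function you could pass the chat stream to `bufferAndRedact` and return the result as a plain `Response` body, assuming the stream is async-iterable as in the Streaming example.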

Compared to

                                                  Atelier llm   Vercel AI SDK   OpenAI SDK + custom backend
Streaming                                         Built in      Built in        DIY SSE
Tool calling loop                                 Built in      Built in        DIY
Structured output (Zod)                           Built in      Built in        DIY
Multi-provider                                    Built in      Built in        DIY per provider
Conversation persistence                          Built in      DIY             DIY
RLS-aware RAG                                     Built in      DIY             DIY
Cost routing (dev BYOS / credit / end-user BYOS)  Built in      DIY             DIY
<KeyVault> for end-user BYOS                      Built in      DIY             DIY

The first four are table stakes (and Vercel AI SDK is excellent at them). The last four are why you’d want this on top of a Base that already knows your user, your tables, and your RLS.