Testing

Ships next

This page describes a Studio surface that is not yet shipped. The architecture and acceptance behavior below reflect the design we’re committed to and that trackers #174–181 cover; the live Studio does not have a Testing surface today. Track progress on the public roadmap.

Production-grade SaaS needs tests from minute one, not bolted on later. Atelier ships the full test pipeline by default and adds the things only Atelier can do: RLS-aware isolation checks, AI-generated test cases that read your schema, and Design Cascade-aware visual regression.

Run the suite

atelier test           # everything
atelier test unit
atelier test e2e
atelier test rls       # multi-tenant isolation
atelier test visual    # screenshot diff vs main
atelier test a11y      # axe-core
atelier test load

Every command runs locally and on every PR. Results land as a single PR comment.

What runs by default

Unit (Vitest) — fast, isolated, runs on every file save in dev.
Integration — real PGlite database, real RLS context, no mocks.
E2E (Playwright) — Browser Rendering API hosts the headless Chrome. No third-party CI service.
A11y (axe-core) — every page checked for WCAG 2.1 AA. Lighthouse a11y > 95.
Lint (ESLint + TypeScript strict) — failing lint fails the PR.

.atelier/tests/
├── unit/
├── integration/
├── e2e/
└── rls/

Atelier scaffolds opinionated examples for each on atelier init.

RLS test runner

The multi-tenant isolation check Atelier alone can write for you. Because Atelier owns the RLS policies, the user identity model, and the schema, it can generate the tenant-isolation cases automatically.

import { atelier } from '@atelier/sdk/testing';
 
const { tenantA, tenantB } = await atelier.test.tenants({ count: 2 });
 
await tenantA.from('invoices').insert({ amount: 100 });
 
// One assertion generates positive + negative cases per policy.
await atelier.test.rls.assert({
  table: 'invoices',
  tenantIsolation: true,
});

What it verifies, per policy on every protected table:

Owner reads / writes succeed.
Cross-tenant reads return empty.
Cross-tenant writes throw.
Role-gated mutations honor the role.
Admin overrides match the policy.

Add a new CREATE POLICY and the next PR automatically tests it. Coverage shows in the PR comment — every policy must have a positive and a negative case.
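
To make the "positive and negative case per policy" rule concrete, here is a minimal sketch of how such cases could be expanded and how a coverage gap could be detected. The `RlsPolicy` and `RlsCase` shapes and both function names are hypothetical and only illustrate the idea; they are not the Atelier SDK.

```typescript
// Hypothetical shapes for illustration; not part of the Atelier SDK.
type CaseKind = "positive" | "negative";
interface RlsPolicy { name: string; table: string }
interface RlsCase { policy: string; table: string; kind: CaseKind; description: string }

// Expand every policy into one positive and one negative case.
function generateCases(policies: RlsPolicy[]): RlsCase[] {
  const cases: RlsCase[] = [];
  for (const p of policies) {
    cases.push({ policy: p.name, table: p.table, kind: "positive",
      description: `owner can read and write ${p.table}` });
    cases.push({ policy: p.name, table: p.table, kind: "negative",
      description: `another tenant cannot read or write ${p.table}` });
  }
  return cases;
}

// Names of policies that are missing a positive case, a negative case, or both.
function uncoveredPolicies(policies: RlsPolicy[], cases: RlsCase[]): string[] {
  const has = (policy: string, kind: CaseKind) =>
    cases.some((c) => c.policy === policy && c.kind === kind);
  return policies
    .filter((p) => !(has(p.name, "positive") && has(p.name, "negative")))
    .map((p) => p.name);
}
```

A PR check built this way would fail whenever `uncoveredPolicies` returns a non-empty list.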

Test cases as the source of truth

TestRail-style test cases live alongside your code. The same case is the description a PM reads, the artifact a QA runs, and the spec the code compiles from. Cases are drafted by AI from the PRD, compile to Playwright or Vitest, and sync both ways when either side changes.

# .atelier/test-cases/INV-001-send-invoice.yaml
id: INV-001
title: Send an invoice to a client
type: e2e
persona: Freelancer
linkedPrdSection: 3.1 Invoice send
preconditions:
  - User signed in
  - At least one client exists
steps:
  - action: Open Invoices page
    expected: List of invoices visible
  - action: Click 'New invoice'
    expected: New invoice form opens
  - action: Fill amount, client, due date
  - action: Click 'Send'
    expected: Email sent, invoice status 'sent'
tags: [smoke, regression]

Browse, filter, and edit in the Console. Coverage view shows which PRD section, persona, or journey step has no cases.
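
For readers who want to picture the case format as data, the sketch below mirrors the YAML fields in a TypeScript type and adds two small checks: one for steps that have no expected outcome, one for PRD sections with no linked case. The type and function names are hypothetical; only the field names come from the sample case.

```typescript
// Hypothetical TypeScript mirror of the YAML case format; field names follow the sample case.
interface CaseStep { action: string; expected?: string }
interface CaseFile {
  id: string;
  title: string;
  type: string;
  persona: string;
  linkedPrdSection: string;
  preconditions: string[];
  steps: CaseStep[];
  tags: string[];
}

// 1-based numbers of steps that have no expected outcome to assert on.
function stepsMissingExpected(c: CaseFile): number[] {
  const missing: number[] = [];
  c.steps.forEach((step, i) => {
    if (!step.expected) missing.push(i + 1);
  });
  return missing;
}

// PRD sections with no linked case: the kind of gap a coverage view would surface.
function uncoveredSections(prdSections: string[], cases: CaseFile[]): string[] {
  return prdSections.filter((s) => !cases.some((c) => c.linkedPrdSection === s));
}
```

Applied to INV-001 above, `stepsMissingExpected` would return `[3]`, because the "Fill amount, client, due date" step has no expected value.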

Auto-generate cases

In the Console, open Project → Quality → Test cases → Generate. Pick what to source from: PRD section, persona, journey step, schema, or all four. Atelier drafts cases and opens them as a pull request — you review, edit, approve, or reject per case.

Atelier reads the PRD, the personas, the journey, the schema, and the RLS policies. Empty PRD sections produce TODO cases. New migrations get CRUD + isolation cases automatically.

Or run atelier test cases improve from CI to do the same headlessly.

From the Console’s Generate panel, paste a sentence — When a user upgrades from trial, their billing immediately reflects the new tier — and Atelier walks the journey to draft the steps. Or via CLI:

atelier test cases new \
  "When a user upgrades from trial, their billing immediately reflects the new tier"

The agent picks the right schema, walks the journey, drafts the steps.

Cases compile to code

The YAML becomes a Playwright or Vitest spec on the next build:

// .atelier/tests/generated/INV-001.spec.ts
// from test-case: INV-001 (Send an invoice to a client)
 
import { test, expect } from '@atelier/test';
 
test('Send an invoice to a client', async ({ atelier, page }) => {
  const user = await atelier.test.signIn({ persona: 'Freelancer' });
  await atelier.test.fixture('client', { tenant: user.tenant });
 
  // Step 1: Open Invoices page
  await page.goto('/invoices');
  await expect(page.getByRole('list', { name: 'Invoices' })).toBeVisible();
 
  // Step 2: Click 'New invoice'
  await page.getByRole('button', { name: 'New invoice' }).click();
  await expect(page.getByRole('form', { name: 'New invoice' })).toBeVisible();
 
  // ... etc.
});

The compiler is schema-aware: “Fill amount” knows it’s a number input on the invoice form. “Send” knows it triggers the send mutation. The result is a working test, not a stub.

Two-way sync

The Console’s Drift tab shows any case ↔ spec pair that’s out of sync: the YAML on the left, the compiled spec on the right, the diff in the middle. Accept the side that should win and Atelier rewrites the other to match.

Edit the code (smarter selector, extra assertion, helper extracted) and Atelier reverse-compiles into the YAML. Edit the YAML (new step, refined expected) and the code recompiles. A daily cron checks the whole project; mismatches surface as Console notifications and as PR review comments on any open PRs that touched the file.

When both sides moved independently, the Console drops the conflict into a Resolve modal — pick one, blend manually, or roll back either side.
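
One way to reason about the sync states described in this section is a three-way comparison against the content at the last successful sync. The sketch below is an assumption about how that classification could work, with hypothetical names throughout; it is not Atelier's actual drift detector.

```typescript
// Hypothetical drift classifier: compares each side with the last synced content.
type DriftState = "in-sync" | "yaml-changed" | "spec-changed" | "conflict";

interface SyncPair {
  base: { yaml: string; spec: string };    // contents at the last successful sync
  current: { yaml: string; spec: string }; // contents now
}

function classifyDrift(pair: SyncPair): DriftState {
  const yamlMoved = pair.current.yaml !== pair.base.yaml;
  const specMoved = pair.current.spec !== pair.base.spec;
  if (yamlMoved && specMoved) return "conflict"; // both sides moved: needs manual resolution
  if (yamlMoved) return "yaml-changed";          // recompile the spec from the YAML
  if (specMoved) return "spec-changed";          // reverse-compile the spec into the YAML
  return "in-sync";
}
```

Only the `conflict` state needs a human decision; the other two can be resolved by regenerating the side that did not move.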

Test plans and runs

Group cases into a test plan for a release. Assign environments (preview, staging, production), assign cases to testers, set priority and risk. Open the plan as a test run:

Automated cases (compiled to specs) execute in CI and update their status.
Manual cases show in the Console with step-by-step UI — tester ticks pass / fail / blocked / retest, attaches screenshot or comment per step.

A unified view shows the run state across both: pass rate per persona, per PRD section, and per journey, trended over releases. Flaky cases bubble up automatically.
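
The pass-rate breakdown can be pictured as a simple group-by over case results. The sketch below is illustrative only: the `CaseResult` shape is hypothetical, and the flakiness rule (a case is flaky if its recent runs contain both a pass and a fail) is an assumption, since this page does not define how flaky cases are detected.

```typescript
// Hypothetical result shape; the status values follow the manual-run options above.
type RunStatus = "pass" | "fail" | "blocked" | "retest";
interface CaseResult { caseId: string; persona: string; prdSection: string; status: RunStatus }

// Pass rate (0..1) per group, where keyOf picks the grouping dimension.
function passRateBy(
  results: CaseResult[],
  keyOf: (r: CaseResult) => string,
): Record<string, number> {
  const pass: Record<string, number> = {};
  const total: Record<string, number> = {};
  for (const r of results) {
    const k = keyOf(r);
    total[k] = (total[k] || 0) + 1;
    pass[k] = (pass[k] || 0) + (r.status === "pass" ? 1 : 0);
  }
  const rates: Record<string, number> = {};
  for (const k of Object.keys(total)) rates[k] = pass[k] / total[k];
  return rates;
}

// Assumed definition: flaky means recent runs contain both a pass and a fail.
function isFlaky(recent: RunStatus[]): boolean {
  return recent.indexOf("pass") !== -1 && recent.indexOf("fail") !== -1;
}
```

Calling `passRateBy(results, (r) => r.persona)` groups by persona; passing `(r) => r.prdSection` groups by PRD section instead.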

When a case fails

Failure surfaces with the step number, the expected value, and the actual. One click pulls the bug-repro bundle (Testing → Bug reproduction) — stack trace, session replay, DB snapshot, deploy SHA — and attaches it to a tracker issue. The tracker issue links back to the case; closing the issue marks the case retestable.

For multi-tenant SaaS sold to teams, test plans can be scoped per tenant, so customers can run their own plans against their own data slices.

AI-generated test cases

atelier test improve reads your codebase, your schema, your RLS, your SDK calls, and writes the tests you forgot.

atelier test improve

Triggers built in:

New migration → schema-shaped query tests.
New component → snapshot + interaction + a11y test.
New Function → input / output / error-path tests.
New RLS policy → coverage test.

Or write a spec in natural language and let it compile:

atelier.test.spec(`
  When a trial user creates their 4th invoice,
  the system should prompt them to upgrade.
`);

The agent reads the schema, finds the right routes, writes the Playwright steps, opens a PR.

Visual regression

Browser Rendering captures every page on every PR. Pixel-diff against main. PR comment shows what changed.

// atelier.config.ts
export default {
  testing: {
    visual: {
      pages: ['/', '/pricing', '/app/dashboard'],
      variants: ['desktop', 'mobile', 'dark'],
      locales: ['en', 'ko', 'ja'],
    },
  },
};
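
To estimate how many screenshots a config like this implies, the sketch below expands it into a capture list. It assumes every page is captured in every variant and every locale (a full cross product), which this page does not state explicitly; the type and function names are hypothetical.

```typescript
// Hypothetical expansion of the visual config into individual captures.
interface VisualConfig { pages: string[]; variants: string[]; locales: string[] }
interface VisualCapture { page: string; variant: string; locale: string }

// Assumes a full cross product of pages x variants x locales.
function captureMatrix(cfg: VisualConfig): VisualCapture[] {
  const captures: VisualCapture[] = [];
  for (const page of cfg.pages) {
    for (const variant of cfg.variants) {
      for (const locale of cfg.locales) {
        captures.push({ page, variant, locale });
      }
    }
  }
  return captures;
}
```

Under that assumption, the config above (3 pages, 3 variants, 3 locales) yields 27 captures per PR.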

Design Cascade-aware: change a token in the design system and Atelier tells you which pages it will move before you ship. Bound projects get a heads-up too.

No Percy, no Chromatic, no per-snapshot bill.
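
The pixel diff against main mentioned at the top of this section can be pictured as a per-pixel comparison of two RGBA buffers. This is a generic sketch of the technique, not Atelier's implementation; the function name and the `tolerance` parameter are hypothetical.

```typescript
// Fraction of pixels (0..1) whose RGBA values differ between two same-sized screenshots.
// `tolerance` is the largest per-channel difference still treated as unchanged.
function diffRatio(a: Uint8Array, b: Uint8Array, tolerance: number = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("screenshots must be same-sized RGBA buffers");
  }
  const pixels = a.length / 4;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    let differs = false;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) differs = true;
    }
    if (differs) changed++;
  }
  return changed / pixels;
}
```

A small non-zero tolerance is a common way to ignore anti-aliasing noise, at the cost of missing very subtle color shifts.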

Synthetic monitoring

Build-time tests prove what was true at deploy. Synthetics prove what’s true right now.

// .atelier/synthetic/checkout.ts
export const schedule = '*/5 * * * *';  // every 5 min
 
export default async function checkoutHealth(ctx) {
  const session = await ctx.test.signIn('synthetic@test.com');
  const cart = await session.api.post('/cart', { item: 'test' });
  const checkout = await session.api.post('/checkout', { cartId: cart.id });
  ctx.test.assert(checkout.status === 'ok');
}

Runs from multiple regions, measures latency, screenshots on failure, alerts through the same on-call surface as everything else.

Load testing

// .atelier/load/checkout.ts
export const stages = [
  { duration: '1m', users: 50 },
  { duration: '3m', users: 500 },
  { duration: '1m', users: 50 },
];
 
export default async function checkoutLoad(session) {
  const user = await session.signUp();
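  await session.api.post('/cart', { /* ... */ });
  await session.api.post('/checkout', { /* ... */ });
  // A bare comparison is a no-op, so fail explicitly when the budget is exceeded.
  // Assumes responseTime is reported in milliseconds.
  if (!(session.assert.responseTime < 500)) {
    throw new Error('checkout exceeded the 500 ms response-time budget');
  }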
}

Run on demand or wire as a pre-deploy gate before canary. The traffic runs through your own Functions pool, so cost is just function invocations — no k6 Cloud subscription.
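
Before wiring a profile like this into a pre-deploy gate, it helps to know how long it runs and how high it peaks. The sketch below computes both from a `stages` array; the duration format (`30s`, `1m`, `2h`) is inferred from the example, and the names are hypothetical.

```typescript
// Hypothetical helpers for summarizing a load profile.
interface LoadStage { duration: string; users: number }

// Parse durations such as '30s', '1m', '2h' into seconds (format assumed from the example).
function toSeconds(duration: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  const unitSeconds: Record<string, number> = { s: 1, m: 60, h: 3600 };
  return parseInt(match[1], 10) * unitSeconds[match[2]];
}

// Total run time and the highest concurrent user count across all stages.
function summarizeStages(stages: LoadStage[]): { totalSeconds: number; peakUsers: number } {
  let totalSeconds = 0;
  let peakUsers = 0;
  for (const stage of stages) {
    totalSeconds += toSeconds(stage.duration);
    peakUsers = Math.max(peakUsers, stage.users);
  }
  return { totalSeconds, peakUsers };
}
```

For the three stages above, this gives a total of 300 seconds and a peak of 500 users.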

Contract testing

The SDK is typed end-to-end. Every release snapshots the public type surface; the next build diffs it. Breaking changes become explicit migrations:

semver bump suggested
migration guide drafted (atelier.llm)
consumer apps’ usage analyzed for impact
deprecation pipeline (warn → error → remove) over configurable windows
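
The snapshot-and-diff step can be sketched as a comparison of two maps from export name to type signature. This is a deliberately conservative simplification, with hypothetical names: it treats any changed signature as breaking, whereas a real type-level diff could recognize compatible widenings.

```typescript
// Hypothetical snapshot of a public type surface: export name -> signature text.
type TypeSurface = Record<string, string>;
type SemverBump = "major" | "minor" | "patch";

// Removed or changed exports suggest a major bump, additions a minor, otherwise a patch.
function suggestBump(prev: TypeSurface, next: TypeSurface): SemverBump {
  for (const exportName of Object.keys(prev)) {
    if (!(exportName in next) || next[exportName] !== prev[exportName]) return "major";
  }
  for (const exportName of Object.keys(next)) {
    if (!(exportName in prev)) return "minor";
  }
  return "patch";
}
```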

Bug reproduction

When something breaks in production, Atelier bundles four artifacts at the moment of failure:

Stack trace + breadcrumb (from Observability)
Session replay (from Analytics, with PII masked)
DB snapshot of the rows the user could see (RLS-applied)
Deploy SHA of the exact code running at that moment

Click Reproduce on the bundle and Atelier spins up a preview deploy:

Same SHA. Same DB snapshot. Same identity. The replay plays back through the live preview.

What was a half-day of “I can’t repro this locally” is a minute.

Per-PR pipeline

Every pull request gets the full suite, isolated:

Push to PR branch
  ↓ preview deploy (#159)
  ↓ DB branch (#160)
  ↓ unit + integration + e2e + rls + visual + a11y
  ↓ all green? → PR comment with preview URL + coverage
  ↓ merge → main pipeline + canary rollout

Vercel hands you preview deploys. Supabase hands you DB branches. Atelier wires both and runs the tests that need both — no one else combines them.
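
The "all green?" step in the pipeline above amounts to a gate over the suite results. The sketch below shows one way to express it; the suite names come from the diagram, while the function name, result shape, and comment wording are hypothetical.

```typescript
// Suites listed in the per-PR pipeline diagram.
type SuiteName = "unit" | "integration" | "e2e" | "rls" | "visual" | "a11y";
interface GateResult { green: boolean; failed: SuiteName[]; comment: string }

// Decide whether the PR is green and build the text of a single summary comment.
function evaluateGate(results: Record<SuiteName, boolean>, previewUrl: string): GateResult {
  const order: SuiteName[] = ["unit", "integration", "e2e", "rls", "visual", "a11y"];
  const failed = order.filter((suite) => !results[suite]);
  const green = failed.length === 0;
  const comment = green
    ? `All suites green. Preview: ${previewUrl}`
    : `Failing suites: ${failed.join(", ")}`;
  return { green, failed, comment };
}
```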

In Functions

ctx.test mirrors the client surface, scoped to the caller’s identity so RLS holds. Useful for scheduled spec checks and migration-time validation:

export default async function postMigrate(_req, ctx) {
  await ctx.test.rls.assert({ table: 'invoices', tenantIsolation: true });
  await ctx.test.spec('No invoices became visible across tenants.');
}