
Test-Driven Development

The Iron Law: NO PRODUCTION CODE MAY BE WRITTEN WITHOUT A FAILING TEST FIRST.


Why TDD is Non-Negotiable

Test-Driven Development (TDD) is the most frequently debated and most frequently skipped practice in software engineering. Developers argue that writing tests first is slow, that it's awkward for certain kinds of code, that the deadline is too close, or that they'll add tests later.

Superpowers treats all of these arguments as rationalizations; none of them is an acceptable reason to skip the test-first discipline.

Here's the core problem with writing code first: when you write code before the test, the test is written to match the code, not to verify the intent. You end up testing what the code does, not what it should do. The test becomes a documentation artifact rather than a correctness guarantee. Bugs that were present in the original code get encoded into the tests as expected behavior.

When you write the test first:

  • You are forced to specify the exact behavior you want before you decide how to implement it
  • You discover interface problems before you've built the implementation (cheap to fix)
  • The failing test is proof that your test is actually testing something
  • The passing test after implementation is proof that you built what you intended

TDD is not slower. It is the elimination of the slow part: the time spent debugging code that was never properly specified.


The RED-GREEN-REFACTOR Cycle

TDD follows a simple three-phase cycle for every piece of functionality:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   RED ──────────────────→ GREEN ──────────────→ REFACTOR    │
│                                                             │
│   Write a failing         Make it pass with    Improve the  │
│   test for the            the simplest code    code without │
│   required behavior.      possible.            changing     │
│                                                behavior.    │
└─────────────────────────────────────────────────────────────┘

RED: Write a Failing Test

Write a test that describes exactly one piece of required behavior. Run it. Watch it fail.

This step is not optional. If you write a test and it passes immediately without any implementation, one of two things is true:

  1. The functionality already exists (and you should check why)
  2. Your test is wrong — it's not actually testing what you think it's testing

A test that passes before the implementation is worthless. The red phase exists to confirm your test is real.

// RED: This test fails because calculateDiscount doesn't exist yet
test('applies 10% discount for orders over $100', () => {
  const order = { total: 150, items: [] };
  expect(calculateDiscount(order)).toBe(15);
});

GREEN: Make it Pass (Simply)

Write the simplest code that makes the failing test pass. Not the most elegant code. Not the most extensible code. The simplest code.

This rule prevents over-engineering in the implementation phase. You are not allowed to build for cases that don't have tests yet. If you think "what about when the discount is 20% for orders over $500?" — write a test for that first, then make it pass.

// GREEN: The simplest implementation that makes the test pass
function calculateDiscount(order: Order): number {
  if (order.total > 100) {
    return order.total * 0.10;
  }
  return 0;
}

REFACTOR: Improve Without Breaking

Once the test passes, you have a safety net. Now you can improve the code: better naming, extract a helper function, remove duplication, improve readability — anything that improves quality without changing behavior.

Run the tests after every refactor. If they still pass, you haven't broken anything. If a test fails, undo the last change.

// REFACTOR: Clearer naming and constants
const DISCOUNT_THRESHOLD = 100;
const DISCOUNT_RATE = 0.10;

function calculateDiscount(order: Order): number {
  const qualifiesForDiscount = order.total > DISCOUNT_THRESHOLD;
  return qualifiesForDiscount ? order.total * DISCOUNT_RATE : 0;
}

The test still passes. The code is cleaner. The cycle is complete.


What To Do When Code Is Written Before Tests

This situation will arise. An AI agent will write implementation code before the tests. A developer under deadline pressure will do it. The question is what to do when it happens.

Superpowers is explicit:

DELETE IT. START OVER.

Not "add tests now." Not "that's close enough." Delete the implementation code and write the test first.

This seems harsh until you understand why. Code written without a failing test first has no verified specification. The implementation encodes assumptions that have not been challenged. When you write the test after the fact, you are writing a test that you know will pass — which means you are not discovering anything. You are just checking that your code does what your code does.

The test-first discipline is not just about having tests. It is about the process of specifying behavior before implementing it. Retrofitted tests skip this step.

IF: Production code was written without a preceding failing test
THEN:
  1. Delete the production code
  2. Commit the deletion
  3. Write a failing test
  4. Implement against the test
  5. Continue
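As a sketch, here is what that reset can look like with git. Everything below is illustrative: the repository is a throwaway created for the demo, and src/discount.ts is a hypothetical path.

```shell
# Hypothetical scenario: src/discount.ts was implemented with no failing test
# first. For illustration, set up a throwaway repo containing that mistake.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com
git config user.name "Dev"
mkdir -p src
echo 'export const discount = 0;' > src/discount.ts
git add src/discount.ts
git commit -qm "add discount logic (untested)"

# Steps 1-2: delete the production code and commit the deletion.
git rm -q src/discount.ts
git commit -qm "revert untested discount logic (TDD reset)"

# Steps 3-5: write the failing test, implement against it, continue the cycle.
git log --oneline
```

Committing the deletion makes the reset explicit in history, so nobody quietly restores the untested code later.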

Rationalization Table

When the pressure to skip TDD builds, these are the common arguments and why they don't hold:

  • "I'll add tests later"
    Tests added later test what the code does, not what it should do. Later rarely comes.

  • "This code is too simple to need tests"
    Simple code has bugs too. The test takes 2 minutes.

  • "The deadline is tomorrow"
    Untested code delivered fast produces bugs that take longer to fix than writing the tests would have.

  • "I can't write a test for this kind of code"
    You can. If it's genuinely hard to test, that's a signal the code design is wrong.

  • "We'll do exploratory coding first, then formalize"
    Exploratory code becomes production code. Delete it or test it; there is no third option.

  • "The AI generated it, so it's probably fine"
    AI-generated code has bugs. AI-generated code without tests has undetected bugs.

  • "Adding tests now would slow us down"
    Adding tests after bugs are found in production is far slower.

  • "Testing this requires mocking too many things"
    Too many mocks means the design has too many dependencies. Fix the design.

Verification Before Completion

Before claiming any feature is complete, the Superpowers verification protocol requires a 5-step gate:

┌────────────────────────────────────────────────────────────────┐
│                  VERIFICATION GATE                             │
│                                                                │
│  Step 1: IDENTIFY — List every test relevant to this feature   │
│  Step 2: RUN      — Execute the test suite (don't just assume) │
│  Step 3: READ     — Read the actual output, line by line       │
│  Step 4: VERIFY   — Confirm each test passes in the output     │
│  Step 5: CLAIM    — Only now state the feature is complete     │
└────────────────────────────────────────────────────────────────┘

Step 1: IDENTIFY

List every test file and test case that covers the feature being completed. Not just the tests you wrote today — any existing tests that touch the affected code.

Step 2: RUN

Actually run the test suite. Not in your head. Not assumed. Run it.

npm test -- --coverage --testPathPattern=order

Step 3: READ

Read the actual output. Every line. Not just "did it pass or fail overall." Look at:

  • Which tests ran
  • Which tests passed
  • Which tests were skipped
  • Coverage numbers

Step 4: VERIFY

For each test in your IDENTIFY list, confirm it appears in the output and shows as passing. If a test is missing from the output, it didn't run. If a test is failing, the feature is not complete.

Step 5: CLAIM

Only after completing steps 1–4 may the AI (or developer) state: "This feature is complete."

Saying "it should work" is not completing step 5. Only actual test output completes step 5.


Red Flags That Require Stopping

If any of these situations arise during TDD, stop work immediately and escalate:

  • A test passes before implementation exists
    The test is probably wrong.

  • Tests pass but the feature doesn't behave correctly
    The tests are testing the wrong thing.

  • Every test in the suite passes on the first run after a large refactor
    Suspicious; verify no tests were inadvertently skipped.

  • Test coverage goes down when adding a new feature
    Gaps are being introduced.

  • A test requires changing 15 other tests to make it pass
    The design change was too large; break it into smaller steps.

  • The AI claims tests pass without showing output
    Unacceptable; require actual output before proceeding.

TDD in the Context of Superpowers Plans

As noted in Writing Plans, every implementation task in a plan must be preceded by a test task. The plan structure enforces TDD at the planning level:

Task N:   Write failing test for [behavior]     ← RED
Task N+1: Implement [behavior]                  ← GREEN
[Refactor happens within Task N+1 or as Task N+2 if scope warrants]

When executing a plan, if a subagent is dispatched to Task N+1 (implementation) without Task N (test) having been completed first, the subagent must reject the task and report it as BLOCKED. An implementation task without a preceding failing test is a plan defect.


Running TDD with Superpowers

To invoke the TDD skill explicitly:

/test-driven-development I need to implement user authentication

The AI will guide you through the RED-GREEN-REFACTOR cycle, enforce the failing-test-first requirement, and apply the 5-step verification gate before marking anything complete.


TDD is the difference between believing your code works and knowing it works. In professional software development, belief is not an acceptable standard. Evidence is.