
Testing and Testability

Part 8 of 8: How the architecture makes testing straightforward — and why testability is the proof the design works

Adelphi Liong • April 23, 2026 • 13 min read



Parts 1 through 7 gave us philosophy, principles, patterns, domain modeling, architecture, and wiring. Now the question: how do you know it works? This part shows how the design we have built makes testing straightforward — and why testability is the proof that the design works.

Why Tests Matter

A test is a second opinion on your code. You write a function, you think it works. Then you write a test that calls it with specific inputs and checks the output. The test asks a different question: not "how do I make this work?" but "does this actually do what I intended?"

A codebase without tests has been checked exactly once — by the person who wrote it, in the moment they wrote it. The off-by-one error, the forgotten null check, the empty list case — these slip through because one pass is never enough.

A codebase with tests is checked twice, and the second check runs every time the code changes. Forever.

But here is the deeper insight, and the real reason this part exists: if code is hard to test, it is hard to change. The properties that make code testable — explicit dependencies, immutable data, pure functions, injectable collaborators — are exactly the properties that make code changeable. This entire series has been building toward code that is easy to test. Testing is not a separate concern bolted on at the end. It is the proof that the design works.

If you followed the principles in Parts 1 through 7 and your code is still hard to test, something went wrong in the design. The tests are telling you.

The Test Pyramid

The shape of the pyramid is deliberate: tests at the bottom are fast, cheap, and numerous. Tests at the top are slow, expensive, and few. A healthy codebase has many unit tests, some functional tests, fewer integration tests, and a minimal number of SIT/E2E tests.

[Figure: The Test Pyramid. Tiers from top (slow, expensive, few) to base (fast, cheap, many): E2E (Playwright / Cypress); SIT (k6 / Gatling); Integration (Testcontainers); Functional (black-box, LSP contracts); Unit (white-box, 100% coverage). The wide Unit base carries many fast tests, while each higher tier carries progressively fewer, slower ones.]

Why this shape? Because of a fundamental trade-off: the higher you go, the more confidence you get that the whole system works, but the slower and more brittle the tests become. A unit test runs in milliseconds and tells you exactly which function broke. An E2E test takes seconds (or minutes), and when it fails, you have to hunt through the entire stack to find the culprit.

The strategy is to catch as many bugs as possible at the lowest level, then use higher levels only for things that lower levels cannot catch.

Unit Tests (White-Box)

Unit tests examine the internal implementation of a single class or function. They are white-box — they know about dependencies and internal structure. They aim for 100% code coverage.

"Wait, 100%? Isn't that excessive?" you might ask. If your classes follow SRP and your methods are small, 100% coverage is trivially achievable. If it feels excessive, the class is probably doing too much. The difficulty of achieving coverage is, itself, design feedback.

The AAA Pattern

Every unit test follows the same structure: Arrange, Act, Assert.

typescript

describe('PostService', () => {
  it('should create post with valid title', () => {
    // Arrange - set up the subject, inputs, expected result
    const mockRepo = { create: r => Ok(mockPost) };
    const subject = new PostService(mockRepo);
    const input = { title: 'Hello', description: 'World', tags: [] };
    const expected = mockPost;

    // Act - call the method under test
    const actual = subject.create(input);

    // Assert - verify the result
    actual.should.eql(Ok(expected));
  });
});

Three sections. Clean separation. When a test is hard to write, the difficulty usually shows up in Arrange — too many collaborators to set up, too much state to initialize. That is not the test's fault. That is the design telling you the class has too many responsibilities.

Standard Variable Names

Variable	Meaning
`subject`	The class or function being tested
`input`	Data passed to the method
`expected`	What we expect the method to return
`actual`	What the method actually returned

These names are a team convention. When every test uses them, you can scan any test file and immediately know what is being tested, what goes in, and what comes out.

Triangulation: Why One Test Case Is Not Enough

A single test case can pass by accident. Maybe the implementation is hardcoded. Maybe it works for that specific input but breaks for everything else. Multiple cases prove correctness.

typescript

// Risky - single case, might pass by luck
it('should format status', () => {
  expect(formatStatus('pending')).toBe('Pending');
});

// Better - multiple cases prove the logic
it.each([
  ['pending', 'Pending'],
  ['running', 'Running'],
  ['completed', 'Completed'],
])('should format status (%s -> %s)', (input, expected) => {
  expect(formatStatus(input)).toBe(expected);
});

This is triangulation: you test enough cases that the only reasonable implementation is the correct one.
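To make the "pass by accident" risk concrete, here is a minimal sketch. It assumes that `formatStatus` simply capitalizes the first letter, which the article does not spell out. `formatStatusHardcoded` and `failingCases` are hypothetical names introduced only for this illustration.

```typescript
// Hypothetical "cheating" implementation: it satisfies the single-case test only.
function formatStatusHardcoded(_status: string): string {
  return 'Pending';
}

// Assumed behavior for this sketch: capitalize the first letter.
function formatStatus(status: string): string {
  if (status.length === 0) return status;
  return status.charAt(0).toUpperCase() + status.slice(1);
}

// The same table used by the it.each example.
const cases: ReadonlyArray<readonly [string, string]> = [
  ['pending', 'Pending'],
  ['running', 'Running'],
  ['completed', 'Completed'],
];

// Returns the inputs for which an implementation disagrees with the table.
function failingCases(impl: (status: string) => string): string[] {
  return cases
    .filter(([input, expected]) => impl(input) !== expected)
    .map(([input]) => input);
}

const hardcodedFailures = failingCases(formatStatusHardcoded); // ['running', 'completed']
const realFailures = failingCases(formatStatus); // []
```

The hardcoded version passes the 'pending' case and nothing else, so the second and third rows are what rule it out.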

Deterministic and Fast

Unit tests must be:

Deterministic — no random values, no real time, no network calls
Fast — no sleep, no real IO
Isolated — no dependence on test execution order

typescript

// Wrong - uses real time (slow, flaky)
it('should timeout after 1 second', async () => {
  const start = Date.now();
  await subject.doSomething();
  const elapsed = Date.now() - start;
  expect(elapsed).toBeGreaterThan(1000);
});

// Right - uses injected clock (fast, deterministic)
it('should timeout after deadline', () => {
  const clock = new FakeClock();
  const subject = new Service(clock);
  clock.tick(1001);
  subject.hasTimedOut().should.be.true();
});

If you need to inject a clock to test time-dependent logic, that is not a hack — that is the architecture working as intended. The dependency was injectable (Part 2), so the test controls it.
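The article does not show `FakeClock` or `Service`, so the following is one possible sketch of what they might look like. The `Clock` interface, the `SystemClock` class, and the 1000 ms default deadline measured from construction are all assumptions made for illustration.

```typescript
// Assumed abstraction: the only way the service is allowed to ask about time.
interface Clock {
  now(): number; // milliseconds
}

// Production implementation delegates to the system clock.
class SystemClock implements Clock {
  now(): number {
    return Date.now();
  }
}

// Test double: time moves only when the test says so.
class FakeClock implements Clock {
  private current = 0;

  now(): number {
    return this.current;
  }

  tick(ms: number): void {
    this.current += ms;
  }
}

// Hypothetical service whose deadline is measured from construction.
class Service {
  private readonly clock: Clock;
  private readonly timeoutMs: number;
  private readonly startedAt: number;

  constructor(clock: Clock, timeoutMs: number = 1000) {
    this.clock = clock;
    this.timeoutMs = timeoutMs;
    this.startedAt = clock.now();
  }

  hasTimedOut(): boolean {
    return this.clock.now() - this.startedAt > this.timeoutMs;
  }
}

const clock = new FakeClock();
const subject = new Service(clock);
const beforeDeadline = subject.hasTimedOut(); // false: no time has passed
clock.tick(1001);
const afterDeadline = subject.hasTimedOut(); // true: 1001 ms exceeds 1000 ms
```

In production the composition root would pass `new SystemClock()`; the service itself never learns which clock it holds.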

Functional Tests (Black-Box)

Functional tests verify behavior through interfaces. They do not know about internal implementation — only inputs, outputs, and the interface contract.

Here is the key difference from unit tests: a unit test knows how the class works internally. A functional test only knows what the interface promises.

Why Functional Tests Matter

Functional tests validate the interface contract. They ensure that any implementation of the interface will behave correctly. This is the essence of the Liskov Substitution Principle from Part 3 — if you can swap implementations, you need tests that prove the swap is safe.

typescript

// The interface
interface IPaymentProcessor {
  charge(amount: Money, card: CardDetails): Result<Charge, PaymentError>;
  refund(chargeId: string): Result<Refund, PaymentError>;
}

// Functional test - tests the contract, not a specific implementation
describe('IPaymentProcessor contract', () => {
  // Registers the shared contract tests for whatever implementation the factory builds
  function testContract(createProcessor: () => IPaymentProcessor): void {
    it('should charge successfully with valid card', () => {
      const subject = createProcessor();
      const result = subject.charge(Money.usd(10.00), validCard);
      expect(result.isOk()).toBe(true);
    });

    it('should reject invalid card', () => {
      const subject = createProcessor();
      const result = subject.charge(Money.usd(10.00), invalidCard);
      expect(result.isErr()).toBe(true);
    });
  }

  // Run the same contract tests against every implementation
  describe('StripePaymentProcessor', () => {
    testContract(() => new StripePaymentProcessor(mockStripeClient));
  });

  describe('PaypalPaymentProcessor', () => {
    testContract(() => new PaypalPaymentProcessor(mockPaypalClient));
  });
});

The beauty of this pattern: when you add a new implementation of IPaymentProcessor, you just add one more describe block pointing at the existing contract tests. The contract is defined once and verified everywhere.
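The same idea can be tried without a test framework or a payment SDK. The sketch below uses simplified stand-ins that are not the article's real types: a plain `Result` union, a one-method `PaymentProcessor`, and two hypothetical implementations, one of which deliberately breaks the contract.

```typescript
// Simplified stand-ins for the article's types (assumptions for this sketch).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface Card {
  number: string;
}

interface PaymentProcessor {
  charge(amountCents: number, card: Card): Result<string, string>;
}

// A conforming implementation: rejects cards with an empty number.
class InMemoryProcessor implements PaymentProcessor {
  private nextId = 1;

  charge(amountCents: number, card: Card): Result<string, string> {
    if (card.number.length === 0) return { ok: false, error: 'invalid card' };
    if (amountCents <= 0) return { ok: false, error: 'invalid amount' };
    return { ok: true, value: `charge-${this.nextId++}` };
  }
}

// A non-conforming implementation: approves everything.
class AlwaysApproveProcessor implements PaymentProcessor {
  charge(_amountCents: number, _card: Card): Result<string, string> {
    return { ok: true, value: 'charge-0' };
  }
}

// The contract, written once against the interface.
// Returns the names of the clauses an implementation violates.
function checkContract(createProcessor: () => PaymentProcessor): string[] {
  const violations: string[] = [];
  const validCard: Card = { number: '4242424242424242' };
  const invalidCard: Card = { number: '' };

  if (!createProcessor().charge(1000, validCard).ok) {
    violations.push('should charge valid card');
  }
  if (createProcessor().charge(1000, invalidCard).ok) {
    violations.push('should reject invalid card');
  }
  return violations;
}

const conforming = checkContract(() => new InMemoryProcessor()); // []
const violating = checkContract(() => new AlwaysApproveProcessor()); // ['should reject invalid card']
```

Both classes satisfy the type checker. Only the contract check reveals that one of them cannot be substituted safely.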

Unit vs Functional: Same Folder, Different Purpose

In practice, unit tests and functional tests often live side by side. But they test different things:

Aspect	Unit Test	Functional Test
Knows about	Internal dependencies	Interface only
Mocks	All collaborators	All collaborators
Validates	Implementation correctness	Contract correctness
Fails when	Code bug	Interface violation

Integration Tests

Integration tests verify that modules work together correctly. Unit tests prove individual pieces work; integration tests prove the wiring between them works.

Why Not Just Unit Tests?

You can have a perfectly unit-tested PostService and a perfectly unit-tested PostgresPostRepository, and the system still breaks because the mapper between them has a subtle bug — maybe it swaps created_at and updated_at, or serializes tags as a comma-separated string instead of JSON.

Integration tests catch these boundary errors.
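As an illustration of the comma-separated tags bug, here is a sketch with hypothetical `Post` and `PostRow` shapes and two mapper pairs. Each half of the buggy mapper looks fine when tested alone with simple tags; the defect shows only when a value makes the full round trip. In a real integration test the database would sit between `toRow` and `fromRow`.

```typescript
interface Post {
  title: string;
  tags: string[];
}

// Hypothetical row shape: tags live in a single text column.
interface PostRow {
  title: string;
  tags: string;
}

// Buggy pair: joins on commas, so a tag containing a comma is split on the way back.
const csvMapper = {
  toRow: (p: Post): PostRow => ({ title: p.title, tags: p.tags.join(',') }),
  fromRow: (r: PostRow): Post => ({
    title: r.title,
    tags: r.tags === '' ? [] : r.tags.split(','),
  }),
};

// Fixed pair: JSON preserves the boundaries between elements.
const jsonMapper = {
  toRow: (p: Post): PostRow => ({ title: p.title, tags: JSON.stringify(p.tags) }),
  fromRow: (r: PostRow): Post => ({
    title: r.title,
    tags: JSON.parse(r.tags) as string[],
  }),
};

const post: Post = { title: 'Hello', tags: ['news', 'rock,paper'] };

const viaCsv = csvMapper.fromRow(csvMapper.toRow(post));
const viaJson = jsonMapper.fromRow(jsonMapper.toRow(post));
// viaCsv.tags  -> ['news', 'rock', 'paper']  (three tags instead of two)
// viaJson.tags -> ['news', 'rock,paper']
```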

Example: Repository + Database

typescript

describe('PostgresPostRepository', () => {
  let db: TestDatabase;
  let subject: PostgresPostRepository;

  beforeAll(async () => {
    db = await TestDatabase.create(); // Testcontainers spins up a real Postgres
    subject = new PostgresPostRepository(db.connection);
  });

  afterAll(async () => {
    await db.cleanup();
  });

  it('should persist and retrieve', async () => {
    const created = await subject.create(validRecord);
    const retrieved = await subject.get(created.value.principal.id);
    retrieved.value.should.eql(created.value);
  });
});

Integration tests test module by module, not the whole system at once. A repository integration test uses a real database (via Testcontainers) but still mocks external services like payment processors.

SIT (System Integration Testing)

SIT tests the entire system from a client's perspective. This is fully black-box testing: the test has no access to the code, no coverage metrics, only the external API. It is, essentially, what your users will experience — minus the UI.

Why SIT?

Integration tests verify pairs of modules. SIT verifies that the entire assembled system works when deployed. This catches things that no lower-level test can:

Configuration errors ("the environment variable was named differently in production")
Wiring mistakes ("the DI container registered the wrong implementation")
Environment-specific issues ("this query works locally but times out against the production database")
Performance problems under load

Example with k6

javascript

// k6 script - tests from outside the system
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  // Create order
  const createRes = http.post(
    'https://api.example.com/orders',
    JSON.stringify({
      items: [{ productId: 'widget-1', quantity: 2 }],
    }),
    {
      headers: { 'Content-Type': 'application/json' },
    },
  );

  check(createRes, {
    'create returns 201': r => r.status === 201,
    'create returns order id': r => r.json('id') !== undefined,
  });

  const orderId = createRes.json('id');

  // Get order
  const getRes = http.get('https://api.example.com/orders/' + orderId);

  check(getRes, {
    'get returns 200': r => r.status === 200,
    'get returns correct order': r => r.json('id') === orderId,
  });
}

No Coverage Metrics

SIT is black-box. You cannot measure code coverage, and you should not try. Instead, you measure:

Response times (are they within SLA?)
Error rates (are they below threshold?)
Throughput (can the system handle expected load?)
User journey completion (do the critical flows work end-to-end?)
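Load tools typically report these figures for you, but the arithmetic behind them is simple. The sketch below assumes a hypothetical list of request samples and made-up limits (p95 under 500 ms, error rate under 10%). It uses the nearest-rank definition of a percentile, which is one of several in common use.

```typescript
interface Sample {
  durationMs: number;
  failed: boolean;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.max(Math.ceil((p * sortedAsc.length) / 100), 1);
  const value = sortedAsc[rank - 1];
  if (value === undefined) throw new Error('no samples');
  return value;
}

// Reduces raw samples to the behavior metrics listed above.
function summarize(samples: Sample[], windowSeconds: number) {
  const durations = samples.map(s => s.durationMs).sort((a, b) => a - b);
  const failures = samples.filter(s => s.failed).length;
  return {
    p95Ms: percentile(durations, 95),
    errorRate: failures / samples.length,
    throughputPerSec: samples.length / windowSeconds,
  };
}

// 20 hypothetical requests over 10 seconds: 19 fast successes and one slow failure.
const samples: Sample[] = [];
for (let i = 1; i <= 19; i++) {
  samples.push({ durationMs: 100 + i, failed: false });
}
samples.push({ durationMs: 900, failed: true });

const summary = summarize(samples, 10);
// summary.p95Ms === 119, summary.errorRate === 0.05, summary.throughputPerSec === 2
const withinSla = summary.p95Ms < 500 && summary.errorRate < 0.1; // true
```

Note that the single 900 ms outlier does not move the 95th percentile here, which is why percentiles are usually preferred over maximums for latency limits.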

E2E (End-to-End Testing)

E2E tests verify the entire user experience, including the frontend. These are the most expensive tests to write and maintain, and they should be treated accordingly.

When You Need E2E

If you are building a backend API with no frontend, you do not need E2E tests. SIT covers you. E2E is specifically for verifying that:

The frontend renders correctly
User interactions work (clicks, form submissions, navigation)
Frontend and backend integrate properly (API calls, error handling, loading states)

Example with Playwright

typescript

test('user can create order', async ({ page }) => {
  await page.goto('/orders/new');

  await page.fill('[data-testid="product-search"]', 'widget');
  await page.click('[data-testid="product-widget-1"]');
  await page.click('[data-testid="add-to-order"]');

  await page.click('[data-testid="submit-order"]');

  await expect(page.locator('[data-testid="order-success"]')).toBeVisible();
  await expect(page.locator('[data-testid="order-id"]')).toHaveText(/ORD-/);
});

Keep E2E Tests Minimal

E2E tests are brittle. A CSS class changes, a loading spinner takes 50ms longer, a third-party script loads differently — and the test breaks. The maintenance cost is high.

Keep E2E to:

The critical happy path (can a user complete the core workflow?)
The most common user journey (what does 80% of traffic do?)

Leave edge cases to unit and functional tests, where they are cheap.

The Complete Pyramid

Level	Type	Visibility	Speed	What It Catches	Tools
Unit	White-box	Internal	Milliseconds	Logic bugs, edge cases	Mocha, xUnit, pytest
Functional	Black-box	Interface	Milliseconds	Contract violations, LSP breaks	Same as unit
Integration	Black-box	Module pair	Seconds	Wiring bugs, mapper errors	Testcontainers
SIT	Black-box	Full system	Seconds-minutes	Config errors, perf issues	k6, Gatling, Postman
E2E	Black-box	Full stack + UI	Minutes	UI bugs, frontend integration	Playwright, Cypress

Testability as Design Feedback

This is perhaps the most important section in this entire article. When a test is hard to write, do not blame the testing framework. The difficulty is telling you something about the design.

Hard to test because...	The design problem is...
Too many collaborators to set up	Class has too many responsibilities (SRP, Part 3)
Hidden state needs to be initialized	Implicit dependencies (dependency model, Part 2)
Cannot swap an implementation	Hard dependencies, missing interfaces (Part 2)
Side effects everywhere	Impure functions (functional thinking, Part 4)
Cannot control behavior of a dependency	Missing injection point (Part 2)

The fix is never a better mocking library. The fix is better architecture. If you find yourself reaching for complex test utilities — mocking static methods, patching globals, using reflection to access private fields — stop. You are fighting the design instead of fixing it.

This is why testability is the proof that the design works. If you can write a test in five lines with a trivial Arrange section, the class is well-designed. If the Arrange section is thirty lines of setup, the class needs refactoring.
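The "cannot swap an implementation" row can be shown as a before-and-after sketch. Every name here (`Mailer`, `SmtpMailer`, `WelcomeService`, `RecordingMailer`) is hypothetical, and `SmtpMailer` merely throws to stand in for real network access.

```typescript
interface Mailer {
  send(to: string, subject: string): void;
}

// Stand-in for a real SMTP client; imagine this opens a network connection.
class SmtpMailer implements Mailer {
  send(_to: string, _subject: string): void {
    throw new Error('network access is not available in tests');
  }
}

// Before: the collaborator is built inside the class, so a test cannot replace it.
class HardWiredWelcomeService {
  private readonly mailer = new SmtpMailer();

  welcome(email: string): void {
    this.mailer.send(email, 'Welcome!');
  }
}

// After: the collaborator arrives through the constructor.
class WelcomeService {
  private readonly mailer: Mailer;

  constructor(mailer: Mailer) {
    this.mailer = mailer;
  }

  welcome(email: string): void {
    this.mailer.send(email, 'Welcome!');
  }
}

// Test double that records calls instead of sending anything.
class RecordingMailer implements Mailer {
  readonly sent: Array<{ to: string; subject: string }> = [];

  send(to: string, subject: string): void {
    this.sent.push({ to, subject });
  }
}

// Arrange, Act, Assert in a handful of lines.
const mailer = new RecordingMailer();
const subject = new WelcomeService(mailer);
subject.welcome('ada@example.com');
const actual = mailer.sent; // [{ to: 'ada@example.com', subject: 'Welcome!' }]
```

The two services contain the same logic. The only difference is where the dependency comes from, and that difference alone decides whether the Arrange section is two lines or a workaround.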

Quick Checklist

Unit Tests:

Concern	Check
Structure	AAA pattern (Arrange, Act, Assert)
Naming	subject, input, expected, actual
Coverage	Multiple test cases per behavior (triangulation)
Determinism	No random values, no real time, no real IO
Speed	Milliseconds per test
Goal	100% code coverage

Functional Tests:

Concern	Check
Target	Tests against interfaces, not implementations
Reuse	Same contract tests run against all implementations
Purpose	Verifies LSP (Liskov Substitution Principle)

Integration Tests:

Concern	Check
Scope	Module pairs, not full system
Infrastructure	Real databases/queues via Testcontainers
Boundaries	Still mocks external services

SIT:

Concern	Check
Perspective	Full system from the client's point of view
Access	Black-box only, no code access
Metrics	Behavior metrics (latency, error rate, throughput), not coverage

E2E:

Concern	Check
When needed	Only if there is a frontend
Scope	Critical happy paths only
Maintenance	Minimal test count, high value per test

Design Feedback:

Concern	Check
Hard tests	Refactor the code, not the test

The Complete Circle

We started in Part 1 with a claim: the most important property of software is how easy it is to change.

Part 1 identified the enemy: unmanaged dependencies, invisible coupling, action at a distance.

Part 2 gave us the dependency model: visible and flexible dependencies through injection and interfaces.

Part 3 gave us structural rules: SOLID principles for organizing code into cohesive, decoupled units.

Part 4 gave us behavioral patterns: functional thinking for writing predictable, composable code.

Part 5 showed how to model the domain: Records, Principals, and Aggregate Roots, the language of the business, kept free of infrastructure concerns.

Part 6 arranged everything into layers: pure domain in the center, controllers inward, repositories outward, mappers at every boundary.

Part 7 showed how to wire it all together: the composition root as the big bang, stateless services forming an immutable tree.

And this part revealed the payoff. Tests give you the confidence to make changes. Without tests, changeability is theoretical — you think you can change the code, but you are not sure what will break. With tests, changeability is real. You change the code, run the suite, and know in seconds whether something broke.

Every principle in this series exists so that, at the end, you can write a test in five lines, run it in two seconds, and know that your code works. That is not just good engineering — that is the freedom to move fast without breaking things.

That is the AtomiCloud way.
