Schema-Last: Why the best API documentation starts with reality

The API industry has a doctrine: design first, build second. Sketch your endpoints in OpenAPI before writing a line of code. Get stakeholder buy-in on the contract. Then implement to spec.

It's elegant in theory. In practice, it's a fantasy that most engineering teams abandoned somewhere around their third sprint.

The Design-First Lie

Design-first assumes a world that doesn't exist for most companies. It assumes:

You're building APIs from scratch
You have time to design before deadlines hit
Your implementation will faithfully follow the spec
Someone will update the spec when reality diverges, and
The spec was right to begin with.

Nobody updates the spec. The spec becomes a historical artefact, a snapshot of good intentions from six to eight months ago, before the authentication flow changed, before you added those query parameters the mobile team needed, before the schema grew to include new customer requirements.

Design-first works beautifully for greenfield projects with disciplined teams and generous timelines. For everyone else (which is almost everyone), it produces documentation that lies.

The Real World Has APIs Already

Here's what most companies actually have: dozens or hundreds of APIs built over years by different teams, with varying levels of documentation, serving traffic right now. Some have specs. Most specs are wrong. Many APIs have no documentation at all beyond tribal knowledge and Slack threads.

These aren't bad companies. They're normal companies. They shipped product. They hit deadlines. They made pragmatic trade-offs. Documentation lost to features, every time.

The design-first crowd would tell these companies to stop everything and document what they have. Audit every endpoint. Manually author specs for years of accumulated APIs. Then—and only then—can they have a proper API catalog.

This advice fails in practice. It won't happen: it's too expensive, too slow, and by the time you finish, the APIs will have changed again.

Schema-Last: Start With What's True

There's another way. Instead of documenting what APIs should be, document what they actually are. Observe the traffic. Capture the requests and responses. Infer the schema from reality.

This is schema-last, and it inverts the traditional approach entirely.

Schema-last doesn't ask engineers to write documentation. It watches what their APIs do and generates documentation from behaviour. The schema isn't an aspiration; it's a fact. It can't drift from the implementation because it is the implementation, captured and structured.

This approach accepts a fundamental truth: your APIs are already telling you their schema. Every request teaches you about parameters. Every response reveals the shape of your data.
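To make that concrete, here is a minimal sketch of inferring a schema from observed responses. It assumes the response bodies have already been captured and decoded from JSON; the function names, the sample payloads, and the output format are all illustrative rather than any particular tool's API. It only looks at top-level fields.

```python
from collections import defaultdict
from typing import Any


def type_name(value: Any) -> str:
    """Map a value decoded from JSON to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(samples: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarise the top-level fields seen across observed JSON objects."""
    types: dict[str, set[str]] = defaultdict(set)
    counts: dict[str, int] = defaultdict(int)
    for sample in samples:
        for field, value in sample.items():
            types[field].add(type_name(value))
            counts[field] += 1
    return {
        field: {
            "types": sorted(types[field]),
            # "required" here means: present in every sample observed so far.
            "required": counts[field] == len(samples),
        }
        for field in sorted(types)
    }


# Hypothetical captured response bodies for a single endpoint.
observed_responses = [
    {"id": 1, "email": "a@example.com", "plan": "free"},
    {"id": 2, "email": "b@example.com"},
    {"id": "3", "email": "c@example.com", "plan": None},
]
schema = infer_schema(observed_responses)
```

In this sample, `id` shows up as both an integer and a string, and `plan` is sometimes missing and sometimes null. A hand-written spec would probably have declared one type and moved on; the inferred schema records the inconsistency.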

Why Reality Beats Intention

Schemas derived from traffic have properties that hand-written specs can't match.

They're accurate by definition. A traffic-derived schema describes what your API actually does, not what someone hoped it would do. If the schema says a field exists, traffic proved it.

They're complete. Traffic captures edge cases, optional parameters, and response variations that spec authors forget or never knew about. Real usage surfaces more than any documentation sprint.

They stay current. When your API changes, the traffic changes, and the schema updates. No manual intervention. No drift.

They're honest. If your API returns inconsistent response shapes, a traffic-derived schema exposes that. It won't paper over the mess—it reveals it, which is the first step to fixing it.

Design-first advocates will counter that traffic-derived schemas only tell you what is, not what should be. Fair point. But for most companies, knowing what is would be a revelation. They might know their service names, but not the sprawl of endpoints, the undocumented query parameters, or the tribal knowledge buried in Slack threads.
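Undocumented query parameters are a good example of what observation surfaces. The sketch below assumes you have a log of full request URLs; the URLs, the parameter names, and the `observed_query_params` helper are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def observed_query_params(urls: list[str]) -> dict[str, list[str]]:
    """Group the query parameter names seen in traffic by request path."""
    params: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # keep_blank_values=True so that "?flag=" still counts as a sighting.
        params[parts.path].update(parse_qs(parts.query, keep_blank_values=True))
    return {path: sorted(names) for path, names in sorted(params.items())}


# Hypothetical request log entries.
request_log = [
    "https://api.example.com/v1/orders?status=open",
    "https://api.example.com/v1/orders?status=closed&include=items",
    "https://api.example.com/v1/orders?legacy_format=",
    "https://api.example.com/v1/users?page=2",
]
seen = observed_query_params(request_log)
```

Comparing `seen` with whatever the existing docs list for each path shows which parameters clients rely on that nobody wrote down.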

The Documentation Graveyard Problem

Every company has a documentation graveyard. Confluence pages from 2019. A Swagger file that hasn't been touched in eighteen months. README files that reference endpoints which no longer exist.

This graveyard exists because documentation is a maintenance burden. Every change requires two updates: one to the code, one to the docs. Under pressure, the docs lose. Always.

Schema-last eliminates this maintenance burden. Documentation becomes a byproduct of operation, not a separate deliverable. Your API serves traffic, and documentation emerges. No discipline required. No second step to forget.

This isn't laziness; it's an acknowledgement of how software actually gets built. Engineering teams will always prioritise shipping over documenting. Schema-last works with that reality instead of against it.

Design-First Still Has Its Place

To be clear: design-first isn't wrong. For new APIs, especially public ones, designing the contract before implementation produces better, more consistent interfaces. The discipline is valuable.

But design-first and schema-last aren't mutually exclusive. You can design your new APIs carefully while capturing schemas from your existing ones. You can start with a spec and then validate it against traffic. The approaches complement each other.
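Validating a spec against traffic can start as a simple set comparison. This is a rough sketch, assuming the spec's declared fields for one response have already been extracted into a set and that sample payloads have been captured; the field names and the `compare_spec_to_traffic` helper are hypothetical.

```python
from typing import Any


def compare_spec_to_traffic(
    declared_fields: set[str], samples: list[dict[str, Any]]
) -> dict[str, list[str]]:
    """Report where a hand-written spec and observed payloads disagree."""
    observed_fields: set[str] = set()
    for sample in samples:
        observed_fields.update(sample)  # collects the top-level keys
    return {
        "undocumented": sorted(observed_fields - declared_fields),
        "never_observed": sorted(declared_fields - observed_fields),
    }


# Hypothetical spec fields and captured response bodies.
spec_fields = {"id", "email", "plan"}
traffic = [
    {"id": 1, "email": "a@example.com", "referral_code": "X1"},
    {"id": 2, "email": "b@example.com"},
]
report = compare_spec_to_traffic(spec_fields, traffic)
```

A field in `never_observed` is not necessarily a spec error, since the sample may simply be too small, but a field in `undocumented` is something the API demonstrably returns that the spec omits.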

The mistake is treating design-first as the only legitimate path and dismissing traffic-derived schemas as somehow lesser because they emerged from observation rather than hand-authoring. A schema that's true is more valuable than one that's beautiful but wrong, let alone one that was never implemented.

From Schema to Intelligence

Here's where schema-last gets interesting. Once you have accurate, current schemas for your APIs, derived from what they actually do, you have the foundation for something bigger.

You can detect breaking changes automatically, because you know what the schema was yesterday. You can identify unused endpoints by comparing the schema to actual traffic patterns. You can generate SDKs, mock servers, and test suites from schemas you trust.
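Breaking-change detection, for instance, reduces to diffing two schema snapshots. The sketch below assumes each snapshot maps a field name to the types observed and whether the field appeared in every response; that format, the sample data, and the choice of what counts as "breaking" are assumptions made for illustration, not a standard.

```python
from typing import Any

Schema = dict[str, dict[str, Any]]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes between two snapshots that could break existing clients."""
    changes: list[str] = []
    for field in sorted(old):
        if field not in new:
            changes.append(f"removed field: {field}")
            continue
        added_types = set(new[field]["types"]) - set(old[field]["types"])
        if added_types:
            changes.append(
                f"new type(s) for {field}: {', '.join(sorted(added_types))}"
            )
        if old[field]["required"] and not new[field]["required"]:
            changes.append(f"{field} is no longer always present")
    # Fields that exist only in `new` are additions and are not flagged.
    return changes


# Hypothetical snapshots of one endpoint's response schema.
yesterday = {
    "id": {"types": ["integer"], "required": True},
    "email": {"types": ["string"], "required": True},
    "plan": {"types": ["string"], "required": True},
}
today = {
    "id": {"types": ["integer", "string"], "required": True},
    "plan": {"types": ["string"], "required": False},
    "created_at": {"types": ["string"], "required": True},
}
report = breaking_changes(yesterday, today)
```

Run on a schedule, a diff like this turns "did anything change?" from a question for Slack into a check that can fail a build or raise an alert.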

Most importantly, you can make your APIs legible to AI systems. Agents need self-describing, deterministic APIs to function. They can't work with documentation that lies. Traffic-derived schemas give them the truth they require. I wrote more about that here.

The schema becomes infrastructure. Not a static document, but a living layer that powers automation, enables intelligence, and evolves with your system.

Start With Reality

If your API documentation is a mess (and it probably is), stop feeling guilty about it. You're not going to fix it by assigning engineers to write specs. That approach has failed for a decade.

Instead, accept that your APIs are already documented in the only place that matters: production traffic. The requests are there. The responses are there. The schema is there, implicit in every interaction.

Your job isn't to write documentation. Your job is to capture it.
