Building a Schema Registry: Patterns and Best Practices
A comprehensive guide to building and operating a Protocol Buffers schema registry, covering architecture patterns, governance models, tooling integration, and the operational practices that keep a registry healthy as it scales.
A Protocol Buffers schema registry is more than a directory of .proto files. It is the central nervous system of a service-oriented architecture: the single source of truth for every data structure, every API contract, and every event definition that flows between services. Building one is straightforward. Operating one at scale, with dozens of contributing teams and hundreds of schemas, requires deliberate architectural choices and disciplined operational practices.
The Nebula schema registry has evolved through several iterations as the Klivvr engineering organization has grown. This article distills the patterns and lessons from that evolution into a guide for teams building or improving their own schema registry.
Why a Centralized Registry
The alternative to a centralized registry is distributed schema ownership: each service maintains its own proto files, publishes them through its own mechanism, and consumers retrieve them through ad-hoc means (copying files, git submodules, or language-specific packages).
This works for two or three services. Beyond that, it breaks down in predictable ways.
Discovery becomes difficult. A new team member cannot answer the question "what APIs exist?" without surveying every service repository. There is no single place to search for a message type or understand the relationships between schemas.
Consistency degrades. Without shared linting rules, each team develops its own naming conventions. Field naming, enum structure, pagination patterns, and error formats diverge, making cross-service integration increasingly painful.
Breaking change detection becomes impossible. If schemas are scattered across repositories, there is no automated way to verify that a change in one repository is compatible with the consumers in other repositories. Breaking changes are discovered in staging or production, not in code review.
A centralized registry solves all three problems. It provides a single location for discovery, a single set of enforced conventions, and a single CI pipeline that checks every change for compatibility.
Repository Architecture
The Nebula registry is structured as a monorepo with a clear directory hierarchy:
nebula-schemas/
  proto/
    nebula/
      common/
        v1/
          money.proto
          pagination.proto
          error_details.proto
          timestamps.proto
      accounts/
        v1/
          account.proto
          account_service.proto
        v2/
          account.proto
          account_service.proto
      payments/
        v1/
          payment.proto
          payment_service.proto
          payment_events.proto
      lending/
        v1/
          application.proto
          decision.proto
          lending_service.proto
      notifications/
        v1/
          notification.proto
          notification_service.proto
          notification_events.proto
  gen/
    go/
    ts/
    java/
    swift/
  scripts/
    publish-go.sh
    publish-ts.sh
    publish-java.sh
  buf.yaml
  buf.gen.yaml
  buf.lock
  CODEOWNERS
Several architectural decisions deserve explanation.
Separation of service definitions and event definitions. Within each domain directory, service-related protos (*_service.proto) are separate from event-related protos (*_events.proto). This separation makes it clear which schemas define synchronous RPC contracts and which define asynchronous event contracts. Different consumers care about different subsets: a gRPC client imports the service proto, while a Kafka consumer imports the events proto.
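As a sketch of what this split can look like in the payments domain (the message and field names here are illustrative, not taken from the actual registry):

```protobuf
// proto/nebula/payments/v1/payment_events.proto
// Asynchronous contracts only; gRPC clients import payment_service.proto instead.
syntax = "proto3";

package nebula.payments.v1;

import "google/protobuf/timestamp.proto";

// Published to the event bus when a payment settles.
message PaymentSettled {
  string payment_id = 1;
  google.protobuf.Timestamp settled_at = 2;
}
```

A Kafka consumer can then depend on this file alone, without pulling in the RPC surface of the domain.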
Version directories at the domain level. Versions are scoped to domains, not to individual messages. The accounts/v1/ directory contains all v1 schemas for the accounts domain. When a v2 is needed, the entire domain gets a new directory. This prevents the confusion of mixing v1 messages with v2 services within the same package.
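Under this layout the proto package mirrors the directory path (which is also what Buf's PACKAGE_DIRECTORY_MATCH lint rule enforces), so a v2 file declares a fresh package rather than reusing v1's. A minimal illustration, with hypothetical field names:

```protobuf
// proto/nebula/accounts/v2/account.proto
syntax = "proto3";

// The package tracks the directory: nebula/accounts/v2 -> nebula.accounts.v2.
package nebula.accounts.v2;

message Account {
  string account_id = 1;
  string display_name = 2;
}
```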
Generated code lives alongside source protos. The gen/ directory contains the output of buf generate. Committing generated code enables meaningful code review and ensures reproducible builds.
Configuration at the root. buf.yaml and buf.gen.yaml live at the repository root. A single configuration governs the entire registry, ensuring uniform linting rules, breaking change policies, and code generation settings.
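A root configuration along these lines is a reasonable starting point (a sketch using Buf's v2 config format; the module name follows this article's examples, and STANDARD and FILE are Buf's built-in rule categories):

```yaml
# buf.yaml (repository root) — one config governs the whole registry.
version: v2
modules:
  - path: proto
    name: buf.build/klivvr/nebula
lint:
  use:
    - STANDARD   # Buf's standard lint rules, applied uniformly
breaking:
  use:
    - FILE       # strictest built-in breaking-change category
```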
The Quality Gate Pipeline
Every change to the registry passes through a multi-stage CI pipeline:
name: Schema Registry CI
on:
  pull_request:
    paths: ['proto/**', 'buf.yaml', 'buf.gen.yaml']
  push:
    branches: [main]
    paths: ['proto/**']
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
        with:
          version: '1.47.2'
      - run: buf format --diff --exit-code proto/
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
        with:
          version: '1.47.2'
      - run: buf lint proto/
  breaking:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
        with:
          version: '1.47.2'
      - run: buf breaking proto/ --against '.git#branch=main'
  generate:
    needs: [format, lint, breaking]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
        with:
          version: '1.47.2'
      - run: buf generate
      - name: Verify generated code matches
        run: |
          if ! git diff --quiet gen/; then
            echo "ERROR: Generated code is out of date."
            echo "Run 'buf generate' locally and commit the results."
            git diff --stat gen/
            exit 1
          fi
  publish:
    if: github.ref == 'refs/heads/main'
    needs: [generate]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
        with:
          version: '1.47.2'
      - run: buf generate
      - run: ./scripts/publish-go.sh
      - run: ./scripts/publish-ts.sh
      - run: ./scripts/publish-java.sh
      - run: buf push proto/ --tag $(git rev-parse --short HEAD)

The pipeline enforces four quality gates in sequence: formatting consistency, lint compliance, backward compatibility, and code generation correctness. Only after all four pass does the publish stage run.
The buf push command at the end publishes the schemas to the Buf Schema Registry, making them available as a versioned module that external tools and consumers can reference.
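On the consumer side, referencing the published module is a one-line dependency. A sketch (the module name follows this article's examples; after editing deps, running `buf dep update` pins the resolved version in buf.lock):

```yaml
# buf.yaml of a consuming service
version: v2
deps:
  - buf.build/klivvr/nebula
```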
Governance Model
Technical tooling enforces rules; governance determines which rules to enforce and how to handle exceptions.
The Nebula registry uses a tiered ownership model:
Domain owners are the teams responsible for schemas in their domain directory. They have merge authority for changes that pass all automated checks. They are responsible for designing their schemas, coordinating with consumers, and managing their domain's version lifecycle.
Schema architects are a cross-cutting group of senior engineers who own the common packages, the Buf configuration, and the code generation pipeline. They review changes to shared infrastructure and provide guidance on schema design patterns. They are also the escalation path for breaking change exceptions.
Registry administrators manage the CI pipeline, the publishing scripts, and the access control configuration. They do not review schema content but ensure that the operational infrastructure is reliable.
This separation of concerns prevents bottlenecks. Domain teams can move quickly within their boundaries. Cross-cutting changes receive deeper scrutiny. Operational concerns are handled by specialists.
Handling Breaking Change Exceptions
Even with careful schema design, some breaking changes are intentional and unavoidable. The registry governance process handles these through a structured exception flow:
- The contributing team opens a PR with the breaking change and a detailed justification.
- Buf's breaking change detection flags the change. The CI pipeline fails.
- A schema architect reviews the justification and the impact analysis.
- If approved, the architect adds a buf:breaking:ignore annotation or temporarily adjusts the comparison baseline.
- The PR is merged with an explicit record of the exception in the commit message.
- A migration tracker is created to monitor consumer adoption of the breaking change.
// buf:lint:ignore FIELD_NO_DELETE
// Exception approved by @schema-architect on 2025-03-15.
// Justification: Field 4 contained PII that must be removed per
// compliance requirement CR-2025-042.
// Migration tracker: JIRA-4521
message LegacyCustomerRecord {
  reserved 4;
  reserved "national_id";
  string customer_id = 1;
  string display_name = 2;
  string email = 3;
}

The key principle is that exceptions are explicit, justified, tracked, and rare. If exceptions become frequent, it signals a problem with either the schema design practices or the breaking change rules.
Schema Documentation and Discovery
A registry is only valuable if developers can find what they need. The Nebula registry invests in several discovery mechanisms.
The Buf Schema Registry provides a browsable web interface with search, type cross-referencing, and documentation generated from proto comments. Every message, field, enum, service, and RPC that has a doc comment is rendered in the BSR's documentation view.
A custom internal portal aggregates additional metadata: which services produce and consume each schema, the current deployment status of each version, and links to the relevant runbooks and dashboards. This portal is built on top of the BSR's API and the organization's service catalog.
// AccountService manages the lifecycle of customer accounts.
//
// Producing services: account-service (Go)
// Consuming services: payment-service (Go), loan-service (Go),
// mobile-bff (TypeScript), analytics-worker (Go)
//
// SLA: 99.95% availability, p99 latency < 50ms
// On-call: #accounts-oncall
service AccountService {
  // ...
}

Embedding this operational metadata in the proto comments ensures that it is visible everywhere the schema is viewed: in the BSR, in generated documentation, and in IDEs.
Scaling Patterns
As a schema registry grows, several scaling patterns become important.
Module splitting. Buf supports multi-module workspaces. When the monorepo grows large enough that full linting and breaking change detection become slow, the registry can be split into modules that are checked independently but coexist in the same repository.
# buf.yaml with multiple modules
version: v2
modules:
  - path: proto/nebula/common
    name: buf.build/klivvr/nebula-common
  - path: proto/nebula/accounts
    name: buf.build/klivvr/nebula-accounts
  - path: proto/nebula/payments
    name: buf.build/klivvr/nebula-payments

Selective generation. Instead of generating code for all languages on every change, the pipeline can detect which domain directories changed and generate only the affected packages. This reduces CI time and avoids unnecessary package version bumps.
Caching. Buf caches module resolution and plugin downloads. In CI, persisting the Buf cache across runs significantly reduces pipeline duration for large registries.
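In GitHub Actions this can be a single cache step. A sketch, assuming Linux runners where `~/.cache/buf` is Buf's default cache location:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/buf
    key: buf-${{ runner.os }}-${{ hashFiles('buf.lock') }}
    restore-keys: |
      buf-${{ runner.os }}-
```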
Dependency graph visualization. As the number of inter-package imports grows, a visual dependency graph helps identify unhealthy coupling. The Nebula team generates this graph weekly from the proto import statements and reviews it for unexpected dependencies.
# Generate a dependency graph from proto imports
buf dep graph proto/ | dot -Tsvg -o schema-dependencies.svg

Operational Health Metrics
The Nebula team tracks several metrics to monitor the health of the schema registry:
Schema count and growth rate. The total number of messages, enums, services, and RPCs in the registry, tracked over time. Sudden spikes may indicate duplicated schemas; plateaus may indicate that teams are routing around the registry.
Lint suppression count. The number of buf:lint:ignore annotations. A rising count suggests either overly strict rules or degrading compliance.
Breaking change exception rate. The number of approved breaking change exceptions per quarter. This measures how often the registry's compatibility guarantees are overridden.
Time to first consumption. The elapsed time between a schema being published and the first consumer importing the generated package. Long delays suggest friction in the generation or distribution pipeline.
Stale schema ratio. The percentage of schemas that have not been modified in over a year. Some stability is healthy, but a very high ratio may indicate abandoned schemas that should be archived.
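Some of these counts are easy to script. The helpers below are illustrative sketches (the function names are hypothetical) that read proto source on stdin, e.g. piped from `cat proto/**/*.proto`:

```shell
# Count buf:lint:ignore annotations in proto source read from stdin.
lint_suppressions() {
  grep -c 'buf:lint:ignore' || true   # grep exits non-zero on zero matches
}

# Rough schema count: message definitions (top-level and nested) on stdin.
message_count() {
  grep -cE '^[[:space:]]*message[[:space:]]+[A-Za-z]' || true
}
```

Tracking the output of scripts like these over time, rather than a single snapshot, is what makes the trends visible.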
These metrics are reviewed monthly and inform decisions about tooling investment, governance adjustments, and team training.
Conclusion
A schema registry is infrastructure that compounds in value over time. Each schema added makes the registry more useful for discovery. Each automated check prevents a potential production incident. Each generated package saves a team from writing and maintaining hand-crafted client code.

Building a registry requires upfront investment in repository structure, CI pipelines, governance processes, and documentation. Operating it requires ongoing attention to quality metrics, scaling patterns, and organizational alignment. The Nebula schema registry demonstrates that this investment is worthwhile: it serves as the authoritative source of inter-service contracts for the entire Klivvr platform, enabling teams to build, integrate, and evolve their services with confidence.
Related Articles
Using Protocol Buffers Across a Microservices Architecture
A business and architecture-focused guide to adopting Protocol Buffers as the standard contract language across a microservices ecosystem, covering shared types, dependency management, team workflows, and the role of a centralized schema registry.
API Versioning Strategies with Protocol Buffers
A business-oriented guide to API versioning with Protocol Buffers, covering when and how to version, migration strategies, multi-version support, and the organizational processes that make versioning sustainable.
Protocol Buffers Performance: Benchmarks and Optimization
A rigorous examination of Protocol Buffers serialization performance, including benchmarks against JSON and other formats, memory allocation analysis, and practical optimization techniques for high-throughput systems.