Automating Code Generation with Protocol Buffers

A practical guide to automating protobuf code generation across multiple languages, covering toolchain setup, Buf-based workflows, CI integration, and strategies for managing generated code in a monorepo or multi-repo environment.

Technical · 8 min read · By Klivvr Engineering

One of Protocol Buffers' most powerful features is code generation. A single .proto file can produce type-safe client and server stubs in Go, TypeScript, Java, Kotlin, Swift, and a dozen other languages. But moving from a handful of hand-run protoc commands to a reliable, automated pipeline that serves an entire engineering organization is a non-trivial undertaking. The toolchain has sharp edges, the plugin ecosystem is fragmented, and the generated output must be versioned and distributed in a way that every consuming team can adopt without friction.

The Nebula schema registry addresses this by treating code generation as a first-class CI artifact. Every merged change to a .proto file triggers a pipeline that produces, tests, and publishes language-specific packages. This article walks through the architecture of that pipeline, the tooling decisions behind it, and the lessons learned from operating it at scale.

The Traditional protoc Approach

The official Protocol Buffers compiler, protoc, is the foundation of all code generation. It parses .proto files, resolves imports, and delegates output to language-specific plugins.

protoc \
  --proto_path=proto \
  --go_out=gen/go \
  --go_opt=paths=source_relative \
  --go-grpc_out=gen/go \
  --go-grpc_opt=paths=source_relative \
  proto/nebula/payments/v1/payments.proto

This single invocation produces Go structs and gRPC client/server interfaces. Each --*_out flag invokes a plugin binary (protoc-gen-go, protoc-gen-go-grpc) that must be installed on the developer's machine, and each plugin's version must be compatible with both protoc and the language runtime library.

The challenges multiply quickly. Each language requires its own plugin, and often multiple plugins per language (for example, Go needs both protoc-gen-go for message types and protoc-gen-go-grpc for service stubs). Version mismatches between protoc and its plugins produce subtle bugs in generated code. Managing --proto_path across nested imports and well-known types is error-prone. In a team of twenty developers, ensuring that everyone has identical toolchain versions is a losing battle.
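Before (or alongside) adopting Buf, a common Go-side mitigation is the tools.go pattern, which records plugin versions in go.mod so that a bare go install resolves exactly the pinned versions. A minimal sketch follows; the import paths are the real protoc plugin modules, and the build tag keeps the file out of normal builds:

```go
//go:build tools

// Package tools pins code-generation plugin versions in go.mod so that
// `go install google.golang.org/protobuf/cmd/protoc-gen-go` (and the
// gRPC equivalent), run inside this module, installs the exact versions
// every teammate and CI runner uses.
package tools

import (
	_ "google.golang.org/grpc/cmd/protoc-gen-go-grpc"
	_ "google.golang.org/protobuf/cmd/protoc-gen-go"
)
```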

Adopting Buf for Reproducible Builds

Buf replaces the ad-hoc protoc workflow with a declarative configuration. Two files drive the entire process: buf.yaml defines the module and its dependencies, while buf.gen.yaml specifies the code generation targets.

# buf.yaml
version: v2
modules:
  - path: proto
    name: buf.build/klivvr/nebula
deps:
  - buf.build/googleapis/googleapis
  - buf.build/grpc-ecosystem/grpc-gateway
lint:
  use:
    - STANDARD
    - COMMENTS
breaking:
  use:
    - WIRE_JSON

# buf.gen.yaml
version: v2
plugins:
  - remote: buf.build/protocolbuffers/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/grpc/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/community/ts-proto
    out: gen/ts
    opt:
      - esModuleInterop=true
      - outputServices=grpc-js
  - remote: buf.build/protocolbuffers/java
    out: gen/java

Running buf generate reads both files, resolves dependencies from the Buf Schema Registry (BSR), executes the configured remote plugins, and produces output in a single deterministic pass. No local plugin installation is needed, and every developer and CI runner gets identical results; for full reproducibility, a remote plugin reference can carry an explicit version (for example, buf.build/protocolbuffers/go:v1.34.2).

This shift from imperative commands to declarative configuration eliminates an entire class of "works on my machine" problems. The Nebula team enforces that all code generation goes through buf generate; direct protoc invocations are flagged in code review.

Multi-Language Generation Pipeline

The Nebula registry serves consumers in four primary languages: Go for backend services, TypeScript for BFF (Backend-for-Frontend) layers and internal tools, Kotlin for Android clients, and Swift for iOS clients. Each language has its own idiomatic expectations for package layout, import paths, and build system integration.

The generation pipeline handles this through a set of post-generation scripts that restructure the raw output into language-appropriate packages:

#!/usr/bin/env bash
set -euo pipefail
 
# Generate all language targets
buf generate
 
# Go: no post-processing needed, paths=source_relative handles layout
 
# TypeScript: bundle into an npm package
cp package.json.tmpl gen/ts/package.json
# Subshells keep each step's cd from leaking into later steps
(cd gen/ts && npm install && npm run build)

# Kotlin: generate a Gradle module
(cd gen/java && ./gradlew generateProto)
 
# Swift: run swift-protobuf plugin separately (not yet on BSR)
protoc \
  --proto_path=proto \
  --swift_out=gen/swift \
  --grpc-swift_out=gen/swift \
  $(find proto -name '*.proto')

The Go output is used directly because paths=source_relative produces a directory structure that maps naturally to Go module paths. TypeScript output is wrapped in an npm package with a package.json template that includes version, dependencies, and TypeScript compilation settings. The Kotlin/Java output feeds into a Gradle build that produces a Maven-compatible artifact.

Each language target is published to its respective package registry: a private Go module proxy, an npm registry, a Maven repository, and a Swift Package Registry. Consuming teams add the generated package as a normal dependency and receive updates through their standard package manager.

CI Integration and Versioning

The Nebula CI pipeline runs on every push to a pull request and on every merge to the main branch:

# .github/workflows/proto-ci.yml
name: Proto CI
 
on:
  push:
    branches: [main]
    paths: ['proto/**']
  pull_request:
    paths: ['proto/**']
 
jobs:
  lint-and-break:
    runs-on: ubuntu-latest
    steps:
- uses: actions/checkout@v4
        with:
          fetch-depth: 0 # fetch full history so 'buf breaking' can see the main branch
      - uses: bufbuild/buf-setup-action@v1
      - run: buf lint
      - run: buf breaking --against '.git#branch=main'
 
  generate:
    needs: lint-and-break
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
      - run: buf generate
      - name: Verify no diff
        run: |
          git diff --exit-code gen/ || {
            echo "Generated code is out of date. Run 'buf generate' locally."
            exit 1
          }
 
  publish:
    if: github.ref == 'refs/heads/main'
    needs: generate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
      - run: buf generate
      - name: Publish Go module
        run: ./scripts/publish-go.sh
      - name: Publish npm package
        run: ./scripts/publish-ts.sh
      - name: Publish Maven artifact
        run: ./scripts/publish-java.sh

On pull requests, the pipeline lints, checks for breaking changes, and verifies that the generated code in the repository matches what buf generate produces. This last step catches situations where a developer modified a .proto file but forgot to regenerate.

On merge to main, the pipeline additionally publishes updated packages. Versioning follows semantic versioning: patch bumps for additive, backward-compatible changes; minor bumps for new services or significant new message types; and major bumps only when a new package version (e.g., v2) is introduced.
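That bump policy can be sketched as a small helper. This is a hypothetical illustration of the rules above, not the actual release script:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bumpVersion applies the Nebula versioning policy to a tag like "v1.4.2":
// "patch" for additive, backward-compatible changes; "minor" for new
// services or significant new messages; "major" only for a new package
// version (e.g., v2).
func bumpVersion(current, change string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(current, "v"), ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("not a semver tag: %s", current)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return "", fmt.Errorf("bad component %q in %s", p, current)
		}
		nums[i] = n
	}
	switch change {
	case "patch":
		nums[2]++
	case "minor":
		nums[1]++
		nums[2] = 0
	case "major":
		nums[0]++
		nums[1], nums[2] = 0, 0
	default:
		return "", fmt.Errorf("unknown change type: %s", change)
	}
	return fmt.Sprintf("v%d.%d.%d", nums[0], nums[1], nums[2]), nil
}

func main() {
	v, _ := bumpVersion("v1.4.2", "minor")
	fmt.Println(v) // v1.5.0
}
```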

Checked-In vs. Generated-on-the-Fly

A persistent debate in the protobuf community is whether generated code should be committed to the repository or generated fresh in CI. The Nebula team commits generated code for several reasons.

First, it makes code review meaningful. Reviewers can see exactly how a schema change affects the generated interfaces, catch unexpected API surface changes, and verify that the output is sensible.

Second, it improves build reproducibility. A checked-out repository is immediately buildable without running the code generation toolchain. This matters for new developers onboarding, for emergency hotfixes, and for build environments that cannot easily install the Buf toolchain.

Third, it provides a clear audit trail. The git history shows when generated code changed, who approved it, and which .proto change caused it.

The downside is repository size. Generated code can be verbose, especially for Java. The Nebula team mitigates this with .gitattributes rules that mark generated directories as "linguist-generated," excluding them from GitHub's language statistics and collapsing their diffs by default in pull requests.

gen/go/** linguist-generated=true
gen/ts/** linguist-generated=true
gen/java/** linguist-generated=true

Custom Plugins and Templates

Sometimes the standard plugins do not produce exactly the output a team needs. Protobuf's plugin protocol is well-documented and straightforward: a plugin reads a CodeGeneratorRequest from stdin and writes a CodeGeneratorResponse to stdout, both encoded as protobuf messages.
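To make the protocol concrete, here is a minimal Go plugin skeleton. It is a sketch rather than the Nebula plugin itself: it depends on the google.golang.org/protobuf module, and instead of real code generation it emits one placeholder file per requested input.

```go
package main

import (
	"fmt"
	"io"
	"os"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/pluginpb"
)

func main() {
	// protoc (or buf) writes a serialized CodeGeneratorRequest to stdin.
	in, err := io.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}
	req := &pluginpb.CodeGeneratorRequest{}
	if err := proto.Unmarshal(in, req); err != nil {
		panic(err)
	}

	// Emit one placeholder output file per .proto file named on the
	// command line. A real plugin would walk req.GetProtoFile() and
	// inspect message and field options here.
	resp := &pluginpb.CodeGeneratorResponse{}
	for _, name := range req.GetFileToGenerate() {
		resp.File = append(resp.File, &pluginpb.CodeGeneratorResponse_File{
			Name:    proto.String(name + ".txt"),
			Content: proto.String(fmt.Sprintf("// generated from %s\n", name)),
		})
	}

	// The serialized CodeGeneratorResponse goes back on stdout.
	out, err := proto.Marshal(resp)
	if err != nil {
		panic(err)
	}
	os.Stdout.Write(out)
}
```

Compiled as protoc-gen-example and placed on PATH, this would be invoked by protoc via an --example_out flag.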

The Nebula team maintains a custom plugin that generates validation code based on field annotations:

import "validate/validate.proto";
 
message CreateAccountRequest {
  string email = 1 [(validate.rules).string.email = true];
  string display_name = 2 [(validate.rules).string = {
    min_len: 1,
    max_len: 100
  }];
  int32 age = 3 [(validate.rules).int32 = {
    gte: 18,
    lte: 150
  }];
}

The plugin reads these annotations and generates a Validate() method on the message struct that returns a descriptive error if any constraint is violated. This moves validation logic from hand-written application code into the schema layer, where it is automatically kept in sync across all languages.
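As a rough illustration of the plugin's output, the generated Go method might look like the hand-written sketch below. The email regex and the byte-length check are deliberate simplifications; real protoc-gen-validate semantics (rune counting, fuller email validation) are richer.

```go
package main

import (
	"fmt"
	"regexp"
)

// CreateAccountRequest is written by hand here to stand in for the
// generated message struct.
type CreateAccountRequest struct {
	Email       string
	DisplayName string
	Age         int32
}

// Simplified email shape check; real validators do much more.
var emailRe = regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)

// Validate mirrors the constraints declared in the .proto annotations:
// email must look like an email, display_name must be 1-100 long
// (bytes here, for simplicity), and age must be in [18, 150].
func (m *CreateAccountRequest) Validate() error {
	if !emailRe.MatchString(m.Email) {
		return fmt.Errorf("email: must be a valid email address")
	}
	if l := len(m.DisplayName); l < 1 || l > 100 {
		return fmt.Errorf("display_name: length must be between 1 and 100")
	}
	if m.Age < 18 || m.Age > 150 {
		return fmt.Errorf("age: must be between 18 and 150")
	}
	return nil
}

func main() {
	req := &CreateAccountRequest{Email: "not-an-email", DisplayName: "Ada", Age: 30}
	fmt.Println(req.Validate()) // email: must be a valid email address
}
```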

Practical Tips

Pin your Buf CLI version in CI. Use the version input on the buf-setup-action to avoid surprises when a new Buf release changes behavior.

Run buf generate in a clean output directory. Delete gen/ before generating so that removed .proto files do not leave stale generated code behind; with v2 configs, Buf can do this for you via the top-level clean: true option in buf.gen.yaml.

Test the generated code. Include a small set of integration tests that import the generated packages, serialize and deserialize sample messages, and verify round-trip correctness. This catches subtle plugin bugs that linting alone cannot detect.

Document the generation workflow. A CONTRIBUTING.md section that explains how to run buf generate locally, how to add a new language target, and how to debug plugin errors saves hours of onboarding time.

Conclusion

Automated code generation transforms Protocol Buffers from a schema language into a full contract enforcement system. By adopting Buf's declarative configuration, integrating generation into CI, committing the output for review, and publishing language-specific packages, the Nebula registry ensures that every service in the architecture works against the same, verified contract. The upfront investment in tooling pays for itself many times over in reduced integration bugs, faster onboarding, and confidence that a schema change will not break production.
