Custom automation layer over GitHub Actions

    Overview

    GitHub Actions is great for simple pipelines. It falls apart when you have 20+ microservices with interdependent build orders, shared artifact caches that evict under load, and no way to roll back a deployment that broke three services at once.

    This project layers a custom orchestrator on top of the standard Actions workflow. The orchestrator understands the dependency graph between services, manages artifact caches with versioned keys, and supports atomic rollbacks with a single command.

    The Problem

    The original pipeline had these issues:

    1. Blind parallelism: All services built in parallel regardless of dependencies. If service A depended on service B, both built simultaneously, and A’s tests failed because B hadn’t been updated yet.
    2. Cache thrashing: Shared caches evicted entries under load, forcing rebuilds of unchanged services, and there was no cache versioning strategy.
    3. No rollback: Deployments were fire-and-forget. A bad release meant manually reverting commits and re-running pipelines.
    4. Opaque failures: When a pipeline failed, it was unclear whether the failure occurred in the build, test, deploy, or integration stage.

    The Solution

    The orchestration layer is a Go binary that runs as a GitHub Actions workflow step. It:

    • Reads a services.yaml manifest that declares build order, dependencies, and deployment targets
    • Constructs a DAG and schedules builds in dependency order
    • Uses content-addressed cache keys based on file hashes, so only changed services rebuild
    • Promotes artifacts through an environment pipeline (dev → staging → prod) with manual gates
    • Stores deployment history with rollback support
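
    The DAG scheduling step above can be sketched in Go as follows. This is a minimal illustration, not the project's actual code: it assumes the manifest has already been parsed into a map from each service to its dependencies, and the function name buildWaves and the service names are hypothetical. It uses Kahn's algorithm to group services into "waves", where every service in a wave depends only on services in earlier waves, so one wave can build in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

// buildWaves groups services into build waves using Kahn's algorithm.
// deps maps a service name to the names of the services it depends on.
// Every service in a wave depends only on services in earlier waves.
// (Hypothetical helper; not taken from the original project.)
func buildWaves(deps map[string][]string) ([][]string, error) {
	remaining := make(map[string]int, len(deps)) // unbuilt dependencies per service
	dependents := make(map[string][]string)      // reverse edges: dependency -> dependents
	for svc, ds := range deps {
		remaining[svc] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("service %q depends on undeclared service %q", svc, d)
			}
			dependents[d] = append(dependents[d], svc)
		}
	}

	// The first wave holds every service with no dependencies.
	var ready []string
	for svc, n := range remaining {
		if n == 0 {
			ready = append(ready, svc)
		}
	}

	var waves [][]string
	built := 0
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order within a wave
		waves = append(waves, ready)
		built += len(ready)

		var next []string
		for _, svc := range ready {
			for _, dep := range dependents[svc] {
				remaining[dep]--
				if remaining[dep] == 0 {
					next = append(next, dep)
				}
			}
		}
		ready = next
	}

	// Any service never scheduled is part of a dependency cycle.
	if built != len(deps) {
		return nil, fmt.Errorf("dependency cycle involving %d service(s)", len(deps)-built)
	}
	return waves, nil
}

func main() {
	// Illustrative services only.
	deps := map[string][]string{
		"auth":    {},
		"billing": {"auth"},
		"gateway": {"auth", "billing"},
		"search":  {},
	}
	waves, err := buildWaves(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, w := range waves {
		fmt.Printf("wave %d: %v\n", i+1, w)
	}
}
```

    With the sample map, auth and search land in the first wave, billing in the second, and gateway in the third. Grouping by wave keeps the parallelism that Actions already offers while ruling out the case where a service builds before its dependency.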

    Results

    After three months of use:

    • Build time fell by 40% (fewer redundant builds)
    • Deployment failures dropped by 60% (dependency ordering catches issues earlier)
    • Rollback time fell from 15 minutes (manual) to 30 seconds (single command)
    • The engineering team spent 80% less time debugging pipeline failures
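
    A single-command rollback depends on having a deployment history to roll back to. The sketch below shows one way that bookkeeping could look; it is an assumption, not the project's implementation. The types Release and History and the method RollbackTo are hypothetical, and the sketch models only the history log, not the actual redeployment of artifacts. A rollback is recorded as a new release that restores every service's version at once, which keeps the log append-only.

```go
package main

import "fmt"

// Release records the version of every service in one deployment.
// (Hypothetical type for illustration.)
type Release struct {
	ID       int
	Versions map[string]string // service name -> artifact version
}

// History is an append-only log of releases for one environment.
type History struct {
	releases []Release
}

// Deploy records a new release and returns it. The versions map is
// copied so later changes by the caller cannot alter the history.
func (h *History) Deploy(versions map[string]string) Release {
	snapshot := make(map[string]string, len(versions))
	for svc, v := range versions {
		snapshot[svc] = v
	}
	r := Release{ID: len(h.releases) + 1, Versions: snapshot}
	h.releases = append(h.releases, r)
	return r
}

// RollbackTo restores all services to the versions recorded in the
// release with the given ID by appending them as a new release.
func (h *History) RollbackTo(id int) (Release, error) {
	if id < 1 || id > len(h.releases) {
		return Release{}, fmt.Errorf("unknown release %d", id)
	}
	return h.Deploy(h.releases[id-1].Versions), nil
}

func main() {
	h := &History{}
	good := h.Deploy(map[string]string{"auth": "v1", "billing": "v1"})
	h.Deploy(map[string]string{"auth": "v2", "billing": "v2"}) // the bad release

	restored, err := h.RollbackTo(good.ID)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("release %d restores %v\n", restored.ID, restored.Versions)
}
```

    Rolling back to a named release, rather than "the previous one", avoids flipping back to the bad release if the command is run twice.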

    Lessons Learned

    • Start with the dependency graph. Everything else flows from understanding what depends on what.
    • Cache keys should be content-addressed, not branch-named. A branch name says nothing about what actually changed; a file hash changes only when the content does.
    • Manual gates for production deployments are not optional. Automate everything except the final click.
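
    The content-addressed cache keys recommended above can be sketched in Go as follows. This is a minimal sketch under stated assumptions: the function name cacheKey, the key format, and the 16-character truncation are all hypothetical, and files are passed in as an in-memory map to keep the example self-contained, where a real implementation would presumably walk the service's source directory. The key is a SHA-256 digest over the sorted file paths and per-file content digests, so it changes only when a path or its content changes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cacheKey derives a cache key for a service from its files.
// files maps a relative path to that file's contents.
// (Hypothetical helper; the key format is an assumption.)
func cacheKey(service string, files map[string][]byte) string {
	// Sort paths so the key does not depend on map iteration order.
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		sum := sha256.Sum256(files[p])
		// Hash the path together with the file's digest. The NUL
		// separator and fixed-length digest keep entries unambiguous.
		fmt.Fprintf(h, "%s\x00%x\n", p, sum)
	}

	// Truncate for readability; use the full digest if collisions matter.
	return service + "-" + hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	// Illustrative file contents only.
	files := map[string][]byte{
		"main.go": []byte("package main\n"),
		"go.mod":  []byte("module example.com/auth\n"),
	}
	before := cacheKey("auth", files)

	files["main.go"] = []byte("package main // edited\n")
	after := cacheKey("auth", files)

	fmt.Println("key before edit:", before)
	fmt.Println("key after edit: ", after)
	fmt.Println("rebuild needed: ", before != after)
}
```

    Because the key depends only on paths and contents, two branches with identical sources for a service share one cache entry, and an unchanged service never triggers a rebuild.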