Custom automation layer over GitHub Actions
Overview
GitHub Actions is great for simple pipelines. It falls apart when you have 20+ microservices with interdependent build orders, shared artifact caches that evict under load, and no way to roll back a deployment that broke three services at once.
This project replaces the standard Actions workflow with a custom orchestration layer that understands the dependency graph between services, manages artifact caches with versioned keys, and supports atomic rollbacks with a single command.
The Problem
The original pipeline had these issues:
- Blind parallelism — All services built in parallel regardless of dependencies. If service A depends on service B, both built simultaneously, and A’s tests failed because B hadn’t been updated yet.
- Cache thrashing — Shared caches evicted under load, causing rebuilds of unchanged services. No cache versioning strategy.
- No rollback — Deployments were fire-and-forget. A bad release meant manually reverting commits and re-running pipelines.
- Opaque failures — When a pipeline failed, it was unclear whether the failure was in the build, test, deploy, or integration stage.
The Solution
The orchestration layer is a Go binary that runs as a GitHub Actions workflow step. It:
- Reads a services.yaml manifest that declares build order, dependencies, and deployment targets
- Constructs a DAG and schedules builds in dependency order
- Uses content-addressed cache keys based on file hashes, so only changed services rebuild
- Promotes artifacts through an environment pipeline (dev → staging → prod) with manual gates
- Stores deployment history with rollback support
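The DAG scheduling step can be illustrated with Kahn's algorithm. This is a minimal sketch, not the project's actual code: the function name buildLevels, the in-memory map (standing in for a parsed services.yaml), and the service names are all assumptions made for illustration. It groups services into batches so that each batch depends only on earlier ones.

```go
package main

import (
	"fmt"
	"sort"
)

// buildLevels groups services into batches using Kahn's algorithm.
// deps maps each service to the services it depends on. Every service in a
// batch depends only on services in earlier batches, so the members of one
// batch can build in parallel. It returns an error for a dependency cycle
// or a dependency on an undeclared service.
func buildLevels(deps map[string][]string) ([][]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for svc, ds := range deps {
		indegree[svc] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("service %q depends on undeclared service %q", svc, d)
			}
			dependents[d] = append(dependents[d], svc)
		}
	}

	// Services with no dependencies form the first batch.
	var ready []string
	for svc, n := range indegree {
		if n == 0 {
			ready = append(ready, svc)
		}
	}

	var levels [][]string
	built := 0
	for len(ready) > 0 {
		sort.Strings(ready) // deterministic order within a batch
		levels = append(levels, ready)
		built += len(ready)

		var next []string
		for _, svc := range ready {
			for _, dep := range dependents[svc] {
				indegree[dep]--
				if indegree[dep] == 0 {
					next = append(next, dep)
				}
			}
		}
		ready = next
	}

	// Anything left unscheduled is part of a cycle.
	if built != len(deps) {
		return nil, fmt.Errorf("dependency cycle: %d of %d services cannot be scheduled", len(deps)-built, len(deps))
	}
	return levels, nil
}

func main() {
	// Hypothetical services; a real run would load these from services.yaml.
	deps := map[string][]string{
		"auth":    {},
		"billing": {"auth"},
		"gateway": {"auth", "billing"},
		"search":  {},
	}
	levels, err := buildLevels(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, batch := range levels {
		fmt.Printf("batch %d: %v\n", i, batch)
	}
}
```

Batching by level keeps the parallelism the original pipeline had while ensuring a service never builds before its dependencies. A cycle in the manifest surfaces as an error up front rather than as a confusing test failure later.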
Results
After three months of use:
- Build time reduced by 40% (fewer redundant builds)
- Deployment failures dropped 60% (dependency ordering catches issues earlier)
- Rollback time dropped from 15 minutes (manual) to 30 seconds (single command)
- Engineering team spent 80% less time debugging pipeline failures
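The fast rollback depends on stored deployment history. One way such a history could work is sketched below. The types Release and History, the method RollbackTo, and the in-memory storage are assumptions for illustration; the write-up does not describe the real storage format or command.

```go
package main

import (
	"errors"
	"fmt"
)

// Release records which artifact version each service ran after a deployment.
type Release struct {
	ID       int
	Versions map[string]string // service name -> artifact version
}

// History is an append-only log of releases for one environment.
type History struct {
	releases []Release
}

// Deploy appends a new release and returns it. The map is copied so that
// later changes by the caller cannot alter recorded history.
func (h *History) Deploy(versions map[string]string) Release {
	snapshot := make(map[string]string, len(versions))
	for svc, v := range versions {
		snapshot[svc] = v
	}
	r := Release{ID: len(h.releases) + 1, Versions: snapshot}
	h.releases = append(h.releases, r)
	return r
}

// RollbackTo re-deploys the versions of an earlier release as a new release.
// All services revert together, and the log stays append-only.
func (h *History) RollbackTo(id int) (Release, error) {
	if id < 1 || id > len(h.releases) {
		return Release{}, errors.New("unknown release id")
	}
	return h.Deploy(h.releases[id-1].Versions), nil
}

func main() {
	h := &History{}
	h.Deploy(map[string]string{"auth": "v1", "billing": "v1"})
	h.Deploy(map[string]string{"auth": "v2", "billing": "v2"}) // the bad release

	r, err := h.RollbackTo(1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("release %d now runs %v\n", r.ID, r.Versions)
}
```

Recording the rollback as a new release, rather than deleting the bad one, keeps a full audit trail and lets several services revert as one unit.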
Lessons Learned
- Start with the dependency graph. Everything else flows from understanding what depends on what.
- Cache keys should be content-addressed, not branch-named. A branch name says nothing about what is in the tree; a file hash changes only when the files do.
- Manual gates for production deployments are not optional. Automate everything except the final click.