Local LLM inference pipeline on M-series Macs
Architecting a local inference stack that chains quantization, vRAM management, and prompt routing across a single MacBook Pro.
Deep-dives into AI infrastructure, local model workflows, custom automation, retrieval layers, and the interfaces between tools. Each note documents a system that was built, tested, and shipped—or learned from in the attempt.
Architecting a local inference stack that chains quantization, vRAM management, and prompt routing across a single MacBook Pro.
A structured evaluation framework that measures retrieval precision, answer faithfulness, and latency across document chunks.
Replacing brittle CI pipelines with a custom orchestration layer that handles dependency graphs, artifact caching, and rollback strategies.