Using Redis and sentence embeddings to catch similar LLM prompt queries before they hit the OpenAI API, cutting response times from 1200ms to 180ms.
Cost Engineering · Redis · OpenAI
Real build logs, performance tuning, and technical observations
Why I chose to run Postgres, Redis, and Prometheus locally via Compose before spinning up full cloud resources to save costs and simulate production.
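A local stack like that typically amounts to a small Compose file. The sketch below is illustrative only: the image tags, ports, and password are assumptions, not the author's actual configuration.

```yaml
# Minimal local dev stack sketch (versions and ports are example values)
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev   # local-only credential
    ports:
      - "5432:5432"
  redis:
    image: redis:7
    ports:
      - "6379:6379"
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
```

Running `docker compose up` against a file like this gives the application the same service endpoints it would see in production, at zero cloud cost.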