Infrastructure articles by Balys Kriksciunas:

- Building an AI Platform Team: Roles, Tools, and Rituals
- GPU FinOps: Reducing Your $10M AI Compute Bill
- Disaggregated Inference: Prefill, Decode, and the New Serving Topology
- Multi-Agent Orchestration Infrastructure: Lessons from Production
- Context Engineering: Storage, Retrieval, and the New Memory Stack
- Agent Infrastructure: What's Different from LLM Serving
- Inference at the Edge: Running LLMs on Consumer GPUs
- Running Sovereign AI: EU and India Infrastructure Playbooks
- MI300X vs H100: AMD's Bet on Inference
- The AI Infrastructure Stack: 2026 Edition
- NVIDIA B200 vs H100: Should You Upgrade?
- Model Evals in Production: Regression Testing Prompts
- LoRA, QLoRA, and PEFT: The Fine-Tuning Infrastructure Guide
- Securing RAG Pipelines: Prompt Injection via Data
- Hybrid Search in Production: BM25 + Dense Retrieval
- Ray Serve vs Kubernetes for Model Serving
- AI FinOps: Tracking Token Spend Across Your Org
- KV Cache Optimization Techniques for LLM Serving
- Speculative Decoding for Production LLMs
- LLM Gateway Patterns: LiteLLM, Portkey, and Kong AI
- FP8 and Quantization: Serving LLMs at Half the Cost
- pgvector at Scale: When Postgres Is Enough
- vLLM vs TGI vs Triton: LLM Inference Server Benchmarks
- Multi-Cloud GPU Strategy: Avoiding Lock-in and Saving 40%
- The State of AI Infrastructure 2025
- Self-Hosting Llama 3: A Production Deployment Guide
- Tracing LLM Applications with OpenTelemetry
- GPU Clouds Compared: CoreWeave, Lambda, Runpod, Fly and the Neoclouds
- PagedAttention Explained: How vLLM Achieves 24x Throughput
- Continuous Batching for LLMs: Why It Matters
- Kubernetes for GPU Workloads: A Primer
- Choosing a Vector Database in 2024: A Practical Guide
- vLLM: The Open-Source Inference Engine Changing LLM Serving
- NVIDIA H100 vs A100: Which GPU Should You Deploy?
- The AI Infrastructure Stack Explained (2024)