Position Overview
We're working with a high-growth AI infrastructure company building the foundational systems that power next-generation AI products and intelligent search infrastructure.

The team is building a search engine designed for AI agents, operating large-scale distributed systems that crawl the web, train state-of-the-art embedding models, and power high-performance vector search infrastructure. On the compute side, they operate a rapidly growing multi-million-dollar H200 GPU cluster alongside large-scale distributed batch processing systems running across tens of thousands of machines.

This is a deeply technical infrastructure role focused on building the internal platform and tooling that enables the entire engineering organization to move fast at scale.

What You'll Work On
- Build and scale Kubernetes orchestration for large GPU clusters
- Design distributed infrastructure powering large-scale AI workloads
- Scale cloud batch job systems handling map-reduce workloads across tens of thousands of machi...