Position Overview
Description:
We are looking for an experienced Site Reliability Engineer in San Francisco, California, to strengthen the reliability, scalability, and operational maturity of our platform. This role focuses on improving service health, refining observability, and partnering with engineering teams to build systems that perform consistently under real-world demand. The ideal candidate brings deep production experience, a strong automation mindset, and a practical approach to incident response and continuous improvement.
Responsibilities:
• Establish measurable reliability standards for critical services by creating and maintaining service indicators, objectives, and error budget practices.
• Take ownership of production stability by monitoring uptime, latency, and availability, and driving improvements that reduce operational risk.
• Lead live incident response efforts, coordinate troubleshooting during outages, and ensure issues are resolved efficiently and thoroughly...