Position Overview
Role Summary
As a Site Reliability Engineer (SRE), you will build and operate highly available, globally distributed advertising/monetization services. You will improve reliability, scalability, and operability through automation, observability, incident management, and sound engineering practices.
Key Responsibilities
- Own reliability across the service lifecycle: design reviews, capacity planning, launch, deployment, operations, and continuous improvement.
- Build and operate highly available services across multiple regions and data centers; improve resilience and scalability, and reduce latency.
- Develop automation and tooling for deployment, remediation, runbooks, and self-healing to reduce toil, applying scripting and software engineering best practices.
- Define and implement SLIs, SLOs, and SLAs; build dashboards and alerting that track service health (availability, latency, errors, saturation).
- Lead sustainable incident response: tria...