Position Overview
Description
Robert Half is hiring! We are looking for an experienced Site Reliability Engineer to join our team. This role involves designing, operating, and enhancing a secure, scalable, and cost-efficient multi-cloud platform. The ideal candidate will possess a strong technical background, a passion for automation and observability, and a commitment to improving system reliability and efficiency.
Responsibilities:
• Design, implement, and manage reliable and scalable systems across multi-cloud environments, including AWS and Azure.
• Develop and refine service level objectives (SLOs), service level indicators (SLIs), and error budgets to support system reliability.
• Lead root cause analyses for incidents and implement measures to prevent recurrence.
• Enhance platform observability by creating and maintaining metrics, logs, traces, and alerts.
• Drive cloud cost optimization initiatives by implementing cost visibility, ...