Flexible Work, Better Balance
Role: Site Reliability Engineer
Location: Hove, UK
Work Mode: Hybrid
Mandatory primary skills: Datadog / Dynatrace tooling and SLO management (AWS cloud skills are secondary).
Primary Responsibilities:
• Work closely with the Product Engineering team to implement strategies for modernizing IT operations, enhancing observability, and reducing toil.
• Architect and deploy observability platforms to monitor system health, performance, and reliability effectively.
• Propose & drive strategies for AI-driven alerting and proactive anomaly detection to reduce MTTD & MTTR.
• Develop and enforce SRE best practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
• Establish an AIOps roadmap for improving operational efficiency.
• Lead efforts to automate repetitive tasks (toil) using scripting, orchestration tools, and AI/ML-based solutions.
• Drive toil automation initiatives for automate...