Position Overview:
An established technology-driven organisation is seeking an experienced Site Reliability Engineer (SRE) in Glasgow to strengthen and scale its cloud-native data platform, which uses AWS, Snowflake, and Databricks. The role offers the opportunity to drive automation, resilience, and operational excellence across critical data services.
Key Responsibilities:
- Automate infrastructure provisioning and platform operations using Infrastructure as Code and CI/CD tools.
- Lead and execute reliability initiatives including disaster recovery planning, failure testing, and resilience validation.
- Define and manage service health metrics (SLIs/SLOs/SLAs) to drive measurable improvements in reliability.
- Build observability solutions to monitor AWS, Snowflake, and Databricks workloads.
- Collaborate with engineering teams to embed reliability best practices throughout platform development.
- Analyse incidents and proa...