Position Overview
We’re looking for a seasoned DevOps & Site Reliability Engineering (SRE) Lead to design, scale, and elevate our cloud infrastructure and observability ecosystem.
If you’re passionate about automation, system resilience, and building highly reliable platforms, this role is for you.
Responsibilities
- Architect and deploy scalable, highly available cloud infrastructure
- Lead SRE best practices to ensure reliability, performance, and scalability
- Optimize CI/CD pipelines (Jenkins, Argo CD or similar) for seamless deployments
- Define and track SLOs & SLIs to maintain uptime and service health
- Build robust observability frameworks (Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic)
- Manage Kubernetes clusters and Helm charts for efficient orchestration
- Implement auto-healing systems and proactive monitoring
- Drive chaos engineering and resilience testing (Chaos Mesh, Litmus, AWS FIS)
- Collab...