Oversee the end-to-end management of production and UAT environments, ensuring high availability, resiliency, reliability, performance, and security.
Execute policies and procedures that ensure operational stability and availability.
Monitor production environments for anomalies, address issues, and drive broader adoption of standard observability tools.
Escalate and communicate issues and solutions to business and technology stakeholders, participating actively from incident resolution through service restoration.
Drive incident, problem, and change management in support of full-stack technology systems, applications, and infrastructure.
Participate in the design, deployment, and optimization of AWS-based infrastructure, leveraging cloud-native solutions for scalability and resilience.
Investigate and resolve data-related incidents, including quality issues, performance problems, and connectivity fai...