Oversee the end-to-end management of production and UAT environments, ensuring high availability, resiliency, reliability, performance, and security.
Execute policies and procedures that ensure operational stability and availability.
Monitor production environments for anomalies, address issues, and advance the use of standard observability tools.
Escalate and communicate issues and solutions to business and technology stakeholders, actively participating from incident resolution through service restoration.
Drive incident, problem, and change management in support of full stack technology systems, applications, or infrastructure.
Participate in the design, deployment, and optimization of AWS-based infrastructure, leveraging cloud-native solutions for scalability and resilience.
Investigate and resolve data-related incidents, including quality issues, performance ...