Position Overview
Responsibilities
- Design and implement observability solutions to monitor system performance and availability.
- Develop dashboards and alerts for real-time monitoring of critical systems and services.
- Collaborate with cross-functional teams to identify and resolve system issues proactively.
- Analyze system logs and metrics to detect anomalies and optimize system performance.
- Develop automation scripts to streamline monitoring and alerting processes.
- Contribute to the continuous improvement of the organization's monitoring capabilities.
- Ensure compliance with industry standards and best practices for system observability.
- Provide technical guidance and support to team members and stakeholders.
The Successful Applicant
- Strong experience with observability platforms (e.g., Azure Monitor, Prometheus, ELK, Datadog, Splunk).
- Deep understanding of metrics, logs, and distr...