Design and implement enterprise-grade monitoring and observability frameworks (metrics, logs, traces) across distributed systems using Splunk, Grafana, and OpenTelemetry
Establish and manage SLIs, SLOs, and error budgets to drive reliability improvements
Develop and maintain real-time asset inventory systems across cloud, on-prem, and hybrid environments
Automate workload onboarding and offboarding processes, ensuring standardization and governance
Track system ownership, dependencies, and lifecycle states for operational transparency
Build proactive detection mechanisms using AIOps and intelligent alerting to minimize incident impact
Design and operate scalable, resilient, and secure infrastructure platforms across cloud and hybrid environments
Implement automated compliance tracking and enforcement aligned with organizational and regula...