Position Overview
Job Description
Serve as part of the incident management team in a 24x7 Microsoft Azure environment. The candidate will diagnose, mitigate, and/or escalate system issues to maintain a high level of system and platform availability. The candidate will work in the Live Site work stream and will need an understanding of core Windows Azure components and tools to diagnose issues.
Duties and Responsibilities
Responds to incident tickets in a 24x7 operational environment to meet SLA objectives. Troubleshoots system issues using diagnostic tools like netmon, WinDbg, and custom application tools. Reviews system logs to identify and mitigate system issues. Leverages the knowledge base to help troubleshoot, identify, and resolve system issues. Updates knowledge base troubleshooting guides and lessons learned as required. Documents incident fixes and makes recommendations to the engineering team fo...