Position Overview
Job Description
Insight Global is seeking a Network Engineer – Reliability & Observability to support the quality, reliability, and lifecycle performance of large-scale AI network infrastructure. This role serves as a reliability engineering leader, responsible for building processes, data collection frameworks, and reliability metrics to improve network performance from initial deployment through ongoing operations.
This position focuses on developing scalable processes, systems, tooling, and data pipelines that drive network observability and reliability. You will deliver automated 24x7 metrics and periodic reliability reports to both internal stakeholders and external customers, giving them visibility into network health, performance, and risk.
This role is well-suited for experienced network operators who are passionate about reliability engineering and full-lifecycle software development, including quality assurance audits, circuit audits, periodic inspections, fai...