Position Overview
Principal Site Reliability Engineer
Secaucus, NJ 07094
Responsibilities:
- Experience in transforming an organization by designing and implementing SRE capabilities, including monitoring, performance engineering and chaos engineering. You will set the overall strategy for aligning Site Reliability Engineering (SRE) and Development.
- Lead initiatives to implement service levels (SLIs, SLOs, SLAs) and error budgets. You will initiate, influence and drive SRE within the organization and work with product and service teams to enable this model.
- Provides guidelines and patterns, and establishes proper metrics, for building highly scalable, reliable, high-performing systems.
- Defines the strategy for best-in-class monitoring frameworks that deliver end-to-end flow monitoring and meaningful alerting.
- Coaches and mentors teams of monitoring, performance and SRE engineers.
- Proven ability to implement processes, solutions and engin...