parttimejobs.work

Senior Software Engineer, AI Resiliency

NVIDIA
Location: Redmond, United States
Posted: June 01, 2026
Work Type: Full-time

Position Overview

We are now looking for a Senior Software Engineer for AI Resiliency!


At NVIDIA, we are pushing the boundaries of what’s possible in AI. We are currently seeking a Senior Software Engineer to lead the development of AI software resiliency for the most powerful AI supercomputers in the world. As a member of our AI Software Resiliency team, you will play a pivotal role in defining and implementing critical resiliency features for AI supercomputers at a scale of 100,000+ GPUs. Your expertise will be crucial in driving down cluster downtime towards zero, ensuring that our AI systems remain robust and reliable at all times.


What You’ll Be Doing:
+ Develop AI Software Resiliency Features: Implement and optimize software features that improve AI system reliability at a massive scale, such as fast checkpoint-recovery, error detection, error isolation, and straggler/hang detection.
+ Hands-On Coding & Optimization: Contribute to large-scale distributed syst...

Job Details

Employment Type: Full-time
Category: other-general
Work Arrangement: On-site
Location: Redmond, United States