🌍 Global Opportunities
Updated Hourly
🎓 Student Friendly

parttimejobs.work

Flexible Work, Better Balance

⏰ Full-time

Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA
Location 📍 Santa Clara, United States
Posted 📅 June 03, 2026
Work Type ⏰ Full-time

Position Overview

NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU shards, routes requests, and manages shared KV cache across heterogeneous clusters so that many accelerators feel like a single system at datacenter scale. As large language models rapidly outgrow the memory and compute budget of any single GPU, this platform enables efficient, resilient deployment of cutting-edge LLM workloads.

We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management of large-scale LLM and storage systems.

What you'll be doing:
+ Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference.
+ Architect and implement deep integrations w...

Apply Now

Submit Application →

Quick and easy application process

Job Details

Employment Type
Full-time
📊
Category
other-general
🏠
Work Arrangement
On-site
📍
Location
Santa Clara, United States