⏰ Full-time

Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA

Location
📍 Santa Clara, United States

Posted
📅 June 03, 2026

Work Type
⏰ Full-time
                

Position Overview

                    NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU shards, routes requests, and manages shared KV cache across heterogeneous clusters so that many accelerators feel like a single system at datacenter scale. As large language models rapidly outgrow the memory and compute budget of any single GPU, this platform enables efficient, resilient deployment of cutting-edge LLM workloads.
  
We are seeking a Principal Systems Engineer to define the vision and roadmap for memory management in large-scale LLM inference and storage systems.
  
What you'll be doing:
+ Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference.
+ Architect and implement deep integrations w...
                
