🌍 Global Opportunities
Updated Hourly
🎓 Student Friendly

parttimejobs.work

Flexible Work, Better Balance

⏰ Full-time

Manager, Large Language Model Inference

NVIDIA
Location 📍 Santa Clara, United States
Posted 📅 June 03, 2026
Work Type ⏰ Full-time

Position Overview

At NVIDIA, we aren't just powering the AI revolution—we're accelerating it. The TensorRT inference platform is the backbone of modern AI, delivering the industry's fastest and most efficient deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for AI exploding, particularly in the realm of large language models (LLMs) and vision language models (VLMs, VLAs), we are significantly expanding our team. We're seeking a highly skilled and driven Engineering Manager to take the lead in developing the next generation of LLM/VLM/VLA inference software technologies that will define the future of AI. This is a high-impact, hands-on leadership role at the intersection of deep technical expertise and world-class management. You won't just manage; you'll architect and guide a brilliant team of engineers who are building the core LLM inference runtime. Your work will be highly collaborative, interfacing directly with NVIDIA Researchers, GPU Architects, and other teams acro...

Apply Now

Submit Application →

Quick and easy application process

Job Details

Employment Type
Full-time
📊
Category
other-general
🏠
Work Arrangement
On-site
📍
Location
Santa Clara, United States