🌍 Global Opportunities
Updated Hourly
🎓 Student Friendly

parttimejobs.work

Flexible Work, Better Balance

⏰ Full-time

Software Engineer, Inference - CUDA / Kernels

OpenAI
Location 📍 san francisco, United-States
Posted 📅 June 11, 2026
Work Type ⏰ Full-time

Position Overview

About the Team
We’re building high-performance infrastructure to serve OpenAI’s frontier models at massive scale. As part of the inference team, you’ll be responsible for unlocking every last FLOP from our GPUs by designing kernels, tuning memory layouts, and optimizing model execution at the lowest levels of the stack.

About the Role
We are looking for a kernel-focused engineer to lead efforts in writing, porting, and optimizing GPU kernels used in inference workloads. This role requires deep familiarity with CUDA or equivalent kernel programming environments, and a strong intuition for performance tuning across modern GPU architectures.

In this role, you will:

  • Design, implement, and optimize CUDA kernels for inference-critical operations (e.g., fused matmuls, custom activation functions, memory layout transforms).

  • Analyze performance bottlenecks and optimize kernel executi...

Apply Now

Submit Application →

Quick and easy application process

Job Details

Employment Type
Full-time
📊
Category
Other-General
🏠
Work Arrangement
On-site
📍
Location
san francisco, United-States