
parttimejobs.work



Senior LLM Deployment & Inference Optimization Engineer

Confidential
Location 📍 Singapore, Singapore
Posted 📅 June 19, 2026
Work Type ⏰ Full-time

Position Overview

We are looking for an experienced Senior LLM Deployment & Inference Optimization Engineer to build and operate self-hosted inference infrastructure for LLMs, multimodal models, ASR, and TTS systems in the cloud. Your mission is to deliver a stable, low-latency, and cost-efficient inference platform that powers real-time conversations and voice interactions in AI-driven English learning classrooms. This is a senior, cross-functional engineering role focused on deploying, optimizing, and operating open-source inference engines and GPU infrastructure at scale, rather than developing inference kernels from scratch.


Responsibilities

  • Design, deploy, and operate self-hosted cloud inference services for LLMs, multimodal models, ASR, and TTS systems, building highly available and elastically scalable inference infrastructure.
  • Optimize and productionize open-source inference framewor...


Job Details

Employment Type: Full-time
Category: E-Learning Providers
Work Arrangement: On-site
Location: Singapore, Singapore