parttimejobs.work

⏰ Full-time

On-prem Platform Engineer

Apolis
Location: Charlotte, North Carolina, United States
Posted: May 16, 2026
Work Type: Full-time

Position Overview

Key Skills:

Must-Have Skills (Mandatory Keywords)

LLM Inference & Optimization

  • vLLM, TensorRT-LLM, Triton Inference Server, SGLang
  • Inference optimization techniques:
    • Continuous batching
    • Speculative decoding
    • KV cache / prefix caching
  • Model optimization:
    • FP8, AWQ, GPTQ

Distributed & GPU Systems

  • Tensor parallelism and large model scaling
  • CUDA, NCCL, GPU architecture
  • GPU partitioning & optimization (MIG)

Kubernetes & ML Serving

  • Kubernetes-based ML serving...

Job Details

Employment Type: Full-time
Category: architecture-and-engineering
Work Arrangement: On-site
Location: Charlotte, North Carolina, United States