On-prem Platform Engineer

🏢 Apolis

Location
📍 Charlotte, North Carolina, United States

Posted
📅 May 16, 2026

Work Type
⏰ Full-time

Position Overview

On-prem Platform Engineer

Location: Charlotte, NC

Key Skills:

Must-Have Skills (Mandatory Keywords)

LLM Inference & Optimization

vLLM, TensorRT-LLM, Triton Inference Server, SGLang

Inference optimization techniques:
Continuous batching
Speculative decoding
KV cache / prefix caching

Model optimization:
FP8, AWQ, GPTQ

Distributed & GPU Systems

Tensor parallelism and large-model scaling
CUDA, NCCL, GPU architecture
GPU partitioning & optimization (MIG)

Kubernetes & ML Serving

Kubernetes-based ML serving...
