Position Overview
On-prem Platform Engineer
Location: Charlotte, NC
Key Skills:
Must-Have Skills (Mandatory Keywords)
LLM Inference & Optimization
- vLLM, TensorRT-LLM, Triton Inference Server, SGLang
- Inference optimization techniques:
  - Continuous batching
  - Speculative decoding
  - KV cache / Prefix caching
- Model optimization:
Distributed & GPU Systems
- Tensor parallelism and large model scaling
- CUDA, NCCL, GPU architecture
- GPU partitioning & optimization (MIG)
Kubernetes & ML Serving
- Kubernetes-based ML serving...