Architect and execute large‑scale custom model training and fine‑tuning jobs (SFT, RLHF) on multi‑node, multi‑GPU clusters.
Optimize training throughput and memory efficiency using distributed training strategies (FSDP, DeepSpeed, Megatron‑LM) and mixed‑precision techniques (FP16/BF16).
Design and develop autonomous AI agents capable of multi‑step reasoning, planning, and tool execution to automate complex manufacturing workflows.
Implement agentic frameworks (e.g., LangChain, LangGraph, CrewAI) to orchestrate LLM interactions with internal APIs, databases, and software tools.
Profile and debug GPU performance bottlenecks using tools like Nsight Systems or PyTorch Profiler to maximize hardware utilization.
Build and maintain data/solution pipelines that feed machine learning models and GenAI applications.
Design and optimize data structures in data management systems (Snowflake and Google Cloud p...