⏰ Full-time

Lead AI Infrastructure Engineer

🏢 Zyoin Group

Location
📍 India

Posted
📅 June 03, 2026

Position Overview

Inference Optimization
Drive time to first token (TTFT) below 400 ms for multi-step agent pipelines
Streaming optimization: deliver the first token to the user while sub-agents are still running
KV cache strategy, prompt compression, dynamic context window management
Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
Agent Architecture
Design and implement Plan-Execute-Synthesize pipelines that run sub-agents as parallel DAGs rather than sequential chains
Build reliable orchestration on top of Temporal: retries, timeouts, partial-failure recovery, idempotency
Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation
Tool-call design: tool schemas that LLMs follow reliably across providers
Evaluation & Harness
Own the eval framework en...