Architect agentic systems, including planning, memory, tool integration, multi-agent delegation, evaluation loops, and guardrails.
Drive model capability into production by designing prompt and context strategies, tool interfaces, retrieval and reranking methods, structured output, streaming, and evaluation across text and multimodal inputs.
Lead evaluation efforts by building offline evaluation sets, online LLM-as-judge loops, and regression harnesses.
Optimize system performance to meet latency targets through prompt caching, batching, speculative decoding, model routing, and token budget management.
Contribute to upstream development by reading SDK source code, submitting pull requests to open-source agent frameworks, and filing clear bug reports with vendors of orchestration services.