Flexible Work, Better Balance
A project dedicated to assessing and benchmarking advanced agentic audio models against leading systems. The program's mission is to evaluate and optimize model performance for real-world customer support use cases.
Responsibilities
Create and execute role-play-based evaluation scenarios that simulate realistic customer service interactions across multiple domains, including flight bookings and travel support, financial services, and telecommunications and technical support.
Contribute to the development of diverse and representative datasets used to assess conversational audio agents.
Evaluate model performance across a standardized set of qualitative and quantitative metrics.
Ensure evaluations reflect real customer expectations for clarity, efficiency, and natural conversational flow.
Evaluation Metrics
Model performance is assessed using a combination of conversational, technical, and audio-specific criteria, including but not limited to:
Task completion accuracy and ...