Position Overview
Job Title: LLM Evaluator (Model Response Analyst)
Location: Remote (Worldwide)

Job Summary: We are seeking a detail-oriented, analytical LLM Evaluator to assess, analyze, and improve the performance of large language models (LLMs). In this role, you will evaluate AI-generated content for accuracy, coherence, factual reliability, bias, safety, and alignment with defined guidelines.

Responsibilities

Evaluate and rank model-generated text against complex rubrics covering dimensions such as factuality, coherence, safety, instruction-following, and creativity.

Review multiple model responses to the same prompt, determine which output a human would prefer, and justify each choice.

Provide clear, concise feedback to the modeling and training teams on recurring failure modes observed during evaluation sessions.

Attempt to “break” the model by crafting prompts designed to elicit biased, harmful, or insecure outputs, so that safety vulnerabilities can be patched.

Collabora...