Flexible Work, Better Balance
Overview
We are seeking experienced bilingual evaluators to support a multilingual AI safety project focused on evaluating model responses across culturally specific prompt-image datasets. This project involves applying a structured safety rubric to assess AI-generated responses for appropriateness, safety, and reliability within the target locale’s cultural context. Each language stream will process approximately 1, prompt-image pairs. Every item will receive two independent evaluations, with arbitration applied in cases of disagreement. Evaluations will primarily be documented in English, with a defined in-language sample.
Project Details