⏰ Full-time

AI Evaluation Engineer (Cequeliños)

🏢 FirstIgnite

Location
📍 Arbo, Spain

Posted
📅 June 08, 2026

Work Type
⏰ Full-time

Position Overview

We’re hiring an AI Evaluation Engineer to own the quality bar for every LLM-powered feature we ship. You will design, build, and scale the infrastructure that tells us, with evidence, whether a prompt change, model swap, or agent refactor made things better or worse.

Responsibilities

Build evaluation infrastructure: Design and maintain eval suites using Promptfoo, LLM-as-judge methodologies, and custom harnesses for features such as our expert search system, natural language grants search, and AI SDR agents.

Define what good means: Partner with product and domain experts to translate vague customer outcomes ("Does this surface the right principal investigator?") into precise, measurable rubrics.

Own the feedback loop: Instrument production traffic, curate golden datasets from real customer interactions, and build pipelines that turn user behavior into regression tests.

Ship quickly under uncertainty...

