Position Overview
This role is not just an internship. It is an entry point into world-class AI collaboration.
Your Impact & Responsibilities
As a Data Engineer Intern, you will operate as a hands‑on contributor to our ASR data pipeline, not a passive assistant.
You Will
- Engineer, preprocess, and quality‑validate large‑scale speech and text datasets that directly influence ASR model performance
- Design and execute data transformations including text normalization, data chunking, format conversion, and structured analysis
- Optimize audio pipelines through segmentation, merging, transcoding, and subtitle/caption quality assurance
- Strengthen data pipelines by improving robustness, traceability, and reproducibility through clean logs and documentation
- Proactively identify data quality risks, triage issues at scale, and close the feedback loop with clarity and ownership
Your work feeds production speech model...