Position Overview
Help build the world’s most advanced multimodal dataset at Microsoft AI
We are on a mission to create the largest and most advanced multimodal dataset in the world. Spanning all modalities from across the web and beyond, this dataset will power the training of the world’s most capable frontier AI models, pushing the boundaries of scale, performance, and product deployment.
The AI Data Infra team at Microsoft AI builds the data infrastructure that helps MAI teams generate the largest, highest-quality training dataset. Our work spans data pipelines, Spark, Ray, vector databases, and every other aspect of data infrastructure.
We are looking for outstanding individuals excited about contributing to the next generation of systems that will transform the field. In particular, we are looking for candidates who:
Are passionate about the role of data in large-scale AI model training
Will thrive in a highly collaborative, fast...