Pre-Training Text Data
Microsoft Corporation
Location
Multiple Locations, United States
Posted
June 10, 2026
Job Description
**Overview**
We are seeking engineers and researchers to join our Pretraining Text Data team, where we are building the next generation of foundation large language models. If you are passionate about designing and curating high-quality datasets to power frontier AI models, this role is for you.
In this role, you’ll work at the intersection of data and innovation—collaborating with scientists, engineers, and annotators to curate, analyze, and evaluate diverse text datasets critical to model development. You will lead efforts to:
+ Develop novel data collection strategies
+ Improve dataset quality and integrity
+ Understand data-driven model behaviors
+ Train models to understand the impact of data and data mixes
+ Align datasets with ethical and societal values
This is a cross-disciplinary, high-impact role ideal for engineers and researchers who want to push the boundaries of what AI can learn from data.
...