Automatic speech recognition (ASR) models transcribe adult speech well but struggle with children's voices. Kids have distinct vocal characteristics and inconsistent pronunciation, and they are still developing the motor skills that shape how they speak. As a result, error rates for children are 4–8x higher than for adults. Narrowing this performance gap would unlock a range of ASR applications that could enhance educational outcomes and scale early screening and intervention.
The On Top of Pasketti: Children's Speech Recognition Challenge brought together a global community of machine learning practitioners to develop open ASR models tailored to early education. Solvers worked with an assembled, newly labelled dataset of 560k child utterances, representing over 515 hours of read, prompted, and spontaneous speech collected across a range of populations and settings.
The challenge ran two tracks:
- In the Word Track, solvers predicted the words spoken by children in audio clips. Word-level models enable automated transcription, verbal tool use, and assessments related to cognition and speech (e.g., comprehension, reasoning).
- In the Phonetic Track, solvers predicted the speech sounds, or phones, spoken by children in audio clips. Phonetic models are critical for diagnostic applications like speech pathology screening.
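Both tracks come down to comparing a predicted token sequence (words or phones) against a reference transcript. The sketch below shows one common way to score that comparison: an edit-distance-based error rate, as used in the standard word error rate. It is an illustration only. The challenge's exact metric and text normalization are not described here, the function name `error_rate` is hypothetical, and the example transcripts are made up.

```python
def error_rate(reference, hypothesis):
    """Edit distance between two token lists, divided by the reference length.

    Assumption: this mirrors the usual word/phone error rate definition,
    not necessarily the challenge's official scoring.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Word-level scoring: tokens are whitespace-separated words (made-up example).
wer = error_rate("i want spaghetti".split(), "i want pasketti".split())

# Phone-level scoring: tokens are individual IPA symbols (made-up example).
per = error_rate(list("kæt"), list("kæ"))
```

The same function serves both tracks because only the tokenization differs: words for the Word Track, phones for the Phonetic Track.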
The Results
Over 828 participants submitted more than 2,100 solutions across both tracks. In the Word Track, top solvers cut the error rate of the best existing children's speech model by more than half, converging on fine-tuned Qwen3-ASR-1.7B. In the Phonetic Track, winners improved 49% over the reference solution using WavLM-based ensembles. In both tracks, performance gains were consistent across populations. These models may already be usable for some applications and populations, but they still struggle in key contexts, such as with very young children and in noisy environments.
To build on the challenge results and deepen the public impact of these advances, we are retraining winning solutions on a larger dataset with better coverage of high-impact settings and student populations, and will publish them as open-weight models. Sign up below to be notified when open-weight models and other competition assets like annotations become available.
In the meantime, you can browse winning code and solution reports and read about the top approaches in detail.
Get Notified
To stay informed about model releases, sign up for the challenge update list here.
The competitions
Phonetic Track
Develop automatic speech recognition models that produce phone-level transcriptions of children’s speech in IPA.
Word Track
Develop automatic speech recognition models that produce word-level transcriptions of children’s speech.