On Top of Pasketti: Children’s Speech Recognition Challenge

Develop cutting-edge ASR algorithms specifically for children's speech to advance early education assessments and teaching tools. #education

$120,000 in prizes
Completed Apr 2026
828 joined

Automatic speech recognition (ASR) models transcribe adult speech well but struggle with children's voices. Kids have distinct vocal characteristics and inconsistent pronunciation, and they are still developing the motor skills that shape how they speak, resulting in error rates 4–8x worse than for adults. Narrowing this performance gap would unlock a range of ASR applications that could enhance educational outcomes and scale early screening and intervention.

The On Top of Pasketti: Children's Speech Recognition Challenge brought together a global community of machine learning practitioners to develop open ASR models tailored to early education. Participants worked with a newly assembled and labelled dataset of 560k child utterances, representing over 515 hours of read, prompted, and spontaneous speech collected across a range of populations and settings.

The challenge ran two tracks:

  • In the Word Track, solvers predicted the words spoken by children in audio clips. Word-level models enable automated transcription, verbal tool use, and assessments related to cognition and speech (e.g., comprehension, reasoning).
  • In the Phonetic Track, solvers predicted the speech sounds, or phones, spoken by children in audio clips. Phonetic models are critical for diagnostic applications like speech pathology screening.
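As a rough illustration of how predictions in either track could be scored, the sketch below computes an edit-distance-based error rate over token sequences. It assumes a word error rate style metric (substitutions, insertions, and deletions divided by reference length), which is common in ASR but is not confirmed here as the challenge's official scoring. The function names and example transcripts are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edits needed to turn hyp into ref, normalized by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Word-level example (hypothetical transcripts): one substituted word out of five.
word_ref = "i want pasketti for dinner".split()
word_hyp = "i want spaghetti for dinner".split()
word_error = error_rate(word_ref, word_hyp)

# Phone-level example (hypothetical IPA symbols, one token per phone).
phone_ref = ["p", "ə", "s", "k", "ɛ", "t", "i"]
phone_hyp = ["s", "p", "ə", "ɡ", "ɛ", "t", "i"]
phone_error = error_rate(phone_ref, phone_hyp)

print(f"word error rate:  {word_error:.2f}")
print(f"phone error rate: {phone_error:.2f}")
```

The same function serves both tracks because only the tokenization differs: words in the Word Track, individual phones in the Phonetic Track.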

The Results

A total of 828 participants submitted more than 2,100 solutions across both tracks. In the Word Track, top solvers cut the error rate of the best existing children's speech model by more than half, converging on fine-tuned Qwen3-ASR-1.7B. In the Phonetic Track, winners improved 49% over the reference solution using WavLM-based ensembles. Across both tracks, performance gains were consistent across populations. These models may already be usable for some applications and populations, but they still struggle in key contexts, such as with very young children and in noisy environments.

To build on the challenge results and deepen the public impact of these advances, we are retraining winning solutions on a larger dataset with better coverage of high-impact settings and student populations, and will publish them as open-weight models. Sign up below to be notified when open-weight models and other competition assets like annotations become available.

In the meantime, you can browse winning code and solution reports and read about the top approaches in detail.

Get Notified

To stay informed about updates on model releases, sign up for a challenge-related update list here.


The Competitions

Phonetic Track

Develop automatic speech recognition models that produce phone-level transcriptions of children’s speech in IPA. #education

430 joined
$50,000 in prizes
Apr 2026

Word Track

Develop automatic speech recognition models that produce word-level transcriptions of children’s speech. #education

696 joined
$70,000 in prizes
Apr 2026