Overview
Voice is one of the most natural ways for kids to learn, explore, and show what they know, yet today's Automatic Speech Recognition (ASR) technology often fails to understand them. Because they are trained largely on adult speech, most ASR systems struggle with the pitch, rhythm, and still-developing articulation of young learners.
High ASR error rates on children's speech block downstream uses that could improve educational outcomes and scale early screening and intervention. High-quality ASR for kids can unlock a new generation of learning tools, such as playful voice-driven tutoring systems, early support for speech and phonological development, adaptive literacy assessments, and accessible interfaces for students who are preliterate or who otherwise benefit from alternatives to text or visual input.
Challenge structure
This challenge assembles pre-existing and newly labeled datasets to advance speech models that truly work for children. Using audio recordings, solvers will develop models that accurately capture what children say and how they say it.
There are two independent tracks that enable different downstream uses. Solvers can compete in either or both tracks and can earn prizes in each. Across tracks, prizes total $120,000.
- In the Word Track, solvers will predict the words spoken by children in audio clips. Word-level models enable automated transcription, verbal tool use, and assessments related to cognition and speech (e.g., comprehension, reasoning).
- In the Phonetic Track, solvers will predict the speech sounds, or phones, spoken by children in audio clips. Phonetic models are critical for diagnostic applications like speech pathology screening.
Prize Overview
| Place | Word Track | Phonetic Track |
|---|---|---|
| 1st | $25,000 | $25,000 |
| 2nd | $15,000 | $15,000 |
| 3rd | $10,000 | $10,000 |
| Noisy Classroom Bonus (4 prizes) | $5,000 each | N/A |
| Track Total | $70,000 | $50,000 |
After the Competition
After the competition concludes, the top-performing solutions will undergo additional refinement to ensure they are robust and ready for broader use. Once finalized, the winning approaches will be released as part of an open-source software library designed to make child-focused ASR models easy to access, implement, and integrate into downstream applications. A subset of competition annotations will also be made publicly available to support continued research and development.
To receive updates on model releases, code, and documentation after the competition ends, sign up for the challenge-specific update list here.
The competitions
Word Track
Develop automatic speech recognition models that produce word-level transcriptions of children’s speech.
Phonetic Track
Develop automatic speech recognition models that produce phone-level transcriptions of children’s speech in the International Phonetic Alphabet (IPA).