Overview
Voice is one of the most natural ways for kids to learn, explore, and show what they know, yet today's Automatic Speech Recognition (ASR) technology hardly understands them. Built on adult speech, most ASR systems struggle with the pitch, rhythm, and evolving articulation of young learners.
High ASR error rates on children's speech block downstream applications that could improve educational outcomes and scale early screening and intervention. High-quality ASR for kids can unlock a new generation of learning tools, such as playful voice-driven tutoring systems, early support for speech and phonological development, adaptive literacy assessments, and accessible interfaces for students who are preliterate or otherwise benefit from alternatives to text or visual input.
Challenge structure
This challenge assembles pre-existing and newly labeled datasets to advance speech models that truly work for children. Solvers will develop models that accurately capture what children say and how they say it from audio recordings.
There are two independent tracks that enable different downstream uses. Solvers can compete in either track or in both, and can earn prizes in each. Across tracks, prizes total $120,000.
- In the Word Track, solvers will predict the words spoken by children in audio clips. Word-level models enable automated transcription, verbal tool use, and assessments related to cognition and speech (e.g., comprehension, reasoning).
- In the Phonetic Track, solvers will predict the speech sounds, or phones, spoken by children in audio clips. Phonetic models are critical for diagnostic applications like speech pathology screening.
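To make the difference between the two tracks concrete, here is a minimal sketch of how a word-level and a phone-level prediction might each be compared against a reference transcript. It uses a generic edit-distance error rate (substitutions, insertions, and deletions divided by reference length). This section does not state the challenge's scoring metric, so the function names `edit_distance` and `error_rate` and the example transcripts below are illustrative assumptions, not the official evaluation.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens to reach an empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (hypothetical helper)."""
    if len(ref) == 0:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Word-level view: tokens are whitespace-separated words (made-up example).
ref_words = "the cat sat".split()
hyp_words = "the tat sat".split()

# Phone-level view: tokens are individual IPA symbols (made-up example).
ref_phones = ["ð", "ə", "k", "æ", "t"]
hyp_phones = ["d", "ə", "t", "æ", "t"]

print("word error rate:", error_rate(ref_words, hyp_words))
print("phone error rate:", error_rate(ref_phones, hyp_phones))
```

The same function serves both views because only the tokenization changes: a word-level model is judged on whole words, while a phone-level model is judged on individual speech sounds, which is what makes it sensitive to how a child pronounced a word rather than only which word was intended.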
The competitions
Word Track
Develop automatic speech recognition models that produce word-level transcriptions of children’s speech.
Phonetic Track
Develop automatic speech recognition models that produce phone-level transcriptions of children’s speech in IPA.