Goodnight Moon, Hello Early Literacy Screening

Help educators provide effective early literacy intervention by scoring audio clips from literacy screeners. #education

$30,000 in prizes

Overview

Literacy—the ability to read, write, and comprehend language—is a fundamental skill that underlies personal development, academic success, career opportunities, and active participation in society. Many children in the United States need more support with their language skills. A national test of literacy in 2022 estimated that 37% of U.S. fourth graders lack basic reading skills. Addressing shortfalls as early as preschool is a promising approach given the strong relationship between early and later childhood literacy.

In order to provide effective early literacy intervention, teachers must be able to reliably identify the students who need support. Currently, teachers across the U.S. are tasked with administering and scoring literacy screeners, which are written or verbal tests that are manually scored following detailed rubrics. Manual scoring not only takes time but can also be unreliable, producing different results depending on who scores the test and how thoroughly they were trained.

Machine learning approaches to scoring literacy assessments can help teachers quickly and reliably identify children in need of early literacy intervention. By advancing state-of-the-art machine learning approaches, there is an opportunity to transform the landscape of literacy screening and intervention in classrooms across the United States.

Your goal in this challenge is to develop a model to score audio recordings from literacy screener exercises completed by students in kindergarten through 3rd grade.

Prizes

In this competition, solvers will train models on anonymized training data to qualify for prizes. Anonymized data is used because the audio clips contain children's voices and the raw data cannot be shared.

The prize finalists are the top 3 teams on the private leaderboard at the end of the competition. Finalists will package their training and inference code, and their models will be retrained on the raw, non-anonymized training data and then evaluated on the raw, non-anonymized test set. Final prize rankings are determined by performance on both test sets: each finalist's final score is a weighted average of the anonymized test set score from the private competition leaderboard (70%) and the raw test set score produced by the model retrained on raw data (30%).
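As a purely illustrative example of that weighting (the scores below are made-up placeholders, not real results), the final score would be computed like this:

    # Hypothetical illustration of the 70/30 weighting described above.
    def final_score(anonymized_score: float, raw_score: float) -> float:
        """Weighted average: 70% anonymized leaderboard score, 30% raw test set score."""
        return 0.7 * anonymized_score + 0.3 * raw_score

    # Example with placeholder scores: 0.7 * 0.80 + 0.3 * 0.70 = 0.77
    print(final_score(0.80, 0.70))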

This competition also includes a bonus prize for explainability write-ups that describe methods to determine where in the audio stream error(s) occur and provide insight into the model's decision-making rationale.


Competition End Date:

Jan. 22, 2025, 11:59 p.m. UTC

Main prizes:
1st place: $12,000
2nd place: $8,000
3rd place: $5,000

Bonus prizes:
1st place: $3,000
2nd place: $2,000

Bonus prize: Explainability and localization write-up

Teams that place in the top 5 on the private leaderboard at the end of the competition will be invited to submit to the bonus round. Participating teams will submit code and an associated write-up (max 1 page of text, max 3 pages with visuals) that use explainability techniques to provide insight into the model's decision-making process. Preference will be given to submissions that include localization approaches to identify the portion of the incorrect audio sample that contains the mistake(s). A bonus prize will be awarded to the two best write-ups as selected by a judging panel of subject matter experts. Submissions will be judged on methodological robustness, insight, localization ability, and clarity of communication.
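As a minimal, non-authoritative sketch of what a localization approach could look like (occlusion analysis is just one possibility, and score_clip is a placeholder for whatever scoring model a team actually builds), one can silence short windows of the waveform and check which window most changes the model's score:

    # Occlusion-based localization sketch (one possible approach, not a requirement).
    # `score_clip` is a placeholder for a team's own function mapping a waveform to a score.
    import numpy as np

    def localize_error(waveform: np.ndarray, sample_rate: int, score_clip, window_s: float = 0.5) -> float:
        """Return the start time (in seconds) of the window whose silencing shifts the score most."""
        base = score_clip(waveform)
        window = int(window_s * sample_rate)
        shifts = []
        for start in range(0, len(waveform) - window + 1, window):
            masked = waveform.copy()
            masked[start:start + window] = 0.0  # silence this window
            shifts.append((abs(score_clip(masked) - base), start / sample_rate))
        return max(shifts)[1] if shifts else 0.0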


How to compete

  1. Click the "Compete!" button in the sidebar to enroll in the competition.
  2. Get familiar with the problem through the overview and problem description. You might also want to reference additional resources available on the about page.
  3. Download the data from the data download page.
  4. Create and train your own model. The benchmark blog post is a good place to start.
  5. Share useful code or find code shared by others in the Community Code section.
  6. Package your model files with the code that makes predictions, following the runtime repository specification on the code submission format page (see the packaging sketch after this list).
  7. Click "Submissions" on the sidebar followed by "Make new submission" to submit your code as a zip archive for containerized execution. You're in!
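
As a rough sketch of the packaging step, the snippet below zips an inference script and model weights into a submission archive. The file names (main.py, assets/model.pt) are hypothetical examples; the actual required layout is defined by the runtime repository specification on the code submission format page.

    # Minimal packaging sketch; file names are hypothetical examples only.
    import zipfile
    from pathlib import Path

    files_to_package = [
        Path("main.py"),          # inference entry point (hypothetical name)
        Path("assets/model.pt"),  # trained model weights (hypothetical name)
    ]

    with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files_to_package:
            zf.write(path, arcname=path.as_posix())  # keep relative paths inside the archive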

The challenge rules are in place to promote fair competition and useful solutions. If you are ever unsure whether your solution meets the competition rules, ask the challenge organizers in the competition forum or send an email to info@drivendata.org.


This challenge is sponsored by the MIT Senseable Intelligence Group, MIT Gabrieli Lab, and FSU Florida Center for Reading Research.



Image courtesy of Reach Every Reader