Problem description
The challenge centers on developing better methods for predicting Alzheimer's disease and Alzheimer's disease related dementias (AD/ADRD) as early as possible. Phase 2 of the challenge, the Build IT! Algorithms and Approaches Acoustic Track, is focused on building innovative models for early detection of AD/ADRD using audio data.
Current methods of screening for AD/ADRD are time intensive and difficult to perform. Models that can flag individuals with a high likelihood of cognitive decline based on vocal characteristics have the potential to enable earlier detection and treatment, and to reduce disparities in care for marginalized groups. Speech data offers a highly cost-effective and noninvasive way to assess cognitive decline.
Overview of the data files provided for this competition:
```
.
├── metadata.csv
├── submission_format.csv
├── test_features.csv
├── train_features.csv
└── train_labels.csv
```
All of the CSV files above can be accessed from the data download page. Participants can access raw audio files through TalkBank.
Prizes, with the exception of community code, will be awarded based on a combination of leaderboard score and model report. For details on the timeline and requirements, see the home page. No prizes will be awarded based on leaderboard score alone. All winners will be required to submit their modeling code. DrivenData will rerun the full model training and inference pipeline to confirm all winners' leaderboard scores.
Data
The feature data in this competition is a series of audio recordings collected from individuals diagnosed with some form of cognitive decline as well as healthy controls.
The data include 2,058 individuals from multiple different studies. Participants have access to ~30-second clips from the raw audio recordings, as well as pre-generated acoustic features. Participants can choose whether to use the audio recordings, the pre-generated features, or both. The focus of this challenge is on acoustic biomarkers, or voice-based features that may signal the presence of cognitive impairment. However, we encourage solvers to explore all possible features, including linguistic and semantic ones.
External data usage: Per the competition rules, external data is not allowed in this competition. However, participants can use pre-trained models as long as they were (1) available freely and openly in that form at the start of the competition and (2) not trained on any data associated with the ground truth data for this challenge.
Metadata
`metadata.csv` provides basic information about each individual in the dataset, and covers both the train and test splits. `metadata.csv` includes the following columns:

- `uid` (str): A unique identifier for the individual. Each row is a unique individual.
- `age` (int): Patient age.
- `gender` (str): Patient gender. In this dataset, only the categories "male" and "female" are included.
- `split` (str): Dataset split, either "train" or "test". There are 1,646 individuals in the train set, and 412 in the test set.
- `hash` (str): Hash of the audio .mp3 file for the individual. This can be useful to verify the integrity of your own downloaded file. Hashes are generated using the MD5 hash function. In Python, the MD5 hash of a file can be generated with `hashlib.md5(file_path.read_bytes()).hexdigest()` using the `hashlib` library. Note that `file_path` must be a path-like object (e.g. `pathlib.Path`), not a string.
- `filesize_kb` (float): Size of the audio .mp3 file for the individual in KB.
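As a quick sketch of putting the `hash` column to use, the snippet below verifies one downloaded file against its recorded MD5 hash. The local layout `train_audios/<uid>.mp3` is an assumption for illustration; adjust it to wherever you extracted the files.

```python
import hashlib
from pathlib import Path

import pandas as pd

metadata = pd.read_csv("metadata.csv")

# Assumed local layout: audio files extracted to train_audios/<uid>.mp3.
row = metadata[metadata["split"] == "train"].iloc[0]
file_path = Path("train_audios") / f"{row['uid']}.mp3"

# read_bytes() requires a path-like object, which is why we use pathlib.Path.
md5 = hashlib.md5(file_path.read_bytes()).hexdigest()
print("OK" if md5 == row["hash"] else "MISMATCH")
```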
Audio files
Raw audio .mp3 files are available for all individuals in both the train and test sets. Each audio recording corresponds to a different individual. Recordings have been diarized and spliced together to minimize interruptions from other speakers. Each recording is 30 seconds or less. Using the raw audio recordings is optional but encouraged. Participants can choose whether to use the audio recordings, the pre-generated features, or both.
The raw audio files are hosted on TalkBank, a repository for open access spoken language data. See the data download page for detailed instructions on how to access the raw audio files (you must be logged in and registered for the competition to see this page).
Three audio file downloads are available:

- `test_audios.zip` (98 MB): A zip file containing all of the audio snippets for the test set. There is one file per individual, and 412 files total.
- `train_audios.zip` (391 MB): A zip file containing all of the audio snippets for the train set. There is one file per individual, and 1,646 files total.
- `train_audios_sample.zip` (3.5 MB): A zip file containing a small sample of 15 audio files, provided for easy download and exploration. All of the files here are also included in `train_audios.zip`.

Within each zip file, recordings are saved under the individual's `uid` from the metadata. For example, the recording for the individual with UID `aazd` is saved within the `test_audios.zip` file as `aazd.mp3`.
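As a minimal sketch of working with the recordings, the snippet below extracts the training archive and loads one clip. It assumes `librosa` is installed with an mp3-capable backend (e.g. ffmpeg), and the paths and target sample rate are illustrative.

```python
import zipfile
from pathlib import Path

import librosa  # requires an mp3-capable backend such as ffmpeg

# Extract all training recordings once (paths are illustrative).
audio_dir = Path("train_audios")
with zipfile.ZipFile("train_audios.zip") as zf:
    zf.extractall(audio_dir)

# Load one recording as a mono waveform; y is the signal, sr the sample rate.
y, sr = librosa.load(audio_dir / "aaop.mp3", sr=16000, mono=True)
print(f"{len(y) / sr:.1f} seconds at {sr} Hz")
```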
Pre-generated acoustic features
Along with raw audio files, participants can use patient metadata and a set of pre-generated acoustic features. Using the pre-generated acoustic features is optional. Participants can choose whether to use the audio recordings, the pre-generated features, or both.
Each row in `train_features.csv` represents a distinct 0.2-second slice of a recording, and is a unique combination of `uid` and `segment_start_sec`.

`train_features.csv` and `test_features.csv` each include the following columns:

- `uid` (str): A unique identifier for the individual.
- `segment_start_sec`, `segment_end_sec` (float): The start and end times of the segment within the patient's full audio recording, in seconds.
- `F0semitoneFrom27.5Hz_sma3nz_amean` to `equivalentSoundLevel_dBp` (float): 88 different pre-generated acoustic features, extracted from each segment using the eGeMAPS V02 parameter set. These features include pitch, formants, speech rate, and other key acoustic markers that may be indicative of cognitive decline. Features were generated using the `opensmile` package. Multiple studies have used this set of acoustic parameters to study detection of Alzheimer's disease (J. Chen, J. Ye, F. Tang, and J. Zhou, 2021; F. Haider, S. de la Fuente, and S. Luz, 2020).
Pre-generated acoustic feature data

The first few rows in `train_features.csv` are:

```
uid,segment_start_sec,segment_end_sec,F0semitoneFrom27.5Hz_sma3nz_amean,...,equivalentSoundLevel_dBp
aaop,0.0,0.2,38.605347,...,-15.069119
aaop,0.1,0.3,15.119725,...,-15.477599
```

The first few rows all describe snippets of the audio recording for individual `aaop`. The first row contains features computed over seconds 0 to 0.2, the second row features computed over seconds 0.1 to 0.3, and so on.
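Features in this form can be reproduced, at least approximately, with the `opensmile` package. A minimal sketch for one 0.2-second window is below; the file path is illustrative, and any extraction settings beyond the eGeMAPS V02 parameter set are assumptions.

```python
import opensmile

# eGeMAPSv02 functionals: 88 summary features per processed window.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Features for seconds 0.0-0.2 of one recording (path is illustrative).
features = smile.process_file("train_audios/aaop.mp3", start=0.0, end=0.2)
print(features.shape)  # (1, 88)
```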
Solutions should adhere to the following rules while developing model features:

1. Participants may annotate the provided training data, as long as the annotations are included with solutions to enable reproduction and do not overfit to the test set.
2. Participants may not add any manual annotations to the provided test data. Eligible solutions must be able to run on test samples automatically, using the test data as provided.
3. Each test sample must be processed independently during inference, without using information from other cases in the test set. As a result, running the model training code with the same training data but a different test set (or no test data at all) should produce the same model weights and fitted feature parameters.
For more context and to read examples, please see the announcement published on December 5, 2024.
Labels
The target variable is the cognitive status of each individual. There are three possible diagnoses:
- Control: Healthy individual, aging in a typical way
- MCI: Mild cognitive impairment (MCI). While not everyone who has MCI develops dementia, MCI is a useful indicator of risk that supports early detection of AD/ADRD.
- ADRD: A diagnosis of advanced decline. This includes primary progressive aphasia (PPA), probable AD, and AD. Note that PPA is distinct from Alzheimer's; the two are grouped in this competition because both are neurodegenerative conditions that represent a form of advanced decline and share many symptoms.
`train_labels.csv` includes the following columns:

- `uid` (str): Unique identifier for the individual. Each row is one individual.
- `diagnosis_control` (float, 0.0 or 1.0): Whether the individual is a healthy control.
- `diagnosis_mci` (float, 0.0 or 1.0): Whether the individual was diagnosed with mild cognitive impairment.
- `diagnosis_adrd` (float, 0.0 or 1.0): Whether the individual was diagnosed with advanced decline (primary progressive aphasia, probable AD, or AD).

In each row, exactly one of `diagnosis_control`, `diagnosis_mci`, or `diagnosis_adrd` will be equal to 1.
Labelled training data example

The first row in `train_labels.csv` is:

| uid | diagnosis_control | diagnosis_mci | diagnosis_adrd |
|---|---|---|---|
| aaop | 0.0 | 1.0 | 0.0 |

`aaop` has a diagnosis of mild cognitive impairment.
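Because the labels are one-hot encoded, it can be convenient to collapse them into a single diagnosis column. A minimal sketch, assuming the file has been downloaded locally:

```python
import pandas as pd

labels = pd.read_csv("train_labels.csv", index_col="uid")

# Collapse the one-hot columns into a single diagnosis label per individual.
diagnosis = labels.idxmax(axis=1).str.removeprefix("diagnosis_")
print(diagnosis.head())          # e.g. aaop -> "mci"
print(diagnosis.value_counts())  # class balance across the train set
```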
The cognitive status of each individual was diagnosed based on interviews, cognitive tests, and language-based tasks, including verbal fluency, sentence construction, picture descriptions, story recall, and conversational interactions. These screenings are time intensive and difficult to perform. The ability to automatically assess an individual based on their vocal characteristics would save clinicians time and improve the availability of cognitive screening.
Submission format
The format for submission is a .csv with the following columns:

- `uid` (str): Unique identifier for the individual. Each row should be one individual.
- `diagnosis_control` (float): Probability between 0 and 1 that the individual is a healthy control.
- `diagnosis_mci` (float): Probability between 0 and 1 that the individual was diagnosed with mild cognitive impairment.
- `diagnosis_adrd` (float): Probability between 0 and 1 that the individual was diagnosed with advanced decline (primary progressive aphasia, probable AD, or AD).

In each row, `diagnosis_control`, `diagnosis_mci`, and `diagnosis_adrd` must sum to 1.
To create a submission, download `submission_format.csv` and replace the placeholder values with your predictions for the test samples.
For example, if the first row of your predictions is:

```
uid,diagnosis_control,diagnosis_mci,diagnosis_adrd
aazd,0.0,0.3,0.7
```

then you are predicting that there is a 30% chance that individual `aazd` was diagnosed with MCI, and a 70% chance that they were diagnosed with more advanced cognitive decline.
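A minimal sketch of assembling a valid file, starting from the provided format so the rows and columns are guaranteed to match (the uniform probabilities stand in for real model output):

```python
import pandas as pd

submission = pd.read_csv("submission_format.csv", index_col="uid")

# Placeholder: equal probability for each class; replace with model predictions.
submission[["diagnosis_control", "diagnosis_mci", "diagnosis_adrd"]] = 1 / 3

# Each row must sum to 1; sanity-check before saving.
assert (submission.sum(axis=1) - 1.0).abs().max() < 1e-6
submission.to_csv("submission.csv")
```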
Performance metric
Leaderboard performance is evaluated using multi-class log-loss. This is an error metric, so a lower value is better.
$$\text{Log loss} = -\frac{1}{N}\cdot\sum\limits_{n=1}^{N}\sum\limits_{m=1}^{M} y_{nm}\log p_{nm}$$
- $N$ is the number of observations
- $M$ is the number of classes (in this case $M=3$)
- $y_{nm}$ is 1 if label $m$ applies to observation $n$, and 0 otherwise
- $p_{nm}$ is the user-predicted probability that label $m$ applies to observation $n$

In Python, you can calculate log loss using the scikit-learn function `sklearn.metrics.log_loss`.
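A toy illustration of the metric, with made-up probabilities for two hypothetical individuals:

```python
from sklearn.metrics import log_loss

# One-hot true labels in the order [control, mci, adrd], plus
# predicted probabilities for two hypothetical individuals.
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]

print(log_loss(y_true, y_pred))  # lower is better
```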
Note that the public leaderboard displayed while the competition is running may not use the same subset of test data as the final leaderboard displayed after submissions close. Prizes will be based on a combination of final leaderboard score and model reports. No prize depends on leaderboard score alone. Winners will be required to submit their modeling code to verify their leaderboard score and adherence to the competition rules.
Competition arenas
The competition will be conducted in two stages, each with its own arena:
| Model Arena | Report Arena (Model Arena finalists only) |
|---|---|
| Oct 22 - Dec 19, 2024 | Dec 20, 2024 - Jan 22, 2025 |
| Participants submit predictions in the Model Arena, and scores are displayed on the public leaderboard. | The top 15 leaderboard finalists from each Model Arena (Social Determinants Track and Acoustic Track) who confirm eligibility are invited to the pre-screened Report Arena. |
Report Arena evaluation
All prizes will be determined based on a combination of leaderboard performance and report quality. Reports will be judged by a panel of experts from the NIA.
Participants can submit two types of reports, each eligible for different prizes:
- Model Reports focus on generating a deeper understanding of the predictions within a medical context, including the global explainability of the model and its performance across different demographic groups. Model reports will be judged based on the following evaluation criteria:
- Model performance and methodology (40%)
- Insights and innovation (20%)
- Bias exploration and mitigation (20%)
- Generalizability (10%)
- Clarity and communication (10%)
- Explainability Reports help patients or care providers interpret and understand a prediction for a given individual.
Participants are not required to submit both reports to be considered. After the Model Arena has closed, additional data, including the test set, will be released to support model analysis. Detailed instructions for submitting reports will be shared with finalists in the pre-screened Report Arena at a later date.
Good luck
Not sure where to start? Check out the "How to compete" section on the homepage.
Good luck and enjoy the challenge! If you have any questions you can always visit the user forum.