PREPARE Challenge: Data for Early Prediction (Phase 1)

Find, curate, or contribute data to help the National Institute on Aging, part of the National Institutes of Health, create representative and open datasets that can be used for the early prediction of Alzheimer's disease and related dementias.

$200,000 in prizes
Jan 2024
376 joined

Ideas for Data Collection

In addition to the main challenge, solvers are invited to propose new data collection that will result in an open, shareable dataset that can support novel machine learning approaches for early prediction of AD/ADRD with an emphasis on addressing biases in existing data sources. Solvers who propose ideas for new datasets will be eligible to win prizes from a smaller pool of $25,000.

Unlike in the main challenge where solvers submit existing data, this alternate track invites solvers to propose an idea for future data collection efforts that the solvers intend to undertake. The proposed data need not be ready by the end of the challenge, and indeed, more ambitious proposals may take much longer to collect. The proposed data must include a target variable that validly indicates AD/ADRD, along with predictor variables that can be used as input features for predicting the target variable. Visit the Problem description for more about the problem framing.

Submission format

Proposal submissions must include a data collection proposal section and a team introduction section (subheadings encouraged but optional):

Data collection proposal (maximum 4 pages, excluding references)

  • Background: Clearly articulate your understanding of the problem and goals of the challenge, and how the proposed dataset will address those problems.
  • Basic information: Summarize the proposed activities and describe the resulting dataset, including the data sources and relevant study methodology, and which if any parts of the dataset already exist. The description of the resulting dataset should include its sample size (or target sample size, with justification), definitions of the proposed target variable(s) and proposed predictor variables, the amount of time required to complete activities, and major challenges of the proposed plan.
  • Utility & rigor: Provide a definition of the proposed target variable(s), information about its measurement and its distribution in the sample. Present evidence that the target reliably and validly indicates AD/ADRD. Describe the predictor variables in the dataset and explain any theoretical or empirical links to AD/ADRD. Include any potential for validation or generalization to other data.
  • Innovation: Describe to what extent the proposed data push forward the state of the field and present novel insights and directions for further research by, for example, enabling higher accuracy, earlier predictions, lower cost, and/or greater accessibility. Describe other similar datasets, if any, and how the proposed dataset provides advancements over existing ones. Explain the potential of machine learning on the data to improve early AD/ADRD prediction.
  • Sample characteristics and representation: Describe how the proposed activities and resulting data would address current biases in research and diagnosis of AD/ADRD in populations disproportionately impacted by AD/ADRD. Include relevant information about the sampling approach, such as whether participants will be compensated for participation, the geographic location of participants, how participants will be contacted and recruited, and any other aspects of the study methodology designed to enhance sample representativeness.
  • Feasibility: Describe the major challenges of the proposed plan and strategies to mitigate them. Describe the experiences and abilities of the team to complete the proposed data collection with or without additional support.

Team Introduction (maximum 1 page): Describe the submitting team, including members' roles and expertise with respect to the submission. For example, this may include information about a team member’s role on the project, job title, career stage, institutional affiliation of the team members, and relevant education, training, and professional or personal experiences.

Written submissions must:

  • Consist of a single PDF file with page size set to 8.5” x 11” and at least 1-inch margins.
  • Use a font no smaller than 11-point Arial and line spacing no less than 1.0.
  • Be written in English.
  • Not use the HHS logo or official seal, or the logo of NIH or NIA, and not claim federal government endorsement.

Data collection proposals must be uploaded to the Ideas for data submission page by the Final submission due date, January 31, 2024 at 11:59:59 PM UTC.

A template for the data collection proposal containing the required sections and descriptions of each section is provided on the Data downloads page.

Evaluation criteria

Entries that are responsive and comply with the entry requirements will be scored by technical reviewers in accordance with the criteria outlined below.

Utility & Rigor - Predictor(s) (20%): What is the potential for the proposed predictor data to provide useful signal for early prediction of AD/ADRD? What are the benefits of using this information beyond what exists today (e.g., wider population, gaps in coverage, earlier identification, lower cost)?

Utility & Rigor - Target(s) (20%): How well defined is the proposed AD/ADRD target variable? How well do applicants justify their choice of a target variable? How well do applicants demonstrate links between the predictors and the target variable(s)?

Innovation (20%): To what extent do the proposed data push forward the state of the field and present novel insights and directions for further research?

Disproportionate Impact (20%): To what extent do the proposed data help address current biases in research and diagnosis of AD/ADRD in populations disproportionately impacted by AD/ADRD?

Feasibility (20%): How feasible is the data collection proposal? How well positioned is the team to successfully complete the proposed approach with additional support?
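The five criteria above carry equal weights of 20%. As a rough illustration of how equal weighting works, the sketch below combines per-criterion scores into a single total. The 0-10 scoring scale, the criterion keys, and the weighted_score function are assumptions made for this example only; the challenge does not specify the scale or the exact aggregation the reviewers use.

```python
# Weights taken from the evaluation criteria above; they sum to 100%.
# The dictionary keys are hypothetical labels chosen for this example.
WEIGHTS = {
    "utility_rigor_predictors": 0.20,
    "utility_rigor_targets": 0.20,
    "innovation": 0.20,
    "disproportionate_impact": 0.20,
    "feasibility": 0.20,
}


def weighted_score(scores):
    """Combine per-criterion scores into one weighted total.

    `scores` maps each criterion label to a number on an assumed 0-10 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical reviewer scores for one proposal (not real data).
example = {
    "utility_rigor_predictors": 8,
    "utility_rigor_targets": 7,
    "innovation": 6,
    "disproportionate_impact": 9,
    "feasibility": 5,
}

print(f"Weighted total: {weighted_score(example):.2f} out of 10")
```

Because every criterion has the same weight, the total is simply the average of the five scores, so a weak score in any one area costs as much as a weak score in any other.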

Good luck

Good luck and have fun engaging with this challenge! If you have any questions, send an email to the challenge organizers at info@drivendata.org or post on the forum!