Competition timeline

There are two phases for this competition. PHASE I runs like any other competition: your predictions are scored against a held-out set of true values (in this case, the results of the 2016 election). During PHASE I, submit your predictions using submission-format-2016.csv, and your score will appear on the leaderboard.

However, the final leaderboard is not determined by your 2016 predictions. It is determined in PHASE II by your submission for 2020, which you must mark as "Final Evaluation Submission" in the submissions dialog. You will need to use submission-format-2020.csv for this submission.

Timeline stages: Predict 2016, Predict 2020, Evaluate Predictions

Submit 2016 Predictions	Submit 2020 Predictions	Election Day	Submissions Evaluated
Launch - 11/3/2020	Launch - 11/3/2020	11/3/2020	11/3/2020 - Returns Finalized
Train your models on 2016 data and submit predictions for the leaderboard.	Submit your predictions for 2020. You can make your submission at any point before Election Day.	You must have your predictions submitted by this date.	We update the scores as election results come in. Scores will change until the vote counts are finalized. The final winner is determined based on the 2020 election results.

Data

We don't provide any one set of feature data that you must use. Instead, you can use any publicly available data to make your predictions. For some suggestions, check out the resources page!

The Metric

Submissions will be scored by root mean squared error (RMSE), which quantifies how far your predicted vote shares are from the true vote shares. This metric is implemented in scikit-learn as mean_squared_error, but you'll need to set the squared parameter to False. See the submissions page for more information.
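To make the metric concrete, here is a minimal sketch of RMSE in plain Python, so it does not depend on any particular scikit-learn version. It assumes the error is computed over every state/candidate value flattened into one list; the exact aggregation used by the leaderboard is not stated here, so treat that as an assumption. The vote shares below are made-up illustrative numbers, not real results. Note that newer scikit-learn releases are understood to deprecate the squared parameter in favor of a separate root_mean_squared_error function, so check your installed version.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences of floats."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical vote shares for two states, flattened as
# [state1_candidate_a, state1_candidate_b, state1_other, state2_...].
actual = [0.40, 0.50, 0.10, 0.60, 0.30, 0.10]
# The uniform placeholder values used in the submission format examples.
predicted = [0.33, 0.33, 0.33, 0.33, 0.33, 0.33]

score = rmse(actual, predicted)
print(f"RMSE: {score:.4f}")
```

A lower score is better, and a perfect set of predictions would score 0.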

Submission Format 2016

The goal is to predict the fraction of the vote that each candidate receives in each state. Fractions should be represented as float values between 0 and 1. You can see an example of the format that your submission must conform to (including headers and row names) in submission-format-2016.csv.

Submission values

Again, you must submit vote fractions between 0 and 1 for each state.

state_abbreviation	Clinton	Trump	Other
AK	0.33	0.33	0.33
AL	0.33	0.33	0.33
AR	0.33	0.33	0.33
AZ	0.33	0.33	0.33
CA	0.33	0.33	0.33

As a CSV, those submissions might look like:

state_abbreviation,Clinton,Trump,Other
AK,0.33,0.33,0.33
AL,0.33,0.33,0.33
AR,0.33,0.33,0.33
AZ,0.33,0.33,0.33
CA,0.33,0.33,0.33
CO,0.33,0.33,0.33
CT,0.33,0.33,0.33
DC,0.33,0.33,0.33
DE,0.33,0.33,0.33
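As a rough illustration of producing a file in this shape, the sketch below writes the uniform 0.33 placeholder values using Python's standard csv module. The function name write_submission and the three-state list are hypothetical and only for illustration; a real submission should contain every row found in submission-format-2016.csv.

```python
import csv
import io

# Column names taken from the example header above.
CANDIDATES = ["Clinton", "Trump", "Other"]
# Truncated list for illustration; use every state in the format file.
STATES = ["AK", "AL", "AR"]


def write_submission(out, predictions):
    """Write {state: {candidate: share}} to a file-like object as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["state_abbreviation"] + CANDIDATES)
    for state in sorted(predictions):
        shares = predictions[state]
        writer.writerow([state] + [shares[c] for c in CANDIDATES])


# Placeholder predictions: the same share for every candidate in every state.
uniform = {state: {c: 0.33 for c in CANDIDATES} for state in STATES}

buffer = io.StringIO()
write_submission(buffer, uniform)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("my-submission.csv", "w", newline="") in place of the in-memory buffer (the file name is arbitrary).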

Submission Format 2020

The goal is to predict the fraction of the vote that each candidate receives in each state. Fractions should be represented as float values between 0 and 1. You can see an example of the format that your submission must conform to (including headers and row names) in submission-format-2020.csv.

Submission values

Again, you must submit vote fractions between 0 and 1 for each state.

state_abbreviation	Biden	Trump	Other
AK	0.33	0.33	0.33
AL	0.33	0.33	0.33
AR	0.33	0.33	0.33
AZ	0.33	0.33	0.33
CA	0.33	0.33	0.33

As a CSV, those submissions might look like:

state_abbreviation,Biden,Trump,Other
AK,0.33,0.33,0.33
AL,0.33,0.33,0.33
AR,0.33,0.33,0.33
AZ,0.33,0.33,0.33
CA,0.33,0.33,0.33
CO,0.33,0.33,0.33
CT,0.33,0.33,0.33
DC,0.33,0.33,0.33
DE,0.33,0.33,0.33
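Before uploading, it can help to check a file against the rules stated above: the header must match and every value must be a number between 0 and 1. The sketch below is one possible check, not the platform's own validator; validate_submission is a hypothetical helper, and it does not verify that every required state is present.

```python
import csv
import io

# Header taken from the 2020 example above.
EXPECTED_HEADER = ["state_abbreviation", "Biden", "Trump", "Other"]


def validate_submission(csv_text):
    """Return a list of problems found; an empty list means none were detected."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header does not match " + ",".join(EXPECTED_HEADER)]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected {len(EXPECTED_HEADER)} fields")
            continue
        for name, value in zip(EXPECTED_HEADER[1:], row[1:]):
            try:
                share = float(value)
            except ValueError:
                problems.append(f"line {line_no}: {name} is not a number")
                continue
            # NaN also fails this comparison, so it is reported as out of range.
            if not 0.0 <= share <= 1.0:
                problems.append(f"line {line_no}: {name}={share} is outside [0, 1]")
    return problems


good = "state_abbreviation,Biden,Trump,Other\nAK,0.33,0.33,0.33\n"
bad = "state_abbreviation,Biden,Trump,Other\nAK,1.2,0.33,abc\n"

print(validate_submission(good))
print(validate_submission(bad))
```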

Good luck!

If you want to get started, check out our benchmark blog post, which will walk through making predictions and submissions for 2016 and 2020.

Good luck and enjoy this problem! If you have any questions, you can always visit the user forum!

America's Next Top (Statistical) Model - 2020

Quick Facts

Winner: Noriega-Santoyo-Castillo
