America's Next Top (Statistical) Model - 2020

US presidential elections come but once every 4 years, and this one's a big one. There are lots of people trying to predict what will happen. Can you top them? #civic

advanced practice
dec 2020

Problem description

Competition timeline


There are two phases for this competition. PHASE I runs like every other competition: your predictions are scored against held-out true values (in this case, the results of the 2016 election). During PHASE I, submit your predictions using submission-format-2016.csv and your score will appear on the leaderboard.

However, the final leaderboard is not determined by your 2016 predictions. It is determined in PHASE II by the 2020 submission that you mark as "Final Evaluation Submission" in the submissions dialog. You will need to use submission-format-2020.csv for this submission.


Phases: Predict 2016, Predict 2020, Evaluate Predictions

Submit 2016 Predictions (Launch - 11/3/2020): Train your models on 2016 data and submit predictions for the leaderboard.
Submit 2020 Predictions (Launch - 11/3/2020): Submit your predictions for 2020. You can make your submission at any point.
Election Day (11/3/2020): You must have your predictions submitted.
Submissions Evaluated (11/3/2020 - Returns Finalized): We update the scores as election results come in. Scores will change until the vote counts are finalized. Final winner is determined based on 2020 election results.

Data


We don't provide any one set of feature data that you must use. Instead, you can use any publicly available data to make your predictions. For some suggestions, check out the resources page!

The Metric


Submissions are scored by root mean squared error (RMSE), which quantifies how far your predicted vote shares are from the true vote shares; lower is better. This metric is implemented in scikit-learn as mean_squared_error, but you'll need to set the squared parameter to False. See the submissions page for more information.
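To make the metric concrete, here is a minimal sketch of RMSE using only the Python standard library. The function name rmse and the sample numbers are illustrative assumptions, not real election results. The sketch also assumes every predicted value is pooled into one flat list; how the platform aggregates across states and candidates is described on the submissions page, not here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences of floats."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (hypothetical shares for one state).
true_shares = [0.50, 0.45, 0.05]
predicted_shares = [0.40, 0.50, 0.10]
score = rmse(true_shares, predicted_shares)
```

A perfect prediction scores 0, and larger misses are penalized more than proportionally because errors are squared before averaging. Recent scikit-learn releases also provide root_mean_squared_error, which returns the same quantity without the squared parameter.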

Submission Format 2016


The goal is to predict the fraction of votes that each candidate receives in each state. Fractions should be represented as float values between 0 and 1. You can see an example of the format that your submission must conform to (including headers and row names) in submission-format-2016.csv.

Submission values

Again, you must submit vote fractions between 0 and 1 for each state.
state_abbreviation  Clinton  Trump  Other
AK                  0.33     0.33   0.33
AL                  0.33     0.33   0.33
AR                  0.33     0.33   0.33
AZ                  0.33     0.33   0.33
CA                  0.33     0.33   0.33
CA 0.33 0.33 0.33

As a CSV, those submissions might look like:

state_abbreviation,Clinton,Trump,Other
AK,0.33,0.33,0.33
AL,0.33,0.33,0.33
AR,0.33,0.33,0.33
AZ,0.33,0.33,0.33
CA,0.33,0.33,0.33
CO,0.33,0.33,0.33
CT,0.33,0.33,0.33
DC,0.33,0.33,0.33
DE,0.33,0.33,0.33
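
As a rough illustration of producing a file in this shape, the sketch below writes the same placeholder rows with Python's csv module. The helper name write_uniform_submission and the 0.33 placeholder are assumptions for illustration. The state list covers only the rows shown above, so take the full set of rows from submission-format-2016.csv.

```python
import csv
import io

CANDIDATES = ["Clinton", "Trump", "Other"]
# Only the states shown in the example above; the real file lists more rows.
STATES = ["AK", "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE"]


def write_uniform_submission(out, states, candidates, value=0.33):
    """Write a header row plus one row per state, all with the same value."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["state_abbreviation", *candidates])
    for state in states:
        writer.writerow([state, *[f"{value:.2f}" for _ in candidates]])


# Write to an in-memory buffer; pass an open file object to save to disk.
buffer = io.StringIO()
write_uniform_submission(buffer, STATES, CANDIDATES)
submission_text = buffer.getvalue()
```

Replacing the constant with your model's per-state predictions gives a real submission; the header and row names must still match the provided format file.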

Submission Format 2020


The goal is to predict the fraction of votes that each candidate receives in each state. Fractions should be represented as float values between 0 and 1. You can see an example of the format that your submission must conform to (including headers and row names) in submission-format-2020.csv.

Submission values

Again, you must submit vote fractions between 0 and 1 for each state.
state_abbreviation  Biden  Trump  Other
AK                  0.33   0.33   0.33
AL                  0.33   0.33   0.33
AR                  0.33   0.33   0.33
AZ                  0.33   0.33   0.33
CA                  0.33   0.33   0.33

As a CSV, those submissions might look like:

state_abbreviation,Biden,Trump,Other
AK,0.33,0.33,0.33
AL,0.33,0.33,0.33
AR,0.33,0.33,0.33
AZ,0.33,0.33,0.33
CA,0.33,0.33,0.33
CO,0.33,0.33,0.33
CT,0.33,0.33,0.33
DC,0.33,0.33,0.33
DE,0.33,0.33,0.33
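
Before uploading, it can help to sanity-check a file against the rules above. The sketch below is one possible local check, not the platform's validator. The function name check_submission is hypothetical, and it only verifies the header, the column count, and that each value parses as a float between 0 and 1.

```python
import csv
import io

EXPECTED_HEADER = ["state_abbreviation", "Biden", "Trump", "Other"]


def check_submission(text, expected_header=EXPECTED_HEADER):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != expected_header:
        problems.append(f"header should be {expected_header}")
        return problems
    # Data rows start on line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected_header):
            problems.append(f"line {line_no}: expected {len(expected_header)} columns")
            continue
        for name, cell in zip(expected_header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                problems.append(f"line {line_no}: {name} is not a number")
                continue
            if not 0.0 <= value <= 1.0:
                problems.append(f"line {line_no}: {name}={value} is outside [0, 1]")
    return problems
```

For example, check_submission(open("my-2020-predictions.csv").read()) would return an empty list for a well-formed file, where the filename is a placeholder. The check does not confirm that every required state is present, so compare row names against submission-format-2020.csv as well.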

Good luck!


If you want to get started, check out our benchmark blog post, which will walk through making predictions and submissions for 2016 and 2020.

Good luck and enjoy this problem! If you have any questions you can always visit the user forum!