United Nations Millennium Development Goals

The UN's Millennium Development Goals provide the big-picture perspective on international development. Using indicators aggregated and collected by the World Bank, try to predict progress towards select MDGs. #development

intermediate practice
Completed Feb 2021
3,270 joined

Since its founding in 1944, the World Bank has been gathering data to help it alleviate poverty by focusing on foreign investment, international trade, and capital investment. The World Bank provides these data to the public through its data portal.

Training data

We've aggregated the World Bank's data from 1972 to 2007 on over 1,200 macroeconomic indicators in 214 countries around the world. A random sample of rows is shown below. Each row represents a timeseries for a specific indicator and country. The row has an id, a country name, a series code, a series name, and data for the years 1972-2007.

        1972 [YR1972]  ...  2007 [YR2007]  Series Name                                        Country Name  Series Code
 97510            NaN  ...             19  Time to export (days)                              Ghana         IC.EXP.DURS
 16297            NaN  ...              0  Currency composition of PPG debt, Pound sterli...  Azerbaijan    DT.CUR.UKPS.ZS
 34357    186000.0000  ...              0  PPG, bonds (TDS, current US$)                      Botswana      DT.TDS.PBND.CD
126538            NaN  ...             54  Newborns protected against tetanus (%)             Jamaica       SH.VAC.TTNS.ZS
 30573            NaN  ...            NaN  Secondary education, teachers                      Bhutan        SE.SEC.TCHR
126818       107.3836  ...            NaN  School enrollment, primary, male (% gross)         Jamaica       SE.PRM.ENRR.MA
101060            NaN  ...            NaN  Net bilateral aid flows from DAC donors, New Z...  Grenada       DC.DAC.NZLL.CD
 18552            NaN  ...            NaN  Self-employed, total (% of total employed)         Bahamas, The  SL.EMP.SELF.ZS
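
A wide layout like this is often easier to work with once the year columns are separated from the descriptive columns. The sketch below is only an illustration: it builds a tiny stand-in frame from two of the sample rows above (with just two of the year columns) rather than reading the real file, and the names `train`, `tidy`, `row_id`, and `year_of` are our own assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the training data, using two rows from the sample above.
# The real file has one column per year from 1972 to 2007.
train = pd.DataFrame(
    {
        "1972 [YR1972]": [np.nan, 186000.0],
        "2007 [YR2007]": [19.0, 0.0],
        "Series Name": ["Time to export (days)", "PPG, bonds (TDS, current US$)"],
        "Country Name": ["Ghana", "Botswana"],
        "Series Code": ["IC.EXP.DURS", "DT.TDS.PBND.CD"],
    },
    index=[97510, 34357],
)

# Year columns look like "1972 [YR1972]"; everything else is metadata.
year_cols = [c for c in train.columns if "[YR" in c]
meta_cols = [c for c in train.columns if c not in year_cols]


def year_of(col: str) -> int:
    """Turn a column label such as '1972 [YR1972]' into the integer 1972."""
    return int(col.split()[0])


# Reshape to one row per (timeseries, year) and drop the missing observations.
tidy = (
    train.rename_axis("row_id")
    .reset_index()
    .melt(
        id_vars=["row_id"] + meta_cols,
        value_vars=year_cols,
        var_name="year_col",
        value_name="value",
    )
)
tidy["year"] = tidy["year_col"].map(year_of)
tidy = tidy.dropna(subset=["value"])
```

The long format makes it straightforward to group by country or by series code, at the cost of a much taller table.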

A word of warning: It's hard to reliably gather these data all over the world for such a long period of time. There is a fair amount of missing data in the training dataset, and competitors will have to devise strategies for dealing with it. Missing values are labeled with NaN.
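
As a starting point for thinking about missing values, the sketch below measures how sparse each timeseries is and tries two simple fills: carrying the last observation forward and linear interpolation between known points. The frame `years` and its values are made up for illustration, and neither fill is suggested as the right strategy for this problem.

```python
import numpy as np
import pandas as pd

# Made-up year columns for three timeseries (values are illustrative only).
years = pd.DataFrame(
    {
        "2004 [YR2004]": [np.nan, 10.0, np.nan],
        "2005 [YR2005]": [1.0, np.nan, np.nan],
        "2006 [YR2006]": [np.nan, np.nan, np.nan],
        "2007 [YR2007]": [3.0, 16.0, np.nan],
    },
    index=[559, 618, 753],
)

# Share of missing years per timeseries (0.0 = complete, 1.0 = empty).
missing_share = years.isna().mean(axis=1)

# Option 1: carry the last known value forward along each row.
filled_ffill = years.ffill(axis=1)

# Option 2: linear interpolation, only between known points in each row.
filled_linear = years.interpolate(axis=1, limit_area="inside")
```

Neither option invents values before the first observation, and a row with no observations at all stays empty either way.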

Prediction data and submission format

We're not interested in predicting all of these timeseries, just the ones that are relevant to the Millennium Development Goals. A set of indicators from the World Bank dataset represents our progress towards these goals. We're withholding the names and codes of the World Bank indicators we want to predict, since the data are readily available publicly. However, these all have a series code labeled with the goal and sub-goal we are interested in (e.g., 1.2 or 3.1).

We've also taken the subset of predictions that we can reliably make. First, we kept only goals for which we have true values in the forecast years (2008 and 2012) to compare predictions against. We also removed rows that didn't have any data before 2008, since these would be impossible to predict.
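
The second filter can be pictured with a short sketch. This is not the organizers' code: the frame `rows` and its values are invented, and the check simply keeps rows that have at least one observation in a year column before 2008.

```python
import numpy as np
import pandas as pd

# Invented example rows; only the column naming follows the dataset.
rows = pd.DataFrame(
    {
        "2006 [YR2006]": [0.41, np.nan, np.nan],
        "2007 [YR2007]": [np.nan, np.nan, 0.73],
        "2008 [YR2008]": [0.45, 0.10, np.nan],
    },
    index=[559, 618, 753],
)


def year_of(col: str) -> int:
    """Turn a column label such as '2007 [YR2007]' into the integer 2007."""
    return int(col.split()[0])


# Columns that count as history, i.e. strictly before the first forecast year.
history_cols = [c for c in rows.columns if "[YR" in c and year_of(c) < 2008]

# Keep rows with at least one non-missing value in the history columns.
has_history = rows[history_cols].notna().any(axis=1)
predictable = rows[has_history]
```

Here row 618 is dropped because its only value falls in 2008, which leaves nothing to forecast from.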

We provide a set of the labels (along with matching indices) for which we want predictions. You should add two columns to this file with your predictions for the years 2008 and 2012, as shown below.

      2008 [YR2008]  2012 [YR2012]
 559            NaN            NaN
 618            NaN            NaN
 753            NaN            NaN
1030            NaN            NaN
1896            NaN            NaN
1955            NaN            NaN
2090            NaN            NaN
2690            NaN            NaN
3233            NaN            NaN
3292            NaN            NaN
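
To make the format concrete, here is a minimal sketch of filling in those two columns with a deliberately naive baseline: the last observed value of each timeseries is reused for both 2008 and 2012. The frames and values are made up, the variable names are our own, and the baseline is an assumption for illustration rather than a recommended model.

```python
import numpy as np
import pandas as pd

# Made-up history for three timeseries, indexed by row id.
# Year columns must be in chronological order for the forward fill below.
train = pd.DataFrame(
    {
        "2006 [YR2006]": [0.5, np.nan, 0.9],
        "2007 [YR2007]": [np.nan, 0.2, 0.8],
    },
    index=[559, 618, 753],
)

# Empty submission frame: the row ids we are asked to predict.
submission = pd.DataFrame(
    {"2008 [YR2008]": np.nan, "2012 [YR2012]": np.nan},
    index=[559, 618],
)

# Last observed value per requested row: forward-fill, then take the final column.
year_cols = [c for c in train.columns if "[YR" in c]
last_obs = train.loc[submission.index, year_cols].ffill(axis=1).iloc[:, -1]

# Naive baseline: assume no change after the last observation.
submission["2008 [YR2008]"] = last_obs
submission["2012 [YR2012]"] = last_obs

# CSV text with the row ids in the first column.
csv_text = submission.to_csv()
```

Aligning on the row ids, rather than on row position, keeps the predictions matched to the right timeseries even if the two files are ordered differently.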

Good luck!

Looking for a great tutorial to get you started? Check out the notebook walkthrough created for this challenge.

Good luck and enjoy this problem! If you have any questions you can always visit the user forum!