Reboot: Box-Plots for Education

We're rebooting our first prize competition for fun and education! Tag school budgets automatically to help districts get a better grasp of their spending and how to improve the impact of their scarce resources. #education

intermediate practice
dec 2022

A New Look at our First Competition

We're relaunching our first competition as a way for students to learn how machine learning competitions work and, most importantly, what methods win a DrivenData competition! We're posting the data and problem so that budding data scientists can try their hand at this unique data set with a compelling use case.


Budgets for schools and school districts are huge, complex, and unwieldy. It's no easy task to digest where and how schools are using their resources. Education Resource Strategies is a non-profit that tackles just this task with the goal of letting districts be smarter, more strategic, and more effective in their spending.

Your task is a multi-class, multi-label classification problem: attach canonical labels to the freeform text in budget line items, where each line item receives a label in several categories and each category has its own set of possible values. These labels let ERS understand how schools are spending money and tailor their strategy recommendations to improve outcomes for students, teachers, and administrators.

About the competition

In order to compare budget or expenditure data across districts, ERS assigns every line item to certain categories in a comprehensive financial spending framework. For instance, Object_Type describes what the spending "is"—Base Salary/Compensation, Benefits, Stipends & Other Compensation, Equipment & Equipment Lease, Property Rental, and so on. Other categories describe what the spending "does," which groups of students benefit, and where the funds come from.
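To make the label structure concrete, here is a minimal sketch in Python of how one line item could carry a label in more than one category, and how those labels could be flattened into a 0/1 indicator vector. Only Object_Type and its values come from the description above; the "Function" category, its values, the sample line item text, and the helper names are hypothetical, chosen purely for illustration. This is one common way to represent such labels, not necessarily the format the competition uses.

```python
# Illustrative only: Object_Type and its values come from the text above.
# The "Function" category, its values, and the sample line item are hypothetical.
CATEGORIES = {
    "Object_Type": [
        "Base Salary/Compensation",
        "Benefits",
        "Stipends & Other Compensation",
        "Equipment & Equipment Lease",
        "Property Rental",
    ],
    "Function": ["Instruction", "Facilities & Maintenance"],  # hypothetical
}


def indicator_columns(categories):
    """Return one column name per (category, label) pair, in a fixed order."""
    return [
        f"{category}__{label}"
        for category, labels in categories.items()
        for label in labels
    ]


def encode(labels, categories):
    """Encode one line item's labels as a 0/1 vector with exactly one 1 per category."""
    vector = []
    for category, options in categories.items():
        if labels[category] not in options:
            raise ValueError(f"unknown label {labels[category]!r} for {category}")
        vector.extend(1 if option == labels[category] else 0 for option in options)
    return vector


line_item = {
    "text": "TEACHER SALARY - ELEMENTARY",  # hypothetical freeform budget text
    "labels": {
        "Object_Type": "Base Salary/Compensation",
        "Function": "Instruction",
    },
}

columns = indicator_columns(CATEGORIES)
vector = encode(line_item["labels"], CATEGORIES)
print(dict(zip(columns, vector)))
```

Each category contributes exactly one 1 to the vector, which is what makes the problem multi-class within a category and multi-label across categories.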

Once this process is complete, we can finally offer cross-district insight into a partner's finances. We might observe that a particular partner spends more on facilities and maintenance than peer districts, or staffs teaching assistants more richly. These findings are not in themselves good or bad—they depend on the context, goals, and strategy of the partner district.

This task, which we call financial coding, is very time- and labor-intensive, and that limits our ability to provide this analysis to districts. It typically takes us several weeks to reliably code a financial file. Furthermore, the challenges of financial coding limit the quality of our comparisons, since the only districts in our comparison database are those with whom we have gone through this lengthy, laborious process.

The right algorithm, paired with some human checks, will allow us to code financial files more accurately, more quickly, and more cheaply. As a result, we will be able to offer these valuable insights to many more districts at a much lower cost, greatly extending our impact. Eventually, we hope to offer a free self-service version of the algorithm through our website, which would allow any district to upload its data and receive comparisons to similar districts within days or even hours.

Financial decisions are never easy, but they are certainly easier with an awareness of the choices of your peers. We are excited to have you with us as we build toward that world of greater financial transparency.