This page documents the dataset and the problem we are trying to solve. The problem description has three subsections: the features, the labels, and the submission format.
The features in this dataset
Your goal is to predict the probability that a certain label is attached to a budget line item. Each row in the budget consists mostly of free-form text features, except for the two noted below as float. Any of the fields may be empty.
FTE (float) - If an employee, the percentage of full-time that the employee works.
Facility_or_Department - If expenditure is tied to a department/facility, that department/facility.
Function_Description - A description of the function the expenditure was serving.
Fund_Description - A description of the source of the funds.
Job_Title_Description - If this is an employee, a description of that employee's job title.
Location_Description - A description of where the funds were spent.
Object_Description - A description of what the funds were used for.
Position_Extra - Any extra information about the position that we have.
Program_Description - A description of the program that the funds were used for.
SubFund_Description - More detail on
Sub_Object_Description - More detail on
Text_1 - Any additional text supplied by the district.
Text_2 - Any additional text supplied by the district.
Text_3 - Any additional text supplied by the district.
Text_4 - Any additional text supplied by the district.
Total (float) - The total cost of the expenditure.
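The typing rules above (two float fields, everything else free-form text, any field possibly empty) can be sketched as a small parser. This is only an illustration: the helper name `parse_row` and the three-column sample data are made up for this example and are not part of the competition files.

```python
import csv
import io

# The two features the description marks as float; all others are text.
FLOAT_FEATURES = ("FTE", "Total")


def parse_row(raw):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    parsed = {}
    for name, value in raw.items():
        value = (value or "").strip()
        if value == "":
            parsed[name] = None  # any field may be empty
        elif name in FLOAT_FEATURES:
            parsed[name] = float(value)
        else:
            parsed[name] = value
    return parsed


# Hypothetical sample with only three of the feature columns.
sample_csv = io.StringIO(
    "FTE,Object_Description,Total\n"
    "0.5,SUPPLIES,1200.75\n"
    ",,\n"
)
rows = [parse_row(r) for r in csv.DictReader(sample_csv)]
```

Representing empty fields as `None` keeps them distinct from a genuine `0.0` in `FTE` or `Total`; a model pipeline may prefer a different placeholder.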
Feature data example
For example, a single row in the dataset might have these values:
| Function_Description | Care and Upkeep of Building Services |
| Location_Description | BUILDING OPERATIONS SECTION |
| SubFund_Description | Operation and Maintenance of Plant Services |
The labels in this dataset
For each of these rows, ERS attaches one label from each of 9 different categories:
Function:
Career & Academic Counseling
Data Processing & Information Services
Development & Fundraising
Extended Time & Tutoring
Facilities & Maintenance
Finance, Budget, Purchasing & Distribution
Instructional Materials & Supplies
Library & Media
Parent & Community Relations
Physical Health & Services
Research & Accountability
Security & Safety
Social & Emotional
Special Population Program Management & Support
Untracked Budget Set-Aside
Object_Type:
Equipment & Equipment Lease
Travel & Conferences
Operating_Status:
Operating, Not PreK-12
Sharing:
Leadership & Management
School on Central Budgets
Use:
Pupil Services & Enrichment
Untracked Budget Set-Aside
Note that there is a hierarchical relationship among these labels. If a line is marked as Non-Operating in the Operating_Status category, then all of the other labels should be marked as NO_LABEL, since ERS does not analyze and compare non-operating budget items.
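The hierarchical rule can be expressed as a small post-processing step on a row's labels. The sketch below assumes a row's labels are held in a dict keyed by category name; the function name `apply_operating_rule` is hypothetical, while the strings `Non-Operating` and `NO_LABEL` come from the description above.

```python
NO_LABEL = "NO_LABEL"


def apply_operating_rule(labels):
    """Return a copy of `labels` with the Non-Operating rule applied.

    If Operating_Status is "Non-Operating", every other category is
    set to NO_LABEL; otherwise the labels are returned unchanged.
    """
    if labels.get("Operating_Status") != "Non-Operating":
        return dict(labels)
    return {
        category: (value if category == "Operating_Status" else NO_LABEL)
        for category, value in labels.items()
    }


# Illustrative row using only three of the categories.
row = {
    "Function": "Facilities & Maintenance",
    "Operating_Status": "Non-Operating",
    "Use": "Untracked Budget Set-Aside",
}
cleaned = apply_operating_rule(row)
```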
The row in the example above would have the following labels attached to it:
| Function | Facilities & Maintenance |
Your goal is to predict a probability for each possible label in the dataset given a row of new data. Each of these probabilities goes in a separate column in the submission file. The submission must be 50064 x 104, where 50064 is the number of rows in the test dataset (excluding the header) and 104 is the number of columns (excluding a first column of row ids). The columns in the submission have the format ColumnName__PossibleLabel, which we have listed below for your convenience. This is simply a flattening of the labels that we listed above.
Function__Career & Academic Counseling
Function__Data Processing & Information Services
Function__Development & Fundraising
Function__Extended Time & Tutoring
Function__Facilities & Maintenance
Function__Finance, Budget, Purchasing & Distribution
Function__Instructional Materials & Supplies
Function__Library & Media
Function__Parent & Community Relations
Function__Physical Health & Services
Function__Research & Accountability
Function__Security & Safety
Function__Social & Emotional
Function__Special Population Program Management & Support
Function__Untracked Budget Set-Aside
Object_Type__Equipment & Equipment Lease
Object_Type__Travel & Conferences
Operating_Status__Operating, Not PreK-12
Sharing__Leadership & Management
Sharing__School on Central Budgets
Use__Pupil Services & Enrichment
Use__Untracked Budget Set-Aside
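The flattening into ColumnName__PossibleLabel names can be sketched as follows. The function name `flatten_labels` is hypothetical, and the dict holds only two of the categories with the labels shown on this page, so it is a partial illustration rather than the full set of 104 columns.

```python
def flatten_labels(labels_by_category):
    """Build submission column names of the form Category__Label."""
    return [
        f"{category}__{label}"
        for category, labels in labels_by_category.items()
        for label in labels
    ]


# Partial example: two categories with the labels listed on this page.
labels_by_category = {
    "Object_Type": ["Equipment & Equipment Lease", "Travel & Conferences"],
    "Sharing": ["Leadership & Management", "School on Central Budgets"],
}
columns = flatten_labels(labels_by_category)
```

The separator is a double underscore, which keeps column names unambiguous even though category names such as Object_Type contain a single underscore.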
Again, you must submit a probability for each label. Valid probabilities are in the range [0, 1].
| Function__Aides Compensation | Function__Career & Academic Counseling | Function__Communications | ... | Use__O&M | Use__Pupil Services & Enrichment | Use__Untracked Budget Set-Aside |
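Before submitting, it may help to check the two stated constraints: the 50064 x 104 shape and probabilities in [0, 1]. The sketch below assumes the predictions are already loaded as a list of rows of floats, with the row id column removed; `check_submission` is a hypothetical helper, not part of any competition tooling.

```python
EXPECTED_ROWS = 50064  # rows in the test dataset, excluding the header
EXPECTED_COLS = 104    # label columns, excluding the row id column


def check_submission(rows, expected_rows=EXPECTED_ROWS, expected_cols=EXPECTED_COLS):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if len(row) != expected_cols:
            problems.append(f"row {i}: expected {expected_cols} columns, got {len(row)}")
        elif any(not (0.0 <= p <= 1.0) for p in row):
            # The negated comparison also flags NaN values.
            problems.append(f"row {i}: probability outside [0, 1]")
    return problems


# Tiny illustration with overridden dimensions; the second row is invalid.
tiny = [[0.5, 0.5], [0.2, 1.2]]
issues = check_submission(tiny, expected_rows=2, expected_cols=2)
```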