We work with organizations looking to make the most out of their data.


Run a DrivenData competition

Looking to engage data scientists around the world in working on your data challenges and building top-quality solutions?

The DrivenData community includes tens of thousands of data scientists eager for real-world problems where they can practice, compete, and apply their skills for the benefit of nature and humanity.

A good model can level up what an organization is able to do, how it plans, how quickly it responds, and how it understands its problem space.

In general, there are three things we need to run a competition:

  • Problem: The best problems have a good story for how they generate impact, like more effective planning, resources saved, or people served.
  • Data: Datasets can range from structured quantitative data to text, images and video. Our data science team will take a first look and ensure there is the right data and sufficient signal for the problem at hand.
  • Funding: The value of running a competition can include building algorithms, driving engagement, learning about effective approaches, or some combination of these. Funding is required to cover the cost of hosting the competition and a prize pool that rewards the creators of top-performing solutions.

Not all of these have to come from the same place (for instance, competitions may pair a data provider with a funder interested in supporting innovation in the problem domain). If you have any of the above and are thinking about using them to put on a competition, drop us a line below.

Run a custom competition

Need to run a competition with your own style or on your own servers? Take advantage of the platform we’ve spent years building and testing.

Competitions can be a great way to spark innovation, test real-world data skills, and engage internal teams or expert communities with your toughest data questions. We work with partners to set up private, white label deployments of data science competitions.

For example:

  • We partnered with BAE Systems Applied Intelligence to provide a competition infrastructure for running a series of machine learning challenges for government clients in intelligence and defense
  • We worked with Microsoft to source and run capstone competitions where data science students put their new skills to the test on real-world datasets, tackling questions like how to predict student debt

A private deployment includes all the functionality that the DrivenData competition platform already brings, including:

  • User accounts
  • Content management
  • Rules agreement
  • Team formation
  • Submission scoring
  • Live leaderboard
  • Audit trail
  • Platform security
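
To make "submission scoring" and "live leaderboard" concrete, here is a minimal sketch of how the two could fit together. It is an illustration only, not DrivenData's implementation: the accuracy metric, the `Entry` class, the function names, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


def accuracy(predicted, actual):
    """Fraction of predictions that match the held-out labels."""
    if len(predicted) != len(actual):
        raise ValueError("submission has the wrong number of rows")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


@dataclass
class Entry:
    team: str
    score: float


def leaderboard(submissions, actual):
    """Keep each team's best score and rank teams from highest to lowest."""
    best = {}
    for team, predicted in submissions:
        score = accuracy(predicted, actual)
        if team not in best or score > best[team]:
            best[team] = score
    return sorted((Entry(t, s) for t, s in best.items()),
                  key=lambda e: e.score, reverse=True)


# Made-up labels and submissions for illustration only.
held_out_labels = [1, 0, 1, 1, 0, 0, 1, 0]
submissions = [
    ("team_a", [1, 0, 1, 0, 0, 0, 1, 1]),  # 6 of 8 correct
    ("team_b", [1, 0, 1, 1, 0, 0, 1, 0]),  # 8 of 8 correct
    ("team_a", [1, 0, 1, 1, 0, 0, 1, 1]),  # 7 of 8 correct
]

for rank, entry in enumerate(leaderboard(submissions, held_out_labels), start=1):
    print(rank, entry.team, f"{entry.score:.3f}")
```

The labels used for scoring stay on the server, so competitors only ever see their score. A real competition would choose a metric suited to the problem rather than plain accuracy.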

Thinking about your own competition? Leave us a note below.

Thinking about running a competition? Let’s talk.

Competitions aren't right for every problem. For more flexible data needs or sensitive data sources, we have our own team of experienced data scientists to take the case. Learn more about working with our consulting team.

Here are some questions we often get from organizations:

What kind of organizations do you work with?

DrivenData does not have any restrictions in terms of size, mission, software tools, or database structure. We have encountered applications in fields from education to microfinance, social services to healthcare. Our goal is to partner with organizations that share our vision of using data to make the world a better place. If you're an organization excited about innovative ways to leverage your data, please get in touch.

Can my organization host a competition?

We'd love to work with you to find a problem that makes a difference to your organization. If you have data and organizational goals, we can explore how to make that data work for you. See below for information about what makes a problem a good fit for a competition.

What are some examples of competitions?

There is a wide range of problems that can be tackled with effective statistical modeling. Here are just a few examples of competitions.

Predicting Loan Defaults: Take, for example, a nonprofit microlender. Using their data describing each loan application and loan outcomes, DrivenData would run a competition to improve their impact and sustainability. A good model predicts which loans involve the most risk. A better model might be able to establish the optimal loan amount such that the probability of default is minimized. Using the winning solution, the lender can effectively decrease negative outcomes for recipients, more responsibly disburse funds, and improve its long-term ability to deliver on its mission.
Distributing Humanitarian Aid: A good model is indispensable for complex decision-making. Take an organization that provides aid for refugees from the Syrian conflict. Based on past data, we want to ask: How, where, and in what quantities should it distribute housing, food, and medical supplies in order to minimize shortfalls? This is a multifaceted problem where a good model can perform significantly better than a naive approach. Competitors build models based on the size and length of the conflict, the location of refugee camps with respect to geopolitical borders and population centers, the history of refugee populations in other conflicts, and the demographics of the refugees in this particular region. The aid organization now has a more effective way to plan and prioritize its work.
Responding to Citizen Reports: Another example, based on actual research by IBM, builds on UNICEF’s uReport, an effort in Uganda that lets citizens submit unstructured reports via SMS to document problems such as human rights abuses, evolving hunger or water crises, and disease outbreaks. uReport needs to identify the nature and severity of reports in real time. A smart algorithm allows journalists and human rights observers to quickly spot trending problems and respond before they grow more dire.

What kind of problem is good for a competition?

A good model can transform how an organization plans, how quickly it responds, and how it understands its problem space. In general, these types of problems have four characteristics:

Impact: The best problems have a clear win for the organization in terms of effective planning, resources saved, or people served. The ones that are most appealing to the data-science community have a good story around how they generate social impact.
Challenge: The problem needs to be challenging enough for a rich competition. For example, a set of a thousand data points where a linear regression gets most of the way there isn’t the kind of problem a competition is best suited to. Instead, we specialize in handling many predictor variables, large numbers of data points, complex covariance, or analysis of text, images and video.
Feasibility: We will need to ensure that the organization has the right kind of data to answer the question at hand. And, if there is data, does it have enough signal to be useful? Our data science team will take a first look at the data and build benchmark solutions to the problem at hand.
Privacy: We want to make sure that we can answer this question while protecting the privacy of individuals in the dataset and the operational privacy of the organization. As you would imagine, this is a common concern in the world of data science, and we use anonymization strategies developed for these types of situations. For more information on data privacy, see below.
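
As a rough illustration of the Challenge and Feasibility checks, the sketch below fits a one-variable least-squares line and reports R², the share of variance explained relative to always predicting the mean. This is an assumed, simplified stand-in for a benchmark, not DrivenData's actual process; the function names and data are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of variance in ys explained by the line, versus predicting the mean."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
nearly_linear = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
curved = [9, 4, 1, 1, 4, 9]  # strong signal, but not a linear one

for name, ys in [("nearly_linear", nearly_linear), ("curved", curved)]:
    slope, intercept = fit_line(xs, ys)
    print(name, round(r_squared(xs, ys, slope, intercept), 3))
```

An R² close to 1 suggests a linear regression already gets most of the way there. An R² close to 0 only rules out a linear relationship with that one variable, so it does not by itself show that the data lacks signal.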

Have innovation competitions like this been successful in the past?

The National Institutes of Health (NIH) created a competition to identify an immune system deficiency through mutations in genetic data; the top 30 submissions performed 100x better than a researcher’s model and 1000x better than the NIH gold standard, all while cutting down the analysis time. Organizations that have had similar competitions include NASA, Teach for America, and DonorsChoose.org.

Other organizations have benefited from advanced statistical techniques. uReport is a UNICEF program in Uganda through which residents submit crisis notifications through SMS. Classifying the messages was a critical step in routing them to the appropriate destination for response. The algorithm developed by the organization was able to accurately classify these messages ~70% of the time. With a more sophisticated algorithm developed in partnership with IBM, they increased classification accuracy to ~85%. This dramatically changed the way uReport was able to use its automation, improving response time while saving resources spent reading and classifying incoming messages.

Tell me more about data privacy.

Privacy, both in terms of personally identifiable information and organizational secrets, is critically important to DrivenData. For every competition we run, we audit the data for security and privacy concerns. Ultimately, we believe that making data available to our competitors has the potential to vastly improve how your organization operates. The primary safeguards to consider are:

Anonymization Strategy: DrivenData works with organizations to figure out the right strategy to anonymize their data. Names of individuals and entities are removed entirely, field and column names are obscured where appropriate (e.g., they become 'Feature1', 'Feature2'), traceable values are run through a one-way function, and any other identifiers (e.g., addresses) are abstracted.
Non-disclosure Rules: Competitors agree to our rules and terms of use when entering each competition. If anonymization is not enough, we can include a non-disclosure clause barring users from sharing or publicizing your data.
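
The anonymization steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DrivenData's actual pipeline: the keyed hash (HMAC-SHA256) stands in for the unspecified "one-way function", and `SECRET_KEY`, `one_way`, `anonymize`, and the sample records are hypothetical. Abstracting identifiers such as addresses is not shown.

```python
import hashlib
import hmac

# Hypothetical secret held by the data owner; never shipped with the dataset.
SECRET_KEY = b"replace-with-a-private-random-key"


def one_way(value: str) -> str:
    """Map a traceable value to a stable token that is impractical to reverse
    or guess without the key."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymize(rows, drop, hash_cols):
    """Drop direct identifiers, hash traceable values, and obscure column names."""
    kept = [c for c in rows[0] if c not in drop]
    # Obscured names: Feature1, Feature2, ... in original column order.
    renamed = {c: f"Feature{i}" for i, c in enumerate(kept, start=1)}
    out = []
    for row in rows:
        out.append({
            renamed[c]: one_way(str(row[c])) if c in hash_cols else row[c]
            for c in kept
        })
    return out


# Made-up records for illustration only.
records = [
    {"name": "A. Borrower", "account_id": "AC-1001", "loan_amount": 500, "defaulted": 0},
    {"name": "B. Borrower", "account_id": "AC-1002", "loan_amount": 750, "defaulted": 1},
    {"name": "A. Borrower", "account_id": "AC-1001", "loan_amount": 300, "defaulted": 0},
]

anonymized = anonymize(records, drop={"name"}, hash_cols={"account_id"})
for row in anonymized:
    print(row)
```

Because the same input always maps to the same token, competitors can still tell that two rows belong to the same account without learning which account it is. A keyed hash is used here rather than a plain hash so that short, guessable identifiers cannot be recovered by hashing every candidate value.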

What do we get at the end of the competition?

There are major benefits to running a DrivenData competition instead of hosting a hackathon or hiring a consultant. Much like a hackathon, you get the power of a group of analytic minds with a variety of approaches that provide new perspectives on your data. On top of that, we work with your organization to construct a well-specified problem ahead of time. Instead of a list of brainstormed ideas, you get tested, actionable results.

Furthermore, our goal is to hand off software that is easy for an organization to use for future decision making. We are dedicated to working with our partners and the winning team to integrate the best model. Finally, unlike with a consultancy, you also get the knowledge that the winning solution has been compared against a range of heterogeneous models and outperformed the competition.