If you have hard questions and data you think could be used to solve them, or if you want to chat with us to see how your data could be used to improve your impact, give us a shout using the contact form below.
Need to run a competition with your own style, branding, or on your own servers? Get in touch:
Competitions aren't right for every problem. For more flexible data needs or sensitive data sources, we have our own team of experienced data scientists to take the case. Learn more about working with our consulting team.
DrivenData does not have any restrictions in terms of size, mission, software tools, or database structure. We have encountered applications in fields from education to microfinance, social services to healthcare. Our goal is to partner with organizations that share our vision of using data to make the world a better place. If you're an organization excited about innovative ways to leverage your data, please get in touch.
We'd love to work with you to find a problem that makes a difference to your organization. If you have data and organizational goals, we can explore how to make that data work for you. See below for information about what makes a problem a good fit for a competition.
A wide range of problems can be tackled with effective statistical modeling. Here are just a few examples of possible competitions.
Predicting Loan Defaults: Take, for example, a nonprofit microlender. Using their data describing each loan application and loan outcomes, DrivenData would run a competition to improve their impact and sustainability. A good model predicts which loans involve the most risk. A better model might be able to establish the optimal loan amount such that the probability of default is minimized. Using the winning solution, the lender can effectively decrease negative outcomes for recipients, more responsibly disburse funds, and improve its long-term ability to deliver on its mission.
Distributing Humanitarian Aid: A good model is indispensable for complex decision-making. Take an organization that provides aid for refugees from the Syrian conflict. Based on past data, we want to ask: How, where, and in what quantities should it distribute housing, food, and medical supplies in order to minimize shortfalls? This is a multifaceted problem where a good model can perform significantly better than a naive approach. Competitors build models based on the scale and duration of the conflict, the location of refugee camps relative to geopolitical borders and population centers, the history of refugee populations in other conflicts, and the demographics of the refugees in this particular region. With the winning model, the aid organization has a more effective way to plan and prioritize its work.
Responding to Citizen Reports: Another example, based on actual research by IBM, improves upon UNICEF’s uReport, an effort in Uganda that lets citizens submit unstructured reports via SMS to document problems such as human rights abuses, evolving hunger or water crises, and disease outbreaks. uReport needs to identify the nature and severity of reports in real time. A smart algorithm allows journalists and human rights observers to quickly spot trending problems and respond before they grow more dire.
A good model can transform how an organization plans, how quickly it responds, and how it understands its problem space. In general, these types of problems have four characteristics:
Impact: The best problems have a clear win for the organization in terms of effective planning, resources saved, or people served. The ones that are most appealing to the data-science community have a good story around how they generate social impact.
Challenge: The problem needs to be challenging enough for a rich competition. For example, a set of a thousand data points where a linear regression gets most of the way there is not the kind of problem where a competition adds the most value. Instead, we specialize in problems with many predictor variables, large numbers of data points, complex covariance structure, or analysis of text, images, and video.
Feasibility: We will need to ensure that the organization has the right kind of data to answer the question at hand. And, if there is data, does it have enough signal to be useful? Our data science team will take a first look at the data and build benchmark solutions to the problem at hand.
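One rough way to picture the "enough signal" check is to compare a simple benchmark against a naive baseline on held-out data. The sketch below is only an illustration, not DrivenData's actual benchmarking process: the data is made up, and the helper names (`majority_baseline`, `threshold_benchmark`) are hypothetical. If a benchmark cannot beat the naive baseline, the data may not carry enough signal for the question.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(train_labels, n):
    """Naive baseline: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


def threshold_benchmark(train_x, train_y, test_x):
    """Simple benchmark: pick the single cutoff on one feature that
    best separates the training labels, then apply it to the test set."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(train_x)):
        acc = accuracy([int(x >= cut) for x in train_x], train_y)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return [int(x >= best_cut) for x in test_x]


# Made-up toy data: one numeric feature and a binary outcome.
train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [0, 0, 0, 0, 0, 1, 1, 1]
test_x = [2, 5, 6, 9]
test_y = [0, 0, 1, 1]

baseline_acc = accuracy(majority_baseline(train_y, len(test_y)), test_y)
benchmark_acc = accuracy(threshold_benchmark(train_x, train_y, test_x), test_y)
print(f"naive baseline: {baseline_acc:.2f}, benchmark: {benchmark_acc:.2f}")
```

The gap between the two numbers is what matters here: a large gap suggests the feature carries usable signal, while a benchmark that only matches the baseline suggests more or different data is needed.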
Privacy: We want to make sure that we can answer this question while protecting the privacy of individuals in the data set and the operational privacy of the organization. As you would imagine, this is a common concern in the world of data science, and we use anonymization strategies developed for these types of situations. For more information on data privacy, see below.
The National Institutes of Health (NIH) created a competition to identify an immune system deficiency through mutations in genetic data; the top 30 submissions performed 100x better than a researcher’s model and 1000x better than the NIH gold standard, all while cutting down the analysis time. Organizations that have run similar competitions include NASA, Teach for America, and DonorsChoose.org.
Other organizations have benefited from advanced statistical techniques. uReport is a UNICEF program in Uganda through which residents submit crisis notifications through SMS. Classifying the messages was a critical step in routing them to the appropriate destination for response. The algorithm developed by the organization was able to accurately classify these messages ~70% of the time. With a more sophisticated algorithm developed in partnership with IBM, they increased classification accuracy to ~85%. This dramatically changed the way uReport was able to use its automation, improving response time while saving resources spent reading and classifying incoming messages.
Privacy, both in terms of personally identifiable information and organizational secrets, is critically important to DrivenData. For every competition we run, we audit the data for security and privacy concerns. Ultimately, we believe that making data available to our competitors has the potential to vastly improve how your organization operates. The primary safeguards to consider are:
Anonymization Strategy: DrivenData works with organizations to figure out the right strategy to anonymize their data. Names of individuals and entities are removed entirely, field and column names are obscured where appropriate (e.g., they become 'Feature1', 'Feature2'), traceable values are run through a one-way function, and any other identifiers (e.g., addresses) are abstracted.
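The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure DrivenData actually uses: the `anonymize` function, the sample records, and the choice of keyed SHA-256 (HMAC) as the one-way function are all hypothetical, and a real project would also need to handle free text, rare values, and other re-identification risks.

```python
import hashlib
import hmac


def anonymize(rows, drop_fields, hash_fields, secret_key):
    """Return (anonymized_rows, column_map) for a list of dict records.

    - fields in drop_fields (e.g. names) are removed entirely
    - fields in hash_fields are replaced by a keyed one-way hash
    - remaining column names are obscured as Feature1, Feature2, ...
    """
    kept = [name for name in rows[0] if name not in drop_fields]
    column_map = {name: f"Feature{i}" for i, name in enumerate(kept, start=1)}
    anonymized = []
    for row in rows:
        out = {}
        for name in kept:
            value = row[name]
            if name in hash_fields:
                # Assumption: HMAC-SHA256 stands in for "a one-way function".
                digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
                value = digest.hexdigest()[:16]
            out[column_map[name]] = value
        anonymized.append(out)
    return anonymized, column_map


# Made-up example records; none of this is real data.
records = [
    {"name": "Ada Example", "account_id": "A-1001", "loan_amount": 500, "defaulted": 0},
    {"name": "Bo Example", "account_id": "A-1002", "loan_amount": 750, "defaulted": 1},
]

anonymized_records, mapping = anonymize(
    records,
    drop_fields={"name"},
    hash_fields={"account_id"},
    secret_key=b"replace-with-a-real-secret",
)
print(anonymized_records[0])
```

The organization would keep `mapping` and the secret key private, so that it can translate a winning model's feature names back to real columns while competitors see only the obscured version.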
There are major benefits to running a DrivenData competition instead of hosting a hackathon or hiring a consultant. Much like a hackathon, you get the power of a group of analytic minds with a variety of approaches that provide new perspectives on your data. On top of that, we work with your organization to construct a well-specified problem ahead of time. Instead of a list of brainstormed ideas, you get tested, actionable results.
Furthermore, our goal is to hand off software that is easy for an organization to use for future decision making. We are dedicated to working with our partners and the winning team to integrate the best model. Finally, unlike with a consultancy, you also get the knowledge that the winning solution has been compared against a range of heterogeneous models and outperformed the competition.