We work with organizations looking to build innovative solutions with data and AI.

Thinking about running a competition? Let’s talk.


Run a DrivenData competition

Looking to engage data scientists around the world in working on your data challenges and building top-quality solutions?

The DrivenData community includes over a hundred thousand data scientists and engineers eager for real-world problems where they can practice, compete, and apply their skills for the benefit of nature and humanity.

For example:

  • The Bureau of Reclamation and NASA ran the Snowcast Showdown competition to generate better estimates of water accumulating in snowpack across the Western US.
  • Meta AI put on the Hateful Memes challenge to push forward state-of-the-art approaches to detect hate speech in multimodal memes combining images and text.
  • Yelp sponsored the Keeping It Fresh competition using its reviews to help the City of Boston better predict public health violations at local restaurants.
  • The French Society of Pathology organized the TissueNet competition to identify potentially cancerous lesions in cervical biopsies.

A good model can level up what an organization is able to do, how it plans, how quickly it responds, and how it understands its problem space.

In general, there are three things we need to run a competition:

  • Problem: The best problems have a good story for how they generate impact, like more effective planning, resources saved, or people served.
  • Data: Datasets can range from structured quantitative data to text, images and video. Our data science team will take a first look and ensure there is the right data and sufficient signal for the problem at hand.
  • Funding: The value of running a competition can include building algorithms, driving engagement, and learning about effective approaches. Funding is required to support the cost of hosting the competition and a prize pool for rewarding the creators of top performing solutions.

Not all of these have to come from the same place (for instance, competitions may pair a data provider with a funder interested in supporting innovation in the problem domain). If you have any of the above and are thinking about using them to put on a competition, drop us a line above.


Run a custom competition

Need to run a competition with your own style or on your own servers? Take advantage of the platform we’ve spent years building and testing.

Competitions can be a great way to spark innovation, test real-world data skills, and engage internal teams or expert communities with your toughest data questions. We work with partners to set up private, white label deployments of data science competitions.

For example:

  • We partnered with BAE Systems Applied Intelligence to provide competition infrastructure for running a series of machine learning challenges for government clients in intelligence and defense.
  • We worked with Microsoft to source and run capstone competitions where data science students put their new skills to the test on real-world datasets, tackling questions like how to predict student debt.

A private deployment includes all the functionality that the DrivenData competition platform already brings, including:

  • User accounts
  • Content management
  • Rules agreement
  • Team formation
  • Submission scoring
  • Live leaderboard
  • Audit trail
  • Platform security

Thinking about your own competition? Leave us a note above.


We also understand competitions aren't right for every problem. For more flexible data needs or sensitive data sources, we have our own team of experienced data scientists to take the case. Learn more about working with our consulting team.

Here are some questions we often get from organizations:


What kind of organizations do you work with?

DrivenData does not have any restrictions in terms of size, mission, software tools, or database structure. We have encountered applications in fields from education to microfinance, social services to healthcare. Our goal is to partner with organizations that share our vision of using data to make the world a better place. If you're an organization excited about innovative ways to leverage your data, please get in touch.


Can my organization host a competition?

We'd love to work with you to find a problem that makes a difference to your organization. If you have data and organizational goals, we can explore how to make that data work for you. See below for information about what makes a problem a good fit for a competition.


What are some examples of competitions?

A wide range of problems can be tackled with data science and AI approaches. Our past competitions include some great examples, for instance:

  • Forecasting energy needs: The ability to forecast a building’s energy consumption plays a critical role in energy efficiency. Good forecasts can help implement energy-saving policies and optimize operations of chillers, boilers and energy storage systems, while also helping to flag potentially wasteful discrepancies between expected and actual energy use. Using historical measurements, Schneider Electric ran a competition to predict future energy consumption at hourly, daily, and weekly time windows. The top algorithm was able to predict consumption within 0.3% of actual recorded measures on average, using weather, holidays, and features automatically created from the data to produce the most accurate forecasts.
  • Photo-identification of endangered beluga whales: Cook Inlet belugas are an endangered population of beluga whales at risk of extinction. To more closely monitor their health and track individual whales, the NOAA Alaska Fisheries Science Center conducts an annual photo-identification survey using drone imagery. The Where's Whale-do challenge invited participants to develop computer vision models to accurately identify individual whales from past photographic images. The top approaches were built into leading tools using AI for conservation monitoring, and have also been extended to improve state-of-the-art models for identification of other wildlife species.
  • Surfacing trends in medical narratives: Falls are the leading cause of injury-related death among adults 65 and older. Understanding factors associated with the occurrence and severity of unintentional older adult falls helps the CDC inform solutions for injury prevention. In this challenge, participants applied recent AI/ML approaches for unstructured data to analyze medical narratives from a sample of emergency departments across the US. Top solutions were selected by a panel of judges based on a combination of methodology and results, providing new ways of extracting insights from text data about how, when, and why older adults fall.
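To make a figure like "within 0.3% of actual recorded measures on average" (from the energy forecasting example above) more concrete, here is a minimal sketch of one common way to express an average percentage error. This is an illustration only: the function name and the sample readings are hypothetical, and this is not necessarily the metric used to score that competition.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # Percentage error is undefined when the true value is zero.
            raise ValueError("actual values must be non-zero")
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical hourly energy readings (kWh) and matching forecasts.
actual_kwh = [100.0, 200.0, 400.0]
forecast_kwh = [101.0, 198.0, 400.0]

error_pct = mean_absolute_percentage_error(actual_kwh, forecast_kwh)
print(f"Average error: {error_pct:.2f}%")
```

A lower value means the forecasts track recorded consumption more closely.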

What kind of problem is good for a competition?

A good model can transform how an organization plans, how quickly it responds, and how it understands its problem space. In general, these types of problems have three characteristics:


  • Impact: The best problems have a clear win for the organization in terms of effective planning, resources saved, or people served. The ones that are most appealing to the data science community have a good story around how they generate social impact.
  • Challenge: The problem needs to be challenging enough for a rich competition. For example, a set of a thousand data points where a linear regression gets most of the way there isn’t the kind of problem we can be most effective in tackling. Instead, we specialize in handling many predictor variables, large numbers of data points, complex covariance, or analysis of text, images and video.
  • Feasibility: We will need to ensure that the organization has the right kind of data to answer the question at hand. This includes making sure the data can be released in a challenge and has enough signal to be useful for the problem. Our data science team will take a first look at the data and work with you to make sure the dataset is set up well for the task.

Have innovation competitions like this been successful in the past?

The National Institutes of Health created a competition to identify an immune system deficiency through mutations in genetic data; the top 30 submissions performed 100x better than a researcher’s model and 1000x better than the NIH gold standard, all while cutting down the analysis time. Organizations that have had similar competitions include NASA, Teach for America, and DonorsChoose.org.


Other organizations have benefited from advanced statistical techniques. uReport is a UNICEF program in Uganda through which residents submit crisis notifications by SMS. Classifying the messages was a critical step in routing them to the appropriate destination for response. The algorithm developed by the organization was able to accurately classify these messages ~70% of the time. With a more sophisticated algorithm developed in partnership with IBM, they increased classification accuracy to ~85%. This dramatically changed the way uReport was able to use its automation, improving response time while saving resources spent reading and classifying incoming messages.


Data science competitions have been shown to increase both the performance and the engagement achieved on problems like these. The structure of a competition catalyzes the efforts of a global talent pool, enabling the best approaches to be compared side-by-side, iterated upon, and ultimately recognized, shared and used.


Tell me more about data privacy.

Privacy, both in terms of individuals in a dataset and operational privacy of an organization, is critically important to DrivenData. We've even hosted competitions to help develop privacy-preserving AI.


For every competition, we audit the data for security and privacy concerns, and we apply privacy-protection strategies developed for these types of situations to make sure competitions yield strong solutions with minimal risk. Ultimately, we believe that making data available to our competitors has the potential to vastly improve the solutions that can be developed. Two primary safeguards to consider are:


  • Anonymization Strategy: DrivenData works with organizations to figure out the right strategy to anonymize their data. Names of individuals and entities are removed entirely, field and column names are obscured where appropriate (e.g., they become 'Feature1', 'Feature2'), traceable values are run through a one-way function, and any other identifiers (e.g., addresses) are abstracted.
  • Non-disclosure Rules: Competitors agree to our rules and terms of use when entering each competition before they can access challenge data. These rules include terms prohibiting sharing or using data outside of a competition, and can be updated as appropriate for a given challenge.
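As a rough illustration of the anonymization steps described above, here is a minimal sketch in Python. It is not DrivenData's actual pipeline: the column names, the sample records, the secret key, and the choice of a keyed SHA-256 digest as the one-way function are all assumptions made for the example. It also does not cover abstracting other identifiers such as addresses.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be random and never released with the data.
SECRET_KEY = b"replace-with-a-random-secret"

DROP_COLUMNS = {"name"}         # assumed direct identifiers, removed entirely
HASH_COLUMNS = {"customer_id"}  # assumed traceable values, passed through a one-way function


def one_way(value: str) -> str:
    """Keyed SHA-256 digest of the value, truncated to 16 hex characters."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(rows):
    """Return anonymized copies of rows (a list of dicts sharing the same keys)."""
    if not rows:
        return []
    kept = [col for col in rows[0] if col not in DROP_COLUMNS]
    # Obscure the remaining column names as Feature1, Feature2, ...
    renamed = {col: f"Feature{i}" for i, col in enumerate(kept, start=1)}
    result = []
    for row in rows:
        new_row = {}
        for col in kept:
            value = row[col]
            if col in HASH_COLUMNS:
                value = one_way(str(value))
            new_row[renamed[col]] = value
        result.append(new_row)
    return result


# Hypothetical records, for illustration only.
records = [
    {"name": "Ada Smith", "customer_id": "C-1001", "visits": 4, "amount": 120.5},
    {"name": "Bo Chen", "customer_id": "C-1002", "visits": 1, "amount": 33.0},
]
anonymized = anonymize(records)
print(anonymized[0])
```

Using a keyed digest rather than a plain hash is a design choice in this sketch: it keeps the mapping consistent across rows while making it harder to recover original values by hashing guesses.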

What do you get at the end of the competition?

There are major benefits to running a DrivenData competition instead of hosting a hackathon or hiring a consultant. Much like a hackathon, you get the power of a group of analytic minds with a variety of approaches that provide new perspectives on your data. DrivenData's large competitor pool spans continents and areas of expertise, making it an exceptionally powerful group to have hacking on your problem. On top of that, we work with your organization to construct a well-specified problem ahead of time. Instead of a list of brainstormed ideas, you get tested, actionable results.


Typical outputs at the end of a competition include all the code assets of winning solutions packaged in a well-organized repository, along with write-ups from the developers documenting their approaches. Where possible, these solutions are made available under an open source license for more people to use and learn from. Moreover, unlike with a consultancy, you also get the knowledge that the winning solution has been compared against a range of heterogeneous models and outperformed the competition, with empirical results for what level of performance is achievable and which types of approaches are the most effective.


In some cases, our partners will also be interested in building out, testing, or operationalizing the winning solutions. Our experienced data science team can carry competition-winning solutions forward, working closely with partners and winning teams to integrate the best solutions for applied use.