What's Up, Docs? Document Summarization with LLMs

Looking for a great way to start working with LLMs? See if you can summarize research papers from an open archive of the social sciences. #development #science

beginner practice

About the Data

The data for this competition come from SocArXiv, a moderated repository for scientific documents in the social sciences, hosted on OSF Preprints. (It's named after arXiv.) Repositories like SocArXiv enable scientists to share their research outside of traditional, publisher-managed outlets. SocArXiv is referred to as a preprint repository, but it includes a variety of document types:

  • working papers, manuscript drafts uploaded for early circulation and feedback
  • preprints, completed manuscript drafts that have not undergone formal peer review
  • post-prints, author-formatted final or near-final versions of published, peer-reviewed articles

The body of each paper was modified in an effort to remove the abstract, the references, and any other text that would either be irrelevant or make the task too easy.
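To get a feel for what that kind of cleanup involves, here is a minimal sketch in Python. It is not the pipeline used to prepare the competition data; it assumes plain-text papers where the abstract and the reference list each sit under their own one-word heading, and the function name and heading patterns are hypothetical.

```python
import re

# Hypothetical heading patterns; real papers label these sections in many ways.
ABSTRACT_HEADING = re.compile(r"^\s*abstract\s*$", re.IGNORECASE)
REFERENCES_HEADING = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$", re.IGNORECASE
)


def strip_abstract_and_references(text: str) -> str:
    """Drop the abstract block and everything from the references heading on."""
    kept = []
    in_abstract = False
    seen_abstract_text = False
    for line in text.splitlines():
        if REFERENCES_HEADING.match(line):
            # Assumption: the reference list runs to the end of the paper.
            break
        if ABSTRACT_HEADING.match(line):
            in_abstract = True
            seen_abstract_text = False
            continue
        if in_abstract:
            # Assumption: the abstract ends at the first blank line
            # that follows at least one line of abstract text.
            if line.strip():
                seen_abstract_text = True
            elif seen_abstract_text:
                in_abstract = False
            continue
        kept.append(line)
    return "\n".join(kept).strip()


sample = (
    "Title\n\nAbstract\nWe study things.\n\n"
    "Introduction\nBody text.\n\nReferences\nDoe, J. (2020)."
)
cleaned = strip_abstract_and_references(sample)
print(cleaned)
```

Real papers are messier than this (multi-paragraph abstracts, appendices after the references, headings with numbering), so treat it as a starting point for your own exploration rather than a description of how the data were built.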

All papers are copyright their respective authors, were modified from their original versions, and are licensed under Creative Commons Attribution 4.0 International. Full attribution information is available here.

About the Competition

This is a practice competition, so the only prize is the knowledge you gain by practicing. But oh! What a glorious prize! (And maybe you get some bragging rights for your position on the leaderboard.)

Resources

The accompanying benchmark blog post is a great place to start. If you find the writing style annoying, which, like, yeah, I get that, there are a lot of other good resources around.

To learn more about working with LLMs, see:

To learn more about preprints and open access publishing, see: