Lean benchmark for real-world SWE tasks.
fvspec scrapes GitHub for property-based tests in Hypothesis
and transpiles them to Lean specifications, autoformalizing each function
under test and presenting language models of the future with almost 12k
hard proof engineering challenges.
Authors: Quinn Dougherty, Mike Dodds, Max von Hippel, Hazel Shackleton
Additional acknowledgments: Ledah Casburn
We thank Evan Boehs and Jake Ginesin for working on the scraper, Juan CastaƱo for setting up the database and AWS instances used for scraping.
This project is funded by the Advanced Research + Invention Agency (ARIA).
Dual licensed under MIT or Apache 2.0 at your choice.