fvspec

Lean benchmark for real-world SWE tasks.

fvspec scrapes GitHub for property-based tests in Hypothesis and transpiles them to Lean specifications, autoformalizing each function under test and presenting language models of the future with almost 12k hard proof engineering challenges.

Acknowledgments

Authors: Quinn Dougherty, Mike Dodds, Max von Hippel, Hazel Shackleton

Additional acknowledgments: Ledah Casburn

We thank Evan Boehs and Jake Ginesin for working on the scraper, Juan CastaƱo for setting up the database and AWS instances used for scraping.

This project is funded by the Advanced Research + Invention Agency (ARIA).

Dual licensed under MIT or Apache 2.0 at your choice.