Post-Doctoral Research Visit F/M Benchmarks for Evaluating LLMs for Lean Elegance
Job Description
Context and advantages of the position
The postdoc will work within the SIERRA team, which focuses on theoretical machine learning, statistics, and optimization. There will be interactions with other teams within INRIA interested in AI for maths (ARGO, PICUBE, SCOOL), as well as at ENS (CSD).
Travel, equipment, and compute expenses will be covered within reasonable limits.
Assigned mission
Proposed research subject:
Frontier models have demonstrated rapid progress in producing correct Lean code, saturating existing Lean benchmarks of advanced problems in both the IMO and Putnam competitions, and can produce Fields Medal-level formalizations of research math. However, while Lean code that type checks might be reasonably declared correct, for formalizations to be useful to humans we need to extend our assessment beyond mere correctness to code quality, such as concision, transparency, maintainability, human readability, eleg...