#Leanprover

20 posts · 9 participants · 4 posts today

The Department of Computer Science at the University of Oxford has released recordings of the recent Strachey Series Lectures featuring Leo de Moura and Kevin Buzzard:

1️⃣ "Formalizing the Future: Lean's Impact on Mathematics, Programming, and AI" - Leo de Moura, Chief Architect of Lean

Leo discusses how Lean provides a framework for machine-checkable mathematical proofs and code verification (a minimal sketch of such a machine-checked proof follows this post), enabling collaboration between mathematicians, software developers, and AI systems. He also outlines the work the Lean Focused Research Organization does to expand Lean’s capabilities and support the community.

➡️ Watch Leo's lecture here: podcasts.ox.ac.uk/formalizing-

2️⃣ "Will Computers Prove Theorems?" with Kevin Buzzard, Professor of Mathematics, Imperial College

Kevin examines the potential for AI systems and theorem provers to assist in mathematical discovery, asking whether computers might someday find patterns in mathematics that humans have missed, and discussing the integration of language models with formal verification systems.

➡️ Watch Kevin's lecture here: podcasts.ox.ac.uk/will-compute

podcasts.ox.ac.uk: Formalizing the Future: Lean’s Impact on Mathematics, Programming, and AI
How can mathematicians, software developers, and AI systems work together with complete confidence in each other’s contributions? The open-source Lean proof assistant and programming language provides an answer, offering a rigorous framework where proofs and programs are machine-checkable, shared, and extended by a broad community of collaborators. By removing the traditional reliance on trust-based verification and manual oversight, Lean not only accelerates research and development but also redefines how we collaborate.
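To make the idea of a machine-checkable proof concrete, here is a minimal Lean sketch (our illustration, not material from the lecture): the kernel verifies every step of the induction mechanically, so the result requires no trust in its author.

    -- A tiny machine-checked proof. Core Lean already provides this lemma
    -- as Nat.zero_add; the prime avoids a name clash.
    theorem zero_add' (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl                         -- 0 + 0 = 0 holds by computation
      | succ k ih => rw [Nat.add_succ, ih]  -- reduce to the hypothesis for k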

The Lean FRO team met in person in Amsterdam last week for our annual team retreat, discussing upcoming work and our Year 3 roadmap! 🇳🇱✨

We had very productive discussions around Lean's future in mathematics, software and hardware verification, and AI for math. It was energizing to see our team's commitment to Lean's continued growth in each of these domains.

We're cooking up many exciting developments that will support both our mathematical community and our growing base of software verification users. Stay tuned for our full Year 3 roadmap, to be published at the end of July!

Lean Copilot: Large language models as copilots for theorem proving in Lean. ~ Peiyang Song, Kaiyu Yang, Anima Anandkumar. arxiv.org/abs/2404.12534 #LLMs #ITP #LeanProver

arXiv.org: Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean
Neural theorem proving combines large language models (LLMs) with proof assistants such as Lean, where the correctness of formal proofs can be rigorously verified, leaving no room for hallucination. With existing neural theorem provers pretrained on a fixed collection of data and offering valuable suggestions at times, it is challenging for them to continually prove novel theorems in a fully autonomous mode, where human insights may be critical. In this paper, we explore LLMs as copilots that assist humans in proving theorems. We introduce Lean Copilot, a general framework for running LLM inference natively in Lean. It enables programmers to build various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users. Lean users can use our pretrained models or bring their own ones that run either locally (with or without GPUs) or on the cloud. Using Lean Copilot, we build LLM-based tools that suggest proof steps, complete proof goals, and select relevant premises. Experimental results on the Mathematics in Lean textbook demonstrate the effectiveness of our method compared to existing rule-based proof automation in Lean (aesop). When assisting humans, Lean Copilot requires only 2.08 manually-entered proof steps on average (3.86 required by aesop); when automating the theorem proving process, Lean Copilot automates 74.2% proof steps on average, 85% better than aesop (40.1%). We open source all code and artifacts under a permissive MIT license to facilitate further research.
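As a sketch of how these tools surface inside a proof, the snippet below uses the tactic names documented in the Lean Copilot repository (select_premises and suggest_tactics); exact names and behavior may differ across versions, and the final sorry is a placeholder for whichever suggestion the user accepts.

    import LeanCopilot

    -- Illustrative only: tactic names follow the Lean Copilot README
    -- and may change across versions.
    example (a b c : Nat) : a + b + c = a + c + b := by
      select_premises  -- retrieve relevant lemmas from the library
      suggest_tactics  -- ask the LLM for candidate next steps
      sorry            -- placeholder; replace with an accepted suggestion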

APOLLO: Automated LLM and Lean collaboration for advanced formal reasoning. ~ Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh. arxiv.org/abs/2505.05758 #LLMs #ITP #LeanProver

arXiv.org: APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Formal reasoning and automated theorem proving constitute a challenging subfield of machine learning, in which machines are tasked with proving mathematical theorems using formal languages like Lean. A formal verification system can check whether a formal proof is correct or not almost instantaneously, but generating a completely correct formal proof with large language models (LLMs) remains a formidable task. The usual approach in the literature is to prompt the LLM many times (up to several thousands) until one of the generated proofs passes the verification system. In this work, we present APOLLO (Automated PrOof repair via LLM and Lean cOllaboration), a modular, model-agnostic pipeline that combines the strengths of the Lean compiler with an LLM's reasoning abilities to achieve better proof-generation results at a low sampling budget. APOLLO directs a fully automated process in which the LLM generates proofs for theorems, a set of agents analyze the proofs, fix the syntax errors, identify the mistakes in the proofs using Lean, isolate failing sub-lemmas, utilize automated solvers, and invoke an LLM on each remaining goal with a low top-K budget. The repaired sub-proofs are recombined and reverified, iterating up to a user-controlled maximum number of attempts. On the miniF2F benchmark, we establish a new state-of-the-art accuracy of 75.0% among 7B-parameter models while keeping the sampling budget below one thousand. Moreover, APOLLO raises the state-of-the-art accuracy for Goedel-Prover-SFT to 65.6% while cutting sample complexity from 25,600 to a few hundred. General-purpose models (o3-mini, o4-mini) jump from 3-7% to over 40% accuracy. Our results demonstrate that targeted, compiler-guided repair of LLM outputs yields dramatic gains in both efficiency and correctness, suggesting a general paradigm for scalable automated theorem proving.
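Schematically, the generate-check-repair loop described in the abstract looks something like the Python sketch below. Every helper name here is a hypothetical stand-in rather than APOLLO's actual API; only the control flow follows the paper's description.

    # Hypothetical sketch of a compiler-guided repair loop in the spirit
    # of APOLLO. All names are illustrative stand-ins, not the paper's API.

    def llm_prove(goal: str, top_k: int = 8) -> str:
        # Stub: a real system samples a candidate proof from an LLM
        # with a small top-K budget.
        return "by sorry"

    def lean_check(proof: str) -> list[str]:
        # Stub: a real checker runs the Lean compiler and returns the
        # failing sub-lemmas (an empty list means the proof verifies).
        return []

    def try_solvers(goal: str) -> str | None:
        # Stub: first try cheap automation (e.g. aesop-style tactics).
        return None

    def repair(theorem: str, max_attempts: int = 10) -> str | None:
        proof = llm_prove(theorem)                # initial LLM attempt
        for _ in range(max_attempts):
            failing = lean_check(proof)           # Lean verifies near-instantly
            if not failing:
                return proof                      # fully machine-checked proof
            for goal in failing:                  # isolate failing sub-lemmas
                sub = try_solvers(goal) or llm_prove(goal, top_k=8)
                proof = proof.replace(goal, sub)  # recombine; reverify next pass
        return None                               # attempt budget exhausted

    if __name__ == "__main__":
        print(repair("∀ a b : Nat, a + b = b + a"))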