EACL 2023

BERT is not The Count: Learning to Match Mathematical Statements with Proofs

Weixian Waylon Li · Yftah Ziser · Maximin Coavoux · Shay B. Cohen

University of Edinburgh · Univ. Grenoble Alpes, CNRS

MATcH studies whether models can align mathematical statements with their proofs, and how much they rely on notation overlap instead of deeper mathematical evidence.

Statement

If a function is continuous on a compact metric space, then it is uniformly continuous.

Proof

Use local continuity, extract a finite subcover, and choose a common radius.

The task asks a model to recover the correct proof for each mathematical statement from a candidate collection.

Abstract

MATcH introduces a task for matching mathematical statements to their corresponding proofs. The dataset contains more than 180k statement-proof pairs extracted from modern mathematical research articles based on the MREC corpus, making it a large-scale benchmark for mathematical information retrieval.

The paper studies bilinear similarity models, local and global decoding, ScratchBERT, MathBERT, and symbol replacement procedures. The results show that global decoding improves assignment coherence, while symbol replacement reveals that models lean heavily on shared notation, making relatively shallow use of the mathematical text and formulae themselves.

Task and model

MATcH frames proof retrieval as a structured assignment problem over statement-proof scores.

01

Bilinear scoring

Encoded statement and proof representations are compared with a trainable bilinear association function.
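A bilinear association function of this kind can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimension, the random embeddings, and the function name `bilinear_score` are all assumptions.

```python
# Sketch of a bilinear association score s(x, y) = x^T W y between a
# statement embedding x and a proof embedding y; W is the trainable matrix.
import numpy as np

rng = np.random.default_rng(0)
d = 8                         # hypothetical embedding dimension
W = rng.normal(size=(d, d))   # trainable bilinear weight matrix

def bilinear_score(stmt_vec, proof_vec, W):
    """Return the scalar bilinear similarity x^T W y."""
    return stmt_vec @ W @ proof_vec

stmt = rng.normal(size=d)     # stand-in for an encoded statement
proof = rng.normal(size=d)    # stand-in for an encoded proof
score = bilinear_score(stmt, proof, W)
```

Stacking these scores over all statement-proof pairs yields the score matrix that the decoding steps below operate on.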

02

Local ranking

Each statement independently ranks candidate proofs according to its row of the score matrix.

03

Global assignment

A one-to-one constraint prevents multiple statements from selecting the same proof.

Dataset and analysis

The dataset includes multiple splits and symbol replacement levels for measuring robustness beyond shared notation.

180k+ Statement-proof pairs

Extracted from professional mathematical writing in arXiv research articles.

74.68 Accuracy

ScratchBERT with local training and global decoding in the conservation (no-replacement) setting.

57.69 Full replacement

Accuracy under full symbol replacement with ScratchBERT local-global decoding.

Mixed split

Pairs from the same article can span train, development, and test.

Unmixed split

Pairs from the same article appear in only one split, creating a harder benchmark.

Symbol variants

Conservation, partial, full, and transposition settings stress-test notation use.
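To make the idea of full symbol replacement concrete, here is a toy sketch: each distinct single-letter symbol is consistently mapped to a fresh placeholder, with the statement and proof remapped independently so that surface notation no longer overlaps. The single-letter tokenisation heuristic and the `S0`, `S1`, ... placeholders are assumptions for illustration, not the paper's actual preprocessing.

```python
# Toy "full" symbol replacement: map each distinct single-letter math
# symbol to a placeholder, consistently within one text.
import re

def full_replace(text, mapping):
    """Replace each single-letter symbol with a consistent placeholder."""
    def repl(m):
        sym = m.group(0)
        mapping.setdefault(sym, f"S{len(mapping)}")
        return mapping[sym]
    # toy heuristic: a math symbol is a lone latin letter
    return re.sub(r"\b[a-zA-Z]\b", repl, text)

stmt_map, proof_map = {}, {}   # independent maps break notation overlap
stmt = full_replace("If f is continuous on K then f is bounded.", stmt_map)
proof = full_replace("Since K is compact, f attains its bounds.", proof_map)

print(stmt)   # "If S0 is continuous on S1 then S0 is bounded."
print(proof)  # "Since S0 is compact, S1 attains its bounds."
```

After replacement, `f` in the statement and `f` in the proof no longer share a surface form, so a model can only match the pair through the mathematical content.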

Prediction example before aggressive symbol replacement.
Prediction example after full symbol replacement.

Citation

@inproceedings{li-etal-2023-bert,
  title = {BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs},
  author = {Li, Weixian Waylon and Ziser, Yftah and Coavoux, Maximin and Cohen, Shay B.},
  booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics},
  year = {2023},
  address = {Dubrovnik, Croatia},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.eacl-main.260/},
  doi = {10.18653/v1/2023.eacl-main.260},
  pages = {3581--3593}
}