The Model Solving Geometry Problems at the Level of a Math Olympiad Gold Medalist

DeepMind’s AlphaGeometry represents another breakthrough in AI reasoning.

Created Using DALL-E

Next Week in The Sequence:

  • Edge 365: Our series about LLM reasoning continues with the famous ReAct technique including a review of the original paper by Google Research. We also explore Helicone to monitor LLMs.

  • Edge 366: Reviews COSP and USP, Google Research's new methods to advance reasoning in LLMs.

You can subscribe below!

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: The Model Solving Geometry Problems at the Level of a Math Olympiad Gold Medalist

A few months ago, the AIMO Prize was announced: a $10 million award for the first AI model to win a gold medal in the International Mathematical Olympiad (IMO). The IMO is an elite high school competition in which teams of up to six students from each participating country tackle six problems over two days, with a 4.5-hour session each day. Some of the most renowned mathematicians of recent decades were IMO medalists. Geometry, one of the most important and hardest parts of the IMO, combines visual and mathematical challenges, and we might intuitively expect it to be the hardest type of problem for AI models to solve.

Well, not anymore.

Last week, Google DeepMind published a paper unveiling AlphaGeometry, a model capable of solving geometry problems at the level of an IMO gold medalist.

The most interesting aspect of AlphaGeometry is its architecture, which combines a Large Language Model (LLM) with a symbolic model. Neuro-symbolic architectures have long attempted to bridge the gap between the two most established machine learning schools: neural networks and rule-based models. While LLMs excel at identifying patterns in data and reasoning through problems, they struggle with the systematic, multi-step reasoning required in complex geometry problems. Symbolic models, which solve problems using rules, can only operate in very constrained settings. How did AlphaGeometry apply neuro-symbolic models to geometry? The model, based on an LLM and a symbolic rules engine, first uses the symbolic model to attempt a solution. If unsuccessful, the LLM suggests new constructs that open new reasoning paths for the symbolic model. This is an oversimplification, but this is a short editorial after all. 😉
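The loop described above can be sketched in a few lines of code. Everything here is an illustrative assumption, not the real system: `deduce` stands in for the symbolic deduction engine, and `llm_suggest_construct` stands in for the language model proposing an auxiliary construction (the hypothetical `midpoint_M` fact) when deduction stalls.

```python
# Toy sketch of a neuro-symbolic loop in the spirit of AlphaGeometry.
# Facts are strings; rules are (premises, conclusion) pairs. All names
# and rules below are hypothetical, for illustration only.

def deduce(facts, rules, goal, max_steps=100):
    """Symbolic engine: forward-chain rules until the goal is derived
    or no new facts can be added."""
    facts = set(facts)
    for _ in range(max_steps):
        new = {conclusion for premises, conclusion in rules
               if premises <= facts and conclusion not in facts}
        if not new:
            break  # deduction is stuck
        facts |= new
        if goal in facts:
            return True
    return goal in facts

def llm_suggest_construct(facts):
    """Stand-in for the LLM: propose an auxiliary construction that
    may open a new reasoning path for the symbolic engine."""
    return "midpoint_M"  # hypothetical construct

def solve(facts, rules, goal, attempts=3):
    """Alternate between symbolic deduction and LLM suggestions."""
    facts = set(facts)
    for _ in range(attempts):
        if deduce(facts, rules, goal):
            return True
        facts.add(llm_suggest_construct(facts))  # widen the search
    return False

# A rule that only fires once the auxiliary construct is available:
rules = {(frozenset({"midpoint_M", "AB=AC"}), "angle_B=angle_C")}
print(deduce({"AB=AC"}, rules, "angle_B=angle_C"))  # stuck: False
print(solve({"AB=AC"}, rules, "angle_B=angle_C"))   # with LLM help: True
```

The point of the sketch is the division of labor: the symbolic engine does all of the verifiable multi-step reasoning, and the language model only intervenes when the search is stuck, which is the oversimplified picture the editorial gives.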

In a benchmark test of 30 IMO problems, AlphaGeometry solved 25 within the standard time limits. This achievement is nothing short of remarkable. Google DeepMind continues to impress in this field. Just a few weeks ago, they unveiled FunSearch, capable of discovering new algorithms in math and computer science. Now, with AlphaGeometry solving IMO-caliber geometry problems, one wonders what could be next?

🔎 ML Research

AlphaGeometry

Google DeepMind published a paper detailing AlphaGeometry, a model able to solve geometry problems at the math olympiad level. The model combines a neural language model with a rule-based deduction engine —> Read more.

TrustLLM

Researchers from top universities and tech companies published a comprehensive study of trustworthiness in LLMs. The paper includes a framework that quantifies trustworthiness in LLMs across five different dimensions —> Read more.

LLMs Self-Correcting Mistakes

Google Research published a paper that tests LLMs' ability to find and correct mistakes. The paper also introduces a new benchmark for mistake identification —> Read more.

Training on Easy Data

Researchers from the Allen Institute for AI (AI2) published a paper arguing that LLMs can perform well on highly specialized tasks while training only on "easy" data in that domain. By "easy," AI2 means data that is readily accessible yet still sufficient for the models to generalize —> Read more.

Selective Prediction in LLMs

Google Research published a paper introducing ASPIRE, a framework for improving the confidence of LLM answers. The method is based on a selective prediction technique that assigns each answer a confidence score indicating the probability that the answer is correct —> Read more.

SGLang

UC Berkeley published Structured Generation Language (SGLang), a technique for faster and more expressive LLM inference. SGLang combines frontend and backend optimizations that enable the creation of complex LLM programs —> Read more.

🤖 Cool AI Tech Releases

Stable Code 3B

Stability AI open sourced Stable Code 3B, a new coding model that matches the performance of models 2.5x larger —> Read more.

Pinecone Serverless

The leading vector database provider released a new version of its platform with a simpler interface and a 50x cost reduction —> Read more.

DataStax RAG API

DataStax unveiled a new Data API to streamline the development of RAG applications —> Read more.

🛠 Real World ML

GitHub and AI

GitHub published the results of detailed interviews about the productivity impact its AI tools are having on developers —> Read more.

LinkedIn Gen AI Playbook

LinkedIn shared some of the ideas its engineering leaders are evaluating to fully leverage advances in generative AI —> Read more.

📡AI Radar
