Alex Gu

I am a PhD student at MIT advised by the wonderful Armando Solar-Lezama. I also did my bachelor's and master's degrees at MIT, advised by Armando and Jacob Andreas. I have done internships at Meta AI Research, Jane Street, and pony.ai. My research is generously funded by the NSF GRFP.

My research aims to improve the capabilities of AI systems on tasks like programming and mathematics through methods such as neurosymbolic programming. The best part of doing a PhD is that I get to learn from all kinds of people, so feel free to reach out if you want to chat or collaborate! Bandwidth permitting, I am also happy to mentor undergraduate students who are interested in my research and to offer advice to anyone who would benefit from my experiences.

Outside research, I love music. I studied piano for over 15 years under inspiring pianists Yukiko Sekino and Chun-Chi An and completed a music minor as an undergraduate. As a highlight, I performed the first movement of Rachmaninoff's 2nd Piano Concerto with the MIT Symphony Orchestra. I occasionally perform piano recitals and, more rarely, make EDM-like tracks and remixes.

Email: gua [at] mit.edu

Google Scholar / Twitter / SoundCloud / YouTube / WeChat

News

[Jun 2025] Our paper on challenges and future directions in AI for Software Engineering was accepted to ICML 2025!

[Apr 2025] Zhaoyu Li and I are hosting a social on AI for Mathematics and Theorem Proving at ICLR 2025!

[Mar 2025] I started an internship at Meta in Menlo Park working on AI and Lean with Aram Markosyan.

Publications

Challenges and Paths Towards AI for Software Engineering
Alex Gu, Naman Jain, Wen-Ding Li, Manish Shetty, Yijia Shao, Ziyang Li, Diyi Yang, Kevin Ellis, Koushik Sen, Armando Solar-Lezama
ICML, 2025

In this paper, we first discuss challenges in today's AI systems for software engineering. Then, we propose a set of promising future research directions to address these challenges.

Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham M. Kakade, Eran Malach
ICLR, 2025

In this paper, we compare MoEs and dense transformers. Our main finding is that as the number of experts increases at constant active parameter count, memorization performance increases while reasoning capabilities saturate.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain, King Han, Alex Gu*, Wen-Ding Li*, Fanjia Yan*, Tianjun Zhang*, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica
ICLR, 2025

LiveCodeBench is a holistic and contamination-free benchmark for code LMs. It consists of 4 tasks: code generation, self-repair, code execution, and test output prediction. We update the benchmark periodically with new high-quality problems from platforms like LeetCode, AtCoder, and Codeforces.

🕵️ The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?
Alex Gu, Wen-Ding Li*, Naman Jain*, Theo X. Olausson*, Celine Lee*, Koushik Sen, Armando Solar-Lezama
ACL Findings, 2024

In The Counterfeit Conundrum, we analyze the ability of open code language models to understand their counterfeit samples. These are samples that 1) have a high enough log-probability to be generated at a moderate temperature, 2) are incorrect, but 3) pass weak correctness checks. We find that open code LMs 1) think counterfeits are correct, 2) execute them as if they were correct, and 3) can't repair them without feedback.

StarCoder2 and The Stack v2: The Next Generation
Anton Lozhkov, ..., Alex Gu, ..., Leandro von Werra*, Harm de Vries*

We train new models with 3B, 7B, and 15B parameters on Software Heritage source code including 619 programming languages and high-quality data sources like GitHub pull requests, Kaggle notebooks, and code documentation. Our models outperform most similarly sized models on a variety of benchmarks.

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Alex Gu, Baptiste Rozière, Hugh Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida I. Wang
ICML, 2024; DMLR & DPFM Workshops @ ICLR 2024

CRUXEval is a benchmark of 800 Python functions and input-output pairs designed to test the ability of code LMs on code reasoning, understanding, and execution. We find that despite being trained on 100G of Python code and 1T of code data, models like Code Llama fail over half the time at simple execution prediction and code reasoning!

Language Agnostic Code Embeddings
Saiteja Utpala, Alex Gu, Pin Yu Chen
NAACL, 2024

We apply a method from multilingual NLP to code, showing that vector embeddings for code can be decomposed into syntax-like and semantic-like components. We show that when the syntax-like component is removed, language identification becomes difficult, as expected, and Text2Code and Code2Code retrieval performance improves.

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
Alex Gu*, Theo X. Olausson*, Benjamin Lipkin*, Cedegao E. Zhang*, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy
EMNLP 2023, Outstanding Paper Award in Commonsense and Reasoning

We propose LINC, a neurosymbolic approach to logical reasoning from natural language where the LLM acts as an autoformalizer to first-order logic and a logic theorem prover makes a deduction. We also qualitatively compare LINC to chain of thought, showing that they make different mistakes and are thus complementary.

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar
NeurIPS Datasets and Benchmarks Track 2023, Oral presentation

We release LeanDojo: an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant. LeanDojo contains 1) tools for data extraction and interaction with Lean, 2) fine-grained annotations of where lemmas are used and defined, 3) a new benchmark of 97K human-written theorems from mathlib, and 4) a retrieval-augmented theorem prover using retrieval for relevant premise selection.

💫 StarCoder: may the source be with you!
Raymond Li, ..., Alex Gu, ..., Leandro von Werra*, Harm de Vries*
TMLR, 2023

StarCoder and StarCoderBase are 15.5B parameter models with 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder.

Pruning CodeBERT for Improved Code-to-Text Efficiency [Coming Soon!]
Alex Gu, Ria Sonecha, Saaketh Vedantam, Bharat Runwal, Diganta Misra
Workshop on Sparsity in Neural Networks, ICLR 2023

We do some very preliminary experiments on pruning the encoder piece of CodeBERT. Preprint coming soon, but if you're interested in this topic, please reach out!

Min-Max Bilevel Multi-objective Optimization with Applications in Robust Machine Learning
Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng
ICLR 2023 [Video from ICML 2021 workshop]

We extend standard single-objective bilevel optimization to a min-max multi-objective framework that aims to minimize the worst-case loss of all tasks. Our main result is theoretical: we introduce a new algorithm (MORBiT) for our framework and show a convergence result. We also highlight applications in representation learning and hyperparameter optimization.

🎅SantaCoder: Don't Reach for the Stars!🌟
Loubna Ben Allal*, Raymond Li*, Denis Kocetkov*, ..., Alex Gu, ..., Leandro von Werra*
Deep Learning for Code Workshop, ICLR 2023 [Best Paper]

The BigCode project is an open-scientific collaboration working on the responsible open-source development of large language models for code. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack, and our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling.

ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications
Alex Gu, Tamara Mitrovska, Daniela Velez, Jacob Andreas, Armando Solar-Lezama
Deep Learning for Code Workshop, ICLR 2023

ObSynth leverages domain knowledge embedded in GPT-3 to help users design object models from high-level natural language prompts. We synthesize object names, field names, field types, method names, and relationships between objects. We also conduct a user study to highlight how users may interact with ObSynth to design a restaurant management application.

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
Satyapriya Krishna*, Tessa Han*, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
Workshop on Trust and Reliance in AI-Human Teams, CHI 2022

We introduce the disagreement problem in explainable machine learning, showing that commonly used algorithms including LIME, SHAP, and SmoothGrad often disagree in practice. We also conduct a user study highlighting that practitioners do not have good ways of resolving these disagreements in their day-to-day work.

Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates
Alp Yurtsever*, Alex Gu*, Suvrit Sra
NeurIPS 2021 [Video]

Standard three operator splitting minimizes the sum of three convex functions f(x) + g(x) + h(x), where f is smooth and the prox operators of g and h are computable. We focus on three settings: (i) f is nonsmooth, (ii) we only have noisy gradients and subgradients of f, (iii) an adaptive setting where smoothness properties of f are unknown.
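For readers unfamiliar with the standard setting, here is a minimal sketch of the classical Davis-Yin three operator splitting iteration in pure Python. It is the textbook baseline, not any of the variants studied in the paper. The toy problem, the function names, the step size, and the iteration count are all illustrative assumptions: f(x) = 0.5 * ||x - b||^2 (smooth), g is the indicator of the nonnegative orthant (its prox is a clip at zero), and h(x) = lam * ||x||_1 (its prox is soft thresholding).

```python
def soft_threshold(v, t):
    """Prox of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def davis_yin(b, lam, step=0.5, iters=200):
    """Minimize 0.5*||x - b||^2 + indicator(x >= 0) + lam*||x||_1.

    step must lie in (0, 2/L); here the gradient of f is 1-Lipschitz (L = 1).
    All names and defaults are hypothetical choices for this sketch.
    """
    z = [0.0] * len(b)
    for _ in range(iters):
        # Prox of g: project onto the nonnegative orthant.
        x_g = [max(zi, 0.0) for zi in z]
        # Gradient of the smooth term f at x_g.
        grad = [xi - bi for xi, bi in zip(x_g, b)]
        # Prox of h at the reflected, gradient-corrected point.
        x_h = [
            soft_threshold(2.0 * xi - zi - step * gi, step * lam)
            for xi, zi, gi in zip(x_g, z, grad)
        ]
        # Update the auxiliary variable (relaxation parameter 1).
        z = [zi + (xh - xg) for zi, xh, xg in zip(z, x_h, x_g)]
    return [max(zi, 0.0) for zi in z]


b = [3.0, -1.0, 0.5]
lam = 1.0
x_star = davis_yin(b, lam)
# Closed-form solution of this toy problem, for comparison.
x_closed = [max(bi - lam, 0.0) for bi in b]
```

On this toy problem the iterate can be checked against the closed-form minimizer max(b - lam, 0), computed coordinate-wise.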

Certified Interpretability Robustness for Class Activation Mapping
Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel
Workshop on Machine Learning for Autonomous Driving, NeurIPS 2020 [Video]

Our algorithm, CORGI, computes certified bounds on an interpretability algorithm known as CAM (Class Activation Mapping), showing that within the certified radius, the top-K pixels of the CAM map do not change.

Invited Talks

If you find my work interesting, I am happy to give a talk to anyone who is interested!

April 2025: AI for Software Engineering: Where are we now, and what lies ahead? at ICLR DL4C Workshop (Singapore)
November 2024: Beyond Code Generation: Reasoning about Code and Reasoning with Code at ML Foundations Reading Group (Tsinghua, virtual)
October 2024: LeanDojo: Theorem Proving with Retrieval-Augmented Language Models at Mathematics and Machine Learning Program (Harvard, virtual)
March 2024: Beyond Code Generation: Do code language models understand programs the same way humans do? at Lei Lab (CMU, virtual), Cornell Tech, and Columbia PL/SE seminar
February 2024: CRUXEval: Code Reasoning, Understanding, and Execution Evaluation at AGI Leap Summit (virtual)
October 2023: LeanDojo: Theorem Proving with Retrieval-Augmented Language Models at CMSA New Technologies in Mathematics seminar (Harvard)
May 2023: What we need before even attempting to replace programmers with AI at TEDxBoston

Teaching Experience

Spring 2022: TA, Nonlinear Optimization (MIT 6.252)
Fall 2021: TA, Machine Learning (MIT 6.867)
Spring 2020: TA, Signals, Systems, and Inference (MIT 6.011)

Service

Organizer of the neurosymbolic programming reading group at MIT.
Reviewer for AISTATS (2022, 2023), NeurIPS (2022, 2023, 2024), EMNLP (2022), ICML (2022, 2023), ICLR DL4C Workshop (2022, 2023), ICLR (2024)
Artifact Evaluation Committee member, PLDI 2022
Co-organizer of the ICML 2021 Social on Open Collaboration in ML Research
Outstanding reviewer for ML reproducibility challenge (2021)
Student volunteer at POPL (2021), PLDI (2021), ICFP (2021)

Olympiad Awards

William Lowell Putnam Competition, Rank 185.5 / 4623 (2018), Rank 212 / 3428 (2019)
2016 USACO (USA Computing Olympiad) Finalist, National Top 26
2015-2017 USAJMO / 2018 USAMO Qualifier
2017 USAPhO (USA Physics Olympiad) Honorable Mention Winner
2015 MATHCOUNTS National Team Champion, 3rd Place Individual

Other Projects

Here are some other projects I've worked on. If you're interested in building on any of them, I warmly welcome collaborations!

RestBot - Online Interactive Breaks Revive Positive Thinking and Behavior during COVID-19

This research project explores alternative approaches for people who are struggling with quarantine and isolation, giving them an opportunity to revive positive thinking and behavior. The results of the study showed that direct online social engagement can be an alternative way to revive positive thinking and behavior for people under strenuous circumstances, such as during COVID.

Estimating the Lipschitz Constant of Neural Networks

An (unsuccessful) attempt to estimate the Lipschitz constant of neural networks via running optimization techniques on the gradient norm.

Dex Synthesizer

A synthesizer for toy programs in the Dex Programming Language

Exploring Founded Semantics for Static Analysis

A toy exploration of how founded semantics can help static analysis.

Neural Network on FPGA

Implementation of a feedforward neural network on FPGA in Verilog

CUTHBERT: Comprehensive aUTo-arrangement algoritHm for BEats n’ RhyThms

CUTHBERT, a music generator that uses Markov chain models to gain an intuition for music, is a software tool designed to remove knowledge barriers for new musicians and provide inspiration to experienced musicians who just need an idea to start writing.

Comparing Sketching Algorithms for Nearest Neighbor Search

A comparison of six different sketching algorithms for nearest neighbor search (run time/accuracy) on MNIST, GloVe, and SIFT.

Ray Tracer

A ray-tracer written in OCaml based on The Ray Tracer Challenge by Jamis Buck.

Music

Previously, I studied classical piano under Yukiko Sekino and Chun-Chi An. Here are some recordings of concerts I have given in the past:

Recital, Spring 2023 [Program Notes]

Ludwig van Beethoven, Sonata No. 23, Op. 57 "Appassionata" (1806)

Sergei Rachmaninoff, Sonata No. 2, Op. 36 (1913 - 1931)

Chinese Music Concert, Spring 2023 [Program Notes]

王建中, 浏阳河 / Jianzhong Wang, Liuyang River (1972)

黎英海, 夕阳箫鼓 / Yinghai Li, Flute and Drum at Sunset (1975)

陈钢, 阳光照耀着塔什库尔干 / Gang Chen, Sunshine Over Tashkuergan

Recital, Spring 2022 [Program Notes]

Alexander Scriabin, Piano Sonata No. 2 in G-sharp minor (1897)

John Adams, China Gates (1977)

Philip Glass, Opening, from Glassworks (1995)

Robert Schumann, Piano Sonata No. 2 in G minor (1838)

Performance with MIT Symphony Orchestra, Spring 2022

Rachmaninoff Piano Concerto No. 2 in C minor, Op. 18
I. Moderato

Recital, Spring 2021 [Program Notes]

Claude Debussy, Images, Book 1 (1905)

Frédéric Chopin, Sonata No. 2 in B flat minor (1839)

Sergei Rachmaninoff, Piano Concerto No. 2 in C minor, I. Moderato (1901)

Once in a while, I also enjoy creating my own music.