Alex Gu

I am a PhD student at MIT advised by the wonderful Armando Solar-Lezama. I also did my bachelor's and master's degrees at MIT, advised by Armando and Jacob Andreas. I have done internships at Meta AI Research, Jane Street, and [...]. My research is generously funded by the NSF GRFP.

Email: gua [at]

Google Scholar  /  Twitter  /  SoundCloud  /  YouTube  /  WeChat


[Oct 2023] New preprint, Language Agnostic Code Embeddings, an approach to separate syntax and semantics for code embeddings, is on arXiv!

[Oct 2023] LINC, a neurosymbolic approach to logical reasoning, has been accepted to EMNLP 2023!

[Oct 2023] I gave a talk about LeanDojo at Harvard! In addition, LeanDojo was selected for an oral presentation at NeurIPS 2023!

[May 2023] A recording of my piano recital this spring is now on YouTube (and below on this page)!

[May 2023] I'm interning at Meta AI Research under Sida Wang this summer! If you're in the Seattle area, I'd love to chat!

[May 2023] My TEDxBoston talk on AI for Code is now online!

Research Interests

My research dreams currently lie in these three (somewhat different) areas, though I'm currently focusing on the first two. I'm very open to both collaborating and advising students (through UROP if you're an undergrad at MIT, we can discuss otherwise), so don't hesitate to reach out if you're excited about these topics and would like to work together!

1. Neurosymbolic programming, specifically through the lens of smashing symbolic techniques together with LLMs to create more reliable and powerful ways to perform program synthesis.

2. Developing methodologies to interpret the behavior of code LMs, understand their capabilities and reasoning abilities, and distinguish them from language LMs.

3. Understanding theoretical foundations and training dynamics behind large language models (for code), such as sensitivity to initialization/learning rates, sharpness-aware training, or pruning.

Language Agnostic Code Embeddings
Saiteja Utpala, Alex Gu, Pin-Yu Chen

We apply a method used in multilingual NLP to code, showing that vector embeddings for code can be decomposed into syntax-like and semantic-like components. We show that when the syntax-like component is removed, language identification becomes difficult, as expected, and Text2Code and Code2Code retrieval performance improves.
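As a rough illustration of the decomposition idea, the sketch below removes a low-rank "language" subspace from a set of embeddings. It is a simplified stand-in and not the method from the paper: the function names `language_subspace` and `remove_subspace`, the use of per-language mean vectors to estimate the subspace, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def language_subspace(embeddings, labels, rank):
    """Estimate directions along which per-language mean embeddings differ."""
    labels = np.asarray(labels)
    langs = sorted(set(labels.tolist()))
    # One mean vector per programming language.
    means = np.stack([embeddings[labels == lang].mean(axis=0) for lang in langs])
    centered = means - means.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal; the leading ones span the language-specific variation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank]

def remove_subspace(embeddings, basis):
    """Project embeddings onto the orthogonal complement of the given basis."""
    return embeddings - (embeddings @ basis.T) @ basis

# Synthetic data (hypothetical): a shared "semantic" part plus a per-language offset.
rng = np.random.default_rng(0)
dim = 8
semantic = rng.normal(size=(40, dim))
offsets = {"python": rng.normal(size=dim), "java": rng.normal(size=dim)}
labels = np.array(["python"] * 20 + ["java"] * 20)
embeddings = semantic + np.stack([offsets[lang] for lang in labels.tolist()])

basis = language_subspace(embeddings, labels, rank=1)
cleaned = remove_subspace(embeddings, basis)
```

In this toy setup the two per-language means coincide after the projection, so the mean offset no longer separates the languages. Whether retrieval improves on real embeddings is an empirical question that this sketch does not address.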

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
Alex Gu*, Theo X. Olausson*, Benjamin Lipkin*, Cedegao E. Zhang*, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy
EMNLP 2023

We propose LINC, a neurosymbolic approach to logical reasoning from natural language where the LLM acts as an autoformalizer to first-order logic and a logic theorem prover makes a deduction. We also qualitatively compare LINC to chain of thought, showing they make different mistakes and thus are complementary.

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar
NeurIPS Datasets and Benchmarks Track 2023, Oral presentation

We release LeanDojo: an open-source playground consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant. LeanDojo contains 1) tools for data extraction and interaction with Lean, 2) fine-grained annotations of where lemmas are used and defined, 3) a new benchmark of 97K human-written theorems from mathlib, and 4) a retrieval-augmented theorem prover that retrieves relevant premises.

💫 StarCoder: may the source be with you!
Raymond Li, ..., Alex Gu, ..., Leandro von Werra*, Harm de Vries*

StarCoder and StarCoderBase are 15.5B parameter models with 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder.

Pruning CodeBERT for Improved Code-to-Text Efficiency [Coming Soon!]
Alex Gu, Ria Sonecha, Saaketh Vedantam, Bharat Runwal, Diganta Misra
Workshop on Sparsity in Neural Networks, ICLR 2023

We do some very preliminary experiments on pruning the encoder piece of CodeBERT. Preprint coming soon, but if you're interested in this topic, please reach out!

Min-Max Bilevel Multi-objective Optimization with Applications in Robust Machine Learning
Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng
ICLR 2023 [Video from ICML 2021 workshop]

We extend standard single-objective bilevel optimization to a min-max multi-objective framework that aims to minimize the worst-case loss of all tasks. Our main result is theoretical: we introduce a new algorithm (MORBiT) for our framework and show a convergence result. We also highlight applications in representation learning and hyperparameter optimization.

🎅SantaCoder: Don't Reach for the Stars!🌟
Loubna Ben Allal*, Raymond Li*, Denis Kocetkov*, ..., Alex Gu, ..., Leandro von Werra*
Deep Learning for Code Workshop, ICLR 2023 [Best Paper]

The BigCode project is an open-scientific collaboration working on the responsible open-source development of large language models for code. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack, and our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling.

ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications
Alex Gu, Tamara Mitrovska, Daniela Velez, Jacob Andreas, Armando Solar-Lezama
Deep Learning for Code Workshop, ICLR 2023

ObSynth leverages domain knowledge embedded in GPT-3 to help users design object models from high-level natural language prompts. We synthesize object names, field names, field types, method names, and relationships between objects. We also conduct a user study to highlight how users may interact with ObSynth to design a restaurant management application.

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
Satyapriya Krishna*, Tessa Han*, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
Workshop on Trust and Reliance in AI-Human Teams, CHI 2022

We introduce the disagreement problem in explainable machine learning, showing that commonly used algorithms including LIME, SHAP, and SmoothGrad often disagree in practice. We also conduct a user study highlighting that practitioners do not have good ways of resolving these disagreements in their day-to-day work.

Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates
Alp Yurtsever*, Alex Gu*, Suvrit Sra
NeurIPS 2021 [Video]

Standard three operator splitting minimizes the sum of three convex functions f(x) + g(x) + h(x), where f is smooth and the prox operators of g and h are computable. We focus on three settings: (i) f is nonsmooth, (ii) we only have noisy gradients and subgradients of f, and (iii) an adaptive setting where smoothness properties of f are unknown.
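For readers unfamiliar with the standard (smooth f) setting, here is a minimal sketch of the classical three operator splitting iteration (often attributed to Davis and Yin), not the extensions studied in the paper. The helper name `three_operator_splitting`, the toy problem, and the step size are assumptions chosen for illustration.

```python
import numpy as np

def three_operator_splitting(grad_f, prox_g, prox_h, z0, step, iters=200):
    """Minimize f(x) + g(x) + h(x) with smooth f and prox-friendly g and h."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        x_g = prox_g(z, step)                                  # prox step on g
        x_h = prox_h(2 * x_g - z - step * grad_f(x_g), step)   # gradient step on f, prox step on h
        z = z + x_h - x_g                                      # update the auxiliary variable
    return prox_g(z, step)

# Toy problem (hypothetical): f(x) = 0.5 * ||x - b||^2, g = indicator of x >= 0, h = lam * ||x||_1.
b = np.array([3.0, -1.0, 0.5])
lam = 1.0

def grad_f(x):
    return x - b

def prox_g(v, step):
    # Projection onto the nonnegative orthant.
    return np.maximum(v, 0.0)

def prox_h(v, step):
    # Soft-thresholding, the prox of step * lam * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

# f has a 1-Lipschitz gradient, so a step size of 1.0 lies inside the usual (0, 2/L) range.
solution = three_operator_splitting(grad_f, prox_g, prox_h, np.zeros(3), step=1.0)
```

For this toy problem the minimizer has the closed form max(b - lam, 0), so `solution` should be close to [2, 0, 0], which gives a quick sanity check on the iteration.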

Certified Interpretability Robustness for Class Activation Mapping
Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel
Workshop on Machine Learning for Autonomous Driving, NeurIPS 2020 [Video]

Our algorithm, CORGI, computes certified bounds on an interpretability algorithm known as CAM (Class Activation Mapping), showing that within the certified radius, the top-K pixels of the CAM map do not change.

Teaching Experience
  • Spring 2022: TA, Nonlinear Optimization (6.252)
  • Fall 2021: TA, Machine Learning (6.867)
  • Spring 2020: TA, Signals, Systems, and Inference (6.011)

Other Projects

Here are some other projects I've worked on. If you're interested in building off any of them, I highly welcome collaborations!

RestBot - Online Interactive Breaks Revive Positive Thinking and Behavior during COVID-19

This research project explores alternatives available to people who are struggling with quarantine and isolation, giving them an opportunity to revive positive thinking and behavior. The study found that direct online social engagement can be an alternative way to revive positive thinking and behavior under strenuous circumstances, such as during COVID-19.

Estimating the Lipschitz Constant of Neural Networks

An (unsuccessful) attempt to estimate the Lipschitz constant of neural networks via running optimization techniques on the gradient norm.

Dex Synthesizer

A synthesizer for toy programs in the Dex Programming Language.

Exploring Founded Semantics for Static Analysis

A toy exploration of how founded semantics can help static analysis.

Neural Network on FPGA

Implementation of a feedforward neural network on FPGA in Verilog.

CUTHBERT: Comprehensive aUTo-arrangement algoritHm for BEats n’ RhyThms

CUTHBERT, a music generator that uses Markov chain models to gain an intuition for music, is a software tool designed to remove knowledge barriers for new musicians and provide inspiration to experienced musicians who just need an idea to start writing.
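To give a flavor of how a Markov chain can generate music, here is a minimal first-order sketch over note names. It is not CUTHBERT's actual implementation: the functions `train_transitions` and `generate`, the note representation, and the training melody are all hypothetical.

```python
import random
from collections import defaultdict

def train_transitions(melody):
    """Record, for each note, the notes observed to follow it."""
    table = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        table[current].append(following)
    return dict(table)

def generate(table, start, length, seed=None):
    """Sample a melody by repeatedly drawing a successor of the last note."""
    rng = random.Random(seed)
    notes = [start]
    while len(notes) < length:
        choices = table.get(notes[-1])
        if not choices:  # dead end: no observed successor for this note
            break
        # Repeated entries in the list make frequent transitions more likely.
        notes.append(rng.choice(choices))
    return notes

# Hypothetical training melody; every note in it has at least one successor.
training_melody = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]
table = train_transitions(training_melody)
tune = generate(table, start="C", length=8, seed=0)
```

Every consecutive pair in `tune` is a transition seen in the training melody, which is why the output tends to sound stylistically similar to its input.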

Comparing Sketching Algorithms for Nearest Neighbor Search

A comparison of six different sketching algorithms for nearest neighbor search (run time/accuracy) on MNIST, GloVe, and SIFT.

Ray Tracer

A ray tracer written in OCaml based on The Ray Tracer Challenge by Jamis Buck.


I also study classical piano under Yukiko Sekino. Here are some recordings of concerts I have given in the past:

Recital, Spring 2023 [Program Notes]

Ludwig van Beethoven, Sonata No. 23, Op. 57 "Appassionata" (1806)

Sergei Rachmaninoff, Sonata No. 2, Op. 36 (1913 - 1931)

Chinese Music Concert, Spring 2023 [Program Notes]

王建中, 浏阳河 / Jianzhong Wang, Liuyang River (1972)

黎英海, 夕阳箫鼓 / Yinghai Li, Flute and Drum at Sunset (1975)

陈钢, 阳光照耀着塔什库尔干 / Gang Chen, Sunshine Over Tashkuergan

Recital, Spring 2022 [Program Notes]

Alexander Scriabin, Piano Sonata No. 2 in G-sharp minor (1897)

John Adams, China Gates (1977)

Philip Glass, Opening, from Glassworks (1995)

Robert Schumann, Piano Sonata No. 2 in G minor (1838)

Performance with MIT Symphony Orchestra, Spring 2022

Rachmaninoff Piano Concerto No. 2 in C minor, Op. 18
I. Moderato

Recital, Spring 2021 [Program Notes]

Claude Debussy, Images, Book 1 (1905)

Frédéric Chopin, Sonata No. 2 in B-flat minor (1839)

Sergei Rachmaninoff, Piano Concerto No. 2 in C minor, I. Moderato (1901)

Chamber Music, Spring 2021

Edvard Grieg, Sonata No. 3 in C minor for Violin and Piano, Op. 45 (1887)
I. Allegro molto ed appassionato
II. Allegretto espressivo alla Romanza

Once in a while, I also enjoy creating my own music.

A sentimental piece that I co-created with my friend Boom once for New Year's.

A remix of Parry Gripp's hit song, It's Raining Tacos, which often kept me sane in my undergraduate days.

A remix of another one of my guilty pleasure songs, 学猫叫 (Learn to Meow).

A surreal, jolly, but nostalgic piece reflecting on the past.

Thanks to Jon Barron for the awesome website template. Last updated on October 29, 2023.