Publications and Other Works

Papers

Do Machine Learning Models Produce TypeScript Types That Type Check?

Ming-Ho Yee, Arjun Guha
ECOOP 2023
Abstract

Type migration is the process of adding types to untyped code to gain assurance at compile time. TypeScript and other gradual type systems facilitate type migration by allowing programmers to start with imprecise types and gradually strengthen them. However, adding types is a manual effort, and migrations of large industry codebases have reportedly taken several years. In the research community, there has been significant interest in using machine learning to automate TypeScript type migration. Existing machine learning models report a high degree of accuracy in predicting individual TypeScript type annotations. However, in this paper we argue that accuracy can be misleading, and we should address a different question: can an automatic type migration tool produce code that passes the TypeScript type checker?

We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model. We evaluate TypeWeaver with three models from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose, multi-language transformer that supports fill-in-the-middle tasks. Our tool automates several steps that are necessary for using a type prediction model, including (1) importing types for a project’s dependencies; (2) migrating JavaScript modules to TypeScript notation; (3) inserting predicted type annotations into the program; and (4) rejecting non-type predictions when needed.

We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including packages that have never been typed before. With the best type prediction model, we find that only 21% of packages type check, but more encouragingly, 69% of files type check successfully.
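The gap between annotation accuracy and type checking can be illustrated with a hypothetical TypeScript snippet (not taken from the paper): an annotation that matches some ground truth at one site can still make the whole file fail `tsc`.

```typescript
// A model might predict `number[]` here from local usage, and that single
// annotation could score as "accurate" against a ground-truth label...
function head(xs: number[]): number {
  return xs[0];
}

// ...yet elsewhere in the same file the function is applied to strings, so
// the migrated program as a whole is rejected by the type checker:
// head(["a", "b"]);   // error TS2345: string[] is not assignable to number[]

// Per-annotation accuracy misses this interaction; type checking the whole
// output program, as TypeWeaver does, catches it.
console.log(head([1, 2, 3]));
```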

StarCoder: may the source be with you!

Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries
TMLR 2023
Abstract

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

Type Prediction With Program Decomposition and Fill-in-the-Type Training

Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen
In submission, 2023
Abstract

TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file. All code, data, and models are available on GitHub.

MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation

Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda
TSE 2023
Abstract

Large language models have demonstrated the ability to generate both natural language and programming language text. Although contemporary code generation models are trained on corpora with several programming languages, they are tested using benchmarks that are typically monolingual. The most widely used code generation benchmarks only target Python, so there is little quantitative evidence of how code generation models perform on other programming languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We create the first massively multilingual code generation benchmark by using MultiPL-E to translate two popular Python code generation benchmarks to 18 additional programming languages. We use MultiPL-E to extend the HumanEval benchmark (Chen et al., 2021) and MBPP benchmark (Austin et al., 2021) to 18 languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex (Chen et al., 2021), CodeGen (Nijkamp et al., 2022), and InCoder (Fried et al., 2022). We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages.

Contextual Dispatch for Function Specialization

Olivier Flückiger, Guido Chari, Ming-Ho Yee, Jan Ječmen, Jakob Hain, Jan Vitek
OOPSLA 2020
Abstract

In order to generate efficient code, dynamic language compilers often need information, such as dynamic types, not readily available in the program source. Leveraging a mixture of static and dynamic information, these compilers speculate on the missing information. Within one compilation unit, they specialize the generated code to the previously observed behaviors, betting that past is prologue. When speculation fails, the execution must jump back to unoptimized code. In this paper, we propose an approach to further the specialization, by disentangling classes of behaviors into separate optimization units. With contextual dispatch, functions are versioned and each version is compiled under different assumptions. When a function is invoked, the implementation dispatches to a version optimized under assumptions matching the dynamic context of the call. As a proof-of-concept, we describe a compiler for the R language which uses this approach. Our implementation is, on average, 1.7× faster than the GNU R reference implementation. We evaluate contextual dispatch on a set of benchmarks and measure additional speedup, on top of traditional speculation with deoptimization techniques. In this setting contextual dispatch improves the performance of 18 out of 46 programs in our benchmark suite.
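The dispatch mechanism described above can be sketched in a few lines of TypeScript. This is an illustrative model only, not the compiler's actual implementation: each "version" pairs an assumption predicate with a body specialized under that assumption, and calls dispatch to the first version whose assumptions hold, falling back to generic code otherwise.

```typescript
// One compiled version of a function: a guard describing the assumptions it
// was specialized under, and the specialized body itself.
type Version<T> = {
  assumes: (x: unknown) => x is T;
  body: (x: T) => number;
};

// Dispatch to the first version whose assumptions match the dynamic context
// of the call; otherwise run the generic, unspecialized fallback.
function makeDispatcher(
  versions: Array<Version<any>>,
  fallback: (x: unknown) => number
): (x: unknown) => number {
  return (x) => {
    for (const v of versions) {
      if (v.assumes(x)) return v.body(x); // fast path
    }
    return fallback(x); // no assumptions hold
  };
}

// Example: a version specialized for numbers, with a generic fallback.
const double = makeDispatcher(
  [{ assumes: (x): x is number => typeof x === "number", body: (x) => x + x }],
  (x) => Number(x) * 2
);
```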

R Melts Brains: An IR for First-Class Environments and Lazy Effectful Arguments

Olivier Flückiger, Guido Chari, Jan Ječmen, Ming-Ho Yee, Jakob Hain, Jan Vitek
DLS 2019
Abstract

The R programming language combines a number of features considered hard to analyze and implement efficiently: dynamic typing, reflection, lazy evaluation, vectorized primitive types, first-class closures, and extensive use of native code. Additionally, variable scopes are reified at runtime as first-class environments. The combination of these features renders most static program analysis techniques impractical, and thus, compiler optimizations based on them ineffective. We present our work on PIR, an intermediate representation with explicit support for first-class environments and effectful lazy evaluation. We describe two dataflow analyses on PIR: the first enables reasoning about variables and their environments, and the second infers where arguments are evaluated. Leveraging their results, we show how to elide environment creation and inline functions.

Precise Dataflow Analysis of Event-Driven Applications

Ming-Ho Yee, Ayaz Badouraly, Ondřej Lhoták, Frank Tip, Jan Vitek
Technical report, 2019
Abstract

Event-driven programming is widely used for implementing user interfaces, web applications, and non-blocking I/O. An event-driven program is organized as a collection of event handlers whose execution is triggered by events. Traditional static analysis techniques are unable to reason precisely about event-driven code because they conservatively assume that event handlers may execute in any order. This paper proposes an automatic transformation from Interprocedural Finite Distributive Subset (IFDS) problems to Interprocedural Distributed Environment (IDE) problems as a general solution to obtain precise static analysis of event-driven applications; problems in both forms can be solved by existing implementations. Our contribution is to show how to improve analysis precision by automatically enriching the former with information about the state of event handlers to filter out infeasible paths. We prove the correctness of our transformation and report on experiments with a proof-of-concept implementation for a subset of JavaScript.

Correctness of Speculative Optimizations with Dynamic Deoptimization

Olivier Flückiger, Gabriel Scherer, Ming-Ho Yee, Aviral Goel, Amal Ahmed, Jan Vitek
POPL 2018
Abstract

High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition.

From Datalog to Flix: A Declarative Language for Fixed Points on Lattices

Magnus Madsen, Ming-Ho Yee, Ondřej Lhoták
PLDI 2016
Abstract

We present Flix, a declarative programming language for specifying and solving least fixed point problems, particularly static program analyses. Flix is inspired by Datalog and extends it with lattices and monotone functions. Using Flix, implementors of static analyses can express a broader range of analyses than is currently possible in pure Datalog, while retaining its familiar rule-based syntax. We define a model-theoretic semantics of Flix as a natural extension of the Datalog semantics. This semantics captures the declarative meaning of Flix programs without imposing any specific evaluation strategy. An efficient strategy is semi-naive evaluation which we adapt for Flix. We have implemented a compiler and runtime for Flix, and used it to express several well-known static analyses, including the IFDS and IDE algorithms. The declarative nature of Flix clearly exposes the similarity between these two algorithms.

Optimizing Contractor Selection for Construction Packages in Capital Projects

Mahdi Safa, Ming-Ho Yee, Derek Rayside, Carl T. Haas
Journal of Computing in Civil Engineering, 2016
Abstract

In capital construction projects, the contractor selection process is executed in the front-end planning phase based on the experience and judgment of the project leadership team, whose members are considered experts. A comprehensive construction value packaging system (CVPS) is presented for assisting these project leaders in improving the efficacy, efficiency, and transparency of this process. The CVPS comprises an interface-oriented approach to work decomposition and a multicriteria approach to contractor selection. Selection is supported by software for iteratively computing and interactively visualizing the Pareto front of optimal work packages. The CVPS is validated by functional demonstration and a case study of a piping installation project in an energy infrastructure facility. The initial Pareto front had more than 100,000 optimal work packages representing different trade-offs between cost, time, contractor experience, maintenance rates, and financial stability. An iterative and interactive process of query formulation and exact (not heuristic) evaluation helped the expert to focus in on solutions of interest and created a transparent and quantitative record of the decision-making process. Thus, this research contributes a new paradigm and computational tool for project decision support.

Optimizing Alloy for Multi-objective Software Product Line Configuration

Ed Zulkoski, Chris Kleynhans, Ming-Ho Yee, Derek Rayside, Krzysztof Czarnecki
ABZ 2014
Abstract

Software product line (SPL) engineering involves the modeling, analysis, and configuration of variability-rich systems. We improve the performance of the multi-objective optimization of SPLs in Alloy by several orders of magnitude with two techniques.

First, we rewrite the model to remove binary relations that map to integers, which enables removing most of the integer atoms from the universe. SPL models often require using large bitwidths, hence the number of integer atoms in the universe can be orders of magnitude more than the other atoms. In our approach, the tuples for these integer-valued relations are computed outside the SAT solver before returning the solution to the user. Second, we add a checkpointing facility to Kodkod, which allows the multi-objective optimization algorithm to reuse previously computed internal SAT solver state, after backtracking.

Together these result in orders of magnitude improvement in using Alloy as a multi-objective optimization tool for software product lines.

Altered macromolecule signal in the hippocampus in Alzheimer patients measured by 1H magnetic resonance spectroscopy

Robert Bartha, Ming-Ho Yee, Raul Rupsingh, Matthew Smith, Michael Borrie
Alzheimer's & Dementia, 2009

Theses

Predicting TypeScript Type Annotations and Definitions with Machine Learning

PhD thesis proposal, Northeastern University
September 2023
Abstract

Type information is useful for developing large-scale software systems. Types help prevent bugs, but may be inflexible and hamper quick iteration on early prototypes. TypeScript, a syntactic superset of JavaScript, brings the best of both worlds, allowing programmers to freely mix statically and dynamically typed code, and choose the level of type safety they wish to opt into. However, type migration, the process of migrating an untyped program to a typed version, has remained a labour-intensive manual effort in practice. As a first step towards effective automated type migration, there has been interest in applying machine learning to the narrower problem of type prediction.

In my thesis, I propose to use machine learning to partially migrate JavaScript programs to TypeScript, by predicting type annotations and generating type definitions. To support my thesis, I make three contributions. First, I propose evaluating type prediction by type checking the generated annotations instead of computing accuracy. Second, I fine-tune a large language model with fill-in-the-middle capability to fill-in-the-type and predict type annotations. Finally, I use a similar approach to fine-tune a large language model to generate missing type definitions.

Implementing a Functional Language for Flix

MMath thesis, University of Waterloo
September 2016
Abstract

Static program analysis is a powerful technique for maintaining software, with applications such as compiler optimizations, code refactoring, and bug finding. Static analyzers are typically implemented in general-purpose programming languages, such as C++ and Java; however, these analyzers are complex and often difficult to understand and maintain. An alternate approach is to use Datalog, a declarative language. Implementors can express analysis constraints declaratively, which makes it easier to understand and ensure correctness of the analysis. Furthermore, the declarative nature of the analysis allows multiple, independent analyses to be easily combined.

Flix is a programming language for static analysis, consisting of a logic language and a functional language. The logic language is inspired by Datalog, but supports user-defined lattices. The functional language allows implementors to write functions, something which is not supported in Datalog. These two extensions, user-defined lattices and functions, allow Flix to support analyses that cannot be expressed by Datalog, such as a constant propagation analysis. Datalog is limited to constraints on relations, and although it can simulate finite lattices, it cannot express lattices over an infinite domain. Finally, another advantage of Flix is that it supports interoperability with existing tools written in general-purpose programming languages.

This thesis discusses the implementation of the Flix functional language, which involves abstract syntax tree transformations, an interpreter back-end, and a code generator back-end. The implementation must support a number of interesting language features, such as pattern matching, first-class functions, and interoperability.

The thesis also evaluates the implementation, comparing the interpreter and code generator back-ends in terms of correctness and performance. The performance benchmarks include purely functional programs (such as an N-body simulation), programs that involve both the logic and functional languages (such as matrix multiplication), and a real-world static analysis (the Strong Update analysis). Additionally, for the purely functional benchmarks, the performance of Flix is compared to C++, Java, Scala, and Ruby.

In general, the performance of compiled Flix code is significantly faster than interpreted Flix code. This applies to all the purely functional benchmarks, as well as benchmarks that spend most of the time in the functional language, rather than the logic language. Furthermore, for purely functional code, the performance of compiled Flix is often comparable to Java and Scala.

Posters

Flix and its Implementation: A Language for Static Analysis

Ming-Ho Yee, Magnus Madsen, Ondřej Lhoták
ECOOP 2016
Abstract

Flix is a language for implementing static analyses. Flix is inspired by Datalog, but supports user-defined lattices and functions, allowing a larger class of analyses to be expressed. For example, a constant propagation analysis can be expressed in Flix, but not in Datalog. A static analysis in Flix is specified as a set of constraints in a logic language, while functions are expressed in a pure functional language.

Optimizing Moolloy: A Solver for Multi-Objective Optimization Problems

Joseph Hong, Chris Kleynhans, Ming-Ho Yee, Atulan Zaman
Capstone Design Project, University of Waterloo
March 2014
Abstract

Our project is to optimize Moolloy, a solver for multi-objective optimization problems, so that it can solve problems faster and scale with additional hardware. Moolloy finds all optimal solutions, which are configurations that cannot be improved in one metric without making some other metric worse.

Talks

A History of Classical Music

Undistinguished Lecture Series, Northeastern University
February 2018
Abstract

What is classical music? How has it evolved over time? In this talk, I'll give a tour of the last 400 years of classical music. We'll explore a selection of composers, wander through different periods and styles, and of course, listen to some music. No musical background is required for this talk. Whether you are completely new to classical music, or have season tickets to the symphony, I hope you will find something interesting in this talk.

Push-Down Automata for Higher Order Flow Analysis

CS 7480 (Program Analysis Seminar), Northeastern University
November 2017

Control Flow Analysis in Scheme

Aviral Goel, Ming-Ho Yee
CS 7400 (Intensive Principles of Programming Languages), Northeastern University
April 2017
Abstract

To better understand Control Flow Analysis in Scheme, we implemented CFA as a Redex model. We presented our project in class, as a blackboard presentation.

Tracing JITs for Dynamic Languages

CS 7480 (History of Programming Languages), Northeastern University
February 2017
Abstract

Traditional JIT (just-in-time) compilers are method-based: they compile “hot” (i.e. frequently executed) methods to native code. An alternative is trace-based or tracing JITs, where the compilation unit is a (hot) sequence of instructions. Typically, such sequences of instructions correspond to loops, where programs spend most of their execution time.

Where did the idea of tracing come from? What was appealing about it? How was tracing adapted for JITs and dynamic languages? What happened to Mozilla’s TraceMonkey, which used to be part of Firefox? Do any JITs today use tracing?

Yesterday, My Program Worked. Today, It Does Not. Why?

CS 7480 (Advanced Program Analysis), Northeastern University
November 2016

Flix: A Language for Static Analysis

Magnus Madsen, Ming-Ho Yee, Ondřej Lhoták
Seminar, Northeastern University
October 2016
Abstract

Static program analysis is a powerful technique for maintaining software, with applications such as compiler optimizations, code refactoring, and bug finding. Static analyzers are typically implemented in general-purpose programming languages, such as C++ and Java; however, these analyzers are complex and often difficult to understand and maintain. An alternate approach is to use Datalog, a declarative language. Implementors can express analysis constraints declaratively, which makes it easier to understand and ensure correctness of the analysis. Furthermore, the declarative nature of the analysis allows multiple, independent analyses to be easily combined.

Flix is a programming language for static analysis, consisting of a logic language and a functional language. The logic language is inspired by Datalog, but supports user-defined lattices. The functional language allows implementors to write functions, something which is not supported in Datalog. These two extensions, user-defined lattices and functions, allow Flix to support analyses that cannot be expressed by Datalog, such as a constant propagation analysis. Datalog is limited to constraints on relations, and although it can simulate finite lattices, it cannot express lattices over an infinite domain. Finally, another advantage of Flix is that it supports interoperability with existing tools written in general-purpose programming languages.

This talk provides an overview of the implementation and evaluation of the Flix functional language. The implementation involves abstract syntax tree transformations, an interpreter back-end, and a code generator back-end, and must support a number of interesting language features, such as pattern matching, first-class functions, and interoperability. The evaluation compares the interpreter and code generator back-ends in terms of correctness and performance.

Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages

CS 7480 (Advanced Program Analysis), Northeastern University
September 2016

The Flix Language

Magnus Madsen, Ming-Ho Yee, Ondřej Lhoták
Seminar, University of Waterloo
March 2016
Abstract

We present Flix, a declarative programming language for specifying and solving least fixed point problems, particularly static program analyses. Flix is inspired by Datalog and extends it with lattices and monotone functions. Using Flix, implementors of static analyses can express a broader range of analyses than is currently possible in pure Datalog, while retaining its familiar rule-based syntax. We define a model-theoretic semantics of Flix as a natural extension of the Datalog semantics. This semantics captures the declarative meaning of Flix programs without imposing any specific evaluation strategy. An efficient strategy is semi-naive evaluation which we adapt for Flix. We have implemented a compiler and runtime for Flix, and used it to express several well-known static analyses.

Graduate School: What I Wish I Knew in Third Year

University of Waterloo
October 2015
Abstract

This was a talk for undergraduate students in the software engineering program at the University of Waterloo, the same program I studied in. The goal was to share what I had learned about applying to graduate school.

Surgical Precision JIT Compilers

CS 842 (Virtual Machines for Dynamic Languages), University of Waterloo
February 2015

Optimizing Moolloy: A Solver for Multi-Objective Optimization Problems

Joseph Hong, Chris Kleynhans, Ming-Ho Yee, Atulan Zaman
Capstone Design Project, University of Waterloo
March 2014
Abstract

Our project is to optimize Moolloy, a solver for multi-objective optimization problems, so that it can solve problems faster and scale with additional hardware. Moolloy finds all optimal solutions, which are configurations that cannot be improved in one metric without making some other metric worse.

Other Writing

On-Stack Replacement

CS 7600 (Intensive Computer Systems), Northeastern University
December 2018
Abstract

On-stack replacement (OSR) is a programming language implementation technique that allows a running program to switch to a different version of code. For example, a program could start executing optimized code, and then transfer to and start executing unoptimized code. This was the original use case for OSR, to facilitate debugging of optimized code.

After its original use was established, OSR shifted to a different use case: optimizing programs. OSR allows the run-time system to detect if a program is executing an inefficient loop, recompile and optimize the method that contains the loop, and then transfer control to the newly compiled method. Another strategy is to optimize code based on some assumptions, then, if the assumptions are invalidated at run-time, transfer control back to the original, unoptimized code.

In this survey paper, we study how OSR was first introduced as a means for debugging, how it came to be used for program optimizations, its implementation as a reusable library, and other directions of research.

Undergraduate Capstone Design Project

University of Waterloo
2012–2014
Abstract

The Capstone Design Project, also known as the fourth-year design project (FYDP), is a mandatory three-course sequence in the software engineering program at the University of Waterloo, meant to be completed in groups of four undergraduate students. I wrote a series of articles documenting my group's project.