MING-HO YEE
https://mhyee.com * https://github.com/mhyee * mh@mhyee.com
--------------------------------------------------------------------------------

EXPERIENCE

Research Scientist Meta, Menlo Park CA Aug 2024 -- present

Ph.D. Candidate Northeastern University, Boston MA Sep 2016 -- Apr 2024
- Built [TypeWeaver](https://github.com/nuprl/TypeWeaver), the first machine-learning-based tool to migrate files from JavaScript to TypeScript, with a 69% success rate (as measured by type checking).
- Fine-tuned and evaluated a large language model for code to [generate type definitions](https://github.com/nuprl/StenoType) for TypeScript, allowing 47% of files (with missing type definitions) to type check (22% absolute improvement).
- Built an [interpreter in OCaml for a subset of R](https://github.com/reactorlabs/rhotic) to model the relationship between static and dynamic program analysis.
- Sped up test suite by 15% by optimizing dominance graph construction in Ř, a [just-in-time compiler for R](https://github.com/reactorlabs/rir).
- Co-chaired, organized, and led over 40 student volunteers at [ECOOP/ISSTA 2018](https://conf.researchr.org/home/ecoop-issta-2018), an international conference for programming languages and software engineering with over 600 attendees.
- Mentored undergraduate, master's, and Ph.D. students by providing feedback and advice on project planning, software development, and written and oral communication.
- Teaching assistant for "Fundamentals II, Introduction to Class-based Program Design" ([CS 2510](https://course.ccs.neu.edu/cs2510sp18/)) and "Fundamentals of Software Engineering" ([CS 4530](https://neu-se.github.io/CS4530-Spring-2024/)): helped design assignments, held office hours, and graded exams.

Researcher Intern Microsoft Research, Cambridge UK Sep -- Dec 2019
- Explored, implemented, and tested different memory management strategies for [Project Verona](https://github.com/microsoft/verona-rt).

MMath Candidate University of Waterloo, Waterloo ON Sep 2014 -- Aug 2016
- Designed and led the implementation of the original interpreter and compiler for the functional sub-language of [Flix](https://flix.dev/).
- Experimented with different code generation techniques for Flix, such as Scala macros and generating Scala code.
- Teaching assistant for "Foundations of Sequential Programs" ([CS 241](https://www.student.cs.uwaterloo.ca/~cs241/), [CS 241E](https://www.student.cs.uwaterloo.ca/~cs241e/)) and "Compiler Construction" ([CS 444](https://www.student.cs.uwaterloo.ca/~cs444/)): held office hours and provided feedback on assignments and exams for the first offering of CS 241E.

Software Development Engineering Intern Microsoft, Redmond WA May -- Jul 2014
- Prototyped [concepts lite](https://en.cppreference.com/w/cpp/language/constraints) in the Microsoft Visual C++ (MSVC) compiler, a feature that was eventually added to C++20.

Software Development Engineering Intern Microsoft, Redmond WA Sep -- Dec 2013
- Implemented [user-defined literals](https://en.cppreference.com/w/cpp/language/user_literal) in the Microsoft Visual C++ (MSVC) compiler, a C++11 feature that was missing from MSVC.

Software Development Engineering Intern Microsoft, Redmond WA Jan -- Apr 2013
- Developed a heap memory collection tool for debugging .NET applications.
- Designed and conducted performance tests for the memory collection tool.

Undergraduate Research Assistant University of Waterloo, Waterloo ON May -- Dec 2012

Developer Engagio (formerly Eqentia), Toronto ON Sep -- Dec 2011

Developer Eqentia, Toronto ON Jan -- Apr 2011

Software Development Research Intern Genesys Telecommunications Laboratories, Markham ON May -- Aug 2010

Junior Developer Robarts Research Institute, London ON Jul -- Aug 2008

PUBLICATIONS

MH Yee and A Guha (2023). [Do Machine Learning Models Produce TypeScript Types That Type Check?](https://doi.org/10.4230/LIPIcs.ECOOP.2023.37), ECOOP.

L von Werra, H de Vries, et al. (2023). [StarCoder: may the source be with you!](https://doi.org/10.48550/arXiv.2305.06161), TMLR.

F Cassano, MH Yee, N Shinn, A Guha, S Holtzen (2023). [Type Prediction With Program Decomposition and Fill-in-the-Type Training](https://doi.org/10.48550/arXiv.2305.17145), preprint.

F Cassano et al. (2023). [MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation](https://doi.org/10.1109/TSE.2023.3267446), TSE.

O Flückiger, G Chari, MH Yee, J Ječmen, J Hain, J Vitek (2020). [Contextual Dispatch for Function Specialization](https://doi.org/10.1145/3428288), OOPSLA.

O Flückiger, G Chari, J Ječmen, MH Yee, J Hain, J Vitek (2019). [R Melts Brains: An IR for First-Class Environments and Lazy Effectful Arguments](https://doi.org/10.1145/3359619.3359744), DLS.

MH Yee, A Badouraly, O Lhoták, F Tip, J Vitek (2019). [Precise Dataflow Analysis of Event-Driven Applications](https://arxiv.org/abs/1910.12935), technical report.

O Flückiger, G Scherer, MH Yee, A Goel, A Ahmed, J Vitek (2018). [Correctness of Speculative Optimizations with Dynamic Deoptimization](https://doi.org/10.1145/3158137), POPL.

M Madsen, MH Yee, O Lhoták (2016). [From Datalog to Flix: A Declarative Language for Fixed Points on Lattices](https://doi.org/10.1145/2908080.2908096), PLDI.

M Safa, MH Yee, D Rayside, C T Haas (2016). [Optimizing Contractor Selection for Construction Packages in Capital Projects](https://dx.doi.org/10.1061/%28ASCE%29CP.1943-5487.0000555), ASCE J. Comput. Civ. Eng.

E Zulkoski, C Kleynhans, MH Yee, D Rayside, K Czarnecki (2014). [Optimizing Alloy for Multi-objective Software Product Line Configuration](https://doi.org/10.1007/978-3-662-43652-3_34), ABZ.

R Bartha, MH Yee, R Rupsingh, M Smith, M Borrie (2009). [Altered macromolecule signal in the hippocampus in Alzheimer patients measured by 1H magnetic resonance spectroscopy](https://doi.org/10.1016/j.jalz.2009.04.138), Alzheimer's & Dementia.

TECHNICAL SKILLS

- Implementation experience: interpreters, just-in-time compilers, memory management, program analysis.
- Languages: C, C++, Java, Scala, OCaml, Python, Ruby, JavaScript, TypeScript, R.
- Compilation targets: ARM, JVM, LLVM, MIPS, x86.

EDUCATION

Doctor of Philosophy in Computer Science Northeastern University, Boston MA Apr 2024
Thesis: [Predicting TypeScript Type Annotations and Definitions with Machine Learning](https://hdl.handle.net/2047/D20653005)
Advisor: [Arjun Guha](https://www.khoury.northeastern.edu/~arjunguha/main/home/)

Master of Mathematics in Computer Science University of Waterloo, Waterloo ON Jun 2017
Thesis: [Implementing a Functional Language for Flix](https://hdl.handle.net/10012/10856) (completed Aug 2016)
Advisor: [Ondřej Lhoták](https://plg.uwaterloo.ca/~olhotak/)

Bachelor of Software Engineering University of Waterloo, Waterloo ON Jun 2014
With Distinction --- Dean's Honours List