Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia
Dynamo: A Transparent Dynamic Optimization System
https://doi.org/10.1145/349299.349303
Programming Language Design and Implementation (PLDI), 2000, pages 1-12.

This paper introduces Dynamo, a system for dynamically optimizing programs. Dynamo interprets a machine code binary, monitoring it for hot instruction sequences (called "traces"). Once a trace becomes hot, Dynamo produces a "fragment" by optimizing and recompiling the trace, and then stores it in a cache. On subsequent executions of the trace, Dynamo fetches the fragment and executes it directly on hardware. Once the fragment has finished executing, control returns to Dynamo and the interpreter. (A toy model of this interpret-monitor-compile cycle is sketched after the TraceMonkey entry below.)

Programs are assumed to spend most of their time in loop bodies, which Dynamo captures as traces. The insight is to focus optimization effort on these hot portions of the code. Since traces consist of sequential code (which lacks control-flow merge points) and run through function boundaries, more optimization opportunities become available.

Andreas Gal and Michael Franz
Incremental Dynamic Code Generation with Trace Trees
https://github.com/nuprl/hopl-s2017/raw/master/tracing-jit/Gal06_Trace_Trees.pdf
Technical Report ICS-TR-06-16, University of California, Irvine, 2006.

The authors implement tracing in a just-in-time (JIT) compiler for Java. They introduce a new intermediate structure, trace trees, which are discovered and constructed on demand, as well as a form of Static Single Assignment (SSA) adapted for trace trees. The authors show that their technique generates code comparable to that of method-based JIT compilers, for only a fraction of the memory overhead and engineering effort.

Instead of treating methods as compilation units, as traditional JITs do, this paper considers individual traces. The authors adapt the tracing technique from Dynamo and introduce trace trees, which are able to represent traces that partially overlap. The authors' techniques require significantly less engineering effort and memory overhead when compared to traditional method-based JITs.

Andreas Gal, Brendan Eich, et al.
Trace-based Just-in-Time Type Specialization for Dynamic Languages
https://doi.org/10.1145/1542476.1542528
Programming Language Design and Implementation (PLDI), 2009, pages 465-478.

This paper describes TraceMonkey, a tracing JIT for JavaScript used by the Firefox browser. When generating code for a dynamic language such as JavaScript, a compiler has to produce branching code that handles all the possible types of each value. By tracing the sequence of instructions that actually executes, however, TraceMonkey can speculate on the types and generate specialized code without those branches (the second sketch below illustrates the idea).

TraceMonkey builds on the previous paper by adding tracing to a JIT for a dynamic language. A tracing JIT compiler mitigates many of the problems caused by the lack of static types. The authors also present a new algorithm for handling traces of nested loops.
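To make the Dynamo entry's interpret-monitor-compile cycle concrete, here is a toy model in Python. Everything in it is invented for illustration (the threshold, the bytecode, and all helper names); Dynamo itself operates on native PA-RISC code, not bytecode, but the control structure (count candidate trace heads, record a hot trace, cache a compiled fragment, side-exit back to the monitor) follows the paper's description.

```python
# Toy model of Dynamo's interpret/monitor/compile cycle. All names are
# invented for this sketch; Dynamo works on native machine code.

HOT_THRESHOLD = 3

# Tiny bytecode program: i += 1; s += 2; loop while i < 10.
PROGRAM = [
    ("add", "i", 1),
    ("add", "s", 2),
    ("jump_if_lt", "i", 10, 0),   # backward branch: a trace-head candidate
    ("halt",),
]

def step(pc, env):
    """Interpret one instruction; return the next pc (None means halt)."""
    op = PROGRAM[pc]
    if op[0] == "add":
        env[op[1]] += op[2]
        return pc + 1
    if op[0] == "jump_if_lt":
        return op[3] if env[op[1]] < op[2] else pc + 1
    return None

def record_trace(head, env):
    """Interpret one loop iteration, logging the pcs that execute."""
    trace, pc = [], head
    while True:
        trace.append(pc)
        pc = step(pc, env)
        if pc == head or pc is None:
            return trace

def compile_fragment(trace):
    """Make a 'fragment': replay the trace until control leaves it."""
    head = trace[0]
    def fragment(env):
        pc = head
        while pc == head:            # keep going while the back edge is taken
            for expected in trace:
                if pc != expected:   # guard failed: side exit to the monitor
                    return pc
                pc = step(pc, env)
        return pc                    # exit taken at the loop's back edge
    return fragment

counters, fragments = {}, {}

def run(env):
    pc = 0
    while pc is not None:
        if pc in fragments:
            pc = fragments[pc](env)                   # fast path: cached fragment
        elif counters.setdefault(pc, 0) + 1 >= HOT_THRESHOLD:
            fragments[pc] = compile_fragment(record_trace(pc, env))
        else:
            counters[pc] += 1
            pc = step(pc, env)                        # slow path: interpret

env = {"i": 0, "s": 0}
run(env)
print(env)   # {'i': 10, 's': 20}
```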
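The type speculation in TraceMonkey can be illustrated with a similarly hypothetical sketch; the names below (SideExit, specialized_add, and so on) are invented for this example. The point is that a trace compiled under the assumption "both operands are floats" needs only a cheap guard plus an unboxed add, with a side exit to the general interpreter when the guard fails.

```python
# Hypothetical sketch of trace-based type specialization; this is not
# TraceMonkey's actual code.

class SideExit(Exception):
    """Raised when a guard fails and control must leave the trace."""

def specialized_add(x, y):
    # Guard: the trace was recorded with float operands.
    if type(x) is not float or type(y) is not float:
        raise SideExit("type guard failed")
    return x + y                      # fast path: no dispatch on types

def generic_add(x, y):
    # The interpreter's fully general '+' for a dynamic language.
    if isinstance(x, str) or isinstance(y, str):
        return str(x) + str(y)        # string concatenation case
    return x + y                      # numeric case

def run_add(x, y):
    try:
        return specialized_add(x, y)  # stay on the compiled trace
    except SideExit:
        return generic_add(x, y)      # fall back to the interpreter
```

For instance, run_add(1.5, 2.5) stays on the fast path, while run_add(1.5, "a") triggers the side exit and falls back to the generic implementation.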
Carl Friedrich Bolz, Antonio Cuni, Maciej Fijałkowski, and Armin Rigo
Tracing the Meta-Level: PyPy's Tracing JIT Compiler
https://doi.org/10.1145/1565824.1565827
Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems (ICOOOLPS), 2009, pages 18-25.

The authors apply tracing to PyPy, a project for easily implementing dynamic languages. However, instead of tracing the user program, their technique traces the language interpreter that executes the user program. PyPy requires hints from the language developer, but the number of hints is usually small (a sketch of the two core hints appears at the end of this section).

This paper applies the tracing technique from Dynamo and TraceMonkey. However, instead of tracing the user program, the authors trace the interpreter that executes it. The work in this paper is most similar to DynamoRIO, an earlier project that applied Dynamo to interpreters. To trace an interpreter effectively, the interpreter's dispatch loop must be unrolled so that each trace corresponds to a loop in the user program. Furthermore, many operations in the dispatch loop can be constant-folded. The main benefit of this approach is that dynamic language implementations built on PyPy get tracing and its optimizations for free.

Håkan Ardö, Carl Friedrich Bolz, and Maciej Fijałkowski
Loop-Aware Optimizations in PyPy's Tracing JIT
https://doi.org/10.1145/2384577.2384586
Dynamic Languages Symposium (DLS), 2012, pages 63-72.

The authors adapt an optimization originally developed by Mike Pall for LuaJIT. The optimization is a simple transformation that "peels off" a single loop iteration. The optimizer then sees the peeled iteration before the rest of the loop and can apply optimizations such as common subexpression elimination across the two copies, which amounts to hoisting loop-invariant code.

The authors implement this optimization in PyPy, which was described in the previous paper. However, the optimization applies to tracing JITs in general, and is only needed because of the nature of traces. Since a trace is a linear sequence of code, there is no control-flow information, and optimizations typically consist of a single forward pass; more complicated optimizations such as loop-invariant code motion are therefore not directly possible. By peeling off a single loop iteration, the optimizer can see which expressions in the loop are redundant and eliminate them (see the toy demonstration below). No other changes to the optimizer are needed, in contrast to TraceMonkey, which implements a similar optimization manually.
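Here is a toy demonstration of the peeling idea, assuming a made-up trace representation rather than PyPy's actual IR: after one iteration is peeled, a naive forward pass that tracks which operations are already available can delete the loop-invariant multiply from the loop body.

```python
# Toy demonstration (not PyPy's optimizer) of why peeling one iteration
# lets a single forward pass hoist loop-invariant code. A trace op is
# (dest, operation, args); 'x' is never written in the loop, so
# t = x * 2 is loop-invariant, while i = i + t changes every iteration.

LOOP_BODY = [
    ("t", "mul", ("x", "2")),
    ("i", "add", ("i", "t")),
]

def peel_and_cse(body):
    """Peel one iteration, then CSE the loop body against the peeled copy."""
    peeled = list(body)                        # iteration 0, now before the loop
    redefined = {dest for dest, _, _ in body}  # names written every iteration
    # Ops whose arguments are not redefined in the loop compute the same
    # value in every iteration, so the peeled copy makes them available.
    available = {(op, args) for _, op, args in peeled
                 if not any(a in redefined for a in args)}
    # Dropping such an op from the loop is safe: its destination already
    # holds the identical value computed by the peeled copy.
    remaining = [(dest, op, args) for dest, op, args in body
                 if (op, args) not in available]
    return peeled, remaining

peeled, loop = peel_and_cse(LOOP_BODY)
print(peeled)  # [('t', 'mul', ('x', '2')), ('i', 'add', ('i', 't'))]
print(loop)    # [('i', 'add', ('i', 't'))]  the invariant mul was hoisted
```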
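Finally, returning to the meta-tracing paper: the hints it mentions can be sketched with RPython's JitDriver, following the style of the examples in the RPython documentation. This is a sketch under the assumption of a recent RPython toolchain (the exact API may differ across versions, and it runs under RPython translation rather than plain CPython), and the bytecode language is invented. Green variables identify a position in the user program, which is what lets the tracer unroll the dispatch loop so that each trace corresponds to a user-level loop; red variables are the remaining live state.

```python
# Sketch of the hints a language implementer gives PyPy's meta-tracing
# JIT, modeled on examples from the RPython documentation. The bytecode
# language below is invented for illustration.
from rpython.rlib.jit import JitDriver

jitdriver = JitDriver(greens=["pc", "bytecode"], reds=["acc"])

def interpret(bytecode, n):
    """Count n down to zero via a loop in the *user's* bytecode."""
    pc, acc = 0, n
    while pc < len(bytecode):
        # Hint 1: a point where the user-program position is known.
        jitdriver.jit_merge_point(pc=pc, bytecode=bytecode, acc=acc)
        op = bytecode[pc]
        if op == "DEC":
            acc -= 1
            pc += 1
        elif op == "JUMP_IF_POS":
            if acc > 0:
                pc = 0
                # Hint 2: a backward branch in the user program; traces
                # of hot user-level loops start here.
                jitdriver.can_enter_jit(pc=pc, bytecode=bytecode, acc=acc)
            else:
                pc += 1
        else:
            pc += 1
    return acc

# interpret(["DEC", "JUMP_IF_POS"], 100) returns 0; under RPython
# translation, the hot user-level loop is traced and compiled.
```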