arXiv:2109.00506v1 [quant-ph] 1 Sep 2021
Enabling Retargetable O ptimizing Compilers for
Quantum Accelerators via a Multi-Level
Intermediate Representation
Thien Nguyen
Quantum Science Center
Oak Ridge National Laboratory
Oak Ridge, TN, United States
Alexander McCaskey
Quantum Science Center
Oak Ridge National Laboratory
Oak Ridge, TN, United States
Abstract—We present a multi-level quantum-classical interme-
diate representation (IR) that enables an optimizing, retargetable,
ahead-of-time compiler for available quantu m programming
languages. To demonstrate our architecture, we leverage our
proposed IR to enable a compiler for version 3 of the OpenQASM
quantum language specification. We support the entire gate-
based OpenQASM 3 language an d provide cu st om extensions for
common quantum programming patterns and improved syntax.
Our work builds upon the Multi-level Intermediate Represen-
tation (MLIR) framework and leverages its unique p rogressive
lowering capabilities to map quantum language expressions to
the LLVM machine-level IR. We provide both quantum and
classical optimizations via the MLIR pattern rewriting sub-
system an d standard LLVM optimization passes, and demon-
strate the programmability, compilation, and execution of our
approach via standard benchmarks and test cases. In comparison
to other standalone language and compiler efforts available today,
our work results in compile times that are 1000x faster than
standard Pythonic approaches, and 5-10x faster than comparative
standalone quantum language compilers. Our compiler provides
quantum resource optimizations via standard programming pat-
terns that result in a 10x reduction in entangling operations, a
common source of program noise. Ultimately, we see this work
as a vehicle for rapid quantum compiler prototyping enabli ng
language integration, optimizations, and interoperability with
classical compilation approaches.
Index Terms—quantum computing, quantum programming,
quantum simulation, programming languages
I. INTRODUCTION
Quantum acceleratio n of existing scientific computing work-
flows has the potential to enhance computational scalability
for modeling and simulation tasks in fields such as nucle a r
physics, chemistry, and machine learning [
10], [12], [22].
As hardware architectures continue to scale and improve
enabling more qubits at lower error rates one can exp e ct
This manuscript has been authored by UT-Battelle, LL C under Contract No.
DE-AC05-00OR22725 with the U.S. Department of Energy. The United States
Government retains and the publisher, by accepting the article for publication,
acknowledges that the United States Government retains a non-exclusive, paid-
up, irrevocable, world-wide license to publish or reproduce the published form
of this manuscript, or allow others to do so, for United States Government
purposes. The Department of Energy will provide public access to these results
of federally sponsored research in accordance with the DOE Public Access
Plan. (http://energy.gov/downloads/doe-public-access-plan).
quantum-classical machine mode ls to move toward tighter
integration of CPU and QPU resources [
6], [8], [18]. These
architecture s stand to benefit from robust language and com-
pilation approaches that enable high-level classical language
integration, quantum and classical compiler optimization tech-
niques [
25], and compiler-automated circuit synthesis strate-
gies [
29], [32]. There is a critical need to move the quantum
programming community from manual circuit construction via
vendor-provided data structures and frameworks embedded in
application-level classical languages (Python, etc.) toward per-
formant language app roaches that enable tight integration with
existing classical runtimes, libraries, and languages. Recently,
a nu mber of su c h langu age approach e s have begun to brid ge
this gap in the research commun ity, with stand- a lone languages
such as Q# [
31], OpenQASM 3.0 [9], [13], Silq [7], Scaffold
[15], and classical language extensions like QCOR [20], [23].
In parallel to quan tum pro gramming research and develop-
ment, there has been a wealth of work done to improve classi-
cal compilation frameworks and techniques. One result of note
is the introduction of multi-level intermediate representations
enabling compiler representations at a variety of abstraction
levels including those close to the source language itself
in tande m with associated progressive lowering workflows
that take high-level representations to low-level executable
object code via a hierarchy of intermediate representation (IR)
abstraction. Th is enables robust compiler development for do-
main specific languages that retain automated language-level
optimizations, transformations, and lowering to machine-level
IRs like the LLVM. Treating quantum program expressions as
stand-alon e domain specific languages represents a n opportu-
nity to leverage these state-of-the-art classical multi-level IRs.
Specifically, the MLIR framework [
3], [17] is an example of a
popular multi-level IR in u se today for classically accelerated
heteroge neous workflows [
11], [16], and is well-positioned to
provide a unique resource for the rapid pr ototyping of quantum
languag e com pilers via its extensib le language-level I R and
progressive IR-lowering capabilities.
We recently demonstrated the utility of MLIR for simple
quantum assembly lan guages with no true control flow struc-
1
Quantum
Affine
SCF Std
MLIR Target
LLVM Dialect - LLVM IR
QIR Specification
OpenQASM 2 OpenQASM 3
...
OpenQASM 2 Parser OpenQASM 3 Parser
...
Binary Executable
QCOR Runtime Linkage:
NISQ / FTQC
Classical Compute Codegen:
x86, ARM, PowerPC, etc.
Accelerator
Host
Fig. 1: QCOR’s MLIR-based compilation stack for CPU-
QPU heterogeneous computing. Each quantum programming
languag e is processed by a dedicated parser that produces
the AST of the input source code. AST o f different source
languag es is all mapped to an MLIR representation expressed
in the QCOR quantum dialect and built-in Standard, Affine,
and SCF dialects. This MLIR rep resentation is progressively
(multi-stage) transformed (optimization and dialect conver-
sion/lowering) until only operations in th e LLVM dialect
remain, i.e., quantum operations are converted to LLVM
function calls adhering to the QIR specification . This guar-
antees that the final binary executable is compatible with
any QIR-conformed runtime implementations provided at link
time, su c h as qcor runtime supporting both remotely hosted
(NISQ) and tightly coupled (FTQC) execution models.
tures [
19] a low level MLIR dialect for quantu m computing.
In this work, we leverage and extend that simple q uantum
MLIR extension for a more complex quantum language
OpenQASM version 3.0 (henceforth referred to a s Open-
QASM, unless stated otherwise), which provides robust con-
trol flow structures, variable declaration and assignment, and
novel syntax for quantum circuit generation and synthesis [
1].
Our approach enables an optim iz ing com piler for OpenQASM
that compiles to the LLVM machine-level IR adherent to
the recently introduced Quantum Intermediate Representation
(QIR) sp ecification [
2], [5]. Moreover, this approach need
not be limited to OpenQA SM the implementation pattern
shown in this work ca n serve as a robust mechanism for further
quantum language compiler prototyping and deployment. The
work presented here puts forward the requisite infrastructure
for quantum language expression and LLVM IR generation,
leaving future language compiler imp le mentations as a matter
of providing the mapping of a language abstract syntax tree
(AST) to our quantum MLIR extension (via ANTLR [
28],
for instance). Language lowering to executable code is then
readily available.
We integrate our approach with the qcor compiler platform
[4], [20] (note we use QCOR to denote the language extension
specification [27] and qcor for the compiler implementation).
qcor enables single-sour ce quantum-classical programming
in both C++ and Python, promoting an ahead- of-time C++
compiler executable an d just-in-time compilation infrastruc -
ture for perfor mant quantum-classical code genera tion and
1 %Array = type opaque
2 %Qubit = type opaque
3
4 declare void @__quantum__q is__cnot(%Qubit
*
%0,
%Qubit
*
%1)֒
5 declare void @__quantum__q is__h(%Qubit
*
%0)
6
7 declare void
@__quantum__rt __qubit_release_array(%Array
*
%0)֒
8 declare i8
*
@__quantum__rt __array_get_element_ptr_1d(
%Array
*
%0, i64 %1)
֒
֒
9 declare %Array
*
@__quantum__rt __qubit_allocate_array(i64 %0)֒
10
11 define i32 @ghz() {
12 %4 = call %Array
*
@__quantum__rt __qubit_allocate_array(i64 3)֒
13 %5 = call i8
*
@__quantum__rt __array_get_element_ptr_1d(
%Array
*
%4, i64 0)
֒
֒
14 %6 = bitcast i8
*
%5 to %Qubit
**
15 %7 = load %Qubit
*
, %Qubit
**
%6, align 8
16 call void @__quantum__qis__h(%Qubit
*
%7)
17 %8 = call i8
*
@__quantum__rt __array_get_element_ptr_1d(
%Array
*
%4, i64 1)
֒
֒
18 %9 = bitcast i8
*
%8 to %Qubit
**
19 %10 = load %Qubit
*
, %Qubit
**
%9, align 8
20 call void @__quantum__qis__cnot(%Qubit
*
%7,
%Qubit
*
%10)֒
21 %11 = call i8
*
@__quantum__rt __array_get_element_ptr_1d(
%Array
*
%4, i64 2)
֒
֒
22 %12 = bitcast i8
*
%11 to %Qubit
**
23 %13 = load %Qubit
*
, %Qubit
**
%12, align 8
24 call void @__quantum__qis__cnot(%Qubit
*
%10,
%Qubit
*
%13)֒
25 call void
@__quantum__rt __qubit_release_array(%Array
*
%4)
֒
֒
26 call void @__quantum__rt__finalize()
27 ret i32 0
28 }
Fig. 2: A quantum pro gram generating the Greenberger-Horne-
Zeilinger (G H Z) state on 3 qubits represented in the LLVM
IR adherent to th e QIR specification.
execution. This work extends the qcor executable to stand-
alone Ope nQASM source files, and enables their compilation
and execution in a retargetable (quantum hardware-agnostic)
fashion.
II. BACKGROUND
Figure
1 demonstrates the hierarc hy o f layers underlying our
compiler architecture. Ultimately, we take quantum-classical
languag es down to an MLIR representation be fore emitting
standard object code via lowering to the LLVM IR (we
leverage the LLVM ecosystem of executables to map an
IR bitcode representation to assembly and executable code).
Here we seek to provide nece ssary background o n the QIR
specification, the MLIR framework, and the qcor compiler
frontend and quantum runtime library to set up the presentation
of the rest of the compiler a rchitecture.
2
A. Quantum Intermediate Representation
Recent work h as r esulted in th e development of a for-
mal specification for a Quantum Intermediate Representation
(QIR) embedded in the LLVM IR [
5]. This specification does
not extend the LLVM IR with new instructions pertinent to
quantum computing, instead, it expresses quantum specific op-
erations as declared function calls on opaque data types. Func-
tion declarations and correspond ing signatures are defined for
quantum memory allocation and dealloca tion, individual qubit
addressing, quantum instruction invocation on allocated qubits,
and u tility func tions for arr ay management, tuple creation, and
callable invocation. By describ ing qubits, measurement results,
and arrays as opaque types, and pro moting function decla-
rations over concrete implementations, the QIR specification
promotes a flexible approach to quantum-classical compiler
architecture and integration. The approach represents a novel
target for language com pilers all of the LLVM toolc hain
becomes available, includ ing runtime linking, compile time
classical op timizations, and external language inter operability.
Figure
2 demonstrates a LLVM IR code snippet adherent to
the QIR specification for the generation of a Greenberger-
Horne-Z eilinger (GHZ) maximally entangle state. Note the
declaration of opaque Array and Qub it types (lines 1,2)
and the externally declared QIR runtime functions (lines 4-10).
These are left for implementation by appropriate QIR runtime
libraries, affecting actual execution o f quantum instructions,
array handling ( including array s of Qubit instances), and
data type actualization. Th e body of the code consists of
standard LLVM instructions (bitcast, call, etc.) and calls
to the declared QIR runtime functions. These fun ctions are
concretely provided at link time v ia the runtime libra ry.
B. MLIR
Moving up the IR abstraction hierar chy in Figure
1, recent
developments in classical compilation research and develop-
ment has resulted in the MLIR [
17]. The MLIR repre sents a
modular and extensible approach to defining custom compiler
IRs that can express a spectrum of language abstraction
(language-level IR d own to the machine-level LLVM IR). At
its core, the framework pu ts forward the concept of a language
dialect which is composed of lan guage-specific o perations.
These operations are the core abstraction al unit in the MLIR,
and they model a unique mappin g of operands to return values,
and ca n optionally c arry a dictionary of co mpile-time metad ata
(attributes). Operands and return values are m odeled as an
mlir::Value type, which describes a computable , type d
value and its correspon ding set of users (enabling one to
construct standard use-define chains). T he creation of dialects
provides a mechanism for mapping language ASTs to a corre-
sponding MLIR operation tree compo sed of language-dialect-
specific op erations along sid e other utility dialect op e rations
(standard function calls, memory references and allocation, for
loops, conditional statements, etc.). Mo reover, the framework
puts forward a general progressive lowering ca pability
incrementa l translation of higher dialect operations to lower
level dialect operations. This enables one to define custom
translations from operations in language-level dialects to op-
erations in a machine-level dialects like the LLVM IR. This
infrastructure and corresp onding workflow provide a flexible
architecture for the development of compilation pathways
taking language-level syntax trees down to a machine-level
IR, like the LLVM.
C. qcor
qcor provides C++ [20] and Python [26] lang uage exten-
sions for heterogen e ous quantu m-classical c omputing in an
effort to promote native quantu m kernel programming in a
single-source con text. Critically, qco r puts forward a com-
piler runtime library that enables quantum program execution
in a multi-modal, retargetable fashion. The execution model
enables two mode types of quantum instruction invocation
nisq and ftqc modes (see QCOR Runtime Linkage
in Figure
1). nisq mode supports runtime-level quantum
instruction queueing and flushing upon exit of a quantum
kernel function, implyin g a full quantum circuit submission
on a remote ly hosted quantum computer. ftqc m ode models
a tightly-coupled CPU-QPU integration model, and quan tum
instructions are instead streamed as they are invoked, enablin g
features like fast-feeback on qubit measurement results. The
qcor r untime ultimately delegates to the XACC framework
[
21], and supports remo te execution on IBM, Rigetti, IonQ,
and Honeywell quantum processors, and h as support for
various simulators for the ftqc mode of execution. It is this
runtime that plays a critical role in this work, as it provides a
target for our QIR runtime libr a ry implementatio n. Moreover,
we have provided an entrypoint to OpenQASM compilation
natively a s part of the qcor command line executable.
III. EXTENDING OPENQASM 3
OpenQASM version 3 has recently been put forward as a
formal specification [
13], and a extended Bachus-Naur form
description of the lang uage has been made public as an
ANTLR grammar file. This language departs fr om the previous
version (version 2. 0) in the introduction of classical control
flow and variable declarations, making version 3 much more
friendly to hybr id quantum-classical programming. The lan-
guage pr ovides standard quantum instruction calls, but enables
more complex quantum circuit synthesis via ctrl, adj, and
pow quantum ga te modifiers. Standard while and range-
based for loops are also allowed, as well as th e conditional
if-else block.
While our work seeks to ensure tha t our compiler implemen-
tation is fully compatible with the base grammar specification
for OpenQASM, we also a re in a unique position to enhance it
with features pertinen t to the qcor compiler platform and its
user base. We envision the languag e and com piler presented in
this work as a novel la nguage extension for the qcor quantum
kernel prog ramming mod e l, i.e. enabling users of qcor to
program quantum kernels using our extended OpenQASM
languag e. We seek extensions that (while rem a ining backwards
compatible) enable a more C-like syntax, C-like primitive type
declaration s, and common quantum programming patterns
3
(a) Compute-Action ANTLR Grammar Addition.
1 compute_act ion_stmt
2 : 'compute' com pute_block=programBlock
3 'action' a ction_block=programBlock ;
(b) OpenQASM code leveraging the compute-action statement.
1 qubit q [4];
2 let bot tom_three = q[1:3];
3 compute {
4 rx(1.57) q [0];
5 h bottom_three;
6 for i in [0:3] {
7 cnot q[i], q[i + 1];
8 }
9 } action {
10 rz(2.2) q[4];
11 }
Fig. 3: OpenQASM language extension for compute-
action-uncompute pattern.
already present in the qcor language. To start, we have
extended the gr ammar to provide familiar typedefs for 32
and 64 bit integers and floats, Specifically, we parse int as
int[32], int64_ t as int[64], fl oat as float[ 32],
and double as float[64]. We have also updated the
grammar and implemented the parser to handle bo th range-
based C-like for statements as well as the usual for state-
ment with with initializer, conditional expression, and iteration
expression.
Finally, we see an opportunity to enable syn ta x and seman-
tics for specific compile-time optimizations. We have updated
the Ope nQASM grammar with support for the ubiquitous
compute-action-uncompute pattern, first demonstrated
in [
30]. Given the common pattern W = U V U
, the new
syntax enables o ne to express U an d V as the code in the
compute and action scopes, respectively, and the compiler
auto-generates the U
code after application of the compute,
action segments. This is demonstrated in Figure
3a (addition
to OpenQASM grammar) and
3b (example usage), a nd is
not only useful for readability, but it also gives the compiler
implementation the opportunity to g enerate optimal quantum
code in the situation where the programmer wants to progra m
W controlled on the state of another qubit. In this case, the
compiler syn thesizes U ctrl-V U
as opposed to the naive ctr l-
U ctrl-V ctrl-U
, thereby leading to less multi-qubit operations
and shorter depth quantu m programs.
IV. COMPILER ARCHITECTURE
The architecture we put forward starts with the definition of
a frontend parser for the OpenQASM langu age. This parser
produces an AST and we provide c ustom tree walkers that
traverse the tree and construct a correspond ing MLIR repre-
sentation. To handle variable data and scoping, we introduce
a custom symbol table, enabling the tracking of variable
use-define chains as well as scope visibility. Our ML IR
representation is amenable to general transformation, and we
leverage this for quantum -level optimizatio ns. Ultimately, our
architecture lowers the M LIR down to the LLVM IR. At this
level, the generation of executable code and linking with a
valid QIR runtime implementation is readily accomplished
with standard assemblers a nd linkers.
A. Symbol Table
The symbol table is the data structure used by the com -
piler to cache information about each observed symbol (e.g.,
variable name, its type, its constness, etc.). Since OpenQASM
allows fo r local variables, the symbol table becomes critical for
tracking metadata about the variable symbol and its subsequen t
use. In other words, whe n processing ea ch statement, the
compiler, via the symbol table, is aware of the context of all
visible symbols, i.e., those from this scope and those above,
to perform proper n a me lookup. Therefore the symbol tab le
provides a mechanism to validate various semantic errors, such
as illegal operatio ns for a specific variable type or ref e rring
to o ut-of-scope variables, which could not be detected by
syntactic considerations.
We have implemented a symbol table that is composed
of an array of scope-indexed hash maps. Each map is a
lookup table from variable name to the corresponding MLIR
mlir::Value instance representing the variable. Name
lookup is performed from the current scope upward (to parent
scopes) to find the first match, i.e., the one in the n earest parent
scope. Another utility of the symbol table is the compile-
time evaluation of constant expressions. OpenQASM supports
constant integer an d floating-point variable declarations (via
the const keyword). The symbol table tracks these constant
values and provides a utility to evaluate simple math expres-
sions
1
involving these constants at compile time, if possible.
Critically, the compiler relies on the symbol table to track
qubit use-define chains for qua ntum instruction o perations
(typical quantum gate invocations). We have designed our
quantum instruction operation in the q uantum MLIR dialect
extension to adhere to the value semantics representation first
described in [14], whereby quantum operations consume one
or many qubit mlir:: Value instances and produce one
or many new mli r::Value instances as oper a tion return
types. Using the underlying pointer to the MLIR variable
(mlir::Value) as the looku p key, the symbol table re -
places in put qubit ope rands with the newly-created output
mlir::Value. Therefore, the OpenQASM source c ode is
compiled in to the MLIR representation with explicit use-define
chains for qubits amendable to compile-time optimization
techniques similar to the DAG representation of quantum
circuits.
B. ANTLR Parser
Our compiler implementation leverages the ANTLR [
28]
(ANother Tool for Language Recog nition) too lc hain to gen-
erate the compiler frontend as depicted in Figure
4. With our
extended OpenQASM g rammar as input, ANTLR generates
the corresponding lexer and parser utilities capable of scanning
1
using the C++ Mathematical Expression Toolkit (exprtk) Library
4
Source.qasm
qasm3Lexer
Token Stream
qasm3Parser
ANTLR AST
qasm3Visitor
MLIR Op Tree
ANTLR grammar
OpenQASM.g4
Error Listener
Syntax Errors!
Auto-generated
Fig. 4: ANTLR-based compiler front-end: lexer and par ser
are auto-generated from the extended OpenQASM grammar
(ANTLR grammar file). An error listener is attached to the
parser to capture and report any syntax vio lations encountered
during par sin g. The qasm3Visitor visits each node of
the ANTLR AST constructing the MLIR syntax tree in the
standard, affine, and quantum dialects.
and parsing source strings according to the provided grammar
rules. The compiler frontend produces an AST representing
the input source code against the set of syntactic rules in the
grammar. For instance, a valid OpenQA SM loop ( matching a
syntax rule named loopStatement) will be parsed into a
LoopStatementContext AST node along with all nested
sub-nodes, e.g., the loop termination conditions and the loop
body. ANTLR also generates a base AST visitor interface for
each grammar file, which includes all possible AST node types
that the parser may p roduce. The AST visitor is the mechan ism
we leverage to transfor m the raw OpenQASM syntax tree
into the MLIR representation as we will discuss in th e next
subsection.
While p rocessing the input source, the parser may throw ex-
ceptions indicating syntactic or semantic errors. The compiler
implements the ANTLR BaseErrorListener interface
(see Figure
4) to catch these potential issues and r eport them
to users with detailed inform ation, such as the location of
offending characters.
C. Visitor Handlers
Once th e valid OpenQASM source has been tr ansformed
into the ANTLR AST, the compiler traverses each node in
the AST in a depth-first manner producing the equ ivalent
MLIR tree using the standard (operation s for classical control
flow and memory ref e rences), affine (op e rations for looping),
and quantum dialects (our contribution modeling quantum
operations). Tab le
I summ arized OpenQASM-MLIR rewrite
patterns for important OpenQASM c onstructs.
In particular, quantu m types (qubit and qreg) and clas-
sical types (b oolean, variable-width integer or floating-point
numbers, and arra ys) are mapped to QIR types (Qubit and
Array) or memory-referenced (me mref) MLIR Standard di-
alect types (e.g., i1 for boolean bits, i8 for 8-bit integers, etc.)
Classical math operatio ns are converted to the correspond ing
instructions from the MLIR Standard dialect, such as addi
or cmpi for integer addition or comparison, resp ectively.
Impor tantly, OpenQASM for loops are transformed into an
AffineForOp [
24] (MLIR affine dialect) amenable to future
classical optimization passes, such as loop unrolling.
Intrinsic quantum gates are converted to value-semantics
quantum operations (static single assignment form) o f the
quantum dialect, as shown in Table
I. As described in
Sec.
IV-A, we use the symbol table to tr ack the qubit operand s
(as opaque mlir::Value pointers) a nd replace th e m with
the new values created by each value-semantics quantum gate
operation. In other words, eac h qubit SSA variable (shown
as %k in Table
I) will o nly be assigned and used once, thus
allowing us to trace gate operations on each qubit line. This
is to explicitly define the use-define chains, which we can
leverage in downstream quantum optimizations.
Another key feature of Open Q ASM is the ability to expre ss
quantum gate modifiers (e. g., controlled or adjoint) for both
intrinsic gates and subroutines. Our compiler imp lementation
takes a pragmatic approach by rewriting modifiers into scoped
regions with dedicated MLIR marker operations a s shown in
Table
I. These operations are effectively no-ops, but indicate
to the runtime tha t the following region of quantum operations
is to be handled differently. For example, fo r the ctrl marker
(q.ctrl_region), the operations within that region should
be processed to synthesize the controlled version of that
composite operation. We pre serve the high-level semantics
of these modifiers at both the MLIR and latter LLVM IR
levels ( see
IV-D2) rather than trying to perform compile-
time gate synthesis. Ultimately, the evaluation and synthesis
of these modifier-enclosed blocks will be perform e d by QIR-
compatible runtime implementation.
To handle the nested, recursive nature of the Open-
QASM syntax tree, the compiler uses a multi-layer visit-
ing strategy whereby a standalone visitor-like utility, named
qasm3_expression_generat or, is provid ed to traverse
and process sub-expression no des in-place, if necessary. To
give an example, when visiting a for loop with a m ath
expression as its upper bound, the main AST visitor would use
this qasm3_expression _generator to handle this sub-
expression (converting the math expression to a MLIR equiv-
alent) and then take the resulting value (as a mlir::Value)
to construct the current MLIR for loop.
D. Progressive Lowering
After visiting all the ANTLR AST n odes representing the
input OpenQASM program, the compiler has constructed a
MLIR code in the qu a ntum, affine, standa rd, and built-in
dialects. As depicted in Figure
5, this is the first stage of
a p rogressive, multi-stage IR transformation and lowering
pipeline that produces an optimized executable.
5
TABLE I: O penQASM to MLIR
Constructs
OpenQASM MLIR
Quantum Types
Qubit Register qubit qubi t_array[20]; %0 = q.qalloc(20) { name = qubit_arr ay } : !quantum .Array
Classical Types
Bits bit[20] bi t_array; %0 = alloca() : memref<20xi1>
Integers int[16] sh ort_int; %0 = alloca() : memref<i16>
Floating point numbers float[32] sp_float; %0 = alloca() : memref<f32>
Global Constants const shots = 1024; gl obal_memref "private" constan t @shots : memref<i64> = dense<102 4>
Quantum Instructions
Gates ry(theta) q; %4 = qvs.ry(%2, %3) : !quantum.Qubit
Note: %2: !quantum.Qubit; %3: f64
Measurements b = measure q;
%7 = q.mz(%6) : !quantum.Result
%9 = q.resultCast(%7) : i1
store %9, %4[] : memref<i 1>
Note: %6: !quantum.Qubit; %4: memref<i1>
Classical Operations a += 4;
%c4_i64 = constant 4 : i64
%1 = load %0[] : memref<i64>
%2 = addi %1, %c4_i64 : i64
store %2, %0[] : memref<i 64>
Note: %0: memref<i64> (represents a variable)
Branching if (i == 5) {...}
%c5_i64 = constant 5 : i64
%1 = load %0[] : memref<i64>
%2 = cmpi "eq", %1, %c5_i64 : i64
cond_br %2, ˆbb1, ˆbb2
ˆbb1: // pred: ˆbb0
br ˆbb2
ˆbb2: // 2 preds: ˆbb0, ˆbb1
Note: %0: memref<i64> (represents i variable)
Looping for i in [0:10] {...}
affine.for %arg0 = affine_map <(d0) -> (d0) >(%1) to affin e_map<(d0)
-> (d0)>(%0) {...}֒
Note: %0: index and %1 : index represent the constant values of 10 and 0,
respectively.
Subroutines def foo(float[64]:theta)
qubit[2]:q {...}
func @foo( %arg0: f64, % arg1: !quantum.Array ) {
...
return
}
foo(theta) qq; call @foo( %2, %0) : (f64, !quantum.Array) -> ()
Note: % 2: f64 and %0: !quan tum.Array represent the theta variable and the
q qubit array, respectively.
Extern functions extern foo (float[64]) ->
float[64];
func priva te @ foo(f64) -> f64
Modifiers
Adjoint inv @ phase(pi) q;
q.adj_region {
%2 = qvs.phase(%1, %cst) : !quantum.Qubit
}
Controlled
ctrl @ oracle q[0], q[1];
q.ctrl_region {
%3 = call @oracle(%2) : (!quantum.Qubit) -> !quantum. Qubit
} (ctrl_bi t = %1)
Power
pow(8) @ foo q;
q.pow_u_region {
%c8_i64 = constant 8 : i64
%2 = call @fo o(%1) : (!quan tum.Qubit) -> !quantum.Qubit
} (pow = %c8_i64)
Aliasing
Slicing let slice = reg[0:2:12];
%1 = q.qarray_slice(%0, %c0_i64, %c2_i64 , %c12_i64) :
!quantum.Array֒
Concatenation
let concat = reg1 || reg2;
%2 = q.qarray_concat (%0, %1) : !quantum.Array
Next, we perfor m a set of optimization passes at th e
MLIR level whereby c ontrol flow constructs (e.g., those fro m
the affine dialect) and the static single assignment (SSA)
form of quantum instructions in the qu antum d ia le c t are
suitable for static optimization procedures. The optimized
MLIR code is lowered to the LLVM dialect v ia the MLIR
ConversionPattern utility. At this point, all quantu m-
related operations have been lowered to QIR functions. The
final lowering to LLV M IR and any built-in LLVM optimiza-
tions (e.g., -O 3 optimization) a re provided by the MLIR-
LLVM infrastructur e, produc ing binary executables targetin g
the QIR runtime along with the classical compute ISA (e.g.,
x86/Arm/OpenPC depending on the target platform).
6
OpenQASM
ANTLR
AST
MLIR
MLIR
LLVM
LLVM
IR
Binary
Exe
Lower
-O3
Optimization
Quantum Dialect
Affine Dialect
Standard Dialect
LLVM Dialect
QIR + LLVM
Fig. 5: Compilation pipeline: The ANTLR-based frontend
parses the OpenQASM source string into an Abstract Syn tax
Tree (AST) data structure. By processing (visiting) the AST,
we generate a MLIR representatio n using a couple of dialects,
most importantly, our quantum dialect. A set o f optimization
passes can be ap plied at this stage to simplify the MLIR tree
before it is lowered to the LLVM dialect. At this stage, the
IR tree only contains valid LLVM instructions including QIR-
adherent function calls and types. Standard LLVM optimiza-
tion can be applied when the LLVM dialect is lowered to
bitcode, e.g., the -O3 L LVM optim iz ation flag. L astly, the
LLVM IR bitcode is compiled to binary executable by linking
in a comp a tible QIR runtime implementa tion.
1) Optimization Passes: Figure
6 illustrates the MLIR-
level optim iz ation pipeline that we have implemented in the
OpenQASM compiler. Spe cifically, we combine optimization
techniques from both classical and quantum programming,
such as function inlining, loop unrolling , and various quantum
circuit optimization pr ocedures. Table
II lists the quantum
optimization passes that we have implemente d for the MLIR
operations in our quantum diale ct. Inlining and loop unrolling
are built-in MLIR passes for the standard and affine dialects,
respectively.
The pseudocode in Algorithm 1 illustrates a typical MLIR
optimization pass based on dataflow ana lysis. Specifically,
we show the procedure to perform quantum gate me rging
on the MLIR AST tree. By adhering to value-semantics for
the quantum operations, we are able to follow the use-define
chain of each quantum instruction (shown as the Us er map
in Algorithm
1). With th e SSA dataflow information, we
can query the next quantum operation on the qubit line and
check whether a gate -merge opportun ity exists. For illustration
purposes, we depict the gate merging procedure as two black-
box functions, CanMerge and Merge, implementing check-
ing and gate generation p rocedur e s. The new injected gate
operation will have its input and output SSA values bridging
those two origina l instructions (line 10 and 11 in Algorithm
1).
Each optimization pass operating on th e MLIR operation tree
maintains the SSA value chain as they transform the IR.
We also want to note that this optimization procedure, as
well a s others listed in Table II, is most e ffective when th e
AST is a flat linear region whereby the use-define chain is
uninterr upted (e.g., due to subroutine calls or loops). There-
fore, it is crucial to have loop unrolling and function inlining
passes applied beforeha nd as shown in Figure
6. I n this pass
pipeline, some passes, espec ia lly those performing quantum
TABLE II: List of MLIR Passes
Pass Name Descriptions
Identity Pair
Removal
Simplify or remove redundant quantum instructions.
For example, this pass removes any gates immediately
followed by their adjoints, such as pairs of X-X, T -
T
, or CNOT -CNOT gates on the same qubits.
Repeated qubit reset instructions are also simplified.
Rotation Merg-
ing
Combine consecutive mergable quantum instructions,
e.g., Z and Rz(θ), Rx(θ
1
) and Rx(θ
2
), etc.
Gate Sequence
Simplification
Find a sequence of consecutive compile-time constant
gates and simplify if possible, i.e., resynthesize to
fewer gates. For example, H-T -H gate sequence can
be simplified to Rx(π/4).
Qubit Extract
Lifting
Merge duplicate qubit extracts from registers with
compile-time constant indices. This pass also unifies
the SSA use-define chain after loop unrolling (loop
induction variable as qubit array index) and function
inlining.
Gate Permuta-
tion
Permute gates that are commutative, e.g., Rz on the
control qubit of a CNOT gate. Despite no immediate
benefit (no gate count reduction), this pass might
produce optimization opportunities for others, such as
rotation merging.
Constant Prop-
agation
Propagate global constants, e.g., constant integer val-
ues as loop counts or constant floating points as
rotation angles.
Dead Code
Elimination
(DCE)
Eliminate unused operations (dead code). For example,
qreg allocation whose result is never used can be
eliminated. These dead values may emerge as a result
of other passes.
circuit optimization, are applied multiple times in a loop to
make sure that we can pick up new optimizing pa tterns that
emerged thanks to the code rewrite of previous passes.
In Figu re
7, we demonstrate th e MLIR transformation along
the optimization pipeline for a simple O penQASM source code
(Figure
7a). This code contains a subroutine definition and
later invocation, w hich is compiled to the MLIR CallOp ( line
2 in Figure
7b). This call is then inlined (Figure 7c, line 4-6),
resulting in a CNOT-CNOT identity pair which is removed b y
the identity pair removal pass (Figure 7d). What is left after
this step is a sequence of unused ope rations, such as extracting
qubit addresses and the qreg allocation itself. These are all
dead code, he nce removed by the final DCE pass as shown in
Figure
7e.
2) Dialec t Conversion and Lowering: After simplifying the
MLIR tree with optimization passes such as those listed in
Table
II, the compiler will lower the ML IR representation
to LLVM progressively, as depicted in Figure 5 . This low-
ering procedure is similar to the one describe d in [19] for
OpenQASM 2 compilation. We impleme nted a c ollection of
mlir::ConversionPatterns to perform the co nversion
from quantum dialect to the LLVM dialect targeting the QIR
specification.
As compared to the work in [
19], the lowering pipe line
of the qcor compiler has been enhanced with (1) dialect
conversion fro m affine to LLVM branch-based CFG (control
flow gr aph) representation and (2) conversion pattern im-
plementations for new qu antum dialect operations. The first
one stems from the fact tha t we utilize operations from the
affine dialect to handle control flows (e.g., for loop s) in
7
Inliner
Loop Unroll
Constant
Propagation
Identity Pair
Removal
Gate Merging
Gate
Permutation
Others DCE
Fig. 6: Optimization pass pipeline. Optimiz ation passes simplifying quantum gate operations are repe ated a set number of
times.
ALGORITHM 1 Gate Merging Optimization
Vars:
ops : [VSOp] (Sequence of value semantics ops)
Users: VSOp 7→ [VSOp] (use-define trace mechanism)
CanMerge: ( VSOp, VSOp) 7→ B (Mergeable check)
Merge: (VSOp, VSOp) 7→ VSOp (Crea te merge op)
Gate Merging:
1: dead_o ps: [VSOp] []
2: for op ops do
3: if op dea d_ops then
4: Skip to next op
5: end if
6: if Length(Users(op) ) == 1 then
7: next_op Users(op)[0]
8: if CanMerge(o p, next_op) then
9: merged_op Merge(op, next_op)
10: merged_op.input op.input
11: merged_op.output n ext_op.output
12: Add merged_op to IR tree
13: Append op and next_op to dead_ops
14: end if
15: end if
16: end for
17: for op dead_ops do
18: Erase op from IR tree
19: end for
OpenQASM. The latter involves lowering procedures for new
quantum dialect oper a tions for gate modifiers (see Table I) as
well as the new quantum value semantics instruction. MLIR
modifier-marked regions (ctrl, inv, or pow) are converted
to the calls to correspondin g quantum runtime functions at th e
beginning and the end of the scoped block. To lower the value-
semantics quantum ope rations (MLIR quantum dialect) to
memory-semantics LLVM QIR calls, the qubit SSA variables
are mapped back to their root variable (in the use-define chain )
by propagating the mlir::Value of input qubit operands to
the corresponding outputs.
E. QIR Implementation a nd Linking
The last stage of the compilation work flow, as shown in
Figure 5, involves the c ompilation of LLVM IR bytecode into
a binary object containing QIR function calls, that needs to be
linked to a valid QIR runtim e implemen ta tion to form a n ex-
ecutable. As described in [
19], qcor provides a QIR runtime
library implementation backed by the XACC framework [
21].
Since quantum value-semantics operations are con-
verted to memory-seman tics function calls (e.g., void
__quantum__qis__INSTNAME (Qubit
*
,...)) dur ing
the lowering stage, they are compatible with the existing
QIR intrinsic quantum gates in the runtime. Key extensions
to the QIR runtime to support OpenQASM are the region
marker functions to implement gate modifier concepts, suc h
as contr olled (ctrl) or a djoint (inv). Specifically, when
a quantum gate or subroutine is subjected to a mo difier
directive, the compiler injects the corresponding run time
functions befor e and after the modified operation. For ex-
ample, __qu antum__rt__st art_adj_u_reg ion and
__quantum__rt__end_adj_u _region are f unctions to
denote a region of code wh e reby the inverse (adjoint) of
the collec te d quantum sequ e nce generated within should be
applied. Once again, we take a pragma tic approa ch in imple-
menting the gate modifier fea ture of the OpenQASM language
by delegating the modified circuit re a lization to the run time.
By invoking the wrapped code region at runtime in a special
instruction collection mode, we can retrieve the flattened se-
quence of gates and thus construct the correspond ing modified
circuit (e.g., adjoint or c ontrolled) for backend execution.
Finally, the QIR runtime environment can b e further special-
ized via compilation flags or executable invocation arguments.
For instance, specific ha rdware or simulator backend (qpu)
can be selected, and the quantum runtim e can be configured
to ru n in a tightly-coupled execution mode (simulator-only)
whereby the dynamical measurement-controlled branching is
fully supported.
V. DEMONSTRATION
In this section, we demonstrate the utility and performance
of an MLIR-based compiler. We present a typical OpenQASM
programming, compilation, and execution workflow using the
qcor infrastructure. The compilation speed is benchmarked
against a variety of comparable quantum com pilers. Lastly, we
provide an example showing extensions to the Open QA SM
languag e provided by qcor.
A. Compila tion and execution workflow
Figure
8 shows an OpenQASM source code to compute the
expectation value of Pauli operators (e.g., σ
X
σ
X
in this case)
in an arb itrary state prepared by a variational ansatz. The state-
prepara tion is expressed as a quantum subroutin e (ansatz,
lines 6-10). This is a prototypical proced ure comm only used
in th e variational q uantum eigensolver (VQE) algorithm.
A feature that we want to highlight is the fact that Pauli
expectation accumulation is explicitly expressed as a for loop
(lines 16-28). Note the h qu antum instruction broad casts
across all qubits in the register. In each iteration , we count the
parity of measured bits to compute the average result across all
8
(a) OpenQASM source
1 def foo qubit[2]:qq {
2 cx qq [0], qq[1];
3 }
4
5 qubit q [2];
6 foo q;
7 cx q[0], q[1];
(b) Unoptimized MLIR
1 %0 = q.qalloc(2) { name = q } : !quantum.Array
2 call @f oo(%0) : (!quantum.Array) -> ()
3 %c0_i64 = constan t 0 : i64
4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit
5 %c1_i64 = constan t 1 : i64
6 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit
7 %3:2 = qvs.cx(%1, %2) : !qua ntum.Qubit,
!quantum.Qubit֒
8 q.dealloc(% 0)
9 return
(c) MLIR aft er inlining pass
1 %c0_i64 = constan t 0 : i64
2 %c1_i64 = constan t 1 : i64
3 %0 = q.qalloc(2) { name = q } : !quantum.Array
4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit
5 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit
6 %3:2 = qvs.cx(%1, %2) : !qua ntum.Qubit,
!quantum.Qubit֒
7 %6:2 = qvs.cx(%3#0, %3#1) : !quantum.Qubit ,
!quantum.Qubit֒
8 q.dealloc(% 0)
9 return
(d) MLIR after identity pair removal pass
1 %c0_i64 = constan t 0 : i64
2 %c1_i64 = constan t 1 : i64
3 %0 = q.qalloc(2) { name = q } : !quantum.Array
4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit
5 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit
6 q.dealloc(% 0)
7 return
(e) MLIR aft er DCE
1 return
Fig. 7: MLIR optimization example. The input OpenQASM
source code (a) is first co mpiled into the MLIR representation
(b), which is then processed by a seque nce of optimization
passes. After the call to foo (b, line 2) is inlined (c), the back-
to-back CNOT pattern emerges; thus, both gates are removed
by the identity pair removal pass (d). Finally, the DCE pass
eliminates the redundant qubit array allocation, constant value
declaration s, and qubit extract ca lls (e) since they have no
further use.
runs (sh ots) in line 31. An abbreviated MLIR representation of
the com pute sub routine in Figure
8 is depicted in Figu re 9
after all MLIR-level optimization passes have been applied.
For the sake of presentation, we only keep the high-level
structure of the MLIR printout (omitted regions are presented
as ellipses).
The semantics of the OpenQASM source code is faithfully
1
2 OPENQASM 3;
3 include "stdgates.inc ";
4
5 const shots = 1024;
6 // Stat e-preparation:
7 def ans atz(float[64]:th eta) qubit[2 ]:q {
8 x q[0];
9 ry(theta) q[1];
10 cx q[1], q[0];
11 }
12
13 def compute(float[64] :theta) qubit[2]:q ->
float[64] {֒
14 bit first, second;
15 float[64] num_parity_ones = 0.0;
16 float[64] result;
17 for i in [0:sho ts] {
18 ansatz(theta) q;
19 // Change measuremen t b asis
20 h q;
21 // Measure
22 first = measure q[0];
23 second = measure q[1];
24 if (first != second) {
25 num_parity_one s + = 1.0;
26 }
27 // Reset
28 reset q;
29 }
30
31 // Compute expectation value
32 result = (shots - num_parity_ones) / shot s -
num_parity_one s / shots;֒
33 return result;
34 }
35
36 float[64] th eta, exp_val ;
37 qubit qq[2];
38 // Try a theta value:
39 theta = 0.123;
40 exp_val = compute(theta) qq;
41 print("Avg < X0X1> = ", exp_val);
Fig. 8: OpenQASM example: Compute Pauli expectation
(σ
X
σ
X
) af te r a state-preparation circuit (ansatz).
translated into the MLIR representation consisting of opera-
tions from ou r quantum dialect (e.g., quantum gates) and the
standard dialect (e.g., branching an d arithmetic instructions).
We ca n now transform this high-level IR to executable co de
adhering to the QIR specification (see Figure
5). At the time of
writing, the execution of this type of tightly coupled quantum-
classical program, whereby measurement feedforward is re-
quired, is only applicable to simulator backend s o f qcors
ftqc runtime [
20]. We anticipate that quantum hardware
providers will su pport this dynamical runtime model in the
near future as OpenQASM becomes more mature and widely-
adopted . I mportantly, in our workflow, the runtime implemen-
tation is lin ked in at the final phase of the compilation pipeline,
thus could be provided interchangeably and dynamically to
target different accelerator targets.
9
1 func @c ompute(%arg0: f64, %arg1: !quantum.Arra y) ->
f64 {֒
2 ...
3 br ˆbb1
4 ˆbb1: // 2 preds: ˆ bb0, ˆbb5
5 %5 = load %4[] : memref<i64>
6 %6 = cmpi "slt", %5, %c1024_i64 : i64
7 cond_br %6 , ˆb b2, ˆbb3
8 ˆbb2: // pred: ˆ bb1
9 %7 = q.extract(%arg1, %c0_i64) : !quantum.Qubit
10 %8 = qvs.x(%7) : !quantum .Qubit
11 %9 = q.extract(%arg 1, %c1_i64) : !quantum.Qubit
12 %10 = qvs.ry(%9, %arg0) : !quantum.Qubit
13 %11 = q.extract(%arg1, %c1_i64 ) :
!quantum.Qubit֒
14 %12:2 = qvs.cx(%11, %8) : !quantum.Qubit,
!quantum.Qubit֒
15 ....
16 %23 = cmpi "ne" , %21, %22 : i1
17 cond_br %23, ˆbb4, ˆbb5
18 ˆbb3: // pred: ˆbb1
19 ...
20 %28 = subf %27, %26 : f64
21 ...
22 %37 = divf %34, %36 : f64
23 ...
24 return %39 : f64
25 ˆbb4: // pred: ˆbb2
26 %40 = load %2[] : memref<f64>
27 %41 = addf %40, %cst_0 : f64
28 store %41, %2[] : memref<f64>
29 br ˆbb5
30 ˆbb5: // 2 preds: ˆbb2, ˆbb4
31 %42 = qvs.reset(%14) : !quantum.Qubit
32 %43 = qvs.reset(%16) : !quantum.Qubit
33 %44 = load %4[] : memref<i64>
34 %45 = addi %44, %c1_i64 : i64
35 store %45, %4[] : memref<i64>
36 br ˆbb1
Fig. 9: Truncated MLIR representation of the OpenQASM
program in Figure 8. Here, we only show a simplified MLIR
printou t of the core deuteron func tion w hereby mo st of the
code has been omitted for clarity. H igh-level classical control
flow constructs such as the fo r loop and the if statement in
Figure
8 are converted into LLVM -style CFG constructs and
operations such as blocks and branches. Th anks to MLIR-
level o ptimization (sec. IV-D1), the ansatz subroutine in
Figure
8 has been inlined into the deu teron body as shown
as qvs.x, qvs.ry, and qvs.cx operations. Arithmetic op-
erations are translated to MLIR operations from the Standard
dialect, such as addf, subf, divf (floating-poin t numb ers)
or addi, cmpi (integers).
B. Compiler pe rformance
In this section, we benchmark
2
the compilation and resource
estimation runtime of the OpenQASM compiler against a set
of different quantu m programming languages and frameworks,
such Q#
3
, Qiskit
4
, and t|keti
5
. We benchmark the total runtime
2
Set-up: Intel Xeon CPU E5-2698 v4 @ 2.20GHz running Linux Debian
10 distribution.
3
Microsoft Quantum Development Kit 0.18.2107153439
4
qiskit-terra 0.18.1
5
pytket 0.13.0
10 20 30 40 50
10
0
10
1
10
2
10
3
N qubits
Runtime [secs]
Qiskit
t|keti
Q#-compile
OpenQASM-compile OpenQASM-total
Fig. 10 : Processing time of Trotter circuits for Heisenberg
Hamiltonian models (eq.
1) with a variable number of qubits.
Transpiler optimization level 3 and full peephole optimization
are applied for Qiskit and t|keti, respectively. For Op enQASM
compilation, all MLIR-level optimization passes (see
IV-D1)
and LLVM -O3 optimization are applied. For Q#, QIR
generation mode is used to generate LLVM IR, which is
then optimized (-O3) by LLVM’s clang. Link time to
qcors QIR runtime is included for both OpenQASM and
Q# compilatio n time. Each da ta point is the average of 10
runs with standard d eviation also plotted. OpenQASM total
time includes both co mpilation time and resourc es estimation
runtime (counting all executed gates).
required to generate and execute binary executables in the
resource e stimation mode, i.e., counting flattened quantum
gates, because it provides a mec hanism to compare statically-
compiled executables against Python-based interpreted scripts
constructing the equivalent circuits.
In Figure
10, we plot the c ompile data for Trotter circuits
simulating the generic Heisenberg Hamiltonian model of the
form
H = h
X
i
X
i
J
z
X
i
Z
i
Z
i+1
. (1)
The circuit is constructed by ‘for loops over a fixed number
of Trotter steps (steps = 100, step size (dt) = 0.01) with a
variable number of qubits from 5 to 50. In other words, we
construct the circuit representin g the unitary
U =
100
Y
k=1
Y
i
exp(hdtX
i
)
Y
i
exp(J
z
dtZ
i
Z
i+1
)
, (2)
where each Pauli exponen tial term is converted to an equiv-
alent gate-based sub-circuit. These sub-circuits are r epeated
at each time step to simulate the Trotterized evolution of the
Heisenberg Hamiltonian .
Quantum programmin g lan guages, such as Q# and Open-
QASM, preserve the loop construct in their IR representation,
10
resulting in almost con stant compilation time. We note that the
number of qubits is a compile-time constant fo r both the Q#
and Open Q A SM cases. The compiler ma y choose to unroll
these loops.
The compilation tim e of OpenQASM includes: (1) f rontend
parsing (ANTLR), (2) MLIR generation and optimization,
(3) lowering to LLVM IR, (4) LLVM optimization, and (5)
object code gen eration and linking. For Q#, we leverage
the Microsoft Quantum Development Kit (QDK) to perform
QIR LLVM generation, i.e., equivalent to steps (1)-(3) in o ur
OpenQASM work flow. The QDK-g e nerated LLVM IR is then
optimized and linked with the qcor runtime similar to steps
4 and 5 using the standard LLVM toolchain.
The results in Figure
10 highlight th e need for statically-
compiled quantum programming lang uages in order to de-
scribe large-scale programs. Imperative gate-by-gate construc-
tion of quantum circuits usin g scripting languages, despite its
flexibility and ease of use, does come with a significant p e rfor-
mance overhead. Impor ta ntly, our MLIR-based com piler for
OpenQASM demonstrates improved performance compared
to other compilers, such as Q#. It is worth noting that the
Q# language is much more feature-rich than OpenQASM,
therefore requiring a more elaborated frontend and build
system. In par ticular, initial Q# to QIR g e neration accounts
for the majority (80-90%) of the total Q# compilation time
in Figure
10. As of this writing, there are no other publicly
available OpenQASM compilers that we can compare our
implementation with.
C. Extensions for Optimal Code Gene ration
Here we demonstrate the utility of our proposed extensions
to the OpenQASM grammar specification with r egards to
optimal quantum code generation. Specifically, we show how
the compute-action syntax enables th e compiler imple-
mentation to generate optimal quantum instruction sequences
in the presence of a ctrl gate modifier. As stated in Sec.
III,
controlled operations on the W = U V U
pattern only require
controls applied to the operation s in V . Via programmer intent
i.e. leveraging the custom co mpute {...} action
{...} syntax the compiler can optimally synthesize quan-
tum instructions adhere nt to this pattern.
Take the Heisenberg Hamiltonian and corresponding time
evolution o perator U in Eqs.
1 and 2. In the context of the
quantum phase estimation algorithm, if we seek a correspond-
ing eigenvalue of U with respect to some eigen state |ψi, we
will require the application o f a series o f controlled versions
of U.
Figure
11 shows an OpenQASM subro utine describing
the trotter evolution in E q. 2. The seco nd nested for loop
(lines 18-29) could be w ritten manually a s the sequence
W = CX RZ(Θ) CX
, but by replacing it with a
compute-action block, we give th e c ompiler an oppor-
tunity for optimal instruction synthe sis u nder application of a
ctrl modifier. Specifically, we have implemented an ANTLR
visitor handler that processes the compute-action source
and adds the compute instructions, the acti on instructions,
1 const n b_qubits = 5;
2 def hei senberg_U() qubit[n b_qubits]:r {
3
4 // Extension-provided C-like data types
5 int n b_steps = 100;
6 double step_size = .01;
7 double Jz = 1.0;
8 double h = 1.0;
9
10 // -h
*
sigma_x layers
11 for step in [0:nb_steps] {
12 // -h
*
sigma_x layers
13 rx(-h
*
step_size) r;
14
15 // -Jz
*
sigma_z
*
sigma_z layers
16 for i in [0:nb_ qubits-1] {
17 compute {
18 cx r[i], r[i+1];
19 } action {
20 rz(-Jz
*
step_size) r[i + 1];
21 }
22 // Could be written manually like this
23 // No optimizations picked up
24 // cx r[i], r[i+1];
25 // rz(-Jz
*
step_size) r[i + 1];
26 // cx r[i], r[i+1];
27 }
28 }
29 }
30
31 // Allocate the qubits
32 qubit r[nb_qubits], c;
33
34 // Perform ctrl-U
35 ctrl @ heisenberg_U c, r;
Fig. 11: OpenQASM code defining a subroutine describing the
unitary oper ation in Eq. 2. The use of comp ute-action
from ou r grammar extension enables the compiler imple-
mentation to optimally synth esis con trolled versions of this
subroutine. This code snippet also demon stra te s the C-like
extensions for prim itive data types and for loops.
and the adjoint or reverse of the c ompute instructions to
the MLIR tree. More over, for instructions that are not in the
action block, the com piler marks the added instructions
with a flag to indicate that they are part of the compute
or uncompute block. At runtime, this inf ormation is used to
optimally synthesize controlled versions of this b lock of code.
We benchmark th e usage of compute-action vs manual
programming of W and present the results in Fig
12. The re-
sults show the number of controlled operations (CRZ, CNOT)
present in the compiled quantum program for the manual
(commented lines 22-26) and compute-action (lines 17-2 1)
cases. One can clearly see that via program mer intent, the
compiler can optimally synthesize instruction sequences and
improve on the resource utility of th e compiled program. With
this simple prog ramming extension, progra mmers can pick up
an order o f magnitude in g ate count reductions.
VI. CONCLUSION
We have presented an optimizing ahead-of-time Open-
QASM compiler built on the MLIR fram ework. Our approach
lowers OpenQASM codes to the LLVM IR in a manner that
11
10 20 30 40 50
10
4
10
5
N Qubits
N Controlled Operations (CX, CRZ)
Manual Compute / Action
Fig. 12: Th e number of co ntrolled o perations (CX and CRZ)
present in the compiled representation of Figure
11 for a given
number of qubits. The Manual plot represents a p rogram that
sequentially lists the instruction for the pattern W = U V U
,
while the other plot represents the case w hereby a programmer
expresses this pattern via the compute-action syntax from
our gramm ar extension.
is adhe rent to the QIR specification. We provide quantum
circuit optimization passes at the MLIR level and leverage
existing c la ssical optimization passes pre sent in both MLI R
and LLVM. Our work extends the OpenQASM grammar
with support for C-like constructs a nd the co mpute-action-
uncompute pattern for efficient programming and compile-
time optimizations. Targeting the QIR enables one to swap
runtime library implementatio ns enabling a write once run
anywhere character istic. We have provided a ru ntime library
implementation of the QIR specification that is backed by the
XACC quantum programming framework, thereby enabling
OpenQASM compilation that targets quan tum compute rs from
IBM, Rigetti, H oneywell, IonQ, as well as simulators that scale
from laptops to large-scale heterogeneous high performance
computers, like Summit. Moving f orward, our approach opens
up the possibility of true language integration at the LLVM IR
level. We envision a number of language approaches that map
to the LLVM IR adhe rent to the QIR specifica tion, and via
simple runtime linking, enabling the in tegra tion of quantu m
languag e A with code from quantum language B. As of this
writing, the integration of Q#, qcor C++, and Open QA SM
is now possible. We also envision this work as a n alternative
mechanism for embedded C++ quantu m kernels in qc or,
departing fro m the existing Clang Syntax Handler so urce
preprocessing. Future work will investigate true quantum
languag e integration and compile-time embedding of MLIR-
to-LLVM processing in the qcor C++ language extension.
ACKNOWLEDGMENT
This material is based upon work suppo rted by the U.S.
Departmen t of Energy, Office of Science, National Quantum
Information Scienc e Research Centers. ORNL is managed by
UT-Battelle, LLC, for the US Department o f Energy under
contract no. DE-AC05-00OR22725.
REFERENCES
[1] Circuit synthesis via quantum gate modifiers.
https://qiskit.github.io/openqasm/language/gates.html#quantum-gate-modifiers.
Accessed: 2021-08-30.
[2] Introducing quantum intermediate representation (qir).
https://devblogs.microsoft.com/qsharp/introducing-quantum-intermediate-representation-qir/.
Accessed: 2021-01-15.
[3] Mlir documentation. https://mlir.llvm.org. Accessed: 2021-01-15.
[4] Qcor github. https://github.com/ornl-qci/qcor. Accessed: 2021-01-15.
[5] Quantum intermediate representation (qir) specification.
https://github.com/microsoft/qsharp-language/tree/main/Specifications/QIR.
Accessed: 2021-01-15.
[6] Koen Bertels, J.W. Hogaboam, and Carmen G. Almudever. Quantum
computer architecture: a full-stack overview. Microprocessors and
Microsystems, 66:21–66, 2019.
[7] Benjamin Bichsel, Maximilian Baader, Timon Gehr, and Martin Vechev.
Silq: A high-level quantum language with safe uncomputation and intu-
itive semantics. In Proceedings of the 41st ACM SIGPLAN Conference
on Programming Language Design and Implementation, PLDI 2020,
page 286–300, New York, NY, USA, 2020. Association for Computing
Machinery.
[8] Keith A Britt and Travis S Hum ble. High-performance computing with
quantum processing units. ACM Journal on Em erging Technologies in
Computing Systems (JETC), 13(3):1–13, 2017.
[9] Andrew W. Cross, Ali Javadi-Abhari, Thomas Alexander, Niel de Beau-
drap, Lev S. Bishop, Steven H eidel, Colm A. Ryan, John Smolin, Jay M.
Gambetta, and Blake R. Johnson. Openqasm 3: A broader and deeper
quantum assembly language, 2021.
[10] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris,
T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski. Cloud
quantum computing of an atomic nucleus. Phys. Rev. Lett., 120:210501,
May 2018.
[11] Tobias Gysi, Christoph M¨uller, Oleksandr Zinenko, Stephan Herhut,
Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, and Tobias
Grosser. Domain-specific multi-level ir rewriting for gpu, 2020.
[12] Kathleen E. Hamilton, Eugene F. Dumitrescu, and Raphael C. Pooser.
Generative model benchmarks for superconducting qubits. Phys. Rev.
A, 99:062323, Jun 2019.
[13] IBM. Openqasm 3.x live specification, 2021.
https://qiskit.github.io/openqasm/.
[14] David Ittah, Thomas H¨aner, Vadym Kliuchnikov, and Torsten Hoefler.
Enabling dataflow optimization for quantum programs, 2021.
[15] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey
Lvov, Frederic T. Chong, and Margaret Martonosi. Scaffcc: Scalable
compilation and analysis of quantum programs. Parallel Computing,
45:2 17, 2015. Computing Frontiers 2014: Best Papers.
[16] Tian Jin, Gheorghe-Teodor Bercea, Tung D. Le, Tong Chen, Gong
Su, Haruki Imai, Yasushi Negishi, Anh Leu, Kevin O’Brien, Kiyokuni
Kawachiya, and Alexandre E. Eichenberger. Compiling onnx neural
network models using mlir, 2020.
[17] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy
Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasi-
lache, and Oleksandr Zinenko. Mlir: A compiler infrastructure for the
end of moore’s law, 2020.
[18] Alexander McCaskey, Eugene Dumitrescu, Dmitry Liakh, and Travis
Humble. Hybrid programming for near-term quantum computing sys-
tems. In 2018 IEEE International Conference on Rebooting Computing
(ICRC), pages 1–12. IEEE, 2018.
[19] Alexander McCaskey and Thien Nguyen. A mlir dialect for quantum
assembly languages, 2021.
[20] Alexander Mccaskey, Thien Nguyen, Anthony Santana, Daniel Claudino,
Tyler Kharazi, and Hal Finkel. Extending c++ for heterogeneous
quantum-classical computing. ACM Transactions on Quantum Com-
puting, 2(2), July 2021.
12
[21] Alexander J McCaskey, Dmitry I Lyakh, Eugene F Dumitrescu, Sarah S
Powers, and Travis S Humble. XACC: a system-level s oftware in-
frastructure for heterogeneous quantum–classical computing. Quantum
Science and Technology, 5(2):024002, feb 2020.
[22] Alexander J. McCaskey, Zachary P. Parks, J acek Jakowski, Shirley V.
Moore, Titus D. Morris, Travis S. H umble, and Raphael C. Pooser.
Quantum chemistry as a benchmark for near-term quantum computers.
npj Quantum Information, 5(1):99, 2019.
[23] Tiffany M Mintz, Alexander J Mccaskey, Eugene F Dumitrescu,
Shirley V Moore, Sarah Powers, and Pavel Lougovski. Qcor: A language
extension s pecification for the heterogeneous quantum-classical model
of computation. arXiv preprint arXiv:1909.02457, 2019.
[24] MLIR. ’affine’ dialect. https://mlir.llvm.org/docs/Dialects/Affine/.
[25] Yunseong Nam, Neil J. Ross, Yuan Su, Andrew M. Childs, and Dmitri
Maslov. Automated optimization of large quantum circuits with con-
tinuous parameters. npj Quantum Information, 4(1):1–12, May 2018.
Bandiera
abtest: a Cc license type: cc by Cg type: Nature Research
Journals Number: 1 Primary atype: Research Publisher: Nature Pub-
lishing Group Subject
term: Computer science;Quantum information
Subject term id: computer-science;quantum-information.
[26] Thien Nguyen and Alexander J McCaskey. Extending python for
quantum-classical computing via quantum just-in-time compilation.
arXiv preprint arXiv:2105.04671, 2021.
[27] Thien Nguyen, Anthony Santana, Tyler Kharazi, Daniel Claudino, Hal
Finkel, and Alexander McCaskey. Extending c++ for heterogeneous
quantum-classical computing, 2020.
[28] T. J. Parr and R. W. Quong. Antlr: A predicated-ll(k) parser generator.
Software: Practice and Experience, 25(7):789–810, 1995.
[29] Ethan Smith, Marc G. Davis, Jeffrey M. Larson, Ed Younis, Costin
Iancu, and Wim Lavrijsen. Leap: Scaling numerical optimization based
synthesis using an incremental approach, 2021.
[30] Damian S. Steiger, Thomas H¨aner, and Matthias Troyer. Projectq: an
open source software framework for quantum computing. Quantum,
2:49, Jan 2018.
[31] Krysta Svore, Alan Geller, Matthias Troyer, John Azariah, Christopher
Granade, Bettina Heim, Vadym Kliuchnikov, Mariia Mykhailova, Andres
Paz, and Martin Roetteler. Q#: Enabling scalable quantum computing
and development with a high-level dsl. In Proceedings of the Real World
Domain Specific Languages Workshop 2018, RWDSL2018, New York,
NY, USA, 2018. Ass ociation for Computing Machinery.
[32] Ed Younis, Koushik Sen, Katherine Yelick, and Costin Iancu. Qfast:
Quantum synthesis using a hierarchical continuous circuit space, 2020.
APPENDIX A
BENCHMARK EXPERIMENTAL SETUP
Here, we list all the source codes used for the Trotter circuit
benchm arking (Figure
10). The number of qubits in the Q# and
OpenQASM source codes represent a particu la r data point. We
modify the number of qubits and recompile the source code
for the benchmark.
A. Qiskit script
1 from qi skit.aqua.operators import (X, Z, I,
EvolvedOp, PauliTrotterE volution)֒
2 from qi skit import QuantumReg ister, QuantumCircu it
3 from qi skit.compiler impor t transpile
4 from st atistics imp ort mean, stdev
5 import time
6
7 def X_o p(idxs, n_qu bits):
8 op = None
9 if 0 in idxs:
10 op = X
11 else:
12 op = I
13 for i in range(1, n_qubits):
14 if (i in idxs):
15 op ˆ= X
16 else:
17 op ˆ= I
18 return op
19
20 def Z_op(idxs, n_qubits):
21 op = None
22 if 0 in idxs:
23 op = Z
24 else:
25 op = I
26 for i in range(1, n_qubits):
27 if (i in idxs):
28 op ˆ= Z
29 else:
30 op ˆ= I
31 return op
32
33 def heisenberg_ham(n_ qubits):
34 Jz = 1.0
35 h = 1.0
36 H = -h
*
X_op([0], n_qubits)
37 for i in range(1, n_qubits):
38 H = H - h
*
X_op([i], n_qubits)
39 for i in range(n_qubits - 1):
40 H = H - Jz
*
(Z_op([i, i + 1], n_qubits))
41 return H
42
43 n_qubits = [10, 20, 50, 100]
44 nbSteps = 100
45 n_runs = 10
46
47 def trotter_circ(q, exp_args, n_steps):
48 qc = QuantumCircuit(q)
49 for i in range(n_steps):
50 for sub_op in exp_args:
51 qc += PauliTrotterEvolution()
52 .convert(Evolv edOp(sub_op))
53 .to_circuit()
54 return qc
55
56 for nbQubits in n_qubits:
57 data = []
58 for run_id in range(n_runs):
59 ham_op = heisenberg_ham(nbQubits)
60 q = Q uantumRegister(nbQubits, 'q')
61 start = time.time()
62 comp = trotter_circ(q, ham_op. oplist, nbSt eps)
63 comp = transpile(comp, optimiz ation_level=3)
64 end = time.time()
65 data.append(end - start)
66 print('n_qubit s = ', nbQubits, '; Elapsed time =',
mean(data), '+/-', stdev(data), '[secs]')֒
B. t|keti script
1 import time
2 from py tket.circuit import Circuit, PauliExpBox
3 from py tket.pauli import Pauli
4 from py tket.extensions. qiskit import
AerStateBacken d֒
5 from pytket.passes import FullPeepholeOptimise
6 from st atistics import mean, stdev
7 nb_steps = 100
8 step_size = 0.01
9 n_qubits = [5, 10, 20, 30, 40, 50]
10 n_runs = 10
11 for nb_qubits in n_qubits:
12 data = []
13 for run_id in range(n_runs):
14 # Start timer
15 start = time.time()
16 circ = Circuit(nb_qubits)
17 h = 1 .0
18 Jz = 1.0
19 for i in range( nb_steps):
20 # Using Heisenberg Hamiltonian :
21 for q in range(nb_qubits) :
13
22 circ.add_pauli expbox(PauliExpBox(
[Pauli.X], -h
*
step_size), [q])֒
23 for q in range(nb_qubits - 1):
24 circ.add_pauli expbox(PauliExpBox(
[Pauli.Z, Pauli.Z], -Jz
*
step_size), [q, q + 1])
֒
֒
25
26 # Compile to gates
27 backend = AerStateBackend()
28 circ = backend.get_compiled_circuit(circ)
29
30 # Apply optimization
31 FullPeepholeOptimise().apply(circ)
32
33 end = time.time()
34 data.append(end - start)
35
36 print('n_qubits =', nb _qubits, '; Elapsed time
=', mean(d ata), '+/-', stdev(data),
'[secs]')
֒
֒
C. Q# source cod e
1 namespace Benchmark.Heisenberg {
2 open Microsoft.Quantum.Intrinsic;
3 open Microsoft.Quantum.Canon;
4 open Microsoft.Quantum.Math;
5 open Microsoft.Quantum.Convert;
6 open Microsoft.Quantum.Arrays;
7 open Microsoft.Quantum.Measurement;
8 // In this example, w e will show how to
simulate the time evolution of֒
9 // an Heisenber g model:
10 operation HeisenbergTrotterEvolve(nSites : Int,
simulationTime : Double, trotterStepSize :
Double) : Unit {
֒
֒
11 // We pick arbitrary values for the X an d J
couplings֒
12 let hXCoup ling = 1 .0;
13 let jCoupl ing = 1. 0;
14
15 // This determines the number of Trotter
steps֒
16 let steps = Ceiling(simulationTime /
trotterStepSiz e);֒
17
18 // This resizes th e Trotter step so tha t
time evolu tion over the֒
19 // duratio n is accomplis hed.
20 let trotte rStepSizeResized = simulationTime
/ IntAsDouble(steps);֒
21
22 // Let us initialize nSites clean qubits.
These are all in the |0>֒
23 // state.
24 use qubits = Qubit[nSites];
25 // We then evolve for some time
26 for idxSte p in 0 .. ste ps - 1 {
27 for i in 0 .. nS ites - 1 {
28 Exp([PauliX], (-1.0
*
hXCoupling)
*
trotterStepSiz eResized,
[qubits[i]]);
֒
֒
29 }
30 for i in 0 .. nS ites - 2 {
31 Exp([PauliZ, PauliZ], (-1.0
*
jCoupling)
*
trotterStepSiz eResized,
qubits[i . . (i + 1)]);
֒
֒
֒
32 }
33
34 }
35 }
36
37 // Entry point: we allow the Q# program to have
full infor mation (compile-tim e) ab out the
number of qubits, steps, e tc.
֒
֒
38 @EntryPoint()
39 operation CircuitGen() : Unit {
40 HeisenbergTrot terEvolve(50, 1.0, 0.01);
41 }
42 }
D. Open Q ASM source code
1 OPENQASM 3;
2
3 const nb_steps = 100 ;
4 const n b_qubits = 50;
5 const s tep_size = 0.01;
6 const J z = 1.0;
7 const h = 1.0;
8
9 qubit r[nb_qubits];
10 for step in [0:nb_steps] {
11 // -h
*
sigma_x layers
12 for i in [0:nb_qubits] {
13 rx(-h
*
step_size) r[i];
14 }
15
16 // -Jz
*
sigma_z
*
sigma_z layers
17 for i in [0:nb_qubits - 1] {
18 cx r[i], r[i+1];
19 rz(-Jz
*
step_size) r[i + 1];
20 cx r[i], r[i+1];
21 }
22 }
14
This figure "CompilationFlow.png" is available in "png" format from:
http://arxiv.org/ps/2109.00506v1
This figure "pass_uml.png" is available in "png" format from:
http://arxiv.org/ps/2109.00506v1