Enabling Retargetable Optimizing Compilers for Quantum

arXiv:2109.00506v1 [quant-ph] 1 Sep 2021

Enabling Retargetable O ptimizing Compilers for

Quantum Accelerators via a Multi-Level

Intermediate Representation

Thien Nguyen

Quantum Science Center

Oak Ridge National Laboratory

Oak Ridge, TN, United States

nguye n[email protected]

Alexander McCaskey

Quantum Science Center

Oak Ridge National Laboratory

Oak Ridge, TN, United States

[email protected]

Abstract—We present a multi-level quantum-classical interme-

diate representation (IR) that enables an optimizing, retargetable,

ahead-of-time compiler for available quantu m programming

languages. To demonstrate our architecture, we leverage our

proposed IR to enable a compiler for version 3 of the OpenQASM

quantum language speciﬁcation. We support the entire gate-

based OpenQASM 3 language an d provide cu st om extensions for

common quantum programming patterns and improved syntax.

Our work builds upon the Multi-level Intermediate Represen-

tation (MLIR) framework and leverages its unique p rogressive

lowering capabilities to map quantum language expressions to

the LLVM machine-level IR. We provide both quantum and

classical optimizations via the MLIR pattern rewriting sub-

system an d standard LLVM optimization passes, and demon-

strate the programmability, compilation, and execution of our

approach via standard benchmarks and test cases. In comparison

to other standalone language and compiler efforts available today,

our work results in compile times that are 1000x faster than

standard Pythonic approaches, and 5-10x faster than comparative

standalone quantum language compilers. Our compiler provides

quantum resource optimizations via standard programming pat-

terns that result in a 10x reduction in entangling operations, a

common source of program noise. Ultimately, we see this work

as a vehicle for rapid quantum compiler prototyping enabli ng

language integration, optimizations, and interoperability with

classical compilation approaches.

Index Terms—quantum computing, quantum programming,

quantum simulation, programming languages

I. INTRODUCTION

Quantum acceleratio n of existing scientiﬁc computing work-

ﬂows has the potential to enhance computational scalability

for modeling and simulation tasks in ﬁelds such as nucle a r

physics, chemistry, and machine learning [

10], [12], [22].

As hardware architectures continue to scale and improve —

enabling more qubits at lower error rates — one can exp e ct

This manuscript has been authored by UT-Battelle, LL C under Contract No.

DE-AC05-00OR22725 with the U.S. Department of Energy. The United States

Government retains and the publisher, by accepting the article for publication,

acknowledges that the United States Government retains a non-exclusive, paid-

up, irrevocable, world-wide license to publish or reproduce the published form

of this manuscript, or allow others to do so, for United States Government

purposes. The Department of Energy will provide public access to these results

of federally sponsored research in accordance with the DOE Public Access

Plan. (http://energy.gov/downloads/doe-public-access-plan).

quantum-classical machine mode ls to move toward tighter

integration of CPU and QPU resources [

6], [8], [18]. These

architecture s stand to beneﬁt from robust language and com-

pilation approaches that enable high-level classical language

integration, quantum and classical compiler optimization tech-

niques [

25], and compiler-automated circuit synthesis strate-

gies [

29], [32]. There is a critical need to move the quantum

programming community from manual circuit construction via

vendor-provided data structures and frameworks embedded in

application-level classical languages (Python, etc.) toward per-

formant language app roaches that enable tight integration with

existing classical runtimes, libraries, and languages. Recently,

a nu mber of su c h langu age approach e s have begun to brid ge

this gap in the research commun ity, with stand- a lone languages

such as Q# [

31], OpenQASM 3.0 [9], [13], Silq [7], Scaffold

[15], and classical language extensions like QCOR [20], [23].

In parallel to quan tum pro gramming research and develop-

ment, there has been a wealth of work done to improve classi-

cal compilation frameworks and techniques. One result of note

is the introduction of multi-level intermediate representations

enabling compiler representations at a variety of abstraction

levels — including those close to the source language itself

— in tande m with associated progressive lowering workﬂows

that take high-level representations to low-level executable

object code via a hierarchy of intermediate representation (IR)

abstraction. Th is enables robust compiler development for do-

main speciﬁc languages that retain automated language-level

optimizations, transformations, and lowering to machine-level

IRs like the LLVM. Treating quantum program expressions as

stand-alon e domain speciﬁc languages represents a n opportu-

nity to leverage these state-of-the-art classical multi-level IRs.

Speciﬁcally, the MLIR framework [

3], [17] is an example of a

popular multi-level IR in u se today for classically accelerated

heteroge neous workﬂows [

11], [16], and is well-positioned to

provide a unique resource for the rapid pr ototyping of quantum

languag e com pilers via its extensib le language-level I R and

progressive IR-lowering capabilities.

We recently demonstrated the utility of MLIR for simple

quantum assembly lan guages with no true control ﬂow struc-

Quantum

Afﬁne

SCF Std

MLIR Target

LLVM Dialect - LLVM IR

QIR Speciﬁcation

OpenQASM 2 OpenQASM 3

...

OpenQASM 2 Parser OpenQASM 3 Parser

...

Binary Executable

QCOR Runtime Linkage:

NISQ / FTQC

Classical Compute Codegen:

x86, ARM, PowerPC, etc.

Accelerator

Host

Fig. 1: QCOR’s MLIR-based compilation stack for CPU-

QPU heterogeneous computing. Each quantum programming

languag e is processed by a dedicated parser that produces

the AST of the input source code. AST o f different source

languag es is all mapped to an MLIR representation expressed

in the QCOR quantum dialect and built-in Standard, Afﬁne,

and SCF dialects. This MLIR rep resentation is progressively

(multi-stage) transformed (optimization and dialect conver-

sion/lowering) until only operations in th e LLVM dialect

remain, i.e., quantum operations are converted to LLVM

function calls adhering to the QIR speciﬁcation . This guar-

antees that the ﬁnal binary executable is compatible with

any QIR-conformed runtime implementations provided at link

time, su c h as qcor runtime supporting both remotely hosted

(NISQ) and tightly coupled (FTQC) execution models.

tures [

19] — a low level MLIR dialect for quantu m computing.

In this work, we leverage and extend that simple q uantum

MLIR extension for a more complex quantum language —

OpenQASM version 3.0 (henceforth referred to a s Open-

QASM, unless stated otherwise), which provides robust con-

trol ﬂow structures, variable declaration and assignment, and

novel syntax for quantum circuit generation and synthesis [

1].

Our approach enables an optim iz ing com piler for OpenQASM

that compiles to the LLVM machine-level IR adherent to

the recently introduced Quantum Intermediate Representation

(QIR) sp eciﬁcation [

2], [5]. Moreover, this approach need

not be limited to OpenQA SM — the implementation pattern

shown in this work ca n serve as a robust mechanism for further

quantum language compiler prototyping and deployment. The

work presented here puts forward the requisite infrastructure

for quantum language expression and LLVM IR generation,

leaving future language compiler imp le mentations as a matter

of providing the mapping of a language abstract syntax tree

(AST) to our quantum MLIR extension (via ANTLR [

28],

for instance). Language lowering to executable code is then

readily available.

We integrate our approach with the qcor compiler platform

[4], [20] (note we use QCOR to denote the language extension

speciﬁcation [27] and qcor for the compiler implementation).

qcor enables single-sour ce quantum-classical programming

in both C++ and Python, promoting an ahead- of-time C++

compiler executable an d just-in-time compilation infrastruc -

ture for perfor mant quantum-classical code genera tion and

1 %Array = type opaque

2 %Qubit = type opaque

4 declare void @__quantum__q is__cnot(%Qubit

%0,

%Qubit

%1)֒→

5 declare void @__quantum__q is__h(%Qubit

%0)

7 declare void

@__quantum__rt __qubit_release_array(%Array

%0)֒→

8 declare i8

@__quantum__rt __array_get_element_ptr_1d(

%Array

%0, i64 %1)

֒→

9 declare %Array

@__quantum__rt __qubit_allocate_array(i64 %0)֒→

11 define i32 @ghz() {

12 %4 = call %Array

@__quantum__rt __qubit_allocate_array(i64 3)֒→

13 %5 = call i8

@__quantum__rt __array_get_element_ptr_1d(

%Array

%4, i64 0)

֒→

14 %6 = bitcast i8

%5 to %Qubit

15 %7 = load %Qubit

, %Qubit

%6, align 8

16 call void @__quantum__qis__h(%Qubit

%7)

17 %8 = call i8

@__quantum__rt __array_get_element_ptr_1d(

%Array

%4, i64 1)

֒→

18 %9 = bitcast i8

%8 to %Qubit

19 %10 = load %Qubit

, %Qubit

%9, align 8

20 call void @__quantum__qis__cnot(%Qubit

%7,

%Qubit

%10)֒→

21 %11 = call i8

@__quantum__rt __array_get_element_ptr_1d(

%Array

%4, i64 2)

֒→

22 %12 = bitcast i8

%11 to %Qubit

23 %13 = load %Qubit

, %Qubit

%12, align 8

24 call void @__quantum__qis__cnot(%Qubit

%10,

%Qubit

%13)֒→

25 call void

@__quantum__rt __qubit_release_array(%Array

%4)

֒→

26 call void @__quantum__rt__finalize()

27 ret i32 0

28 }

Fig. 2: A quantum pro gram generating the Greenberger-Horne-

Zeilinger (G H Z) state on 3 qubits represented in the LLVM

IR adherent to th e QIR speciﬁcation.

execution. This work extends the qcor executable to stand-

alone Ope nQASM source ﬁles, and enables their compilation

and execution in a retargetable (quantum hardware-agnostic)

fashion.

II. BACKGROUND

Figure

1 demonstrates the hierarc hy o f layers underlying our

compiler architecture. Ultimately, we take quantum-classical

languag es down to an MLIR representation be fore emitting

standard object code via lowering to the LLVM IR (we

leverage the LLVM ecosystem of executables to map an

IR bitcode representation to assembly and executable code).

Here we seek to provide nece ssary background o n the QIR

speciﬁcation, the MLIR framework, and the qcor compiler

frontend and quantum runtime library to set up the presentation

of the rest of the compiler a rchitecture.

A. Quantum Intermediate Representation

Recent work h as r esulted in th e development of a for-

mal speciﬁcation for a Quantum Intermediate Representation

(QIR) embedded in the LLVM IR [

5]. This speciﬁcation does

not extend the LLVM IR with new instructions pertinent to

quantum computing, instead, it expresses quantum speciﬁc op-

erations as declared function calls on opaque data types. Func-

tion declarations and correspond ing signatures are deﬁned for

quantum memory allocation and dealloca tion, individual qubit

addressing, quantum instruction invocation on allocated qubits,

and u tility func tions for arr ay management, tuple creation, and

callable invocation. By describ ing qubits, measurement results,

and arrays as opaque types, and pro moting function decla-

rations over concrete implementations, the QIR speciﬁcation

promotes a ﬂexible approach to quantum-classical compiler

architecture and integration. The approach represents a novel

target for language com pilers — all of the LLVM toolc hain

becomes available, includ ing runtime linking, compile time

classical op timizations, and external language inter operability.

Figure

2 demonstrates a LLVM IR code snippet adherent to

the QIR speciﬁcation for the generation of a Greenberger-

Horne-Z eilinger (GHZ) maximally entangle state. Note the

declaration of opaque Array and Qub it types (lines 1,2)

and the externally declared QIR runtime functions (lines 4-10).

These are left for implementation by appropriate QIR runtime

libraries, affecting actual execution o f quantum instructions,

array handling ( including array s of Qubit instances), and

data type actualization. Th e body of the code consists of

standard LLVM instructions (bitcast, call, etc.) and calls

to the declared QIR runtime functions. These fun ctions are

concretely provided at link time v ia the runtime libra ry.

B. MLIR

Moving up the IR abstraction hierar chy in Figure

1, recent

developments in classical compilation research and develop-

ment has resulted in the MLIR [

17]. The MLIR repre sents a

modular and extensible approach to deﬁning custom compiler

IRs that can express a spectrum of language abstraction

(language-level IR d own to the machine-level LLVM IR). At

its core, the framework pu ts forward the concept of a language

dialect which is composed of lan guage-speciﬁc o perations.

These operations are the core abstraction al unit in the MLIR,

and they model a unique mappin g of operands to return values,

and ca n optionally c arry a dictionary of co mpile-time metad ata

(attributes). Operands and return values are m odeled as an

mlir::Value type, which describes a computable , type d

value and its correspon ding set of users (enabling one to

construct standard use-deﬁne chains). T he creation of dialects

provides a mechanism for mapping language ASTs to a corre-

sponding MLIR operation tree compo sed of language-dialect-

speciﬁc op erations along sid e other utility dialect op e rations

(standard function calls, memory references and allocation, for

loops, conditional statements, etc.). Mo reover, the framework

puts forward a general progressive lowering ca pability —

incrementa l translation of higher dialect operations to lower

level dialect operations. This enables one to deﬁne custom

translations from operations in language-level dialects to op-

erations in a machine-level dialects like the LLVM IR. This

infrastructure and corresp onding workﬂow provide a ﬂexible

architecture for the development of compilation pathways

taking language-level syntax trees down to a machine-level

IR, like the LLVM.

C. qcor

qcor provides C++ [20] and Python [26] lang uage exten-

sions for heterogen e ous quantu m-classical c omputing in an

effort to promote native quantu m kernel programming in a

single-source con text. Critically, qco r puts forward a com-

piler runtime library that enables quantum program execution

in a multi-modal, retargetable fashion. The execution model

enables two mode types of quantum instruction invocation

— nisq and ftqc modes (see QCOR Runtime Linkage

in Figure

1). nisq mode supports runtime-level quantum

instruction queueing and ﬂushing upon exit of a quantum

kernel function, implyin g a full quantum circuit submission

on a remote ly hosted quantum computer. ftqc m ode models

a tightly-coupled CPU-QPU integration model, and quan tum

instructions are instead streamed as they are invoked, enablin g

features like fast-feeback on qubit measurement results. The

qcor r untime ultimately delegates to the XACC framework

[

21], and supports remo te execution on IBM, Rigetti, IonQ,

and Honeywell quantum processors, and h as support for

various simulators for the ftqc mode of execution. It is this

runtime that plays a critical role in this work, as it provides a

target for our QIR runtime libr a ry implementatio n. Moreover,

we have provided an entrypoint to OpenQASM compilation

natively a s part of the qcor command line executable.

III. EXTENDING OPENQASM 3

OpenQASM version 3 has recently been put forward as a

formal speciﬁcation [

13], and a extended Bachus-Naur form

description of the lang uage has been made public as an

ANTLR grammar ﬁle. This language departs fr om the previous

version (version 2. 0) in the introduction of classical control

ﬂow and variable declarations, making version 3 much more

friendly to hybr id quantum-classical programming. The lan-

guage pr ovides standard quantum instruction calls, but enables

more complex quantum circuit synthesis via ctrl, adj, and

pow quantum ga te modiﬁers. Standard while and range-

based for loops are also allowed, as well as th e conditional

if-else block.

While our work seeks to ensure tha t our compiler implemen-

tation is fully compatible with the base grammar speciﬁcation

for OpenQASM, we also a re in a unique position to enhance it

with features pertinen t to the qcor compiler platform and its

user base. We envision the languag e and com piler presented in

this work as a novel la nguage extension for the qcor quantum

kernel prog ramming mod e l, i.e. enabling users of qcor to

program quantum kernels using our extended OpenQASM

languag e. We seek extensions that (while rem a ining backwards

compatible) enable a more C-like syntax, C-like primitive type

declaration s, and common quantum programming patterns

(a) Compute-Action ANTLR Grammar Addition.

1 compute_act ion_stmt

2 : 'compute' com pute_block=programBlock

3 'action' a ction_block=programBlock ;

(b) OpenQASM code leveraging the compute-action statement.

1 qubit q [4];

2 let bot tom_three = q[1:3];

3 compute {

4 rx(1.57) q [0];

5 h bottom_three;

6 for i in [0:3] {

7 cnot q[i], q[i + 1];

8 }

9 } action {

10 rz(2.2) q[4];

11 }

Fig. 3: OpenQASM language extension for compute-

action-uncompute pattern.

already present in the qcor language. To start, we have

extended the gr ammar to provide familiar typedefs for 32

and 64 bit integers and ﬂoats, Speciﬁcally, we parse int as

int[32], int64_ t as int[64], fl oat as float[ 32],

and double as float[64]. We have also updated the

grammar and implemented the parser to handle bo th range-

based C-like for statements as well as the usual for state-

ment with with initializer, conditional expression, and iteration

expression.

Finally, we see an opportunity to enable syn ta x and seman-

tics for speciﬁc compile-time optimizations. We have updated

the Ope nQASM grammar with support for the ubiquitous

compute-action-uncompute pattern, ﬁrst demonstrated

in [

30]. Given the common pattern W = U V U

†

, the new

syntax enables o ne to express U an d V as the code in the

compute and action scopes, respectively, and the compiler

auto-generates the U

†

code after application of the compute,

action segments. This is demonstrated in Figure

3a (addition

to OpenQASM grammar) and

3b (example usage), a nd is

not only useful for readability, but it also gives the compiler

implementation the opportunity to g enerate optimal quantum

code in the situation where the programmer wants to progra m

W controlled on the state of another qubit. In this case, the

compiler syn thesizes U ctrl-V U

†

as opposed to the naive ctr l-

U ctrl-V ctrl-U

†

, thereby leading to less multi-qubit operations

and shorter depth quantu m programs.

IV. COMPILER ARCHITECTURE

The architecture we put forward starts with the deﬁnition of

a frontend parser for the OpenQASM langu age. This parser

produces an AST and we provide c ustom tree walkers that

traverse the tree and construct a correspond ing MLIR repre-

sentation. To handle variable data and scoping, we introduce

a custom symbol table, enabling the tracking of variable

use-deﬁne chains as well as scope visibility. Our ML IR

representation is amenable to general transformation, and we

leverage this for quantum -level optimizatio ns. Ultimately, our

architecture lowers the M LIR down to the LLVM IR. At this

level, the generation of executable code and linking with a

valid QIR runtime implementation is readily accomplished

with standard assemblers a nd linkers.

A. Symbol Table

The symbol table is the data structure used by the com -

piler to cache information about each observed symbol (e.g.,

variable name, its type, its constness, etc.). Since OpenQASM

allows fo r local variables, the symbol table becomes critical for

tracking metadata about the variable symbol and its subsequen t

use. In other words, whe n processing ea ch statement, the

compiler, via the symbol table, is aware of the context of all

visible symbols, i.e., those from this scope and those above,

to perform proper n a me lookup. Therefore the symbol tab le

provides a mechanism to validate various semantic errors, such

as illegal operatio ns for a speciﬁc variable type or ref e rring

to o ut-of-scope variables, which could not be detected by

syntactic considerations.

We have implemented a symbol table that is composed

of an array of scope-indexed hash maps. Each map is a

lookup table from variable name to the corresponding MLIR

mlir::Value instance representing the variable. Name

lookup is performed from the current scope upward (to parent

scopes) to ﬁnd the ﬁrst match, i.e., the one in the n earest parent

scope. Another utility of the symbol table is the compile-

time evaluation of constant expressions. OpenQASM supports

constant integer an d ﬂoating-point variable declarations (via

the const keyword). The symbol table tracks these constant

values and provides a utility to evaluate simple math expres-

sions

involving these constants at compile time, if possible.

Critically, the compiler relies on the symbol table to track

qubit use-deﬁne chains for qua ntum instruction o perations

(typical quantum gate invocations). We have designed our

quantum instruction operation in the q uantum MLIR dialect

extension to adhere to the value semantics representation ﬁrst

described in [14], whereby quantum operations consume one

or many qubit mlir:: Value instances and produce one

or many new mli r::Value instances as oper a tion return

types. Using the underlying pointer to the MLIR variable

(mlir::Value) as the looku p key, the symbol table re -

places in put qubit ope rands with the newly-created output

mlir::Value. Therefore, the OpenQASM source c ode is

compiled in to the MLIR representation with explicit use-deﬁne

chains for qubits amendable to compile-time optimization

techniques similar to the DAG representation of quantum

circuits.

B. ANTLR Parser

Our compiler implementation leverages the ANTLR [

28]

(ANother Tool for Language Recog nition) too lc hain to gen-

erate the compiler frontend as depicted in Figure

4. With our

extended OpenQASM g rammar as input, ANTLR generates

the corresponding lexer and parser utilities capable of scanning

using the C++ Mathematical Expression Toolkit (exprtk) Library

Source.qasm

qasm3Lexer

Token Stream

qasm3Parser

ANTLR AST

qasm3Visitor

MLIR Op Tree

ANTLR grammar

OpenQASM.g4

Error Listener

Syntax Errors!

Auto-generated

Fig. 4: ANTLR-based compiler front-end: lexer and par ser

are auto-generated from the extended OpenQASM grammar

(ANTLR grammar ﬁle). An error listener is attached to the

parser to capture and report any syntax vio lations encountered

during par sin g. The qasm3Visitor visits each node of

the ANTLR AST constructing the MLIR syntax tree in the

standard, afﬁne, and quantum dialects.

and parsing source strings according to the provided grammar

rules. The compiler frontend produces an AST representing

the input source code against the set of syntactic rules in the

grammar. For instance, a valid OpenQA SM loop ( matching a

syntax rule named loopStatement) will be parsed into a

LoopStatementContext AST node along with all nested

sub-nodes, e.g., the loop termination conditions and the loop

body. ANTLR also generates a base AST visitor interface for

each grammar ﬁle, which includes all possible AST node types

that the parser may p roduce. The AST visitor is the mechan ism

we leverage to transfor m the raw OpenQASM syntax tree

into the MLIR representation as we will discuss in th e next

subsection.

While p rocessing the input source, the parser may throw ex-

ceptions indicating syntactic or semantic errors. The compiler

implements the ANTLR BaseErrorListener interface

(see Figure

4) to catch these potential issues and r eport them

to users with detailed inform ation, such as the location of

offending characters.

C. Visitor Handlers

Once th e valid OpenQASM source has been tr ansformed

into the ANTLR AST, the compiler traverses each node in

the AST in a depth-ﬁrst manner producing the equ ivalent

MLIR tree using the standard (operation s for classical control

ﬂow and memory ref e rences), afﬁne (op e rations for looping),

and quantum dialects (our contribution modeling quantum

operations). Tab le

I summ arized OpenQASM-MLIR rewrite

patterns for important OpenQASM c onstructs.

In particular, quantu m types (qubit and qreg) and clas-

sical types (b oolean, variable-width integer or ﬂoating-point

numbers, and arra ys) are mapped to QIR types (Qubit and

Array) or memory-referenced (me mref) MLIR Standard di-

alect types (e.g., i1 for boolean bits, i8 for 8-bit integers, etc.)

Classical math operatio ns are converted to the correspond ing

instructions from the MLIR Standard dialect, such as addi

or cmpi for integer addition or comparison, resp ectively.

Impor tantly, OpenQASM for loops are transformed into an

AffineForOp [

24] (MLIR afﬁne dialect) amenable to future

classical optimization passes, such as loop unrolling.

Intrinsic quantum gates are converted to value-semantics

quantum operations (static single assignment form) o f the

quantum dialect, as shown in Table

I. As described in

Sec.

IV-A, we use the symbol table to tr ack the qubit operand s

(as opaque mlir::Value pointers) a nd replace th e m with

the new values created by each value-semantics quantum gate

operation. In other words, eac h qubit SSA variable (shown

as %k in Table

I) will o nly be assigned and used once, thus

allowing us to trace gate operations on each qubit line. This

is to explicitly deﬁne the use-deﬁne chains, which we can

leverage in downstream quantum optimizations.

Another key feature of Open Q ASM is the ability to expre ss

quantum gate modiﬁers (e. g., controlled or adjoint) for both

intrinsic gates and subroutines. Our compiler imp lementation

takes a pragmatic approach by rewriting modiﬁers into scoped

regions with dedicated MLIR marker operations a s shown in

Table

I. These operations are effectively no-ops, but indicate

to the runtime tha t the following region of quantum operations

is to be handled differently. For example, fo r the ctrl marker

(q.ctrl_region), the operations within that region should

be processed to synthesize the controlled version of that

composite operation. We pre serve the high-level semantics

of these modiﬁers at both the MLIR and latter LLVM IR

levels ( see

IV-D2) rather than trying to perform compile-

time gate synthesis. Ultimately, the evaluation and synthesis

of these modiﬁer-enclosed blocks will be perform e d by QIR-

compatible runtime implementation.

To handle the nested, recursive nature of the Open-

QASM syntax tree, the compiler uses a multi-layer visit-

ing strategy whereby a standalone visitor-like utility, named

qasm3_expression_generat or, is provid ed to traverse

and process sub-expression no des in-place, if necessary. To

give an example, when visiting a for loop with a m ath

expression as its upper bound, the main AST visitor would use

this qasm3_expression _generator to handle this sub-

expression (converting the math expression to a MLIR equiv-

alent) and then take the resulting value (as a mlir::Value)

to construct the current MLIR for loop.

D. Progressive Lowering

After visiting all the ANTLR AST n odes representing the

input OpenQASM program, the compiler has constructed a

MLIR code in the qu a ntum, afﬁne, standa rd, and built-in

dialects. As depicted in Figure

5, this is the ﬁrst stage of

a p rogressive, multi-stage IR transformation and lowering

pipeline that produces an optimized executable.

TABLE I: O penQASM to MLIR

Constructs

OpenQASM MLIR

Quantum Types

Qubit Register qubit qubi t_array[20]; %0 = q.qalloc(20) { name = qubit_arr ay } : !quantum .Array

Classical Types

Bits bit[20] bi t_array; %0 = alloca() : memref<20xi1>

Integers int[16] sh ort_int; %0 = alloca() : memref<i16>

Floating point numbers float[32] sp_float; %0 = alloca() : memref<f32>

Global Constants const shots = 1024; gl obal_memref "private" constan t @shots : memref<i64> = dense<102 4>

Quantum Instructions

Gates ry(theta) q; %4 = qvs.ry(%2, %3) : !quantum.Qubit

Note: %2: !quantum.Qubit; %3: f64

Measurements b = measure q;

%7 = q.mz(%6) : !quantum.Result

%9 = q.resultCast(%7) : i1

store %9, %4[] : memref<i 1>

Note: %6: !quantum.Qubit; %4: memref<i1>

Classical Operations a += 4;

%c4_i64 = constant 4 : i64

%1 = load %0[] : memref<i64>

%2 = addi %1, %c4_i64 : i64

store %2, %0[] : memref<i 64>

Note: %0: memref<i64> (represents a variable)

Branching if (i == 5) {...}

%c5_i64 = constant 5 : i64

%1 = load %0[] : memref<i64>

%2 = cmpi "eq", %1, %c5_i64 : i64

cond_br %2, ˆbb1, ˆbb2

ˆbb1: // pred: ˆbb0

br ˆbb2

ˆbb2: // 2 preds: ˆbb0, ˆbb1

Note: %0: memref<i64> (represents i variable)

Looping for i in [0:10] {...}

affine.for %arg0 = affine_map <(d0) -> (d0) >(%1) to affin e_map<(d0)

-> (d0)>(%0) {...}֒→

Note: %0: index and %1 : index represent the constant values of 10 and 0,

respectively.

Subroutines def foo(float[64]:theta)

qubit[2]:q {...}

func @foo( %arg0: f64, % arg1: !quantum.Array ) {

...

return

}

foo(theta) qq; call @foo( %2, %0) : (f64, !quantum.Array) -> ()

Note: % 2: f64 and %0: !quan tum.Array represent the theta variable and the

q qubit array, respectively.

Extern functions extern foo (float[64]) ->

float[64];

func priva te @ foo(f64) -> f64

Modiﬁers

Adjoint inv @ phase(pi) q;

q.adj_region {

%2 = qvs.phase(%1, %cst) : !quantum.Qubit

}

Controlled

ctrl @ oracle q[0], q[1];

q.ctrl_region {

%3 = call @oracle(%2) : (!quantum.Qubit) -> !quantum. Qubit

} (ctrl_bi t = %1)

Power

pow(8) @ foo q;

q.pow_u_region {

%c8_i64 = constant 8 : i64

%2 = call @fo o(%1) : (!quan tum.Qubit) -> !quantum.Qubit

} (pow = %c8_i64)

Aliasing

Slicing let slice = reg[0:2:12];

%1 = q.qarray_slice(%0, %c0_i64, %c2_i64 , %c12_i64) :

!quantum.Array֒→

Concatenation

let concat = reg1 || reg2;

%2 = q.qarray_concat (%0, %1) : !quantum.Array

Next, we perfor m a set of optimization passes at th e

MLIR level whereby c ontrol ﬂow constructs (e.g., those fro m

the afﬁne dialect) and the static single assignment (SSA)

form of quantum instructions in the qu antum d ia le c t are

suitable for static optimization procedures. The optimized

MLIR code is lowered to the LLVM dialect v ia the MLIR

ConversionPattern utility. At this point, all quantu m-

related operations have been lowered to QIR functions. The

ﬁnal lowering to LLV M IR and any built-in LLVM optimiza-

tions (e.g., -O 3 optimization) a re provided by the MLIR-

LLVM infrastructur e, produc ing binary executables targetin g

the QIR runtime along with the classical compute ISA (e.g.,

x86/Arm/OpenPC depending on the target platform).

OpenQASM

ANTLR

AST

MLIR

LLVM

Binary

Exe

Lower

-O3

Optimization

Quantum Dialect

Afﬁne Dialect

Standard Dialect

LLVM Dialect

QIR + LLVM

Fig. 5: Compilation pipeline: The ANTLR-based frontend

parses the OpenQASM source string into an Abstract Syn tax

Tree (AST) data structure. By processing (visiting) the AST,

we generate a MLIR representatio n using a couple of dialects,

most importantly, our quantum dialect. A set o f optimization

passes can be ap plied at this stage to simplify the MLIR tree

before it is lowered to the LLVM dialect. At this stage, the

IR tree only contains valid LLVM instructions including QIR-

adherent function calls and types. Standard LLVM optimiza-

tion can be applied when the LLVM dialect is lowered to

bitcode, e.g., the -O3 L LVM optim iz ation ﬂag. L astly, the

LLVM IR bitcode is compiled to binary executable by linking

in a comp a tible QIR runtime implementa tion.

1) Optimization Passes: Figure

6 illustrates the MLIR-

level optim iz ation pipeline that we have implemented in the

OpenQASM compiler. Spe ciﬁcally, we combine optimization

techniques from both classical and quantum programming,

such as function inlining, loop unrolling , and various quantum

circuit optimization pr ocedures. Table

II lists the quantum

optimization passes that we have implemente d for the MLIR

operations in our quantum diale ct. Inlining and loop unrolling

are built-in MLIR passes for the standard and afﬁne dialects,

respectively.

The pseudocode in Algorithm 1 illustrates a typical MLIR

optimization pass based on dataﬂow ana lysis. Speciﬁcally,

we show the procedure to perform quantum gate me rging

on the MLIR AST tree. By adhering to value-semantics for

the quantum operations, we are able to follow the use-deﬁne

chain of each quantum instruction (shown as the Us er map

in Algorithm

1). With th e SSA dataﬂow information, we

can query the next quantum operation on the qubit line and

check whether a gate -merge opportun ity exists. For illustration

purposes, we depict the gate merging procedure as two black-

box functions, CanMerge and Merge, implementing check-

ing and gate generation p rocedur e s. The new injected gate

operation will have its input and output SSA values bridging

those two origina l instructions (line 10 and 11 in Algorithm

1).

Each optimization pass operating on th e MLIR operation tree

maintains the SSA value chain as they transform the IR.

We also want to note that this optimization procedure, as

well a s others listed in Table II, is most e ffective when th e

AST is a ﬂat linear region whereby the use-deﬁne chain is

uninterr upted (e.g., due to subroutine calls or loops). There-

fore, it is crucial to have loop unrolling and function inlining

passes applied beforeha nd as shown in Figure

6. I n this pass

pipeline, some passes, espec ia lly those performing quantum

TABLE II: List of MLIR Passes

Pass Name Descriptions

Identity Pair

Removal

Simplify or remove redundant quantum instructions.

For example, this pass removes any gates immediately

followed by their adjoints, such as pairs of X-X, T -

†

, or CNOT -CNOT gates on the same qubits.

Repeated qubit reset instructions are also simpliﬁed.

Rotation Merg-

ing

Combine consecutive mergable quantum instructions,

e.g., Z and Rz(θ), Rx(θ

) and Rx(θ

), etc.

Gate Sequence

Simpliﬁcation

Find a sequence of consecutive compile-time constant

gates and simplify if possible, i.e., resynthesize to

fewer gates. For example, H-T -H gate sequence can

be simpliﬁed to Rx(π/4).

Qubit Extract

Lifting

Merge duplicate qubit extracts from registers with

compile-time constant indices. This pass also uniﬁes

the SSA use-deﬁne chain after loop unrolling (loop

induction variable as qubit array index) and function

inlining.

Gate Permuta-

tion

Permute gates that are commutative, e.g., Rz on the

control qubit of a CNOT gate. Despite no immediate

beneﬁt (no gate count reduction), this pass might

produce optimization opportunities for others, such as

rotation merging.

Constant Prop-

agation

Propagate global constants, e.g., constant integer val-

ues as loop counts or constant ﬂoating points as

rotation angles.

Dead Code

Elimination

(DCE)

Eliminate unused operations (dead code). For example,

qreg allocation whose result is never used can be

eliminated. These dead values may emerge as a result

of other passes.

circuit optimization, are applied multiple times in a loop to

make sure that we can pick up new optimizing pa tterns that

emerged thanks to the code rewrite of previous passes.

In Figu re

7, we demonstrate th e MLIR transformation along

the optimization pipeline for a simple O penQASM source code

(Figure

7a). This code contains a subroutine deﬁnition and

later invocation, w hich is compiled to the MLIR CallOp ( line

2 in Figure

7b). This call is then inlined (Figure 7c, line 4-6),

resulting in a CNOT-CNOT identity pair which is removed b y

the identity pair removal pass (Figure 7d). What is left after

this step is a sequence of unused ope rations, such as extracting

qubit addresses and the qreg allocation itself. These are all

dead code, he nce removed by the ﬁnal DCE pass as shown in

Figure

7e.

2) Dialec t Conversion and Lowering: After simplifying the

MLIR tree with optimization passes such as those listed in

Table

II, the compiler will lower the ML IR representation

to LLVM progressively, as depicted in Figure 5 . This low-

ering procedure is similar to the one describe d in [19] for

OpenQASM 2 compilation. We impleme nted a c ollection of

mlir::ConversionPatterns to perform the co nversion

from quantum dialect to the LLVM dialect targeting the QIR

speciﬁcation.

As compared to the work in [

19], the lowering pipe line

of the qcor compiler has been enhanced with (1) dialect

conversion fro m afﬁne to LLVM branch-based CFG (control

ﬂow gr aph) representation and (2) conversion pattern im-

plementations for new qu antum dialect operations. The ﬁrst

one stems from the fact tha t we utilize operations from the

afﬁne dialect to handle control ﬂows (e.g., for loop s) in

Inliner

Loop Unroll

Constant

Propagation

Identity Pair

Removal

Gate Merging

Gate

Permutation

Others DCE

Fig. 6: Optimization pass pipeline. Optimiz ation passes simplifying quantum gate operations are repe ated a set number of

times.

ALGORITHM 1 Gate Merging Optimization

Vars:

• ops : [VSOp] (Sequence of value semantics ops)

• Users: VSOp 7→ [VSOp] (use-deﬁne trace mechanism)

• CanMerge: ( VSOp, VSOp) 7→ B (Mergeable check)

• Merge: (VSOp, VSOp) 7→ VSOp (Crea te merge op)

Gate Merging:

1: dead_o ps: [VSOp] ← []

2: for op ∈ ops do

3: if op ∈ dea d_ops then

4: Skip to next op

5: end if

6: if Length(Users(op) ) == 1 then

7: next_op ← Users(op)[0]

8: if CanMerge(o p, next_op) then

9: merged_op ← Merge(op, next_op)

10: merged_op.input ← op.input

11: merged_op.output ← n ext_op.output

12: Add merged_op to IR tree

13: Append op and next_op to dead_ops

14: end if

15: end if

16: end for

17: for op ∈ dead_ops do

18: Erase op from IR tree

19: end for

OpenQASM. The latter involves lowering procedures for new

quantum dialect oper a tions for gate modiﬁers (see Table I) as

well as the new quantum value semantics instruction. MLIR

modiﬁer-marked regions (ctrl, inv, or pow) are converted

to the calls to correspondin g quantum runtime functions at th e

beginning and the end of the scoped block. To lower the value-

semantics quantum ope rations (MLIR quantum dialect) to

memory-semantics LLVM QIR calls, the qubit SSA variables

are mapped back to their root variable (in the use-deﬁne chain )

by propagating the mlir::Value of input qubit operands to

the corresponding outputs.

E. QIR Implementation a nd Linking

The last stage of the compilation work ﬂow, as shown in

Figure 5, involves the c ompilation of LLVM IR bytecode into

a binary object containing QIR function calls, that needs to be

linked to a valid QIR runtim e implemen ta tion to form a n ex-

ecutable. As described in [

19], qcor provides a QIR runtime

library implementation backed by the XACC framework [

21].

Since quantum value-semantics operations are con-

verted to memory-seman tics function calls (e.g., void

__quantum__qis__INSTNAME (Qubit

,...)) dur ing

the lowering stage, they are compatible with the existing

QIR intrinsic quantum gates in the runtime. Key extensions

to the QIR runtime to support OpenQASM are the region

marker functions to implement gate modiﬁer concepts, suc h

as contr olled (ctrl) or a djoint (inv). Speciﬁcally, when

a quantum gate or subroutine is subjected to a mo diﬁer

directive, the compiler injects the corresponding run time

functions befor e and after the modiﬁed operation. For ex-

ample, __qu antum__rt__st art_adj_u_reg ion and

__quantum__rt__end_adj_u _region are f unctions to

denote a region of code wh e reby the inverse (adjoint) of

the collec te d quantum sequ e nce generated within should be

applied. Once again, we take a pragma tic approa ch in imple-

menting the gate modiﬁer fea ture of the OpenQASM language

by delegating the modiﬁed circuit re a lization to the run time.

By invoking the wrapped code region at runtime in a special

instruction collection mode, we can retrieve the ﬂattened se-

quence of gates and thus construct the correspond ing modiﬁed

circuit (e.g., adjoint or c ontrolled) for backend execution.

Finally, the QIR runtime environment can b e further special-

ized via compilation ﬂags or executable invocation arguments.

For instance, speciﬁc ha rdware or simulator backend (qpu)

can be selected, and the quantum runtim e can be conﬁgured

to ru n in a tightly-coupled execution mode (simulator-only)

whereby the dynamical measurement-controlled branching is

fully supported.

V. DEMONSTRATION

In this section, we demonstrate the utility and performance

of an MLIR-based compiler. We present a typical OpenQASM

programming, compilation, and execution workﬂow using the

qcor infrastructure. The compilation speed is benchmarked

against a variety of comparable quantum com pilers. Lastly, we

provide an example showing extensions to the Open QA SM

languag e provided by qcor.

A. Compila tion and execution workﬂow

Figure

8 shows an OpenQASM source code to compute the

expectation value of Pauli operators (e.g., σ

in this case)

in an arb itrary state prepared by a variational ansatz. The state-

prepara tion is expressed as a quantum subroutin e (ansatz,

lines 6-10). This is a prototypical proced ure comm only used

in th e variational q uantum eigensolver (VQE) algorithm.

A feature that we want to highlight is the fact that Pauli

expectation accumulation is explicitly expressed as a for loop

(lines 16-28). Note the h qu antum instruction broad casts

across all qubits in the register. In each iteration , we count the

parity of measured bits to compute the average result across all

(a) OpenQASM source

1 def foo qubit[2]:qq {

2 cx qq [0], qq[1];

3 }

5 qubit q [2];

6 foo q;

7 cx q[0], q[1];

(b) Unoptimized MLIR

1 %0 = q.qalloc(2) { name = q } : !quantum.Array

2 call @f oo(%0) : (!quantum.Array) -> ()

3 %c0_i64 = constan t 0 : i64

4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit

5 %c1_i64 = constan t 1 : i64

6 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit

7 %3:2 = qvs.cx(%1, %2) : !qua ntum.Qubit,

!quantum.Qubit֒→

8 q.dealloc(% 0)

9 return

1 %c0_i64 = constan t 0 : i64

2 %c1_i64 = constan t 1 : i64

3 %0 = q.qalloc(2) { name = q } : !quantum.Array

4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit

5 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit

6 %3:2 = qvs.cx(%1, %2) : !qua ntum.Qubit,

!quantum.Qubit֒→

7 %6:2 = qvs.cx(%3#0, %3#1) : !quantum.Qubit ,

!quantum.Qubit֒→

8 q.dealloc(% 0)

9 return

(d) MLIR after identity pair removal pass

1 %c0_i64 = constan t 0 : i64

2 %c1_i64 = constan t 1 : i64

3 %0 = q.qalloc(2) { name = q } : !quantum.Array

4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit

5 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit

6 q.dealloc(% 0)

7 return

(e) MLIR aft er DCE

1 return

Fig. 7: MLIR optimization example. The input OpenQASM

source code (a) is ﬁrst co mpiled into the MLIR representation

(b), which is then processed by a seque nce of optimization

passes. After the call to foo (b, line 2) is inlined (c), the back-

to-back CNOT pattern emerges; thus, both gates are removed

by the identity pair removal pass (d). Finally, the DCE pass

eliminates the redundant qubit array allocation, constant value

declaration s, and qubit extract ca lls (e) since they have no

further use.

runs (sh ots) in line 31. An abbreviated MLIR representation of

the com pute sub routine in Figure

8 is depicted in Figu re 9

after all MLIR-level optimization passes have been applied.

For the sake of presentation, we only keep the high-level

structure of the MLIR printout (omitted regions are presented

as ellipses).

The semantics of the OpenQASM source code is faithfully

2 OPENQASM 3;

3 include "stdgates.inc ";

5 const shots = 1024;

6 // Stat e-preparation:

7 def ans atz(float[64]:th eta) qubit[2 ]:q {

8 x q[0];

9 ry(theta) q[1];

10 cx q[1], q[0];

11 }

13 def compute(float[64] :theta) qubit[2]:q ->

float[64] {֒→

14 bit first, second;

15 float[64] num_parity_ones = 0.0;

16 float[64] result;

17 for i in [0:sho ts] {

18 ansatz(theta) q;

19 // Change measuremen t b asis

20 h q;

21 // Measure

22 first = measure q[0];

23 second = measure q[1];

24 if (first != second) {

25 num_parity_one s + = 1.0;

26 }

27 // Reset

28 reset q;

29 }

31 // Compute expectation value

32 result = (shots - num_parity_ones) / shot s -

num_parity_one s / shots;֒→

33 return result;

34 }

36 float[64] th eta, exp_val ;

37 qubit qq[2];

38 // Try a theta value:

39 theta = 0.123;

40 exp_val = compute(theta) qq;

41 print("Avg < X0X1> = ", exp_val);

Fig. 8: OpenQASM example: Compute Pauli expectation

(σ

) af te r a state-preparation circuit (ansatz).

translated into the MLIR representation consisting of opera-

tions from ou r quantum dialect (e.g., quantum gates) and the

standard dialect (e.g., branching an d arithmetic instructions).

We ca n now transform this high-level IR to executable co de

adhering to the QIR speciﬁcation (see Figure

5). At the time of

writing, the execution of this type of tightly coupled quantum-

classical program, whereby measurement feedforward is re-

quired, is only applicable to simulator backend s o f qcor’s

ftqc runtime [

20]. We anticipate that quantum hardware

providers will su pport this dynamical runtime model in the

near future as OpenQASM becomes more mature and widely-

adopted . I mportantly, in our workﬂow, the runtime implemen-

tation is lin ked in at the ﬁnal phase of the compilation pipeline,

thus could be provided interchangeably and dynamically to

target different accelerator targets.

1 func @c ompute(%arg0: f64, %arg1: !quantum.Arra y) ->

f64 {֒→

2 ...

3 br ˆbb1

4 ˆbb1: // 2 preds: ˆ bb0, ˆbb5

5 %5 = load %4[] : memref<i64>

6 %6 = cmpi "slt", %5, %c1024_i64 : i64

7 cond_br %6 , ˆb b2, ˆbb3

8 ˆbb2: // pred: ˆ bb1

9 %7 = q.extract(%arg1, %c0_i64) : !quantum.Qubit

10 %8 = qvs.x(%7) : !quantum .Qubit

11 %9 = q.extract(%arg 1, %c1_i64) : !quantum.Qubit

12 %10 = qvs.ry(%9, %arg0) : !quantum.Qubit

13 %11 = q.extract(%arg1, %c1_i64 ) :

!quantum.Qubit֒→

14 %12:2 = qvs.cx(%11, %8) : !quantum.Qubit,

!quantum.Qubit֒→

15 ....

16 %23 = cmpi "ne" , %21, %22 : i1

17 cond_br %23, ˆbb4, ˆbb5

18 ˆbb3: // pred: ˆbb1

19 ...

20 %28 = subf %27, %26 : f64

21 ...

22 %37 = divf %34, %36 : f64

23 ...

24 return %39 : f64

25 ˆbb4: // pred: ˆbb2

26 %40 = load %2[] : memref<f64>

27 %41 = addf %40, %cst_0 : f64

28 store %41, %2[] : memref<f64>

29 br ˆbb5

30 ˆbb5: // 2 preds: ˆbb2, ˆbb4

31 %42 = qvs.reset(%14) : !quantum.Qubit

32 %43 = qvs.reset(%16) : !quantum.Qubit

33 %44 = load %4[] : memref<i64>

34 %45 = addi %44, %c1_i64 : i64

35 store %45, %4[] : memref<i64>

36 br ˆbb1

Fig. 9: Truncated MLIR representation of the OpenQASM

program in Figure 8. Here, we only show a simpliﬁed MLIR

printou t of the core deuteron func tion w hereby mo st of the

code has been omitted for clarity. H igh-level classical control

ﬂow constructs such as the fo r loop and the if statement in

Figure

8 are converted into LLVM -style CFG constructs and

operations such as blocks and branches. Th anks to MLIR-

level o ptimization (sec. IV-D1), the ansatz subroutine in

Figure

8 has been inlined into the deu teron body as shown

as qvs.x, qvs.ry, and qvs.cx operations. Arithmetic op-

erations are translated to MLIR operations from the Standard

dialect, such as addf, subf, divf (ﬂoating-poin t numb ers)

or addi, cmpi (integers).

B. Compiler pe rformance

In this section, we benchmark

the compilation and resource

estimation runtime of the OpenQASM compiler against a set

of different quantu m programming languages and frameworks,

such Q#

, Qiskit

, and t|keti

. We benchmark the total runtime

Set-up: Intel Xeon CPU E5-2698 v4 @ 2.20GHz running Linux Debian

10 distribution.

Microsoft Quantum Development Kit 0.18.2107153439

qiskit-terra 0.18.1

pytket 0.13.0

10 20 30 40 50

N qubits

Runtime [secs]

Qiskit

t|keti

Q#-compile

OpenQASM-compile OpenQASM-total

Fig. 10 : Processing time of Trotter circuits for Heisenberg

Hamiltonian models (eq.

1) with a variable number of qubits.

Transpiler optimization level 3 and full peephole optimization

are applied for Qiskit and t|keti, respectively. For Op enQASM

compilation, all MLIR-level optimization passes (see

IV-D1)

and LLVM -O3 optimization are applied. For Q#, QIR

generation mode is used to generate LLVM IR, which is

then optimized (-O3) by LLVM’s clang. Link time to

qcor’s QIR runtime is included for both OpenQASM and

Q# compilatio n time. Each da ta point is the average of 10

runs with standard d eviation also plotted. OpenQASM total

time includes both co mpilation time and resourc es estimation

runtime (counting all executed gates).

required to generate and execute binary executables in the

resource e stimation mode, i.e., counting ﬂattened quantum

gates, because it provides a mec hanism to compare statically-

compiled executables against Python-based interpreted scripts

constructing the equivalent circuits.

In Figure

10, we plot the c ompile data for Trotter circuits

simulating the generic Heisenberg Hamiltonian model of the

form

H = −h

− J

i+1

. (1)

The circuit is constructed by ‘for’ loops over a ﬁxed number

of Trotter steps (steps = 100, step size (dt) = 0.01) with a

variable number of qubits from 5 to 50. In other words, we

construct the circuit representin g the unitary

U =

100

k=1



exp(−hdtX

)

exp(−J

dtZ

i+1

)



, (2)

where each Pauli exponen tial term is converted to an equiv-

alent gate-based sub-circuit. These sub-circuits are r epeated

at each time step to simulate the Trotterized evolution of the

Heisenberg Hamiltonian .

Quantum programmin g lan guages, such as Q# and Open-

QASM, preserve the loop construct in their IR representation,

resulting in almost con stant compilation time. We note that the

number of qubits is a compile-time constant fo r both the Q#

and Open Q A SM cases. The compiler ma y choose to unroll

these loops.

The compilation tim e of OpenQASM includes: (1) f rontend

parsing (ANTLR), (2) MLIR generation and optimization,

(3) lowering to LLVM IR, (4) LLVM optimization, and (5)

object code gen eration and linking. For Q#, we leverage

the Microsoft Quantum Development Kit (QDK) to perform

QIR LLVM generation, i.e., equivalent to steps (1)-(3) in o ur

OpenQASM work ﬂow. The QDK-g e nerated LLVM IR is then

optimized and linked with the qcor runtime similar to steps

4 and 5 using the standard LLVM toolchain.

The results in Figure

10 highlight th e need for statically-

compiled quantum programming lang uages in order to de-

scribe large-scale programs. Imperative gate-by-gate construc-

tion of quantum circuits usin g scripting languages, despite its

ﬂexibility and ease of use, does come with a signiﬁcant p e rfor-

mance overhead. Impor ta ntly, our MLIR-based com piler for

OpenQASM demonstrates improved performance compared

to other compilers, such as Q#. It is worth noting that the

Q# language is much more feature-rich than OpenQASM,

therefore requiring a more elaborated frontend and build

system. In par ticular, initial Q# to QIR g e neration accounts

for the majority (80-90%) of the total Q# compilation time

in Figure

10. As of this writing, there are no other publicly

available OpenQASM compilers that we can compare our

implementation with.

C. Extensions for Optimal Code Gene ration

Here we demonstrate the utility of our proposed extensions

to the OpenQASM grammar speciﬁcation with r egards to

optimal quantum code generation. Speciﬁcally, we show how

the compute-action syntax enables th e compiler imple-

mentation to generate optimal quantum instruction sequences

in the presence of a ctrl gate modiﬁer. As stated in Sec.

III,

controlled operations on the W = U V U

†

pattern only require

controls applied to the operation s in V . Via programmer intent

— i.e. leveraging the custom co mpute {...} action

{...} syntax — the compiler can optimally synthesize quan-

tum instructions adhere nt to this pattern.

Take the Heisenberg Hamiltonian and corresponding time

evolution o perator U in Eqs.

1 and 2. In the context of the

quantum phase estimation algorithm, if we seek a correspond-

ing eigenvalue of U with respect to some eigen state |ψi, we

will require the application o f a series o f controlled versions

of U.

Figure

11 shows an OpenQASM subro utine describing

the trotter evolution in E q. 2. The seco nd nested for loop

(lines 18-29) could be w ritten manually a s the sequence

W = CX ⊗ RZ(Θ) ⊗ CX

†

, but by replacing it with a

compute-action block, we give th e c ompiler an oppor-

tunity for optimal instruction synthe sis u nder application of a

ctrl modiﬁer. Speciﬁcally, we have implemented an ANTLR

visitor handler that processes the compute-action source

and adds the compute instructions, the acti on instructions,

1 const n b_qubits = 5;

2 def hei senberg_U() qubit[n b_qubits]:r {

4 // Extension-provided C-like data types

5 int n b_steps = 100;

6 double step_size = .01;

7 double Jz = 1.0;

8 double h = 1.0;

10 // -h

sigma_x layers

11 for step in [0:nb_steps] {

12 // -h

sigma_x layers

13 rx(-h

step_size) r;

15 // -Jz

sigma_z

sigma_z layers

16 for i in [0:nb_ qubits-1] {

17 compute {

18 cx r[i], r[i+1];

19 } action {

20 rz(-Jz

step_size) r[i + 1];

21 }

22 // Could be written manually like this

23 // No optimizations picked up

24 // cx r[i], r[i+1];

25 // rz(-Jz

step_size) r[i + 1];

26 // cx r[i], r[i+1];

27 }

28 }

29 }

31 // Allocate the qubits

32 qubit r[nb_qubits], c;

34 // Perform ctrl-U

35 ctrl @ heisenberg_U c, r;

Fig. 11: OpenQASM code deﬁning a subroutine describing the

unitary oper ation in Eq. 2. The use of comp ute-action

from ou r grammar extension enables the compiler imple-

mentation to optimally synth esis con trolled versions of this

subroutine. This code snippet also demon stra te s the C-like

extensions for prim itive data types and for loops.

and the adjoint or reverse of the c ompute instructions to

the MLIR tree. More over, for instructions that are not in the

action block, the com piler marks the added instructions

with a ﬂag to indicate that they are part of the compute

or uncompute block. At runtime, this inf ormation is used to

optimally synthesize controlled versions of this b lock of code.

We benchmark th e usage of compute-action vs manual

programming of W and present the results in Fig

12. The re-

sults show the number of controlled operations (CRZ, CNOT)

present in the compiled quantum program for the manual

(commented lines 22-26) and compute-action (lines 17-2 1)

cases. One can clearly see that via program mer intent, the

compiler can optimally synthesize instruction sequences and

improve on the resource utility of th e compiled program. With

this simple prog ramming extension, progra mmers can pick up

an order o f magnitude in g ate count reductions.

VI. CONCLUSION

We have presented an optimizing ahead-of-time Open-

QASM compiler built on the MLIR fram ework. Our approach

lowers OpenQASM codes to the LLVM IR in a manner that

10 20 30 40 50

N Qubits

N Controlled Operations (CX, CRZ)

Manual Compute / Action

Fig. 12: Th e number of co ntrolled o perations (CX and CRZ)

present in the compiled representation of Figure

11 for a given

number of qubits. The Manual plot represents a p rogram that

sequentially lists the instruction for the pattern W = U V U

†

while the other plot represents the case w hereby a programmer

expresses this pattern via the compute-action syntax from

our gramm ar extension.

is adhe rent to the QIR speciﬁcation. We provide quantum

circuit optimization passes at the MLIR level and leverage

existing c la ssical optimization passes pre sent in both MLI R

and LLVM. Our work extends the OpenQASM grammar

with support for C-like constructs a nd the co mpute-action-

uncompute pattern for efﬁcient programming and compile-

time optimizations. Targeting the QIR enables one to swap

runtime library implementatio ns enabling a write once run

anywhere character istic. We have provided a ru ntime library

implementation of the QIR speciﬁcation that is backed by the

XACC quantum programming framework, thereby enabling

OpenQASM compilation that targets quan tum compute rs from

IBM, Rigetti, H oneywell, IonQ, as well as simulators that scale

from laptops to large-scale heterogeneous high performance

computers, like Summit. Moving f orward, our approach opens

up the possibility of true language integration at the LLVM IR

level. We envision a number of language approaches that map

to the LLVM IR adhe rent to the QIR speciﬁca tion, and via

simple runtime linking, enabling the in tegra tion of quantu m

languag e A with code from quantum language B. As of this

writing, the integration of Q#, qcor C++, and Open QA SM

is now possible. We also envision this work as a n alternative

mechanism for embedded C++ quantu m kernels in qc or,

departing fro m the existing Clang Syntax Handler so urce

preprocessing. Future work will investigate true quantum

languag e integration and compile-time embedding of MLIR-

to-LLVM processing in the qcor C++ language extension.

ACKNOWLEDGMENT

This material is based upon work suppo rted by the U.S.

Departmen t of Energy, Ofﬁce of Science, National Quantum

Information Scienc e Research Centers. ORNL is managed by

UT-Battelle, LLC, for the US Department o f Energy under

contract no. DE-AC05-00OR22725.

REFERENCES

[1] Circuit synthesis via quantum gate modiﬁers.

https://qiskit.github.io/openqasm/language/gates.html#quantum-gate-modiﬁers.

Accessed: 2021-08-30.

[2] Introducing quantum intermediate representation (qir).

https://devblogs.microsoft.com/qsharp/introducing-quantum-intermediate-representation-qir/.

Accessed: 2021-01-15.

[3] Mlir documentation. https://mlir.llvm.org. Accessed: 2021-01-15.

[4] Qcor github. https://github.com/ornl-qci/qcor. Accessed: 2021-01-15.

[5] Quantum intermediate representation (qir) speciﬁcation.

https://github.com/microsoft/qsharp-language/tree/main/Speciﬁcations/QIR.

Accessed: 2021-01-15.

[6] Koen Bertels, J.W. Hogaboam, and Carmen G. Almudever. Quantum

computer architecture: a full-stack overview. Microprocessors and

Microsystems, 66:21–66, 2019.

[7] Benjamin Bichsel, Maximilian Baader, Timon Gehr, and Martin Vechev.

Silq: A high-level quantum language with safe uncomputation and intu-

itive semantics. In Proceedings of the 41st ACM SIGPLAN Conference

on Programming Language Design and Implementation, PLDI 2020,

page 286–300, New York, NY, USA, 2020. Association for Computing

Machinery.

[8] Keith A Britt and Travis S Hum ble. High-performance computing with

quantum processing units. ACM Journal on Em erging Technologies in

Computing Systems (JETC), 13(3):1–13, 2017.

[9] Andrew W. Cross, Ali Javadi-Abhari, Thomas Alexander, Niel de Beau-

drap, Lev S. Bishop, Steven H eidel, Colm A. Ryan, John Smolin, Jay M.

Gambetta, and Blake R. Johnson. Openqasm 3: A broader and deeper

quantum assembly language, 2021.

[10] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris,

T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski. Cloud

quantum computing of an atomic nucleus. Phys. Rev. Lett., 120:210501,

May 2018.

[11] Tobias Gysi, Christoph M¨uller, Oleksandr Zinenko, Stephan Herhut,

Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoeﬂer, and Tobias

Grosser. Domain-speciﬁc multi-level ir rewriting for gpu, 2020.

[12] Kathleen E. Hamilton, Eugene F. Dumitrescu, and Raphael C. Pooser.

Generative model benchmarks for superconducting qubits. Phys. Rev.

A, 99:062323, Jun 2019.

[13] IBM. Openqasm 3.x live speciﬁcation, 2021.

https://qiskit.github.io/openqasm/.

[14] David Ittah, Thomas H¨aner, Vadym Kliuchnikov, and Torsten Hoeﬂer.

Enabling dataﬂow optimization for quantum programs, 2021.

[15] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey

Lvov, Frederic T. Chong, and Margaret Martonosi. Scaffcc: Scalable

compilation and analysis of quantum programs. Parallel Computing,

45:2 – 17, 2015. Computing Frontiers 2014: Best Papers.

[16] Tian Jin, Gheorghe-Teodor Bercea, Tung D. Le, Tong Chen, Gong

Su, Haruki Imai, Yasushi Negishi, Anh Leu, Kevin O’Brien, Kiyokuni

Kawachiya, and Alexandre E. Eichenberger. Compiling onnx neural

network models using mlir, 2020.

[17] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy

Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasi-

lache, and Oleksandr Zinenko. Mlir: A compiler infrastructure for the

end of moore’s law, 2020.

[18] Alexander McCaskey, Eugene Dumitrescu, Dmitry Liakh, and Travis

Humble. Hybrid programming for near-term quantum computing sys-

tems. In 2018 IEEE International Conference on Rebooting Computing

(ICRC), pages 1–12. IEEE, 2018.

[19] Alexander McCaskey and Thien Nguyen. A mlir dialect for quantum

assembly languages, 2021.

[20] Alexander Mccaskey, Thien Nguyen, Anthony Santana, Daniel Claudino,

Tyler Kharazi, and Hal Finkel. Extending c++ for heterogeneous

quantum-classical computing. ACM Transactions on Quantum Com-

puting, 2(2), July 2021.

[21] Alexander J McCaskey, Dmitry I Lyakh, Eugene F Dumitrescu, Sarah S

Powers, and Travis S Humble. XACC: a system-level s oftware in-

frastructure for heterogeneous quantum–classical computing. Quantum

Science and Technology, 5(2):024002, feb 2020.

[22] Alexander J. McCaskey, Zachary P. Parks, J acek Jakowski, Shirley V.

Moore, Titus D. Morris, Travis S. H umble, and Raphael C. Pooser.

Quantum chemistry as a benchmark for near-term quantum computers.

npj Quantum Information, 5(1):99, 2019.

[23] Tiffany M Mintz, Alexander J Mccaskey, Eugene F Dumitrescu,

Shirley V Moore, Sarah Powers, and Pavel Lougovski. Qcor: A language

extension s peciﬁcation for the heterogeneous quantum-classical model

of computation. arXiv preprint arXiv:1909.02457, 2019.

[24] MLIR. ’afﬁne’ dialect. https://mlir.llvm.org/docs/Dialects/Afﬁne/.

[25] Yunseong Nam, Neil J. Ross, Yuan Su, Andrew M. Childs, and Dmitri

Maslov. Automated optimization of large quantum circuits with con-

tinuous parameters. npj Quantum Information, 4(1):1–12, May 2018.

Bandiera

abtest: a Cc license type: cc by Cg type: Nature Research

Journals Number: 1 Primary atype: Research Publisher: Nature Pub-

lishing Group Subject

term: Computer science;Quantum information

Subject term id: computer-science;quantum-information.

[26] Thien Nguyen and Alexander J McCaskey. Extending python for

quantum-classical computing via quantum just-in-time compilation.

arXiv preprint arXiv:2105.04671, 2021.

[27] Thien Nguyen, Anthony Santana, Tyler Kharazi, Daniel Claudino, Hal

Finkel, and Alexander McCaskey. Extending c++ for heterogeneous

quantum-classical computing, 2020.

[28] T. J. Parr and R. W. Quong. Antlr: A predicated-ll(k) parser generator.

Software: Practice and Experience, 25(7):789–810, 1995.

[29] Ethan Smith, Marc G. Davis, Jeffrey M. Larson, Ed Younis, Costin

Iancu, and Wim Lavrijsen. Leap: Scaling numerical optimization based

synthesis using an incremental approach, 2021.

[30] Damian S. Steiger, Thomas H¨aner, and Matthias Troyer. Projectq: an

open source software framework for quantum computing. Quantum,

2:49, Jan 2018.

[31] Krysta Svore, Alan Geller, Matthias Troyer, John Azariah, Christopher

Granade, Bettina Heim, Vadym Kliuchnikov, Mariia Mykhailova, Andres

Paz, and Martin Roetteler. Q#: Enabling scalable quantum computing

and development with a high-level dsl. In Proceedings of the Real World

Domain Speciﬁc Languages Workshop 2018, RWDSL2018, New York,

NY, USA, 2018. Ass ociation for Computing Machinery.

[32] Ed Younis, Koushik Sen, Katherine Yelick, and Costin Iancu. Qfast:

Quantum synthesis using a hierarchical continuous circuit space, 2020.

APPENDIX A

BENCHMARK EXPERIMENTAL SETUP

Here, we list all the source codes used for the Trotter circuit

benchm arking (Figure

10). The number of qubits in the Q# and

OpenQASM source codes represent a particu la r data point. We

modify the number of qubits and recompile the source code

for the benchmark.

A. Qiskit script

1 from qi skit.aqua.operators import (X, Z, I,

EvolvedOp, PauliTrotterE volution)֒→

2 from qi skit import QuantumReg ister, QuantumCircu it

3 from qi skit.compiler impor t transpile

4 from st atistics imp ort mean, stdev

5 import time

7 def X_o p(idxs, n_qu bits):

8 op = None

9 if 0 in idxs:

10 op = X

11 else:

12 op = I

13 for i in range(1, n_qubits):

14 if (i in idxs):

15 op ˆ= X

16 else:

17 op ˆ= I

18 return op

20 def Z_op(idxs, n_qubits):

21 op = None

22 if 0 in idxs:

23 op = Z

24 else:

25 op = I

26 for i in range(1, n_qubits):

27 if (i in idxs):

28 op ˆ= Z

29 else:

30 op ˆ= I

31 return op

33 def heisenberg_ham(n_ qubits):

34 Jz = 1.0

35 h = 1.0

36 H = -h

X_op([0], n_qubits)

37 for i in range(1, n_qubits):

38 H = H - h

X_op([i], n_qubits)

39 for i in range(n_qubits - 1):

40 H = H - Jz

(Z_op([i, i + 1], n_qubits))

41 return H

43 n_qubits = [10, 20, 50, 100]

44 nbSteps = 100

45 n_runs = 10

47 def trotter_circ(q, exp_args, n_steps):

48 qc = QuantumCircuit(q)

49 for i in range(n_steps):

50 for sub_op in exp_args:

51 qc += PauliTrotterEvolution()

52 .convert(Evolv edOp(sub_op))

53 .to_circuit()

54 return qc

56 for nbQubits in n_qubits:

57 data = []

58 for run_id in range(n_runs):

59 ham_op = heisenberg_ham(nbQubits)

60 q = Q uantumRegister(nbQubits, 'q')

61 start = time.time()

62 comp = trotter_circ(q, ham_op. oplist, nbSt eps)

63 comp = transpile(comp, optimiz ation_level=3)

64 end = time.time()

65 data.append(end - start)

66 print('n_qubit s = ', nbQubits, '; Elapsed time =',

mean(data), '+/-', stdev(data), '[secs]')֒→

B. t|keti script

1 import time

2 from py tket.circuit import Circuit, PauliExpBox

3 from py tket.pauli import Pauli

4 from py tket.extensions. qiskit import

AerStateBacken d֒→

5 from pytket.passes import FullPeepholeOptimise

6 from st atistics import mean, stdev

7 nb_steps = 100

8 step_size = 0.01

9 n_qubits = [5, 10, 20, 30, 40, 50]

10 n_runs = 10

11 for nb_qubits in n_qubits:

12 data = []

13 for run_id in range(n_runs):

14 # Start timer

15 start = time.time()

16 circ = Circuit(nb_qubits)

17 h = 1 .0

18 Jz = 1.0

19 for i in range( nb_steps):

20 # Using Heisenberg Hamiltonian :

21 for q in range(nb_qubits) :

22 circ.add_pauli expbox(PauliExpBox(

[Pauli.X], -h

step_size), [q])֒→

23 for q in range(nb_qubits - 1):

24 circ.add_pauli expbox(PauliExpBox(

[Pauli.Z, Pauli.Z], -Jz

step_size), [q, q + 1])

֒→

26 # Compile to gates

27 backend = AerStateBackend()

28 circ = backend.get_compiled_circuit(circ)

30 # Apply optimization

31 FullPeepholeOptimise().apply(circ)

33 end = time.time()

34 data.append(end - start)

36 print('n_qubits =', nb _qubits, '; Elapsed time

=', mean(d ata), '+/-', stdev(data),

'[secs]')

֒→

C. Q# source cod e

1 namespace Benchmark.Heisenberg {

2 open Microsoft.Quantum.Intrinsic;

3 open Microsoft.Quantum.Canon;

4 open Microsoft.Quantum.Math;

5 open Microsoft.Quantum.Convert;

6 open Microsoft.Quantum.Arrays;

7 open Microsoft.Quantum.Measurement;

8 // In this example, w e will show how to

simulate the time evolution of֒→

9 // an Heisenber g model:

10 operation HeisenbergTrotterEvolve(nSites : Int,

simulationTime : Double, trotterStepSize :

Double) : Unit {

֒→

11 // We pick arbitrary values for the X an d J

couplings֒→

12 let hXCoup ling = 1 .0;

13 let jCoupl ing = 1. 0;

15 // This determines the number of Trotter

steps֒→

16 let steps = Ceiling(simulationTime /

trotterStepSiz e);֒→

18 // This resizes th e Trotter step so tha t

time evolu tion over the֒→

19 // duratio n is accomplis hed.

20 let trotte rStepSizeResized = simulationTime

/ IntAsDouble(steps);֒→

22 // Let us initialize nSites clean qubits.

These are all in the |0>֒→

23 // state.

24 use qubits = Qubit[nSites];

25 // We then evolve for some time

26 for idxSte p in 0 .. ste ps - 1 {

27 for i in 0 .. nS ites - 1 {

28 Exp([PauliX], (-1.0

hXCoupling)

trotterStepSiz eResized,

[qubits[i]]);

֒→

29 }

30 for i in 0 .. nS ites - 2 {

31 Exp([PauliZ, PauliZ], (-1.0

jCoupling)

trotterStepSiz eResized,

qubits[i . . (i + 1)]);

֒→

32 }

34 }

35 }

37 // Entry point: we allow the Q# program to have

full infor mation (compile-tim e) ab out the

number of qubits, steps, e tc.

֒→

38 @EntryPoint()

39 operation CircuitGen() : Unit {

40 HeisenbergTrot terEvolve(50, 1.0, 0.01);

41 }

42 }

D. Open Q ASM source code

1 OPENQASM 3;

3 const nb_steps = 100 ;

4 const n b_qubits = 50;

5 const s tep_size = 0.01;

6 const J z = 1.0;

7 const h = 1.0;

9 qubit r[nb_qubits];

10 for step in [0:nb_steps] {

11 // -h

sigma_x layers

12 for i in [0:nb_qubits] {

13 rx(-h

step_size) r[i];

14 }

16 // -Jz

sigma_z

sigma_z layers

17 for i in [0:nb_qubits - 1] {

18 cx r[i], r[i+1];

19 rz(-Jz

step_size) r[i + 1];

20 cx r[i], r[i+1];

21 }

22 }

This figure "CompilationFlow.png" is available in "png" format from:

http://arxiv.org/ps/2109.00506v1

This figure "pass_uml.png" is available in "png" format from:

http://arxiv.org/ps/2109.00506v1