Creating a compiler using LLVM

August 2nd, 2009

This tutorial walks through creating a compiler using LLVM. This closely follows the LLVM tutorial. In fact, it was originally my intention to work through that tutorial, but the tools I ended up using were different enough that I thought a write-up would be helpful to others.

The LLVM tutorial uses a hand-rolled C++ lexer, parser, and code generator. I created a lexer using Flex, a parser using Bison, and a LLVM IR generator using Digital Mars D, using the
LLVM C interface.

As mentioned, the source language is the Kaleidoscope language, essentially as described by the LLVM tutorial. All values are of type double. Functions can be defined which return a single expression. All functions take a number of double parameters and return a double. External functions can be declared with the extern keyword. If/then/else creates a single expression based returning either the then expression or the else expression.

Input() will read a single double from standard input, and output creates a main() method which writes a single value to standard output.

This example shows off the features of the language, and will be the example used throughout this tutorial. It is a convoluted way of reading a value n, and displaying the nth fibonacci number:

extern cos(x)
extern sqrt(x)
extern fib(n)
def pi() 3.14159
def square(x) x * x
def sin(x) cos(x + pi() / 2)
def len(x, y) sqrt(square(x) + square(y))
def fib(n)
  if(n < 2) then
    n - 1
  else
    fib(n - 2) + fib(n - 1)

output len(sin(pi()), fib(input()))

First, let's look at the lexer:

%option noyywrap

%{
#include "parser.tab.h"

char *identifier_str;
%}

IDENTIFIER    [a-zA-Z][a-zA-Z0-9]*
NUMBER        [0-9.]+
WHITESPACE    [ \t\r\n]
SYMBOL        [^a-zA-z0-9.]
COMMENT       #.*

%%

{WHITESPACE}  /* Do nothing */
{COMMENT}     /* Do nothing */
{NUMBER}      return TOKEN_NUMBER;
def           return TOKEN_DEF;
extern        return TOKEN_EXTERN;
input         return TOKEN_INPUT;
output        return TOKEN_OUTPUT;
if            return TOKEN_IF;
then          return TOKEN_THEN;
else          return TOKEN_ELSE;
{IDENTIFIER}  identifier_str = yytext; return TOKEN_IDENTIFIER;
{SYMBOL}      return *yytext;
<<EOF>>       return TOKEN_EOF;

This is a very straightforward Flex lexer. The only logic involved is for identifiers, a global var identifier_str is set to the value of the identifier. All other tokens simply return their int
constants.

The parser is more complex. The goal of the parser is to build an Abstract Syntax Tree (AST) for the language elements. Later, we will walk the AST and generate the code for it.

The AST is defined in the D code. It is a hierarchy of classes, each of which is a subclass of ASTNode. ASTNode looks like the following:

class ASTNode {
        public:

        int id;
        ASTNode left;
        ASTNode right;

        this(int id, ASTNode left = null, ASTNode right = null) {
                this.id = id;
                this.left = left;
                this.right = right;
        }

        mixin Accept;
}

For interfacing with Bison, we give each node an id, and keep an array of all the nodes. The ids of lower-level nodes are passed as parameters from Bison when creating higher nodes. The D code then looks up the node with the given id, and inserts it into the AST.

A concrete example helps illustrate this. Here is the Bison code for a multiplication expression:

           | expression '*' expression
             { $$ = ast_binary_expression('*', $1, $3); }

This code passes the id of the two expressions to be multiplied into the ast_binary_expression function, which creates a new node for binary expressions and returns its id. That code looks like this:

int addNode(ASTNode node, bool isStatement = false) {
	nodes.length = cast(int) fmax(nodes.length, node.id + 1);
	nodes[node.id] = node;

	if(!statements) {
		statements = new Stack!(Statement)();
	}

	if(isStatement) {
		Statement statementNode = new Statement(nextId(), node);
		if(statements.length > 0) {
			statements.peek().nextStatement = statementNode;
		}
		statements.push(statementNode);
	}

	return node.id;
}
	int ast_binary_expression(char operation, int lhIndex, int rhIndex) {
		return addNode(new BinaryExpression(nextId(), operation, nodes[lhIndex], nodes[rhIndex]));
	}

The other nodes work the same way. Here is the complete parser source:

%{
#include 
#include 
#include 
#include "ast.h"

char *endp;
extern char *yytext;

int yylex();
void yyerror(char *s);

%}

%token TOKEN_EOF
%token TOKEN_DEF
%token TOKEN_EXTERN
%token TOKEN_IF
%token TOKEN_THEN
%token TOKEN_ELSE
%token TOKEN_IDENTIFIER
%token TOKEN_NUMBER
%token TOKEN_INPUT
%token TOKEN_OUTPUT

%start statement

%left '<' '=' '>'
%left '+' '-'
%left '*' '/'
%left ','

%%

statement : extern statement
          | function statement
          | output statement
          | TOKEN_EOF
          ;

identifier: TOKEN_IDENTIFIER
            { $$ = ast_identifier(strdup(yytext)); }

extern : TOKEN_EXTERN prototype
         { $$ = ast_extern($2); }
       ;

function : TOKEN_DEF prototype expression
           { $$ = ast_function($2, $3); }
         ;

input : TOKEN_INPUT '(' ')'
        { $$ = ast_input(); }
      ;

output : TOKEN_OUTPUT expression
         { $$ = ast_output($2); }
       ;

prototype : identifier '(' ')'
            { $$ = ast_prototype($1, -1); }
          | identifier '(' prototype_args ')'
            { $$ = ast_prototype($1, $3); }
          ;

prototype_args : identifier
                 { $$ = ast_prototype_arg($1, -1); }
               | identifier ',' prototype_args
                 { $$ = ast_prototype_arg($1, $3); }
               ;

call : identifier '(' ')'
       { $$ = ast_call($1, -1); }
     | identifier '(' call_args ')'
       { $$ = ast_call($1, $3); }
     ;

call_args : expression
            { $$ = ast_call_arg($1, -1); }
          | expression ',' call_args
            { $$ = ast_call_arg($1, $3); }
          ;

expression : expression '+' expression
             { $$ = ast_binary_expression('+', $1, $3); }
           | expression '-' expression
             { $$ = ast_binary_expression('-', $1, $3); }
           | expression '*' expression
             { $$ = ast_binary_expression('*', $1, $3); }
           | expression '/' expression
             { $$ = ast_binary_expression('/', $1, $3); }
           | '(' expression ')'
             { $$ = $2; }
           | identifier
             { $$ = ast_variable($1); }
           | TOKEN_NUMBER
             { $$ = ast_number(strtod(yytext, &endp)); }
           | call
           | if
           | input
           ;

boolean_expression : expression '<' expression
                     { $$ = ast_boolean_expression('<', $1, $3); }
                   | expression '=' expression
                     { $$ = ast_boolean_expression('=', $1, $3); }
                   | expression '>' expression
                     { $$ = ast_boolean_expression('>', $1, $3); }
                   ;

if : TOKEN_IF '(' boolean_expression ')' then_else
     { $$ = ast_if($3, $5); }
   ;

then_else : TOKEN_THEN expression TOKEN_ELSE expression
            { $$ = ast_then_else($2, $4); }
          ;
%%

void yyerror(char *s) {
  fprintf(stderr, "%s\n", s);
}

Once we have an AST tree, we can traverse the nodes for a variety of useful effects. As a simple example,
we can generate a GraphViz visualization of the AST. See the code for details, but here is the output for
the parser above:

ast

The final step is to transform the AST into a set of LLVM IR instructions. The full syntax of the language is documented in the reference manual. We are using the C interface to programatically generate it, though, so our input will look a little different.

First, we generate a module. Modules are the independent units of compilation; for example, in C each .c file would be a module. For Kaleidoscope, we always create a single module. We also create a builder. Builders are used to create the instructions, and automatically insert them into the appropriate place in the
output. We also define a few basic types, and store them for later:

                moduleRef = LLVMModuleCreateWithName(toStringz(name));
                builderRef = LLVMCreateBuilder();
                doubleType = LLVMDoubleType();
                intType = LLVMInt32Type();
                charType = LLVMInt8Type();
                charPtrType = LLVMPointerType(charType, 0);
                charPtrPtrType = LLVMPointerType(charPtrType, 0);

Constants in the source file get corresponding constants in the output:

        void previsit(Number number) {
                expressions.push(LLVMConstReal(doubleType, number.val));
        }

Variables work similarly:

        void previsit(Variable variable) {
                expressions.push(fnContext.params[variable.name]);
        }

Ok, those are pretty simple. Functions are divided into prototypes and implementations. Extern statements have a prototype with no implementation, and Defun statements have both. To facilitate this, we keep a context structure with information on the current function, if any,

                        auto functionType = getFunctionType(args.length);
                        fnContext.fn = LLVMAddFunction(moduleRef, toStringz(prototype.name), functionType);
                        LLVMSetLinkage(fnContext.fn, LLVMExternalLinkage);
                }

                void previsit(Prototype prototype) {
                auto preExisting = LLVMGetNamedFunction(moduleRef, toStringz(prototype.name));
                if(preExisting) {
                        fnContext.fn = preExisting;
                } else {
                        auto args = prototype.flatArgs;
                        auto functionType = getFunctionType(args.length);
                        fnContext.fn = LLVMAddFunction(moduleRef, toStringz(prototype.name), functionType);
                        LLVMSetLinkage(fnContext.fn, LLVMExternalLinkage);
                }

                foreach(idx, arg; prototype.flatArgs) {
                        fnContext.params[arg.name] = LLVMGetParam(fnContext.fn, idx);
                }

                if(fnContext.type == FunctionType.defined) {
                        fnContext.block = LLVMAppendBasicBlock(fnContext.fn, "entry");
                        LLVMPositionBuilderAtEnd(builderRef, fnContext.block);
                }
        }

The last statement, LLVMPositionBuilderAtEnd, ensures that subsequent statements are always inserted into the correct context.

The other complication comes with the if/then/else conditional. For an if statement, three blocks of code are required: a then block, a else block, and a merge block. Code flows to either the then or else branch, both of which calculate their associated expressions and then return to the merge block.

Finally, a Phi node is created. A Phi nodes takes the expression calculated by one of the two blocks, depending on which block was executed. This is the result expression returned by the if statement.

        void previsit(If ifNode) {
                ifThenElseContext.mergeBlock = LLVMAppendBasicBlock(fnContext.fn, toStringz("merge"));
        }

        void visit(If ifNode) {
                ifThenElseContext.thenBlock = LLVMAppendBasicBlock(fnContext.fn, toStringz("then"));
                ifThenElseContext.elseBlock = LLVMAppendBasicBlock(fnContext.fn, toStringz("else"));

                LLVMBuildCondBr(builderRef, expressions.pop(), ifThenElseContext.thenBlock, ifThenElseContext.elseBlock);
        }

        void postvisit(If ifNode) {
                fnContext.block = ifThenElseContext.mergeBlock;
                auto phi = LLVMBuildPhi(builderRef, doubleType, toStringz("phi"));
                LLVMAddIncoming(phi, &ifThenElseContext.thenValue, &ifThenElseContext.thenBlock, 1);
                LLVMAddIncoming(phi, &ifThenElseContext.elseValue, &ifThenElseContext.elseBlock, 1);
                expressions.push(phi);
        }

        void previsit(ThenElse thenElseNode) {
                LLVMPositionBuilderAtEnd(builderRef, ifThenElseContext.thenBlock);
        }

        void visit(ThenElse thenElseNode) {
                LLVMBuildBr(builderRef, ifThenElseContext.mergeBlock);
                LLVMPositionBuilderAtEnd(builderRef, ifThenElseContext.elseBlock);
        }

        void postvisit(ThenElse thenElseNode) {
                ifThenElseContext.elseValue = expressions.pop();
                ifThenElseContext.thenValue = expressions.pop();
                LLVMBuildBr(builderRef, ifThenElseContext.mergeBlock);
                LLVMPositionBuilderAtEnd(builderRef, ifThenElseContext.mergeBlock);
        }

Finally, the output script is written to a binary file:

int LLVMWriteBitcodeToFile(LLVMModuleRef, char *);

This file can be disassembled to see the source:

; ModuleID = 'output.i'
@format = internal constant [4 x i8] c"%f\0A\00"		; <[4 x i8]*> [#uses=1]

declare i32 @printf(i8*, ...)

declare i32 @scanf(i8*, ...)

declare double @cos(double)

declare double @sqrt(double)

define double @fib(double) {
entry:
	%lttmp = fcmp olt double %0, 2.000000e+00		; <i1> [#uses=1]
	br i1 %lttmp, label %then, label %else

merge:		; preds = %else, %then
	%phi = phi double [ %subtmp, %then ], [ %addtmp, %else ]		; <double> [#uses=1]
	ret double %phi

then:		; preds = %entry
	%subtmp = sub double %0, 1.000000e+00		; <double> [#uses=1]
	br label %merge

else:		; preds = %entry
	%subtmp1 = sub double %0, 2.000000e+00		; <double> [#uses=1]
	%call = call double @fib(double %subtmp1)		; <double> [#uses=1]
	%subtmp2 = sub double %0, 1.000000e+00		; <double> [#uses=1]
	%call3 = call double @fib(double %subtmp2)		; <double> [#uses=1]
	%addtmp = add double %call, %call3		; <double> [#uses=1]
	br label %merge
}

define double @pi() {
entry:
	ret double 3.141590e+00
}

define double @square(double) {
entry:
	%multmp = mul double %0, %0		; <double> [#uses=1]
	ret double %multmp
}

define double @sin(double) {
entry:
	%call = call double @pi()		; <double> [#uses=1]
	%divtmp = fdiv double %call, 2.000000e+00		; <double> [#uses=1]
	%addtmp = add double %0, %divtmp		; <double> [#uses=1]
	%call1 = call double @cos(double %addtmp)		; <double> [#uses=1]
	ret double %call1
}

define double @len(double, double) {
entry:
	%call = call double @square(double %0)		; <double> [#uses=1]
	%call1 = call double @square(double %1)		; <double> [#uses=1]
	%addtmp = add double %call, %call1		; <double> [#uses=1]
	%call2 = call double @sqrt(double %addtmp)		; <double> [#uses=1]
	ret double %call2
}

define i32 @main(i32, i8**) {
entry:
	%call = call double @pi()		; <double> [#uses=1]
	%call1 = call double @sin(double %call)		; <double> [#uses=1]
	%inputtedDouble = alloca double		; <double*> [#uses=0]
	%call2 = call double @fib(double 7.000000e+00)		; <double> [#uses=1]
	%call3 = call double @len(double %call1, double %call2)		; <double> [#uses=1]
	%printoutput = call i32 (i8*, ...)* @printf(i8* getelementptr ([4 x i8]* @format, i64 0, i64 0), double %call3)		; <i32> [#uses=0]
	ret i32 1
}

Some other tools that can be used:

  • lli--Runs the bytecode file directly
  • opt--Optimize the bytecode
  • llc--Compile to native code

So, with a few batch files, a compiler can be written to output optimized, native code.

The full source of this example can be found at
GitHub

Prime Checking

July 5th, 2009

I’ve been on a bit of a kick lately working on programming competitions. Many of these, such as Project Euler or the Sphere Online Judge have as one of their initial problems checking the primality of fairly large numbers. Since I’ve been spending a lot of time lately looking at this, I figured I should share what I’ve learned.

What is a prime?

To refresh your memory, a prime number is any integer > 1 which is not divisible by any number other than one. For example, 4 is not prime, because it is divisible by two, whereas 5 is, because it is not divisible by 4, 3, or 2.

Naive algorithm

This lends itself immediately to the naive algorithm for checking a prime:

A number n is prime iff for each p where 1 < p < n, p % n != 0

This test is 100% accurate, but it is too slow to be of practical use. The trick is to speed it up while still remaining accurate. One way to do this is to consider the equation above. For a number p to divide n, it would have to be in the form of a * b = n, for some values of a and b. Either a or b must be <= sqrt(n), since otherwise a * b > n. Therefore, we can limit the range of our algorithm above:

A number n is prime iff for each p 1 < p < sqrt(n), p % n != 0

This is better, but still not fast enough for common use. We can however introduce a few extra checks to improve its speed.

Sieve of Erastothenes

Obviously, no even numbers can be prime. So a good first step is to throw out any number where n % 2 = 0. This is actually the first step from our algorithm above. However, once we have checked that n % 2 != 0, there's no need to check n % 4, n % 6, n % 8, etc. Similarly, if n % 3 != 0, there's no need to check n % 6, n % 9, etc. The basic algorithm for the sieve of Erastothenes is as follows:

  1. Create a bitmask for each number < n
  2. Start with p = 2, check that n % p != 0. If this is false, n must be composite. If it is true, mark all multiples of 2 in the list as verified.
  3. Repeat with the next unverified number on the list, until the verification fails, or all numbers have been verified, in which case n is prime

This is much faster than the original method, but is not necessarily the fastest method.

Prime Divisor Check

A variant on this is to generate an initial list of small primes by performing the Sieve with n = <some small prime>, such as 257. The numbers marked "verified" are all prime. For each subsequent check, check if it is divisible by one of these numbers. If it passes, proceed with a more exhaustive test. This is basically a cached version of the Sieve method.

Fermat's Little Theorem

Fermat's little theorem states that if n is prime then for each a, 1 <= a < n, a ^ (p - 1) = 1 mod p. Now, if you're like me and your math is a little rusty, this can be confusing. I first looked at this and took it literally. "1 mod p = 1 for any p," I reasoned, "and therefore a ^ (p - 1) = 1 means that p = 1." Needless to say, this was off base. The equal here is actually the congruence operator. Conceptually, what it means is this: (a ^ (p - 1)) % p = 1 % p, or (a ^ (p - 1)) % p = 1. I find it easier, however, to rewrite it as (a ^ (p - 1)) % p - 1 = 0. This second form makes more sense later, when we find the congruence of -1.

Now, Fermat's theorem holds for all prime numbers. However, there are some composite numbers for which it holds as well. Numbers which pass this test are called a "probable prime", to reflect this. Composites which pass this test are known as pseudoprimes.

However, if a number fails this test, it is definitely composite. That makes this a good test for weeding out composites.

Modular Exponentiation

As an aside, there is a trick to making the Fermat test, and other similar ones, execute in a reasonable time. Part of the check involves calculating a ^ (p - 1) % n. Imagine using this test for n = 10,000,000. Even for small a, this quick becomes infeasible. The trick is to calculate it via modular exponentiation. This works by keeping a running total, then working through the bits of the exponent, from right to left. This can either be done by looking at the bits directly, or continuously dividing the exponent by two and checking whether the result is even or odd.

If the bit is set, the result is set to the result times the base, modulus n. That is, result = (result * base) % n. Either way, the base is updated to base = (base * base) % n.

Miller-Rabin

The Miller-Rabin test is perhaps one of the most famous means of testing primality. It works similar to the Fermat test. First, n - 1 is broken into two values, s and d, so that d * 2 ^ s = (n - 1). This can be done quite easily by repeatedly dividing n - 1 by 2.

Once that has been done, two congruences are checked:

a ^ d = 1 (mod n)

and

a ^ (d * 2 ^ r) = 1 (mod n) for all r where 0 <= r < s

Wikipedia gives an efficient algorithm for performing this check:

  x = (a * d) % n
  if(x != 1 and x != (n - 1)) {
    for r = 1 .. (s - 1)
      x = (x ^ 2) % n
      if(x = 1) COMPOSITE
      if(x = n - 1) PRIME
    next
  COMPOSITE

Like the Fermat test, failure of this test proves composite, but passing does not prove primality. A number which passes this test is therefore known as a "strong probable prime."

Due to the accuracy of this test, running it with some random a k times can be an extremely accurate method. This is in fact probably one of the most common means of checking primality. This test can be made deterministic, however, with just a little extra work.

Deterministic Miller-Rabin

The only thing needed to make Miller-Rabin deterministic is judicious choosing of values for a. The generic way to do this to check all a where 0 <= a <= min(n - 1, 2 * ln(n) ^ 2). However, for most values of n far fewer checks are needed. Instead, a value should be chosen from the table below:

n a's
< 1,373,653 2, 3
< 25,326,001 2, 3, 5
< 25,000,000,000, != 3,215,031,751 2, 3, 5, 7
< 2,152,302,898,747 2, 3, 5, 7, 11
< 3,474,749,660,383 2, 3, 5, 7, 11, 13
< 341,550,071,728,321 2, 3, 5, 7, 11, 13, 17
< 9,080,191 31, 73
< 4,759,123,141 2, 7, 61

Note: Though this method is believed to be deterministic, it has not been formally proven.

AKS

One method that was developed recently, to great fanfare, is known as AKS. This method was introduced in 2002, in a paper titled "PRIME is in P," proving that primality can be checked in polynomial time. Because of its importance, it can be tempting to attempt to implement this algorithm.

There are two problems with this. First of all, the algorithm is difficult to implement. The heart of it is solving a system of linear equations, which is best performed by Matlab or similar libraries.

Secondly, AKS is actually quite inefficient. Its importance comes because it can be mathematically proven to be deterministic and run in polynomial time, while prior methods have failed on one or the other.

ECPP

This is an advanced method, based on the properties of elliptical curves. This is the most common type of check performed by "industrial grade" primality checkers, and is outside the scope of this article.

Conclusion

So, what is the best way to check for primes? Each of these tests have different strengths and weaknesses. My advice is to try combining them in different ways, and profile the result. My conclusion was to do a divisor check for the primes from 1 to 257, followed by a deterministic Rabin-Miller for values within the range of the table, and beyond that perform a non-deterministic version with k = 5. I didn't actually try the Sieve method for comparison; it would probably make a good choice for something like n < 10,000,0000.

References

Here are some sites with more information:

Setting up a home servers for developers, Part 3: SSH

December 27th, 2008

One of the first things to set up is SSH access.  SSH allows you to open a remote prompt without physically being at the computer.

In this case, we’re going to use OpenSSH.  OpenSSH was at one point developed for OpenBSD, but is now available as a port for many operating systems.  In this case, we want the portable version.  It is downloadable from ftp://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/

Download this and unzip it, then compile and install it as normal; the only difference being the prefix (the directory under which the software is to be installed):

cd openssh-5.1p1

./configure --prefix=/opt/openssh-5.1p1

make

sudo make install

The SSH daemon, sshd, is now installed and runnable:

/opt/openssh-5.1p1/sbin/sshd

This is nice, but we want the daemon to start with the machine.  To do this, we create a new event for Upstart.  You can read more about upstart here, but for now we just need to create a new configuration file which describes our service.

/etc/events.d/sshd

start on runlevel 2
stop on runlevel [!2]

exec /opt/openssh-5.1p1/sbin/sshd

This starts the daemon when the system enters into runlevel 2 (system initialization is complete, networking is available, and the user is at the command prompt/graphical interface).  For this app, that is all that is needed.

Now, if you open up an SSH client from another machine, such as PuTTY, you should be able to connect successfully.