Transformation

%117 node_1 Parsing node_2 Transformation node_1->node_2 node_3 Code generation node_2->node_3
Step 2 of 3

The next type of stage for a compiler is transformation. Again, this just takes the AST from the last step and makes changes to it. It can manipulate the AST in the same language or it can translate it into an entirely new language.

Let’s look at how we would transform an AST.

You might notice that our AST has elements within it that look very similar. There are these objects with a type property. Each of these are known as an AST Node. These nodes have defined properties on them that describe one isolated part of the tree.

We can have a node for a NumberLiteral:

{
  type: 'NumberLiteral',
  value: '2'
}

Or maybe a node for a CallExpression:

{
  type: 'CallExpression',
  name: 'subtract',
  params: [...nested nodes go here...]
}

When transforming the AST we can manipulate nodes by adding/removing/replacing properties, we can add new nodes, remove nodes, or we could leave the existing AST alone and create an entirely new one based on it.

Since we’re targeting a new language, we’re going to focus on creating an entirely new AST that is specific to the target language.

Loading...

In order to navigate through all of these nodes, we need to be able to traverse through them. This traversal process goes to each node in the AST depth-first.

Let us first see how a typical depth first search is run.

{
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [{
      type: 'NumberLiteral',
      value: '2'
    }, {
      type: 'CallExpression',
      name: 'subtract',
      params: [{
        type: 'NumberLiteral',
        value: '4'
      }, {
        type: 'NumberLiteral',
        value: '2'
      }]
    }]
  }]
}

So for the above AST we would go:

If we were manipulating this AST directly, instead of creating a separate AST, we would likely introduce all sorts of abstractions here. But just visiting each node in the tree is enough.

The reason I use the word “visiting” is because there is this pattern of how to represent operations on elements of an object structure.

Loading...

The basic idea here is that we are going to create a “visitor” object that has methods that will accept different node types.

var visitor = {
  NumberLiteral() {},
  CallExpression() {}
};

When we traverse our AST we will call the methods on this visitor whenever we encounter a node of a matching type.

In order to make this useful we will also pass the node and a reference to the parent node.

var visitor = {
  NumberLiteral(node, parent) {},
  CallExpression(node, parent) {}
};

Let us now view the code, where we will create the new AST.

// So we define a traverser function which accepts an AST and a
// visitor. Inside we're going to define two functions...
function traverser(ast, visitor) {
// A `traverseArray` function that will allow us to iterate over an array and
// call the next function that we will define: `traverseNode`.
function traverseArray(array, parent) {
array.forEach(function(child) {
traverseNode(child, parent);
});
}
// `traverseNode` will accept a `node` and its `parent` node. So that it can
// pass both to our visitor methods.
function traverseNode(node, parent) {
// We start by testing for the existence of a method on the visitor with a
// matching `type`.
var method = visitor[node.type];
// If it exists we'll call it with the `node` and its `parent`.
if (method) {
method(node, parent);
}
// Next we are going to split things up by the current node type.
switch (node.type) {
// We'll start with our top level `Program`. Since Program nodes have a
// property named body that has an array of nodes, we will call
// `traverseArray` to traverse down into them.
//
// (Remember that `traverseArray` will in turn call `traverseNode` so we
// are causing the tree to be traversed recursively)
case 'Program':
traverseArray(node.body, node);
break;
// Next we do the same with `CallExpressions` and traverse their `params`.
case 'CallExpression':
traverseArray(node.params, node);
break;
// In the case of `NumberLiterals` we don't have any child nodes to visit,
// so we'll just break.
case 'NumberLiteral':
break;
// And again, if we haven't recognized the node type then we'll throw an
// error.
default:
throw new TypeError(node.type);
}
}
// Finally we kickstart the traverser by calling `traverseNode` with our ast
// with no `parent` because the top level of the AST doesn't have a parent.
traverseNode(ast, null);
}
// So we have our transformer function which will accept the lisp ast.
function transformer(ast) {
// We'll create a `newAst` which like our previous AST will have a program
// node.
var newAst = {
type: 'Program',
body: []
};
// Next I'm going to cheat a little and create a bit of a hack. We're going to
// use a property named `context` on our parent nodes that we're going to push
// nodes to their parent's `context`. Normally you would have a better
// abstraction than this, but for our purposes this keeps things simple.
//
// Just take note that the context is a reference *from* the old ast *to* the
// new ast.
ast._context = newAst.body;
// We'll start by calling the traverser function with our ast and a visitor.
traverser(ast, {
// The first visitor method accepts `NumberLiterals`
NumberLiteral: function(node, parent) {
// We'll create a new node also named `NumberLiteral` that we will push to
// the parent context.
parent._context.push({
type: 'NumberLiteral',
value: node.value
});
},
// Next up, `CallExpressions`.
CallExpression: function(node, parent) {
// We start creating a new node `CallExpression` with a nested
// `Identifier`.
var expression = {
type: 'CallExpression',
callee: {
type: 'Identifier',
name: node.name
},
arguments: []
};
// Next we're going to define a new context on the original
// `CallExpression` node that will reference the `expression`'s arguments
// so that we can push arguments.
node._context = expression.arguments;
// Then we're going to check if the parent node is a `CallExpression`.
// If it is not...
if (parent.type !== 'CallExpression') {
// We're going to wrap our `CallExpression` node with an
// `ExpressionStatement`. We do this because the top level
// `CallExpressions` in JavaScript are actually statements.
expression = {
type: 'ExpressionStatement',
expression: expression
};
}
// Last, we push our (possibly wrapped) `CallExpression` to the `parent`'s
// `context`.
parent._context.push(expression);
}
});
// At the end of our transformer function we'll return the new ast that we
// just created.
return newAst;
}
function printAST(ast) {
function traverseArray(array, parent, depth) {
array.forEach(function(child) {
traverseNode(child, parent,depth);
});
}
function traverseNode(node, parent, depth) {
let spaces = "";
for(let i=0; i < depth; i ++) {
spaces = spaces + "\t";
}
// Next we are going to split things up by the current node type.
switch (node.type) {
case 'Program':
console.log(spaces, node.type);
traverseArray(node.body, node, depth+1);
break;
case 'CallExpression':
console.log(spaces, node.type);
traverseArray([node.callee], node, depth+1);
traverseArray(node.arguments, node, depth+1);
break;
case 'ExpressionStatement':
console.log(spaces, node.type);
traverseArray([node.expression], node, depth+1);
break;
case 'Identifier':
console.log(spaces, node.type, ' - ', node.name);
break;
case 'NumberLiteral':
console.log(spaces, node.type,' - ', node.value);
break;
default:
throw new TypeError(node.type);
}
}
traverseNode(ast, null, 0);
}
var ast = {
type: 'Program',
body: [{
type: 'CallExpression',
name: 'add',
params: [{
type: 'NumberLiteral',
value: '2'
}, {
type: 'CallExpression',
name: 'subtract',
params: [{
type: 'NumberLiteral',
value: '4'
}, {
type: 'NumberLiteral',
value: '2'
}]
}]
}]
};
var newAst = transformer(ast);
printAST(newAst);