As mentioned before, the compiler is based on the Super Tiny Compiler.
I strongly suggest reading through it, and maybe even building your own compiler! It's a fun, rewarding learning process. Anyway...
How it goes:
The compiler operates using 6 scripts:
1. Tokenizer
2. Parser
3. Transformer
4. Traverser
5. Generator
6. 'Compiler' script that pulls it all together
So what do they all do? Let's start with the tokenizer.
The Tokenizer
It takes a string of code and breaks it down into an array of tokens, like this:
tokens = [ {type: 'name', value: 'h1'}, {type: 'colon', value: ':'}, {type: 'bracket', value: '{' }, {type: 'paren', value: '('}, {type: 'arrow', value: '<'}, etc... ]
When the tokenizer encounters a character of interest ({, [, (, etc.), it grabs just that character and saves it as a token object with a particular 'type' associated with it. When it encounters a string, it consumes the full length of the string and saves the whole thing as a single token.
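Here is a minimal sketch of that loop, modeled on the Super Tiny Compiler's tokenizer. It assumes the token types from the example array above ('name', 'colon', 'bracket', 'paren', 'arrow') plus a 'string' type for double-quoted text. Giving the closing '}' and ')' the same 'bracket' and 'paren' types, and using double quotes for strings, are guesses rather than the project's actual rules; any other character (including '[') is not handled here and throws.

```javascript
// Minimal tokenizer sketch in the style of the Super Tiny Compiler.
// The single-character table and the quoted-string syntax are assumptions.
function tokenizer(input) {
  let current = 0;
  const tokens = [];

  // Characters of interest and the token 'type' each one gets.
  const singles = new Map([
    ['{', 'bracket'], ['}', 'bracket'],
    ['(', 'paren'], [')', 'paren'],
    [':', 'colon'], ['<', 'arrow'],
  ]);

  while (current < input.length) {
    let char = input[current];

    // Whitespace only separates tokens, so skip it.
    if (/\s/.test(char)) {
      current++;
      continue;
    }

    // Grab just this one character as its own token.
    if (singles.has(char)) {
      tokens.push({ type: singles.get(char), value: char });
      current++;
      continue;
    }

    // A double-quoted string becomes one token covering its full length.
    if (char === '"') {
      let value = '';
      char = input[++current];
      while (char !== '"') {
        if (current >= input.length) throw new TypeError('Unterminated string');
        value += char;
        char = input[++current];
      }
      current++; // skip the closing quote
      tokens.push({ type: 'string', value });
      continue;
    }

    // A run of letters/digits (e.g. 'h1') becomes a single name token.
    if (/[a-z0-9]/i.test(char)) {
      let value = '';
      while (current < input.length && /[a-z0-9]/i.test(input[current])) {
        value += input[current++];
      }
      tokens.push({ type: 'name', value });
      continue;
    }

    throw new TypeError('Unknown character: ' + char);
  }

  return tokens;
}
```

For example, tokenizer('h1: { "Hello" }') yields a name token for h1, a colon, an opening bracket, a string token with the value Hello, and a closing bracket.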
For our purposes, the tokenizer is the interesting part, because it is where all of the HTML elements and attributes are listed. I haven't tried to account for every element and attribute, so the lists are not exhaustive.
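A small sketch of how such lists might look and be consulted when a name token is read. The set contents and the nameKind helper are hypothetical and, like the original lists, deliberately incomplete.

```javascript
// Hypothetical, non-exhaustive lookup lists of known names.
const ELEMENTS = new Set(['div', 'span', 'p', 'a', 'img', 'ul', 'li', 'h1', 'h2', 'h3']);
const ATTRIBUTES = new Set(['id', 'class', 'href', 'src', 'alt', 'type']);

// Report whether a name is a known element, a known attribute, or neither.
function nameKind(value) {
  if (ELEMENTS.has(value)) return 'element';
  if (ATTRIBUTES.has(value)) return 'attribute';
  return null; // not in either list
}
```

Returning null for unknown names leaves the decision (reject, warn, or pass through) to the caller, which makes it easy to grow the lists later without changing the tokenizer loop.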
The Parser
The simplest way to conceptualize the Parser is that it loops through the array of tokens and compiles them into a larger object, storing each token's type and value as a node within that object. Since HTML is nested hierarchically, this object (known in the biz as an Abstract Syntax Tree, or AST) captures that nesting nicely. At the end of the script, you are left with an AST.
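A recursive-descent sketch of that loop, again in the Super Tiny Compiler's style with a walk function and a shared current index. It assumes a hypothetical grammar where an element is written name: { ...children } and text is a 'string' token, and a hypothetical AST made of Document, Element, and Text nodes.

```javascript
// Minimal recursive-descent parser sketch. Grammar and node shapes are
// hypothetical: element := name ':' '{' child* '}', child := element | string.
function parser(tokens) {
  let current = 0;

  // True when the token is the closing '}' of an element body.
  const isClose = (token) =>
    token !== undefined && token.type === 'bracket' && token.value === '}';

  // Consume one token of the given type/value or fail loudly.
  function expect(type, value) {
    const token = tokens[current];
    if (!token || token.type !== type || token.value !== value) {
      throw new SyntaxError(`Expected '${value}' at token ${current}`);
    }
    current++;
  }

  function walk() {
    const token = tokens[current];

    // A string token becomes a leaf Text node.
    if (token.type === 'string') {
      current++;
      return { type: 'Text', value: token.value };
    }

    // A name starts an element; recurse until its matching '}'.
    if (token.type === 'name') {
      const node = { type: 'Element', name: token.value, children: [] };
      current++;
      expect('colon', ':');
      expect('bracket', '{');
      while (!isClose(tokens[current])) {
        if (current >= tokens.length) {
          throw new SyntaxError(`Unclosed element '${node.name}'`);
        }
        node.children.push(walk());
      }
      current++; // skip the closing '}'
      return node;
    }

    throw new TypeError(`Unexpected token: ${token.type} '${token.value}'`);
  }

  // The root node holds every top-level element.
  const ast = { type: 'Document', body: [] };
  while (current < tokens.length) {
    ast.body.push(walk());
  }
  return ast;
}
```

Feeding it the tokens for div: { h1: { "Hi" } } produces a Document whose body holds a div Element, which holds an h1 Element, which in turn holds a Text node with the value Hi. The recursion in walk is what mirrors HTML's nesting in the tree.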
Traverser/Transformer
The Traverser and Transformer are tightly coupled. The Traverser's job is to walk the AST provided by the Parser, entering and exiting each node. The Transformer defines the methods that are called when a node of a certain type is encountered during that walk. Together, they turn the Parser's AST into a new AST that contains all of the info we need, in a format that can be easily digested.
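A sketch of that coupling, following the Super Tiny Compiler's visitor pattern: the traverser calls visitor[node.type].enter and .exit, and the transformer uses the same _context trick (stashing, on each old node, the array its new children should go into). The input shape (Document, Element, Text) and output shape (Root, Tag with a selfClosing flag, Text) are assumptions, as is the list of void elements. Note that, as in the Super Tiny Compiler, the old AST is mutated with _context properties.

```javascript
// Walk the AST depth-first, calling visitor[type].enter / .exit on each node.
function traverser(ast, visitor) {
  function traverseArray(array, parent) {
    array.forEach((child) => traverseNode(child, parent));
  }

  function traverseNode(node, parent) {
    const methods = visitor[node.type];
    if (methods && methods.enter) methods.enter(node, parent);

    switch (node.type) {
      case 'Document': traverseArray(node.body, node); break;
      case 'Element': traverseArray(node.children, node); break;
      case 'Text': break; // leaf node, nothing to descend into
      default: throw new TypeError('Unknown node type: ' + node.type);
    }

    if (methods && methods.exit) methods.exit(node, parent);
  }

  traverseNode(ast, null);
}

// Hypothetical list of elements that never have a closing tag.
const VOID_ELEMENTS = new Set(['br', 'hr', 'img', 'input', 'meta', 'link']);

function transformer(ast) {
  const newAst = { type: 'Root', children: [] };
  // Each old node points at the array where its new children belong.
  ast._context = newAst.children;

  traverser(ast, {
    Element: {
      enter(node, parent) {
        const tag = {
          type: 'Tag',
          name: node.name,
          selfClosing: VOID_ELEMENTS.has(node.name),
          children: [],
        };
        node._context = tag.children;
        parent._context.push(tag);
      },
    },
    Text: {
      enter(node, parent) {
        parent._context.push({ type: 'Text', value: node.value });
      },
    },
  });

  return newAst;
}
```

Precomputing details such as selfClosing here keeps the Generator a simple string-building pass.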
Generator
This script takes the new AST the Transformer created and turns it into a string of straight HTML, baby!
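A minimal recursive generator sketch. It assumes a transformed AST made of Root, Tag (with name, selfClosing, and children), and Text nodes; those node names are hypothetical. Text is emitted as-is, with no HTML escaping, which is a simplification.

```javascript
// Recursively turn the transformed AST into an HTML string.
// Node types (Root, Tag, Text) are assumptions for this sketch.
function generator(node) {
  switch (node.type) {
    case 'Root':
      return node.children.map(generator).join('');
    case 'Tag':
      // Void elements such as <img> get no children and no closing tag.
      if (node.selfClosing) return `<${node.name}>`;
      return `<${node.name}>${node.children.map(generator).join('')}</${node.name}>`;
    case 'Text':
      return node.value;
    default:
      throw new TypeError('Unknown node type: ' + node.type);
  }
}
```

Because each case returns a string and recurses into its children, nesting in the tree maps directly onto nesting in the markup.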
Compiler Script
This one just pulls the whole thing together into a pipeline: tokenizer, then parser, then transformer (which drives the traverser), then generator. The resulting HTML is routed right into the Handlebars compiler for templating.
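A sketch of that pipeline. To keep it self-contained, the stages are passed in rather than imported, and createCompiler and compileTemplate are hypothetical names. In the real setup, compileTemplate would wrap Handlebars.compile, which takes a template string and returns a function that renders it with a context object.

```javascript
// Hypothetical pipeline factory: stages are injected so the sketch runs
// on its own; the real project would import them from their scripts.
function createCompiler({ tokenizer, parser, transformer, generator, compileTemplate }) {
  return function compile(input) {
    const tokens = tokenizer(input);   // string -> token array
    const ast = parser(tokens);        // tokens -> AST
    const newAst = transformer(ast);   // AST -> HTML-friendly AST
    const html = generator(newAst);    // new AST -> HTML string
    // Hand the HTML to the template engine (e.g. Handlebars.compile).
    return compileTemplate(html);
  };
}
```

Wired up with the four stage functions and Handlebars, it might be used as createCompiler({ tokenizer, parser, transformer, generator, compileTemplate: (html) => Handlebars.compile(html) }); calling the returned function on source code gives back a template function that you then call with your data, e.g. render({ name: 'World' }).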