a curated list of database news from authoritative sources

May 04, 2019

Writing a lisp compiler from scratch in JavaScript: 4. LLVM conditionals and compiling fibonacci

Previously in compiler basics: <! forgive me, for I have sinned >
1. lisp to assembly
2. user-defined functions and variables
3. LLVM
Next in compiler basics:
5. LLVM system calls
6. an x86 upgrade

In this post we'll extend the compiler's LLVM backend to support compiling conditionals such that we can support an implementation of the fibonacci algorithm.

Specifically we're aiming for the following:

$ cat tests/fib.lisp
(def fib (n)
     (if (< n 2)
         n
       (+ (fib (- n 1)) (fib (- n 2)))))

(def main ()
     (fib 8))
$ node src/ulisp.js tests/fib.lisp
$ ./build/prog
$ echo $?
21

To do this we'll have to add the <, - and if built-ins.

All source code is available on Github.

Subtraction

This is the easiest to add since we already support addition. They are both arithmetic operations that produce an integer. We simply add a mapping of - to the LLVM instruction sub so our LLVM backend constructor (src/backends/llvm.js) looks like this:

...

class Compiler {
  constructor() {
    this.outBuffer = [];
    this.primitiveFunctions = {
      def: this.compileDefine.bind(this),
      begin: this.compileBegin.bind(this),
      'if': this.compileIf.bind(this),
      '+': this.compileOp('add'),
      '-': this.compileOp('sub'),

...

Less than

The < builtin is a logical operation. These are handled differently from arithmetic operations in LLVM IR. A logical operation looks like this:

%3 = icmp slt i32 %1, %2

This says that we're doing an integer comparison, icmp, (with signed less than, slt) on the i32 integers in variables %1 and %2.

We can shim this into our existing compileOp helper like so:

...

class Compiler {
  constructor() {
    this.outBuffer = [];
    this.primitiveFunctions = {
      def: this.compileDefine.bind(this),
      begin: this.compileBegin.bind(this),
      'if': this.compileIf.bind(this),
      '+': this.compileOp('add'),
      '-': this.compileOp('sub'),
      '<': this.compileOp('icmp slt'),

...

Conditionals

The last part we need to add is support for conditional execution of code at runtime. Assembly-like languages handle this with "jumps" and "labels". Jumping causes execution to continue at the address being jumped to (instead of just the line following the jump instruction). Labels give you a way of naming an address instead of having to calculate it yourself. Our code will look vaguely like this:

  %test = icmp slt i32 %n, %1
  br i1 %test, label %iftrue, label %iffalse
iftrue:
  ; do true stuff
iffalse:
  ; do false stuff

  ; do next stuff

The br instruction can jump (or branch) conditionally or unconditionally. This snippet demonstrates a conditional jump.

But there are a few things wrong with this pseudo-code. First off if the condition is true, execution will just continue on into the false section once finished. Second, LLVM IR actually requires all labels to end with a branch instruction. So we'll add a new label after the true and false section called ifresult and jump to it unconditionally after both.

  %test = icmp slt i32 %n, %1
  br i1 %test, label %iftrue, label %iffalse
iftrue:
  ; do true stuff
  br label %ifresult
iffalse:
  ; do false stuff
  br label %ifresult
ifresult:
  ; do next stuff

Scope

One last thing we'll need to do before implementing the code generation for this is to update our Scope class to accept symbol prefixes so we can pass our labels through Scope to make sure they are unique but still have useful names.

...

class Scope {

  ...

  symbol(prefix = 'sym') {
    const nth = Object.keys(this.locals).length + 1;
    return this.register(prefix + nth);
  }

...

compileIf

Now we can add a primitive function mapping if to a new compileIf helper and implement the helper.

...

class Compiler {
  constructor() {
    this.outBuffer = [];
    this.primitiveFunctions = {
      def: this.compileDefine.bind(this),
      begin: this.compileBegin.bind(this),
      '+': this.compileOp('add'),
      '-': this.compileOp('sub'),
      '<': this.compileOp('icmp slt'),
      'if': this.compileIf.bind(this),

...

  compileIf([test, thenBlock, elseBlock], destination, scope) {
    const testVariable = scope.symbol();

    // Compile expression and branch
    this.compileExpression(test, testVariable, scope);
    const trueLabel = scope.symbol('iftrue');
    const falseLabel = scope.symbol('iffalse');
    this.emit(1, `br i1 %${testVariable}, label %${trueLabel}, label %${falseLabel}`);

    // Compile true section
    this.emit(0, trueLabel + ':');
    this.compileExpression(thenBlock, destination, scope);
    const endLabel = scope.symbol('ifend');
    this.emit(1, 'br label %' + endLabel);
    this.emit(0, falseLabel + ':');

    // Compile false section
    this.compileExpression(elseBlock, destination, scope);
    this.emit(1, 'br label %' + endLabel);

    // Compile cleanup
    this.emit(0, endLabel + ':');
  }
...

Note that this code generation sends the destination variable into both the true and false sections. Let's try it out.

$ node src/ulisp.js tests/fib.lisp
llc: error: llc: build/prog.ll:19:3: error: multiple definition of local value named 'sym5'
  %sym5 = add i32 %sym15, %sym16
  ^
child_process.js:665
    throw err;
    ^

Error: Command failed: llc -o build/prog.s build/prog.ll
llc: error: llc: build/prog.ll:19:3: error: multiple definition of local value named 'sym5'
  %sym5 = add i32 %sym15, %sym16

That's annoying. An unfortunate aspect of LLVM's required single-static assignment form is that you cannot reuse variable names within a function even if it is not possible for the variable to be actually reused.

To work around this we need to allocate memory on the stack, store the result in each true/false section in this location, and read from this location afterward to store it in the destination variable.

Stack memory instructions

LLVM IR gives us alloca to allocate memory on the stack, store to store memory at a stack address, and load to read the value at a stack address into a variable. Here's a simple example:

%myvar = add i32 42, 0
%stackaddress = alloca i32, align 4
store i32 %myvar, i32* %stackaddress, align 4
%newvar = load i32, i32* %stackaddress, align 4

Such that newvar is now 42.

compileIf again

Applying this back to our compileIf helper gives us:

...

  compileIf([test, thenBlock, elseBlock], destination, scope) {
    const testVariable = scope.symbol();
    const result = scope.symbol('ifresult');
    // Space for result
    this.emit(1, `%${result} = alloca i32, align 4`);

    // Compile expression and branch
    this.compileExpression(test, testVariable, scope);
    const trueLabel = scope.symbol('iftrue');
    const falseLabel = scope.symbol('iffalse');
    this.emit(1, `br i1 %${testVariable}, label %${trueLabel}, label %${falseLabel}`);

    // Compile true section
    this.emit(0, trueLabel + ':');
    const tmp1 = scope.symbol();
    this.compileExpression(thenBlock, tmp1, scope);
    this.emit(1, `store i32 %${tmp1}, i32* %${result}, align 4`);
    const endLabel = scope.symbol('ifend');
    this.emit(1, 'br label %' + endLabel);
    this.emit(0, falseLabel + ':');

    // Compile false section
    const tmp2 = scope.symbol();
    this.compileExpression(elseBlock, tmp2, scope);
    this.emit(1, `store i32 %${tmp2}, i32* %${result}, align 4`);
    this.emit(1, 'br label %' + endLabel);

    // Compile cleanup
    this.emit(0, endLabel + ':');
    this.emit(1, `%${destination} = load i32, i32* %${result}, align 4`);
  }

...

Trying it out

We run our compiler one more time:

$ node src/ulisp.js tests/fib.lisp
$ ./build/prog
$ echo $?
21

And get what we expect!

Next up

  • Tail call optimization
  • Lists and dynamic memory
  • Strings?
  • Foreign function calls?
  • Self-hosting?

April 30, 2019

Responsibility and ownership

Responsibility is only possible by granting ownership and setting expectations. If you don't turn over ownership, don't expect folks to take responsibility. When you grant ownership and set expectations, you'll be astounded what folks will accomplish without you.

I am astounded.

April 29, 2019

2019 MySQL Community Contributor Award Program

Building the Vitess community has been our pride and joy. Being able to contribute to the MySQL community, even more so. Vitess' Sugu Sougoumarane has been nominated by the MySQL group for Oracle's 2019 MySQL Community Contributor Award Program, where he joins folks like Shlomi Noach, Peter Zaitsev, Gabriela D'Ávila Ferrara, Giuseppe Maxia and many other active MySQL community members recognised for their contributions to MySQL. Criteria for the nominations included: most active code contributor, bug report,most active MySQL blogger, people who play an active role in translating or documenting MySQL articles, people who provide feedback on DMR releases, Labs release, or change proposal, as well as anyone in the community who did really useful work that ought to be thanked publicly.

April 21, 2019

April 15, 2019

Typical Challenges of Building Your Data Layer

When you start a digital product you usually put your data in a database. It does not matter if it is a simple text file, an excel spreadsheet or a managed Postgres instance on the cloud, your data always lives somewhere.

April 14, 2019

Interpreting TypeScript

In addition to providing a static type system and compiler for a superset of JavaScript, TypeScript makes much of its functionality available programmatically. In this post we'll use the TypeScript compiler API to build an interpreter. We'll build off of a TypeScript wiki article and cover a few areas that were confusing to me as I built out a separate project.

The end result we're building will look like this:

$ cat test.ts # A program we can interpret
print(1 + 5);
$ tsc interpreter.ts # Build the source code for the interpreter
$ node interpreter.js test.ts # Run the interpreter against test program
6

All code is available on Github.

Setup

To begin with, we need Node.js and some dependencies:

$ yarn add typescript @types/node

Then we can begin the first stage of an interpreter: parsing the code.

Parsing

Parsing a fixed set of files is simple enough. We pass a list of files to createProgram along with compiler options. But, as a user, we don't want to keep track of all files used by a program (i.e. everything we import). The most ideal situation is to pass a single-file entrypoint (something like a main.js) and have our interpreter figure out all the imports and handle them recursively. More on this later, for now we'll just parse the single-file entrypoint.

import * as ts from 'typescript';

const TS_COMPILER_OPTIONS = {
  allowNonTsExtensions: true,
};

function parse(fileName: string): ts.Program {
  return ts.createProgram([fileName], TS_COMPILER_OPTIONS);
}

function interpret(program: ts.Program) { // TODO }

function main(entrypoint: string) {
  const program = parse(entrypoint);
  interpret(program);
}

main(process.argv[2]);

interpret and ts.Program

A program contains all source files as well as any implicitly needed TypeScript definition files (for us it will just be the TypeScript definitions for the Node.js standard library).

The program also gives us access to a type checker that we can use to query the type of any node in the program tree. We'll get into this in another post.

Our interpret program will iterate over all the source files, ignoring the TypeScript definition files, and call interpretNode on all the elements of the source file.

function interpretNode(node: ts.Node) { // TODO }

function interpret(program: ts.Program) {
  return program.getSourceFiles().map((source) => {
    const { fileName } = source;
    if (fileName.endsWith('.d.ts')) {
      return;
    }

    const results = [];
    ts.forEachChild(source, (node) => {
      results.push(interpretNode(node));
    });
    return results;
  });
}

interpretNode and ts.Node

A Node is a wrapper for most elements of what we consider a program to be, such as a binary expression (2 + 3), a literal expression (2), a function call expression (a(c)), and so forth. When exploring a parser, it takes time to become familiar with the particular way that a parser breaks out a program into a tree of nodes.

As a concrete example, the following program:

print(a);

Will be built into ts.Node tree along these lines:

Node: ExpressionStatement: print(a);
  Node: CallExpression: print, a
    Node: Identifier: print
    Node: Identifier: a
Node: EndOfFileToken

And another example:

1 + 3;

Will be built into a ts.Node tree along these lines:

Node: Expression: 1 + 3
  Node: BinaryExpression: 1, 3, +
    Node: NumericLiteral: 1
    Node: NumericLiteral: 3
    Node: PlusToken
Node: EndOfFileToken

But how would one come to know this?

Exploring the ts.Node tree

The easiest thing to do is throw an error on every Node type we don't yet know about and fill in support for each program we throw at the interpreter.

For example:

function interpretNode(node: ts.Node) {
  switch (node.kind) {
    default:
      throw new Error('Unsupported node type: ' + ts.SyntaxKind[node.kind]);
  }
}

Now let's run our interpreter against an input file, test.ts, that combines these two to make a semi-interesting program:

$ cat test.ts
print(1 + 2);
$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: ExpressionStatement
...

And we see an outer wrapper, an ExpressionStatement. To proceed we look up the definition of an ExpressionStatement in TypeScript source code, src/compiler/types.ts to be specific. This file will become our best friend. Hit ctrl-f and look for "interface ExpressionStatement ". We see that it has only one child, expression, so we call interpretNode on this recursively:

function interpretNode(node: ts.Node) {
  switch (node.kind) {
    case ts.SyntaxKind.ExpressionStatement: {
      const es = node as ts.ExpressionStatement;
      return interpretNode(es.expression);
    }
    default:
      throw new Error('Unsupported node type: ' + ts.SyntaxKind[node.kind]);
  }
}

Thankfully TypeScript will be very quick to call us out if we misunderstand this structure.

It's pretty weird to me that the ts.Node tree is organized such that I must cast at each ts.Node but that's what they do even in the TypeScript source so I don't think I'm misunderstanding.

Now we recompile and run the interpreter against the program to discover the next ts.Node type.

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: CallExpression
...

Cool! Back to src/compiler/types.ts. Call expressions are complex enough that we'll break out handling them into a separate function.

interpretCall and ts.CallExpression

From our reading of types.ts we need to handle the expression that evaluates to a function, and we need to handle its parameters. We'll just call interpretNode on each of these to get their real value. And finally we'll call the function with the arguments.

function interpretCall(ce: ts.CallExpression) {
  const fn = interpretNode(ce.expression);
  const args = ce.arguments.map(interpretNode);
  return fn(...args);
}

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.CallExpression: {
      const ce = node as ts.CallExpression;
      return interpretCall(ce);
    }
    ...
  }
}

Please ignore the fact that we are not correctly setting this here.

Recompile and let's see what's next!

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: Identifier
...

And back to types.ts.

ts.Identifier

In order to support identifiers in general we'd need to have a context we could use to look up the value of an identifier. But we don't have a context like this right now so we'll add builtin support for a print function so we can get some output!

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.Identifier: {
      const id = (node as ts.Identifier).escapedText as string;
      if (id === 'print') {
        return function (...args) { console.log(...args); };
      }

      throw new Error('Unsupported identifier: ' + id);
    }
    ...
  }
}

Recompile and let's see what's next!

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: BinaryExpression
...

And we're finally into the parameters.

interpretBinaryExpression and ts.BinaryExpression

Looking into types.ts for this Node type suggests we may want to break this out into its own function as well; there are a ton of operator types. Within the interpretBinaryExpression helper we'll interpret each operand and then switch on the operator type. We'll throw an error on operators we don't know about -- all of them at first:

function interpretBinaryExpression(be: ts.BinaryExpression) {
  const left = interpretNode(be.left);
  const right = interpretNode(be.right);
  switch (be.operatorToken.kind) {
    default:
      throw new Error('Unsupported operator: ' + ts.SyntaxKind[be.operatorToken.kind]);
  }
}

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.BinaryExpression: {
      const be = node as ts.BinaryExpression;
      return interpretBinaryExpression(be);
    }
    ...
  }
}

We know the drill.

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: FirstLiteralToken
...

At this point we're actually failing first on an unknown node type rather than an operator. This is because we interpret the operands (which are numeric literals) before we look up the operator. Time to revisit types.ts!

ts.FirstLiteralToken, ts.NumericLiteral

Looking at types.ts shows us that FirstLiteralToken is a synonym for NumericLiteral. The latter name is more obvious, so let's add that to our supported Node list:

function interpretNode() {
  switch (node.kind) {
    ...
      case ts.SyntaxKind.NumericLiteral: {
      const nl = node as ts.NumericLiteral;
      return Number(nl.text);
    }
    ...
  }
}

And we keep going!

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported operator: PlusToken
...

And we're into unknown operator territory!

interpretBinaryExpression and ts.PlusToken

A simple extension to our existing interpretBinaryExpression, we return the sum of the left and right values:

function interpretBinaryExpression(be: ts.BinaryExpression) {
  const left = interpretNode(be.left);
  const right = interpretNode(be.right);
  switch (be.operatorToken.kind) {
    case ts.SyntaxKind.PlusToken:
      return left + right;
    default:
      throw new Error('Unsupported operator: ' + ts.SyntaxKind[be.operatorToken.kind]);
  }
}

And we give it another shot.

$ tsc interpreter.ts
$ node interpreter.js test.ts
...
Error: Unsupported node type: EndOfFileToken
...

ts.SyntaxKind.EndOfFileToken

Our final Node type before a working program, we simply do nothing:

function interpretNode() {
  switch (node.kind) {
    ...
    case ts.SyntaxKind.EndOfFileToken:
      break;
    ...
  }
}

One more time:

$ tsc interpreter.ts
$ node interpreter.js test.ts
3

A working program! And if we jiggle the test?

$ cat test.ts
print(1 + 5);
$ node interpreter.js test.ts
6

We're well on our way to interpreting TypeScript, and have gained some familiarity with the TypeScript Compiler API.

All code is available on Github.

April 06, 2019

Writing a web server from scratch: 1. HTTP and sockets

Say we have some HTML:

<html>
  <body>
    <h1>Hello world!</h1>
  </body>
</html>

And say we'd like to be able to render this page in a web browser. If the server is hosted locally we may want to enter localhost:9000/hello-world.html in the address bar, hit enter, make a request (done by the browser), receive a response (sent by some server), and render the result (done by the browser).

Here is a minimal, often incomplete, and unsafe Node.js program (about 100 LoC) that would serve this (code available on Github):

const fs = require('fs');
const net = require('net');

const CRLF = '\r\n';

const HELLO_WORLD = `<html>
  <body>
    <h1>Hello world!</h1>
  </body>
</html>`;
const NOT_FOUND = `<html>
  <body>
    <h1>Not found</h1>
  </body>
</html>`;

class HTTPRequestHandler {
  constructor(connection) {
    this.connection = connection;
    this.request = {
      statusLine: null,
      headers: {},
      body: null,
    };
  }

  parse(buffer) {
    const lines = buffer.toString().split(CRLF);

    // Parse/store status line if necessary
    if (!this.request.statusLine) {
      const [method, path, protocol] = lines.shift().split(' ');
      this.request.statusLine = { method, path, protocol };
    }

    // Parse/store headers if the body hasn't begun
    if (!this.request.body) {
      for (let line = lines.shift(); lines.length; line = lines.shift()) {
        // Reached the end of headers, double CRLF
        if (line === '') {
          this.request.body = '';
          break;
        }

        const [key, value] = line.split(':');

        const safeKey = key.toLowerCase();
        if (!this.request.headers[safeKey]) {
          this.request.headers[safeKey] = [];
        }

        this.request.headers[safeKey].push(value.trimStart());
      }
    }

    this.request.body += lines.join(CRLF);
  }

  requestComplete() {
    if (!this.request.statusLine || !Object.keys(this.request.headers).length || this.request.body === null) {
      return false;
    }

    const [contentLength] = this.request.headers['content-length'] || [];
    if (this.request.statusLine.method !== 'GET' && this.request.body.length !== contentLength) {
      return false;
    }

    return true;
  }

  sendResponse() {
    const response = { status: 200, statusMessage: 'OK', body: '' };

    if (this.request.statusLine.path === '/hello-world.html') {
      response.body = HELLO_WORLD;
    } else {
      response.status = 404;
      response.statusMessage = 'NOT FOUND';
      response.body = NOT_FOUND;
    }

    const serialized = 'HTTP/1.1 ${response.status} ${response.statusMessage}' + CRLF +
                       'Content-Length: ' + response.body.length + CRLF + CRLF +
                       response.body;
    this.connection.write(serialized);
  }

  handle(buffer) {
    this.parse(buffer);

    if (!this.requestComplete()) {
      return;
    }

    this.sendResponse();

    // Other-wise the connection may attempt to be re-used, we don't support this.
    this.connection.end();
  }
}

function handleConnection(connection) {
  const handler = new HTTPRequestHandler(connection);
  connection.on('data', (buffer) => handler.handle(buffer));
}

const server = net.createServer(handleConnection);

server.listen('9000');

So what's going on?

The protocol

HTTP (version 1.1, specifically) is a convention for connecting over TCP/IP and sending plain-text messages between two processes. HTTP messages are broken into two categories: requests (the sender of a request is called a "client") and responses (the sender of a response is called a "server").

HTTP is important because it is the default protocol of web browsers. When we type in localhost:9000/hello-world.html and hit enter, the browser will open an TCP/IP connection to the location localhost on the port 9000 and send an HTTP request. If/when it receives the HTTP response from the server it will try to render the response.

An HTTP request

A bare minimum HTTP/1.1 request (defined here) based on the request for localhost:9000/hello-world.html is the following:

GET /hello-world.html HTTP/1.1\r\nHost: localhost:9000\r\n\r\n

The spec explicitly requires the \r\n combo to represent a newline instead of simply \n.

If we printed out this request it would look like this:

GET /hello-world.html HTTP/1.1
Host: localhost:9000

Components of an HTTP request

An HTTP/1.1 request is made up of a few parts:

  • [Mandatory]: The status line (the first line) followed by a CRLF (the \r\n combo)
  • [Mandatory]: HTTP headers separated by a CRLF and followed by an additional CRLF
  • [Optional]: The request body

The status line consists of the request method (e.g. GET, POST, PUT, etc.), the path for the request, and the protocol -- all separated by a space.

An HTTP header is a key-value pair separated by a colon. Spaces following the colon are ignored. The key is case insensitive. Only the Host header appears to be mandatory. Since these headers are sent by the client they are intended for the server's use.

The request body is text and is only relevant for requests of certain methods (e.g. POST but not GET).

An HTTP response

A bare minimum HTTP/1.1 response (defined here) based on the file we wanted to send back is the following:

HTTP/1.1 200 OK\r\n\r\n<html>\n  <body>\n    <h1>Hello world!</h1>\n  </body>\n</html>

If we printed out this response it would look like this:

HTTP/1.1 200 OK

<html>
  <body>
    <h1>Hello world!</h1>
  </body>
</html>

Components of an HTTP response

An HTTP/1.1 response is made up of a few parts:

  • [Mandatory]: The status line (the first line) followed by a CRLF
  • [Optional]: HTTP headers separated by a CRLF and followed by an additional CRLF
  • [Optional]: The request body

The status line consists of the protocol, the status code, and the status message -- all separated by a space.

HTTP response headers are the same as HTTP request headers although in a response they are directives from the server to the client. There are many standard headers that are used for such things as setting cache rules, setting cookies, settings response type (e.g. HTML vs CSS vs PNG so the browser knows how to handle the response).

The response body is similar to the HTTP request body.

Sockets

Most operating systems have a built-in means of connecting over TCP/IP (and sending and receiving messages) called "sockets". Sockets allow us to treat TCP/IP connections like files in memory. Most programming languages have a built-in socket library. Node.js provides a high-level interface for listening on a port and handling new connections.

function handleConnection(connection) {
  connection.on('data', (buffer) => doSomething???);
}

const server = net.createServer(handleConnection);

server.listen('9000');

Once the program is listening, clients can open TCP/IP connections to the address (localhost) and port (9000) and our program takes over from there. Each connection is handled separately and receives "data" events. Each data event includes new bytes available for us to handle.

Let's encapsulate the state of each connection in HTTPRequestHandler class. Its function will be to parse data as it becomes available and respond to the request when the request is done.

class HTTPRequestHandler {
  constructor(connection) {
    this.connection = connection;
  }

  parse(buffer) {}

  requestComplete() {}

  sendResponse() {}

  handle(buffer) {
    this.parse(buffer);

    if (!this.requestComplete()) {
      return;
    }

    this.sendResponse();

    // Other-wise the connection may attempt to be re-used, we don't support this.
    this.connection.end();
  }
}

function handleConnection(connection) {
  const handler = new HTTPRequestHandler(connection);
  connection.on('data', (buffer) => handler.handle(buffer));
}

...

There are three functions we need to implement now: parse(buffer), requestComplete(), and sendResponse.

parse(buffer)

This function will be responsible for progressively pulling out data from the buffer. If the status line has not been received, it will try to grab the status line. If the body has not yet started, it will accumulate headers. Then it will continue accumulating the body until we close the connection (this will happen implicitly when requestComplete() returns true).

class HTTPRequestHandler {
  constructor(connection) {
    this.connection = connection;
    this.request = {
      statusLine: null,
      headers: {},
      body: null,
    };
  }

  parse(buffer) {
    const lines = buffer.toString().split(CRLF);

    // Parse/store status line if necessary
    if (!this.request.statusLine) {
      const [method, path, protocol] = lines.shift().split(' ');
      this.request.statusLine = { method, path, protocol };
    }

    // Parse/store headers if the body hasn't begun
    if (this.request.body === null) {
      for (let line = lines.shift(); lines.length; line = lines.shift()) {
        // Reached the end of headers, double CRLF
        if (line === '') {
          this.request.body = '';
          break;
        }

        const [key, value] = line.split(':');

        const safeKey = key.toLowerCase();
        if (!this.request.headers[safeKey]) {
          this.request.headers[safeKey] = [];
        }

        this.request.headers[safeKey].push(value.trimStart());
      }
    }

    this.request.body += lines.join(CRLF);
  }

...

}

requestComplete()

This function will look at the internal request state and return false if the status line has not been received, no headers have been received (although this is stricter than the HTTP/1.1 standard requires), or if the body length is not equal to the value of the Content-Length header.

class HTTPRequestHandler {

...

  requestComplete() {
    if (!this.request.statusLine || !Object.keys(this.request.headers).length || this.request.body === null) {
      return false;
    }

    const [contentLength] = this.request.headers['content-length'] || [];
    if (this.request.statusLine.method !== 'GET' && this.request.body.length !== contentLength) {
      return false;
    }

    return true;
  }

...

}

sendResponse()

Finally we'll hard-code two responses (one for the valid request for /hello-world.html and a catch-all 404 response for every other request). These responses need to be serialized according the HTTP response format described above and written to the connection.

class HTTPRequestHandler {

...

  sendResponse() {
    const response = { status: 200, statusMessage: 'OK', body: '' };

    if (this.request.statusLine.path === '/hello-world.html') {
      response.body = HELLO_WORLD;
    } else {
      response.status = 404;
      response.statusMessage = 'NOT FOUND';
      response.body = NOT_FOUND;
    }

    const serialized = 'HTTP/1.1 ${response.status} ${response.statusMessage}' + CRLF +
                       'Content-Length: ' + response.body.length + CRLF + CRLF +
                       response.body;
    this.connection.write(serialized);
  }  

...

}

Run it

Now that we've got all the pieces we can finally run the initial program:

$ node uweb.js &
$ open localhost:9000/hello-world.html

And we see the page! Try any other path and we receive a 404.

Review and next steps

We covered the basics of HTTP/1.1: a very simple, plain-text protocol oriented around requests and responses over a TCP/IP connection. We realize we need to know little about anything but parsing and formatting text on top of the TCP/IP blackbox called sockets. We created a simple application that returns different responses based on the request. But we're a far shot from a more general library, a web framework. Future posts will explore this transition as well as performance and more features.

Code is available on Github.