a curated list of database news from authoritative sources

January 16, 2020

Update your analytical data selectively

Updating specific records in your analytical database couldn't be easier with Tinybird's new 'replace with condition' functionality

January 14, 2020

Vitess Graduation Retrospective

Last November, Vitess became the eighth CNCF project to reach graduation, joining a host of amazing projects such as Kubernetes, Prometheus, Envoy, CoreDNS, containerd, Fluentd, and Jaeger. To contextualize this milestone, I picked some tidbits from the brain of Vitess co-creator, Sugu Sougoumarane, allowing him to share internal perspective about how we got here, the hurdles we faced, and where we’re headed. Incubation # Considering that it was only the February of 2018 when the CNCF agreed to host Vitess as an incubating project, Vitess has made efficient strides toward becoming the de-facto standard for horizontally scaling MySQL.

January 12, 2020

Queries Effect Performance

Queries effect database performance. That’s not a typo: “effect” not “affect”. The difference is not a word game but an important way to think about database performance.

Many things can affect performance. For example, if the system runs out of memory then starts swapping which causes high disk IO latency, that will negatively affect performance. But external factors like that notwithstanding (i.e. when hardware and MySQL are normal and stable), it’s important to understand that queries effect performance.

January 09, 2020

December 15, 2019

Fixing Ghosted GTIDs

MySQL auto-positioning is an integral part of replication with GTID, but it’s neither required nor guaranteed to work. It’s possible to enable GTIDs but disable auto-positioning, and it’s possible that one MySQL instance cannot auto-position on another even when GTIDs are used. The former (GTID on but auto-pos off) is an issue for another time. The latter is the topic of this post: when MySQL GTID auto-positioning fails—and how to fix it.

December 11, 2019

Quick hack for GTID_OWN lack

One of the benefits of MySQL GTIDs is that each server remembers all GTID entries ever executed. Normally these would be ranges, e.g. 0041e600-f1be-11e9-9759-a0369f9435dc:1-3772242 or multi-ranges, e.g. 24a83cd3-e30c-11e9-b43d-121b89fcdde6:1-103775793, 2efbcca6-7ee1-11e8-b2d2-0270c2ed2e5a:1-356487160, 46346470-6561-11e9-9ab7-12aaa4484802:1-26301153, 757fdf0d-740e-11e8-b3f2-0a474bcf1734:1-192371670, d2f5e585-62f5-11e9-82a5-a0369f0ed504:1-10047. One of the common problems in asynchronous replication is the issue of consistent reads. I’ve just written to the master. Is the data […]

December 08, 2019

Writing a lisp compiler from scratch in JavaScript: 6. an x86 upgrade

Previously in compiler basics: <! forgive me, for I have sinned >
1. lisp to assembly
2. user-defined functions and variables
3. LLVM
4. LLVM conditionals and compiling fibonacci
5. LLVM system calls

This post upgrades the ulisp x86 backend from using a limited set of registers (with no spilling support) to solely using the stack to pass values between expressions.

This is a slightly longer post since we've got a lot of catchup to do to get to feature parity with the LLVM backend. Namely:

  • "Infinite" locals, parameters
  • Function definitions
  • Variable references
  • Arithmetic and logical operations
  • If
  • Syscalls

We'll tackle the first four points first and finish up with the last two. This way we can support the same fibonacci program that prints integers to stdout that we support in the LLVM backend.

As always the code is available on Github.

But first a digression into how this is suddenly easy for us to do with x86 and one-pass (sorta) code generation.

Stack-based languages

Stack-based languages have the extremely convenient attribute that values are (by default) stored on the stack, which allows a code generator targeting a stack-based language the option to omit handling register allocation. And as it happens, x86 has enough support to make it easy to treat as a stack machine.

As we build out the code generator for x86 as a stack machine we need to keep two commitments in mind:

  • Every expression must pop all arguments/operands
  • Every expression must store one result back on the stack

In the future, we may replace the second commitment. But for now it is more than enough.

Boilerplate

We'll start with the existing x86 backend code and strip all the implementation code:

const cp = require('child_process');
const fs = require('fs');
const os = require('os');

let GLOBAL_COUNTER = 0;

const SYSCALL_MAP = {
  darwin: {
    exit: '0x2000001',
    write: '0x2000004',
  },
  linux: {
    exit: 60,
    write: 1,
  },
}[os.platform()];

class Scope {}

class Compiler {
  constructor() {
    this.outBuffer = [];
    this.primitiveFunctions = {
      def: this.compileDefine.bind(this),
      begin: this.compileBegin.bind(this),
      if: this.compileIf.bind(this),
      ...this.prepareArithmeticWrappers(),
      ...this.prepareLogicalWrappers(),
      ...this.prepareSyscallWrappers(),
    };
  }

  prepareArithmeticWrappers() {}

  prepareLogicalWrappers() {}

  prepareSyscallWrappers() {}

  emit(depth, args) {
    if (depth === undefined || args === undefined) {
      throw new Error('Invalid call to emit');
    }

    const indent = new Array(depth + 1).join('  ');
    this.outBuffer.push(indent + args);
  }

  compileExpression(arg, scope, depth) {}

  compileIf([test, then, els], scope, depth) {}

  compileBegin(body, scope, depth, topLevel = false) {}

  compileDefine([name, params, ...body], scope, depth) {}

  compileCall(fun, args, scope, depth) {}

  emitPrefix() {
    this.emit(1, '.global _main\n');

    this.emit(1, '.text\n');
  }

  emitPostfix() {
    this.emit(0, '_main:');
    this.emit(1, 'CALL main');
    this.emit(1, 'MOV RDI, RAX'); // Set exit arg
    this.emit(1, `MOV RAX, ${SYSCALL_MAP['exit']}`);
    this.emit(1, 'SYSCALL');
  }

  getOutput() {
    const output = this.outBuffer.join('\n');

    // Leave at most one empty line
    return output.replace(/\n\n\n+/g, '\n\n');
  }
}

module.exports.compile = function(ast) {
  const c = new Compiler();
  c.emitPrefix();
  const s = new Scope();
  c.compileBegin(ast, s, 1, true);
  c.emitPostfix();
  return c.getOutput();
};

module.exports.build = function(buildDir, program) {
  const prog = 'prog';
  fs.writeFileSync(`${buildDir}/${prog}.s`, program);
  cp.execSync(
    `gcc -mstackrealign -masm=intel -o ${buildDir}/${prog} ${buildDir}/${prog}.s`,
  );
};

The prefix and postfix stays mostly the same as the original implementation. But we'll assume a couple of new helpers to get us in parity with the LLVM backend:

  • compileDefine
  • compileBegin
  • compileIf
  • compileCall
  • prepareArithmeticWrappers
  • prepareLogicalWrappers
  • prepareSyscallWrappers

The prepareArithmeticWrappers helper will define wrappers for arithmetic instructions. The prepareLogicalWrappers helper will define wrappers for logical instructions. And the prepareSyscallWrappers helper will define a wrapper for syscalls and generate builtins based on the SYSCALL_MAP entries.

Scope

Similar to our LLVM backend's Context and Scope helpers we'll define our own for the x86 backend. Since we're placing all locals on the stack, the two most important things Scope will do for us are:

  • Map identifiers to escaped strings
  • Store and increment the location of the local on the stack

Here's what it will look like:

class Scope {
  constructor() {
    this.localOffset = 1;
    this.map = {};
  }

  assign(name) {
    const safe = name.replace('-', '_');
    this.map[safe] = this.localOffset++;
    return safe;
  }

  symbol() {
    return this.localOffset++;
  }

  lookup(name) {
    const safe = name.replace('-', '_');
    if (this.map[safe]) {
      return { name: safe, offset: this.map[safe] };
    }

    return null;
  }

  copy() {
    const s = new Scope();
    // In the future we may need to store s.scopeOffset = this.scopeOffset + 1
    // so we can read outer-scoped values at runtime.
    s.map = { ...this.map };
    return s;
  }
}

compileExpression

An expression will be one of:

  • A function call (possibly a builtin like def or +)
  • A literal value (e.g. 29)
  • A reference (e.g. &c)
  • An identifier (e.g. my-var)

We'll handle compiling an expression in that order. If the AST argument passed to compileExpression is an array, we will call compileCall and return.

  compileExpression(arg, scope, depth) {
    // Is a nested function call, compile it
    if (Array.isArray(arg)) {
      this.compileCall(arg[0], arg.slice(1), scope, depth);
      return;
    }
  }

If the AST is a number, we will push the number onto the stack and return.

  compileExpression(arg, scope, depth) {
    // Is a nested function call, compile it
    if (Array.isArray(arg)) {
      this.compileCall(arg[0], arg.slice(1), scope, depth);
      return;
    }

    if (Number.isInteger(arg)) {
      this.emit(depth, `PUSH ${arg}`);
      return;
    }
  }

If the AST is a string that starts with & we will look up the location of the identifier after the &, push its location onto the stack and return.

We count on the Scope storing its offset from the "frame pointer", which we will later set up to be stored in RBP.

Locals will be stored after the frame pointer and parameters will be stored before it. So we'll need to add or subtract from the frame pointer depending on if we need a positive or negative offset from it.

  compileExpression(arg, scope, depth) {
    // Is a nested function call, compile it
    if (Array.isArray(arg)) {
      this.compileCall(arg[0], arg.slice(1), scope, depth);
      return;
    }

    if (Number.isInteger(arg)) {
      this.emit(depth, `PUSH ${arg}`);
      return;
    }

    if (arg.startsWith('&')) {
      const { offset } = scope.lookup(arg.substring(1));
      // Copy the frame pointer so we can return an offset from it
      this.emit(depth, `MOV RAX, RBP`);
      const operation = offset < 0 ? 'ADD' : 'SUB';
      this.emit(depth, `${operation} RAX, ${Math.abs(offset * 8)} # ${arg}`);
      this.emit(depth, `PUSH RAX`);
      return;
    }
  }

Finally, we'll look up the identifier and copy the value (in its offset on the stack) to the top of the stack.

  compileExpression(arg, scope, depth) {
    // Is a nested function call, compile it
    if (Array.isArray(arg)) {
      this.compileCall(arg[0], arg.slice(1), scope, depth);
      return;
    }

    if (Number.isInteger(arg)) {
      this.emit(depth, `PUSH ${arg}`);
      return;
    }

    if (arg.startsWith('&')) {
      const { offset } = scope.lookup(arg.substring(1));
      // Copy the frame pointer so we can return an offset from it
      this.emit(depth, `MOV RAX, RBP`);
      const operation = offset < 0 ? 'ADD' : 'SUB';
      this.emit(depth, `${operation} RAX, ${Math.abs(offset * 8)} # ${arg}`);
      this.emit(depth, `PUSH RAX`);
      return;
    }

    // Variable lookup
    const { offset } = scope.lookup(arg);
    if (offset) {
      const operation = offset < 0 ? '+' : '-';
      this.emit(
        depth,
        `PUSH [RBP ${operation} ${Math.abs(offset * 8)}] # ${arg}`,
      );
    } else {
      throw new Error(
        'Attempt to reference undefined variable or unsupported literal: ' +
          arg,
      );
    }
  }

And that's it for handling expression! Let's add compileCall support now that we referenced it.

compileCall

A call will first check if the call is a builtin. If so, it will immediately pass control to the builtin.

  compileCall(fun, args, scope, depth) {
    if (this.primitiveFunctions[fun]) {
      this.primitiveFunctions[fun](args, scope, depth);
      return;
    }
  }

Otherwise it will compile every argument to the call (which will leave all the resulting values on the stack.)

  compileCall(fun, args, scope, depth) {
    if (this.primitiveFunctions[fun]) {
      this.primitiveFunctions[fun](args, scope, depth);
      return;
    }

    // Compile registers and store on the stack
    args.map((arg, i) => this.compileExpression(arg, scope, depth));
  }

Then we will check that function is defined and call it.

  compileCall(fun, args, scope, depth) {
    if (this.primitiveFunctions[fun]) {
      this.primitiveFunctions[fun](args, scope, depth);
      return;
    }

    // Compile registers and store on the stack
    args.map((arg, i) => this.compileExpression(arg, scope, depth));

    const fn = scope.lookup(fun);
    if (fn) {
      this.emit(depth, `CALL ${fn.name}`);
    } else {
      throw new Error('Attempt to call undefined function: ' + fun);
    }
  }

Then we'll reset the stack pointer (to maintain our commitment) based on the number of arguments and push RAX (where the return result of the function call will be stored) onto the stack. We'll make two minor optimizations for when there is exactly zero or one argument to the function.

  compileCall(fun, args, scope, depth) {
    if (this.primitiveFunctions[fun]) {
      this.primitiveFunctions[fun](args, scope, depth);
      return;
    }

    // Compile registers and store on the stack
    args.map((arg, i) => this.compileExpression(arg, scope, depth));

    const fn = scope.lookup(fun);
    if (fn) {
      this.emit(depth, `CALL ${fn.name}`);
    } else {
      throw new Error('Attempt to call undefined function: ' + fun);
    }

    if (args.length > 1) {
      // Drop the args
      this.emit(depth, `ADD RSP, ${args.length * 8}`);
    }

    if (args.length === 1) {
      this.emit(depth, `MOV [RSP], RAX\n`);
    } else {
      this.emit(depth, 'PUSH RAX\n');
    }
  }

When there is only one argument, we can just set the top value on the stack to be the return result of the call rather than resetting the stack pointer just to push onto it.

And that's it for compileCall! Now that we've got a feel for expressions and function calls, let's add some simple arithmetic operations.

prepareArithmeticWrappers

There are two major kind of arithmetic instructions we'll wrap for now:

  • "General" instructions that operate on two arguments, putting the return result in the first argument
  • "RAX" instructions that operate on RAX and the first argument, putting the return result in RAX and possibly RDX

prepareGeneral

This helper will compile its two arguments and pop the second argument into RAX. This is because x86 instructions typically require one argument to be a register if one argument is allowed to be a memory address.

We'll use the stack address as the first argument so 1) that non-commutative operations are correct and 2) the result is stored right back onto the stack in the right location.

    const prepareGeneral = (instruction) => (arg, scope, depth) => {
      depth++;
      this.emit(depth, `# ${instruction.toUpperCase()}`);

      // Compile first argument
      this.compileExpression(arg[0], scope, depth);

      // Compile second argument
      this.compileExpression(arg[1], scope, depth);
      this.emit(depth, `POP RAX`);

      // Compile operation
      this.emit(depth, `${instruction.toUpperCase()} [RSP], RAX`);

      this.emit(depth, `# End ${instruction.toUpperCase()}`);
    };

prepareRax

This helper will similarly compile its two arguments and pop the second argument into RAX. But the RAX-implicit instructions require the argument to be stored in a register so we'll use the XCHG instruction to swap RAX with the value on the top of the stack (the first argument).

    const prepareRax = (instruction, outRegister = 'RAX') => (
      arg,
      scope,
      depth,
    ) => {
      depth++;
      this.emit(depth, `# ${instruction.toUpperCase()}`);

      // Compile first argument, store in RAX
      this.compileExpression(arg[0], scope, depth);

      // Compile second argument
      this.compileExpression(arg[1], scope, depth);

      // POP second argument and swap with first
      this.emit(depth, `POP RAX`);
      this.emit(depth, `XCHG [RSP], RAX`);

This may seem roundabout but remember that we must pop all arguments to the instruction to maintain our commitment.

Next we'll zero out the RDX register if the operation is DIV, perform the operation, and store the result on the top of the stack.

    const prepareRax = (instruction, outRegister = 'RAX') => (
      arg,
      scope,
      depth,
    ) => {
      depth++;
      this.emit(depth, `# ${instruction.toUpperCase()}`);

      // Compile first argument, store in RAX
      this.compileExpression(arg[0], scope, depth);

      // Compile second argument
      this.compileExpression(arg[1], scope, depth);

      // POP second argument and swap with first
      this.emit(depth, `POP RAX`);
      this.emit(depth, `XCHG [RSP], RAX`);

      // Reset RDX for DIV
      if (instruction.toUpperCase() === 'DIV') {
        this.emit(depth, `XOR RDX, RDX`);
      }

      // Compiler operation
      this.emit(depth, `${instruction.toUpperCase()} QWORD PTR [RSP]`);

      // Swap the top of the stack
      this.emit(depth, `MOV [RSP], ${outRegister}`);
    };

We parameterize the out register because the % wrapper will call DIV but need RDX rather than RAX after the operation.

prepareArithmeticWrappers

Putting everything together we get:

  prepareArithmeticWrappers() {
    // General operatations
    const prepareGeneral = (instruction) => (arg, scope, depth) => {
      depth++;
      this.emit(depth, `# ${instruction.toUpperCase()}`);

      // Compile first argument
      this.compileExpression(arg[0], scope, depth);

      // Compile second argument
      this.compileExpression(arg[1], scope, depth);
      this.emit(depth, `POP RAX`);

      // Compile operation
      this.emit(depth, `${instruction.toUpperCase()} [RSP], RAX`);

      this.emit(depth, `# End ${instruction.toUpperCase()}`);
    };

    // Operations that use RAX implicitly
    const prepareRax = (instruction, outRegister = 'RAX') => (
      arg,
      scope,
      depth,
    ) => {
      depth++;
      this.emit(depth, `# ${instruction.toUpperCase()}`);

      // Compile first argument, store in RAX
      this.compileExpression(arg[0], scope, depth);

      // Compile second argument
      this.compileExpression(arg[1], scope, depth);

      // POP second argument and swap with first
      this.emit(depth, `POP RAX`);
      this.emit(depth, `XCHG [RSP], RAX`);

      // Reset RDX for DIV
      if (instruction.toUpperCase() === 'DIV') {
        this.emit(depth, `XOR RDX, RDX`);
      }

      // Compiler operation
      this.emit(depth, `${instruction.toUpperCase()} QWORD PTR [RSP]`);

      // Swap the top of the stack
      this.emit(depth, `MOV [RSP], ${outRegister}`);
    };

    return {
      '+': prepareGeneral('add'),
      '-': prepareGeneral('sub'),
      '&': prepareGeneral('and'),
      '|': prepareGeneral('or'),
      '=': prepareGeneral('mov'),
      '*': prepareRax('mul'),
      '/': prepareRax('div'),
      '%': prepareRax('div', 'RDX'),
    };
  }

Next we'll tackle compileBegin and compileDefine.

compileBegin

A begin form is an expression made up of a series of expressions where all expression values are thrown away and the last expression value is the result of the begin form.

To compile this form we will compile each expression passed in and pop from the stack to throw its value away. If the expression is the last in the list we will not pop since it is the result of the begin form.

We will add one exception to this popping logic: if the begin is called from the top-level we will omit the popping.

  compileBegin(body, scope, depth, topLevel = false) {
    body.forEach((expression, i) => {
      this.compileExpression(expression, scope, depth);
      if (!topLevel && i < body.length - 1) {
        this.emit(depth, `POP RAX # Ignore non-final expression`);
      }
    });
  }

That's it for compileBegin!

compileDefine

The prelude for a function definition will add its name to scope, push the current frame pointer (RBP) onto the stack and store the current stack pointer (RSP) as the new frame pointer (RBP).

Remember that we use the frame pointer as a point of reference when setting and getting local and parameter values. It works out entirely by convention.

  compileDefine([name, params, ...body], scope, depth) {
    // Add this function to outer scope
    const safe = scope.assign(name);

    // Copy outer scope so parameter mappings aren't exposed in outer scope.
    const childScope = scope.copy();

    this.emit(0, `${safe}:`);
    this.emit(depth, `PUSH RBP`);
    this.emit(depth, `MOV RBP, RSP\n`);
  }

Next we copy the parameters into local scope at their negative (from the frame pointer) location. In the future we may decide to actually copy in the parameter values into the local stack but for now there's no benefit.

  compileDefine([name, params, ...body], scope, depth) {
    // Add this function to outer scope
    const safe = scope.assign(name);

    // Copy outer scope so parameter mappings aren't exposed in outer scope.
    const childScope = scope.copy();

    this.emit(0, `${safe}:`);
    this.emit(depth, `PUSH RBP`);
    this.emit(depth, `MOV RBP, RSP\n`);

    // Copy params into local scope
    params.forEach((param, i) => {
      childScope.map[param] = -1 * (params.length - i - 1 + 2);
    });
  }

Next we'll compile the body of the function within a begin block.

  compileDefine([name, params, ...body], scope, depth) {
    // Add this function to outer scope
    const safe = scope.assign(name);

    // Copy outer scope so parameter mappings aren't exposed in outer scope.
    const childScope = scope.copy();

    this.emit(0, `${safe}:`);
    this.emit(depth, `PUSH RBP`);
    this.emit(depth, `MOV RBP, RSP\n`);

    // Copy params into local scope
    params.forEach((param, i) => {
      childScope.map[param] = -1 * (params.length - i - 1 + 2);
    });

    // Pass childScope in for reference when body is compiled.
    this.compileBegin(body, childScope, depth);
  }

Then in the postlude we'll pop the stack (for the return result of the begin form), save it in RAX, pop the previous frame pointer back to restore the calling frame, and return.

  compileDefine([name, params, ...body], scope, depth) {
    // Add this function to outer scope
    const safe = scope.assign(name);

    // Copy outer scope so parameter mappings aren't exposed in outer scope.
    const childScope = scope.copy();

    this.emit(0, `${safe}:`);
    this.emit(depth, `PUSH RBP`);
    this.emit(depth, `MOV RBP, RSP\n`);

    // Copy params into local scope
    params.forEach((param, i) => {
      childScope.map[param] = -1 * (params.length - i - 1 + 2);
    });

    // Pass childScope in for reference when body is compiled.
    this.compileBegin(body, childScope, depth);

    // Save the return value
    this.emit(0, '');
    this.emit(depth, `POP RAX`);
    this.emit(depth, `POP RBP\n`);

    this.emit(depth, 'RET\n');