A curated list of database news from authoritative sources

March 11, 2024

First month on a database team

A little over a month ago, I joined EnterpriseDB on a distributed Postgres product (PGD). The process of onboarding myself has been pretty similar at each company in the last decade, though I think I've gotten better at it. The process is of course influenced by the team, and my coworkers have been excellent. Still, I wanted to share my thought process and personal strategies.

Avoid, at first, what is always challenging

The trickiest things at any company are the people, the organization, and the processes. What code exists? How does it fit together? Who owns what? How can I find easy code issues to tackle? How do I know what's important (so I can avoid picking it up and becoming a bottleneck)?

But also, in the first few days or weeks you aren't exactly expected to contribute meaningfully to features or bugs. Your sprint contributions are not tracked too closely.

The combination of 1) what to avoid and 2) the sprint freedom you have points to a few interesting and valuable areas to explore on your own: the build process, tests, running the software, and docs.

But code need not be ignored either. Common areas for first code contributions include user configuration code, error messages, and stale code comments.

What follows are some little first-day, first-week, and first-month projects I went through to bootstrap my understanding of the system.

Build process

First off, where is the code and how do you build it? This requires you to have all the relevant dependencies. Much of my work is on a Postgres extension. This meant having a local Postgres development environment, along with gcc, gmake (on macOS), Perl, and so on. Furthermore, PGD is a pretty mature product, so it supports building against multiple Postgres distributions. Can I build against all of them?

The easiest situation is when there are instructions for all of this, linked directly from your main repo. When I started, the instructions did exist but in a variety of places. So over the first week I started collecting all of what I had learned about building the system, with dependencies, across distributions, and with various important flags (debug mode on, asserts enabled, etc.). I finished the first week by writing a little internal blog post called "Hacking on PGD".

I hadn't yet figured out the team processes, so I didn't want to bother anyone by trying to get this "blog post" committed anywhere as official internal documentation. Maybe there already was a good doc and I just hadn't noticed it. So I published it on a private Confluence page and shared it in the private team Slack. If anyone else benefited from it, great! Otherwise, I knew I'd want to refer back to it.

This is an important attitude, I think. It can be hard to tell what others will benefit from. If you get into the habit of writing things down for your own sake, but making them available internally, it is likely others will benefit too. This is something I've learned from years of blogging publicly outside of work.

Moreover, the simple act of writing a good post establishes you as something of an authority. This is useful for yourself, if no one else.

Writing a good post

Let's get distracted here for a second. One of the most important things in documentation, I think, is documenting not just what does exist but what doesn't. If you had to take a path to get something to work, did you try other paths that didn't work? It can be extremely useful to figure out exactly what is required for something.

Was there a flag that you tried to build with but you didn't try building without it? Well try again without it and make sure it was necessary. Was there some process you executed where the build succeeded but you can't remember if it was actually necessary for the build to succeed?

It's difficult to explain why I think this sort of precision is useful, but I'm pretty sure it is. Maybe because it builds the habit of not treating things as magic when you can avoid it. It builds the habit of asking questions (if only to yourself) to understand, not just to get by.

Static analysis? Dynamic analysis?

Going back to builds, another aspect to consider is static and dynamic analysis. Are there special steps to using gdb or valgrind or other analyzers? Are you using them already? Can you get them running locally? Has any of this been documented?

Maybe the answer to all of those is yes, or maybe none of those are relevant but there are likely similar tools for your ecosystem. If analysis tools are relevant and no one has yet explored them, that's another very useful area to explore as a newcomer.

Testing

After I got the builds working, I felt the obvious next step was to run tests. But what tests exist? Are there unit tests? Integration tests? Anything else? Moreover, is there test coverage? I was certain I'd be able to find some low-hanging contributions to make if I could find some files with low test coverage.

Alas, my certainty hit a wall: there were in fact many types of integration tests, and they all provide coverage already. They just don't all report it.

The easiest ways to report coverage (with gcov) only reported coverage for certain integration tests that we run locally. There are more integration tests run in cloud environments, and getting coverage reports there to merge with my local coverage files would have required more knowledge of people and processes, exactly the areas I didn't want to be forced into thinking about too quickly.

So coverage wasn't a good route to go. But around this time, I noticed a ticket that asked for a simple change to user configuration code. I was able to make the change pretty quickly and wanted to add tests. We have our own test framework built on top of Postgres's powerful Perl test framework. But it was a little difficult to figure out how to use either of them.

So I copied code from other tests and pared it down until I got the smallest version of test code I could. This took maybe a day or two of tweaking lines and rerunning tests, since I didn't understand everything that was and wasn't required. Also, it's Perl, and I've never written Perl before, so that took a bit of time and ChatGPT. (Arrays, man.)

In the end, though, I was able to collect my learnings into another internal Confluence post about how to write tests, how to debug tests, and how to do common things within tests (for example, ensuring a Postgres log line was output). I published this post as well and shared it in the team Slack.

Running

I had PGD built locally and was able to run integration tests locally, but I still hadn't gotten a cluster running. Nor had I played with the eventual consistency demos I knew we supported. We had a great quickstart that ran through all the manual steps of getting a two-node cluster up. This was a distillation, for devs, of a more elaborate process we give to customers in a production-quality script.

But I was looking for something in between a production-quality script and manually initializing a local cluster. And I also wanted to practice my understanding of our test process. So I ported our quickstart to our integration test framework and made a PR with this new test, eventually merging this into the repo. And I wrote a minimal Python script for bringing up a local cluster. I've got an open PR to add this script to the repo. Maybe I'll learn though that a simple script such as this does already exist, and that's fine!
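The "something in between" I had in mind could look roughly like the following sketch. To be clear, this is a hypothetical illustration, not the actual script in my PR: node names, ports, and the data directory are made up, and it assumes `initdb` and `pg_ctl` are on your PATH. By default it only prints the commands it would run.

```python
# Hypothetical sketch of a minimal "bring up a local Postgres cluster" script.
# Assumes initdb and pg_ctl are on PATH; paths and ports are illustrative.
import subprocess  # used only if you uncomment the run line below
from pathlib import Path

def node_commands(datadir: Path, port: int) -> list[list[str]]:
    """Commands to initialize and start a single Postgres node."""
    return [
        ["initdb", "-D", str(datadir)],
        ["pg_ctl", "-D", str(datadir), "-o", f"-p {port}",
         "-l", str(datadir / "server.log"), "start"],
    ]

def cluster_commands(base: Path, ports: list[int]) -> list[list[str]]:
    """Commands for an N-node cluster, one data directory per port."""
    cmds: list[list[str]] = []
    for i, port in enumerate(ports, start=1):
        cmds.extend(node_commands(base / f"node{i}", port))
    return cmds

if __name__ == "__main__":
    for cmd in cluster_commands(Path("/tmp/pgdev"), [5432, 5433]):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run
```

A real version would also need the extension-specific setup (creating the extension, joining nodes into a group), which is exactly the part that the quickstart walks through manually.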

Docs

The entire time, as I'd been trying to build and test and run PGD, I was trying to understand our terminology and architecture by going through our public docs. I had a lot of questions coming out of this I'd ask in the team channel.

Not to toot my own horn, but I think it's something of a superpower to be able (and willing) to ask "dumb questions" in a group setting. That's how I frame it, anyway. "Dumb question: what does X mean in this paragraph?" Or, "dumb question: when we say performance improvement because of Y, what is the intuition here?" Because of the time spent here, I was able to make a few more docs contributions as I read through the docs as well.

You have to balance where you ask your dumb questions, though. Asking them of one person doesn't benefit the team, but asking them in too wide a group is sometimes bad politics. Asking "dumb questions" in front of your team seems to have the best bang for the buck.

But maybe the more important contributions came when, once I got more comfortable with the team, I proposed merging my personal internal Confluence blog posts into the repo as docs. I think in a number of cases what I wrote about indeed hadn't been concisely collected before, and so it was useful to have as team documentation.

Even more challenging was trying to distill (a chunk of) the internal architecture. Only after following many varied internal docs and videos, and following through numerous code paths, was I able to propose an architecture diagram outlining major components and communication between them, with their differing formats (WAL records, internal enums, etc.) and means of communication (RPC, shared memory, etc.). This architecture diagram is still in review and may be totally off. But it's already helped at least me think about the system.

In most cases this was all information that the team had already written or explained but just bringing it together and summarizing provided a different useful perspective I think. Even if none of the docs got merged it still helped to build my own understanding.

Beyond the repo

Learning the project is just one aspect of onboarding. Beyond that, I joined the #cats channel and the #dogs channel, found some fellow New Yorkers and opened a NYC channel, and tried to find Zoom time with the various people I'd see hanging around common team Slack channels. I tried to meet not just devs but support folks, product managers, marketing folks, sales folks, and anyone else!

I've been walking the line between scouring our docs, GitHub, Confluence, and Jira on my own, and bugging people with my incessant questions.

I've enjoyed my time at startups. I've been a dev, a manager, a founder, a cofounder. But I'm incredibly excited to be back, at a bigger company, full-time as a developer hacking on a database!

And what about you? What do you do to onboard yourself at a new company or new project?

Iterating terabyte-sized ClickHouse®️ tables in production

ClickHouse schema migrations can be challenging even on batch systems. But when you're streaming 100s of MB/s, it's a whole different beast. Here's how we make schema changes on a large ClickHouse table deployed across many clusters while streaming… without missing a bit.

March 08, 2024

Lightweight Migrations

Patch all of your data in your database table with the bulk edit feature on the Convex dashboard, without writing migration code.

March 06, 2024

PlanetScale forever

PlanetScale is committed to providing a reliable and sustainable platform for our customers, not just in the short-term, but forever. For this reason, we are prioritizing profitability.

Announcing Vitess 19

We're thrilled to announce the release of Vitess 19, our latest version, packed with enhancements aimed at improving the scalability, performance, and usability of your database systems. With this release, we continue our commitment to providing a powerful, scalable, and reliable database clustering solution for MySQL. What's new in Vitess 19: as Oracle marked MySQL 5.7 end of life in October 2023, we're moving forward by dropping support for MySQL 5.7.

March 03, 2024

March 02, 2024

Learning C

Some resources for learning C

March 01, 2024

February 29, 2024

Ecobee Settings for Heat Pumps with Resistive Aux Heat

I’m in the process of replacing an old radiator system with a centrally-ducted, air-source heat pump system with electric resistive backup heat. I’ve found that the default ecobee algorithm seems to behave surprisingly poorly for this system, and wanted to write up some of the settings that I’ve found yield better behavior.

A disclaimer. I’m not an HVAC professional. I have two decades in software operations, a background in physics, and far too much experience inferring system dynamics from timeseries graphs. This advice may void your warranty, burn your house down, etc.; everything you do is at your own risk.

The System

First, a bit about the system in question. You can skip this section if you know about heat pumps, short cycling, staging, etc.

There are two main subsystems: a heat pump and an air handler. The heat pump sits outside: it has a fan which moves outside air over a heat exchanger, and a compressor, which compresses a working fluid. The working fluid is connected in a loop to the air handler, where it runs through another heat exchanger to heat or cool the inside air. The air handler also has a blower fan which circulates air through the whole house. If the heat pump can’t keep up with demand, the air handler also has a pair of resistive electric heating coils, called aux heat, which can supplement or take over from the heat pumps.

A few important things to know about heat pumps. First, electric resistive heaters have a Coefficient of Performance (CoP) of essentially 1: they take 1 joule of electricity and turn it into 1 joule of heat in the air. My heat pumps have a typical heating CoP of about 2-4, depending on temperature and load. They take 1 joule of electricity and suck 2 to 4 joules of heat from the outside air into the inside. This means they cost 2-4 times less (in electric opex, at least) than a standard resistive electric heating system.
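The cost claim is just arithmetic on the CoP: for the same delivered heat, electricity use scales as 1/CoP. A quick sketch, using a mid-range CoP of 3 and an arbitrary unit price:

```python
# Back-of-envelope heating cost comparison: resistive (CoP ~1) vs heat pump
# (CoP ~2-4, per the post). Price is an arbitrary unit; only ratios matter.
def heating_cost(heat_joules: float, cop: float, price_per_joule: float = 1.0) -> float:
    """Electricity cost to deliver a given amount of heat at a given CoP."""
    return heat_joules * price_per_joule / cop

resistive = heating_cost(1000, cop=1.0)
heat_pump = heating_cost(1000, cop=3.0)  # mid-range CoP
print(resistive / heat_pump)  # prints 3.0: the heat pump is 3x cheaper here
```

At the CoP extremes from the post (2 and 4), the same ratio works out to the "2-4 times less" figure.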

Second, heat pumps, like A/C systems, shouldn’t start and stop too frequently. Starting up causes large transient electrical and mechanical stresses. Ideally they should run at a low speed for several hours, rather than running at full blast, shutting off, then turning on again ten minutes later; the latter pattern is called “short cycling”.

Third, the heat pump’s fan, heat pump’s compressor, and the air handler’s fan are all variable-speed: they can run very slow (quiet, efficient), very fast (loud, more powerful), or at any speed in between. This helps reduce short-cycling, as well as improving efficiency and reducing noise. However, directly setting compressor and fan speed requires a special “communicating” thermostat made by the same manufacturer, which speaks a proprietary wire protocol. My manufacturer’s communicating thermostats are very expensive and have a reputation for buggy hardware and software, so I opted to get an ecobee 3 lite. Like essentially every other thermostat on the planet, the ecobee uses ~8 wires with simple binary signals, like “please give me heat” and “please turn on the fan”. It can’t ask for a specific amount of heat.

However, all is not lost. The standard thermostat protocol has a notion of a “two-stage” system—if the Y1 wire is hot, it’s asking for “some heat”, and if Y2 is also hot, it’s asking for “more heat”. My variable-speed heat pump emulates a two-stage system using a hysteresis mechanism. In stage 1, the heat pump offers some nominal low degree of heat. When the thermostat calls for stage 2, it kicks up the air handler blower a notch, and after 20 minutes, it slowly ramps up the heat pump compressor as well. I assume there’s a ramp-down for going back to stage 1. The manufacturer says this provides “true variable-capacity operation”. You can imagine that the most efficient steady state is where the thermostat toggles rapidly between Y1 and Y2, causing the system to hang out at exactly the right variable speeds for current conditions—but I assume ecobee has some kind of frequency limiter to avoid damaging systems that actually have two separate stages with distinct startup/shutdown costs.
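Here's a toy model of that hysteresis behavior as a little state machine. To be clear, this is my guess at the shape of the logic, not the manufacturer's actual control code; the nominal speeds are invented, and only the 20-minute ramp delay comes from the description above.

```python
# Toy model (a guess, not the manufacturer's logic) of a variable-speed heat
# pump emulating two stages behind simple Y1/Y2 thermostat wires.
class TwoStageEmulation:
    def __init__(self) -> None:
        self.stage = 0              # 0 = off, 1 = low, 2 = high
        self.minutes_in_stage2 = 0

    def step(self, y1: bool, y2: bool) -> float:
        """Advance one minute; return a nominal compressor output (0.0-1.0)."""
        if not y1:                  # no call for heat: shut down, reset ramp
            self.stage, self.minutes_in_stage2 = 0, 0
            return 0.0
        if y2:                      # Y1 + Y2 hot: stage 2
            self.stage = 2
            self.minutes_in_stage2 += 1
        else:                       # only Y1 hot: drop back to stage 1
            self.stage, self.minutes_in_stage2 = 1, 0
        if self.stage == 1:
            return 0.4              # invented nominal low output
        # Stage 2: blower kicks up right away, but the compressor only starts
        # ramping after 20 minutes of sustained stage-2 demand.
        if self.minutes_in_stage2 <= 20:
            return 0.6
        return min(1.0, 0.6 + 0.02 * (self.minutes_in_stage2 - 20))
```

In this toy version, rapid Y1/Y2 toggling resets the ramp timer and holds the compressor near its lower outputs, which matches the intuition that the steady state settles at "exactly enough" capacity.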

The air handler’s aux heat is also staged: if the W1 wire is hot, I think (based on staring at the wiring diagram and air handler itself) it just energizes one of two coils. If W2 is also hot, it energizes both. I think this is good: we want to use as much of the heat pump heat as possible, and if we can get away with juuuust a little aux heat, instead of going full blast, that’ll save energy.

In short: aux heat is 2-4x more expensive than heat pump heat; we want to use as little aux as possible. Short-cycling is bad: we want long cycle times. For maximum efficiency, we want both the heat pump and aux heat to be able to toggle between stage 1 and 2 depending on demand.

Automatic Problems

I initially left the ecobee at its automatic default settings for a few weeks; it’s supposed to learn the house dynamics and adapt. I noticed several problems. Presumably this behavior depends on weather, building thermal properties, HVAC dynamics, and however ecobee’s tuned their algorithm last week, so YMMV: check your system and see how it looks.

It’s kind of buried, but ecobee offers a really nice time-series visualization of thermostat behavior on their web site. There’s also a Home Assistant integration that pulls in data from their API. It’s a pain in the ass to set up (ecobee, there’s no need for this to be so user-hostile), but it does work.

Over the next few weeks I stared obsessively at time-series plots from both ecobee and Home Assistant, and mucked around with ecobee’s settings. Most of what I’ll describe below is configurable in the settings menu on the thermostat: look for “settings”, “installation settings”, “thresholds”.

Reducing Aux Heat

First, the automatics kicked on aux heat a lot. Even in situations where the heat pump would have been perfectly capable of getting up to temp, ecobee would burn aux heat to reach the target temperature (set point) faster.

Part of the problem was that ecobee ships (I assume for safety reasons) with ludicrously high cut-off thresholds for heat pumps. Mine had a “compressor min outdoor temperature” of something like 35 degrees, so the heat pump wouldn’t run for most of the winter. The actual minimum temperature of my model is -4; cold-climate heat pumps run down to -20. I lowered mine to -5; the manual says there’s a physical thermostat interlock on the heat pump itself, and I trust that more than the ecobee weather feed anyway.

Second: ecobee seems to prioritize speed over progress: if it’s not getting to the set point fast enough, it’ll burn aux heat to get there sooner. I don’t want this: I’m perfectly happy putting on a jacket. After a bit I worked out that the heat pumps alone can cover the house load down to ~20 degrees or so, and raised “aux heat max outdoor temperature” to 25. If it’s any warmer than that, the system won’t use aux heat.

Reverse Staging

A second weird behavior: once the ecobee called for stage 2, either from the heat pump or aux, it would run in stage 2 until it hit the set point, then shut off the system entirely. Running aux stage 2 costs more energy. Running the heat pump in stage 2 shortens the cycle time: remember, the goal is a long cycle at low speed.

The setting I used to fix this is called “reverse staging”. Ecobee’s documentation says:

Compressor Reverse Staging: Enables the second stage of the compressor near the temperature setpoint.

As far as I can tell this documentation is completely wrong. From watching the graphs, this setting seems to allow the staging state machine to move from stage 2 back to stage 1, rather than forcing it to run in stage 2 until shutting off entirely. It’ll go back up to stage 2 if it needs to, and back down again.

Manual Staging

I couldn’t seem to get ecobee’s automatic staging to drop back to stage 1 heat reliably, or avoid kicking on aux heat when stage 2 heat pump heat would have done fine. I eventually gave up and turned off automatic staging altogether. I went with the delta temperature settings. If the temperature delta between the set point and indoor air is more than 1 degree, it turns on heat pump stage 1. More than 2 degrees, stage 2. More than 4 degrees, aux 1. More than 5 degrees, aux 2. The goal here is to use only as much aux heat as absolutely necessary to supplement the heat pump. I also have aux heat configured to run concurrently with the heat pump: there’s a regime where the heat pump provides useful heat, but not quite enough, and my intuition is that some heat pump heat is cheaper than all aux.

I initially tried the default 0.5 degree delta before engaging the heat pump’s first stage. It turns out that for some temperature regimes this creates rapid cycling: that first-stage heat is enough to heat the house rapidly to the set point, and then there’s nothing to do but shut the system off. The house cools, and the system kicks on again, several times per hour. I raised the delta to 1 degree, which significantly extended the cycle time.
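The manual staging rules above boil down to a simple threshold function. This is a sketch of my settings, not ecobee's implementation; delta is set point minus indoor temperature, in degrees F, using the 1-degree stage-1 threshold I settled on:

```python
# My manual staging thresholds as a function (a sketch of my settings, not
# ecobee's actual algorithm). delta_f = set point minus indoor temp, in F.
def heat_stages(delta_f: float) -> dict[str, bool]:
    """Which heating equipment runs for a given temperature delta."""
    return {
        "heat_pump_1": delta_f > 1,
        "heat_pump_2": delta_f > 2,
        "aux_1": delta_f > 4,  # aux runs *with* the heat pump, not instead of it
        "aux_2": delta_f > 5,
    }

print(heat_stages(3))  # heat pump stage 2 running, no aux yet
```

Note that each stage is cumulative: by the time aux 2 engages, everything else is already running, which is exactly the "supplement, don't replace" behavior I wanted.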

Large Setback with Preheating

A setback is when you lower your thermostat, e.g. while away from home or sleeping. There’s some folk wisdom that heat pumps should run at a constant temperature all the time, rather than have a large setback. As far as I can tell, this is because a properly-sized heat pump system (unlike a gas furnace) doesn’t deliver a ton of excess heat, so it can’t catch up quickly when asked to return to a higher temperature. To compensate, the system might dip into aux heat, and that’s super expensive.

I’m in the US Midwest, where winter temperatures are usually around 15-40 F. I drop from 68 to 60 overnight, and the house can generally coast all night without having to run any HVAC at all. In theory the ecobee should be able to figure out the time required to come back to 68 and start the heat pump early in the morning, but in practice I found it would wait too long, and then the large difference between actual and set temp would trigger aux heat. To avoid this, I added a custom activity in Ecobee’s web interface (I call mine “preheat”), with a temperature of 64. I have my schedule set up with an hour of preheat in the morning, before going to the normal 68. This means there’s less of a delta-T, and the system can heat up entirely using the heat pump.