November 19, 2024
November 15, 2024
Active and influential NYC infrastructure people
These are some of the most influential (mostly due to experience or expertise) and most active (I actually see them attend events) folks in the NYC infrastructure scene that I have a personal connection to.
If you're running a dinner or are just looking to meet interesting people in NYC software infrastructure, consider this list and feel free to mention "Phil said you are awesome".
I've normalized titles a little bit, but I mean every title in the most generous way. These folks are brilliant.
This list is intentionally randomized. It's also not a complete list: I've surely forgotten some great folks, not to mention the ones I haven't met yet.
- Parker Timmerman, developer
- Taq Karim, director of engineering
- Peixian Wang, developer
- Sujay Jayakar, chief scientist
- Paul Dix, ceo
- Angelo Saraceno, developer
- Taylor Baldwin, cto
- Ben Linsay, cto
- Nicholas Ursa, developer
- Sam Gross, developer
- Tramale Turner, vp of engineering
- Justin Jaffray, developer
- Kojo Osei, vc
- Bryan Russett, ceo
- Adrien Guillo, cofounder
- Thiago Ghisi, director of engineering
- Gil Forsyth, developer
- Dan Fried, cto
- David Golden, director of engineering
- Akshat Bubna, cto
- Andrew Werner, cofounder
- Vikram Oberoi, founder
- Sam Kottler, developer
- Jordan Lewis, director of engineering
- Mykola Kurutin, engineering manager
- Paulo Motta, developer
- Priyanka Somrah, vc
- Jimmy Zelinskie, cpo
- Vy Ton, product manager
- John Viega, ceo
- Ben Burkert, cto
- Pete Vilter, developer
- Sean Loiselle, developer
- Rahul Lath, vp of engineering
- Kelley Mak, vc
- Ram Kumar Rengaswamy, cofounder
- Ori Bernstein, consultant
- Mitch Ward, director of engineering
- Philippe Noël, ceo
- Paul Butler, ceo
- Abel Mathew, cofounder
- Andrew Packer, developer
- Matt Sherman, engineering manager
- Sesh Nalla, director of engineering
- Andrei Matei, cofounder
- Ryan Wexler, vc
- Alex Kesling, cto
- Larry Diehl, ceo
- Will Manning, ceo
- Paul Nowoczynski, founder
- Alex Sarkesian, developer
- Megan Reynolds, vc
- Nikhil Benesch, cto
- Saleh Hindi, founder
- Stephanie Wang, developer
- Justin Bennett, cofounder
- Evan Schwartz, developer
- Eric Zhang, developer
November 13, 2024
Executing Dynamic JavaScript Code on Supabase with Edge Functions
November 12, 2024
Grouping and Aggregations on Vitess
November 09, 2024
Fixing some of the InnoDB scan perf regressions in a MySQL fork
I recently learned of Advanced MySQL, a MySQL fork, and ran my sysbench benchmarks against it. It fixed some, but not all, of the regressions for write-heavy workloads that landed in InnoDB after MySQL 8.0.28.
In response to my results, the project lead filed a bug for performance regressions and then quickly came up with a diff. The bug in this case is for regressions that are most obvious during full table scans; the problems arrived in MySQL 8.0.29 and 8.0.30 -- see bug 111538 and this post. The bug is closed upstream but the perf regressions remain, so I am excited to see the community working to solve this problem.
tl;dr
- Advanced MySQL with the fix removes much of the regression in scan performance
I tried 4 builds:
- my8028 - upstream MySQL 8.0.28
- my8040 - upstream MySQL 8.0.40
- my8040adv_pre - Advanced MySQL 8.0.40 without the fix (without d347cdb)
- my8040adv_post - Advanced MySQL 8.0.40 with the fix (at d347cdb)
Hardware
- dell32
- Dell Precision 7865 Tower Workstation with 1 socket, 128G RAM, AMD Ryzen Threadripper PRO 5975WX with 32-Cores, 2 M.2 SSDs (each 2TB, SW RAID 0, ext4).
- ax162-s
- AMD EPYC 9454P 48-Core Processor with SMT disabled, 128G RAM, Ubuntu 22.04 and ext4 on 2 NVMe devices with SW RAID 1. This is in the Hetzner cloud.
- bee
- Beelink SER 4700u with Ryzen 7 4700u, 16G RAM, Ubuntu 22.04 and ext4 on NVMe
Benchmark
- dell32 - 8 tables, 10M rows per table and 24 threads
- ax162-s - 8 tables, 10M rows per table and 40 threads
- bee - 1 table, 30M rows and 1 thread
- rQPS is: (QPS for my version / QPS for base version) -- see the small worked example after this list
- base version is the QPS from MySQL 8.0.28
- my version is one of the other versions
- Summary
- QPS with the fix in Advanced MySQL is ~9% better than without the fix
- QPS with the fix in Advanced MySQL is ~2% better than my8040.
- I am not sure why my8040adv_pre did much worse than my8040
- QPS is ~18% larger with the fix in Advanced MySQL
- CPU overhead is ~15% smaller with the fix
- QPS is ~17% larger with the fix in Advanced MySQL
- CPU overhead is ~15% smaller with the fix
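To make the rQPS numbers above concrete, here is a minimal sketch of the calculation. It is my own illustration; the QPS values below are made up, not measured results.

```c++
// Minimal sketch of the rQPS calculation described above.
// The QPS numbers below are hypothetical, not measured results.
#include <cstdio>

// rQPS = QPS for my version / QPS for the base version (MySQL 8.0.28 here).
double rqps(double my_qps, double base_qps) { return my_qps / base_qps; }

int main() {
  const double base_my8028 = 10000.0;   // hypothetical QPS for my8028
  const double qps_my8040 = 8500.0;     // hypothetical QPS for my8040
  const double qps_my8040adv = 9300.0;  // hypothetical QPS for my8040adv_post
  std::printf("rQPS my8040         = %.2f\n", rqps(qps_my8040, base_my8028));
  std::printf("rQPS my8040adv_post = %.2f\n", rqps(qps_my8040adv, base_my8028));
  return 0;
}
```

A value below 1.0 means a regression relative to MySQL 8.0.28; above 1.0 means an improvement.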
RocksDB benchmarks: large server, leveled compaction
A few weeks ago I shared benchmark results for RocksDB with both leveled and universal compaction on a small server. This post has results from a large server with leveled compaction.
tl;dr
- there are a few regressions from bug 12038
- QPS for overwrite is ~1.5X to ~2X better in 9.x than 6.0 (ignoring bug 12038)
- otherwise QPS in 9.x is similar to 6.x
Hardware
The server is an ax162-s from Hetzner with an AMD EPYC 9454P processor, 48 cores, AMD SMT disabled and 128G RAM. The OS is Ubuntu 22.04. Storage is 2 NVMe devices with SW RAID 1 and ext4.
- 6.x - 6.0.2, 6.10.4, 6.20.4, 6.29.5
- 7.x - 7.0.4, 7.3.2, 7.6.0, 7.10.2
- 8.x - 8.0.0, 8.3.3, 8.6.7, 8.9.2, 8.11.4
- 9.x - 9.0.1, 9.1.2, 9.2.2, 9.3.2, 9.4.1, 9.5.2, 9.6.1 and 9.7.3
- fillseq -- load in key order with the WAL disabled
- revrangeww -- reverse range while writing, do short reverse range scans as fast as possible while another thread does writes (Put) at a fixed rate
- fwdrangeww -- like revrangeww except do short forward range scans
- readww -- like revrangeww except do point queries
- overwrite -- do overwrites (Put) as fast as possible
There are three workloads, all of which use 40 threads:
- byrx - the database is cached by RocksDB (100M KV pairs)
- iobuf - the database is larger than memory and RocksDB uses buffered IO (2B KV pairs)
- iodir - the database is larger than memory and RocksDB uses O_DIRECT (2B KV pairs)
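As a rough sketch of what iobuf vs iodir means in RocksDB terms (this is my assumption about how the workloads map to options, not the exact db_bench configuration used here):

```c++
#include "rocksdb/options.h"

// Rough sketch: mapping the iodir workload to RocksDB options.
// This is an assumption for illustration, not the exact db_bench settings
// used in these benchmarks.
rocksdb::Options MakeIodirOptions() {
  rocksdb::Options options;
  // iobuf: leave these at false (the default) and RocksDB uses buffered IO.
  // iodir: use O_DIRECT for user reads and for flush/compaction IO.
  options.use_direct_reads = true;
  options.use_direct_io_for_flush_and_compaction = true;
  return options;
}
```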
- fillseq is worse from 6.0 to 8.0 but stable since then
- overwrite has large improvements late in 6.0 and small improvements since then
- fwdrangeww has small improvements in early 7.0 and is stable since then
- revrangeww and readww are stable from 6.0 through 9.x
- bug 12038 explains the drop in throughput for overwrite since 8.6.7
- otherwise QPS in 9.x is similar to 6.0
- the QPS drop for overwrite in 8.6.7 occurs because the db_bench client wasn't updated to use the new default value for compaction readahead size (see the sketch after this list)
- QPS for overwrite is ~2X better in 9.x relative to 6.0
- otherwise QPS in 9.x is similar to 6.0
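Given the compaction readahead issue mentioned above, one way to avoid depending on a changing default is to set the value explicitly. This is a sketch of my own, not the actual fix used in db_bench:

```c++
#include "rocksdb/options.h"

// Sketch: pin compaction readahead explicitly rather than relying on the
// default, which changed across RocksDB releases. The 2MB value is an
// example, not the setting used for these benchmarks.
rocksdb::Options MakeOptionsWithExplicitReadahead() {
  rocksdb::Options options;
  options.compaction_readahead_size = 2 * 1024 * 1024;  // 2MB
  return options;
}
```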
Efficient MySQL Performance In 10 Sentences
Don’t have time to read Efficient MySQL Performance? Here’s the book (10 chapters) in one-liners.
- Performance is query response time.
- Proper left-most indexing is required for performance.
- The less data, the better.
- Access patterns (part of the workload) help or hinder performance.
- Sharding is how to scale writes when the limit of single-node performance is truly reached.
- Server metrics reflect how the app workload causes MySQL to work.
- Replication lag is data loss.
- Locks are held until a transaction commits, so commit quickly.
- There are many other challenges that you might need to address—sorry.
- MySQL in the cloud is slower and more expensive, so performance is more important than ever.
PSA: Most databases do not do checksums by default
PSA: SQLite does not do checksums
November 07, 2024
Introducing sharding on PlanetScale with workflows
November 06, 2024
Application Architecture: Combining DynamoDB and Tinybird
RocksDB on a big server: LRU vs hyperclock
This has benchmark results for RocksDB using a big (48-core) server. I ran tests to document the impact of the block cache type (LRU vs hyperclock) and a few other configuration choices for a CPU-bound workload. A previous post with great results for the hyperclock block cache is here.
tl;dr
- read QPS is up to ~3X better with auto_hyper_clock_cache vs LRU
- read QPS is up to ~1.3X better with the per-level fanout set to 32 vs 8
- read QPS drops by ~15% as the background write rate increases from 2 MB/s to 32 MB/s
I used RocksDB 9.6, compiled with gcc 11.4.0.
Hardware
The server is an ax162-s from Hetzner with an AMD EPYC 9454P processor, 48 cores, AMD SMT disabled and 128G RAM. The OS is Ubuntu 22.04. Storage is 2 NVMe devices with SW RAID 1 and ext4.
Benchmark
Overviews on how I use db_bench are here and here.
All of my tests here use a CPU-bound workload with a database that is cached by RocksDB and are repeated for 1, 10, 20 and 40 threads.
I focus on the readwhilewriting benchmark where performance is reported for the reads (point queries) while there is a fixed rate for writes done in the background. I prefer to measure read performance when there are concurrent writes because read-only benchmarks with an LSM suffer from non-determinism as the state (shape) of the LSM tree has a large impact on CPU overhead and throughput.
To save time I did not run the fwdrangewhilewriting benchmark. Were I to repeat this work I would include it because the results from it would be interesting for a few of the configuration options I compared.
I did tests to understand the following:
- LRU vs auto_hyper_clock_cache for the block cache implementation
- LRU is the original implementation. The code was simple, which is nice. The implementation for LRU is sharded with a mutex per shard and that mutex can become a hot spot. The hyperclock implementation is much better at avoiding hot spots.
- per level fanout (8 vs 32)
- By per level fanout I mean the value of --max_bytes_for_level_multiplier which determines the target size difference between adjacent levels. By default I use 8, while 10 is also a common choice. Here I compare 8 vs 32. When the fanout is larger the LSM tree has fewer levels -- meaning there are fewer places to check for data which should reduce CPU overhead and increase QPS.
- background write rate
- I repeated tests with the background write rate (--benchmark_write_rate_limit) set to 2, 8 and 32 MB/s. With a higher write rate there is more chance for interference between reads and writes. The interference might be from mutex contention, compaction threads using more CPU, more L0 files to check or more data in levels L1 and larger.
- target size for L0
- By target size I mean the number of files in the L0 that triggers compaction. The db_bench option for this is --level0_file_num_compaction_trigger. When the value is larger there will be more L0 files on average that a query might have to check, and that means more CPU overhead. Unfortunately, I configured RocksDB incorrectly so I don't have results to share. The issue is that when the L0 is configured to be larger, the L1 should be configured to be at least as large as the L0 (L1 target size should be >= sizeof(SST) * num(L0 files)). If not, then L0->L1 compaction will happen sooner than expected. A configuration sketch follows this list.
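Here is a minimal sketch of how the block cache choice, per-level fanout and L0/L1 sizing above look in RocksDB's C++ options API. It is my own illustration with example sizes, not the exact db_bench configuration used for these tests:

```c++
#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

// Illustration only: mapping the configuration choices above to RocksDB's
// C++ options API. Sizes and values are examples, not the exact db_bench
// settings used for these benchmarks.
rocksdb::Options MakeOptions(bool use_hyper_clock_cache) {
  rocksdb::Options options;
  rocksdb::BlockBasedTableOptions table_options;

  const size_t cache_bytes = 32UL << 30;  // example: 32GB block cache
  if (use_hyper_clock_cache) {
    // --cache_type=auto_hyper_clock_cache: estimated_entry_charge=0 selects
    // the "auto" variant of the hyper clock cache.
    table_options.block_cache =
        rocksdb::HyperClockCacheOptions(cache_bytes,
                                        /*estimated_entry_charge=*/0)
            .MakeSharedCache();
  } else {
    // The original sharded LRU cache with a mutex per shard.
    table_options.block_cache = rocksdb::NewLRUCache(cache_bytes);
  }
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  // Per-level fanout: a larger multiplier means fewer levels in the LSM tree.
  options.max_bytes_for_level_multiplier = 8;  // compared against 32 above

  // L0 target size: compaction triggers at this many L0 files. If this is
  // raised, the L1 target (max_bytes_for_level_base) should be at least
  // sizeof(SST) * num(L0 files) -- e.g. 64MB SSTs and a trigger of 8 imply
  // an L1 target of >= 512MB -- otherwise L0->L1 compaction happens sooner
  // than expected.
  options.level0_file_num_compaction_trigger = 4;
  options.write_buffer_size = 64UL << 20;          // 64MB memtable/SST
  options.max_bytes_for_level_base = 512UL << 20;  // L1 target size

  return options;
}
```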
These graphs have QPS from the readwhilewriting benchmark for the LRU and AHCC block cache implementations where LRU is the original version with a sharded hash table and a mutex per shard while AHCC is the hyper clock cache (--cache_type=auto_hyper_clock_cache).
- QPS is much better with AHCC than LRU (~3.3X faster at 40 threads)
- QPS with AHCC scales linearly with the thread count
- QPS with LRU does not scale linearly and suffers from mutex contention
- There are some odd effects in the results for 1 thread
- QPS is often 1.1X to 1.3X larger with fanout=32 vs fanout=8
With an 8 MB/s background write rate and LRU, fanout=8 is faster at 1 thread but then fanout=32 is from 1.1X to 1.3X faster at 10 to 40 threads.
With a 32 MB/s background write rate and LRU, fanout=8 is ~2X faster at 1 thread but then fanout=32 is from 1.1X to 1.2X faster at 10 to 40 threads.
- With LRU
- QPS drops by up to ~15% as the background write rate grows from 2 MB/s to 32 MB/s
- QPS does not scale linearly and suffers from mutex contention
- With AHCC
- QPS drops by up to 13% as the background write rate grows from 2 MB/s to 32 MB/s
- QPS scales linearly with the thread count
- There are some odd effects in the results for 1 thread
Exploring Postgres's arena allocator by writing an HTTP server from scratch
This is an external post of mine.