a curated list of database news from authoritative sources

June 25, 2026

Oracle FDW on Azure Database for PostgreSQL

When migrating from Oracle Database to PostgreSQL, the Oracle Foreign Data Wrapper enables PostgreSQL to connect to Oracle and import data as if it were a local table. This extension is not available on all managed PostgreSQL services. For Azure Database for PostgreSQL, setup is straightforward since it runs community PostgreSQL. The only requirement is that extensions must first be enabled via the azure-specific parameter azure.extensions:

Oracle_FDW extension in PostgreSQL

I can create the extension as the administrator and utilize its features:

postgres=> \du
                                List of roles

   Role name    |                         Attributes
----------------+------------------------------------------------------------
 azure_pg_admin | Cannot login
 azuresu        | Superuser, Create role, Create DB, Replication, Bypass RLS
 franck         | Create role, Create DB, Bypass RLS
 replication    | Replication

postgres=> \connect - franck

I am now connected to the database "postgres" as user "franck".

postgres=> create extension oracle_fdw;

CREATE EXTENSION

postgres=> select oracle_diag();
                                                oracle_diag
------------------------------------------------------------------------------------------------------------
 oracle_fdw 2.8.0, PostgreSQL 18.4, Oracle client 23.26.0.0.0, ORACLE_HOME=/opt/oracle/oracle_instantclient

It's important to understand how PostgreSQL will connect to the Oracle Database: oracle_fdw is installed with the Oracle Instant Client libraries.

An Oracle Database in a lab

I exposed an Oracle database to the internet for lab purposes, so I did not need to configure the network. I set up using the Docker image.


# Start Oracle and expose port 1521 to all interfaces

sudo docker run -d --name ora -p 1521:1521 -e ORACLE_PASSWORD=franck gvenzl/oracle-free:slim

# Open port 1521 in the firewall (you may have to add a rule in the inbound security rules)

sudo firewall-cmd --zone=public --permanent --add-port=1521/tcp
sudo firewall-cmd --reload

# Starting Oracle Database takes time, wait for it to be up

echo "Waiting for Oracle" ; until docker logs ora | 
 grep "Completed: Pluggable database FREEPDB1 opened read write"
do echo -n "." ; sleep 1 ; done

# Load the SCOTT/TIGER schema example

docker exec -it ora bash -c '
TWO_TASK=//localhost:1521/FREEPDB1 sqlplus system/franck @?/rdbms/admin/utlsampl.sql
'

# Allow SCOTT to EXPLAIN PLAN

docker exec -it ora bash -c '
 echo "GRANT SELECT_CATALOG_ROLE TO SCOTT;" |  
 TWO_TASK=//localhost:1521/FREEPDB1 sqlplus system/franck
'

# Show the connection string from internet

echo "You can connect as: SCOTT/TIGER@//$(curl -s ifconfig.me):1521/FREEPDB1"


Foreign Data Wrapper server and user

Now, back to Azure Database for PostgreSQL, I can specify the Oracle Database server via the connection string, as well as the user and password I use for connection:


postgres=> create server oracle_server
  foreign data wrapper oracle_fdw
  options (dbserver '//141.145.212.208:1521/FREEPDB1');

CREATE SERVER

postgres=> create user mapping for current_user
  server oracle_server
  options (user 'SCOTT', password 'TIGER');

CREATE USER MAPPING

I import the tables I need to query:


postgres=> IMPORT FOREIGN SCHEMA "SCOTT" 
           FROM SERVER oracle_server INTO public
;

postgres=> \d

             List of relations

 Schema |   Name   |     Type      | Owner
--------+----------+---------------+--------
 public | bonus    | foreign table | franck
 public | dept     | foreign table | franck
 public | emp      | foreign table | franck
 public | salgrade | foreign table | franck

(4 rows)

postgres=> select * from dept;

 deptno |   dname    |   loc
--------+------------+----------
     10 | ACCOUNTING | NEW YORK
     20 | RESEARCH   | DALLAS
     30 | SALES      | CHICAGO
     40 | OPERATIONS | BOSTON

(4 rows)

I can access the tables as if they were local, my PostgreSQL session being connected to the remote Oracle Database.

Performance and execution plan

The execution plan indicates whether the join has been pushed down, a decision made by the query planner:

postgres=> explain (verbose off)
           select * from dept join emp using(deptno)
;
                                                                                QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=30012.50..40190.00 rows=5000 width=206)
   Hash Cond: (dept.deptno = emp.deptno)
   ->  Foreign Scan on dept  (cost=10000.00..20000.00 rows=1000 width=92)
         Oracle query: SELECT /*62b5e8b0d88d7cfb*/ r1."DEPTNO", r1."DNAME", r1."LOC" FROM "SCOTT"."DEPT" r1
   ->  Hash  (cost=20000.00..20000.00 rows=1000 width=116)
         ->  Foreign Scan on emp  (cost=10000.00..20000.00 rows=1000 width=116)
               Oracle query: SELECT /*d6c3774a98386064*/ r2."EMPNO", r2."ENAME", r2."JOB", r2."MGR", r2."HIREDATE", r2."SAL", r2."COMM", r2."DEPTNO" FROM "SCOTT"."EMP" r2

(7 rows)

postgres=> explain (verbose off)
           select * from dept join emp using(deptno)
           where ename='KING'
;

                                                                                                                                 QUERY PLAN                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=10000.00..10250.00 rows=25 width=206)
   Oracle query: SELECT /*7a68726e19f9c7b7*/ r1."DEPTNO", r1."DNAME", r1."LOC", r2."EMPNO", r2."ENAME", r2."JOB", r2."MGR", r2."HIREDATE", r2."SAL", r2."COMM" FROM ("SCOTT"."DEPT" r1 INNER JOIN "SCOTT"."EMP" r2 ON (r1."DEPTNO" = r2."DEPTNO") AND (r2."ENAME" = 'KING'))
(2 rows)


When using the VERBOSE option, EXPLAIN executes an EXPLAIN PLAN on the remote server. (I granted SELECT_CATALOG_ROLE to prevent the no SELECT privilege on V$SQL error.)

postgres=> explain (verbose on, analyze, costs off)
           select * from dept join emp using(deptno)
           where ename='KING'
;
                                                                                                                                                                                                                                                                                                                                                  QUERY PLAN                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan (actual time=11.621..11.632 rows=1.00 loops=1)
   Output: dept.deptno, dept.dname, dept.loc, emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm
   Oracle query: SELECT /*7a68726e19f9c7b7*/ r1."DEPTNO", r1."DNAME", r1."LOC", r2."EMPNO", r2."ENAME", r2."JOB", r2."MGR", r2."HIREDATE", r2."SAL", r2."COMM" FROM ("SCOTT"."DEPT" r1 INNER JOIN "SCOTT"."EMP" r2 ON (r1."DEPTNO" = r2."DEPTNO") AND (r2."ENAME" = 'KING'))
   Oracle plan: SELECT STATEMENT
   Oracle plan:   HASH JOIN   (condition "R1"."DEPTNO"="R2"."DEPTNO")
   Oracle plan:     NESTED LOOPS
   Oracle plan:       NESTED LOOPS
   Oracle plan:         STATISTICS COLLECTOR
   Oracle plan:           TABLE ACCESS FULL EMP  (filter "R2"."ENAME"='KING')
   Oracle plan:         INDEX UNIQUE SCAN PK_DEPT (condition "R1"."DEPTNO"="R2"."DEPTNO")
   Oracle plan:       TABLE ACCESS BY INDEX ROWID DEPT
   Oracle plan:     TABLE ACCESS FULL DEPT
 Query Identifier: 3032555283609452836
 Planning Time: 35.288 ms
 Execution Time: 11.704 ms

(15 rows)

An EXPLAIN ANALYZE displays the timing of remote calls, which here is approximately 12 milliseconds:

This 12ms latency is expected because my PostgreSQL database is located in Amsterdam (Azure West Europe), while the Oracle Database is in Paris (OCI eu-paris-1).

Network troubleshooting

I can verify the location using the TCP/IP information in the Oracle Database listener.

docker exec -it ora bash -c 'curl -s ipinfo.io ; echo ; echo ; grep "PROGRAM=postgres" /opt/oracle/diag/tnslsnr/*/listener/trace/listener.log | tail -1'

{
  "ip": "141.145.212.208",
  "city": "Paris",
  "region": "Île-de-France",
  "country": "FR",
  "loc": "48.8534,2.3488",
  "org": "AS31898 Oracle Corporation",
  "postal": "75000",
  "timezone": "Europe/Paris",
  "readme": "https://ipinfo.io/missingauth"
}

24-JUN-2026 21:28:23:647 * (CONNECT_DATA=(SERVICE_NAME=FREEPDB1)(CID=(PROGRAM=postgres:?franck?postgres?152.67.90.237?35830??SELECT)(HOST=fea1c65ae6cf)(USER=azuredb))(CONNECTION_ID=VQdZFJxbFlngYwQBIQob2g==))(TARGET_LOCAL_INSTANCE=FREE) * (ADDRESS=(PROTOCOL=tcp)(HOST=51.124.207.22)(PORT=1024)) * establish * FREEPDB1 * 0

The IP address embedded in PROGRAM reflects the PostgreSQL client process name, not Oracle network metadata (I used psql in Zurich). The IP in HOST indicates the PostgreSQL client connecting to the Oracle listener, while the first IP from ipinfo.io shows the location of the Oracle Database host.

A latency of 12ms is expected between these two cloud regions:

If I need to troubleshoot, I can run tcpdump -i any port 1521 -nn on the Oracle Database host:

sudo tcpdump -i any port 1521 -nn -tttt -vvv

dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes

2026-06-24 22:01:44.583852 IP 51.124.207.22.1024 > 10.0.0.59.1521: Flags [S], seq 987172167, win 65535, options [mss 1440,sackOK,TS val 2030652947 ecr 0,nop,wscale 9], length 0
2026-06-24 22:01:44.583952 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [S], seq 987172167, win 65535, options [mss 1440,sackOK,TS val 2030652947 ecr 0,nop,wscale 9], length 0
2026-06-24 22:01:44.583960 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [S], seq 987172167, win 65535, options [mss 1440,sackOK,TS val 2030652947 ecr 0,nop,wscale 9], length 0
2026-06-24 22:01:44.583988 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [S.], seq 944233141, ack 987172168, win 65160, options [mss 1460,sackOK,TS val 2168564814 ecr 2030652947,nop,wscale 7], length 0
2026-06-24 22:01:44.583996 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [S.], seq 944233141, ack 987172168, win 65160, options [mss 1460,sackOK,TS val 2168564814 ecr 2030652947,nop,wscale 7], length 0
2026-06-24 22:01:44.584048 IP 10.0.0.59.1521 > 51.124.207.22.1024: Flags [S.], seq 944233141, ack 987172168, win 65160, options [mss 1460,sackOK,TS val 2168564814 ecr 2030652947,nop,wscale 7], length 0
2026-06-24 22:01:44.602184 IP 51.124.207.22.1024 > 10.0.0.59.1521: Flags [.], ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 0
2026-06-24 22:01:44.602200 IP 51.124.207.22.1024 > 10.0.0.59.1521: Flags [P.], seq 1:75, ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 74
2026-06-24 22:01:44.602201 IP 51.124.207.22.1024 > 10.0.0.59.1521: Flags [P.], seq 75:336, ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 261
2026-06-24 22:01:44.602241 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [.], ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 0
2026-06-24 22:01:44.602249 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [.], ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 0
2026-06-24 22:01:44.602252 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [P.], seq 1:75, ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 74
2026-06-24 22:01:44.602253 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [P.], seq 1:75, ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 74
2026-06-24 22:01:44.602254 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [P.], seq 75:336, ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 261
2026-06-24 22:01:44.602255 IP 51.124.207.22.1024 > 172.17.0.2.1521: Flags [P.], seq 75:336, ack 1, win 128, options [nop,nop,TS val 2030652960 ecr 2168564814], length 261
2026-06-24 22:01:44.602295 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [.], ack 75, win 509, options [nop,nop,TS val 2168564833 ecr 2030652960], length 0
2026-06-24 22:01:44.602303 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [.], ack 75, win 509, options [nop,nop,TS val 2168564833 ecr 2030652960], length 0
2026-06-24 22:01:44.602312 IP 10.0.0.59.1521 > 51.124.207.22.1024: Flags [.], ack 75, win 509, options [nop,nop,TS val 2168564833 ecr 2030652960], length 0
2026-06-24 22:01:44.602435 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [.], ack 336, win 507, options [nop,nop,TS val 2168564833 ecr 2030652960], length 0
2026-06-24 22:01:44.602446 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [.], ack 336, win 507, options [nop,nop,TS val 2168564833 ecr 2030652960], length 0
2026-06-24 22:01:44.602462 IP 10.0.0.59.1521 > 51.124.207.22.1024: Flags [.], ack 336, win 507, options [nop,nop,TS val 2168564833 ecr 2030652960], length 0
2026-06-24 22:01:44.619442 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [P.], seq 1:62, ack 336, win 507, options [nop,nop,TS val 2168564850 ecr 2030652960], length 61
2026-06-24 22:01:44.619470 IP 172.17.0.2.1521 > 51.124.207.22.1024: Flags [P.], seq 1:62, ack 336, win 507, options [nop,nop,TS val 2168564850 ecr 2030652960], length 61
2026-06-24 22:01:44.619493 IP 10.0.0.59.1521 > 51.124.207.22.1024: Flags [P.], seq 1:62, ack 336, win 507, options [nop,nop,TS val 2168564850 ecr 2030652960], length 61
2026-06-24 22:01:44.630863 IP 51.124.207.22.1024 > 10.0.0.59.1521: Flags [.], ack 62, win 128, options [nop,nop,TS val 2030652994 ecr 2168564850], length 0

Duplicate packets originating from 172.17.0.2 are from Docker networking and are captured on both the host and container bridges. This marks the start of a TCP connection from Azure Database for PostgreSQL to Oracle on port 1521, beginning with the TCP handshake. PostgreSQL then sends two TNS packets to Oracle, which acknowledges both. Oracle responds with a 61-byte payload. TCP flags include [S] (connection setup), [S.] (SYN+ACK), [.] (ACK), and [P.] (PSH + ACK). The absence of the U flag indicates Oracle Net isn't using TCP urgent data, resolving previous OOB issues with older clients.

Production connection string

Running the Oracle Database in a lab helps with troubleshooting, but the database you want to migrate is not publicly accessible, and the connection string may be a bit more complex. As the Oracle client is on a managed service, you cannot add files such as tnsnames.ora or an mTLS wallet. I tested with an Oracle Autonomous database.

You need to enable One-Way TLS (encryption without a wallet) by setting Mutual TLS (mTLS) authentication to not required, and add the IP address of the PostgreSQL database to the allow list:

I got this address from the listener log of my previous test.

Then I got the TLS connection string for the Oracle Autonomous database and used it to define the foreign data wrapper server:

create server oracle_autonomous
  foreign data wrapper oracle_fdw
  options (dbserver '(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1521)(host=adb.eu-madrid-1.oraclecloud.com))(connect_data=(service_name=g230b6cc64a62e6_mad_tp.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes)))');

create user mapping for current_user
  server oracle_autonomous
  options (user 'ADMIN', password '4ut0n0m0u5_Password');

After specifying the connection string and credentials, you can read or import tables from Oracle Autonomous into Azure Database for PostgreSQL.

Conclusion

Oracle FDW allows querying an Oracle database from PostgreSQL with minimal setup. On Azure Database for PostgreSQL, the only extra step is to enable the extension. Otherwise, it functions as standard PostgreSQL. This feature is useful for migrations to import or compare data between the two databases. Queries are straightforward, and the planner decides what runs remotely. Performance depends on pushdown to Oracle and network latency. Filters and joins may run remotely or locally, and if pushdown isn’t possible, large result sets may be transferred and joined locally. All this is visible from the execution plan.

Connecting to Oracle can sometimes be challenging. Make sure to run oracle_diag() and verify that both oracle_fdw and the Oracle client are current and compatible with your Oracle database, that the port is opened on the network between the two databases, and that you can connect with a connection string without additional files.

One Postgres cluster, many apps

Yo, I heard you liked databases, so we put logical databases in your database cluster so you can database while you database

June 24, 2026

Build a Spring Boot REST API with Amazon Aurora DSQL

In this post, you learn how to build a Spring Boot REST API that integrates with Aurora DSQL. You’ll configure the Aurora DSQL JDBC Connector for IAM authentication, implement optimistic concurrency control, and run the application across two regional nodes to observe active-active behavior.

5 Lessons at 50

Looking at my peak male physique, and my Keanu Reeves baby face, you would never suspect it, but I recently turned 50. As is the tradition, I thought about writing a post titled "50 Lessons at 50". Unfortunately, I don't have that kind of wisdom. The thing is, I still feel like I'm 18, same age as my son. Turns out this is the secret old guys have been hiding from us all along. You get older on the outside, but inside you still see yourself as the same young lad.

Still, fifty years should count for something. So what did I actually learn? How am I different mentally than my 18-year-old self. Here is my attempt to tally it up. 


1. Caution is warranted

I finally understand my parents. As you age, you accumulate battle scars, and the scars turn into habits. Anything that can go wrong will go wrong. You forget the cooktop on once, and suddenly you check it three times before leaving the house. True story. You stop diving head-first into a pile of leaves, because there might be a rake underneath. Another true story.

But the trap is overcorrecting. Common sense is good, but too much common sense is dangerous. It makes you overly cautious, predictable, and dull, and it talks you out of the leaf pile even when there is no rake. So, it is important to fight that entropy.


2. Realism helps you understand how the world runs

When I was younger I thought everyone was good, smart, rational, and reasonable. I was idealistic, and I was projecting my own motivations onto everyone else. As a professor, I rarely questioned the administrators. Surely everyone was trying to improve education for the students. Not really. In fact, many of the students were not particularly interested in improving their own education either.

This was a disappointing realization, but then it was also liberating. The world gets much easier to understand once you take off the rose-colored glasses and start watching the incentives instead of the stated intentions. You stop being surprised, and you stop being hurt. You get a model of the world that actually predicts the world. Recognizing where the incentives point doesn't obligate you to like them. You can decide to play the game, or refuse it, or even try to change it. 


3. Competence is bliss

I alluded to the Murphy's Law earlier. In 1949, an engineer named Edward Murphy got frustrated after a technician wired the sensors backwards during an Air Force rocket sled experiment. This led to the aphorism "if there is a way to do it wrong, someone will find it". People remember this as a line about bad luck, but it's actually about how easy it is to do things wrong, and how rare it is to do them right.

It took me a couple of decades to fully appreciate that competence is far scarcer than you'd expect. This is both bad news and good news. The bad news is obvious, and all of us are going to experience it many times, at the hands of incompetent parties. The good news is that competence stands out. When you meet genuinely competent people you notice immediately, and you appreciate them. They are reliable. They sweat the details and understand the tradeoffs. You start seeking them out, and when you find them, friendships form fast, because you both know how unusual the thing actually is.


4. Do what you like

I've always wanted an excuse to use the Bell Curve meme, so here it goes. 

Most people dramatically overestimate their ability to predict what will matter in ten years or even five years. The elaborate plans are mostly a waste of time. The best strategy is work on what you like, and what you find interesting and energizing, because curiosity, enthusiasm, and the craft compounds. The safest long-term strategy is often to become exceptionally good at something you genuinely enjoy.

"Find out who you are and do it on purpose." --Dolly Parton


5. Your attitude determines your success

I am not likely to turn into the old guy yelling at the cloud any time soon, as I am aware of the danger of learned pessimism and helplessness. So I care about this lesson more than the others. Your attitude determines your success, and the stories you tell yourself matter. Since our emotions drive our persistence and our willingness to keep going, it pays to keep a constructive narrative about setbacks and challenges.

I aim for cautious optimism. I am realistic about the risks, but still optimistic about the possibilities. I am jaded enough to see how the world works, but still idealistic enough to want to improve it. Years ago I wrote a post called Fool Yourself, about deliberately choosing the narrative that gives you energy and momentum. I still believe that. At 50 I'm a little wiser, certainly more cautious and aware of the challenges ahead, but I am still fooling myself into chasing things I find meaningful.

June 22, 2026

Vector Search with Filters: pgvector vs DiskANN on HorizonDB

In a previous post, I explained that using filtering with pgvector can decrease recall in approximate nearest neighbor (ANN) searches.

This post repeats the same experiment and dataset but compares two methods:

  • pgvector with HNSW, a popular PostgreSQL extension
  • HorizonDB with DiskANN, Microsoft’s vector index

The goal is to understand what happens when similarity search is combined with filtering.

Both setups were tested on a HorizonDB instance with 2 vCores and 16 GiB RAM in Azure public preview, where I activated the extensions by adding them to azure.extensions.

In real applications, SQL statements without a WHERE clause are rare, and typical queries combine a filter alongside a similarity search:

SELECT *
 FROM embeddings
 WHERE tenant_id = 42
 ORDER BY embedding <=> :vector_query
 LIMIT 10;

You don’t search the entire dataset. Instead, you focus on a specific subset, such as one tenant, recent data, or a category.

Experiment Setup

As in the previous post, I inserted 2 million rows of 512‑dimensional vectors with a simple metadata column (color, evenly distributed):

CREATE TABLE embeddings_table (
  id bigserial PRIMARY KEY,
  color text,
  embedding vector(512)
);

I generated random vectors to insert the embeddings, and three values for color:


create function random_embedding(dimensions int) returns vector as
$$
select
    array(
        select random()::real
        from generate_series(1, dimensions)
    )::vector
$$ language sql;

insert into embeddings_table (embedding,color)
 select random_embedding(512)
  ,(array['red', 'green', 'blue'])[generate_series(1, 2000000)%3+1]
;

This uniform distribution is intentional: it isolates the effect of filtering.

pgvector (HNSW)

Here's the HNSW index that I created in my previous post:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE INDEX ON embeddings_table 
 USING hnsw (embedding vector_cosine_ops)
;

It can take some time. Here is the query I use to monitor progress:

SELECT
  command,
  phase,
  CASE WHEN blocks_total > 0 THEN round(100.0 * blocks_done / blocks_total, 1) ELSE 0 END AS blocks_pct,
  CASE WHEN tuples_total > 0 THEN round(100.0 * tuples_done / tuples_total, 1) ELSE 0 END AS tuples_pct,
  relid::regclass AS table_name,
  index_relid::regclass AS index_name
FROM pg_stat_progress_create_index
;

I might have raised the maintenance_work_mem as indicated by the NOTICE after index creation — it took more than one hour.


NOTICE: hnsw graph no longer fits into maintenance_work_mem after 195123 tuples

CREATE INDEX

Total execution time: 01:20:03.861

I generate another embedding for my query, which I'll store as a psql variable with \gset:

select random_embedding(512) as query
\gset

Without index - Seq Scan and Exact Nearest Neighbor Search

I checked the behavior before the index is created, without using a vector index:

--EXPLAIN (ANALYZE, BUFFERS, VERBOSE off, COSTS off)
SELECT id , color, embedding <=> :'query' nn_cosine
 FROM embeddings_table 
 WHERE color = 'red'
 ORDER BY embedding <=> :'query'
 LIMIT 15
;

Without a vector index, pgvector performs an exact nearest neighbors (ENN) search:

postgres=> SELECT id , color, embedding <=> :'query' nn_cosine
 FROM embeddings_table
 WHERE color = 'red'
 ORDER BY embedding <=> :'query'
 LIMIT 15
;

   id    | color |      nn_cosine
---------+-------+---------------------
 1370302 | red   |  0.2639440793770883
 1048027 | red   | 0.23483715071295053
  804529 | red   | 0.24173628441472828
 1013860 | red   | 0.25945607630683887
 1011571 | red   | 0.23552944123421315
 1514713 | red   |  0.2324248794732996
 1411876 | red   | 0.21627272961247324
 1743091 | red   | 0.25835400937319386
  494461 | red   | 0.23541082932461943
 2128600 | red   |  0.2531862056756401
 2118649 | red   | 0.27055299026445434
 2151097 | red   | 0.25835806753276225
 1988278 | red   |  0.2341769026974998
 1259950 | red   | 0.21806901991480443
 1553203 | red   | 0.24358174125683563
(15 rows)

The execution plan indicates that the color = 'red' filter was applied to the scan result, and the distance was computed from the filtered data to sort and return the top-15.


 Limit (actual time=4088.254..4164.226 rows=15 loops=1)
   Buffers: shared hit=5348519
   ->  Gather Merge (actual time=4088.252..4164.222 rows=15 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=5348519
         ->  Sort (actual time=4058.895..4058.897 rows=11 loops=3)
               Sort Key: ((embedding <=> '[0.51098114,0.8720485,0.8568168,...,0.110820755]'::vector))
               Sort Method: top-N heapsort  Memory: 26kB
               Buffers: shared hit=5348519
               Worker 0:  Sort Method: top-N heapsort  Memory: 26kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 26kB
               ->  Parallel Seq Scan on embeddings_table (actual time=0.127..3949.246 rows=222222 loops=3)
                     Filter: (color = 'red'::text)
                     Rows Removed by Filter: 444445
                     Buffers: shared hit=5348445
 Planning Time: 0.078 ms
 Execution Time: 4164.252 ms
(18 rows)

This approach works when the filter is sufficiently selective to enable distance calculations and sorting on a small set of rows. However, with larger or expanding data sets, a quick neighbor search requires an index that can efficiently find neighbors.

With HNSW index - Approximate Nearest Neighbor Index Scan

Once the index is created, the query runs faster, but the result, compared to the previous execution, is incomplete:

postgres=> SELECT id , color, embedding <=> :'query' nn_cosine
 FROM embeddings_table
 WHERE color = 'red'
 ORDER BY embedding <=> :'query'
 LIMIT 15
;
   id    | color |      nn_cosine
---------+-------+---------------------
 1259950 | red   | 0.21806901991480443
  744973 | red   | 0.21565428393281894
  550210 | red   |  0.2085433809181183
 1860280 | red   | 0.24081963623331726
 1794061 | red   | 0.22253304456085543
 1056052 | red   |   0.232418043420476
 1247836 | red   | 0.22905966001837208
 1128376 | red   | 0.24222950580938118
  847906 | red   | 0.24276414827559645
 1782067 | red   |   0.233356563767256
 2023984 | red   | 0.24263831933162294
 1438981 | red   | 0.21360709574533177
(12 rows)

This is what happened:

  1. The HNSW index retrieves a fixed number of candidates
  2. PostgreSQL applies the WHERE filter afterward
  3. Only matching rows are returned, which can result in fewer rows than requested

For example, if the index examines about 40 candidates (ef_search = 40, the default) and the filter matches one third of the rows, we would expect about 13 matching rows on average. This is approximate because the index examines a local region of the vector space, where the metadata distribution may differ from the global distribution.

In this execution plan, 40 candidates were visited (12 returned, 28 filtered out), but only 12 matched the predicate, falling short of the query's LIMIT 15. These candidates are not necessarily the closest neighbors that satisfy the filter. The true nearest neighbors with color = 'red' might lie outside the explored region and were therefore not considered, because the region included other colors.

As a result, fewer than LIMIT rows may be returned, and even those may not be the actual nearest neighbors within the filtered subset. This explains the drop in recall, which results from post-filtering.

In the execution plan, this is evident as the condition on color appears as a Filter applied after the index scan:


 Limit (actual time=2.352..2.496 rows=12 loops=1)
   Buffers: shared hit=1775
   ->  Index Scan using i_hnsw on embeddings_table (actual time=2.351..2.493 rows=12 loops=1)
         Order By: (embedding <=> '[0.51098114,0.8720485,0.8568168,...,0.110820755]'::vector)
         Filter: (color = 'red'::text)
         Rows Removed by Filter: 28
         Buffers: shared hit=1775
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.054 ms
 Execution

The execution plan details show what happened: 12 returned + 28 filtered = 40 candidates, which is ef_search.

HNSW applies filtering after traversal, meaning the graph is searched globally before rows are discarded. This breaks the assumption that the closest neighbors are among the candidates returned by the index, especially when using selective filters, as they often are not.

The mitigations include:

  • Increasing ef_search. For instance, based on our data distribution, setting hnsw.ef_search = 200 could provide enough candidates so that at least 15 remain after filtering.
  • Using partial indexes when filtering on a small set of discrete, known values, so each is indexing a specific subset.
  • Enabling iterative scans to expand the search scope when filtering removes too many candidates. This improves recall but requires multiple scans.

These measures improve recall, though they come with higher costs and require manual calibration because filtering is handled separately from the ANN search.

DiskANN on HorizonDB

Now I create a DiskANN index, designed for filtering during the search:

postgres=> CREATE EXTENSION IF NOT EXISTS pg_diskann;

CREATE EXTENSION

postgres=> CREATE INDEX i_diskann ON embeddings_table
           USING diskann (embedding vector_cosine_ops)
;

CREATE INDEX

Total execution time: 00:42:40.800

In HorizonDB documentation, it is explained that DiskANN enables advanced filtering for combined vector and metadata queries. Depending on selectivity and planner estimates, it can apply filters during traversal or fall back to post-filtering

With DiskANN index - Approximate Nearest Neighbor Index Scan

After creating the index, I run the same query:

postgres=> SELECT id , color, embedding <=> :'query' nn_cosine
 FROM embeddings_table
 WHERE color = 'red'
 ORDER BY embedding <=> :'query'
 LIMIT 15
;

NOTICE:  pg_diskann: Filter selectivity too high (0.3325), skipping filtered vector scan

   id    | color |      nn_cosine
---------+-------+---------------------
 1968337 | red   | 0.24740531343440575
  545257 | red   | 0.22958854707242105
  828484 | red   | 0.24054936837179086
  744973 | red   | 0.21565428393281894
  307819 | red   | 0.23955190885002464
  557251 | red   |  0.2158213914722017
 1860280 | red   | 0.24081963623331726
 1794061 | red   | 0.22253304456085543
 1048324 | red   | 0.24138402291179228
  494461 | red   | 0.23541082932461943
  851149 | red   | 0.26081572380948004
  832813 | red   |  0.2240000332741019
  745516 | red   | 0.24130828891293765
  233791 | red   |  0.2324244758968873
 1247836 | red   | 0.22905966001837208
(15 rows)

The note is interesting: pre-filtering was not used. Still, the result is complete because DiskANN dynamically adapts the number of candidates explored.

This means the filter was not applied during traversal, so the query falls back to post-filtering, as with pgvector. However, unlike HNSW, the number of candidates is not fixed: the engine explores more vectors until enough rows are returned.

In the execution plan, we see that 42 rows were filtered out and 15 were returned, meaning about 57 candidates were explored:


 Limit (actual time=6.607..10.821 rows=15 loops=1)
   Buffers: shared hit=5772
   ->  Index Scan using i_diskann on embeddings_table (actual time=6.606..10.817 rows=15 loops=1)
         Order By: (embedding <=> '[0.51098114,0.8720485,0.8568168,0.3119626,...,0.110820755]'::vector)
         Filter: (color = 'red'::text)
         Rows Removed by Filter: 42
         Buffers: shared hit=5772
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.104 ms
 Execution Time: 10.893 ms
(11 rows)

By varying the LIMIT, we can see that even without pre-filtering, the results are complete and remain fast for common top-n queries:

LIMIT Rows Removed by Filter rows= Buffers Execution Time
1 1 1 3779 4.215 ms
2 1 2 3790 6.431 ms
5 22 5 4613 8.153 ms
10 29 10 5310 9.841 ms
15 42 15 5722 10.893 ms
100 222 100 15364 39.989 ms
1000 1879 1000 91585 826.303 ms
10000 20202 10000 602407 27827.608 ms

This distinction is crucial: pgvector uses a fixed search budget (ef_search), whereas DiskANN adjusts the number of explored candidates. Even with post-filtering, it can provide enough rows without manual tuning and without incremental scans.

Highly selective query - Exact Nearest Neighbor

If the filter is highly selective, there is no need to use an approximate search:

postgres=> SELECT id , color, embedding <=> :'query' nn_cosine
 FROM embeddings_table
 WHERE color = 'red' and id = 494461
 ORDER BY embedding <=> :'query'
 LIMIT 15
;

NOTICE:  pg_diskann: Filter selectivity too low (0.0000), using filter-only scan

   id   | color |      nn_cosine
--------+-------+---------------------
 494461 | red   | 0.23541082932461943
(1 row)

The execution plan shows that it didn't use the vector index but instead used the B-tree index on id with an additional Sort for exact nearest-neighbor ordering:


 Limit (actual time=0.038..0.039 rows=1 loops=1)
   Buffers: shared hit=12
   ->  Sort (actual time=0.037..0.038 rows=1 loops=1)
         Sort Key: ((embedding <=> '[0.51098114,0.8720485,0.8568168,0.311962,...,0.110820755]'::vector))
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=12
         ->  Index Scan using embeddings_table_pkey on embeddings_table (actual time=0.033..0.034 rows=1 loops=1)
               Index Cond: (id = 494461)
               Filter: (color = 'red'::text)
               Buffers: shared hit=12
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.140 ms
 Execution Time: 0.053 ms
(14 rows)

DiskANN doesn’t even participate when the filtering is highly selective. The planner uses a B-tree index and bypasses ANN.

With DiskANN index - Pre-filtered Approximate Nearest Neighbor

When filters are selective enough to be applied during the scan but not enough to avoid ANN search, there's no NOTICE:

postgres=> SELECT id , color, embedding <=> :'query' nn_cosine
 FROM embeddings_table
 WHERE color = 'red' and id <= 494461
 ORDER BY embedding <=> :'query'
 LIMIT 15
;

   id   | color |      nn_cosine
--------+-------+---------------------
 418054 | red   | 0.21522295998337881
 233791 | red   |  0.2324244758968873
 377470 | red   | 0.23508242287803505
 196009 | red   |  0.2356772244494152
 225253 | red   | 0.24119501226879858
 233026 | red   | 0.23055577367239388
 308998 | red   | 0.21652541614338927
 291283 | red   |  0.2319492505072145
 459442 | red   |  0.2448930593940325
 296956 | red   | 0.21969539585363562
 223876 | red   | 0.24664282569002072
 385669 | red   | 0.21339531139470636
 390397 | red   |  0.2484855378229266
 296467 | red   | 0.22426404565684388
 220285 | red   |  0.2179154018451408
(15 rows)

This is where the execution plan displays a Custom Scan (DiskANNFilteredScan), enabled by PostgreSQL's extensibility. It performs pre-filtering (Filter(IndexScan)) and Approximate Nearest Neighbor search (Vector) to yield the precise number of candidates in the filtered set (Rows Retrieved: 15 count):


                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit (actual time=25.842..28.876 rows=15 loops=1)
   Buffers: shared hit=11905
   ->  Custom Scan (DiskANNFilteredScan) (actual time=25.841..28.872 rows=15 loops=1)
         Strategy: Filter(IndexScan) -> Vector
         Rows Retrieved: 15 count
         TIDs Collected: 303062 count
         Buffers: shared hit=11905
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.091 ms
 Execution Time: 28.927 ms
(11 rows)

This plan is an ANN with pre-filtering, effective when both the column filter and the embedding filter are selective and need to be combined for efficient execution. The optimizer makes this decision dynamically.

TIDs Collected indicates the number of rows matching the indexable portion of the filter prior to the vector search. According to the cardinalities, the id condition was used for prefiltering:

postgres=> select count(*) FROM embeddings_table 
           WHERE id <= 494461;
 count
--------
 303063

This represents a selectivity of about 15% for the pre-filtered set. The remaining condition (color = 'red') is applied afterward, reducing the result further to about 5% of the table.

Why this matters

These experiments reveal a different execution model, not just performance differences. With pgvector and HNSW, search is drive... (truncated)

June 20, 2026

Our Collective Bike Shed Moment

In 1957, Parkinson postulated his "Law of Triviality" using a fictitious committee reviewing plans for a nuclear power plant. The reactor design gets 10 minutes because nobody understands it, so nobody argues. The bike shed gets 45 minutes because everyone has opinions about the paint color.

I feel like we are living this committee meeting at scale every day.

LLMs are already better engineers than most of us. They are better at formal methods, and better at reasoning under pressure than most people. They run at incredible speed and don't get tired. They improve continuously. But, some people keep moving the goalposts on LLMs. First they said LLMs couldn't code. Then they said they hallucinated too much. All of these barriers fell, but some people are still scoffing at these systems. What chutzpah!

If aliens landed in Central Park tomorrow, I don't think the reaction would be that different. With AI, an alien form of intelligence has already arrived in our laps, and we gave it a collective shrug and kept scrolling.

Psychologists call this the normalcy bias. We tend to assume things will continue roughly as they have, even when confronted with something that requires revising our picture of the future.

Douglas Adams introduces an adjacent phenomena, called the Somebody Else's Problem (SEP) field. The SEP field does not try to make something invisible directly, rather it makes your brain classify it as somebody else's problem, so it gets actively skipped. An alien spacecraft hovers over a crowded park unnoticed, because everyone has silently agreed it's someone else's problem.

The individual adaptation strategy makes sense of course. Learn the tools, and stay ahead of the curve to save your own ship. Why not? Stay afloat the next couple years. But what about the coming decades?

I am not saying everyone should engage with everything. I know about the circles of control, influence, and concern. But, unfortunately, even the people who should be debating this (technologists, researchers, economists, policymakers, ethicists, military planners, educators) are not engaging with this problem at the depth it deserves.

We are having the bike shed debate about the reactor, and unfortunately the reactor design doesn't get enough attention.

CPU-bound sysbench on a large server: Postgres 12 to 19 beta1

This has results from sysbench on a small server with Postgres versions 12 through 19 beta1. Sysbench is run with high concurrency (40 connections) and a cached database. The purpose is to search for changes in performance.

Postgres remains boring, it is hard to find performance regressions.

tl;dr for Postgres 17 to 19

  • there are no regressions
  • throughput on the read-only-count test improves by ~3X in 19 beta1 thanks to a better query plan

tl;dr for Postgres 12 to 19

  • there are few regressions, throughput might have dropped by up to 5% on a few range query tests
  • there are a few large improvements for read-only tests
  • there are many large improvements for write-heavy tests

Builds, configuration and hardware

I compiled Postgres from source for versions 12.22, 13.23, 14.23, 15.18, 16.14, 17.10, 18.4 and 19 beta1.

I used a 48-core server from Hetzner
  • an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
  • 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
  • 128G RAM
  • Ubuntu 24.04
Configuration files for Postgres:
  • the config file is named conf.diff.cx10a_c32r128 (x10a_c32r128) and is here for versions 12, 13, 14, 15, 16 and 17.
  • for Postgres 18 and 19 I used conf.diff.cx10b_c32r128 (x10b_c32r128) which is as close as possible to the Postgres 17 config and uses io_method=sync
Benchmark

I used sysbench and my usage is explained here. I now run 32 of the 42 microbenchmarks listed in that blog post. Most test only one type of SQL statement. Benchmarks are run with the database cached by Postgres.

The read-heavy microbenchmarks are run for 600 seconds and the write-heavy for 1200 seconds. The benchmark is run with 40 clients and 8 tables with 10M rows per table. The database is cached.

The purpose is to search for regressions from new CPU overhead and mutex contention. I use the small server with low concurrency to find regressions from new CPU overheads and then larger servers with high concurrency to find regressions from new CPU overheads and mutex contention.

The tests can be called microbenchmarks. They are very synthetic. But microbenchmarks also make it easy to understand which types of SQL statements have great or lousy performance. Performance testing benefits from a variety of workloads -- both more and less synthetic.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide charts below with relative QPS (rQPS). The relative QPS is the following:
(QPS for some version) / (QPS for base version)
When the relative QPS is > 1 then some version is faster than base version.  When it is < 1 then there might be a regression. Values from iostat and vmstat divided by QPS are also provided here. These can help to explain why something is faster or slower because it shows how much HW is used per request.

Here, base version is either Postgres 12.23 or 17.10 and some version is a more recent version. I use 12.23 as the base version to identify regressions over a long period of time. And then I use 17.10 as the base version to confirm there aren't recent, large regressions.

I describe performance changes (changes to relative QPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.

Results: point queries, version 17 to 19

Summary:
  • there are no regressins
Relative to: PG 17.10
col-1 : PG 18.4
col-2 : PG 19 beta1

col-1   col-2
1.00    0.99    hot-points
1.01    1.01    point-query
1.00    1.00    points-covered-pk
0.98    0.99    points-covered-si
1.01    1.00    points-notcovered-pk
1.00    1.00    points-notcovered-si
1.01    1.01    random-points_range=10
1.02    1.00    random-points_range=100
1.00    1.00    random-points_range=1000

Results: point queries, version 12 to 19

Summary
  • there are no regressions
  • throughput for the hot-points test improves by ~2X in versions 17.10, 18.4 and 19beta
Relative to: PG 12.22
col-1 : PG 13.23
col-2 : PG 14.23
col-3 : PG 15.18
col-4 : PG 16.14
col-5 : PG 17.10
col-6 : PG 18.4
col-7 : PG 19 beta1

col-1   col-2   col-3   col-4   col-5   col-6   col-7
1.00    0.90    0.97    1.03    2.34    2.35    2.31    hot-points
1.00    1.01    1.03    1.04    1.03    1.04    1.03    point-query
1.02    1.04    1.04    1.07    1.04    1.04    1.04    points-covered-pk
1.01    1.07    1.04    1.04    1.04    1.03    1.04    points-covered-si
0.98    1.01    1.03    1.02    1.00    1.01    1.00    points-notcovered-pk
0.99    1.03    1.03    1.01    1.02    1.02    1.01    points-notcovered-si
0.99    1.01    1.03    1.03    1.00    1.01    1.01    random-points_range=10
0.99    1.02    1.04    1.04    1.01    1.03    1.01    random-points_range=100
1.00    1.02    1.02    1.03    1.01    1.02    1.01    random-points_range=1000

Results: range queries without aggregation, version 17 to 19

Summary
  • there are no regressions
  • while 19 beta1 has a better result on the scan test, that test has more variance with Postgres so I am reluctant to judge this without more results
Relative to: PG 17.10
col-1 : PG 18.4
col-2 : PG 19 beta1

col-1   col-2
0.98    0.99    range-covered-pk
0.97    0.99    range-covered-si
0.99    0.99    range-notcovered-pk
1.02    1.01    range-notcovered-si
0.96    1.07    scan

Results: range queries without aggregation, version 12 to 19

Summary
  • there are no regressions
  • scan throughput has improved a lot from version 12 to 19
Relative to: PG 12.22
col-1 : PG 13.23
col-2 : PG 14.23
col-3 : PG 15.18
col-4 : PG 16.14
col-5 : PG 17.10
col-6 : PG 18.4
col-7 : PG 19 beta1

col-1   col-2   col-3   col-4   col-5   col-6   col-7
0.99    1.03    1.04    1.04    1.03    1.00    1.02    range-covered-pk
0.99    1.04    1.04    1.04    1.03    1.00    1.03    range-covered-si
1.00    1.00    1.00    0.99    1.00    0.99    0.99    range-notcovered-pk
1.00    1.01    1.01    0.99    1.00    1.02    1.01    range-notcovered-si
1.09    1.27    1.10    1.21    1.19    1.14    1.28    scan

Results: range queries with aggregation, version 17 to 19

Summary
  • there are no regressions
  • throughput on the read-only-count test is ~3X better thanks to a new query plan. This improvement was also visible on my small server
Relative to: PG 17.10
col-1 : PG 18.4
col-2 : PG 19 beta1

col-1   col-2
1.03    3.30    read-only-count
1.02    0.99    read-only-distinct
1.00    0.97    read-only-order
0.99    0.99    read-only_range=10
0.99    0.99    read-only_range=100
1.01    1.00    read-only_range=10000
1.03    1.01    read-only-simple
1.03    1.01    read-only-sum

Results: range queries with aggregation, version 12 to 19

Summary
  • there might be a few small regressions, but losing 5% throughput from version 12 to 19 isn't a big deal
  • throughput on the read-only-count test is ~3X better thanks to a new query plan. This improvement was also visible on my small server
Relative to: PG 12.22
col-1 : PG 13.23
col-2 : PG 14.23
col-3 : PG 15.18
col-4 : PG 16.14
col-5 : PG 17.10
col-6 : PG 18.4
col-7 : PG 19 beta1

col-1   col-2   col-3   col-4   col-5   col-6   col-7
1.01    0.95    0.96    0.97    0.93    0.95    3.06    read-only-count
1.00    0.98    0.98    0.98    0.96    0.98    0.95    read-only-distinct
1.00    0.98    0.98    1.00    0.99    0.99    0.97    read-only-order
0.99    1.00    1.01    1.00    1.01    0.99    1.00    read-only_range=10
0.99    1.00    1.00    1.00    1.01    1.00    0.99    read-only_range=100
1.00    0.97    1.02    1.03    1.04    1.05    1.03    read-only_range=10000
1.00    0.97    0.99    0.97    0.95    0.98    0.96    read-only-simple
1.00    0.96    0.97    0.97    0.94    0.97    0.95    read-only-sum

Results: writes, version 17 to 19

Summary
  • there are no regressions
Relative to: PG 17.10
col-1 : PG 18.4
col-2 : PG 19 beta1

col-1   col-2
0.99    0.99    delete
1.02    1.02    insert
1.00    0.98    read-write_range=10
0.99    0.99    read-write_range=100
1.01    1.03    update-index
1.01    0.98    update-inlist
0.98    1.01    update-nonindex
1.01    1.03    update-one
1.00    1.00    update-zipf
0.97    0.99    write-only

Results: writes, version 12 to 19

Summary
  • there are no regressions
  • many large improvements arrived in version 17 and remain in 19 beta1
Relative to: PG 12.22
col-1 : PG 13.23
col-2 : PG 14.23
col-3 : PG 15.18
col-4 : PG 16.14
col-5 : PG 17.10
col-6 : PG 18.4
col-7 : PG 19 beta1

col-1   col-2   col-3   col-4   col-5   col-6   col-7
0.99    1.11    1.13    1.10    1.28    1.27    1.27    delete
1.02    1.17    1.16    1.19    1.23    1.25    1.25    insert
1.00    1.20    1.22    1.20    1.24    1.24    1.22    read-write_range=10
0.99    1.04    1.05    1.04    1.06    1.05    1.04    read-write_range=100
0.98    1.08    1.05    0.94    1.84    1.85    1.90    update-index
1.00    1.07    1.06    1.05    1.12    1.13    1.10    update-inlist
1.01    1.07    1.07    0.86    1.87    1.84    1.88    update-nonindex
1.04    0.96    0.96    1.10    1.39    1.41    1.43    update-one
1.01    1.05    1.07    0.96    1.63    1.62    1.63    update-zipf
0.99    1.11    1.13    1.09    1.41    1.37    1.40    write-only

June 17, 2026

PostgreSQL 18 on Amazon Aurora and Amazon RDS: Performance enhancements

This is Part 1 of a two-part series covering the key features in PostgreSQL 18. In this post, we focus on performance enhancements: skip scan optimization for multicolumn indexes, enhanced EXPLAIN output, automatic removal of unnecessary self-joins, and several vacuum and autovacuum improvements that help keep your database running efficiently.

PostgreSQL 18 on Amazon Aurora and Amazon RDS: Security, monitoring, and developer enhancements

In Part 1 of this series, we explored the performance enhancements in PostgreSQL 18, including skip scan optimization, enhanced EXPLAIN output, automatic self-join removal, and vacuum/autovacuum improvements. In this second part, we focus on security, monitoring, developer productivity, and logical replication enhancements that improve operational efficiency and the overall developer experience.

HorizonDB cache hierarchy: RAM, NVMe SSD, and multi-AZ storage behind PostgreSQL

HorizonDB extends PostgreSQL with disaggregated storage, making it appear to developers like PostgreSQL while offering cloud-native high availability, elasticity, and performance. Performance metrics, especially I/O, can reveal differences due to additional components such as local caches and remote storage along the read/write paths.

Three aspects can expose these internal workings at the PostgreSQL layer:

  • Cache utilization, observable via EXPLAIN BUFFERS, comparing shared buffer hits with read calls to retrieve pages.

  • Average read times from EXPLAIN or metrics help infer the storage layer: less than a microsecond for memory, tens of microseconds for local NVMe SSD cache, and about a millisecond for remote persistent storage.

  • Wait events, as I/O operations handled outside PostgreSQL's main code path, in extensions, may show different patterns in pg_stat_activity compared to traditional PostgreSQL setups.

This approach is similar to tests I previously performed on OCI and AWS. I insert 100 MB of data (~12,800 rows) into a PostgreSQL table every two minutes and run full table scans with EXPLAIN ANALYZE:


\timing on
-- create the table
drop table if exists large;
create table large ( filler text ) with (fillfactor=10);
-- insert 100 MB
insert into large
 select string_agg( chr((32+random()*94)::int) ,'' )
 from generate_series(1,1000) c, generate_series(1,12800) r
 group by r
\; 
-- check the size
select 'size', pg_table_size('large') , pg_size_pretty( pg_table_size('large') ) , now()
\; 
-- query with full table scan
explain (costs off, analyze, buffers) select * from large
-- repeat every two minutes
\watch i=120 c=1000

I log the execution plans, including the elapsed time for the sequential scan and the number of buffers accessed (shared buffer hits and reads). Some metrics are gathered in Azure, and I also run the PostgreSQL for VS Code extension dashboard, which shows other metrics exposed by PostgreSQL.

It starts with 100% shared buffer hits

The storage grows by 100 MB every two minutes as new inserts are added:

The inserted rows remain constant, but as the table size grows, the read workload increases because more rows are scanned during each execution:

Since this involves a full table scan and the table size grows, it reads an increasing number of blocks, all resulting in shared buffer hits without reading from files, as the inserted pages remain in shared buffers:

The sequential scan in PostgreSQL employs a small ring buffer, but this occurs only on cache misses. Since the pages are already present in the shared buffer from prior insertions, this does not apply here.

This continues, and the table size reaches the shared buffer size.

The inserts fill up the shared buffer (11 GiB)

I run on a HorizonDB instance with 16 GiB RAM, provisioned like in the previous post, and 11 GiB are allocated to the shared buffer:

postgres=> \dconfig shared*

                                        List of configuration parameters

            Parameter             |                                    Value
----------------------------------+------------------------------------------------------------------------------
 shared_buffers                   | 11241MB
 shared_memory_size               | 11776MB
 shared_memory_size_in_huge_pages | 5888
 shared_memory_type               | mmap

Until I added 11 GB, the sequential scan only involved shared hit buffers and did not perform any reads (1433600 x 8KB blocks equals 11,200 MB):


               Tue 16 Jun 2026 10:33:01 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |              now
----------+---------------+----------------+-------------------------------
 size     |   11747319808 | 11 GB          | 2026-06-16 10:33:01.555657+00
(1 row)

            Tue 16 Jun 2026 10:33:01 AM GMT (every 120s)

                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on large (actual time=0.005..507.198 rows=1433600 loops=1)
   Buffers: shared hit=1433600 dirtied=12800
 Planning Time: 0.013 ms
 Execution Time: 551.265 ms
(4 rows)

Time: 4724.370 ms (00:04.724)

The next run, with 100 MB more data, begins to show some buffers read (16071 x 8KB = 125MB):

               Tue 16 Jun 2026 10:35:01 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |              now
----------+---------------+----------------+-------------------------------
 size     |   11852201984 | 11 GB          | 2026-06-16 10:35:01.555654+00
(1 row)

This is consistent with the newly appended pages, 100 MB, displacing the same amount of older buffers once shared buffers were full.

            Tue 16 Jun 2026 10:35:01 AM GMT (every 120s)

                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on large (actual time=0.007..624.128 rows=1446400 loops=1)
   Buffers: shared hit=1430329 read=16071 dirtied=12800 written=753
 Planning Time: 0.022 ms
 Execution Time: 669.210 ms
(4 rows)

Time: 4667.335 ms (00:04.667)

In vanilla PostgreSQL, sequential scans use a bulk-read strategy with a small ring of shared buffers, typically 256KB, to avoid flooding the main shared-buffer cache. With buffered I/O, pages not found in shared buffers may still be served from the operating system page cache, which is the usual double-buffering behavior. In such an experiment, the read (cache misses) increase by 100 MB each time due to buffer evictions from newly inserted data. I observed that on other managed services, asymptotic throughput decayed while cache evictions increased:

In HorizonDB, where the caching layers are different, we do not observe the same gradual decay. After the table scan size exceeded the shared-buffer size, subsequent scans were mostly reported as reads by PostgreSQL, indicating that most older pages were no longer in shared buffers, while only the most recently inserted 100 MB remained as hits. The next run reads the full table via disk reads (1446401 x 8KB = 11300 MB), with only the last 100 MB served from the shared buffer (12799 x 8KB = 100 MB):

               Tue 16 Jun 2026 10:37:01 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |              now
----------+---------------+----------------+-------------------------------
 size     |   11957084160 | 11 GB          | 2026-06-16 10:37:01.556217+00
(1 row)

             Tue 16 Jun 2026 10:37:01 AM GMT (every 120s)

                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on large (actual time=2.825..82165.728 rows=1459200 loops=1)
   Buffers: shared hit=12799 read=1446401 dirtied=12949
 Planning Time: 0.019 ms
 Execution Time: 82219.965 ms
(4 rows)

Time: 86399.819 ms (01:26.400)

PostgreSQL BUFFERS only shows the shared-buffer view. A read is a miss from shared buffers, but in a disaggregated architecture, it may still be served by a local cache before reaching durable remote storage.

Let's do some maths to estimate the storage layer of these buffer reads, considering that most of the per-buffer processing time is dominated by access latency:

  • Shared buffer hits: reading 11 GB in 507ms results in roughly 0.00038 ms per I/O (~0.38 µs), indicative of RAM access.
  • Buffer reads: reading 11 GB in 1 minute and 26 seconds translates to 134 MB/s, approximately 16,000 IOPS with 8KB I/Os. Since the query is single-threaded and has no parallelism, the average I/O time is about 0.0568 ms (~57 µs). This reflects reading from the local NVMe SSD cache.

I've highlighted the execution plans at the key inflection point. We can review those metrics throughout the entire run using the VS Code dashboard.

Because this is the only table in the database, the total size increases by 100 MB every two minutes, reaching 11 GB:

As table scans increase, shared buffer hits or reads also rise. When the table size hits 11 GB, all reads are served by storage (read, no longer from shared buffers), replacing the earlier 100% cache hits.

The cache hit ratio shows when it falls below 100%:

The query execution time increases from 500ms with RAM to 1 minute with an SSD:

Here is another EXPLAIN showing when those metrics were taken:

               Tue 16 Jun 2026 10:55:01 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |              now
----------+---------------+----------------+-------------------------------
 size     |   12901031936 | 12 GB          | 2026-06-16 10:55:01.555546+00
(1 row)

             Tue 16 Jun 2026 10:55:01 AM GMT (every 120s)

                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on large (actual time=0.114..71725.275 rows=1574400 loops=1)
   Buffers: shared hit=12800 read=1561600 dirtied=12800
 Planning Time: 0.022 ms
 Execution Time: 71781.188 ms
(4 rows)

Time: 76089.679 ms (01:16.090)

Reading 12 GB in 1 minute and 12 seconds results in approximately 166 MB/s, corresponding to around 20,000 IOPS with 8 KB I/O operations and an average of 0.046 ms per I/O (~46 microseconds). This aligns with reads served from the local NVMe SSD cache rather than the remote durable storage path.

Even if these reads are not counted as cache hits by PostgreSQL metrics—which only track the shared buffers—they still constitute a highly effective local cache, achieving 10 GB reads in just one minute. This provides faster I/O without requiring larger compute instances.

Next cache level: local NVMe SSD

While the run was ongoing, I observed the query duration:

There were some peaks in response time that temporarily reduced throughput in inserted and fetched rows:

The same is visible from the block reads:

I gathered the execution plans from that period, showing a single run at 11:21 UTC (13:21 CET on the dashboard), with reduced throughput—reading 11 GB over 5 minutes instead of 1 minute.

             Tue 16 Jun 2026 11:17:01 AM GMT (every 120s)

                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on large (actual time=0.111..77440.938 rows=1715200 loops=1)
   Buffers: shared hit=14286 read=1700914 dirtied=12800 written=12555
 Planning Time: 0.022 ms
 Execution Time: 77503.429 ms
(4 rows)

Time: 81571.846 ms (01:21.572)

INSERT 0 12800

              Tue 16 Jun 2026 11:19:01 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |             now
----------+---------------+----------------+-----------------------------
 size     |   14159659008 | 13 GB          | 2026-06-16 11:19:01.5558+00
(1 row)

             Tue 16 Jun 2026 11:19:01 AM GMT (every 120s)

                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on large (actual time=0.109..62897.716 rows=1728000 loops=1)
   Buffers: shared hit=345599 read=1382401
 Planning Time: 0.022 ms
 Execution Time: 62958.979 ms
(4 rows)

Time: 66659.325 ms (01:06.659)

INSERT 0 12800

               Tue 16 Jun 2026 11:21:01 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |              now
----------+---------------+----------------+-------------------------------
 size     |   14264598528 | 13 GB          | 2026-06-16 11:21:01.555743+00
(1 row)

              Tue 16 Jun 2026 11:21:01 AM GMT (every 120s)

                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on large (actual time=4.852..288105.348 rows=1740800 loops=1)
   Buffers: shared hit=313478 read=1427322 dirtied=12800 written=22539
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.029 ms
 Execution Time: 288179.656 ms
(6 rows)

Time: 292242.942 ms (04:52.243)

INSERT 0 12800

               Tue 16 Jun 2026 11:25:53 AM GMT (every 120s)

 ?column? | pg_table_size | pg_size_pretty |              now
----------+---------------+----------------+-------------------------------
 size     |   14369480704 | 13 GB          | 2026-06-16 11:25:53.798728+00
(1 row)

              Tue 16 Jun 2026 11:25:53 AM GMT (every 120s)

                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on large (actual time=30.632..87584.215 rows=1753600 loops=1)
   Buffers: shared hit=14238 read=1739362 dirtied=12800 written=11000
 Planning Time: 0.015 ms
 Execution Time: 87647.389 ms
(4 rows)

Time: 91315.980 ms (01:31.316)

Doing the maths, 1,427,322 reads in 288,105 milliseconds yields an average of roughly 288,105 ms / 1,427,322 ≈ 0.202 ms per read. This indicates that the scan is no longer fully served by the fastest local SSD-cache path, and some reads fall through to a slower storage path.

I reviewed the HorizonDB compute instance metrics in the Azure portal. The Azure portal “Storage” metric shown here does not track the growing table size nor the apparent local read-cache behavior in this experiment, as it remains steady at 120 MB:

The inserted and returned tuples align with our observations. Variations are expected due to the bursty workload: the query executes for about a minute, then pauses for two minutes, causing the 10-second metric buckets to capture the active rate during execution and near-zero during idle times.

The pattern shifts when the query time matches the job's 2-minute wait time. Here is the query time:

The memory usage has been rising until the shared buffers reached their allocated size (Linux allocates lazily):

Finally, it is interesting to observe the network I/O from the compute instance, which relates to the read and write operations to the storage:

The slower run can be explained by some reads taking a lower cache tier or being delayed by contention/throttling. The correlation with network activity makes remote storage involvement plausible, possibly due to SSD cache misses.

Returning to the VS Code dashboard, we can also observe the statistics about vacuum, checkpoint, and WAL:

On the dashboard, the average active-session timeline is visible, along with wait-event names for reads. These suggest that reads are handled by an extension rather than the PostgreSQL core code, as they appear as HorizonDB_SSDCache_Read wait events under the Extension wait type.

Some observations of HorizonDB_Storage_Read indicate longer read times and increased network activity.

Conclusion

These experiments show that HorizonDB behaves like PostgreSQL at the SQL layer while exposing the effects of a multi-tier storage architecture through latency, throughput, and wait events.

With a monotonically increasing table size scanned, we have observed the following phases:

Phase PostgreSQL view Effective per 8 KiB block HorizonDB reality
Fits in shared buffers shared hit ~0.35–0.38 µs PostgreSQL shared-buffer residency
After shared-buffer overflow, normal runs mostly read ~46–57 µs Local low-latency cache below PostgreSQL
“Slower” peaks mostly read ~0.20 ms Slower storage path, including remote-storage reads

The configuration (large shared_buffers and effective_cache_size equal to it) shows that HorizonDB minimizes reliance on the OS page cache and instead uses a multi-tier caching strategy, with a fast NVMe SSD in the stateless compute node, to reduce reads from durable remote storage.

From PostgreSQL’s point of view, the scan is still a normal sequential scan, and BUFFERS still reports only the PostgreSQL shared-buffer state. When the table fits in shared buffers, the scan is served as shared hits and completes in a few hundred milliseconds. Once the table grows beyond shared buffers, PostgreSQL reports mostly read buffers. In a disaggregated storage architecture, those reads do not necessarily mean cold remote storage. They may be served from a lower cache tier beneath PostgreSQL.

The effective per-block timings show three distinct regimes in this test: sub-microsecond effective access times when pages are already in shared buffers, tens of microseconds per PostgreSQL block during normal post-overflow scans, and occasional slower periods of a few tenths of a millisecond per block. These slower periods correlate with longer query times and increased network activity, which makes lower-tier reads or storage-path delays plausible, although this experiment alone cannot distinguish cache misses from contention, throttling, or other synchronization effects.

The important lesson is that PostgreSQL cache-hit ratio alone is not enough to understand performance in a disaggregated system. A drop from shared hit to read may look dramatic in PostgreSQL metrics, but it means “not in shared buffers”, not necessarily “read from remote durable storage” in HorizonDB. The observed latency and wait events indicate an additional cache tier below PostgreSQL, where normal reads are served from a fast local NVMe SSD cache, while slower peaks use a slower storage path.

HorizonDB is in preview on Azure, so those metrics may improve, and your feedback is welcome.

Deep dive into Amazon Aurora PostgreSQL lock analysis with CloudWatch Database Insights

In this post, we show you how to use Amazon CloudWatch Database Insights for lock analysis in Amazon Aurora PostgreSQL. You learn how to enable the feature, interpret lock tree visualizations, resolve common lock-related issues, and maintain optimal database performance. This lock tree analysis feature also applies to Amazon RDS for PostgreSQL.

Security advisory: CVE-2026-9740 and CVE-2026-11933 in Percona Server for MongoDB

TL;DR: This advisory covers the two most important high-severity memory-safety vulnerabilities affecting MongoDB Community and our downstream Percona Server for MongoDB – CVE-2026-11933 and CVE-2026-9740. Both will be addressed in a single coordinated patch release, bundled with other recently revealed lower-scored CVE fixes: CVE-2026-9753, CVE-2026-9752, CVE-2026-9751, CVE-2026-9750, CVE-2026-9749, CVE-2026-9748, CVE-2026-9747, CVE-2026-9746, CVE-2026-9743, and CVE-2026-9741. Fixes land … Continued

The post Security advisory: CVE-2026-9740 and CVE-2026-11933 in Percona Server for MongoDB appeared first on Percona.

Fuzzy String Search for MySQL

Add fuzzy string matching to MySQL with VillageSQL. Learn to use trigrams for typos, Levenshtein distance for spell correction, and phonetic matching.

The insert benchmark on a small server, IO-bound workload : Postgres 19 beta1

This has results for Postgres versions 19 beta1, 18.4 and 17.10 with the Insert Benchmark on a small server using a cached and CPU-bound workload. I also used MySQL 8.4.8 to see where performance was different.

Postgres continues to be boring in a good way. It is hard to find performance regressions.

 tl;dr

  • create index (the l.x step) is faster in Postgres 19beta1. A Postgres expert told me that the sort algorithm was changed to be more CPU efficient
  • the write heavy steps (l.i1, l.i2) are 15% and 9% faster in 19 beta1 vs Postgres 17.10
  • the second write heavy step (l.i2) is more than 20X faster in MySQL 8.4.8 vs Postgres thanks to the CPU overhead from get_actual_variable_range. I have written about this before.

Builds, configuration and hardware

I compiled Postgres from source using -O2 -fno-omit-frame-pointer for versions 19 beta1, 18.4 and 17.10.

I compiled MySQL 8.4.8 from source as well.

The server is an Beelink SER7 with a Ryzen 7 7840HS CPU with 8 cores and AMD SMT disabled, 32G of RAM. Storage is one SSD for the OS and an NVMe SSD for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04.

For 17.10 the config file is named conf.diff.cx10a_c8r32 (cx10a) and is here.

For Postgres 18 and 19 the config file is conf.diff.cx10b_c8r32 (cx10b) which is as similar as possible to the config for version 17.

For MySQL 8.4.8 the config file is my.cnf.cz12a_c8r32.

The Benchmark

The benchmark is explained here and is run with 1 client.

The point query (qp100, qp500, qp1000) and range query (qr100, qr500, qr1000) steps are run for 3600 seconds each.

The benchmark steps are:

  • l.i0
    • insert 800M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 4M rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions) and 1M rows are inserted and deleted per table.
    • Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
  • qr100
    • use 3 connections/client. One does range queries and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
Results

The performance summary with charts is here.

This table lists relative QPS per benchmark step and relative QPS is:
    (QPS for my version / QPS for Postgres 17.10)

The background in the table cells is blue for big improvements and yellow for regressions. There are no regressions here. 

The improvements here for Postgres 19 beta1 are similar to what I reported for the cached workload.

The index create (l.x) step is much faster in 19.10. I usually ignore results on this step but I am curious if something was done in 19.10 to improve index create. A Postgres expert told me that the sort algorithm for index create was changed in version 19 to be more CPU efficient.

For the write-heavy steps (l.i1, l.i2):
  • there are large improvements in 19 beta1 (15% and 9%). The CPU overhead is lower in 19 beta1 compared to 17.10 (see cpupq here).
  • throughput for the l.i2 step is more than 20X larger for MySQL than for Postgres. From vmstat I see that the CPU overhead (cpupq here) is more than 10X larger with Postgres vs MySQL. From flamegraphs the problem is the CPU overhead in get_actual_variable_range. I have written about this before (see here). The Postgres query planner uses too much CPU skipping old versions to figure out selectivity for a query and there are too many old versions because Postgres doesn't collect them ASAP, vacuum takes time. The flamegraphs are in subdirectories here.
For the range query steps (qr100, qr500, qr1000) throughput is ~3% less in 19 beta1 vs 17.10 and ~1% less in 18.4 vs 17.10. For 19 beta1 there is a small increase in CPU overhead (see cpupq here, here and here). I already have flamegraphs for MySQL 8.4.8 and Postgres 19 beta1, soon I will have them for Postgres 17.10 and 18.4 to try and explain this.

dbmsl.i0l.xl.i1l.i2qr100qp100qr500qp500qr1000qp1000
PG 17.101.001.001.001.001.001.001.001.001.001.00
PG 18.41.011.031.001.000.981.000.990.990.990.99
PG 19 beta11.011.151.051.090.971.010.961.010.971.00
MySQL 8.4.80.770.890.7621.620.611.070.660.930.850.84

June 16, 2026

Our Italy trip

We spent 10 days in Italy in early April. It was a lot of fun.

Back when I was a grad student, I stayed in Pisa for a month for a "summer school on mobile computing" in 2003. Mobile computing was the next "big" thing back then. (In retrospect, the research was misguided about the local/distributed approach to it, as most of mobile computing reconciled to cloud backends as that is often more efficient.) When I was in Pisa, I didn't travel around much, but I now realize I should have spent every weekend traveling. (In my defense, without Internet enabling trip planning/execution, and without GPS on the phone, traveling was very cumbersome, yes, back in 2003.)


Rome

Rome is amazing! The history is incredibly well preserved. Kudos to the Italians.

Rome is very walkable. The buses, on the other hand, were packed solid. My daughters were genuinely surprised by this exotic new form of transportation that does not exist in the US.

The art in the Vatican was exquisite. All the masters, all in one place. But the lines were horrendous.

Tiramisu is truly wonderful. But (please don't stone me for this) I don't like the coffee. I had promised myself I would drink coffee everywhere, and I tried. But the coffee, even the espresso, just didn't feel that good to me. It was mostly lukewarm. I've gotten used to hot coffee, so I quickly gave up my quest to find my coffee high in Italy. 

As I broke the news to you in this blog post (a post about Jensen Huang, of all things), I got pickpocketed at the Trevi Fountain.

"Here's the strange part. After the initial shock and the credit-card cancellations, I found myself thinking: this person was unbelievably good at their job. I felt nothing. No bump, no distraction, nothing. The wallet disappeared from the deep front pocket of my jeans like a magic trick. Respect. I remember thinking: if only I were that good at my own work."


Venice

Venice is simply beautiful. The buildings, the canals, the whole scene were beautiful. The old town doesn't even allow motorcycles, so everything is foot-powered. Everything is well preserved.



Lake Como

Lake Como looked out of this world. It was trippy. 


But there were lines everywhere. The ferries ran very inefficiently, and 2-hour wait times were the norm. And this wasn't even peak season. Italians are not the best at organizing/running things. In Italy there are lines everywhere: cafes, restaurants, ferries. Everywhere. A good chunk of my time in Italy was spent waiting in lines.


Turin

Turin is bigger and more metropolitan. It is very clean. I didn't get to see much of it, unfortunately.


The night before our early flight back to the States, we stayed near the Milan airport, in a little town called Somma Lombardo. Even that town was beautiful, with buildings dating from the 1500s, all in remarkably well-preserved shape.


So, what happened to Italy?

Italy was very active at the start of the tech revolution. Olivetti could have been the center of gravity for the personal computer revolution, you know. And then there are these magnificent brands from Italy, including Ferrari, Lamborghini, Fiat, Alfa Romeo. So why don't we see any more of that innovation

Improve query performance with EXPLAIN plans in Amazon Aurora DSQL

In this post, we show you how to use EXPLAIN plans to diagnose and improve query performance in Amazon Aurora DSQL. We introduce a three-layer filter model as a practical framework for understanding where your predicates are evaluated, and walk through the architecture differences that make Aurora DSQL plans unique, the anatomy of an EXPLAIN output, access method selection, and a step-by-step query improvement workflow.

Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement

Managing data retention policies is one of the most common operational tasks in MySQL. Applications continuously generate transactional, audit, logging, telemetry, and event data. Over time, these tables can grow to billions of rows, causing: Larger backups Longer recovery times Reduced buffer pool efficiency Slower index maintenance Increased storage costs Degraded query performance To address … Continued

The post Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement appeared first on Percona.

The feedback loops behind Kubernetes

Kubernetes is a framework for feedback controllers: write down what you want, observe what exists, make the next change, and repeat.

June 15, 2026

Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency

Overview When building high-availability MySQL environments, the choice between MySQL Group Replication (GR) and Percona XtraDB Cluster (PXC) often comes down to how they handle the eternal database dilemma: data consistency versus performance.        While both provide “synchronous-like” replication, they approach the problem of stale reads—reading data that has been committed on one node but not … Continued

The post Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency appeared first on Percona.

The Failover Brownout: Rethinking High Availability in MySQL Group Replication

It is time to talk again about Flow control and group replication. This time with a special eye on the use of Group Replication in the Kubernetes context. In this article we will dig a bit on how it works and what are the various side effects.    The problem Recently I was refining the … Continued

The post The Failover Brownout: Rethinking High Availability in MySQL Group Replication appeared first on Percona.

See what your database is doing right now with Connections

Connections lets you monitor and manage all active connections to Postgres and Vitess databases. See active sessions, identify locking patterns, and keep debugging even when normal application connections are exhausted.

ClickHouse healthcare analytics

Healthcare data is high-volume, high-stakes, and structurally complex: ADT events, lab results, vitals streams, medication records, and billing codes all arrive continuously and need to be queryable in real time for clinical operations, quality reporting, and population health. ClickHouse handles the volume and latency that healthcare analytics requires.

ClickHouse IoT data

IoT deployments generate time-series telemetry at a rate that most databases can't ingest and query simultaneously. Thousands of devices reporting every second, anomaly detection that needs to run in real time, and dashboards that need to show fleet-wide metrics without scanning terabytes of raw readings. ClickHouse is built for exactly this pattern.

Postgres vs SQL Server

Postgres and SQL Server are both mature relational databases capable of handling production OLTP workloads, but they make fundamentally different trade-offs on licensing, ecosystem, SQL dialect, and analytics. Choosing between them depends on your organization's existing stack, budget, and where your data needs to go next.

Postgres vs SQLite

Postgres and SQLite are both relational databases, but they solve different problems. SQLite is an embedded library for local storage. Postgres is a client-server database for concurrent, networked applications. Choosing between them is usually straightforward once you understand the architectural difference.