A curated list of database news from authoritative sources

February 27, 2026

From Relational Algebra to Document Semantics

The relational model was designed not to mirror application structures, but to optimize reasoning about data dependencies independently of access patterns. By restricting data to relations in First Normal Form (1NF) and enforcing closure, relational algebra makes query behavior mathematically derivable, so correctness, equivalence, and optimization follow from the model itself rather than implementation-specific rules.

Document databases take the opposite approach: they optimize representation for applications, network, and storage. Domain concepts such as ownership, lifecycle, and cardinality are embedded directly in the data’s shape. Once value atomicity is no longer assumed, the algebra no longer applies, so semantics must be defined explicitly. This is what MongoDB did when defining query behavior in its document database.

Applications rarely start from relations

Most applications do not model data as relations. They model aggregates. This is obvious in Domain-Driven Design and object-oriented analysis, but it was true long before object‑oriented languages. Data structures in applications have always been hierarchical, with sub-structures and repeating groups:

  • Record-based models include nested tables, like in the COBOL data division where relationships are expressed through containment and repetition:
01 EMPLOYEE.
   05 NAME        PIC X(20).
   05 SKILLS.
      10 SKILL    PIC X(10) OCCURS 5 TIMES.
  • Object‑oriented models are more flexible but follow the same pattern with embedded objects, arrays, or lists:
@Entity
class Employee {
    @Id Long id;
    String name;
    @ElementCollection
    List<String> skills;
}

These models express semantics that do not exist in relational algebra:

  • ownership: the skills belong to the employee
  • lifecycle: the root aggregate is deleted with all its elements
  • cardinality: this model assumes a short, bounded list of skills per employee, but a large and growing number of employees
  • locality: an employee's skills are queried with the employee, and stored together on disk

These models are optimized for application behavior, not for algebraic reasoning. This is not new; it is how applications have always been written. Why, then, do relational systems require data to be represented as flat relations at the logical level?

Why the relational model exists

When E. F. Codd published “Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks” in 1969, his objective was not to help developers write applications, but to establish a formal mathematical foundation for data management.

He based the relational model and relational algebra on mathematics:

  • Set theory for relations, with operations such as ∪ (union), ∩ (intersection), − (difference), and × (Cartesian product)
  • First‑order predicate logic for constraints and querying: selection (σ) corresponds to logical predicates, and joins correspond to conjunction with implicit existential quantification (∃) over the join attributes
  • A closed algebraic query language at the logical level, where every operation produces another relation

Within this framework, a relation is defined as:

  • a set of tuples (unordered, with no duplicates)
  • where all tuples share the same attributes
  • and every attribute value is drawn from a simple (atomic) domain

These properties are not modeling advice. They are the definition.

First normal form

First Normal Form (1NF) is often presented as a design guideline. In Codd’s original work, it is not. It is a mandatory constraint, required for first‑order predicate logic to apply.

Without atomic attribute values, relations cease to be relations as defined by Codd, and the standard relational algebra no longer applies. Comparisons become ambiguous, predicates are no longer boolean, and algebraic equivalence rules break down.

Relational algebra is a closed system where inputs and outputs are relations. Its operators—selection (σ), projection (π), join (⨝)—all assume that:

  • attributes can be compared using equality
  • predicates evaluate to true or false
  • each comparison involves exactly one value

This is what enables equivalence rules, join reordering, and provable optimizations, and the maths is defined for atomic values only.
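To make the closure property concrete, here is a minimal JavaScript sketch (with hypothetical helper names, not how any database implements it) modeling a relation as a collection of flat tuples with atomic values. Selection and projection each return another relation, so the operators compose freely:

```javascript
// Selection (σ): keep tuples satisfying a boolean predicate; the result is a relation.
const select = (relation, predicate) => relation.filter(predicate);

// Projection (π): keep a subset of attributes, removing duplicate tuples;
// the result is again a relation (closure), so operators compose.
const project = (relation, attrs) => {
  const seen = new Set();
  const out = [];
  for (const tuple of relation) {
    const projected = Object.fromEntries(attrs.map(a => [a, tuple[a]]));
    const key = JSON.stringify(projected);
    if (!seen.has(key)) { seen.add(key); out.push(projected); }
  }
  return out;
};

const employeeSkill = [
  { employee_name: 'Ann', skill: 'SQL' },
  { employee_name: 'Ann', skill: 'Java' },
  { employee_name: 'Bob', skill: 'Java' },
];

// Composition works because every result is itself a relation:
const javaNames = project(
  select(employeeSkill, t => t.skill === 'Java'),
  ['employee_name']
);
console.log(javaNames); // [ { employee_name: 'Ann' }, { employee_name: 'Bob' } ]
```

Because each operator's output is a valid input to any other operator, equivalence rules (such as pushing a selection below a projection) can be proven once and applied everywhere.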

Let's see some examples with SQL, which is inspired by the relational model (but is not itself a pure relational language).

1NF to apply mathematical operations to relations

Here is an example with a table of employees' skills:

CREATE TABLE employee_skill (
    employee_name TEXT,
    skill TEXT
);

INSERT INTO employee_skill VALUES
   ('Ann', 'SQL'),
   ('Ann', 'Java'),
   ('Ann', 'Python'),
   ('Bob', 'Java')
;

A simple selection involves a predicate comparing a column to a value:

SELECT DISTINCT *
   FROM employee_skill
   WHERE skill = 'Java'
;

It returns another relation with only the facts that verify the predicate:


 employee_name | skill
---------------+-------
 Ann           | Java
 Bob           | Java

This works because skill is atomic, equality has one meaning, and the result is still a relation.

This collapses without 1NF

Here is another model, still simple, but one that violates 1NF:

CREATE TABLE employee_array (
    name TEXT,
    skills TEXT[]
);

INSERT INTO employee_array VALUES
   ('Ann', ARRAY['SQL', 'Java', 'Python']),
   ('Bob', ARRAY['Java'])
;

The same predicate no longer applies:

SELECT DISTINCT *
   FROM employee_array
   WHERE skills = 'Java'
;


ERROR:  malformed array literal: "Java"
LINE 3: WHERE skills = 'Java';
                       ^
DETAIL:  Array value must start with "{" or dimension information.

PostgreSQL requires new operators:

SELECT DISTINCT *
   FROM employee_array
   WHERE 'Java' = ANY(skills)
;

 name |      skills
------+-------------------
 Ann  | {SQL,Java,Python}
 Bob  | {Java}

(2 rows)

SELECT DISTINCT *
   FROM employee_array
   WHERE skills @> ARRAY['Java']
;

 name |      skills
------+-------------------
 Ann  | {SQL,Java,Python}
 Bob  | {Java}

(2 rows)

These operators encode membership and containment. They are not part of relational algebra, and their exact semantics and syntax are vendor‑specific in SQL systems.

SQL/JSON does not restore relational semantics

JSON and JSONB datatypes do not change this:

CREATE TABLE employee_json (
    doc JSONB
);

INSERT INTO employee_json VALUES
  ('{"name": "Ann", "skills": ["SQL", "Java", "Python"]}'),
  ('{"name": "Bob", "skills": ["Java"]}')
;

Selections rely on path navigation and containment, with special operators specific to the datatype:

SELECT DISTINCT *
   FROM employee_json
   WHERE doc->'skills' ? 'Java'
;

                         doc
------------------------------------------------------
 {"name": "Ann", "skills": ["SQL", "Java", "Python"]}
 {"name": "Bob", "skills": ["Java"]}

(2 rows)

SELECT DISTINCT *
   FROM employee_json
   WHERE doc->'skills' @> '["Java"]'
;

                         doc
------------------------------------------------------
 {"name": "Ann", "skills": ["SQL", "Java", "Python"]}
 {"name": "Bob", "skills": ["Java"]}

(2 rows)

Again, this involves different datatypes, operators, and semantics, as well as different indexing. A different type of index is required to serve those predicates (a GIN index in PostgreSQL), with different syntax and different capabilities.

JSON has been added to the SQL standard as SQL/JSON but this doesn't unify the semantics. For example, an SQL array starts at 1 and a JSON array starts at 0:

SELECT 
 skills[0] "0",
 skills[1] "1",
 skills[2] "2",
 skills[3] "3"
FROM employee_array
;

 0 |  1   |  2   |   3
---+------+------+--------
   | SQL  | Java | Python
   | Java |      |

(2 rows)

SELECT 
 doc->'skills'->0 "0",
 doc->'skills'->1 "1",
 doc->'skills'->2 "2",
 doc->'skills'->3 "3"
FROM employee_json
;

   0    |   1    |    2     | 3
--------+--------+----------+---
 "SQL"  | "Java" | "Python" |
 "Java" |        |          |

(2 rows)

JSON support in RDBMS extends SQL beyond relational algebra and introduces datatype‑specific semantics that are not algebraically closed. This is expected and was foreseen when enforcing the first normal form. Codd’s insight was that once attributes stop being atomic, mathematics no longer dictates behavior. Meaning must be defined explicitly.

MongoDB’s added semantics

MongoDB embraces the document model directly to match the data representation in the domain model and application structures:


db.employees.insertMany([
  { name: "Bob", skills: "Java" },
  { name: "Ann", skills: ["SQL", "Java", "Python"] }
]);

This is intentionally not 1NF, because multiple entities and values may belong to the same aggregate, so the relational operations cannot simply rely on the mathematical definition.

Selection resembles the relational operation, but when applied to a non‑1NF collection, MongoDB defines an explicit, extended semantics:


db.employees.find({ skills: "Java" })
;

[
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c2'),
    name: 'Bob',
    skills: 'Java'
  },
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c3'),
    name: 'Ann',
    skills: [ 'SQL', 'Java', 'Python' ]
  }
]

The same predicate applies to scalars and arrays. The document matches if the value, or any array element, satisfies the filtering condition. Achieving the same in SQL would require a union of one query using SQL selection and another using JSON containment, with the final result cast to a common datatype.
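This matching rule can be sketched as follows (a simplified model for illustration, not MongoDB's implementation):

```javascript
// Simplified model of MongoDB's equality match on a field:
// a document matches if the field equals the query value,
// or if the field is an array containing the query value.
const matchesEquality = (fieldValue, queryValue) =>
  Array.isArray(fieldValue)
    ? fieldValue.includes(queryValue)
    : fieldValue === queryValue;

const employees = [
  { name: 'Bob', skills: 'Java' },
  { name: 'Ann', skills: ['SQL', 'Java', 'Python'] },
];

// One rule covers both the scalar and the array document.
const javaDevs = employees.filter(e => matchesEquality(e.skills, 'Java'));
console.log(javaDevs.map(e => e.name)); // [ 'Bob', 'Ann' ]
```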

With MongoDB, indexes, comparisons, and sorting follow the same rule, as confirmed by execution plans. I create an index on skills that can index scalar values as well as array items, with one index type, and a generic syntax:


db.employees.createIndex({ skills: 1 })
;

It is easy to verify that the index is used to find the documents on a multi-key path (a path that includes an array):

db.employees.find(
 { skills: "Java" }
).explain().queryPlanner.winningPlan

{
  isCached: false,
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { skills: 1 },
    indexName: 'skills_1',
    isMultiKey: true,
    multiKeyPaths: { skills: [ 'skills' ] },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { skills: [ '["Java", "Java"]' ] }
  }
}

The multi-key index has one index entry for each item when it is an array. When a comparison operator is applied to an array field, MongoDB compares the query value to each array element individually.
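The entry-per-element behavior can be sketched like this (a simplified model with hypothetical helper names, not MongoDB's storage format):

```javascript
// Sketch of multi-key index entry generation: an array field yields one
// index entry per element; a scalar field yields a single entry.
const indexEntries = (doc, field) => {
  const v = doc[field];
  const values = Array.isArray(v) ? v : [v];
  return values.map(key => ({ key, doc }));
};

const docs = [
  { name: 'Bob', skills: 'Java' },
  { name: 'Ann', skills: ['SQL', 'Java', 'Python'] },
];

// Bob contributes 1 entry; Ann contributes 3 (one per array element).
const entries = docs.flatMap(d => indexEntries(d, 'skills'));

// An equality search scans matching keys and deduplicates documents,
// since one document can appear under several entries.
const found = [...new Set(entries.filter(e => e.key === 'Java').map(e => e.doc))];
console.log(found.map(d => d.name)); // [ 'Bob', 'Ann' ]
```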

I used an equality predicate but the indexBounds in the execution plan show that the same can apply to a range. The same index is used for non-equality predicates:

db.employees.find({ skills: { $gt: "M"} })
;

[
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c3'),
    name: 'Ann',
    skills: [ 'SQL', 'Java', 'Python' ]
  }
]

db.employees.find({ skills: { $lt: "M"} })
;

[
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c2'),
    name: 'Bob',
    skills: 'Java'
  },
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c3'),
    name: 'Ann',
    skills: [ 'SQL', 'Java', 'Python' ]
  }
]

Only Ann has a skill with a name after 'M' in the alphabet. Both Ann and Bob have a skill with a name before 'M' in the alphabet.

When sorting on an array field, MongoDB uses the minimum array element for ascending sort and the maximum array element for descending sort, according to BSON comparison order:

db.employees.find().sort({ skills: 1 })
;

[
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c2'),
    name: 'Bob',
    skills: 'Java'
  },
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c3'),
    name: 'Ann',
    skills: [ 'SQL', 'Java', 'Python' ]
  }
]

db.employees.find().sort({ skills: -1 })
;

[
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c3'),
    name: 'Ann',
    skills: [ 'SQL', 'Java', 'Python' ]
  },
  {
    _id: ObjectId('69a0ccaece22bf6640d4b0c2'),
    name: 'Bob',
    skills: 'Java'
  }
]

Here, 'Java' comes first in alphabetical order, so both employees rank equally in an ascending sort, while 'SQL' comes last in alphabetical order, so Ann appears first in a descending sort.
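The sort-key rule can be sketched as follows (a simplified model restricted to strings, ignoring BSON type ordering):

```javascript
// Sketch of the array sort-key rule: ascending sort uses the minimum
// array element, descending sort uses the maximum; scalars are themselves.
const sortKey = (v, direction) => {
  if (!Array.isArray(v)) return v;
  return direction === 1
    ? v.reduce((a, b) => (a < b ? a : b))   // min for ascending
    : v.reduce((a, b) => (a > b ? a : b));  // max for descending
};

const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

const docs = [
  { name: 'Bob', skills: 'Java' },
  { name: 'Ann', skills: ['SQL', 'Java', 'Python'] },
];

// Ascending: both keys are 'Java' (a tie), so insertion order is kept.
const asc = [...docs].sort((x, y) => cmp(sortKey(x.skills, 1), sortKey(y.skills, 1)));
// Descending: Ann's key is 'SQL', which sorts after Bob's 'Java'.
const desc = [...docs].sort((x, y) => cmp(sortKey(y.skills, -1), sortKey(x.skills, -1)));

console.log(asc.map(d => d.name));  // [ 'Bob', 'Ann' ]
console.log(desc.map(d => d.name)); // [ 'Ann', 'Bob' ]
```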

Again, the index is used:

db.employees.find().sort({ skills: 1 }).explain().queryPlanner.winningPlan
;

{
  isCached: false,
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { skills: 1 },
    indexName: 'skills_1',
    isMultiKey: true,
    multiKeyPaths: { skills: [ 'skills' ] },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { skills: [ '[MinKey, MaxKey]' ] }
  }
}

MongoDB is optimized for scalable OLTP, where index access is a must for equality and range predicates as well as for sorting with pagination. In SQL databases, inverted indexes such as GIN are typically specialized for containment and equality predicates and offer more limited support for range ordering and pagination than B‑tree indexes.

Not forcing First Normal Form allows storage and indexing to remain efficient:

  • a compound index may include fields from multiple entities within one aggregate
  • storage involves a single disk I/O per aggregate

By deviating from 1NF, MongoDB gives up guaranteed closure—by design. An explicit $unwind operation in an aggregation pipeline can normalize the result to a relation, in the mathematical sense, if needed.

With MongoDB, we can list the skills of employees who have the 'Java' skill, with all of their skills, as a relational result:

db.employees.aggregate([  
  { $match: { skills: "Java" } },  
  { $unwind: "$skills" },
  { $project: { _id: 0, name: "$name", skill: "$skills" } }
])
;

[
  { name: 'Bob', skill: 'Java' },
  { name: 'Ann', skill: 'SQL' },
  { name: 'Ann', skill: 'Java' },
  { name: 'Ann', skill: 'Python' }
]

This simple query in MongoDB is much more complex to express with the PostgreSQL models shown above:

-- Correlated semi-join (EXISTS) over a normalized relation
SELECT DISTINCT es.employee_name AS name, es.skill
  FROM employee_skill es
  WHERE EXISTS (
    SELECT 1
    FROM employee_skill j
    WHERE j.employee_name = es.employee_name
      AND j.skill = 'Java'
  )
;

-- Existential quantification (ANY) over a non-1NF attribute (ARRAY) with explicit normalization (UNNEST)
SELECT DISTINCT ea.name, s.skill  
  FROM employee_array ea  
  CROSS JOIN LATERAL unnest(ea.skills) AS s(skill)  
  WHERE 'Java' = ANY (ea.skills)
;  

-- JSON containment predicate (@>) with explicit normalization (jsonb_array_elements)
SELECT DISTINCT doc->>'name' AS name, skill  
  FROM employee_json  
  CROSS JOIN LATERAL jsonb_array_elements_text(doc->'skills') AS s(skill)  
  WHERE doc->'skills' @> '["Java"]'
;  

Those queries must return all of each employee's skills, not only the one used in the filter, because they are part of the same aggregate. With these SQL queries, an object-relational mapper (ORM) must then regroup the rows to rebuild the aggregate.

In practice, the MongoDB query will not even use $unwind to mimic a relation, as it fetches the aggregate directly:

db.employees.aggregate([  
  { $match: { skills: "Java" } },  
  { $project: { _id: 0, name: 1, skills: 1 } }
])
;

[
  { name: 'Bob', skills: 'Java' },
  { name: 'Ann', skills: [ 'SQL', 'Java', 'Python' ] }
]

With this query, MongoDB returns the binary BSON object directly to the application driver, instead of converting it into records or JSON like most SQL databases.

Conclusion

We showed the extended semantics for selection over a non-1NF collection as one example, but MongoDB does more than extend selection. All relational operations are extended with document semantics:

  • Selection works over scalars and arrays
  • Projection reshapes documents
  • Sort semantics are defined over arrays
  • Indexes apply uniformly to scalars and array elements
  • Joins exist ($lookup), and their semantics are defined even when the key is an array

Relational theory is independent of physical implementation, but most RDBMS map each relation to a single table, and an index can cover the columns from a single table. Relational databases stem from mathematics and use normalization to structure storage. In contrast, applications center on aggregates, and MongoDB preserves these aggregates down to the storage layer.

First Normal Form (1NF) is required by the relational model—and therefore by relational SQL databases—because relational algebra assumes atomic attributes. In contrast, document databases relax these algebraic constraints in favor of explicit, document-oriented semantics, enabling similar operations to be performed directly on documents. So, when you hear that “data is relational” or feel you must apply normal forms to your domain model, recognize that this perspective is tied to a specific relational database implementation, not the nature of your data. The same data can support the same application with a document model.

Both models are valid but optimize for different abstraction layers: relational algebra offers a logical model for reasoning about data independently of the domain, while document databases model data as applications already do.

February 26, 2026

Security Advisory: A Series of CVEs Affecting Valkey

A series of vulnerabilities has been identified that affect all versions of Valkey. Below is a summary of each vulnerability. Patches for these CVEs have been released in newer versions of valkey-server and valkey-bloom. Please consider upgrading to these versions as soon as possible: valkey-server 9.0.3, valkey-server 8.1.6, valkey-server 8.0.7, valkey-server 7.2.12, valkey-bloom 1.0.1 […]

Replicate spatial data using AWS DMS and Amazon RDS for PostgreSQL

In this post, we show you how to migrate spatial (geospatial) data from self-managed PostgreSQL, Amazon RDS for PostgreSQL, or Amazon Aurora PostgreSQL-Compatible Edition to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL using AWS DMS. Spatial data is useful for applications such as mapping, routing, asset tracking, and geographic visualization. We walk through setting up your environment, configuring AWS DMS, and validating the successful migration of spatial datasets.

February 25, 2026

Percona Operator for MongoDB 1.22.0: Automatic Storage Resizing, Vault Integration, Service Mesh Support, and More!

The latest release of the Percona Operator for MongoDB, 1.22.0 is here. It brings automatic storage resizing, HashiCorp Vault integration for system user credentials, better integration with service meshes, improved backup and restore options, and more. This post walks through the highlights and how they can help your MongoDB deployments on Kubernetes. Percona Operator for MongoDB […]

Writing Away From the Screen

I had written earlier that the first step of my paper reading process is actually printing the paper. I like to physically touch the paper and handwrite and doodle in the margins. For years, a Pilot Metropolitan fountain pen loaded with Waterman blue-black ink was my weapon of choice for wrestling with the papers. For thinking hard and for getting things off my chest (exploring how I feel about something), I also relied on that fountain pen.

This past year, I served as a Program Committee member for SOSP, OSDI, NSDI, and ATC. This meant reviewing about 15 heavy-duty papers for each conference. My HP laser printer is 20 years old and has finally gotten glitchy (the paper feed is broken, requiring me to feed pages carefully lest it jam). Facing the review workload, I looked for a solution that could save my sanity, and some trees as well.

So I bought a reMarkable Paper Pro (RMPP). This is an e-ink reader/writer with color and an 11.8-inch screen. I knew what I was getting into: muted colors, screen flashing, and UI slowness. There are no software apps on the tablet; it is strictly a digital paper substitute.

The claim is that this forced minimalism is the appeal of RMPP compared to an iPad, which comes feature-maxxed and distraction-heavy. After a year of using the RMPP, I agree.

Beyond focus, the writing feel is a selling point. The RMPP does well here, whereas the iPad still feels like writing on glass (though I hear that PaperLike screen protector for iPad improves the experience by providing paper-like friction). Does the RMPP fully replace the soul of a Pilot Metropolitan? No, it doesn't. But it gets close enough for doing some longhand thinking, and it is editable and searchable (unlike my fountain pen).  

Battery life is another interesting trade-off. E-ink is incredibly power efficient, but since the RMPP screen is somewhat dark (far from a bright paper-white background), I have to use it with the backlight. This reduces the battery duration from weeks to a couple of days, depending on use. Then again, a standard iPad also lasts through a full day of work, which makes this acceptable. While e-ink may be easier on the eyes than the iPad's LCD screen, and works well under sunlight, let's be honest: how many times are you actually going to use the RMPP outside?

Overall, I can't complain about the RMPP. It solved my paper reviewing use case, and serves as a scratchpad during Zoom meetings. It has also taken over as my notebook for longhand writing. But at the end of the day, it remains a niche product. Considering the cost of the RMPP versus an iPad, an iPad Air with a 13-inch screen is likely a better deal (if you can de-claw its distractions). The RMPP gels really well with some people, but its limitations will rub many the wrong way.

So, do your own research before you get a tablet. But as I wrote in my previous post, having a plan to escape the computer screen is a smart move. It is getting harder to do deep work in front of a glowing monitor, and securing an "analog" escape (be it a digital tablet, or a physical notebook) is important for doing more focused and centered thinking.

I started a software research company

I quit my job at EnterpriseDB hacking on PostgreSQL products last month to start a company researching and writing about software infrastructure. I believe there is space for analysis that is more focused on code than TechCrunch or The Register, more open to covering corporate software development than LWN.net, and (as much as I love some of these folks) less biased than VCs writing about their own investments.

I believe that more than ever there is a need for authentic and trustworthy analysis and coverage of the software we depend on.

This company, The Consensus, will talk about databases and programming languages and web servers and everything else that is important for experienced developers to understand and think about. It is independent of any software vendor and independent of any particular technology.

Some people were surprised (in a positive way) to see me cover MySQL already, for example. But that is exactly the point.

I don't want The Consensus to be just "Phil's thoughts". I have already started working with a number of experienced developers who will be writing, and paid to write, for The Consensus.

I also hope that this is another way, beyond the many communities I already run, to give back to the community, such as by highlighting the work of open-source developers (the first interview, with a DataFusion developer, is coming soon) and by highlighting compelling events and jobs in the software infrastructure world.

The Consensus is entirely bootstrapped and will depend on the support of subscribers and, potentially, sponsors. The first few subscribers signed up just this past week.

You can read more about the background and goals here, you can read about how contributors will work with The Consensus here, and you can get a sense for where this is going by browsing the homepage of The Consensus already.

Thank you for your support in advance! Thank you to the folks who have subscribed already despite very little fanfare. Feedback is very welcome. I'm very excited and having quite a bit of fun already. We're all going to learn a lot.

February 23, 2026

MariaDB innovation: vector index performance

Last year I shared many posts documenting MariaDB performance for vector search using ann-benchmarks. Performance was great in MariaDB 11 and this blog post explains that it is even better in MariaDB 12. This work was done by Small Datum LLC and sponsored by the MariaDB Foundation. My previous posts were published in January and February 2025.

tl;dr

  • Vector search recall vs precision in MariaDB 12.3 is better than in MariaDB 11.8
  • Vector search recall vs precision in MariaDB 11.8 is better than in Postgres 18.2 with pgvector 0.8.1
  • The improvements in MariaDB 12.3 are more significant for larger datasets
  • MariaDB 12.3 has the best results because it uses less CPU per query

Benchmark

This post has much more detail about my approach. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit.  The ann-benchmarks config files are here for MariaDB and for Postgres.

This time I used the dbpedia-openai-X-angular tests for X in 100k, 500k and 1000k.

For hardware I used a larger server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices. 

For databases I used:
  • MariaDB versions 11.8.5 and 12.3.0 with this config file. Both were compiled from source. 
  • Postgres 18.2 with pgvector 0.8.1 with this config file. These were compiled from source. For Postgres, tests were run with and without halfvec (float16).

I had ps and vmstat running during the benchmark and confirmed there weren't storage reads, as the table and index were cached by MariaDB and Postgres.

The command lines to run the benchmark using my helper scripts are:
    bash rall.batch.sh v1 dbpedia-openai-100k-angular c32r128
    bash rall.batch.sh v1 dbpedia-openai-500k-angular c32r128
    bash rall.batch.sh v1 dbpedia-openai-1000k-angular c32r128

Results: dbpedia-openai-100k-angular

Summary
  • MariaDB 12.3 has the best results
  • the difference between MariaDB 12.3 and 11.8 is smaller here than it is below for 500k and 1000k

Results: dbpedia-openai-500k-angular

Summary
  • MariaDB 12.3 has the best results
  • the difference between MariaDB 12.3 and 11.8 is larger here than above for 100k

Results: dbpedia-openai-1000k-angular

Summary
  • MariaDB 12.3 has the best results
  • the difference between MariaDB 12.3 and 11.8 is larger here than it is above for 100k and 500k


How to Unsubscribe from Modern Luxury

A few years ago I started getting issues of Modern Luxury in the mail. I had no idea why they started coming, and I tried to get them to stop. This should have been easy, and was instead hard. Here’s my process, in case anyone else is in the same boat.

First, if you use it, try to unsubscribe via PaperKarma. This is convenient and works for a decent number of companies. PaperKarma kept reporting they’d successfully unsubscribed me, but Modern Luxury kept coming.

Second, write to subscriptions@modernluxury.com. I got no response.

Third, call any numbers you can find associated with the company. Leave voicemails on anything that claims to be Modern Luxury related. Along this path I wound up discovering a Borgesian labyrinth of sketchy offers for life-alert style emergency devices and other things that felt vaguely like elder abuse; long story short, this did not work.

Fourth, Modern Luxury’s email format is [first initial][last name]@modernluxury.com. Start writing emails to a few names from your local edition that seem relevant, like the local publisher and editor. When they don’t respond, expand your emails to include everyone listed in the magazine. Start digging through corporate filings of their parent company, Cumulus Media, and emailing people there. Start short and simple; when that doesn’t work, try humor. This didn’t work either, but it was fun to write:

I love me some esoteric rich people nonsense. Fabergé eggs! Ominous lawn obelisks! Having oneself taxidermied and wheeled out for council meetings of University College London! Unfortunately, Modern Luxury contains nothing like this; perhaps rich people have forgotten how to be interesting. In any event, I would like you to stop. If you can figure out how to stop sending me magazines, I promise to stop sending you emails about it, and we can all go on to live happy lives.

Contraluxuriantly,

Kyle Kingsbury

Finally, cut out a suitable article from an issue of the magazine. Look up the home address of the regional group publisher in city records. Mail the article back to the publisher, along with a letter asking them to stop.

Dear Mr. Uslan,

As the regional group publisher of Modern Luxury magazine, I would like you to stop publishing Modern Luxury to my home each month. I never asked for it, and I have been trying to unsubscribe for years. E-mails, phone calls, Paper Karma: nothing works. I appreciate your most recent column, entitled “Spirit of Generosity”, but please: it is possible to be too generous. Kindly stop sending these magazines.

Exhaustedly,

Kyle Kingsbury

This actually seems to have worked.


I think a lot about this idea of the Annoyance Economy—that modern life places ordinary people in contact with a dizzying array of opaque, nonresponsive bureaucracies, and that those bureaucracies have financial incentives to ignore you. This is why it’s so hard to replace a CPAP or get paid back when movers break things. This is why Redplum (one of those advertising/coupon mailers) ignored my unsubscribe requests for years, and only stopped when I started e-mailing the entire C-suite about it. I try to pick and choose these battles, but sometimes it’s hard to let it go. And goshdarnit, if nobody pushes back then bureaucratic indifference works, and we all have to live with it.

I don’t want to bother people like this; I think it’s unreasonably rude. I still start with the official support channels and escalate gradually. I like Patrick McKenzie’s strategy of presenting oneself as a boring, dangerous professional. However, I have also found that in the Annoyance Economy, one of the ways to get things done is to find specific people with power, and annoy them right back.

I hope this whole misadventure convinced Modern Luxury to build and document an easy unsubscribe process. If not, you know what to do.

February 21, 2026

Read‑your‑writes on replicas: PostgreSQL WAIT FOR LSN and MongoDB Causal Consistency

In databases designed for high availability and scalability, secondary nodes can fall behind the primary. Typically, a quorum of nodes is updated synchronously to guarantee durability while maintaining availability, and the remaining standby instances are eventually consistent to handle partial failures. To balance availability with performance, synchronous replicas acknowledge a write only when it is durable and recoverable, even if it is not yet readable.

As a result, if your application writes data and then immediately queries another node, it may still see stale data.

Here’s a common anomaly: you commit an order on the primary and then try to retrieve it from a reporting system. The order is missing because the read replica has not yet applied the write.

PostgreSQL and MongoDB tackle this problem in different ways:

  • PostgreSQL 19 introduces a WAIT FOR LSN command, allowing applications to explicitly coordinate reads after writes.
  • MongoDB provides causal consistency within sessions using the afterClusterTime read concern.

Both approaches track when your write occurred and ensure subsequent reads observe at least that point. Let’s look at how each database does this.

PostgreSQL: WAIT FOR LSN (PG19)

PostgreSQL records every change in the Write‑Ahead Log (WAL). Each WAL record has a Log Sequence Number (LSN): a 64‑bit position, typically displayed as two hexadecimal halves such as 0/40002A0 (high/low 32 bits).

Streaming replication ships WAL records from the primary to standbys, which then:

  1. Write WAL records to disk
  2. Flush them to durable storage
  3. Replay them, applying changes to data files

The write position determines what can be recovered after a database crash. The flush position defines the recovery point for a compute instance failure. The replay position determines what queries can see on a standby.

WAIT FOR LSN allows a session to block until one of these points reaches a target LSN:

  • standby_write → WAL written to disk on the standby (not yet flushed)
  • standby_flush → WAL flushed to durable storage on the standby
  • standby_replay (default) → WAL replayed into data files and visible to readers
  • primary_flush → WAL flushed on the primary (useful when synchronous_commit = off and a durability barrier is needed)

A typical flow is to write on the primary, commit, and then fetch the current WAL insert LSN:

pg19rw=# BEGIN;

BEGIN

pg19rw=*# INSERT INTO orders VALUES (123, 'widget');

INSERT 0 1

pg19rw=*# COMMIT;

COMMIT

pg19rw=# SELECT pg_current_wal_insert_lsn();

 pg_current_wal_insert_lsn
---------------------------
 0/18724C0

(1 row)

That LSN is then used to block reads on a replica until it has caught up:


pg19ro=# WAIT FOR LSN '0/18724C0'
  WITH (MODE 'standby_replay', TIMEOUT '2s');

This LSN‑based read‑your‑writes pattern in PostgreSQL requires extra round‑trips: capturing the LSN on the primary and explicitly waiting on the standby. For many workloads, reading from the primary is simpler and faster.

The pattern becomes valuable when expensive reads must be offloaded to replicas while still preserving read‑your‑writes semantics, or in event‑driven and CQRS designs where the LSN itself serves as a change marker for downstream consumers.
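The wait-until-replayed logic can be sketched with a toy model (pure Python, no real replication): a standby tracks a replay position, and a reader blocks until that position reaches the commit LSN it captured on the primary, mirroring `MODE 'standby_replay'` with a timeout.

```python
import threading
import time

class ToyStandby:
    """Toy model of a standby's replay position; not real replication."""
    def __init__(self):
        self.replay_lsn = 0
        self._cv = threading.Condition()

    def apply_wal(self, up_to_lsn: int):
        """Simulate WAL replay advancing the visible position."""
        with self._cv:
            self.replay_lsn = max(self.replay_lsn, up_to_lsn)
            self._cv.notify_all()

    def wait_for_lsn(self, target: int, timeout: float) -> bool:
        """Block until replay reaches target, like WAIT FOR LSN with standby_replay."""
        deadline = time.monotonic() + timeout
        with self._cv:
            while self.replay_lsn < target:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # comparable to a WAIT FOR LSN timeout error
                self._cv.wait(remaining)
            return True

standby = ToyStandby()
commit_lsn = 1000  # pretend pg_current_wal_insert_lsn() returned this after COMMIT
threading.Timer(0.05, standby.apply_wal, args=(commit_lsn,)).start()  # replication lag
assert standby.wait_for_lsn(commit_lsn, timeout=2.0)  # read-your-writes is now safe
```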

MongoDB: Causal Consistency

While PostgreSQL reasons in WAL positions, MongoDB tracks causality using oplog timestamps and a hybrid logical clock.

In a replica set, each write on the primary produces an entry in local.oplog.rs, a capped collection. These entries are rewritten to be idempotent (for example, $inc becomes $set) so they can be safely reapplied. Each entry carries a Hybrid Logical Clock (HLC) timestamp that combines physical time with a logical counter, producing a monotonically increasing cluster time. Replica set members apply oplog entries in timestamp order.
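A minimal sketch of the HLC idea (it omits the merge step a real HLC performs when incorporating cluster times received from other nodes): physical seconds plus a logical counter that breaks ties within the same second, yielding strictly increasing timestamps.

```python
import time

class HybridLogicalClock:
    """Toy HLC: (physical seconds, logical counter), strictly increasing."""
    def __init__(self):
        self.physical = 0
        self.logical = 0

    def now(self) -> tuple[int, int]:
        wall = int(time.time())
        if wall > self.physical:
            self.physical, self.logical = wall, 0  # physical time advanced
        else:
            self.logical += 1                      # same second: bump the counter
        return (self.physical, self.logical)

clock = HybridLogicalClock()
a, b = clock.now(), clock.now()
assert b > a  # cluster time keeps increasing, even within one wall-clock second
```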

Because MongoDB allows concurrent writes, temporary “oplog holes” can appear: a write with a later timestamp may commit before another write with an earlier timestamp. A naïve reader scanning the oplog could skip the earlier operation.

MongoDB prevents this by tracking an oplogReadTimestamp, the highest hole‑free point in the oplog. Secondaries are prevented from reading past this point until all prior operations are visible, ensuring causal consistency even in the presence of concurrent commits.
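As a toy model of that bookkeeping (the real server tracks this internally in the storage engine), the hole-free point is the highest committed timestamp below the earliest still-uncommitted write:

```python
def oplog_read_timestamp(committed: set[int], in_flight: set[int]) -> int:
    """Highest timestamp T with no uncommitted ('hole') timestamp at or below it."""
    floor = min(in_flight, default=None)
    if floor is None:
        return max(committed, default=0)
    # Readers must stop just below the earliest still-uncommitted timestamp.
    return max((t for t in committed if t < floor), default=0)

# ts=5 committed before ts=3 and ts=4: readers may only scan up to ts=2...
assert oplog_read_timestamp(committed={1, 2, 5}, in_flight={3, 4}) == 2
# ...and once 3 and 4 commit, the hole closes and 5 becomes visible.
assert oplog_read_timestamp(committed={1, 2, 3, 4, 5}, in_flight=set()) == 5
```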

Causal consistency in MongoDB is enforced by attaching an afterClusterTime to reads:

  • Drivers track the operationTime of the last operation in a session.
  • When a session is created with causalConsistency: true, the driver automatically includes an afterClusterTime equal to the highest known cluster time on subsequent reads.
  • The server blocks the read until its cluster time has advanced beyond afterClusterTime.

With any read preference that allows reading from secondaries as well as the primary, this guarantees read‑your‑writes behavior:


// Start a causally consistent session
const session = client.startSession({ causalConsistency: true });

const coll = db.collection("orders");

// Write in this session
await coll.insertOne({ id: 123, product: "widget" }, { session });

// The driver automatically injects afterClusterTime into the read concern
const order = await coll.findOne({ id: 123 }, { session });

Causal consistency is not limited to snapshot reads. It applies across read concern levels. The key point is that the session ensures later reads observe at least the effects of earlier writes, regardless of which replica serves the read.

Conclusion

Here is a simplified comparison:

| Feature | PostgreSQL WAIT FOR LSN | MongoDB Causal Consistency |
|---|---|---|
| Clock type | Physical byte offset in the WAL (LSN) | Hybrid Logical Clock (HLC) |
| Mechanism | Block until replay/write/flush LSN reached | Block until afterClusterTime is visible |
| Tracking | Application captures LSN | Driver tracks operationTime |
| Granularity | WAL record position | Oplog timestamp |
| Replication model | Physical streaming | Logical oplog application |
| Hole handling | N/A (serialized WAL) | oplogReadTimestamp |
| Failover handling | Error unless NO_THROW | Session continues, bounded by replication state |

Both PostgreSQL’s WAIT FOR LSN and MongoDB’s causal consistency ensure reads can observe prior writes, but at different layers:

  • PostgreSQL offers manual, WAL‑level precision.
  • MongoDB provides automatic, session‑level guarantees.

If you want read‑your‑writes semantics to “just work” without additional coordination calls, MongoDB’s session‑based model is a strong fit. Despite persistent myths about consistency, MongoDB delivers strong consistency in a horizontally scalable system with a simple developer experience.

End of Productivity Theater

I remember the early 2010s as the golden age of productivity hacking. Lifehacker, 37signals, and their ilk were everywhere, and it felt like everyone was working on jury-rigging color-coded Moleskine task-trackers and web apps into the perfect Getting Things Done system.

So recently I found myself wondering: what happened to all that excitement? Did I just outgrow the productivity movement, or did the movement itself lose steam?

After poking around a bit, I think it's both. We collectively grew out of that phase, and productivity itself fundamentally changed.


The Trap of Micro-Optimizations

Back then, the underlying promise of productivity culture was about outputmaxxing (as we would now call it). We obsessed over efficiency at the margins: how to auto-sync this app with that one, or how to shave 5 seconds off an email reply. We accumulated systems, hacks, and integrations like collectors.

Eventually, the whole thing got exhausting. I think we all realized that tweaking task managers wasn't helping the bottom line. We were doing a lot of organizing, but that organizing wasn't translating into actually getting the work done.

The reason is simple: not all tasks matter equally. Making some tasks faster does not move the bottom line if the core task remains the serial bottleneck. Amdahl’s Law says that speeding up one part of a system improves overall performance only in proportion to the time that part consumes. If the hard, irreducible core is untouched, optimizations elsewhere are just noise.
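Amdahl's Law is easy to see with a couple of numbers (the fractions below are made up for illustration):

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when only `fraction` of the work is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Making 10% of your day (email, task shuffling) 10x faster barely moves the needle:
assert round(amdahl_speedup(0.10, 10.0), 3) == 1.099
# Speeding up the 60% that is core work by just 2x beats it easily:
assert round(amdahl_speedup(0.60, 2.0), 3) == 1.429
```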

Painting the deck of a sinking ship faster doesn't help anyone. Productivity should be about making sure we are working on the right things in the first place. The main thing is to keep the main thing the main thing. 


Away From the Glowing Rectangle

For more than 15 years, I've relied on Emacs org-mode to run my life. It's the ultimate organization system, one that has survived every software trend of the past decade and a half. But despite having this powerful writing system at my fingertips, my best ideas never arrive while I'm staring at a screen. Almost without exception, my hard thinking happens away from the screen. That's where the ideas come from.

If I'm being rational about it: I should be paid for the time I spend thinking hard, not for the time I spend managing my inbox, or doing trivial office work, or wrangling text on a screen.

So that's how I try to work now. I do my deep thinking, messy brainstorming, and wrestling-with-ideas completely away from the screen. Then I plan my next 45 minutes or so (what I'm going to do, in what order, and why) and only then do I go to my laptop to execute it. In other words, I arrive at the screen with a plan.

(OK, let's first take a moment to appreciate my self-restraint for not mentioning AI until this late into the post. But here it comes.)

What does productivity even mean in the age of AI? What are we actually here to contribute? Are we supposed to be architects or butlers to LLMs?

If AI absorbs all the shallow work, the only things left that genuinely require a human are the core parts that demand genuine creativity, judgment, taste, and the type of thinking that can't be prompted away. This raises the stakes considerably, and changes what "a productive day" even means.

That kind of deep creative work is best done away from the glowing rectangle.


I recently launched a free email newsletter for the blog. Subscribe here to get these essays delivered to your inbox, along with behind-the-scenes commentary. 

February 19, 2026

Top-K queries with MongoDB search indexes (BM25)

A document database is more than a JSON datastore. It must also support efficient storage and advanced search: equality and range predicates, fuzzy text search, ranking, pagination, and limited sorted results (top‑k). BM25 indexes, which combine an inverted index and columnar doc values, are ideal for this, with mature open‑source implementations like Lucene (used by MongoDB) and Tantivy (used by ParadeDB).
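To make the ranking model concrete, here is a simplified single-term BM25 score in Python. The IDF form and the defaults k1 = 1.2, b = 0.75 follow Lucene's BM25 similarity; this is an illustration, not MongoDB's actual scoring code.

```python
import math

def bm25_term_score(tf, doc_len, avgdl, df, n_docs, k1=1.2, b=0.75):
    """One term's BM25 contribution: rarity (IDF) times saturated, length-normalized tf."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1 - b + b * doc_len / avgdl)  # length normalization
    return idf * tf * (k1 + 1) / (tf + norm)

# A rare term in a short document outscores a common term in a long one:
rare = bm25_term_score(tf=2, doc_len=50, avgdl=100, df=10, n_docs=100_000_000)
common = bm25_term_score(tf=2, doc_len=300, avgdl=100, df=40_000_000, n_docs=100_000_000)
assert rare > common
```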

ParadeDB brings Tantivy indexing to PostgreSQL via the pg_search extension and recently published an excellent article showing where GIN indexes fall short and how BM25 bridges the gap. Here, I’ll present the MongoDB equivalent using its Lucene‑based search indexes. I suggest reading ParadeDB’s post first, as it clearly explains the problem and the solution:

How We Optimized Top K in Postgres | ParadeDB

How ParadeDB uses principles from search engines to optimize Postgres' Top K performance.

paradedb.com

I'll be lazy and use the same dataset, index and query.

MongoDB with search indexes

You can use BM25 indexes on MongoDB in several environments: the cloud-managed service (MongoDB Atlas), its local deployment (Atlas Local), on-premises MongoDB Enterprise Server, and the open-source MongoDB Community edition. The mongot engine that powers MongoDB Search is in public preview, with its source available at github.com/mongodb/mongot.

I started a local Atlas deployment on my laptop with the Atlas CLI and connected to it with mongosh:


atlas deployments setup mongo --type local --connectWith mongosh --force

Dataset generation

I generated 100,000,000 documents similar to ParadeDB's benchmark:


const batchSize = 10000;
const batches   = 10000;

const rows      = batches * batchSize;
print(`Generating ${rows.toLocaleString()} documents`);

db.benchmark_logs.drop();

const messages = [
  'The research team discovered a new species of deep-sea creature while conducting experiments near hydrothermal vents in the dark ocean depths.',
  'The research facility analyzed samples from ancient artifacts, revealing breakthrough findings about civilizations lost to the depths of time.',
  'The research station monitored weather patterns across mountain peaks, collecting data about atmospheric changes in the remote depths below.',
  'The research observatory captured images of stellar phenomena, peering into the cosmic depths to understand the mysteries of distant galaxies.',
  'The research laboratory processed vast amounts of genetic data, exploring the molecular depths of DNA to unlock biological secrets.',
  'The research center studied rare organisms found in ocean depths, documenting new species thriving in extreme underwater environments.',
  'The research institute developed quantum systems to probe subatomic depths, advancing our understanding of fundamental particle physics.',
  'The research expedition explored underwater depths near volcanic vents, discovering unique ecosystems adapted to extreme conditions.',
  'The research facility conducted experiments in the depths of space, testing how different materials behave in zero gravity environments.',
  'The research team engineered crops that could grow in the depths of drought conditions, helping communities facing climate challenges.'
];

const countries = [ 'United States', 'Canada', 'United Kingdom', 'France', 'Germany', 'Japan', 'Australia', 'Brazil', 'India', 'China' ];

const labels = [ 'critical system alert', 'routine maintenance', 'security notification', 'performance metric', 'user activity', 'system status', 'network event', 'application log', 'database operation', 'authentication event' ];

let batch = [];
const startDate = new Date("2020-01-01T00:00:00Z");
for (let i = 0; i < rows; i++) {

  batch.push({
    message: messages[i % 10],
    country: countries[i % 10],
    severity: (i % 5) + 1,
    timestamp: new Date(startDate.getTime() + (i % 731) * 24 * 60 * 60 * 1000),
    metadata: {
      value: (i % 1000) + 1,
      label: labels[i % 10]
    }
  });

  if (batch.length === batchSize) {
    db.benchmark_logs.insertMany(batch);
    batch = [];
  }

}

I checked the document schema and counts:

print(`Done!
 \nSample: ${EJSON.stringify(db.benchmark_logs.find().limit(1).toArray(), null, 2)}
 \nDocument count: ${db.benchmark_logs.countDocuments().toLocaleString()}
`);

Sample: [
  {
    "_id": {
      "$oid": "6997580679ab8450f81ff93c"
    },
    "message": "The research team discovered a new species of deep-sea creature while conducting experiments near hydrothermal vents in the dark ocean depths.",
    "country": "United States",
    "severity": 1,
    "timestamp": {
      "$date": "2020-01-01T00:00:00Z"
    },
    "metadata": {
      "value": 1,
      "label": "critical system alert"
    }
  }
]

Document count: 100,000,000

With 100 million documents, this is a large dataset. Because many fields can be queried, we can’t create every compound index combination. A single search index will make queries on this collection efficient.

Search index creation

I created the search index similar to the one used on ParadeDB (here):

const mapping = {
  mappings: {
    // Equivalent to: USING bm25 (Atlas Search uses Lucene BM25 by default)
    dynamic: false,
    fields: {
      // Equivalent to: bm25(id, message, ...); a standard full-text field scored by BM25
      message: { type: "string" },
      // Equivalent to: text_fields = { "country": { fast: true, tokenizer: { type: "raw", lowercase: true } } }
      // (fast: true is implicit in Atlas Search; doc values are optional in the cloud service)
      country: { type: "string", analyzer: "keywordLowercase" },
      // Equivalent to: a numeric field indexed for filtering
      severity: { type: "number", representation: "int64" },
      // Equivalent to: a timestamp field included in the BM25 index
      timestamp: { type: "date" },
      // Equivalent to: json_fields = { "metadata": { fast: true, tokenizer: raw } }
      metadata: {
        type: "document",
        fields: {
          value: {
            type: "number",
            representation: "int64"
          },
          // Equivalent to: metadata tokenizer = raw + lowercase
          label: {
            type: "string",
            analyzer: "keywordLowercase"
          }
        }
      }
    }
  },
  analyzers: [
    {
      // Equivalent to: tokenizer = raw, lowercase = true
      name: "keywordLowercase",
      tokenizer: { type: "keyword" },
      tokenFilters: [{ type: "lowercase" }]
    }
  ]
};

db.benchmark_logs.createSearchIndex(
  "benchmark_logs_idx",
  mapping
);

The index is created asynchronously and updated via change stream operations.

Query and result

The query combines text search, range filter, sort by score, and limit for Top-K:

query = [
  {
    $search: {
      index: "benchmark_logs_idx",
      compound: {
        must: [{ text: { query: "research team", path: "message" } }],
        filter: [{ range: { path: "severity", lt: 3 } }]
      },
      sort: { score: { $meta: "searchScore" } }
    }
  },
  { $limit: 10 },
  {
    $project: {
      message: 1,
      country: 1,
      severity: 1,
      timestamp: 1,
      metadata: 1,
      rank: { $meta: "searchScore" }
    }
  }
]

const start = Date.now();

print(EJSON.stringify(db.benchmark_logs.aggregate(query).toArray(),null,2));

const end = Date.now();
print(`\nExecution time: ${end - start} ms`);

It is important that the sort is part of $search because an additional $sort stage would not be pushed down. This allows Atlas Search to run the query in Lucene’s Top‑K mode, enabling block‑max WAND (BMW) pruning via competitive score feedback during collection.
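The pruning idea can be sketched in a few lines. This toy simplifies block-max WAND down to per-term max scores (real Lucene tracks max scores per posting block and combines terms with WAND's pivot logic), but it shows the core trick: once the top-k heap is full, any candidate whose best possible score cannot beat the current k-th best is skipped without scoring.

```python
import heapq

def top_k(postings, k):
    """Collect top-k by score, skipping terms whose best possible score can't compete.

    `postings` maps term -> (max_score, [(doc_id, score), ...]), a toy stand-in for
    the per-block max scores that block-max WAND uses for pruning.
    """
    heap = []  # min-heap of (score, doc_id); heap[0] is the current k-th best
    for max_score, docs in postings.values():
        if len(heap) == k and max_score <= heap[0][0]:
            continue  # whole term pruned: even its best doc can't enter the top-k
        for doc_id, score in docs:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

postings = {
    "research": (0.9, [(1, 0.9), (2, 0.7), (3, 0.5)]),
    "team":     (0.4, [(4, 0.4), (5, 0.3)]),  # pruned: 0.4 <= current threshold 0.7
}
assert [doc for _, doc in top_k(postings, k=2)] == [1, 2]
```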

Here is the result and timing:


[
  {
    "_id": { "$oid": "699757049ce6a7c42c65d105" },
    "message": "The research team discovered a new species of deep-sea creature while conducting experiments near hydrothermal vents in the dark ocean depths.",
    "country": "United States",
    "severity": 1,
    "timestamp": { "$date": "2020-01-11T00:00:00Z" },
    "metadata": { "value": 11, "label": "critical system alert" },
    "rank": 0.6839379072189331
  },
  ... nine more documents with the same message, country, severity, label, and rank
  (0.6839379072189331), with timestamps stepping by 10 days up to 2020-04-10 and
  metadata.value running from 21 to 101
]

Execution time: 1850 ms

On my laptop, this search over 100 million documents returns results in under two seconds, with no tuning. It performs a broad text match, and the high‑frequency terms "research" and "team" generate tens of millions of candidate documents. The additional severity filter and scoring require comparing tens of millions of scores, which has been heavily parallelized to stay within the two‑second budget.

Performance breakdown (explain)

Because the execution plan is long, I’ve serialized it into a compact single-line string that you can easily copy and paste into your preferred AI chatbot:

EJSON.stringify(
db.benchmark_logs.aggregate(query).explain("executionStats")
);

{"explainVersion":"1","stages":[{"$_internalSearchMongotRemote":{"mongotQuery":{"index":"benchmark_logs_idx","compound":{"must":[{"text":{"query":"research team","path":"message"}}],"filter":[{"range":{"path":"severity","lt":3}}]},"sort":{"score":{"$meta":"searchScore"}}},"explain":{"query":{"type":"BooleanQuery","args":{"must":[{"path":"compound.must","type":"BooleanQuery","args":{"must":[],"mustNot":[],"should":[{"type":"TermQuery","args":{"path":"message","value":"research"},"stats":{"context":{"millisElapsed":1.273251,"invocationCounts":{"createWeight":2,"createScorer":87}},"match":{"millisElapsed":0},"score":{"millisElapsed":1292.607756,"invocationCounts":{"score":40000011}}}},{"type":"TermQuery","args":{"path":"message","value":"team"},"stats":{"context":{"millisElapsed":0.292666,"invocationCounts":{"createWeight":2,"createScorer":87}},"match":{"millisElapsed":0},"score":{"millisElapsed":379.190071,"invocationCounts":{"score":10000011}}}}],"filter":[],"minimumShouldMatch":0},"stats":{"context":{"millisElapsed":2.268162,"invocationCounts":{"createWeight":2,"createScorer":87}},"match":{"millisElapsed":0},"score":{"millisElapsed":3838.859709,"invocationCounts":{"score":40000011}}}}],"mustNot

A Guide to Accelerating Your Application with Valkey: Caching Database Queries and Sessions

Modern applications often rely on multiple services to provide fast, reliable, and scalable responses. A common and highly effective architecture involves an application, a persistent database (like MySQL), and a high-speed cache service (like Valkey). In this guide, we’ll explore how to integrate these components effectively using Python to dramatically improve your application’s performance. Understanding […]

How We Built Tinybird's TypeScript SDK for ClickHouse

How we built the Tinybird TypeScript SDK: phantom types for compile-time inference, esbuild for schema loading, and a dev workflow that connects your app and data layer.