a curated list of database news from authoritative sources

April 28, 2025

PostgreSQL aborts the transactions on error

You may be surprised by this in PostgreSQL:

postgres=!# commit;
ROLLBACK
postgres=#

Yes, I issued a COMMIT but got a ROLLBACK!
I'll demo how it happened and how to avoid it. In short, the transaction was already rolled back and the only possible command to run is a ROLLBACK, which is implicit when terminating the transaction.

I created a table, started a transaction and inserted one row:

postgres=# create table demo ( id int primary key );
CREATE TABLE
postgres=# begin transaction;
BEGIN
postgres=*# insert into demo values (1);
INSERT 0 1
postgres=*#

The * in the prompt (which comes from the %x in the default %/%R%x%# PROMPT1) shows that I'm still in a transaction

I try to add the same row, with the same key, which violates the primary key constraint:

postgres=*# insert into demo values (1);
ERROR:  duplicate key value violates unique constraint "demo_pkey"
DETAIL:  Key (id)=(1) already exists.
postgres=!#

The ! in the prompt shows that the transaction failed.

I can check its status from another transaction (trying it within a failed transaction would have raised ERROR: current transaction is aborted, commands ignored until end of transaction block):

postgres=!# \! psql -c "select pid, application_name, state from pg_stat_activity where wait_event='ClientRead'"

  pid  | application_name |             state             
-------+------------------+-------------------------------
 66420 | psql             | idle in transaction (aborted)
(1 row)

postgres=!#

As the transaction has been aborted, it has released all locks. To verify this, I insert the same tow from another session:

postgres=!# \! psql -c "insert into demo values (1)"

INSERT 0 1

postgres=!

The only thing I can do is ending the transaction block, with ABORT, ROLLBACK or even COMMIT:

postgres=!# commit;
ROLLBACK
postgres=# 

If you're used to Oracle Database, you might find it surprising that in an interactive transaction, you must restart from the beginning even after completing previous statements and only one failed. Oracle Database automatically creates a savepoint before each user call, rolling back to this savepoint in case of an error, so that the user can continue with another statement once aware of the error.

In PostgreSQL, creating implicit savepoints is the client's or driver's responsibility.

For instance, PgJDBC can enable autosave=on to achieve this. However, it's important to note that using savepoints in PostgreSQL may be more resource-intensive compared to other databases.

Another example is a PL/pgSQL statement with an exception block that creates an implicit savepoint to roll back the main block before running the exception block. This differs from Oracle Database, which rolls back only the statement that failed when continuing to the exception block of PL/SQL.

If the exceptions are managed by your application code, you must use savepoints to achieve the same.

With an interactive user interface, like PSQL, it might be preferrable to create an implicit savepoint before each statement, and this is possible with ON_ERROR_ROLLBACK. Here is an example:

postgres=# drop table demo;
DROP TABLE

postgres=# create table demo ( id int primary key );
CREATE TABLE

postgres=# \set ON_ERROR_ROLLBACK on

postgres=# begin transaction;
BEGIN

postgres=*# insert into demo values (1);
INSERT 0 1

postgres=*# insert into demo values (1);
ERROR:  duplicate key value violates unique constraint "demo_pkey"
DETAIL:  Key (id)=(1) already exists.

postgres=*# \! psql -c "select pid, application_name, state from pg_stat_activity where wait_event='ClientRead'"  

  pid  | application_name |        state
-------+------------------+---------------------
 66461 | psql             | idle in transaction
(1 row)

postgres=*# insert into demo values (2);
INSERT 0 1

postgres=*# commit;
COMMIT

In this interactive transaction, with ON_ERROR_ROLLBACK set to on, I was able to continue with another value when I got the information that the one I tried to insert was a duplicate one.

When managing exceptions in your application, such as implementing retry logic for serializable errors, consider creating a savepoint before executing a statement. This allows you to continue with the same transaction if an exception is caught. However, be cautious as it is not always the right solution. In cases of deadlocks, one transaction must abort to release its locks. The scope of rollback on errors depends on what has been executed before, so it makes sense that the application controls it rather than relying on defaults.

In interactive usage with PSQL, setting ON_ERROR_ROLLBACK is advisable to prevent rolling back all previous work due to a simple typo causing an error. While it is unnecessary if you do not start a transaction explicitly, and rely on autocommit, running interactive commands without the ability to verify outcomes before committing changes is not recommended.

April 27, 2025

Index Only Scan on JSON Documents, with Covering and Multi-Key Indexes in MongoDB

Storing JSON documents in a SQL database does not make it a document database. The strength of any database lies in its indexing capabilities, and SQL databases, even with JSON datatypes, lack the flexibility of document databases, particularly when dealing with arrays in an embedded One-to-Many relationships.

In SQL databases, normalized tables are first-class citizens, with single-key indexes optimized for handling equality and range filters (WHERE), sorting (ORDER BY), and projection coverage (SELECT) for individual tables. To avoid joining before filtering, a denormalized model is preferred when those operations involve multiple tables.
While SQL databases can store documents, including arrays for One-to-Many relationships, working with arrays in JSON necessitates the use of inverted indexes like PostgreSQL's GIN, which do not cover range filters, sorting, and projection like regular indexes.

In contrast, MongoDB treats documents as the core of its data model. Its indexing mechanisms naturally extend to handle documents and their arrays, retaining functionality for filtering, sorting, and projection coverage, as regular indexes can hangle multiple keys per documents.

In previous posts, we examined how a multi-key index supports sort operations. Now, let's explore the conditions under which a query projection is covered by the index, eliminating the need to fetch the document from the collection.

Here is the table of content of the tests:

  • With default projection: IXSCAN ➤ FETCH
  • Partial projection: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE
  • Full projection: IXSCAN ➤ PROJECTION_COVERED
  • Query on a single-key entry: IXSCAN ➤ PROJECTION_COVERED
  • Query on a multi-key entry: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE
  • Projection of "_id" : IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE
  • Projection of "$recordId" : IXSCAN ➤ PROJECTION_DEFAULT <!-- TOC end -->

Here is a collection of friends, with their first name, last name, and phone number.

db.friends.insertMany([
  { firstName: "Rachel", lastName: "Green", phoneNumber: "555-1234" },
  { firstName: "Ross", lastName: "Geller", phoneNumber: "555-2345" },
  { firstName: "Monica", lastName: "Geller", phoneNumber: "555-3456" },
  { firstName: "Chandler", lastName: "Bing", phoneNumber: "555-4567" },
  { firstName: "Joey", lastName: "Tribbiani", phoneNumber: "555-6789" },
  { firstName: "Janice", lastName: "Hosenstein", phoneNumber: "555-7890" },
  { firstName: "Gunther", lastName: "Centralperk", phoneNumber: "555-8901" },
  { firstName: "Carol", lastName: "Willick", phoneNumber: "555-9012" },
  { firstName: "Susan", lastName: "Bunch", phoneNumber: "555-0123" },
  { firstName: "Mike", lastName: "Hannigan", phoneNumber: "555-1123" },
  { firstName: "Emily", lastName: "Waltham", phoneNumber: "555-2234" }
])

In any database, relational or document, implementing new use cases often requires an index for effective access patterns. For instance, for a reverse phone directory, I create an index where the key starts with the phone number. I add the names to the key in order to allow for index-only scans to benefit from the O(log n) scalability of B-Tree indexes:

db.friends.createIndex(
 { phoneNumber:1, firstName:1, lastName:1 } 
)

To confirm that an index-only scan is occurring, I examine the execution plan specifically for PROJECTION_COVERED rather than FETCH

With default projection: IXSCAN ➤ FETCH

Due to its flexible schema, MongoDB cannot assume that all fields in every document within a collection are covered by the index. As a result, it must fetch the entire document:

db.friends.find( 
 { phoneNumber:"555-6789" } 
).explain('executionStats').executionStats


{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'FETCH',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    docsExamined: 1,
    alreadyHasObj: 0,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
      indexName: 'phoneNumber_1_firstName_1_lastName_1',
      isMultiKey: false,
      multiKeyPaths: { phoneNumber: [], firstName: [], lastName: [] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        phoneNumber: [ '["555-6789", "555-6789"]' ],
        firstName: [ '[MinKey, MaxKey]' ],
        lastName: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 1,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}

Looking at the result, I can see the "_id" which is stored in the document:

db.friends.find( 
 { phoneNumber:"555-6789" } 
)

[
  {
    _id: ObjectId('680d46a1672e2e146dd4b0c6'),
    firstName: 'Joey',
    lastName: 'Tribbiani',
    phoneNumber: '555-6789'
  }
]

I can remove it from the projection as I don't need it.

Partial projection: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE

I add a projection to exclude the "_id" from the result, but it doesn't remove the FETCH that gets the document with all fields:

db.friends.find( 
 { phoneNumber:"555-6789" } 
 , { "_id":0 }
).explain('executionStats').executionStats


{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_SIMPLE',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0 },
    inputStage: {
      stage: 'FETCH',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 2,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
        indexName: 'phoneNumber_1_firstName_1_lastName_1',
        isMultiKey: false,
        multiKeyPaths: { phoneNumber: [], firstName: [], lastName: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          phoneNumber: [ '["555-6789", "555-6789"]' ],
          firstName: [ '[MinKey, MaxKey]' ],
          lastName: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  }
}

Even if I know that my documents have no other fields, the query planner doesn't know it and must plan to get the document.

Full projection: IXSCAN ➤ PROJECTION_COVERED

When the projection declares all fields, and they are in the index key, there's no need to fetch the document as the projection is covered:

db.friends.find( 
 { phoneNumber:"555-6789" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
).explain('executionStats').executionStats

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 0,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_COVERED',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0, firstName: 1, lastName: 1, phoneNumber: 1 },
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
      indexName: 'phoneNumber_1_firstName_1_lastName_1',
      isMultiKey: false,
      multiKeyPaths: { phoneNumber: [], firstName: [], lastName: [] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        phoneNumber: [ '["555-6789", "555-6789"]' ],
        firstName: [ '[MinKey, MaxKey]' ],
        lastName: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 1,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}

Such plan is an index only scan, optimal as it doesn't need to read documents.

Adding an array instead of a scalar value

Now that we have examined how a query is covered by a single-key index, where each document has a unique index entry, let's explore the implications of a multi-key index. In MongoDB, a field can contain a single value in one document and an array of values in another. I add such a document, where one of the friends has three phone numbers:

db.friends.insertOne({
  firstName: "Phoebe",
  lastName: "Buffay",
  phoneNumber: ["555-3344", "555-4455", "555-5566"]
})

We refer to the index as a multi-key index, but in reality, it remains the same index in MongoDB. The distinction lies in its capacity to hold multiple entries per document, rather than solely single-key entries.

Query on a single-key entry: IXSCAN ➤ PROJECTION_COVERED

When I query the same single-key document as before, nothing changes and the projection is covered:

db.friends.find( 
 { phoneNumber:"555-6789" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
).explain('executionStats').executionStats

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_SIMPLE',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0, firstName: 1, lastName: 1, phoneNumber: 1 },
    inputStage: {
      stage: 'FETCH',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 2,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
        indexName: 'phoneNumber_1_firstName_1_lastName_1',
        isMultiKey: true,
        multiKeyPaths: { phoneNumber: [ 'phoneNumber' ], firstName: [], lastName: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          phoneNumber: [ '["555-6789", "555-6789"]' ],
          firstName: [ '[MinKey, MaxKey]' ],
          lastName: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 1,
        dupsDropped: 0
      }
    }
  }
}

A key advantage of MongoDB's flexible document model is that changes in structure, as the business evolves, do not impact existing documents. This is more agile than SQL databases where changing a One-to-One relationship to a One-to-Many requires complete refactoring of the model and extensive non-regression testing.

Query on a multi-key entry: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE

There's a difference when I query the document with an array of values, visible with isMultiKey: true in the IXSCAN, and a FETCH stage:

db.friends.find( 
 { phoneNumber:"555-4455" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
).explain('executionStats').executionStats

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_SIMPLE',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0, firstName: 1, lastName: 1, phoneNumber: 1 },
    inputStage: {
      stage: 'FETCH',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 2,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
        indexName: 'phoneNumber_1_firstName_1_lastName_1',
        isMultiKey: true,
        multiKeyPaths: { phoneNumber: [ 'phoneNumber' ], firstName: [], lastName: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          phoneNumber: [ '["555-4455", "555-4455"]' ],
          firstName: [ '[MinKey, MaxKey]' ],
          lastName: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 1,
        dupsDropped: 0
      }
    }
  }
}

Understanding the behavior is simplified by recognizing that there is one index entry for each key, with only one entry being read (keysExamined: 1). However, the projection requires access to all associated values. Even if a single value is used to locate the document, the result must display all relevant values:

db.friends.find( 
 { phoneNumber:"555-4455" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
)

[
  {
    firstName: 'Phoebe',
    lastName: 'Buffay',
    phoneNumber: [ '555-3344', '555-4455', '
                                    
                                    
                                    
                                    
                                

Index Only Scan on JSON Documents, with Covering and Multi-Key Indexes in MongoDB

Storing JSON documents in a SQL database does not make it a document database. The strength of any database lies in its indexing capabilities, and SQL databases, even with JSON datatypes, lack the flexibility of document databases, particularly when dealing with arrays in an embedded One-to-Many relationships.

In SQL databases, normalized tables are first-class citizens, with single-key indexes optimized for handling equality and range filters (WHERE), sorting (ORDER BY), and projection coverage (SELECT) for individual tables. To avoid joining before filtering, a denormalized model is preferred when those operations involve multiple tables.
While SQL databases can store documents, including arrays for One-to-Many relationships, working with arrays in JSON necessitates the use of inverted indexes like PostgreSQL's GIN, which do not cover range filters, sorting, and projection like regular indexes.

In contrast, MongoDB treats documents as the core of its data model. Its indexing mechanisms naturally extend to handle documents and their arrays, retaining functionality for filtering, sorting, and projection coverage, as regular indexes can hangle multiple keys per documents.

In previous posts, we examined how a multi-key index supports sort operations. Now, let's explore the conditions under which a query projection is covered by the index, eliminating the need to fetch the document from the collection.

Here are the execution plans tested:

  • With default projection: IXSCAN ➤ FETCH
  • Partial projection: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE
  • Full projection: IXSCAN ➤ PROJECTION_COVERED
  • Query on a single-key entry: IXSCAN ➤ PROJECTION_COVERED
  • Query on a multi-key entry: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE
  • Projection of "_id" : IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE
  • Projection of "$recordId" : IXSCAN ➤ PROJECTION_DEFAULT

Here is a collection of friends, with their first name, last name, and phone number.

db.friends.insertMany([
  { firstName: "Rachel", lastName: "Green", phoneNumber: "555-1234" },
  { firstName: "Ross", lastName: "Geller", phoneNumber: "555-2345" },
  { firstName: "Monica", lastName: "Geller", phoneNumber: "555-3456" },
  { firstName: "Chandler", lastName: "Bing", phoneNumber: "555-4567" },
  { firstName: "Joey", lastName: "Tribbiani", phoneNumber: "555-6789" },
  { firstName: "Janice", lastName: "Hosenstein", phoneNumber: "555-7890" },
  { firstName: "Gunther", lastName: "Centralperk", phoneNumber: "555-8901" },
  { firstName: "Carol", lastName: "Willick", phoneNumber: "555-9012" },
  { firstName: "Susan", lastName: "Bunch", phoneNumber: "555-0123" },
  { firstName: "Mike", lastName: "Hannigan", phoneNumber: "555-1123" },
  { firstName: "Emily", lastName: "Waltham", phoneNumber: "555-2234" }
])

In any database, relational or document, implementing new use cases often requires an index for effective access patterns. For instance, for a reverse phone directory, I create an index where the key starts with the phone number. I add the names to the key in order to allow for index-only scans to benefit from the O(log n) scalability of B-Tree indexes:

db.friends.createIndex(
 { phoneNumber:1, firstName:1, lastName:1 } 
)

To confirm that an index-only scan is occurring, I examine the execution plan specifically for PROJECTION_COVERED rather than FETCH

With default projection: IXSCAN ➤ FETCH

Due to its flexible schema, MongoDB cannot assume that all fields in every document within a collection are covered by the index. As a result, it must fetch the entire document:

db.friends.find( 
 { phoneNumber:"555-6789" } 
).explain('executionStats').executionStats


{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'FETCH',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    docsExamined: 1,
    alreadyHasObj: 0,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
      indexName: 'phoneNumber_1_firstName_1_lastName_1',
      isMultiKey: false,
      multiKeyPaths: { phoneNumber: [], firstName: [], lastName: [] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        phoneNumber: [ '["555-6789", "555-6789"]' ],
        firstName: [ '[MinKey, MaxKey]' ],
        lastName: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 1,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}

Looking at the result, I can see the "_id" which is stored in the document:

db.friends.find( 
 { phoneNumber:"555-6789" } 
)

[
  {
    _id: ObjectId('680d46a1672e2e146dd4b0c6'),
    firstName: 'Joey',
    lastName: 'Tribbiani',
    phoneNumber: '555-6789'
  }
]

I can remove it from the projection as I don't need it.

Partial projection: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE

I add a projection to exclude the "_id" from the result, but it doesn't remove the FETCH that gets the document with all fields:

db.friends.find( 
 { phoneNumber:"555-6789" } 
 , { "_id":0 }
).explain('executionStats').executionStats


{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_SIMPLE',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0 },
    inputStage: {
      stage: 'FETCH',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 2,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
        indexName: 'phoneNumber_1_firstName_1_lastName_1',
        isMultiKey: false,
        multiKeyPaths: { phoneNumber: [], firstName: [], lastName: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          phoneNumber: [ '["555-6789", "555-6789"]' ],
          firstName: [ '[MinKey, MaxKey]' ],
          lastName: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  }
}

Even if I know that my documents have no other fields, the query planner doesn't know it and must plan to get the document.

Full projection: IXSCAN ➤ PROJECTION_COVERED

When the projection declares all fields, and they are in the index key, there's no need to fetch the document as the projection is covered:

db.friends.find( 
 { phoneNumber:"555-6789" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
).explain('executionStats').executionStats

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 0,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_COVERED',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0, firstName: 1, lastName: 1, phoneNumber: 1 },
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
      indexName: 'phoneNumber_1_firstName_1_lastName_1',
      isMultiKey: false,
      multiKeyPaths: { phoneNumber: [], firstName: [], lastName: [] },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        phoneNumber: [ '["555-6789", "555-6789"]' ],
        firstName: [ '[MinKey, MaxKey]' ],
        lastName: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 1,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}

Such plan is an index only scan, optimal as it doesn't need to read documents.

Adding an array instead of a scalar value

Now that we have examined how a query is covered by a single-key index, where each document has a unique index entry, let's explore the implications of a multi-key index. In MongoDB, a field can contain a single value in one document and an array of values in another. I add such a document, where one of the friends has three phone numbers:

db.friends.insertOne({
  firstName: "Phoebe",
  lastName: "Buffay",
  phoneNumber: ["555-3344", "555-4455", "555-5566"]
})

We refer to the index as a multi-key index, but in reality, it remains the same index in MongoDB. The distinction lies in its capacity to hold multiple entries per document, rather than solely single-key entries.

Query on a single-key entry: IXSCAN ➤ PROJECTION_COVERED

When I query the same single-key document as before, nothing changes and the projection is covered:

db.friends.find( 
 { phoneNumber:"555-6789" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
).explain('executionStats').executionStats

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_SIMPLE',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0, firstName: 1, lastName: 1, phoneNumber: 1 },
    inputStage: {
      stage: 'FETCH',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 2,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
        indexName: 'phoneNumber_1_firstName_1_lastName_1',
        isMultiKey: true,
        multiKeyPaths: { phoneNumber: [ 'phoneNumber' ], firstName: [], lastName: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          phoneNumber: [ '["555-6789", "555-6789"]' ],
          firstName: [ '[MinKey, MaxKey]' ],
          lastName: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 1,
        dupsDropped: 0
      }
    }
  }
}

A key advantage of MongoDB's flexible document model is that changes in structure, as the business evolves, do not impact existing documents. This is more agile than SQL databases where changing a One-to-One relationship to a One-to-Many requires complete refactoring of the model and extensive non-regression testing.

Query on a multi-key entry: IXSCAN ➤ FETCH ➤ PROJECTION_SIMPLE

There's a difference when I query the document with an array of values, visible with isMultiKey: true in the IXSCAN, and a FETCH stage:

db.friends.find( 
 { phoneNumber:"555-4455" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
).explain('executionStats').executionStats

{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 0,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    isCached: false,
    stage: 'PROJECTION_SIMPLE',
    nReturned: 1,
    executionTimeMillisEstimate: 0,
    works: 2,
    advanced: 1,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    transformBy: { _id: 0, firstName: 1, lastName: 1, phoneNumber: 1 },
    inputStage: {
      stage: 'FETCH',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      docsExamined: 1,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        works: 2,
        advanced: 1,
        needTime: 0,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        keyPattern: { phoneNumber: 1, firstName: 1, lastName: 1 },
        indexName: 'phoneNumber_1_firstName_1_lastName_1',
        isMultiKey: true,
        multiKeyPaths: { phoneNumber: [ 'phoneNumber' ], firstName: [], lastName: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: {
          phoneNumber: [ '["555-4455", "555-4455"]' ],
          firstName: [ '[MinKey, MaxKey]' ],
          lastName: [ '[MinKey, MaxKey]' ]
        },
        keysExamined: 1,
        seeks: 1,
        dupsTested: 1,
        dupsDropped: 0
      }
    }
  }
}

Understanding the behavior is simplified by recognizing that there is one index entry for each key, with only one entry being read (keysExamined: 1). However, the projection requires access to all associated values. Even if a single value is used to locate the document, the result must display all relevant values:

db.friends.find( 
 { phoneNumber:"555-4455" } 
 , { "_id":0 , firstName:1, lastName:1, phoneNumber:1 }
)

[
  {
    firstName: 'Phoebe',
    lastName: 'Buffay',
    phoneNumber: [ '555-3344', '555-4455', '555-5566
                                    
                                    
                                    
                                    
                                

April 25, 2025

pgvector: The Critical PostgreSQL Component for Your Enterprise AI Strategy

You’re likely racing to enhance your applications with more intelligent, data-driven capabilities, whether through AI-powered models (which have moved into “must implement now!” territory), advanced search functions, real-time fraud detection, or geospatial analysis. As these demands grow, you face a significant challenge: efficiently storing, managing, and querying high-dimensional vector data within your existing database infrastructure. […]

April 24, 2025

What It Takes to Be PostgreSQL Compatible

What It Takes to Be PostgreSQL Compatible

We, and many other database enthusiasts, are big fans of PostgreSQL. Even though we built a database system from scratch, we believe there are many good reasons for using PostgreSQL. In fact, we like PostgreSQL so much that we made sure to build CedarDB to be compatible with PostgreSQL.

Because of PostgreSQL’s popularity, we were not the first to develop a PostgreSQL compatible database system. CMU’s “Database of Databases” lists over 40 database systems that claim to be PostgreSQL compatible. Among them you can find database systems from large cloud vendors such as AlloyDB from Google or Aurora from AWS.

April 23, 2025

Set Up Transactional Email in Convex with Bluefox and AWS SES

Learn how to integrate Bluefox—a modern developer-first email API built on AWS SES—with Convex, the reactive backend database. This guide covers everything from sending transactional and triggered emails to setting up real-time webhooks and preparing for production. You’ll walk away knowing how to configure your environment, build reusable email clients, handle errors safely, and manage multiple deployment environments with best practices for SES authentication, bounce handling, and email tracking.

Does FCV Have Any Impact on MongoDB Performance?

It is generally known and established that different database versions provide different sets of features and performance rates. Along with the binary versions, MongoDB has a property called Feature Compatibility Value, which allows administrators to run a binary version without enabling any features that are incompatible with the previous one. This property is generally helpful […]

April 22, 2025

Multi-Grained Specifications for Distributed System Model Checking and Verification

This EuroSys 2025 paper wrestles with the messy interface between formal specification and implementation reality in distributed systems. The case study is ZooKeeper. The trouble with verifying something big like ZooKeeper is that the spec and the code don’t match. Spec wants to be succinct and abstract; code has to be performant and dirty. 

For instance, a spec might say, “this happens atomically.” But the actual system says, “yeah, buddy, right.” Take the case of FollowerProcessNEWLEADER: the spec bundles updating the epoch and accepting the leader’s history into a single atomic step. But in the real system, those steps are split across threads and separated by queuing, I/O, and asynchronous execution. Modeling them as atomic would miss real, observable intermediate states, and real bugs.

To bridge this model-code gap, the authors use modularization and targeted abstraction. Don’t write one spec, write multi-grained specs. Different parts of the system are modeled at different granularities, depending on what you're verifying. Some parts are modeled in fine detail; others are coarse and blurry. Like reading a novel and skimming the boring parts. These modules, written at different levels of abstraction, are composed into a mixed-grained spec, tailored to the verification task at hand.

Central to the method is the interaction-preserving principle, a key idea borrowed from a 2022 paper by some of the same authors. Coarsening is allowed, but only if you preserve interaction variables--shared state that crosses module boundaries. To abstract away things, modules can internally lie to themselves, but not to each other. Other modules should not be able to distinguish whether they are interacting with the original or a coarsened module. The paper formalizes this using dependency relations over actions and their enabling conditions. This sounds a lot like rely-guarantee, but encoded for TLA+.

What makes this paper stand out is its application. This is a full-on, hard-core TLA+ verification effort against a real-world messy codebase. It’s a good guide for how to do formal work practically, without having to rewrite the system in a verification-oriented language. This paper is not quite a tutorial as it skips a lot of the details, but the engineering is real, and the bugs are real too.


Technical details

The system decomposition follows the Zab protocol phases: Election, Discovery, Synchronization, Broadcast. This is a clever hack. Rather than inventing new module boundaries (say through roles or layers), the authors simply split along the natural phase boundaries in the protocol. The next-state action in the TLA+ spec is already a disjunction across phase actions; this decomposition just formalizes and exploits that.

They focus verification on log replication involving the Synchronization and Broadcast phases, while leaving Election and Discovery mostly coarse. They say that's where the bugs are. They also mention that the leader election is a tangle of nondeterminism and votes-in-flight, and models take days to check. Log replication, by contrast, is where local concurrency hits hard, and where many past bugs and failed fixes appear.

Table 1 summarizes the spec configurations: baseline (system spec), coarsened, and fine-grained. The fine-grained specs are where the action is: they capture atomicity violations, thread interleavings, and missing transitions. These bugs don’t show up when model checking with the baseline spec. It is all about how deep granularity you are willing to go for capturing implementation. Targeted fine-granularity checking really helps keep this practical.

Table 5 shows model performance. Coarse specs are fast but miss bugs. Fine specs find bugs but blow up. Mixed specs get the best of both. For instance, mSpec-3 finds deep bugs in seconds, while the full system spec doesn’t terminate in 24 hours. The bottleneck is the leader election spec, which proves again that targeted coarsening pays when it preserves interaction.

Bug detection is driven by model checking with TLC. Violations are found in model traces when TLC reports a violation of an invariant. The violations found are then confirmed via code-level deterministic replay. The authors built a system (Remix) to instrument ZooKeeper, inject RPC hooks via AspectJ, and replay TLA-level traces using a centralized coordinator that schedules and controls the interleaving of code-level actions using the developer-provided mapping from model actions to code events. This deterministic replay allows them to validate that the model-level bug can actually manifest in real runs.

Figure 8 ties back to the core atomicity problem in FollowerProcessNEWLEADER. Many of these bugs arise when the update of epoch and the logging of history are not properly sequenced or observed. The spec treats them as one unit, but the implementation spreads them across threads and queues. By splitting the spec action into three, the authors model each thread’s contribution separately, and capturing the inter-thread handoff. With this, they catch data loss, inconsistency, and state corruption bugs that had previously escaped detection, or worse, had been "fixed" and re-broken.

The authors didn’t stop at finding bugs. They submitted fixes upstream to ZooKeeper, validating them with the same fine-grained specifications. To fix existing bugs and also make it easy to implement correctly, they removed the atomicity requirement of the two updates from the Zab protocol but require their order: the follower updates its history before updating its epoch. The patched version passed their model checks, and the PR was merged. This tight loop (spec, bug, fix, verify) is what makes this method stand out: formal methods not as theory, but as a workflow.


Conclusions

The results back the thesis: fine-grained modeling is essential to catch real bugs. Coarsening is essential to make model checking scale. Modularity and compositionality is a feasible way to manage verification of a complex, concurrent, evolving system. It's not push-button verification. But it's doable and useful.

The work opens several directions. Could this technique support test generation from the mixed model? If you already have deterministic replay and instrumentation hooks, generating inputs for fault-injection testing seems within reach.

More speculatively, are there heuristics to suggest good modular cuts? The current modularization is protocol-phase aligned, but could role-based, data-centric, or thread-boundary modularization give better results? That seems like a good area for exploration.

Behind the Scenes: How Percona Support Diagnosed a MongoDB FTDC Freeze

One of our customers recently reported that MongoDB’s diagnostic metrics—collected via FTDC (Full-Time Diagnostic Data Capture)—had stopped updating. As a result, no metrics were being collected, either through diagnostic data files or the getDiagnosticData command. Key metrics such as uptime were no longer progressing. While the cluster remained fully operational, the lack of observability introduced a […]

April 21, 2025

Speeding Up Percona XtraDB Cluster State Transfers with Kubernetes Volume Snapshots

When using the Percona Operator for MySQL based on Percona XtraDB Cluster (PXC), it’s common to encounter scenarios where cluster nodes request a full State Snapshot Transfer (SST) when rejoining the cluster. One typical scenario where a State Snapshot Transfer (SST) is required is when a node has been offline long enough that the GCache […]