March 13, 2025
March 12, 2025
HAProxy-Patroni Setup Using Health Check Endpoints and Debugging
March 11, 2025
Ready for the Enterprise: New Solution Updates from Percona
Announcing PlanetScale Metal
PlanetScale Metal: There’s no replacement for displacement
Upgrading Query Insights to Metal
March 10, 2025
MongoDB to Convex: A Step-by-Step Data Migration Script
Extending MySQL 5.7: Percona Releases Post-End-of-Life Source Code to the Community
Publish and Perish: Why Ponder Stibbons Left the Ivory Tower
(With apologies to Terry Pratchett)
Ponder Stibbons sat in the shadowy recesses of the Uncommon Room, clutching a lukewarm cup of tea and the last vestiges of his patience. Across from him, the Dean, wrapped in his usual self-satisfaction, puffed his pipe and surveyed the room as if it were all a great and glorious joke that he alone understood.
"Leaving?" the Dean spluttered, when Ponder finally managed to slide the conversation in that direction. "What do you mean, leaving? Where would you even go?"
"Industry," said Ponder, trying not to make it sound like a curse. "Databases. Big ones."
The Dean blinked. "But we have databases here. The Archchancellor’s hat alone contains centuries of magical indexing..."
"Yes, but in the real world, we use them to actually store and retrieve information, not just argue about what information is worth storing in the first place," Ponder snapped. "And I’ll be paid properly."
"Pah," scoffed the Dean. "The true academic mind isn’t in it for the money."
"Yes, I noticed that," Ponder said. "But, funnily enough, the people running the place seem to be. We have more suits than I can count, and none of them seem to know how to count. We used to do research, but now we do ‘strategic impact planning’ and ‘value-oriented magical development initiatives’."
The Dean made a valiant effort at looking wise. "Well, you have to admit, Stibbons, there’s no point in just doing magic for magic’s sake."
"Then what’s the point of this place?" Ponder snapped. "If we’re not here to push the boundaries of knowledge, then we’re just a very exclusive, very inefficient vocational school for people too dim-witted to teach themselves. The students don’t want to learn magic, they want to finish a course of study in three semesters so they can get a cushy post enchanting chairs for the Patrician’s office."
The Dean waved a hand vaguely. "That’s always been the case."
"It wasn’t always quite this blatant," Ponder retorted. "Back in the day, we had time to actually think about things, research things, discuss things. Now it’s all quotas, reviews, assessments! And if the students don’t like you, they file complaints! Whining course reviews! As if the study of magic were meant to be… comfortable! We used to throw fireballs at people! Now we have to ‘create a nurturing environment for knowledge acquisition’."
"Ah," said the Dean, puffing his pipe. "Well, times change."
"Not here, they don’t," said Ponder bitterly. "That’s the whole problem. The world outside moves forward while we shuffle our papers and complain that we don’t have enough funding to investigate basic transmutation spells."
The Dean chuckled. "You always were an idealist, Stibbons. Research takes time. Why, I myself have been preparing to write a paper on the comparative metaphysical mass of a thaum for—"
"Twenty years!" snapped Ponder. "And you’ve produced nothing!"
"Ah, but when I do…" The Dean waggled his fingers mysteriously.
"And even if you did, it wouldn’t get published," Ponder continued. "Do you know what happens to papers these days? You send them off, wait a year, and then get them back covered in cryptic, punishing comments from anonymous reviewers who seem to hate the very idea of scholarship."
"Ah yes, peer review!" said the Dean enthusiastically. "Ensuring the highest quality!"
"Ensuring nobody gets anything done!" Ponder snapped. "It’s not review, it’s ritualized academic hazing. Half the time the reviewers are just competitors trying to torpedo your work so their own gets published first. It’s like trying to hold a debate where the other side gets to set your chair on fire before you start speaking."
"Ah, well," the Dean said philosophically. "At least we all suffer equally."
"No, we don’t!" Ponder threw up his hands. "It’s a zero-sum game! Everyone hoards their ideas because collaboration just means giving someone else ammunition to shoot down your next grant proposal! We’re supposed to be discovering new knowledge, and instead we spend all our time writing meticulously worded rebuttals to complaints about our font choice."
"Well, I suppose that’s just how academia works," the Dean said, smiling in what he clearly thought was a reassuring way.
Ponder sighed. "You know what’s coming, don’t you? The golems are already doing spellwork. Automated spell matrices. Soon, students won’t even need to be here. They’ll just buy the knowledge in convenient pre-packaged modules."
"Nonsense," the Dean huffed. "You can’t replace a wizard with a machine."
"That’s exactly what I thought," said Ponder. "Until I have seen them in the wild. It turned out you very well can."
The Dean froze. "What?"
"Oh yes. A spell-engine that automates theorem derivation and spell stabilization. Does in minutes what takes us years. Accurate, tireless, and best of all, doesn’t insist on being called ‘Professor’."
The Dean was pale. "But… but the prestige, Stibbons! The robes! The titles! The long lunches!"
"Industry has long lunches too," said Ponder. "And they reimburse you in two days instead of two months."
The Dean looked positively ill. "And you’re going to… what? Spend your days solving real problems? Making a tangible difference?"
"Yes."
"But… how will you cope?" The Dean’s voice was faint. "Without the endless meetings? The unread grant proposals? The departmental infighting over whose name goes first on a paper no one will read?"
Ponder stood, dusted off his robes, and picked up his satchel. "I think I’ll manage."
March 09, 2025
Comparing Execution Plans: MongoDB vs. Compatible APIs
MongoDB is the standard API for document databases, and some cloud providers have created services with a similar API. AWS and Azure call theirs 'DocumentDB', and Oracle provides the MongoDB API as a proxy on top of its SQL database. These databases offer a subset of features from past versions of MongoDB, but user experience and performance are also crucial.
Oracle claims better performance, but their tests lack real-world queries. I will use their "autonomous" managed service to compare execution plans and identify which engine minimizes unnecessary reads in a common OLTP use case: filtering with a where clause and paginating the results.
TL;DR: MongoDB has better performance and more indexing possibilities.
Document Model (Order - Order Detail)
I used a simple schema of orders and order lines, ideal for a document model. I illustrated it with UML notation to distinguish strong and weak entities, representing the association as a composition (⬧-).
In a SQL database, a one-to-many relationship between orders and line items requires two tables because of the first normal form. In MongoDB, a composition relationship allows the weak entity (Order Detail) to be embedded within the strong entity (Order) as an array, simplifying data management and enhancing performance, as we will see when indexing it.
I will insert orders with few attributes. The country and creation date are fields in Order. The line number, product, and quantity are fields of Detail, which is an embedded array in Order:
+--------------------------+
| Order |
+--------------------------+
| country_id: Number |
| created_at: Date |
| order_details: Array     |
| +----------------------+ |
| | Detail | |
| +----------------------+ |
| | line: Number | |
| | product_id: Number | |
| | quantity: Number | |
| +----------------------+ |
+--------------------------+
Sample Data
I generated one million documents for my example. I'll focus on predictable metrics like the number of documents examined rather than execution time, so that it can be easily reproduced with a small dataset. To simulate products with fluctuating popularity, I use a randomized logarithmic value to create product IDs:
const bulkOps = [];
for (let i = 0; i < 1000000; i++) {
const orderDetails = [];
for (let line = 1; line <= 10; line++) {
orderDetails.push({
line: line,
product_id: Math.floor(Math.log2(1 + i * Math.random())),
quantity: Math.floor(100 * Math.random()),
});
}
bulkOps.push({
insertOne: {
document: {
country_id: Math.floor(10 * Math.random()),
created_at: new Date(),
order_details: orderDetails
}
}
});
}
db.orders.bulkWrite(bulkOps).insertedCount;
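To see the popularity skew that this formula produces, the same expression can be run in plain Node.js. This snippet is only an illustration of the distribution; it is not part of the load script:

```javascript
// Illustration: tally how often each product_id value is produced by the
// same formula used in the bulk load above.
const counts = {};
const N = 1000000;
for (let i = 0; i < N; i++) {
  const productId = Math.floor(Math.log2(1 + i * Math.random()));
  counts[productId] = (counts[productId] || 0) + 1;
}
// Larger product_ids dominate: a skewed distribution that simulates
// products with fluctuating popularity.
console.log(counts);
```

With one million iterations, the generated product IDs fall roughly in the range 0 to 19 (bounded by log2 of one million), with the higher IDs appearing far more often than the lower ones.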
Access Pattern and ESR Index
Users seek insights into product usage through a query for the most recent orders that include a specific product, in a specific country. Following the ESR (Equality, Sort, Range) rule, I created an index with the equality fields at the front of the key, followed by the field used for ordering results.
db.orders.createIndex( {
"country_id": 1,
"order_details.product_id": 1,
"created_at": -1
});
Query the 10 last orders for a product / country
I queried the ten last orders in country 1 including product 5:
print(
db.orders.find({
country_id: 1,
order_details: { $elemMatch: { product_id: 5 } }
}).sort({ created_at: -1 }).limit(10)
);
The user can analyze this data to understand the last orders for this product.
Execution plan for MongoDB API on Oracle Database
I query the execution plan for this query:
print(
db.orders.find( {
country_id: 1,
order_details: { $elemMatch: { product_id: 5 } }
}).sort({ created_at: -1 }).limit(10)
.explain("executionStats")
);
When using the "MongoDB API for Oracle Database," the query is rewritten to SQL, since the collection resides in SQL tables with the OSON data type, employing internal functions to emulate MongoDB BSON documents. The explain() method reveals the executed queries:
This is interesting: it shows how storing JSON in a SQL database differs from using MongoDB, and requires an emulation layer that generates complex, non-standard SQL. Unfortunately, this does not show execution statistics.
To gain more insights, I gathered the SQL statement from V$SQL and ran it with the MONITOR hint to generate a SQL Monitor report:
select /*+ FIRST_ROWS(10) MONITOR */ "DATA",rawtohex("RESID"),"ETAG"
from "ORA"."orders"
where JSON_EXISTS("DATA"
,'$?( (@.country_id.numberOnly() == $B0) &&
( exists(@.order_details[*]?( (@.product_id.numberOnly() == $B1) )) ) )' passing 1 as "B0", 5 as "B1" type(strict))
order by JSON_QUERY("DATA", '$.created_at[*].max()') desc nulls last
fetch next 10 rows only
;
Here is the SQL Monitor report:
- 276 rows have been read from the index (INDEX RANGE SCAN). The access predicates are internal virtual columns and undocumented functions applying the equality conditions: "orders"."SYS_NC00005$" = SYS_CONS_ANY_SCALAR(1, 3) AND "orders"."SYS_NC00006$" = SYS_CONS_ANY_SCALAR(5, 3).
- The index entries go through a deduplication step (HASH UNIQUE).
- 276 documents are fetched from the SQL table (TABLE ACCESS BY ROWID).
- They are finally sorted for Top-k (SORT ORDER BY STOPKEY) to return 31 documents, from which 10 are fetched to provide the result.
More operations occur to transform this result into MongoDB-compatible documents, but they apply to only 10 documents because they happen after the limit (COUNT STOPKEY). What is more problematic is the unnecessary work that happens before: reading, hashing, and sorting rows that do not participate in the result. It is 276 instead of 10 in this small example, but it can be much more in a larger database. The SORT ORDER BY is a blocking operation that must read all its input rows before it can output the first one.
Oracle Database's index usage does not adhere to the MongoDB ESR (Equality, Sort, Range) rule. It only used the index for the equality predicates, country_id and product_id, rendering the created_at field ineffective and failing to avoid a blocking sort operation. This occurs on 276 rows in this small example, but it can impact many more in a production database.
Since the index is only part of the filtering process, the execution plan may switch to another index. For example, an index starting with created_at could assist with sort().limit(), but may read too many countries and products.
The result from Oracle Database is compatible with MongoDB, but the performance and scalability are not.
Execution plan for MongoDB
Here is the execution plan on a real MongoDB database - I've run it on an Atlas free cluster:
db> print(
... db.orders.find( {
... country_id: 1,
... order_details: { $elemMatch: { product_id: 5 } }
... }).sort({ created_at: -1 }).limit(10)
... .explain("executionStats").executionStats
... );
{
executionSuccess: true,
nReturned: 10,
executionTimeMillis: 0,
totalKeysExamined: 10,
totalDocsExamined: 10,
executionStages: {
isCached: false,
stage: 'LIMIT',
nReturned: 10,
executionTimeMillisEstimate: 0,
works: 11,
advanced: 10,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
limitAmount: 10,
inputStage: {
stage: 'FETCH',
filter: {
order_details: { '$elemMatch': { product_id: { '$eq': 5 } } }
},
nReturned: 10,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 10,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
docsExamined: 10,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 10,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 10,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
keyPattern: {
country_id: 1,
'order_details.product_id': 1,
created_at: -1
},
indexName: 'country_id_1_order_details.product_id_1_created_at_-1',
isMultiKey: true,
multiKeyPaths: {
country_id: [],
'order_details.product_id': [ 'order_details' ],
created_at: []
},
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: {
country_id: [ '[1, 1]' ],
'order_details.product_id': [ '[5, 5]' ],
created_at: [ '[MaxKey, MinKey]' ]
},
keysExamined: 10,
seeks: 1,
dupsTested: 10,
dupsDropped: 0
}
}
}
}
Here MongoDB didn't read more index entries or documents than necessary:
- Index scan (stage: 'IXSCAN') with a single seek (seeks: 1) to the values of the equality condition.
- Only the ten index entries needed for the result were read (keysExamined: 10).
- No sort operation: the ten documents (nReturned: 10) are fetched (stage: 'FETCH') already sorted on the index key.
This is summarized by:
executionSuccess: true,
nReturned: 10,
totalKeysExamined: 10,
totalDocsExamined: 10,
When the number of keys examined matches the number of documents returned, it indicates optimal execution with no unnecessary operations.
This alignment ensures efficiency in processing, as all examined keys are relevant to the returned documents.
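In a script, this check can be automated. The helper below is hypothetical (the counter names come from the explain("executionStats") output shown above; the function itself is not part of MongoDB):

```javascript
// Hypothetical helper: returns true when an explain("executionStats")
// result shows no unnecessary reads, i.e. every examined index key and
// fetched document contributes to the returned result.
function isFullySelective(executionStats) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = executionStats;
  return totalKeysExamined === nReturned && totalDocsExamined === nReturned;
}

// With the stats shown above, the scan is fully selective:
console.log(isFullySelective({
  nReturned: 10,
  totalKeysExamined: 10,
  totalDocsExamined: 10
})); // prints "true"
```

The same function applied to the Oracle emulation numbers (276 keys examined for 10 documents returned) would report the unnecessary work.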
You can also look at the visual execution plan in MongoDB Compass:
Using document data modeling and MongoDB indexes allows you to access only the necessary data, ensuring that query costs are directly related to the results obtained.
Conclusion on Documents (vs. relational) and MongoDB (vs. emulations)
In a SQL database, the Orders - Order Details example requires two tables and a join to filter results. The join itself may not be too expensive, but SQL databases lack multi-table indexes. They do unnecessary work reading and joining rows that will be discarded later.
The document model, with embedded entities, allows for comprehensive indexing, offering optimal access unlike normalized tables in relational databases. MongoDB shines with indexes that follow Equality, Sort, and Range, and they can cover documents and sub-documents, with multiple keys per document.
While some SQL databases have copied the MongoDB API to provide better developer experience to those who are not SQL experts, they do not gain the same benefits as MongoDB, provide fewer indexing possibilities, and incur additional operations when executing queries.
March 07, 2025
Long-term backup options for Amazon RDS and Amazon Aurora
How to run load tests in real-time data systems
Dedicated Poolers
March 06, 2025
Authentication Best Practices: Convex, Clerk and Next.js
Simplify Valkey Adoption with Percona Support and Services
To B or not to B: B-Trees with Optimistic Lock Coupling
B-Trees are one of the few specialized data structures that have truly stood the test of time, being over 50 years old. They were invented in year zero of Unix time, on a system with a whopping 7.25MB of disk storage — not DRAM! They are only one month younger than the idea of relational databases itself and have witnessed the birth of multi-core architectures, complex cache hierarchies, fast network-attached storage, and many more groundbreaking changes. However, while most technologies would show their age and could comfortably retire, B-Trees are still ubiquitous for data storage.