June 09, 2026
Percona Operator for MySQL (PXC) 1.20.0: Automatic Storage Resizing, TLS Certificate Rotation, and ARM64 Support
Percona Operator for MySQL PXC 1.20.0 is out today, and it addresses three long-requested operational headaches: storage that grows on its own before it fills up, TLS certificates that rotate without cluster downtime, and images that run natively on ARM64. Disk-full incidents on PXC clusters often arrive at 2 AM when monitoring alerts fire, and … Continued
The post Percona Operator for MySQL (PXC) 1.20.0: Automatic Storage Resizing, TLS Certificate Rotation, and ARM64 Support appeared first on Percona.
RUM—Storing More in the Index
This series of posts traces the evolution from GIN to RUM to Extended RUM, showing how a single architectural idea—store more in the index to do less work at query time—unlocks major performance improvements at each step.
RUM was created by Alexander Korotkov, Oleg Bartunov, and Teodor Sigaev at Postgres Professional. It started as a direct response to the GIN limitations that users kept hitting in production — particularly in full-text search scenarios where ranking and ordering dominate query time.
The name is a nod: GIN (the drink) → RUM (a stronger drink). The index is stronger, too.
RUM's innovation is exactly one architectural change: each entry in a posting list can carry an additional datum alongside the TID:
GIN posting list entry: [TID]
RUM posting list entry: [TID, addInfo]
A 2016 thread on pgsql-general captures the exact pain point. Andreas Krogh was building a web-based email system — millions of messages, users expecting millisecond response — and wanted a single index that could:
- Match full-text search terms (
fts_all @@ to_tsquery(...)) - Filter by folder (
folder_id = ANY(ARRAY[2,3])) - Sort by timestamp (
ORDER BY sent DESC) - Stop early (
LIMIT 101)
With GIN, steps 1 and 2 worked (via btree_gin), but step 3 always required a separate Sort node — the index couldn't deliver results in timestamp order. The planner had to scan all matching TIDs, fetch every heap tuple, and sort them. For a mailbox with 100K matches, that's unacceptable.
Oleg Bartunov's response was direct: "We are [working] hard on our internal version of rum." RUM was designed to solve exactly this class of problem.
RUM extends the GIN framework with an order_by_attach option:
CREATE INDEX idx ON documents USING rum (tsv rum_tsvector_addon_ops, created_at)
WITH (attach = 'created_at', to = 'tsv', order_by_attach = true);
This tells RUM: "for every TID in the tsv posting list, also store the corresponding created_at value." The posting list entries become:
"postgresql" → [(0,1, 2024-01-15), (3,7, 2024-02-20), (5,2, 2024-03-01)]
Now the posting list is sorted by addInfo (the timestamp), not by the physical order of TIDs.
What RUM Enables Over GIN:
- Ordered results without sorting. The index walks the posting list in
addInfoorder and stops after N entries. No Sort node, no full scan. - Distance-based ordering with
<=>. RUM introduces distance operators (<=>,<=|,|=>). The<=>computesABS(a - b)— nearest-first retrieval. The<=|and|=>variants restrict to one direction. For full-text ranking, RUM's<=>operator has a built-in ranking function that combinests_rankandts_rank_cdsemantics and handles OR queries better than either function alone. - Depth-first traversal: first results immediately. Unlike GIN's bitmap approach (collect all TIDs, then access the heap), RUM performs a depth-first traversal. It can return first results immediately — critical for
LIMITqueries. - Phrase search without recheck. RUM's
rum_tsvector_opsstores word positions asaddInfo. The index verifies phrase adjacency during the scan — no heap rechecks needed.
Here are the operator classes:
-
rum_tsvector_opsstores lexemes with positional information, supports ordering (<=>) by relevance -
rum_tsvector_hash_opsstores hashed lexemes with positions, supports ordering (<=>) with no prefix search -
rum_tsvector_addon_opsstores lexemes + any attached column, supports ordering (<=>) on attached column -
rum_anyarray_opsstores array elements + array length, supports ordering (<=>) by similarity -
rum_anyarray_addon_opsstores array elements + attached column, supports ordering (<=>) on attached column -
rum_timestamp_opsstores scalar values, supports ordering (<=>)
Using the same articles table from the previous post, I add a RUM index that adds the published timestamp to tsv:
postgres=# CREATE EXTENSION IF NOT EXISTS rum;
postgres=# CREATE INDEX idx_rum_addon ON articles
USING rum (tsv rum_tsvector_addon_ops, published)
WITH (attach = 'published', to = 'tsv');
I searched for the five most recent articles by publication date using the same text query as in the previous post.
postgres=# EXPLAIN (COSTS OFF, ANALYZE, BUFFERS, VERBOSE)
SELECT id, title, published,
published <=> '2020-06-01'::timestamp AS distance
FROM articles
WHERE tsv @@ to_tsquery('simple', 'postgresql & article')
ORDER BY published <=> '2020-06-01'::timestamp
LIMIT 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Limit (actual time=89.101..89.552 rows=5 loops=1)
Output: id, title, published, ((published <=> '2020-06-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=1233 read=1514, temp read=1303 written=1303
-> Index Scan using idx_rum_addon on documentdb_core.articles (actual time=89.097..89.542 rows=5 loops=1)
Output: id, title, published, (published <=> '2020-06-01 00:00:00'::timestamp without time zone)
Index Cond: (articles.tsv @@ '''postgresql'' & ''article'''::tsquery)
Order By: (articles.published <=> '2020-06-01 00:00:00'::timestamp without time zone)
Buffers: shared hit=1233 read=1514, temp read=1303 written=1303
Planning:
Buffers: shared hit=3
Planning Time: 0.401 ms
Execution Time: 89.747 ms
(12 rows)
No Sort node. No Bitmap. A true Index Scan with Order By pushed into the index. Stops after 5 results.
I add a RUM index on tsv that adds the position, for phrase search without recheck:
postgres=# CREATE INDEX idx_rum_pos ON articles USING rum (tsv rum_tsvector_ops);
postgres=# EXPLAIN (COSTS OFF, ANALYZE, BUFFERS, VERBOSE)
SELECT id, title
FROM articles
WHERE tsv @@ to_tsquery('simple', 'postgresql <-> article')
LIMIT 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Limit (actual time=116.196..116.198 rows=0 loops=1)
Output: id, title
Buffers: shared hit=661
-> Index Scan using idx_rum_pos on documentdb_core.articles (actual time=116.192..116.193 rows=0 loops=1)
Output: id, title
Index Cond: (articles.tsv @@ '''postgresql'' <-> ''article'''::tsquery)
Buffers: shared hit=661
Planning:
Buffers: shared hit=3
Planning Time: 0.223 ms
Execution Time: 117.838 ms
(11 rows)
No "Rows Removed by Index Recheck". Positions stored in the index enable direct verification of adjacency. The same idea applies beyond full‑text search, for example with JSONB, where RUM can store element positions to reduce rechecks and improve performance.
The same index is used for relevance ranking:
postgres=# EXPLAIN (COSTS OFF, ANALYZE, BUFFERS, VERBOSE)
SELECT id, title,
tsv <=> to_tsquery('english', 'postgresql | optimization') AS rank
FROM articles
WHERE tsv @@ to_tsquery('simple', 'postgresql | article')
ORDER BY tsv <=> to_tsquery('english', 'postgresql | optimization')
LIMIT 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (actual time=543.958..544.006 rows=10 loops=1)
Output: id, title, ((tsv <=> '''postgresql'' | ''optim'''::tsquery))
Buffers: shared hit=783, temp read=13289 written=15244
-> Index Scan using idx_rum_pos on documentdb_core.articles (actual time=543.954..543.977 rows=10 loops=1)
Output: id, title, (tsv <=> '''postgresql'' | ''optim'''::tsquery)
Index Cond: (articles.tsv @@ '''postgresql'' | ''article'''::tsquery)
Order By: (articles.tsv <=> '''postgresql'' | ''optim'''::tsquery)
Buffers: shared hit=783, temp read=13289 written=15244
Planning:
Buffers: shared hit=3
Planning Time: 0.198 ms
Execution Time: 548.884 ms
(12 rows)
Ranking is computed directly during the index scan. No Sort node, no heap access for computing the score, enabling early termination with LIMIT. This contrasts with GIN, where ranking is computed after fetching all matching rows from the heap and sorting them.
Early versions of RUM allowed multi-column indexes, but only the main inverted column participated in the ordered scan (2017 thread). Additional columns were applied as post-filters, which limited the benefit of LIMIT queries. Currently filter columns can participate in the index scan itself, allowing pruning during traversal of the ordered posting list — but ordering is still driven by a single attached column. Here is an example:
postgres=# CREATE INDEX idx_rum_multi ON articles
USING rum (tsv rum_tsvector_addon_ops, category, published)
WITH (attach = 'published', to = 'tsv');
postgres=# EXPLAIN (COSTS OFF, ANALYZE, BUFFERS, VERBOSE)
SELECT id, title, published
FROM articles
WHERE tsv @@ to_tsquery('english', 'postgresql')
AND category = 'tech'
ORDER BY published <=> '2020-06-01'::timestamp
LIMIT 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Limit (actual time=50.212..51.269 rows=5 loops=1)
Output: id, title, published, ((published <=> '2020-06-01 00:00:00'::timestamp without time zone))
Buffers: shared hit=117 read=112, temp read=550 written=550
-> Index Scan using idx_rum_multi on documentdb_core.articles (actual time=50.209..51.258 rows=5 loops=1)
Output: id, title, published, (published <=> '2020-06-01 00:00:00'::timestamp without time zone)
Index Cond: ((articles.tsv @@ '''postgresql'''::tsquery) AND (articles.category = 'tech'::text))
Order By: (articles.published <=> '2020-06-01 00:00:00'::timestamp without time zone)
Buffers: shared hit=117 read=112, temp read=550 written=550
Planning:
Buffers: shared hit=37 read=9 dirtied=3
Planning Time: 7.805 ms
Execution Time: 51.667 ms
(12 rows)
Here, category = 'tech' is in Index Cond, not a Filter, so pre-filtering on multiple columns is possible. RUM’s <=> operator uses a ranking model (ts_score) that combines ts_rank and ts_rank_cd, addressing their respective limitations (logical operators and OR queries).
The RUM Data Structure includes addInfo and may be sorted on it instead of TID:
RUM Posting List (leaf page):
┌─────────────────────────────────────────────┐
│ [TID₁, addInfo₁] [TID₂, addInfo₂] ... │
│ sorted by addInfo (when order_by_attach=on) │
│ or by TID (when order_by_attach=off) │
└─────────────────────────────────────────────┘
One drawback of the RUM index is that it has slower build and insert times than GIN. This is because RUM stores additional information alongside keys and utilizes generic Write-Ahead Logging (WAL) records, which can significantly increase WAL volume compared to GIN. It fits workloads where data is written once, like append-only or event-driven use cases, and queried many times later. Unlike GIN, RUM does not implement a pending list for fast updates, which contributes to its higher write and build costs. RUM performs best when dealing with highly repetitive keys, such as those found in natural language text, or denormalized documents. However, for high-cardinality unique keys, like UUIDs, the inverted index structure offers little advantage over a B-tree.
The next post will cover Extended RUM, which enables additional indexing and ordering capabilities in DocumentDB.
GIN: Understanding PostgreSQL's Inverted Index and Its Limitations
This series of posts traces the evolution from GIN to RUM to Extended RUM, showing how a single architectural idea—to store more in the index to do less work at query time—unlocks major performance improvements at each step.
Generalized Inverted Index (GIN) is PostgreSQL's answer to the question "how do I index composite values?" — including arrays, JSONB documents, tsvector, and any data type that decomposes into multiple keys per row.
The structure is conceptually simple:
Entry Tree (B-tree of keys)
├── "alice" → Posting List: [(0,1), (3,7), (5,2)]
├── "bob" → Posting List: [(0,2), (4,1)]
├── "carol" → Posting Tree (when list becomes too large for in-line storage)"
│ └── B-tree of ItemPointers
└── ...
For each indexed value, GIN calls extractValue to decompose it into keys. Each key gets an entry in the B-tree. Each entry points to a posting list — a sorted array of heap TIDs (tuple identifiers) that contain that key.
At query time, GIN calls extractQuery to decompose the search condition into keys, then intersects or unions the posting lists using a consistent function.
What GIN Gets Right:
- Space efficiency: posting lists are compressed (variable-byte encoding), and duplicate keys are stored once.
- Fast containment queries:
@>,<@,@@— any "does this value contain these keys?" pattern is natural. - Extensible operator classes: anyone can plug in new data types via
extractValue/extractQuery/consistent.
Limitations of GIN:
- Posting lists lack ordering info. GIN's lists only contain TIDs sorted by physical location, with no space for extra data.
- Cannot perform ordered scans. GIN doesn’t support
ORDER BYpushdown, and queries with sorting require an explicit sort after the index scan. - No extra payload per TID. Posting lists hold only 6-byte TIDs, without timestamps, scores, or additional info.
- Only bitmap scans are supported, not true index scans. GIN generates and processes bitmaps of matching TIDs.
Implications:
- Ranking needs heap access. For ranked searches like
ts_rank(), GIN finds TIDs but lacks term frequency or position info, so it must fetch and sort matching heap tuples. - Phrase searches need reprocessing. GIN can locate tuples containing specific words, but it cannot confirm if these words are adjacent based only on the index. It identifies possible matches and then rechecks each heap tuple to ensure the words are in sequence. As clarified by Artur Zakirov (Postgres Professional), "GIN supports it, but it requires additional bitmap heap scan and so it is slower."
- No
LIMITpushdown — must scan the entire posting list even for a small number of results. - Cursor-based iteration isn't supported.
I created the following tables that I'll use for the examples in this blog post and the next ones:
CREATE EXTENSION IF NOT EXISTS btree_gin;
DROP TABLE IF EXISTS articles;
CREATE TABLE articles (
id serial PRIMARY KEY,
title text NOT NULL,
body text NOT NULL,
category text NOT NULL,
published timestamp NOT NULL,
score int NOT NULL,
tsv tsvector GENERATED ALWAYS AS (
to_tsvector('simple', title || ' ' || body)
) STORED
);
INSERT INTO articles (title, body, category, published, score)
SELECT
'Article about ' || words[1 + (i % 10)] || ' and ' || words[1 + ((i*7) % 10)],
'This is the body discussing ' || words[1 + ((i*3) % 10)] ||
' in the context of ' || words[1 + ((i*11) % 10)] ||
'. We also mention ' || words[1 + ((i*13) % 10)] || ' here.',
categories[1 + (i % 4)],
'2020-01-01'::timestamp + (i || ' hours')::interval,
(i * 17) % 100
FROM generate_series(1, 1e6) AS s(i),
LATERAL (SELECT ARRAY['postgresql','indexing','performance','database',
'query','optimization','storage','replication',
'vacuum','analytics'] AS words) w,
LATERAL (SELECT ARRAY['tech','science','blog','news'] AS categories) c;
ANALYZE articles;
CREATE INDEX idx_gin_tsv ON articles USING gin (tsv);
Here is a Full-Text Search — Bitmap Only:
postgres=# EXPLAIN (COSTS OFF, ANALYZE, BUFFERS)
SELECT id, title, score
FROM articles
WHERE tsv @@ to_tsquery('simple', 'postgresql & article')
LIMIT 10
;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (actual time=24.049..24.104 rows=10 loops=1)
Buffers: shared hit=299 read=6
-> Bitmap Heap Scan on articles (actual time=24.045..24.087 rows=10 loops=1)
Recheck Cond: (tsv @@ '''postgresql'' & ''article'''::tsquery)
Heap Blocks: exact=6
Buffers: shared hit=299 read=6
-> Bitmap Index Scan on idx_gin_tsv (actual time=16.165..16.165 rows=100000 loops=1)
Index Cond: (tsv @@ '''postgresql'' & ''article'''::tsquery)
Buffers: shared hit=299
Planning:
Buffers: shared hit=33 read=2 dirtied=7
Planning Time: 0.314 ms
Execution Time: 24.138 ms
(13 rows)
GIN uses Bitmap Index Scan → Bitmap Heap Scan. It never produces a true Index Scan. Even with LIMIT 10, it builds the full bitmap first on one million rows before discarding most of them to return only ten.
An ORDER BY requires a Sort operation:
postgres=# EXPLAIN (COSTS OFF, ANALYZE)
SELECT id, title, published
FROM articles
WHERE tsv @@ to_tsquery('simple', 'postgresql & article')
ORDER BY published DESC
LIMIT 5;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (actual time=1838.940..1853.064 rows=5 loops=1)
-> Gather Merge (actual time=1838.937..1853.054 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (actual time=1812.430..1812.435 rows=4 loops=3)
Sort Key: published DESC
Sort Method: top-N heapsort Memory: 26kB
Worker 0: Sort Method: top-N heapsort Memory: 26kB
Worker 1: Sort Method: top-N heapsort Memory: 26kB
-> Parallel Bitmap Heap Scan on articles (actual time=113.247..1784.517 rows=33333 loops=3)
Recheck Cond: (tsv @@ '''postgresql'' & ''article'''::tsquery)
Heap Blocks: exact=19968
-> Bitmap Index Scan on idx_gin_tsv (actual time=127.531..127.532 rows=100000 loops=1)
Index Cond: (tsv @@ '''postgresql'' & ''article'''::tsquery)
Planning Time: 1.290 ms
Execution Time: 1853.445 ms
(16 rows)
The Sort node is unavoidable. GIN collected all matching TIDs, accessed the heap, sorted them, and retained the top 5.
Phrase search with distance — Recheck Required:
postgres=# EXPLAIN (COSTS OFF, ANALYZE)
SELECT id, title
FROM articles
WHERE tsv @@ to_tsquery('simple', 'postgresql <-> article')
LIMIT 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (actual time=357.799..357.803 rows=0 loops=1)
-> Bitmap Heap Scan on articles (actual time=357.795..357.797 rows=0 loops=1)
Recheck Cond: (tsv @@ '''postgresql'' <-> ''article'''::tsquery)
Rows Removed by Index Recheck: 100000
Heap Blocks: exact=55556
-> Bitmap Index Scan on idx_gin_tsv (actual time=48.745..48.746 rows=100000 loops=1)
Index Cond: (tsv @@ '''postgresql'' <-> ''article'''::tsquery)
Planning Time: 0.561 ms
Execution Time: 357.873 ms
(9 rows)
With Rows Removed by Index Recheck, GIN found candidates containing both words but had to access the heap to verify adjacency. No positional information in the index.
What's Next
These three limitations — no ordering, no positions, bitmap-only scans — are exactly what RUM was designed to solve. In the next post of this series, we'll see how a single architectural change (adding addInfo to each posting list entry) eliminates all three.
Writing Code vs. Shipping Code: Productivity Effects Across Generations of AI Coding Tools
The transformative power of LLMs in coding has been irrefutable, and it feels like we are living through a magical computing renaissance. On the socials, we hear impressive numbers of lines of code generated, features delivered, and bugs fixed. But, the macroeconomic indicators seem to be still lagging. Heck, if you talk with an engineering manager, you find that their product shipping dates haven't miraculously compressed by a factor of five, either.
This paper just landed 10 days ago. It is from MIT and Wharton by Mert Demirer, Leon Musolff, and Liyuan Yang. Their study attempts to provide a structured economic model for evaluating actual productivity obtained from AI coding tools. By pairing confidential Microsoft telemetry with the public footprints of over 100,000 GitHub developers (tracking everything from open-source utilities to web app repositories), the authors show significant systemic friction downstream of AI code generation.
Of course, I do my usual skeptical critic of the paper. In this case, this is especially heightened because these are economists peeking into the messy non-linear world of software engineering and trying to impose a "production hierarchy" abstraction onto it. But if we reconsider their analysis from a different perspective, it becomes possible to translate their complex production functions into Amdahl's Law terms, and then we can start doing our own evaluations and draw our own conclusions as I discuss below.
The Monotonic Decay
The core of the paper rests on this monotonic decay argument. The sheer task-level velocity gains we see from AI coding tools start to bleed out as they move up the production hierarchy. The authors break down AI tool adoption into three distinct generational tiers:
- Autocomplete (intelligent text prediction),
- Synchronous (Sync) Agents (interactive, real-time code modifiers like Claude Code or Cursor),
- Asynchronous (Async) Agents (autonomous async agents).
When we look at the task-level velocity of these tools, we see impressive numbers. The paper's abstract claims that autocomplete, interactive sync agents, and autonomous async agents increase overall commit activity by cumulative totals of 40%, 140%, and 180% respectively. But as that work climbs toward an official production milestone, the improvements decay significantly.
For Autocomplete, the +228.2% explosion in raw lines of code bleeds out layer-by-layer until becomes a meager +10.2% increase in actual shipped software releases. For Sync agents, a gigantic +741.3% surge in code syntax reduces down to a modest +20.3% final weekly releases.
My immediate reaction to this vertical hierarchy (Lines -> Files -> Commits -> PRs -> Repositories -> Releases) is skepticism. Treating code production like a neat production line feels superficial. Software engineering is not a linear conveyor belt, as coding is highly nonlinear, and a single commit routinely alters fifty files. However, giving the authors the benefit of doubt and reading onwards, I find that there is still value to this naive abstraction, as it points to human gatekeeping and coordination overhead at higher levels of the CI/CD pipeline. AI can write lines of code instantly, not being bogged down by the code syntax at the lower layers. But as that work climbs toward an official production milestone, the structural constraints of the system and human bottlenecks take over, and the massive improvements at the task level decay down to nearly nothing.
Let's dive deeper on the mathematical modeling behind this. By taking only the performance of Autocomplete into account (because it operates exclusively at the code-writing level), the authors chose parameters that minimized the differences between their model's predictions and actual developer behavior. Through this exercise, they extracted a local Upstream Output Elasticity ($\theta$) of approximately 0.75. In this layered production model, $\theta = 0.75$ acts as a vertical pass-through metric. It means that at any given stage (say, turning raw commits into clean pull requests), 75% of that layer's success leans entirely on the upstream technical assets flowing into it from the layer below, while a remaining fraction represents the local human effort added at that layer. Because they model software development as a vertically sequential aggregation process, that human intervention operates like a compounding efficiency tax. A massive initial code productivity surge at the bottom layer gets relentlessly multiplied by 0.75 over and over again as it attempts to climb the hierarchy, mathematically forcing the steep vertical attenuation we see in the empirical data.
Now, the sync and async agents aren't just typing lines inside an editor. They operate at a level where they directly manage files, commits, and pull requests. This expanded layer coverage allows agentic workflows to drop their productivity contributions closer to the finish line. By short-circuiting the early stages of the vertical decay chain, agents handle the work more efficiently, doubling the final impact on shipped software compared to autocomplete as seen in Figure 1.
Translating the Economics formulas to Amdahl’s Law
To model what happens inside each of these individual layers, the paper transitions to a nested Constant Elasticity of Substitution (CES) production function. Here, they extract an Elasticity of Substitution ($\sigma$) between AI-generated code and human review effort of a rigid 0.25. In economic lingo, an elasticity of substitution well below $1.0$ means the inputs are "strong complements". That means they are tied/dependent together like a car chassis and tires. It doesn't matter if an automated factory line can manufacture tires 10,000% faster; if you don't speed up the production of the chassis, you don't get more cars.
This of course looks a whole lot like Amdahl’s Law, which dictates that the overall speedup of a system is strictly limited by its sequential un-parallelizable bottlenecks:
$S_{\text{total}} = \frac{1}{(1-P) + \frac{P}{S_{\text{task}}}}$
where $S_{\text{task}}$ is the speedup achieved at the automated task level, and $P$ is the "Global Parallelizable Fraction" of the entire system workload.
When the elasticity of substitution ($\sigma$) between machine output and human validation drops to 0.25, the economic model behaves almost exactly like a Leontief production function. (A Leontief production function describes a strict, zero-flexibility production process where inputs must be combined in exact, unalterable proportions, meaning an excess of one ingredient cannot substitute for a shortage of another.) This dictates that human code review is a non-negotiable strictly sequential bottleneck ($1 - P$). If $\sigma$ were infinite, you could completely substitute human verification with raw AI text volume effectively parallelizing the entire layer. But because $\sigma = 0.25$, throwing an infinite mountain of automated code ($S_{\text{task}} \to \infty$) at the problem does nothing to diminish the fixed, sequential time investment required for a human to review it.
If we run the paper's real-world empirical findings through this equation (isolating the global parallel fraction $P$ for commits against final releases), we find the following:
- Autocomplete: $S_{\text{task}} = 1.359\times$, $S_{\text{total}} = 1.102\times \implies \mathbf{P \approx 35.0\%}$
- Sync Agents: $S_{\text{task}} = 2.091\times$, $S_{\text{total}} = 1.202\times \implies \mathbf{P \approx 32.2\%}$
- Async Agents: $S_{\text{task}} = 2.800\times$, $S_{\text{total}} = 1.300\times \implies \mathbf{P \approx 35.9\%}$
Huh! Even going from a simple inline autocomplete tool to fully autonomous agents that clone repositories and run test suites out-of-band, the global parallelizable fraction ($P$) refuses to budge and hovers around 35% across all three generations of AI!
Then how come Sync and Async agents manage to squeeze out 2-3x more final software releases than Autocomplete? While Autocomplete notches a modest 10.2% release expansion, Sync agents push it to 20.3%, and the cumulative stack of Async tools lifts the final output baseline up by 30%. If the global parallelizable envelope ($P$) is locked at 35% from the formulas above, how is the system actually accelerating? This is because according to Amdahl's Law, a system has two entirely separate levers for optimization. You can either increase $P$, or you can aggressively push harder and increase $S_{\text{task}}$. Autocomplete achieved a commit-level speedup ($S_{\text{task}}$) of just $1.359\times$ relative to releases. But Sync agents drive that localized task speedup to $2.091\times$, and Async agents push $S_{\text{task}}$ to $2.800\times$. But this hits a wall of diminishing returns quickly. When the parallelizable footprint ($P$) is pinned to 35%, scaling the localized task speedup ($S_{\text{task}}$) toward infinity yields a hard asymptotic ceiling. Mathematically, the absolute maximum total speedup this configuration can ever achieve is $1 / (1 - 0.35)$, which works out to a hard cap of a 53% overall increase ($1.53\times$) in shipped software. So, no matter how fast an autonomous bot can process a commit, the remaining 65% human sequential bottleneck ($1 - P$) acts as a hard stop.
Is P=0.35 sensible? I think that 35% parallel fraction passes the smell test. If you ask any developer what percentage of their week is spent actually writing code, they'll give you a number right in this ballpark, around 20% to 40%. The remaining 65% of the developers' time is consumed by finding/defining the task, planning, and paying the inevitable human communication tax of meetings and team alignment.
This is why that flatlining 35% profile across all three tool generations makes sense to me. Generative AI can supercharge the coding sandbox by churning out code at high speed (maxing out $S_{\text{task}}$), but it can't parallelize the systemic reasoning, organizational consensus, and deep problem/model comprehension that dominates the rest of the job. Until we find a way to automate those parts as well, shipping software will remain an inherently human-throttled process.
June 08, 2026
Automate Oracle PL/SQL to PostgreSQL migration with Amazon Bedrock and Strands Agents
Building Python applications with SQLAlchemy and Aurora DSQL
A Case for Simulation-Driven Resilience in Agentic Data Systems
As I mentioned in my previous post, I traveled to San Jose at the end of May for the ACM CAIS conference. On Day 0, I gave a very short talk at the Supporting our AI Overlords (SAO) workshop. This post is the promised summary of our paper, "A Case for Simulation-Driven Resilience in Agentic Data Systems", joint work with Aleksey Charapko (University of New Hampshire) and Akshat Vig (MongoDB).
Metastability is critical for building the next generation of distributed systems
Our story starts with metastability. Metastability is the failure mode where the mechanisms built to protect the system (retries, queues, timeouts, load shedding) turn into amplifiers. Even after the trigger that caused the overload goes away, the system stays behind, churning through busy work, perpetually trying to catch up with the remnants of failed and behind-schedule tasks. It's a bit like missing some foundational math in high school. You spend so long backfilling the old gaps that you never keep up with the new material piling on top, so you stay permanently behind. The catching up work is what keeps you behind, which requires catching up work later, which keeps you behind. (This is presumably why I'm not a rich machine learning scientist today at Deep Mind.)
Avoiding and tolerating metastable failures is critical to building the next generation of reliable distributed systems. Aleksey's Metastable Failures in the Wild study (OSDI'22) cataloged these failures from real production incidents. They matter so much because they are the hard faults that remain. We have largely learned to deal with the straightforward ones (crashes, corruptions, dropped packets), which leaves emergent performance failures as the final boss. Metastable failures are responsible for a disproportionate share of critical cloud unavailability incidents, with no single broken part to point at and no obvious way to fix/reset them when they emerge. The cloud economics make the problem worse. Providers and operators have every incentive to run with the absolute minimum of excess capacity and to trim slack and "waste". But this thin margin is exactly where metastability thrives.
Agents supercharge the feedback loops and put metastability on steroids
As AI agents are replacing humans as the primary clients of modern data systems, they are bringing a qualitative shift in workload characteristics. Agents retry aggressively while mutating the query on each attempt. They fan out into bursty parallel sub-tasks. They hold transactions open while they wait on an external LLM to provide the next step. The "Supporting our AI overlords" paper showed that agents create ~20x more branches and perform ~50x more rollbacks than humans do. These behaviors violate assumptions baked into every layer of a modern data system. Execution control assumes stationary arrivals. Caching assumes temporal locality. Concurrency control assumes bounded hold times. Agents break all three at once!
We propose simulation-driven resilience to address this problem
Simulation can enable us to systematically explore the agent-database boundary, and discover/prevent metastability failures before a production incident forces a reactive (and nonworking) patch. We propose a simulation based approach because only simulation is cheap enough to sweep an enormous trigger space, and deterministic enough to replay and dissect every failure it finds.
- Benchmarks measure steady state. But, metastability is transient and emergent: it lives in the sequence of rare events, not in the average.
- Queueing theory assumes mostly stationary independent arrivals. Agents break these assumptions.
- Testing with the production system is hopeless for design sweeps and gives you almost no observability. When your database falls over under load, you still don't know what went wrong, or how it went wrong, and you can't explore the design space because you have no feedback to explore it with.
MESSI finds failures in the seams
Metastability scurries in the seams/interaction of the composition of subsystems, so we need a tool that lets us look there. Aleksey developed MESSI (MEtaStability SImulator), a discrete-event simulation framework for exploring metastability dynamics in distributed systems. MESSI enables modeling any (sub)system as a directed graph and composition of (sub)systems. Logic Nodes implement routing and state policy. Processors model the physical resource constraints (queues, I/O delays, network latency). Individual work items, QItems, carry state as they traverse the graph. There is a clear separation of roles: policy lives in the nodes, resource contention lives in the edges, and you can vary each independently.
Because this is a simulator, it is deterministic and replayable, and it exposes the full internal state of every component at every tick. A metastable trigger you discover once can then be re-run against alternate designs to see which ones survive.
Our findings from the Execution Control System (ECS) simulations
Using MESSI, we performed an analysis of the Execution Control System (ECS), because it is the critical first domino: when the ECS fails, every subsystem downstream of it (caches, buffer pools, lock managers) fails after it. The ECS sits between admitted requests and the execution engine and decides who runs, in what order, with how much. It is the component that mediates resource contention at the backend. The usual design hands out a bounded pool of execution tickets (one ticket buys one worker thread), sorts admitted requests into a few priority queues with different ticket budgets, and dispatches them to worker threads and I/O slots. But, unlike an OS scheduler, which aims for fairness and wants every thread to eventually run, the ECS has to make decisive choices. It needs to prioritize the latency-sensitive short queries and shed the excess, as it aims the cost of waiting off the server and back onto the client. This is, of course, exactly what closes the feedback loop with a retrying agent.
We found two interesting results in our analysis.
Two reasonable policies may compose into a metastable loop
A natural ECS design uses two queues, a high-priority one for short tasks and a low-priority one for long tasks, with a probing policy on each that nudges its ticket count up or down to chase a performance metric. Each policy is sensible in isolation. But when you compose them, they interfere with each other on occasion in a metastable manner. The long queue probes for more tickets to improve its own throughput. More long-task tickets means more threads contending for the same cores, which steals CPU from the short queue, which then escalates its ticket count to keep up. Now both queues are inflating until they slam into their hard limits. The trap is that ticket acquisition rate, the metric each policy is optimizing, stops predicting actual progress under overload. A queue full of waiting tasks can churn tickets at a furious rate (grab a ticket, do 1 ms of work, yield, repeat) while getting almost nothing done. The metric looks healthy while the system performance collapses.
Admission control and the ECS destructively interfere
Putting an admission controller in front of the ECS sounds like defense in depth, but this can also backfire when done naively. Admission control drops requests indiscriminately. It sits at the network edge, and when it sees elevated latency, it starts rejecting a fraction of everything, short and long task alike. In simulations, we found that a workload spike can trigger admission control immediately, even though the ECS, left alone, would have rebalanced its tickets and absorbed the load after a brief adjustment. But the drops from the admission control prevent the ECS from rebalancing. With work being shed out from under it, the ECS never gets the signal it needs to adjust allocations and priorities, and the system stays parked in reduced-goodput mode until the workload subsides on its own. Under agents this gets worse, because admission control can't tell an agent's first attempt from its fifth retry, and a rejection would usually cause the agent to escalate things.
The full ECS design study is in our companion paper, "Towards Designing an Execution Control System with Metastability Resilience" (to appear at IEEE ICCCN 2026).
June 07, 2026
Getting Paid by Flat Rate Movers
Back in 2023, I hired Flat Rate Movers (A.K.A. Flat Rate Moving) for an interstate move. They subcontracted to a third party who showed up under-staffed, under-equipped, and very confused; the whole mess wound up causing a good deal of damage to my belongings and home. I filed a claim with Flat Rate and also requested they issue a partial refund. They refused to talk to me or their insurance company, and after six months of calling, emailing, and writing, I got overwhelmed by other life struggles and gave up. I did manage to file a claim with their backing insurance company, but got nothing for the damage to the house, or failure to deliver goods and services.
In October 2025 Flat Rate started sending me e-mails asking me to use Flat Rate for a move again. I called, explained the situation, and asked how they were planning to resolve it. They invited me to write to customer service, and the whole process resumed, with essentially the same result. I’d call every few days to ask how things were going, and various Flat Rate representatives would promise they’d get back to me tomorrow, or a check was in the mail, or that they’d talk to claims for me. Unsurprisingly, none of this worked.
Long story short, after a bunch of claim forms, emails, letters, complaints to various regulators, requests for arbitration, and one hundred and sixty-one phone calls with Flat Rate directly, I have succeeded. I started calling their main line at (212) 988-9292, picking a three-digit extension, and asking that person for help. I did this ten times each morning, and within two weeks had multiple people offering to help. I can’t say for sure which person got things over the line, because I called a lot of them, but by April 2026, I had a check in the mail.
If you wind up in the same boat, I recommend two things. First, keep a log of everything you do. Flat Rate may tell you things like “we did not receive supporting documentation within sixty days” or “you did not file the claim in time”; it is extremely helpful to be able to say things like “Please see my email from 2023-05-06 at 15:32 in which I submitted the requested documentation.” Second, set up a spreadsheet to keep track of which extensions you’ve called and when. Here's mine—each row number is the three-digit extension.
Extended RUM in DocumentDB extension for PostgreSQL: Efficient ESR (Equality, Sort, Range) Queries
Last year, I examined RUM indexes within this series on multi-key indexing, demonstrating that they cannot substitute MongoDB's compound indexes for sorted queries. A year later, the DocumentDB extension has evolved to incorporate an Extended RUM index, which allows for an ordered scan rather than a bitmap scan. Let's revisit our pagination query to see how it performs now.
I start a container with the latest DocumentDB (version v0.112-0 from May 26, 2026):
docker run -d --name documentdb-local -p 10260:10260 -p 9712:9712 ghcr.io/documentdb/documentdb/documentdb-local:latest --username franck --password franck --start-pg
I can connect to PostgreSQL on port 9712, where many extensions are installed, including the extended RUM index:
docker exec -it documentdb-local psql -p 9712 postgres
psql (17.10 (Debian 17.10-1.pgdg13+1))
Type "help" for help.
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
-------------------------+---------+------------+------------------------------------------------------------
documentdb | 0.112-0 | public | API surface for DocumentDB for PostgreSQL
documentdb_core | 0.112-0 | public | Core API surface for DocumentDB on PostgreSQL
documentdb_extended_rum | 0.112-0 | public | DocumentDB Extended RUM index access method
pg_cron | 1.6 | pg_catalog | Job scheduler for PostgreSQL
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgis | 3.6.3 | public | PostGIS geometry and geography spatial types and functions
tsm_system_rows | 1.0 | public | TABLESAMPLE method which accepts number of rows as a limit
vector | 0.8.2 | public | vector data type and ivfflat and hnsw access methods
(8 rows)
postgres=#
I can also connect to the MongoDB-compatible API:
docker exec -it documentdb-local mongosh -u franck -p franck 'mongodb://localhost:10260/?tls=true&tlsAllowInvalidCertificates=true'
Current Mongosh Log ID: 6a0b3b537d2a1c3471d1a7ba
Connecting to: mongodb://<credentials>@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true&directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.8.3
Using MongoDB: 7.0.0
Using Mongosh: 2.8.3
For mongosh info see: https://www.mongodb.com/docs/mongodb-shell/
[direct: mongos] test>
Like in the previous post, I created a simple collection with 10,000 documents:
[direct: mongos] test>
for (let i = 0; i < 10000; i++) {
db.demo.insertOne({
a: 1,
b: Math.random(),
ts: new Date()
});
}
I create a compound index that follows the MongoDB Equality, Sort, Range rule—designed for queries with an equality filter on a and a sort on ts:
[direct: mongos] test>
db.demo.createIndex({ "a": 1, "ts": -1 });
I run the same query as in the previous post, which, with the standard RUM indexes, produced a Bitmap Index Scan followed by a Sort of all documents matching a: 1 before returning the top 10:
[direct: mongos] test>
db.demo.find(
{ a: 1 }
).sort(
{ ts: -1 }
).limit(10).explain("executionStats");
The good surprise is that with the current version of DocumentDB, the execution plan looks like MongoDB's native IXSCAN with no additional sort step:
[direct: mongos] test> db.demo.find(
{ a: 1 }).sort({ts:-1}).limit(10).explain("executionStats")
;
{
explainVersion: 2,
...
executionStats: {
nReturned: Long('10'),
executionTimeMillis: 0.286,
executionStartAtTimeMillis: 0.256,
totalDocsExamined: Long('10'),
totalKeysExamined: Long('10'),
executionStages: {
stage: 'LIMIT',
nReturned: Long('10'),
executionTimeMillis: 0.286,
executionStartAtTimeMillis: 0.256,
totalDocsExamined: 10,
totalKeysExamined: 10,
numBlocksFromCache: 25,
inputStage: {
stage: 'FETCH',
nReturned: Long('10'),
executionTimeMillis: 0.267,
executionStartAtTimeMillis: 0.253,
totalKeysExamined: 10,
numBlocksFromCache: 25,
inputStage: {
stage: 'IXSCAN',
nReturned: Long('10'),
executionTimeMillis: 0.267,
executionStartAtTimeMillis: 0.253,
indexName: 'a_1_ts_-1',
totalKeysExamined: 10,
numBlocksFromCache: 25
}
}
}
},
ok: 1
}
It read only the minimum necessary to get the result: ten index entries (totalKeysExamined: 10) in the expected order and fetched only ten documents (totalDocsExamined: 10). This is the most efficient execution plan.
Comparing the Two RUM Index Definitions
I connect to PostgreSQL to describe the table that stores my collection documents (you will see later how I obtained the name):
postgres=# \d documentdb_data.documents_7
Table "documentdb_data.documents_7"
Column | Type | Collation | Nullable | Default
-----------------+--------+-----------+----------+---------
shard_key_value | bigint | | not null |
object_id | bson | | not null |
document | bson | | not null |
Indexes:
"collection_pk_7" PRIMARY KEY, btree (shard_key_value, object_id)
"documents_rum_index_25" documentdb_extended_rum (document documentdb_extended_rum_catalog.bson_extended_rum_composite_path_ops (pathspec='[ "a", { "ts" : -1 } ]', tl='2691'))
Check constraints:
"shard_key_value_check" CHECK (shard_key_value = '7'::bigint)
postgres=#
What was a standard RUM index in the previous post is now an extended RUM index:
| Attribute | Previous post | Current test |
|---|---|---|
| Index Type | documentdb_rum |
documentdb_extended_rum |
| Operator Class |
bson_rum_single_path_ops (×2) |
bson_extended_rum_composite_path_ops |
| Fields |
a (asc), ts (implicit asc) |
a (asc), ts (desc) |
Sort Direction on ts |
Not specified / default ascending | Explicitly -1 (descending) |
| Path Encoding | Two separate path= entries |
Single JSON pathspec array |
The extended RUM index acts as a sort-order-aware composite index, embedding the descending direction directly into the pathspec. Unlike the previous approach, which stored each path independently, this approach encodes all indexed fields as a single composite pathspec and generates a single composite index entry per document, preserving the relative ordering between fields. An index scan (RumOrderedScan) efficiently covers both filtering and sorting, eliminating the need for a separate Sort node in the PostgreSQL execution plan. This benefit is evident when executing the same query via the DocumentDB API in PostgreSQL:
postgres=# explain (analyze, buffers, verbose, costs off)
select document from bson_aggregation_find(
'test',
'{
"find": "demo",
"filter": { "a": 1 },
"sort": { "ts": -1 },
"limit": 10
}'::documentdb_core.bson
);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Limit (actual time=0.108..0.138 rows=10 loops=1)
Output: document, (bson_orderby(document, '{ "ts" : { "$numberInt" : "-1" } }'::bson))
Buffers: shared hit=4
-> Index Scan using "a_1_ts_-1" on documentdb_data.documents_7 collection (actual time=0.104..0.117 rows=10 loops=1)
Output: document, bson_orderby(document, '{ "ts" : { "$numberInt" : "-1" } }'::bson)
Index Cond: (collection.document @= '{ "a" : { "$numberInt" : "1" } }'::bson)
Order By: (collection.document |-<> '{ "ts" : { "$numberInt" : "-1" } }'::bson)
Buffers: shared hit=4
Planning:
Buffers: shared hit=2
Planning Time: 0.453 ms
Execution Time: 0.192 ms
(12 rows)
postgres=#
Note: I got the name of the internal table I described above from this execution plan, which uses the collection name in the query. The MongoDB API's explain() shows a MongoDB-compatible execution plan, and EXPLAIN in PostgreSQL shows the PostgreSQL version of it.
Comparison of Execution Plans
Here is how the new behavior with an ordered index scan compares to the previous bitmap scan.
| Feature | Ordered Index Scan | Bitmap Index Scan |
|---|---|---|
| PostgreSQL Node | Index Scan |
Bitmap Index Scan |
| Ordering | Handled by sort direction in index | Lost—requires a Sort node |
| Scan Type | scanType: RumOrderedScan |
scanType: RumFastScan / RumRegularScan |
| Efficiency | Supports early termination (LIMIT) |
Must scan all matching TIDs into bitmap |
| RUM Entry Point | rumgettuple() |
rumgetbitmap() |
| Sort Step | None — useSimpleScan = true
|
rum_tuplesort_performsort() required |
| Memory Usage | Low—one tuple at a time | High—full TIDBitmap + sort state |
| Index Structure Used | B-tree walk via orderStack
|
Posting list / posting tree dump |
| Filter Evaluation | Inline via ValidateIndexEntry()
|
Post-collection in keyGetItem()
|
| Seek Optimization | Yes—advances queryKey as entries exhaust |
No |
| Multi-column Support | Multi-column via composite pathspec | Multi-column via separate entries |
LIMIT benefit |
✅ Full — stops after N rows | ❌ None — bitmap built before LIMIT applies |
| Recheck Behavior |
xs_recheckorderby per tuple |
xs_recheck on bitmap result |
| Trigger Condition |
RumEnableOrderedOperatorScans + willSort + norderbys > 0
|
Default path |
Index Only Scan
Other improvements are coming to Extended RUM, like Index Only Scan, currently supported for COUNT:
postgres=# explain (analyze, buffers, verbose, costs off)
SELECT document FROM bson_aggregation_count(
'test',
'{
"count": "demo",
"query": { "a": 1 }
}'::documentdb_core.bson
);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (actual time=22.760..22.763 rows=1 loops=1)
Output: documentdb_api_internal.bsoncommandcount(1)
Buffers: shared hit=109
-> Index Only Scan using "a_1_ts_-1" on documentdb_data.documents_7 collection (actual time=0.089..14.292 rows=10000 loops=1)
Output: collection.document
Index Cond: (collection.document @= '{ "a" : { "$numberInt" : "1" } }'::bson)
Heap Fetches: 0
Buffers: shared hit=109
Planning:
Buffers: shared hit=4
Planning Time: 0.441 ms
Execution Time: 22.883 ms
(12 rows)
Index Only Scan will be supported in the future (see IsQueryValidForIndexOnlyScan)
Conclusion
A year ago, DocumentDB's RUM indexes had a significant limitation for pagination queries: even with the right compound index, the planner would fall back to a Bitmap Index Scan followed by a full Sort, meaning every matching document had to be collected and sorted before the first result could be returned. A LIMIT 10 query on 10,000 documents would examine all 10,000—defeating the purpose of the compound index.
With v0.112-0, this is fixed. The new documentdb_extended_rum index type, combined with the RumOrderedScan execution path, reduces the gap with native MongoDB behavior:
- The index encodes sort direction directly in the
pathspec({ "ts": -1 }) - The planner chooses an Index Scan instead of a Bitmap Index Scan
- No
Sortnode appears in the plan -
LIMIT 10examines exactly 10 index entries and 10 documents
This is more than just a cosmetic change. In time-series queries—such as filtering on a low-cardinality field, sorting by timestamp descending, and retrieving the first page—the difference between the two plans ranges from O(result) to O(total size). For OLTP systems, pagination queries are common and need to be quick and reliable, since they show results to the user before the user takes any action, selects, refines filters, or moves to the next page.
This ordered scan is also essential for TTL indexes to efficiently identify expiration candidates.
The key ingredients that make this work together are visible from the DocumentDB open-source code:
-
documentdb_extended_rum—compositepathspecwith explicit sort direction -
bson_extended_rum_composite_path_ops— single operator class covering all fields -
RumOrderedScan— B-tree walk in index order viaorderStack, bypassingrumgetbitmap() -
useSimpleScan— returns one tuple at a time, enabling trueLIMITpushdown -
RumAllowOrderByRawKeys— the GUC that enables this path, now on by default
This is all enabled by default:
postgres=# \dconfig *rum*order*
List of configuration parameters
Parameter | Value
----------------------------------------------+-------
documentdb_rum.enable_ordered_operator_scans | on
documentdb_rum.forceRumOrderedIndexScan | off
(2 rows)
In under a year, DocumentDB evolved from "RUM instead of GIN, but with the same pagination limitations" to "RUM with ordered scan, aligning more with MongoDB's IXSCAN behavior for ESR-pattern indexes". For developers implementing cursor-based pagination or queries with a selective filter and sorting on a time or sequence field, this marks the version at which it begins to function as expected.
June 05, 2026
Getting Started with pg_durable: Durable Workflows Inside PostgreSQL
Modern applications demand long‑lasting workflows that can withstand crashes, restarts, and failures, often relying on external platforms. pg_durable integrates durable workflow orchestration directly into PostgreSQL, enabling workflows to be created in SQL and reliably executed, monitored, inspected, and recovered from disruptions.
This tool is particularly useful for ETL processes, data pipelines, background tasks, scheduled jobs, long‑running business procedures, and internal workflow management.
To truly grasp its capabilities, the best approach is to try it yourself. I did so by building and running it locally, which provided practical insight.
Installing pg_durable
I installed Rust to compile the extension:
curl https://sh.rustup.rs -sSf | sh
. "$HOME/.cargo/env"
I installed the PostgreSQL build dependencies.
sudo dnf install -y openssl-devel pkg-config libicu-devel readline-devel flex
I installed pgrx for PostgreSQL 17:
cargo install cargo-pgrx --version 0.16.1 --locked
cargo pgrx init --pg17=download
export PG_VERSION=pg17
I enabled the extension preload library:
sed -e "/shared_preload_libraries/s/^.*$/shared_preload_libraries='pg_durable'/" \
-i ~/.pgrx/data-17/postgresql.conf
I declared the superuser that I'll create to run the background worker:
cat >> ~/.pgrx/data-17/postgresql.conf <<<"pg_durable.worker_role='superworker'"
I cloned pg_durable from the Microsoft repository:
git clone https://github.com/microsoft/pg_durable.git
cd pg_durable
I built the extension, started PostgreSQL, and got a psql prompt:
cargo pgrx run
I checked the pg_durable configuration, connected to the pg_durable.database, created the pg_durable.worker_role user, and installed the extension:
pg_durable=# \dconfig pg_durable.*
List of configuration parameters
Parameter | Value
---------------------------------------+-------------
pg_durable.database | postgres
pg_durable.enable_superuser_instances | off
pg_durable.execution_acquire_timeout | 30
pg_durable.max_duroxide_connections | 10
pg_durable.max_management_connections | 6
pg_durable.max_user_connections | 10
pg_durable.worker_role | superworker
(7 rows)
\c postgres
CREATE ROLE superworker SUPERUSER LOGIN;
CREATE EXTENSION pg_durable;
My PostgreSQL instance is now ready to use durable functions.
Hello World
I create a user that will execute workflows and grant access to pg_durable:
CREATE USER franck;
GRANT CREATE ON DATABASE postgres TO franck;
SELECT df.grant_usage('franck');
I reconnect as that user:
postgres=# \c postgres franck
You are now connected to the database "postgres" as user "franck".
postgres=>
I start a durable workflow:
postgres=> SELECT df.start(
$$SELECT ''Hello, durable world!'' AS message$$
);
start
----------
393f09e0
(1 row)
One second later, it is completed:
postgres=> SELECT df.status('393f09e0');
status
-----------
completed
(1 row)
The result can be fetched as a document with a list of rows:
postgres=> SELECT df.result('393f09e0');
result
------------------------------------------------------------------
{"rows": [{"message": "Hello, durable world!"}], "row_count": 1}
(1 row)
I used the same "Hello, durable world!" example as in the HorizonDB durable functions documentation, because this is where pg_durable comes from: Microsoft provides it to PostgreSQL as an open-source extension.
Here is more information about this execution:
postgres=> \x
Expanded display is on.
postgres=> SELECT * FROM df.list_instances();
-[ RECORD 1 ]---+-------------------------------------------------------------
instance_id | 393f09e0
label |
function_name | pg_durable::orchestration::execute-function-graph
status | completed
execution_count | 1
output | {"rows":[{"message":"Hello, durable world!"}],"row_count":1}
postgres=> \x
Expanded display is off.
postgres=>
The execution graph is basic, as there's a single operation:
postgres=> SELECT df.explain('393f09e0');
explain
------------------------------------------------------------------------
Instance: 393f09e0 +
Status: ✓ Completed +
Output: {"rows":[{"message":"Hello, durable world!"}],"row_count":1}+
+
SQL: SELECT 'Hello, durable world!' AS message ✓
(1 row)
There is a single node:
postgres=> \x
Expanded display is on.
postgres=> SELECT * from df.instance_nodes('393f09e0');
-[ RECORD 1 ]+-----------------------------------------------------------------
execution_id | 1
node_id | 60e433bb
node_type | SQL
query | SELECT 'Hello, durable world!' AS message
result_name |
left_node |
right_node |
status | completed
result | {"rows": [{"message": "Hello, durable world!"}], "row_count": 1}
updated_at | 2026-06-04 16:36:38.594906+00
postgres=> \x
Expanded display is off.
It was executed once:
postgres=> SELECT * from df.instance_executions('393f09e0');
execution_id | status | event_count | duration_ms | output
--------------+-----------+-------------+-------------+--------------------------------------------------------------
1 | Completed | 14 | 692 | {"rows":[{"message":"Hello, durable world!"}],"row_count":1}
(1 row)
Here are the global metrics:
postgres=> SELECT * FROM df.metrics();
total_instances | running_instances | completed_instances | failed_instances | total_executions | total_events
-----------------+-------------------+---------------------+------------------+------------------+--------------
1 | 0 | 1 | 0 | 1 | 14
(1 row)
This simply looks like ADBA—Asynchronous Database Access, but it can do more. We will see that it can define complex workflows with a declarative syntax. The examples start the workflow immediately, but it can also wait for a signal or a specific time using a cron syntax:
Beyond a Single Query
While a simple SQL statement demonstrates the basics, pg_durable becomes much more powerful when orchestrating multiple operations.
The SQL workflow DSL supports:
| Feature | Operator |
|---|---|
| Variable capture | => |
| Sequential execution | ~> |
| Parallel execution (wait) | & |
| Parallel execution (race) | ` |
| Conditional branching | {% raw %}?> and !>
|
| Infinite loops | @> |
Here are simple table-free examples demonstrating pg_durable's key features. Let's start with a sequential execution (~>):
postgres=> SELECT df.start(
'SELECT now() as step1' |=> 't1'
~> 'SELECT pg_sleep(5)'
~> 'SELECT now() as step2' |=> 't2'
~> 'SELECT now() as step3' |=> 't3',
'sequential-timing'
) as i \gset
-- Check the results after five seconds
postgres=> SELECT df.result(:'i');
result
---------------------------------------------------------------------------
{"rows": [{"step3": "2026-06-04T21:33:59.939785+00:00"}], "row_count": 1}
(1 row)
postgres=> SELECT node_type, result_name, result, node_id, left_node, right_node FROM df.instance_nodes(:'i');
node_type | result_name | result | node_id | left_node | right_node
-----------+-------------+---------------------------------------------------------------------------+----------+-----------+------------
THEN | t3 | {"rows": [{"step3": "2026-06-04T21:33:59.939785+00:00"}], "row_count": 1} | 49c9c8b4 | 48fcec43 | 9f818758
THEN | t2 | {"rows": [{"step2": "2026-06-04T21:33:59.507283+00:00"}], "row_count": 1} | 48fcec43 | 348c1948 | c78f2200
THEN | | {"rows": [{"pg_sleep": null}], "row_count": 1} | 348c1948 | c7317ea8 | 200f8a52
SQL | t1 | {"rows": [{"step1": "2026-06-04T21:33:53.746030+00:00"}], "row_count": 1} | c7317ea8 | |
SQL | | {"rows": [{"pg_sleep": null}], "row_count": 1} | 200f8a52 | |
SQL | | {"rows": [{"step2": "2026-06-04T21:33:59.507283+00:00"}], "row_count": 1} | c78f2200 | |
SQL | | {"rows": [{"step3": "2026-06-04T21:33:59.939785+00:00"}], "row_count": 1} | 9f818758 | |
(7 rows)
I displayed the timestamps to show that steps 2 and 3 occur 5 seconds after step 1 because I used pg_sleep(5) in the sequential execution.
The node_type column indicates the type of operation each node represents in the function graph. Here, THEN denotes sequential execution: running the left child, then the right child, and SQL is the leaf node with no children, executing the SQL query.
The complete list of valid node types in pg_durable is:
| node_type | Description | Operator/Function |
|---|---|---|
| SQL | Execute SQL query | Plain string or df.sql()
|
| THEN | Sequential execution |
~> operator or df.seq()
|
| IF | Conditional branch |
?> !> operators or df.if()
|
| JOIN | Parallel execution (wait all) |
& operator or df.join()
|
| RACE | Parallel execution (first wins) |
\ operator or df.race()
|
| LOOP | Infinite loop |
@> operator or df.loop()
|
| BREAK | Exit loop | df.break() |
| SLEEP | Pause execution | df.sleep() |
| WAIT_SCHEDULE | Wait for cron schedule | df.wait_for_schedule() |
| HTTP | Make HTTP request | df.http() |
| SIGNAL | Wait for external signal | df.wait_for_signal() |
We can execute steps in parallel (&):
SELECT df.start(
('SELECT now() as branch1' |=> 'b1' ~> 'SELECT pg_sleep(20)')
&
('SELECT now() as branch2' |=> 'b2' ~> 'SELECT pg_sleep(10)')
~> 'SELECT now() as after_join' |=> 'final',
'parallel-sleep'
) as i \gset
postgres=> SELECT node_type, result_name, result, node_id, left_node, right_node FROM df.instance_nodes(:'i');
node_type | result_name | result | node_id | left_node | right_node
-----------+-------------+--------------------------------------------------------------------------------------------------+----------+-----------+------------
SQL | b1 | {"rows": [{"branch1": "2026-06-04T21:53:47.870539+00:00"}], "row_count": 1} | 9d479c96 | |
SQL | b2 | {"rows": [{"branch2": "2026-06-04T21:53:47.870606+00:00"}], "row_count": 1} | cb844cd8 | |
SQL | | {"rows": [{"pg_sleep": null}], "row_count": 1} | ad8ec143 | |
THEN | | {"rows": [{"pg_sleep": null}], "row_count": 1} | 1ff3adc8 | cb844cd8 | ad8ec143
SQL | | {"rows": [{"pg_sleep": null}], "row_count": 1} | 9db57c15 | |
THEN | | {"rows": [{"pg_sleep": null}], "row_count": 1} | 5e4d8069 | 9d479c96 | 9db57c15
JOIN | | [{"rows": [{"pg_sleep": null}], "row_count": 1}, {"rows": [{"pg_sleep": null}], "row_count": 1}] | 6f95c84d | 5e4d8069 | 1ff3adc8
SQL | | {"rows": [{"after_join": "2026-06-04T21:54:08.752247+00:00"}], "row_count": 1} | d4be28af | |
THEN | final | {"rows": [{"after_join": "2026-06-04T21:54:08.752247+00:00"}], "row_count": 1} | df59e250 | 6f95c84d | d4be28af
(9 rows)
Here, b1 and b2 started at the same time, in parallel. One branch completed after 20 seconds and the other after 10 seconds, so the final step started after 20 seconds, and the total duration was 20 seconds—it would have been more than 30 seconds if executed serially:
postgres=> SELECT * from df.instance_executions(:'i');
execution_id | status | event_count | duration_ms | output
--------------+-----------+-------------+-------------+----------------------------------------------------------------------------
1 | Completed | 26 | 22107 | {"rows":[{"after_join":"2026-06-04T21:54:08.752247+00:00"}],"row_count":1}
(1 row)
Parallel nodes are used when we need the result from all operations. The next step awaits all. If we want to continue as soon as one branch gets a result, we can run a race, where the first one wins (|):
SELECT df.start(
'SELECT pg_sleep(30)' | 'SELECT pg_sleep(90)' ~> 'SELECT ''fast'' as winner',
'race-test'
) as i \gset
select usename,backend_start,query_start,wait_event,state,query,application_name from pg_stat_activity where usename=user and pid!=pg_backend_pid(
)
\watch 10
Fri 05 Jun 2026 04:08:30 AM GMT (every 10s)
usename | backend_start | query_start | wait_event | state | query | application_name
---------+-------------------------------+-------------------------------+------------+--------+---------------------+------------------
franck | 2026-06-05 04:08:29.873785+00 | 2026-06-05 04:08:29.877368+00 | PgSleep | active | SELECT pg_sleep(30) |
franck | 2026-06-05 04:08:29.875738+00 | 2026-06-05 04:08:29.879296+00 | PgSleep | active | SELECT pg_sleep(90) |
(2 rows)
Fri 05 Jun 2026 04:08:40 AM GMT (every 10s)
usename | backend_start | query_start | wait_event | state | query | application_name
---------+-------------------------------+-------------------------------+------------+--------+---------------------+------------------
franck | 2026-06-05 04:08:29.873785+00 | 2026-06-05 04:08:29.877368+00 | PgSleep | active | SELECT pg_sleep(30) |
franck | 2026-06-05 04:08:29.875738+00 | 2026-06-05 04:08:29.879296+00 | PgSleep | active | SELECT pg_sleep(90) |
(2 rows)
Fri 05 Jun 2026 04:08:50 AM GMT (every 10s)
usename | backend_start | query_start | wait_event | state | query | application_name
---------+-------------------------------+-------------------------------+------------+--------+---------------------+------------------
franck | 2026-06-05 04:08:29.873785+00 | 2026-06-05 04:08:29.877368+00 | PgSleep | active | SELECT pg_sleep(30) |
franck | 2026-06-05 04:08:29.875738+00 | 2026-06-05 04:08:29.879296+00 | PgSleep | active | SELECT pg_sleep(90) |
(2 rows)
Fri 05 Jun 2026 04:09:00 AM GMT (every 10s)
usename | backend_start | query_start | wait_event | state | query | application_name
---------+-------------------------------+-------------------------------+------------+--------+---------------------+------------------
franck | 2026-06-05 04:08:29.875738+00 | 2026-06-05 04:08:29.879296+00 | PgSleep | active | SELECT pg_sleep(90) |
(1 row)
Fri 05 Jun 2026 04:09:10 AM GMT (every 10s)
usename | backend_start | query_start | wait_event | state | query | application_name
---------+-------------------------------+-------------------------------+------------+--------+---------------------+------------------
franck | 2026-06-05 04:08:29.875738+00 | 2026-06-05 04:08:29.879296+00 | PgSleep | active | SELECT pg_sleep(90) |
(1 row)
Fri 05 Jun 2026 04:09:20 AM GMT (every 10s)
usename | backend_start | query_start | wait_event | state | query | application_name
---------+-------------------------------+-------------------------------+------------+--------+---------------------+------------------
franck | 2026-06-05 04:08:29.875738+00 | 2026-06-05 04:08:29.879296+00 | PgSleep | active | SELECT pg_sleep(90) |
(1 row)
Fri... (truncated)
Village News: MySQL/Database News + Events (5 June 2026)
June 04, 2026
Understanding how backups work in Amazon Aurora
Migrating from MongoDB 6.0 to 8.0: How Percona ClusterSync Handles Cross-Version Replication
Percona ClusterSync for MongoDB (PCSM) replicates data between MongoDB clusters to keep migrations with near-zero downtime. Prior to version 0.9.0 it required the source and target to run the same major version, which ruled out the lift-and-shift move most migrations want: going from an older major like 6.0 straight onto a newer one like 8.0. … Continued
The post Migrating from MongoDB 6.0 to 8.0: How Percona ClusterSync Handles Cross-Version Replication appeared first on Percona.
Multigres v0.1 Alpha: an operating system for Postgres
Supabase Series F
June 03, 2026
Happy Path, Escape Hatch, and the Space Between
Index types supported in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL using extensions (Bloom, pg_trgm, and pg_bigm)
Index types supported in Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL using extensions (SP-GiST, Btree_Gin and Btree_Gist)
Migrating data from Oracle to Amazon Aurora DSQL
Implementing real-time change data capture with Debezium for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL
June 02, 2026
ACM CAIS: Conference on AI and Agentic Systems
Last week, I traveled to San Jose to attend the ACM CAIS conference. On Day 0, I gave a short talk at the Supporting our AI Overlords (SAO) workshop. And yes, I promise to write a summary of our paper, "A Case for Simulation-Driven Resilience in Agent-First Data Systems" soon!
To start with an overall impression of the conference: much of the work presented felt exploratory and anecdotal. Since the compound AI space is still so new, many work seemed to share on-the-ground best practices that worked for them rather than principled results. Some talks really leaned into the "agent, act like a senior engineer and don't make mistakes" vibe. This was especially apparent in the "Agent Skills Workshop". I am not saying this is a bad thing, I learned some valuable lessons from that workshop, which I'll share below.
CAIS defines the conference's scope broadly as "research on compound AI architectures, optimization, and deployment". Unfortunately, this broadness seemed to work against the main track. Attendance at the primary conference talks was low, and it often felt like attendees were there solely to present their own work rather than engage with others, likely because the subject matter was spread too thin.
In contrast, the workshops were highly focused, which led to much better engagement and active listening from the audience. Moving forward, I think CAIS would benefit greatly from narrowing its focus, maybe specifically focusing more on data systems and infrastructure in support of AI.
On that note, what happened to our collective attention span? CAIS limited paper presentations to 7 minutes with just 2 minutes for questions. This year, SIGMOD also shifted to 9-minute talks. Our own paper, "LeaseGuard: Raft Leases Done Right!", got a mere 9 minutes in the spotlight after Jesse traveled all the way to Bangalore to present it. I've even heard that USENIX Security is down to 3-minute talks now. Should we maybe consider slowing down? After all, isn't attention all we need?
Agent Skills workshop (Day 0)
In the first talk, Graham Neubig discussed OpenHands, their open-source AI developer agent platform that's getting a lot of traction in highly regulated fields like finance and healthcare to speed up software development. A big theme of his talk was skill induction: "the process of inducing/verifying programmatic or prompt-based capabilities through testing/evaluation to enable single-agent systems complete complex long-horizon tasks". Through leveraging offline human-annotated examples or online user feedback, skill induction kicks off a rapid learning phase. In web navigation tests like WebArena, an agent's success rate can ramp up dramatically over a small number of trials before settling into a robust repeatable skill set.
Later on, Kanav Garg (co-founder of Core Automation) walked us through the lifecycle of a Reinforcement Learning (RL) environment. He defined it as a continuous loop made up of an actionable prompt, a starting state, a runtime environment (like a Docker container), configuration, and a reward system. The main takeaway here was that successful RL needs careful difficulty calibration and precise reward shaping to keep agents from hacking the reward system. To get this right, engineers have to actually look at the agent's traces instead of blindly trusting that the numbers on a chart are going up. Kanav also said that data environments are living projects with a shelf life of at most two months, and this means continuous learning from task failures and automated data pipelines are far more important/effective than relying on static expensive human data.
SAO: Supporting our AI overlords (Day 0)
The core theme of the SAO workshop was that agents are rapidly becoming both the primary users and the builders of data systems, and this shift is creating a vast new design space demanding entirely new abstractions. The workshop featured three keynote speakers and was incredibly well attended. With no seating left, people were standing in the doorway just to watch the talks.
Aaron Katz from Clickhouse gave a nice talk on this transition from human to agent users. He said that agents aren't just querying data anymore; they are actively provisioning services, so they need their own identities and budgets. Because agents drive such massive concurrency and require interactive latencies, traditional per-query pricing models are becoming way too punitive. To keep up, platforms have to adapt to headless API-first experiences. He said that Clickhouse is launching the ability to build agents directly inside the database for lower latency and for blending structured and full-text search.
Next, Andy Pavlo talked about databases being the "final boss" for agents. Check out his slides. It is classic Andy humor and style, though a Wu-Tang Clan reference was sorely missing this time. Andy focused heavily on automated database tuning and development, comparing their Proto-X tuning agent with ChatGPT's Lambda-tune. While Proto-X gets better optimization results, it takes 12 hours to train per database, whereas ChatGPT is fast (14 minutes) but performs terribly. To bridge this gap, they adopted LLMs to boost automatic tuning algorithms by leveraging prior history. In the second (and shorter) part of the talk, Andy also noted that while coding agents are making progress, they still completely fail at building complex database components like query optimizers, which require much more support. Although they have successfully "vibed" and manually verified a couple of optimization passes (like DPHyp and Unnesting v2), blindly accepting an LLM's output is fraught with problems because verifying query plan equivalence is notoriously difficult (despite solver efforts from UW, Berkeley, and Microsoft). It seems like coding agents love to add special-case code when an optimizer actually needs to be as general as possible.
Finally, Nikita Shamgunov (ex-Neon, now Databricks) discussed the infrastructure needed for agents, categorizing this down into three pillars: state, compute, and middleware. He argued that because agents speed up the dev loop by 1000x, true serverless architectures are now absolutely critical to avoid the insane costs of overprovisioning. He also described Neon's architecture, highlighting how separating compute from storage allows for instant database branching for agents using microVMs. While he shared some genuinely interesting technical points, the presentation itself was pretty dry and felt like it was missing a clear focus.
The workshop ended with a panel. The organizers tried to spark a debate, but there wasn't much disagreement. The general consensus was that while traditional OLTP and OLAP boundaries will remain, agents will increasingly be the ones conducting tasks that span seamlessly across them. After the panel, we headed to a MongoDB-sponsored happy hour, where the workshop audience had more time to have relaxed conversations about the breakneck hellscape transition our industry is currently going through.
Wednesday (Day 1)
Since CAIS is an AI systems conference, the focus is more on building the systems that surround AI models rather training individual AI models. The dominant theme this year seemed to be [multi-]agent architectures, coordination protocols, and workflow design. The emphasis is on composition: how do we organize these agents, manage their contexts, coordinate their interactions, and handle their lifecycles?
A second major trend is the growing focus on day-to-day operations. Entire sessions were dedicated to evaluation, trace analysis, failure detection, routing, and cost optimization. With so many papers covering efficiency, scheduling, and economic tradeoffs, it is clear that the industry is shifting from just maximizing capability to maximizing capability per dollar.
My shortlist from Day 1:
- TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples
- Tressoir: Unifying Online, Offline, and HIL Design and Evolution of Multi-Agent Systems via Interpretable Blueprints
- Expansion-Contraction: A Multi-Agent Graph Traversal Pattern for Compound AI Systems
- A Language for Describing Agentic LLM Contexts
- Understanding and Improving Communication Performance in Multi-node LLM Inference
- Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
- Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints
- CAMI: Practical Cost-Aware Agent-Guided Multi-Indexing for Semantic Retrieval
Here a brief overview of the TraceFix paper, which is most relevant to my interests.
TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples
TraceFix tackles the coordination failures that happen when multiple LLM agents try to work together concurrently on shared tasks. It is crucial to understand that this paper is not about generating classical distributed computing algorithms; rather, it focuses on scaffolding the formal "rules of engagement" for multi-agent LLM systems. When agents collaborate on domain tasks requiring fine-grained mutual exclusion over shared mutable resources (such as editing the same codebase or scheduling access to a simulated lab instrument) they naturally run into interleaving-sensitive bugs like deadlocks, missed handshakes, and race conditions. TraceFix solves this by isolating the coordination layer, formally verifying the protocol for how agents use shared locks and message channels, and then allowing the agents to remain completely autonomous in executing their actual domain-specific work.
To achieve this, TraceFix introduces a verification-first pipeline where an orchestration agent first synthesizes a declarative protocol topology and writes the behavioral logic in PlusCal. Before any agent takes action, the TLA+ model checker (TLC) exhaustively searches this proposed protocol for safety violations and feeds concrete counterexample traces back to the agent for iterative repair until the code is fully verified. At runtime, these verified process bodies are compiled into agent prompts, and a monitor strictly enforces the approved topology by rejecting any invalid or out-of-bounds coordination attempts. TraceFix is evaluated across a benchmark of 48 complex tasks. The system achieved a 100% verification success rate within four repair iterations and significantly improved runtime task completion, proving that formal model-checker feedback can effectively eliminate the deadlocks and resource clashes in multi-agent workflows.
Counterintuitively, the introduction of this formal coordination scaffolding did not bog the system down with unnecessary overhead, but rather, it significantly accelerated execution. By structurally preventing agents from colliding over resources (a problem that caused a massive 61.2% contention rate and endless retry loops in unstructured chat-only setups) TraceFix eliminated wasted trial-and-error steps. As a result, the verified topology-monitored protocol executed in an average of just 93 seconds using 62 tool-call steps. This vastly outperformed both the chaotic chat-only baseline (229 seconds and 203 steps) and a sequential single-agent baseline (~304 seconds).
Thursday (Day 2)
On Thursday, the papers were more about making agent systems manageable. Many of these papers are about memory, planning, governance, verification, security, and control. Researchers are increasingly treating agents as long-running software systems that require architecture, interfaces, observability, safety mechanisms, and operational discipline. This resembles the evolution of distributed systems from clever protocols toward operational concerns such as consistency models, monitoring, fault tolerance, and standards.
A second trend was the emergence of agents as compound systems rather than monolithic models. Instead of expecting a single model to solve everything, people arenow building ecosystems of interacting components with explicit roles. The vibe here is more like systems engineering for AI.
My shortlist:
- Securing Agents With Tracked Capabilities (putting the agents in a programming-language-based safety harness, Scala 3 with capability types)
- Dossier: Deep Research via Ledger-Driven Branching Search and Query Encoding Learning
- Open Agent Specification: Enabling Cross-Framework Comparison of AI Agents
- Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling
- The Verifier Tax: Horizon Dependent Safety–Success Tradeoffs in Tool Using LLM Agents
- Supervisory Control Theory for LLM Revision
- FedMECA: Scalable Federated Learning via Memory-Efficient and Concurrent Aggregation
Friday (Day 3)
In the Friday program, multiple papers study agent societies, debate, persuasion, socialization, consensus formation, and safety in multi-agent environments. Also, several papers focus on practical concerns such as routing requests across models, optimizing energy consumption, serving multi-agent systems, evaluating agent frameworks, and integrating AI into production workflows.
My shortlist:
- MoltGraph: A Longitudinal Temporal Graph Dataset of Moltbook for Coordinated-Agent Detection
- Why Johnny Can’t Use Agents: Industry Aspirations vs. User Realities with AI Agents
- The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate
- SEAR: Schema-Based Evaluation and Routing for LLM Gateways
- AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices