In DynamoDB, a single-table design stores one-to-many relationships in a single physical block while still following a relational-like normal-form decomposition. In MongoDB, the Single Collection Pattern unnests relationships from a single document, but it goes against the general recommendation because it sacrifices one of MongoDB's key advantages: keeping a document in a single block. In Oracle Database and MySQL, JSON-relational duality views normalize JSON documents into relational tables that span multiple blocks, also without the data-locality benefits of MongoDB. Can we turn these duality views into a document-database equivalent that keeps together the data accessed together?
Here is the Single-Cluster Duality View 🥁.
DynamoDB
NoSQL started with a simple key‑value API. For example, DynamoDB can get an item using the full key, query multiple items using a partial key, or scan all items without a key. The value, in this key‑value datastore, is atomic, but sometimes you need partial reads or writes. Then you have two solutions: read or write the full item, which is inefficient if you only need part of it, or change your schema design to split it into multiple items.
The latter was known as the single table design because it not only splits items into multiple entities but also stores them in the same table, sharing a key prefix, to retrieve all items with a single query. The idea is that you can access individual objects while also benefiting from data locality when querying the full aggregate.
The benefit of this design relies heavily on DynamoDB specifics: the storage internals, where items are partitioned and clustered by their key, and the billing model, where you pay per table request unit.
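Conceptually, the single-table design works because items sharing a partition-key prefix are stored contiguously in sort order. Here is a minimal in-memory sketch, assuming nothing about DynamoDB internals beyond key ordering (plain Python, not the DynamoDB API):

```python
from bisect import bisect_left, bisect_right

# Hypothetical sketch: items kept sorted by (partition key, sort key),
# so all items sharing a key prefix are contiguous and readable
# with one range scan.
items = sorted([
    (("DEPT#10", "DEPT"),     {"dname": "ACCOUNTING"}),
    (("DEPT#10", "EMP#7782"), {"ename": "CLARK"}),
    (("DEPT#10", "EMP#7839"), {"ename": "KING"}),
    (("DEPT#20", "DEPT"),     {"dname": "RESEARCH"}),
])
keys = [k for k, _ in items]

def query(pk):
    """Return every item sharing the partition key: one contiguous slice."""
    lo = bisect_left(keys, (pk, ""))
    hi = bisect_right(keys, (pk, "\uffff"))
    return items[lo:hi]
```

`query("DEPT#10")` returns the department and both of its employees in a single slice, which is the locality the real design exploits.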
Here is an example, using the traditional one-to-many schema with departments and employees:
- One DynamoDB table
- Department and employees stored as separate items
- Same partition key (DEPT#<deptno>) for both:
aws dynamodb create-table \
--table-name scott \
--attribute-definitions \
AttributeName=PK,AttributeType=S \
AttributeName=SK,AttributeType=S \
--key-schema \
AttributeName=PK,KeyType=HASH \
AttributeName=SK,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
The logical model, where the relationship is materialized by sharing the department number as the same partition key, is also the physical model in which they are stored together, since items are partitioned on this key:
| PK      | SK       | Meaning        |
|---------|----------|----------------|
| DEPT#10 | DEPT     | Department row |
| DEPT#10 | EMP#7782 | Employee       |
| DEPT#10 | EMP#7839 | Employee       |
| DEPT#20 | DEPT     | Department row |
| DEPT#20 | EMP#7369 | Employee       |
I insert some data:
aws dynamodb put-item --table-name scott --item '{
"PK": {"S": "DEPT#10"},
"SK": {"S": "DEPT"},
"deptno": {"N": "10"},
"dname": {"S": "ACCOUNTING"},
"loc": {"S": "NEW YORK"}
}'
aws dynamodb put-item --table-name scott --item '{
"PK": {"S": "DEPT#10"},
"SK": {"S": "EMP#7782"},
"empno": {"N": "7782"},
"ename": {"S": "CLARK"},
"job": {"S": "MANAGER"},
"sal": {"N": "2450"}
}'
Here is a query that retrieves the whole aggregate by its partition key:
aws dynamodb query \
--table-name scott \
--key-condition-expression "PK = :d" \
--expression-attribute-values '{":d":{"S":"DEPT#10"}}' \
--consistent-read \
--return-consumed-capacity TOTAL
{
"Items": [
{
"deptno": { "N": "10" },
"PK": { "S": "DEPT#10" },
"loc": { "S": "NEW YORK" },
"dname": { "S": "ACCOUNTING" },
"SK": { "S": "DEPT" }
},
{
"ename": { "S": "CLARK" },
"PK": { "S": "DEPT#10" },
"job": { "S": "MANAGER" },
"empno": { "N": "7782" },
"sal": { "N": "2450" },
"SK": { "S": "EMP#7782" }
}
],
"Count": 2,
"ScannedCount": 2,
"ConsumedCapacity": {
"TableName": "scott",
"CapacityUnits": 1.0
}
}
All items are read from a single partition with 1 consistent read capacity unit (RCU), thanks to physical colocation by key design.
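The 1 RCU figure follows from DynamoDB's pricing arithmetic: a strongly consistent read costs one read capacity unit per 4 KB, summed over all items a query returns and rounded up, while eventually consistent reads cost half. A quick sketch of that calculation:

```python
import math

def query_rcu(item_sizes_bytes, consistent=True):
    """Estimate RCUs consumed by a Query: item sizes are summed,
    rounded up to the next 4 KB; eventually consistent reads cost half."""
    units = math.ceil(sum(item_sizes_bytes) / 4096)
    return units if consistent else units / 2

# The two small items above fit well within one 4 KB unit:
assert query_rcu([100, 120]) == 1
```

Splitting the aggregate into many items costs nothing extra here, as long as one query returns them all and their total size stays under the 4 KB boundary.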
MongoDB
MongoDB doesn't need to split documents to get partial reads and writes. The data modeling objective is to keep aggregates in a single document and use the advanced API to access individual items. For example, you can use covering indexes or search indexes with returnStoredSource to avoid reading the whole document. And you can use $push or $set with arrayFilters to update individual array items.
Still, some users preferred to split the documents and, by analogy to DynamoDB, called it the Single Collection Pattern. In truth, it is rarely useful. MongoDB is not DynamoDB. Storing documents in one or multiple collections doesn't impact billing, and using the same key prefix doesn't co‑locate them — except in two cases: clustered collections (special‑purpose, not generally recommended), and sharing a sharding key to co‑locate data on the same shard (but still different blocks in the filesystem).
At the storage block level, documents may be stored together only if they were inserted together.
Although this is not recommended when you can use the full power of the MongoDB API, here is an example using a single collection that embeds only references, following the documentation: one document per department and one per employee:
db.empdept.insertMany([
{
_id: "DEPT#10",
doc_type: "dept",
deptno: 10,
dname: "ACCOUNTING",
loc: "NEW YORK",
links: [
{ target: "DEPT#10", doc_type: "dept" },
{ target: "EMP#7782", doc_type: "emp" },
{ target: "EMP#7839", doc_type: "emp" }
]
},{
_id: "EMP#7782",
doc_type: "emp",
empno: 7782,
ename: "CLARK",
job: "MANAGER",
sal: 2450,
deptno: 10,
links: [
{ target: "EMP#7782", doc_type: "emp" },
{ target: "DEPT#10", doc_type: "dept" }
]
},{
_id: "EMP#7839",
doc_type: "emp",
empno: 7839,
ename: "KING",
job: "PRESIDENT",
sal: 5000,
deptno: 10,
links: [
{ target: "EMP#7839", doc_type: "emp" },
{ target: "DEPT#10", doc_type: "dept" }
]
}]);
With this schema, the following gets the full aggregate as multiple documents without an aggregation pipeline:
db.empdept.find({ "links.target": "DEPT#10" })
This can identify the documents with a single index range scan on a multikey index on "links.target", but it must fetch multiple small documents, it introduces strong coupling between those documents with no way to validate the references, and it requires an explicit transaction and retry logic to update them consistently.
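Another cost of the split schema is that the aggregate must be reassembled in the application. A sketch of that client-side work, assuming the documents returned by the find() above (fields abridged):

```python
# Hypothetical reassembly of the aggregate from the split documents:
docs = [
    {"_id": "DEPT#10",  "doc_type": "dept", "dname": "ACCOUNTING"},
    {"_id": "EMP#7782", "doc_type": "emp",  "ename": "CLARK", "sal": 2450},
    {"_id": "EMP#7839", "doc_type": "emp",  "ename": "KING",  "sal": 5000},
]

# Pick the department document, then attach its employees:
dept = next(d for d in docs if d["doc_type"] == "dept")
dept["employees"] = [d for d in docs if d["doc_type"] == "emp"]
```

The database returns a flat list; the application rebuilds the embedding that a single document would have given for free.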
MongoDB is designed to store aggregates as a single document, and the right schema is simply:
db.empdept.insertOne({
_id: 10,
dname: "ACCOUNTING",
loc: "NEW YORK",
employees: [
{ empno: 7782, ename: "CLARK", job: "MANAGER", sal: 2450 },
{ empno: 7839, ename: "KING", job: "PRESIDENT", sal: 5000 }
]
})
With this schema, you can still query the full aggregate with a simple find():
db.empdept.find({ _id: 10 })
and you can still update a single employee in‑place:
db.empdept.updateOne(
{ _id: 10, "employees.empno": 7782 },
{ $set: { "employees.$.sal": 2600 } }
)
This updates the first matching array item using the positional $ operator. You can be more precise with arrayFilters:
db.empdept.updateOne(
{ _id: 10 },
{ $set: { "employees.$[e].sal": 2600 } },
{ arrayFilters: [{ "e.empno": 7782 }] }
)
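To see what arrayFilters does to the document, here is a rough Python emulation of this $set on a plain dict (a sketch of the semantics, not MongoDB's implementation):

```python
def set_sal_with_filter(doc, empno, new_sal):
    """Emulate {$set: {"employees.$[e].sal": new_sal}} with
    arrayFilters: [{"e.empno": empno}] on a plain dict."""
    for e in doc["employees"]:
        if e["empno"] == empno:  # the [e] filter condition
            e["sal"] = new_sal   # only the matching element changes

dept = {
    "_id": 10,
    "employees": [
        {"empno": 7782, "ename": "CLARK", "sal": 2450},
        {"empno": 7839, "ename": "KING",  "sal": 5000},
    ],
}
set_sal_with_filter(dept, 7782, 2600)
```

Note that, unlike the positional $, which stops at the first match, arrayFilters applies to every array element satisfying the filter.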
I've written about this and measured the efficiency in a previous post.
Emulations
A single document per aggregate, with embedded one-to-many relationships, is efficient in MongoDB, but not in emulations. For example, in Oracle Autonomous Database, where the document is stored as OSON, the same update rewrites the full document:
ora> db.empdept.updateOne(
{ _id: 10, "employees.empno": 7782 },
{ $set: { "employees.$.sal": 2600 } }
)
{
acknowledged: true,
insertedId: null,
matchedCount: 1,
modifiedCount: 1,
upsertedCount: 0
}
ora> db.aggregate( [ { $sql : " select * from dbms_xplan.display_cursor( format=>'BASIC') " } ] ).forEach(row => print(row.PLAN_TABLE_OUTPUT));
EXPLAINED SQL STATEMENT:
------------------------
update "ORA"."empdept" set "DATA" = :1 where ("RESID" = :2 ) returning "RESID", "ETAG" into :3 , :4
Plan hash value: 893016358
-----------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------
| 0 | UPDATE STATEMENT | |
| 1 | UPDATE | empdept |
| 2 | TABLE ACCESS BY INDEX ROWID| empdept |
| 3 | INDEX UNIQUE SCAN | SYS_C0031043 |
-----------------------------------------------------
You can also store the collection with JSON-Relational Duality Views (JDV), but this partial update is not supported:
arrayFilters option is not supported on duality view collections
However, even though the JDV doesn't allow this partial update, the aggregate is stored as individual rows in SQL tables, and users can switch to SQL statements for partial updates.
Duality Views
In relational databases, applications typically work with a logical view of the data model, while the storage layer may transparently cluster or scatter data into fixed-size blocks. Oracle Database (and MySQL) added JSON-Relational Duality Views (JDV) to present a single logical view for JSON documents, while splitting them into multiple SQL tables. As each SQL table has its own physical segment(s), aggregates are stored across multiple physical blocks.
If you are a fan of the single‑table or single‑collection idea, you may create a single view. However, it has no data-locality advantage, as the data is explicitly normalized into multiple tables:
CREATE TABLE dept (
deptno NUMBER CONSTRAINT dept_pk PRIMARY KEY,
dname VARCHAR2(14),
loc VARCHAR2(13)
);
CREATE TABLE emp (
empno NUMBER CONSTRAINT emp_pk PRIMARY KEY,
ename VARCHAR2(10),
job VARCHAR2(9),
sal NUMBER(7,2),
deptno NUMBER NOT NULL,
CONSTRAINT emp_dept_fk FOREIGN KEY (deptno) REFERENCES dept(deptno)
);
CREATE INDEX emp_dept_fk ON emp (deptno);
INSERT INTO dept (deptno, dname, loc) VALUES (10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO emp (empno, ename, job, sal, deptno) VALUES (7782, 'CLARK', 'MANAGER', 2450, 10);
INSERT INTO emp (empno, ename, job, sal, deptno) VALUES (7839, 'KING', 'PRESIDENT', 5000, 10);
-- add many departments to check the number of blocks scanned
INSERT INTO dept(deptno) SELECT rownum+1000
FROM xmltable('0 to 999')
;
-- one view over multiple tables with JDV
CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW empdept_dv AS
SELECT JSON {
'_id' : d.deptno,
'dname': d.dname,
'loc' : d.loc,
'employees' : [
SELECT JSON {
'empno': e.empno,
'ename': e.ename,
'job' : e.job,
'sal' : e.sal
}
FROM emp e WITH INSERT UPDATE DELETE
WHERE e.deptno = d.deptno
]
}
FROM dept d WITH INSERT UPDATE DELETE
;
The view looks like a document collection with an embedded one-to-many relationship, similar to MongoDB’s recommended model, but it is not truly embedded, as the view actually splits it into two tables. We have lost the main advantage of MongoDB: data that is accessed together is stored together.
I can query it as if it were a single table:
SELECT data
FROM empdept_dv
WHERE
JSON_EXISTS("DATA",'$?(@._id.numberOnly() == $B0)' passing 10 as "B0"
type(strict))
;
but it actually reads two tables:
SQL> select * from dbms_xplan.display_cursor(format=>'ALLSTATS LAST -COST');
PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________________
SQL_ID gfr8jkdwatdnz, child number 0
-------------------------------------
SELECT data FROM empdept_dv WHERE JSON_EXISTS("DATA",'$?(@._id.numberOnly() == $B0)' passing 10 as "B0" type(strict))
Plan hash value: 2755083285
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 | 3 | | | |
| 1 | SORT GROUP BY | | 1 | 1 | 1 | 2 | 2048 | 2048 | 2048 (0)|
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| EMP | 1 | 1 | 2 | 2 | | | |
|* 3 | INDEX RANGE SCAN | EMP_DEPT_FK | 1 | 1 | 2 | 1 | | | |
| 4 | TABLE ACCESS BY INDEX ROWID | DEPT | 1 | 1 | 1 | 3 | | | |
|* 5 | INDEX UNIQUE SCAN | DEPT_PK | 1 | 1 | 1 | 2 | | | |
----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("E"."DEPTNO"=:B1)
5 - access("D"."DEPTNO"=10)
The access is efficient, using the index on the foreign key, but it doesn't co-locate the one-to-many relationship as we expect when storing an aggregate in a single document, or when using the single-table design.
You can update through the view, but the following updates much more than you might expect, as it rewrites the full document:
UPDATE empdept_dv
SET data = JSON_TRANSFORM(
data,
SET '$.employees[0].sal' = 2600
)
WHERE
JSON_EXISTS("DATA",'$?(@._id.numberOnly() == $B0)' passing 10 as "B0"
type(strict))
;
Using duality views instead of OSON storage allows you to fall back to SQL to update a single row, similar to a single-table design. However, unlike DynamoDB, you cannot group different item types in the same table. These are SQL tables with a fixed schema and no polymorphism, so you cannot store both department and employee attributes in a single table. JSON-Relational Duality Views don’t replace document databases: they act like an object-relational mapper (ORM) with all logic deployed in the SQL database.
Cluster
To get closer to the performance of a document database, I need a single-table duality view capable of storing documents that are split across multiple SQL tables, yet behave as if they were in a single physical table. In Oracle Database, this corresponds to a CLUSTER, where only the key is declared, as in DynamoDB tables, and an estimated value size is predefined to fill the fixed-size blocks:
CREATE CLUSTER dept_cluster (
deptno NUMBER ------------------------------------- clustering key
)
SIZE 1024 -- expected bytes per cluster key
;
CREATE INDEX dept_cluster_idx ON CLUSTER dept_cluster
;
The logical SQL tables are stored in the cluster under a common clustering key, instead of allocating their own physical segment:
CREATE TABLE dept (
deptno NUMBER CONSTRAINT dept_pk PRIMARY KEY, --- clustering key
dname VARCHAR2(14),
loc VARCHAR2(13)
)
CLUSTER dept_cluster (deptno) ---------------------- clustering key
;
CREATE TABLE emp (
empno NUMBER CONSTRAINT emp_pk PRIMARY KEY,
ename VARCHAR2(10),
job VARCHAR2(9),
sal NUMBER(7,2),
deptno NUMBER NOT NULL, ---------------------- clustering key
CONSTRAINT emp_dept_fk
FOREIGN KEY (deptno)
REFERENCES dept(deptno)
)
CLUSTER dept_cluster (deptno) ---------------------- clustering key
;
INSERT INTO dept (deptno, dname, loc) VALUES (10, 'ACCOUNTING', 'NEW YORK');
INSERT INTO emp (empno, ename, job, sal, deptno) VALUES (7782, 'CLARK', 'MANAGER', 2450, 10);
INSERT INTO emp (empno, ename, job, sal, deptno) VALUES (7839, 'KING', 'PRESIDENT', 5000, 10);
-- add many departments to check the number of blocks scanned
INSERT INTO dept(deptno) SELECT rownum+1000
FROM xmltable('0 to 999')
;
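Conceptually, the cluster plays the role of DynamoDB's partition: for a given clustering key, rows from both tables land in the same block(s). A toy Python model of that placement (hypothetical, purely to illustrate the co-location):

```python
from collections import defaultdict

# One "block" per clustering key, holding rows from several tables,
# like dept and emp rows sharing a block in dept_cluster.
blocks = defaultdict(list)

def insert(table, row):
    # deptno plays the role of the clustering key
    blocks[row["deptno"]].append((table, row))

insert("dept", {"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"})
insert("emp",  {"deptno": 10, "empno": 7782, "ename": "CLARK"})
insert("emp",  {"deptno": 10, "empno": 7839, "ename": "KING"})

# The whole aggregate for deptno=10 is read from a single "block":
aggregate = blocks[10]
```

Unlike separate heap tables, where dept and emp rows end up in unrelated blocks, the cluster pre-joins them physically while the logical tables stay normalized.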
I create the same duality view as before, since the logical tables are the same, and query it:
SELECT data
FROM empdept_dv
WHERE
JSON_EXISTS("DATA",'$?(@._id.numberOnly() == $B0)' passing 10 as "B0"
type(strict))
;
Physically, it accesses the logical tables in the pre-joined cluster, via its index:
SQL> select * from dbms_xplan.display_cursor(format=>'ALLSTATS LAST');
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID 26py2vsmch5kx, child number 0
-------------------------------------
SELECT data FROM <... (truncated)
by Franck Pachot