Non-First Normal Forms ( 1NF) and MongoDB: an alternative to 4NF to address 3NF anomalies
SQL databases are grounded in relational algebra, but they are not the only databases with a strong theoretical basis. MongoDB, designed as a document database to solve practical engineering problems and improve developer experience, also builds on theory. It supports non–first-normal-form schemas and extends relational operators to work with them. This foundation is described in a 1986 paper, published 23 years earlier: Theory of Non-First Normal Form Relational Databases
MongoDB's aggregation pipeline operators ($unwind, $group, $lookup, set operations) are concrete implementations of the paper's abstract algebraic operators.
I've build the following examples while reading the paper to illustrate the concepts with practical examples.
The Basic ¬1NF Data Model
The paper defines nested relations where attributes can be atomic or relation-valued:
This structure records facts about employees, like some information about their children, their skills, and exams they passed.
This maps directly to a MongoDB document schema:
// This IS the paper's example, expressed as MongoDB documents
db.employees.insertMany([
{
ename: "Smith",
Children: [
{ name: "Sam", dob: "2/10/84" },
{ name: "Sue", dob: "1/20/85" }
],
Skills: [
{
type: "typing",
Exams: [
{ year: 1984, city: "Atlanta" },
{ year: 1985, city: "Dallas" }
]
},
{
type: "dictation",
Exams: [
{ year: 1984, city: "Atlanta" }
]
}
]
},
{
ename: "Watson",
Children: [
{ name: "Sam", dob: "3/12/78" }
],
Skills: [
{
type: "filing",
Exams: [
{ year: 1984, city: "Atlanta" },
{ year: 1975, city: "Austin" },
{ year: 1971, city: "Austin" }
]
},
{
type: "typing",
Exams: [
{ year: 1962, city: "Waco" }
]
}
]
}
])
MongoDB's document model is essentially what Roth called a "database scheme" — a collection of rules where attributes can be zero-order (scalar fields) or higher-order (embedded arrays of documents). The paper's Figure 3-1 is literally a MongoDB collection:
When using an object-oriented language, this model looks natural but this paper is from 1986 so there was another motivation.
Motivation: ¬1NF as an alternative to 4NF decomposition
The paper's example (Figure 1-1) shows an employee relation in 1NF requires 10 rows with massive redundancy:
employee | child | skill
-----------------------------
Smith | Sam | typing
Smith | Sue | typing
Smith | Sam | filing
Smith | Sue | filing
Jones | Joe | typing
Jones | Mike | typing
Jones | Joe | dictation
Jones | Mike | dictation
Jones | Joe | data entry
Jones | Mike | data entry
The ¬1NF version needs only 2 tuples. The paper notes three problems with 1NF:
- Insert anomaly: Adding a child for Jones requires adding 3 tuples (one per skill)
- Update anomaly: Changing Smith's "typing" to "word processing" requires updating multiple rows
- Decomposition cost: The 1NF solution requires splitting into two tables and joining them back
Today, these anomalies are not a major problem because a single updateMany() or insertMany operation in a MongoDB transaction is atomic, just like an UPDATE or INSERT in SQL databases. More importantly, the document model makes multi-document operations rare—the Cartesian explosion of redundant data remains undesirable.
The first four normal forms (1NF, 2NF, 3NF, BCNF) do not address this, because there are no non-trivial functional dependencies in this relation: all three attributes together form the only candidate key. This relation is therefore already in BCNF. This is precisely why Fagin introduced 4NF in 1977.
This example illustrates a multivalued dependency (MVD) violation: children and skills are independent multivalued attributes of an employee, but they’re stored together. This creates a redundant Cartesian product (10 rows for 2 employees). Fourth Normal Form (4NF) addresses this by splitting the independent dependencies into separate tables to remove these anomalies, but at the cost of more tables touched per transaction and more joins per query.
Even a Third Normal Form (3NF) schema can still suffer from insert and update anomalies and data redundancy. MongoDB’s document model can offer many of 4NF’s benefits without incurring the join overhead.
Here is the MongoDB Equivalent:
// The ¬1NF representation — exactly what MongoDB does naturally
db.employees.insertMany([
{
employee: "Smith",
Children: [{ child: "Sam" }, { child: "Sue" }],
Skills: [{ skill: "typing" }, { skill: "filing" }]
},
{
employee: "Jones",
Children: [{ child: "Joe" }, { child: "Mike" }],
Skills: [
{ skill: "typing" },
{ skill: "dictation" },
{ skill: "data entry" }
]
}
])
Adding a new child for Jones is just one operation, no anomaly:
db.employees.updateOne(
{ employee: "Jones" },
{ $push: { Children: { child: "Sara" } } }
)
Changing Smith's "typing" to "word processing" — one operation:
db.employees.updateOne(
{ employee: "Smith", "Skills.skill": "typing" },
{ $set: { "Skills.$.skill": "word processing" } }
)
Querying everything about an employee doesnt need a join:
db.employees.findOne({ employee: "Smith" })
MongoDB exists for a fundamental reason: its document model eliminates the need for decomposition and joins, as well as the update anomalies introduced by the 1NF constraint. Most workloads rely on single-document operations, which MongoDB physically optimizes as single-shard transactions, a single disk read or write, and a single replication operation. These inserts and updates lock the document, which serves as the consistency boundary, but modify only the fields whose values change and update only the corresponding index entries.
Nest and Unnest Operators
The paper defines nest (ν) to aggregate flat rows into nested structures, and unnest (μ) to flatten nested structures back:
-
ν_{B=(C,D)}(r): nest attributes C,D into a new nested relation B -
μ_{B}(r): unnest nested relation B
MongoDB's aggregation pipeline has direct equivalents.
The paper's unnest (μ) operator is $unwind:
db.employees.aggregate([
{ $unwind: "$Skills" }
])
Each skill becomes its own document, duplicating employee info.
Deep unnest (μ* in the paper) is possible with multiple $unnest stages:
db.employees.aggregate([
{ $unwind: "$Children" },
{ $unwind: "$Skills" }
])
This produces the full 1NF Cartesian expansion, flattening both Children and Skills.
When starting with a flat collection, the paper's nest operator (ν) is $group:
db.flat_employees.aggregate([
{
$group: {
_id: "$employee",
Children: { $addToSet: { child: "$child" } },
Skills: { $addToSet: { skill: "$skill" } }
}
}
])
The paper proves that the order of unnesting doesn't matter (Thomas & Fischer's result). In MongoDB, these produce the same fully-flat result:
db.employees.aggregate([
{ $unwind: "$Children" },
{ $unwind: "$Skills" }
])
db.employees.aggregate([
{ $unwind: "$Skills" },
{ $unwind: "$Children" }
])
The paper also notes that nest is not always an inverse for unnest in general, but IS an inverse for Partitioned Normal Form (PNF) relations.
Partitioned Normal Form (PNF)
PNF requires that the atomic attributes of each relation (and each nested relation) form a key. The paper shows a pathological relation (Figure 1-3) that violates PNF:
Smith, the same employee, has two different skill sets, and Jones has a duplicate across sets.
In MongoDB, PNF corresponds to the fundamental design principle that _id determines the document. The pathological case above would mean having two documents for "Smith" with different Skills arrays — which is exactly what MongoDB's _id uniqueness constraint prevents at the collection level.
Without using _id as the employee identifier, we could have two employees with the same name:
db.bad_design.insertMany([
{ employee: "Smith", Skills: ["typing", "filing"] },
{ employee: "Smith", Skills: ["sorting", "mailing"] }
])
The correct design if we don't want to use _id is enforcing PNF with a unique index:
db.good_design.createIndex({ employee: 1 }, { unique: true })
db.good_design.insertOne({
employee: "Smith",
Skills: ["typing", "filing", "sorting", "mailing"]
})
Like in many papers, the employee name is used as an identifier (two "Smith" is one person) but obviously in real life there's a generated identifier to identify the physical person.
The paper, in theorem 5-1, proves that PNF is closed under unnesting. In MongoDB terms: if your documents are well-designed (one document per logical entity), then $unwind won't create ambiguous or inconsistent flat results.
For nested relations, PNF also applies recursively.
MongoDB schema validation can enforce the PNF constraint by comparing the size of an array with the size when duplicates are ignored (using $setUnion with an empty array):
db.createCollection("employees", {
validator: {
$and: [
// JSON Schema for structure and types
{
$jsonSchema: {
bsonType: "object",
required: ["ename"],
properties: {
ename: { bsonType: "string" },
Children: { bsonType: "array", items: { bsonType: "object", required: ["name"], properties: { name: { bsonType: "string" }, dob: { bsonType: "string" } } } },
Skills: { bsonType: "array", items: { bsonType: "object", required: ["type"], properties: { type: { bsonType: "string" },
Exams: { bsonType: "array", items: { bsonType: "object", required: ["year"], properties: { year: { bsonType: "int" }, city: { bsonType: "string" } } } } } } }
}
}
},
// PNF uniqueness constraints using $expr
{
$expr: {
$and: [
// Level 1: Children.name must be unique within the document
{
$eq: [
{ $size: { $ifNull: ["$Children.name", []] } },
{ $size: { $setUnion: [{ $ifNull: ["$Children.name", []] }] } }
]
},
// Level 1: Skills.type must be unique within the document
{
$eq: [
{ $size: { $ifNull: ["$Skills.type", []] } },
{ $size: { $setUnion: [{ $ifNull: ["$Skills.type", []] }] } }
]
},
// Level 2: Within EACH Skills element, Exams.year must be unique
{
$allElementsTrue: {
$map: {
input: { $ifNull: ["$Skills", []] },
as: "skill",
in: { $eq: [ { $size: { $ifNull: ["$$skill.Exams.year", []] } }, { $size: { $setUnion: [{ $ifNull: ["$$skill.Exams.year", []] }] } } ] }
}
}
}
]
}
}
]
},
validationLevel: "strict",
validationAction: "error"
});
The fact that nest is not always the inverse of unnest is a well-known MongoDB pitfall:
db.employees.drop()
db.employees.insertMany([
{
_id: 1,
employee: "Smith",
Children: [{ child: "Sam" }, { child: "Sue" }],
Skills: [{ skill: "typing" }]
},
{
_id: 2,
employee: "Smith",
Children: [{ child: "Tom" }],
Skills: [{ skill: "filing" }, { skill: "sorting" }]
},
{
_id: 3,
employee: "Jones",
Children: [{ child: "Joe" }],
Skills: [{ skill: "typing" }]
}
])
db.employees.aggregate([
// unnest (μ)
{ $unwind: "$Children" },
// nest (ν), grouping by employee, which is NOT a key
{
$group: {
_id: "$employee",
Children: { $addToSet: "$Children" },
Skills: { $first: "$Skills" }
}
},
// Clean up
{ $project: { _id: 0, employee: "$_id", Children: 1, Skills: 1 } },
{ $sort: { employee: 1 } }
])
This results in two Smiths collapsed into one:
[
{
Children: [ { child: 'Joe' } ],
Skills: [ { skill: 'typing' } ],
employee: 'Jones'
},
{
Children: [ { child: 'Tom' }, { child: 'Sam' }, { child: 'Sue' } ],
Skills: [ { skill: 'typing' } ],
employee: 'Smith'
}
]
The correct way is grouping by the proper key:
db.employees.aggregate([
// unnest (μ)
{ $unwind: "$Children" },
// nest (ν), grouping by _id, which IS a key (PNF)
{
$group: {
_id: "$_id",
employee: { $first: "$employee" },
Children: { $push: "$Children" },
Skills: { $first: "$Skills" }
}
},
// Clean up
{ $project: { _id: 1, employee: 1, Children: 1, Skills: 1 } },
{ $sort: { _id: 1 } }
])
This is the paper's Theorem 5-2 demonstrated in MongoDB: nest inverts unnest if and only if the grouping key satisfies PNF.
Extended Algebra Operators
I a previous post, From Relational Algebra to Document Semantics I explained how MongoDB extended the semantics of the relational selection Selection (σ) to non-1NF schemas, and mentioned that other relational operations are available in MongoDB. They are covered by the ¬1NF paper.
The paper defines extended union (∪ᵉ) to merge nested relations for tuples that agree on atomic attributes, rather than treating them as separate tuples. In MongoDB, this is achieved with $merge or with aggregation. Suppose we want to merge new course data into existing student records:
db.students.insertMany([
{
sname: "Jones",
Courses: [
{ cname: "Math", grade: "A" },
{ cname: "Science", grade: "B" }
]
},
{
sname: "Smith",
Courses: [
{ cname: "Math", grade: "A" },
{ cname: "Physics", grade: "C" },
{ cname: "Science", grade: "A" }
]
}
])
db.students.updateOne(
{ sname: "Jones" },
{ $addToSet: { Courses: { cname: "Physics", grade: "B" } } }
)
db.students.updateOne(
{ sname: "Smith" },
{
$addToSet: {
Courses: {
$each: [
{ cname: "Chemistry", grade: "A" },
{ cname: "English", grade: "B" }
]
}
}
}
)
This added "Physics:B" to Jones, and "Chemistry:A", "English:B" to Smith without adding two tuples with different course sets like a standard union would do:
db.students.find()
[
{
_id: ObjectId('69c3a72344fc089068d4b0c2'),
sname: 'Jones',
Courses: [
{ cname: 'Math', grade: 'A' },
{ cname: 'Science', grade: 'B' },
{ cname: 'Physics', grade: 'B' }
]
},
{
_id: ObjectId('69c3a72344fc089068d4b0c3'),
sname: 'Smith',
Courses: [
{ cname: 'Math', grade: 'A' },
{ cname: 'Physics', grade: 'C' },
{ cname: 'Science', grade: 'A' },
{ cname: 'Chemistry', grade: 'A' },
{ cname: 'English', grade: 'B' }
]
}
]
The paper emphasizes that standard set union treats entire tuples as atomic — if two tuples differ at all, both appear in the result. Extended union is ... (truncated)