{ w: 1 } Asynchronous Writes and Conflict Resolution in MongoDB
By default, MongoDB guarantees durability (the D in ACID) across the network, together with strong consistency (the C in the CAP theorem). It still maintains high availability: in the event of a network partition, the majority side continues to serve consistent reads and writes transparently, without raising errors to the application.
A consensus protocol based on Raft is used to achieve this at two levels:
- Writes are directed to the shard's primary, which coordinates consistency between the collection and the indexes. Raft is used to elect one replica as primary, with the others acting as secondaries.
- Writes to the shard's primary are replicated to the secondaries and acknowledged once a majority has guaranteed durability on persistent storage. The equivalent of the Raft log is the data itself: the oplog (operations log) entries.
It's important to distinguish the two types of consensus involved: one controls the replica roles, and one replicates the data itself. By comparison, failover automation around monolithic databases like PostgreSQL can use a consensus protocol to elect a primary (as Patroni does), but replication itself is built into PostgreSQL and does not rely on a consensus protocol, so a failure at the wrong moment can leave the replicas inconsistent.
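You can observe both levels from mongosh: rs.status() exposes the election state (the current term and each member's role) as well as each member's replication progress. A minimal check, assuming a connection to any member of a replica set:
mongosh --eval '
const s = rs.status()
// Election-level consensus: the current term
print("term:", s.term)
// Role and data-level consensus: each member state and how far it has replicated the oplog
s.members.forEach(m => print(m.name, m.stateStr, m.optimeDate))
'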
Trade-offs between performance and consistency
Consensus on writes increases latency, especially in multi-region deployments, because it requires synchronous replication and waiting on the network, but it guarantees no data loss in disaster recovery scenarios (RPO = 0). Some workloads may prefer lower latency and accept limited data loss (for example, a couple of seconds of RPO after a datacenter burns). If you ingest data from IoT devices, you may favor fast ingestion at the risk of losing some data in such a disaster. Similarly, when migrating from another database, you might prefer fast synchronization and, in case of infrastructure failure, simply restart the migration from before the failure point. In such cases, you can use {w:1} write concern in MongoDB instead of the default {w:"majority"}.
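The write concern can be chosen per operation or set as a cluster-wide default. A minimal mongosh sketch (the collection and document are illustrative):
// Acknowledge as soon as the primary has applied the write locally
db.events.insertOne({ deviceId: 42, at: new Date() }, { writeConcern: { w: 1 } })
// Wait until a majority of replicas have made the write durable (the default)
db.events.insertOne({ deviceId: 42, at: new Date() }, { writeConcern: { w: "majority" } })
// Or change the cluster-wide default instead of setting it on every call
db.adminCommand({ setDefaultRWConcern: 1, defaultWriteConcern: { w: 1 } })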
Most failures are not full-scale disasters where an entire data center is lost, but transient issues with short network disconnections. With {w:1}, the primary risk is not data loss—because writes can be synchronized eventually—but split brain, where both sides of a network partition continue to accept writes. This is where the two levels of consensus matter:
- A new primary is elected, and the old primary steps down, limiting the split-brain window to a few seconds.
- With the default {w:"majority"}, writes that cannot reach a majority are not acknowledged on the side of the partition that lacks a quorum. This prevents split brain. With {w:1}, however, those writes are acknowledged until the old primary steps down.
Because the failure is transient, when the old primary rejoins, no data is physically lost: writes from both sides still exist. However, these writes may conflict, resulting in a diverging database state with two branches. As with any asynchronous replication, this requires conflict resolution. MongoDB handles this as follows:
- Writes from the new primary are preserved, as this is where the application has continued to make progress.
- Writes that occurred on the old primary during the brief split-brain window are rolled back, and the old primary then pulls the more recent writes from the new primary.
Thus, when you use {w:1}, you accept the possibility of limited data loss in the event of a failure. Once the node is back, these writes are not entirely lost, but they cannot be merged automatically. MongoDB stores them as BSON files in a rollback directory so you can inspect them and perform manual conflict resolution if needed.
This conflict resolution uses MongoDB's Recover To a Timestamp (RTT) rollback algorithm.
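The trade-off also has a read-side counterpart: a read with readConcern "majority" only returns majority-committed data, so it never observes writes that may later be rolled back, at the cost of possibly returning slightly stale values. A minimal mongosh sketch, using the demo collection from the lab below:
// Only return documents that are majority-committed and therefore cannot be rolled back
db.demo.find().readConcern("majority")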
Demo on a Docker lab
Let's try it. I start 3 containers as a replica set:
docker network create lab
docker run --network lab --name m1 --hostname m1 -d mongo --replSet rs0
docker run --network lab --name m2 --hostname m2 -d mongo --replSet rs0
docker run --network lab --name m3 --hostname m3 -d mongo --replSet rs0
docker exec -it m1 mongosh --eval '
rs.initiate({
_id: "rs0",
members: [
{ _id: 0, host: "m1:27017", priority: 3 },
{ _id: 1, host: "m2:27017", priority: 2 },
{ _id: 2, host: "m3:27017", priority: 1 }
]
})
'
until
docker exec -it m1 mongosh --eval "rs.status().members.forEach(m => print(m.name, m.stateStr))" |
grep -C3 "m1:27017 PRIMARY"
do sleep 1 ; done
The last command waits until m1 is the primary, as set by its priority. I do that to make the demo reproducible with simple copy-paste.
I insert "XXX-10" when connected to m1:
docker exec -it m1 mongosh --eval '
db.demo.insertOne(
{ _id:"XXX-10" , date:new Date() },
{ writeConcern: {w: "1"} }
)
'
{ acknowledged: true, insertedId: 'XXX-10' }
I disconnect the secondary m2:
docker network disconnect lab m2
With a replication factor of 3, the cluster is resilient to one failure, and I insert "XXX-11" while connected to the primary:
docker exec -it m1 mongosh --eval '
db.demo.insertOne(
{ _id:"XXX-11" , date:new Date() },
{ writeConcern: {w: "1"} }
)
'
{ acknowledged: true, insertedId: 'XXX-11' }
I disconnect m1, the current primary, reconnect m2, and immediately insert "XXX-12" while still connected to m1:
docker network disconnect lab m1
docker network connect lab m2
docker exec -it m1 mongosh --eval '
db.demo.insertOne(
{ _id:"XXX-12" , date:new Date() },
{ writeConcern: {w: "1"} }
)
'
{ acknowledged: true, insertedId: 'XXX-12' }
Here, m1 is still a primary for a short period before it detects that it cannot reach a majority of replicas and steps down. If the write concern were {w: "majority"}, the write would have waited and failed, unable to replicate to a quorum, but with {w: "1"} the replication is asynchronous and the write is acknowledged as soon as it is written to the local disk.
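As an optional check, not part of the run above, you could issue a similar insert with a majority write concern and a timeout while m1 is still isolated. It is expected to fail with a write concern timeout rather than return a normal acknowledgment, or with a "not primary" error if m1 has already stepped down (the document and timeout values are illustrative):
docker exec -it m1 mongosh --eval '
db.demo.insertOne(
{ _id:"XXX-12b" , date:new Date() },
{ writeConcern: {w: "majority", wtimeout: 2000} }
)
'
Note that even when the write concern is not satisfied, the document may still have been written locally on m1, and it would then be rolled back later, like "XXX-12".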
Two seconds later, a similar write fails because the primary stepped down:
sleep 2
docker exec -it m1 mongosh --eval '
db.demo.insertOne(
{ _id:"XXX-13" , date:new Date() },
{ writeConcern: {w: "1"} }
)
'
MongoServerError: not primary
I wait until m2 is the new primary, as set by its priority, and connect to it to insert "XXX-20":
until
docker exec -it m2 mongosh --eval "rs.status().members.forEach(m => print(m.name, m.stateStr))" |
grep -C3 "m2:27017 PRIMARY"
do sleep 1 ; done
docker exec -it m2 mongosh --eval '
db.demo.insertOne(
{ _id:"XXX-20" , date:new Date() },
{ writeConcern: {w: "1"} }
)
'
{ acknowledged: true, insertedId: 'XXX-20' }
No nodes are down: it is only a network partition, and I can still read from every node because docker exec connects directly to each container rather than through the partitioned network. I query the collection on each side:
docker exec -it m1 mongosh --eval 'db.demo.find()'
docker exec -it m2 mongosh --eval 'db.demo.find()'
docker exec -it m3 mongosh --eval 'db.demo.find()'
The inconsistency is visible: "XXX-12" exists only on m1, while "XXX-20" exists only on m2 and m3.
I reconnect m1 so that all nodes can communicate and synchronize their state:
docker network connect lab m1
I query again, and all nodes show the same values.
"XXX-12" has disappeared and all nodes are now synchronized to the current state. When it rejoined, m1 rolled back the operations that occurred during the split-brain window. This is expected and acceptable, since the write used a { w: 1 } write concern, which explicitly allows limited data loss in case of failure in order to avoid cross-network latency on each write.
The rolled-back operations are not lost: MongoDB logs them in a rollback directory in BSON format, with the rolled-back document as well as the related oplog entry.
I read and decode all BSON in the rollback directory:
docker exec -i m1 bash -c '
for f in /data/db/rollback/*/removed.*.bson
do
echo "$f"
bsondump $f --pretty
done
' | egrep --color=auto '^|^/.*|.*("op":|"XXX-..").*'
The deleted document is in /data/db/rollback/0ae03154-0a51-4276-ac62-50d73ad31fe0/removed.2026-02-10T10-40-58.1.bson:
{
"_id": "XXX-12",
"date": {
"$date": {
"$numberLong": "1770719868965"
}
}
}
The deleted oplog for the related insert is in /data/db/rollback/local.oplog.rs/removed.2026-02-10T10-40-58.0.bson:
{
"lsid": {
"id": {
"$binary": {
"base64": "erR2AoFXS3mbcX4BJSiWjw==",
"subType": "04"
}
},
"uid": {
"$binary": {
"base64": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
"subType": "00"
}
}
},
"txnNumber": {
"$numberLong": "1"
},
"op": "i",
"ns": "test.demo",
"ui": {
"$binary": {
"base64": "CuAxVApRQnasYlDXOtMf4A==",
"subType": "04"
}
},
"o": {
"_id": "XXX-12",
"date": {
"$date": {
"$numberLong": "1770719868965"
}
}
},
"o2": {
"_id": "XXX-12"
},
"stmtId": {
"$numberInt": "0"
},
"ts": {
"$timestamp": {
"t": 1770719868,
"i": 1
}
},
"t": {
"$numberLong": "1"
},
"v": {
"$numberLong": "2"
},
"wall": {
"$date": {
"$numberLong": "1770719868983"
}
},
"prevOpTime": {
"ts": {
"$timestamp": {
"t": 0,
"i": 0
}
},
"t": {
"$numberLong": "-1"
}
}
}
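If, after inspecting these files, you decide that the rolled-back insert should be re-applied, one possible manual resolution is to restore the removed documents into the collection and let the write replicate normally. This is only a sketch: it assumes mongorestore is available in the container like bsondump is, the rollback subdirectory names and timestamps vary per run, and whether re-inserting is the right resolution is an application-level decision:
docker exec -it m1 bash -c '
for f in /data/db/rollback/*/removed.*.bson
do
# skip the oplog entries and restore only the removed documents into test.demo
case "$f" in */local.oplog.rs/*) continue ;; esac
mongorestore --host="rs0/m1:27017,m2:27017,m3:27017" --db=test --collection=demo "$f"
done
'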
Conclusion: beyond Raft
By default, MongoDB favors strong consistency and durability: writes use { w: "majority" }, acknowledged writes are majority-committed and never rolled back, and reads with readConcern: "majority" never observe rolled-back data. In this mode, MongoDB behaves like a classic Raft system: once an operation is committed, it is final.
MongoDB also lets you explicitly relax that guarantee by choosing a weaker write concern such as { w: 1 }. In doing so, you tell the system: "Prioritize availability and latency over immediate global consistency." The demo shows what that implies:
- During a transient network partition, two primaries can briefly accept writes.
- Both branches of history are durably written to disk.
- When the partition heals, MongoDB deterministically chooses the majority branch.
- Operations from the losing branch are rolled back—but not discarded. They are preserved as BSON files with their oplog entries.
- The node then recovers to a majority-committed timestamp (RTT) and rolls forward.
This rollback behavior is where MongoDB intentionally diverges from vanilla Raft.
In classic Raft, the replicated log is the source of truth, and committed log entries are never rolled back. Raft assumes a linearizable, strongly consistent state machine where the application does not expect divergence. MongoDB, by contrast, comes from a NoSQL and event-driven background, where asynchronous replication, eventual consistency, and application-level reconciliation are sometimes acceptable trade-offs.
As a result:
- MongoDB still uses Raft semantics for leader election and terms, so two primaries are never elected in the same term.
- For data replication, MongoDB extends the model with Recover To a Timestamp (RTT) rollback.
- This allows MongoDB to safely support lower write concerns, fast ingestion, multi-region latency optimization, and migration workloads—without silently corrupting state.
In short, MongoDB replication is based on Raft, but adds rollback semantics to support real-world distributed application patterns. Rollbacks happen only when you explicitly allow them, never with majority writes, and they are fully auditable and recoverable.