From 1 ms to 100 µs: Extreme Performance Tuning for MongoDB

By the MinervaDB MongoDB Support Team

There is a particular moment in database engineering that every serious team eventually faces — the moment when “fast enough” stops being fast enough. You have already indexed your collections, tuned your queries, and upgraded your hardware. Latency is sitting at around 1 ms on average, and yet the business is asking for more. Tighter SLAs, real-time dashboards, sub-millisecond trading confirmations, latency-sensitive recommendation engines — the list of demands keeps growing. The question on the table is no longer whether MongoDB can get faster. It is how far you are willing to go to make it happen.

At MinervaDB, we have spent years working with engineering teams operating MongoDB deployments at scale — from mid-sized SaaS platforms to globally distributed financial systems. This post documents the techniques we reach for when the target is not 1 ms but 100 µs: the extreme end of MongoDB performance tuning where every configuration decision, every schema choice, and every hardware interaction carries measurable weight.

Why the Gap Between 1 ms and 100 µs Is Not Trivial

Reducing latency by a factor of ten sounds straightforward on paper. In practice, it is one of the most demanding engineering challenges in the database world. At the 1 ms tier, most of the gains come from standard best practices — proper indexing, avoiding collection scans, and sensible write concern settings. At the 100 µs tier, you are fighting against physics: network round-trip times, kernel scheduling jitter, storage controller latency, and the overhead of MongoDB’s own internal processing pipeline.

The workloads that need sub-millisecond MongoDB response times are typically characterized by high read-to-write ratios, simple lookup patterns (often single-document reads by primary key or a narrow compound index), extremely tight SLA requirements, and co-located application and database infrastructure. If your workload does not fit this profile, the techniques in this post may still yield meaningful gains, but 100 µs as a P99 target is realistic only when the access pattern and infrastructure are aligned from the ground up.
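Because the target here is stated as a P99 rather than an average, it helps to be explicit about how that percentile is computed. The following is a minimal sketch using the nearest-rank method. The sample values are hypothetical and exist only to show how a single slow outlier disappears behind the median while dominating the tail.

```javascript
// Nearest-rank percentile over latency samples (microseconds).
function percentile(samplesMicros, p) {
  if (samplesMicros.length === 0) throw new Error("no samples");
  const sorted = [...samplesMicros].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical samples: nine fast reads and one slow outlier.
const latencySamples = [80, 85, 90, 95, 88, 92, 87, 91, 89, 950];

console.log("p50:", percentile(latencySamples, 50)); // 89
console.log("p99:", percentile(latencySamples, 99)); // 950
```

With only ten samples the P99 is simply the worst observation, so a meaningful P99 needs thousands of samples collected under representative load.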

Memory Architecture: Keeping Everything in WiredTiger Cache

The single biggest variable in MongoDB read latency is whether data lives in memory or must be retrieved from disk. WiredTiger’s cache behavior is the first place to look when targeting extreme latency numbers. By default, the WiredTiger internal cache is sized to the larger of 50% of (RAM minus 1 GB) or 256 MB. For latency-critical workloads on a dedicated host, this can be tuned upward: allocate as much memory as possible to the cache while leaving enough headroom for the operating system page cache, connection overhead, and aggregation pipeline memory. The goal is a cache hit ratio consistently above 99.5%.
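The sizing arithmetic can be sketched as follows. This assumes the documented default of the larger of 50% of (RAM minus 1 GiB) or 256 MiB; the 64 GiB host is a hypothetical example, and the function names are illustrative rather than part of any MongoDB API.

```javascript
const GIB = 1024 ** 3;
const MIB = 1024 ** 2;

// Documented default: the larger of 50% of (RAM - 1 GiB) or 256 MiB.
function defaultWiredTigerCacheBytes(totalRamBytes) {
  return Math.max(0.5 * (totalRamBytes - GIB), 256 * MIB);
}

// Memory left for the OS page cache, connections, and aggregation
// buffers once the WiredTiger cache is full.
function headroomBytes(totalRamBytes, cacheBytes) {
  return totalRamBytes - cacheBytes;
}

const hostRam = 64 * GIB; // hypothetical dedicated database host
const defaultCache = defaultWiredTigerCacheBytes(hostRam);
console.log((defaultCache / GIB).toFixed(1), "GiB cache by default");
console.log((headroomBytes(hostRam, defaultCache) / GIB).toFixed(1), "GiB headroom");
```

The cache size itself is set with the storage.wiredTiger.engineConfig.cacheSizeGB configuration option. Any value above the default should be validated against the headroom figure under real load.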

Use the serverStatus command to monitor WiredTiger cache statistics in real time:

db.serverStatus().wiredTiger.cache

Look at the ratio of “bytes currently in the cache” to “maximum bytes configured” and, more importantly, watch “pages read into cache” versus “pages requested from the cache.” Any meaningful number of disk reads in a latency-sensitive read path is a problem. If your working set exceeds available memory, no amount of query tuning will get you to 100 µs. Hardware investment becomes unavoidable at that point.
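The ratios described above can be computed from two snapshots of the cache section. This is a sketch that assumes the statistic values arrive as plain numbers (in some shells they may be wrapped as Long and need converting first). The snapshot figures are hypothetical.

```javascript
// Hit ratio over the interval between two serverStatus().wiredTiger.cache
// snapshots. The counters are cumulative since startup, so a single
// snapshot mostly reflects history rather than current behavior.
function cacheHitRatio(before, after) {
  const requested =
    after["pages requested from the cache"] -
    before["pages requested from the cache"];
  const readIn =
    after["pages read into cache"] - before["pages read into cache"];
  return requested > 0 ? 1 - readIn / requested : 1;
}

// Fraction of the configured cache currently in use.
function cacheFill(snapshot) {
  return (
    snapshot["bytes currently in the cache"] /
    snapshot["maximum bytes configured"]
  );
}

// Hypothetical snapshots taken a few seconds apart.
const snapBefore = {
  "pages requested from the cache": 1000000,
  "pages read into cache": 4000,
  "bytes currently in the cache": 24 * 1024 ** 3,
  "maximum bytes configured": 32 * 1024 ** 3,
};
const snapAfter = {
  ...snapBefore,
  "pages requested from the cache": 2000000,
  "pages read into cache": 4500,
};

console.log("hit ratio:", cacheHitRatio(snapBefore, snapAfter));
console.log("fill:", cacheFill(snapAfter));
```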

Index Design at the Microsecond Level

Indexing for low latency is not just about having indexes — it is about having precisely the right indexes with minimal overhead. Every index you add consumes memory and adds write overhead. For sub-millisecond reads, the priority is covering indexes: indexes that satisfy the entire query without requiring a document fetch from the collection. When MongoDB can return query results directly from an index, it eliminates the additional I/O of loading the full document, which matters enormously at this latency tier.
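One way to confirm that a query is covered is to inspect the output of explain("executionStats"): a covered query examines index keys but no documents. The check below is a sketch; the explain outputs are hypothetical, trimmed to the two counters that matter.

```javascript
// A query is covered when it was answered from index keys alone:
// some keys examined, zero documents fetched.
function isCoveredQuery(explainOutput) {
  const stats = explainOutput.executionStats;
  return stats.totalDocsExamined === 0 && stats.totalKeysExamined > 0;
}

// Hypothetical, trimmed explain("executionStats") results.
const coveredExplain = {
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 0 },
};
const fetchingExplain = {
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 },
};

console.log(isCoveredQuery(coveredExplain));  // true
console.log(isCoveredQuery(fetchingExplain)); // false
```

A common reason for an unexpected document fetch is the projection: every returned field must be in the index, which usually means excluding _id explicitly unless _id is part of the index.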

Consider the field ordering within compound indexes carefully. The equality-sort-range (ESR) rule is well-established, but at extreme latency targets, the specifics matter even more. An index that serves a query with two equality predicates and no sort will outperform a general-purpose compound index that is also designed to support sort operations. Specialized indexes for specific high-frequency access patterns are worth the management overhead.
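The ESR ordering can be made mechanical with a small helper that assembles an index key specification from the three groups of fields. This is an illustrative sketch: the helper and the field names (tenantId, status, createdAt, amount) are hypothetical, not taken from any real schema.

```javascript
// Build an index key spec in equality, sort, range order.
// `sort` entries are [field, direction] pairs; others are field names.
function esrIndexKeys({ equality = [], sort = [], range = [] }) {
  const pairs = [];
  const seen = new Set();
  const add = (field, direction) => {
    if (!seen.has(field)) {
      seen.add(field);
      pairs.push([field, direction]);
    }
  };
  for (const field of equality) add(field, 1);
  for (const [field, direction] of sort) add(field, direction);
  for (const field of range) add(field, 1);
  // Object key order follows insertion order for non-numeric field names.
  return Object.fromEntries(pairs);
}

// General-purpose index: two equality fields, a sort, and a range.
const generalIndex = esrIndexKeys({
  equality: ["tenantId", "status"],
  sort: [["createdAt", -1]],
  range: ["amount"],
});

// Specialized index for the hot path: equality predicates only.
const hotPathIndex = esrIndexKeys({ equality: ["tenantId", "status"] });

console.log(JSON.stringify(generalIndex));
console.log(JSON.stringify(hotPathIndex));
```

Either object can be passed to createIndex. The specialized index is smaller, so more of it stays in cache for the lookup it serves.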

Index prefix compression in WiredTiger is enabled by default and helps keep indexes compact in memory, but for the most frequently accessed indexes, it is worth profiling whether the decompression overhead on reads outweighs the cache efficiency benefit. In most cases, the default behavior is optimal, but this assumption should be validated with profiling data rather than accepted uncritically.

Connection Management and Driver Configuration

Application-to-database latency at the sub-millisecond level is heavily influenced by connection pooling behavior. Each new TCP connection introduces overhead that dwarfs the actual query execution time. The MongoDB drivers all support connection pooling, but the default pool sizes and connection lifecycle settings are designed for general-purpose applications, not extreme-latency workloads.

Set minPoolSize to match your steady-state concurrency requirements so that connections are established ahead of demand rather than on the request path. Set maxPoolSize conservatively: by default mongod dedicates a thread to each open connection, so connection overhead on the server side accumulates, and a bloated connection pool adds memory use and scheduling pressure on the host. For a latency-critical service with predictable concurrency, a tight connection pool with pre-warmed connections is significantly better than a large pool with cold connections.
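A tight pool can be derived from the expected concurrency rather than chosen by feel. The sketch below returns an options object using the minPoolSize and maxPoolSize driver options; the helper name and the 1.25 burst factor are assumptions for illustration, and the right factor depends on how bursty your traffic is.

```javascript
// Derive pool bounds from steady-state concurrency.
// burstFactor is a hypothetical allowance for short traffic spikes.
function poolOptions(steadyStateConcurrency, burstFactor = 1.25) {
  if (!Number.isInteger(steadyStateConcurrency) || steadyStateConcurrency < 1) {
    throw new RangeError("steadyStateConcurrency must be a positive integer");
  }
  return {
    minPoolSize: steadyStateConcurrency,
    maxPoolSize: Math.ceil(steadyStateConcurrency * burstFactor),
  };
}

const servicePool = poolOptions(32);
console.log(servicePool); // { minPoolSize: 32, maxPoolSize: 40 }

// With the Node.js driver this object would be passed as client options:
//   new MongoClient(uri, servicePool)
```

Drivers typically fill the pool up to minPoolSize in the background after the client connects, so issuing a few warm-up reads before taking traffic is still worthwhile.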

If your application and MongoDB are deployed in the same availability zone or on the same physical host, also verify that TCP_NODELAY is enabled. By default, the Nagle algorithm is disabled in modern MongoDB deployments, but verify this in your specific environment, particularly if a proxy or load balancer sits in the path. When Nagle interacts with delayed ACKs, small writes can stall for tens of milliseconds (around 40 ms on Linux), which is orders of magnitude beyond a 100 µs target.

Write Concern and Read Concern: The Latency Trade-Off You Control

Write concern and read concern settings are often treated as correctness decisions — and they are — but they also have direct latency implications that are frequently underestimated. Writing with w: "majority" introduces replication round-trip latency. For workloads where individual writes are not latency-critical but reads are, this is a reasonable trade-off. For workloads where write latency also matters, w: 1 with journaling enabled (j: true) offers a middle ground between durability and speed.

On the read side, readConcern: "local" is the lowest-latency option for replica sets: it returns the node’s most recent data without waiting for majority acknowledgment, so it is appropriate when the application can tolerate reading a write that might later be rolled back. readConcern: "linearizable" offers the strongest consistency guarantees but introduces substantial latency overhead. For the vast majority of sub-millisecond read use cases, local or available is the correct choice.
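The trade-offs above can be captured as named profiles so that each service states its choice explicitly. This is a sketch: the profile names are hypothetical, while the writeConcern and readConcern shapes follow the options discussed in this section.

```javascript
// Hypothetical named profiles pairing a write concern with a read concern.
const CONCERN_PROFILES = {
  // Reads are latency-critical, writes can wait for replication.
  lowLatencyReads: {
    writeConcern: { w: "majority" },
    readConcern: { level: "local" },
  },
  // Writes are latency-critical too: acknowledge on the primary's journal.
  lowLatencyWrites: {
    writeConcern: { w: 1, j: true },
    readConcern: { level: "local" },
  },
  // Strongest guarantees, highest latency.
  strict: {
    writeConcern: { w: "majority" },
    readConcern: { level: "linearizable" },
  },
};

// Return a copy so callers cannot mutate the shared profile.
function concernsFor(profileName) {
  const profile = CONCERN_PROFILES[profileName];
  if (!profile) throw new Error(`unknown concern profile: ${profileName}`);
  return {
    writeConcern: { ...profile.writeConcern },
    readConcern: { ...profile.readConcern },
  };
}

console.log(concernsFor("lowLatencyWrites"));
```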

If your workload allows reads from secondaries, replica set read preferences can distribute load across members, but routing latency and replication lag must be factored into SLA calculations. Reads from secondaries are not always faster than reads from the primary — the benefit depends heavily on replica topology and network placement.

Operating System and Kernel Tuning

MongoDB performance at the extreme end is inseparable from operating system configuration. Several kernel-level settings are frequently left at defaults that are inappropriate for low-latency database workloads.

Transparent huge pages (THP) should be disabled on MongoDB 7.0 and earlier. This is well-documented by MongoDB and consistent with our experience: THP causes unpredictable memory allocation latency spikes that appear as high-percentile outliers in latency distributions. Disable THP at the kernel level and add the disabling logic to your server startup scripts to ensure it survives reboots. MongoDB 8.0 ships an updated TCMalloc for which the documentation recommends enabling THP instead, so check the production notes for the version you run.
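A startup or monitoring check for the THP setting can be sketched as below. On Linux the active mode is the bracketed word in /sys/kernel/mm/transparent_hugepage/enabled. The helper names are hypothetical, and the expected mode is a parameter so the same check works for whichever setting your MongoDB version calls for.

```javascript
// Extract the active mode from the sysfs text, for example
// "always madvise [never]" yields "never".
function activeThpMode(sysfsText) {
  const match = sysfsText.match(/\[(\w+)\]/);
  return match ? match[1] : null;
}

// True when the active mode equals the mode the caller expects.
function thpMatches(sysfsText, expectedMode) {
  return activeThpMode(sysfsText) === expectedMode;
}

// Sample file contents. On a real host, read the text with
// fs.readFileSync("/sys/kernel/mm/transparent_hugepage/enabled", "utf8").
console.log(activeThpMode("always madvise [never]\n")); // never
console.log(thpMatches("[always] madvise never\n", "never")); // false
```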

NUMA (Non-Uniform Memory Access) topology is another source of latency variance. When MongoDB processes run on a NUMA node and access memory allocated on a different node, the latency penalty can be significant. Use numactl --interleave=all to start mongod, which spreads memory allocations across NUMA nodes and reduces the likelihood of cross-node memory access penalties. On modern multi-socket servers, this change alone can meaningfully improve tail latency.

CPU frequency scaling should be set to the “performance” governor rather than the default “ondemand” or “powersave” modes. MongoDB’s internal operations — including lock management, cursor handling, and connection scheduling — are sensitive to CPU clock speed. A CPU that ramps up from a lower frequency in response to load introduces latency variability at exactly the moments of peak demand.

Storage Layer: NVMe, Direct I/O, and Avoiding the Bottleneck

Even with a perfectly tuned WiredTiger cache, some I/O is inevitable — checkpoint flushes, journal writes, index builds, and working set expansion during traffic spikes all touch storage. For sub-millisecond latency targets, the storage tier must be NVMe-based, locally attached, and sized with enough IOPS headroom that storage never becomes a bottleneck during peak load.

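MongoDB’s WiredTiger storage engine keeps uncompressed pages in its own internal cache and, by default, reads data files through the filesystem rather than with direct I/O. The OS page cache therefore acts as a second tier that holds data in its compressed on-disk format: a read that misses the WiredTiger cache but hits the page cache avoids the device but still pays for decompression. For a 100 µs target, treat the WiredTiger cache as the tier that matters for hot data. Where the OS page cache does help is in softening occasional misses and in journal and oplog reads during secondary catch-up and certain administrative operations.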

Separate journal and data directories onto different NVMe devices when possible. Journal writes are sequential and small; data writes are more random. Isolating them prevents the journal flush from interfering with data I/O at the storage controller level. This is particularly important during high write volume periods when journal pressure increases.

Schema Design for Cache Efficiency

Schema design at extreme latency targets is a discipline in its own right. MongoDB’s flexible document model is a strength, but it can become a liability when documents are large, deeply nested, or contain arrays that require significant traversal to satisfy a query predicate. For sub-millisecond reads, documents should be lean — contain only the fields that will actually be read, use shorter field names (verbose field names consume cache space at scale), and avoid deeply nested structures that increase deserialization overhead in the driver.

Consider pre-computing derived values rather than calculating them at query time. If a query frequently needs a value that is derived from multiple fields — a status flag computed from several conditions, a score calculated from sub-fields — storing that computed value directly in the document and keeping it updated on write is almost always faster than computing it on read, especially at high query rates.

For workloads with a small set of hot documents that are read extremely frequently, the subset pattern — splitting frequently accessed fields into a separate, smaller document — can dramatically improve cache utilization. A 10 KB document with 2 KB of hot fields will consume five times as much cache space as a 2 KB hot-fields document. At scale, this difference in cache efficiency translates directly into cache hit rates and, ultimately, latency.
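The cache arithmetic behind the subset pattern is easy to check. The sketch below assumes a hypothetical 32 GiB cache and ignores per-page and per-document overhead inside WiredTiger, so the counts are upper bounds rather than predictions.

```javascript
const KIB = 1024;
const CACHE_BYTES = 32 * 1024 ** 3; // hypothetical 32 GiB cache

// Upper bound on how many documents of a given size fit in the cache.
function docsThatFit(cacheBytes, docBytes) {
  return Math.floor(cacheBytes / docBytes);
}

const fullDocs = docsThatFit(CACHE_BYTES, 10 * KIB); // 3355443
const hotDocs = docsThatFit(CACHE_BYTES, 2 * KIB);   // 16777216

console.log("full 10 KiB documents cached:", fullDocs);
console.log("2 KiB hot-field documents cached:", hotDocs);
console.log("ratio:", (hotDocs / fullDocs).toFixed(2));
```

The same cache holds roughly five times as many hot documents, which is the entire argument for splitting them out.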

Profiling the Real Bottleneck

One of the most common mistakes we see in high-performance MongoDB engagements is tuning based on assumptions rather than measurements. Before making any of the changes described above, establish a baseline with real production traffic or a representative load test. Use MongoDB’s database profiler at level 2 (all operations) in a staging environment to capture actual query execution statistics. In the profiler output, examine the ratio of keysExamined and docsExamined to nreturned, the millis value, and the planSummary to confirm index usage. For a finer breakdown of a single query, run explain("executionStats") and inspect executionTimeMillis along with the per-stage executionTimeMillisEstimate values.
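Profiler output can be screened for inefficient operations with a short filter. This sketch uses the system.profile field names keysExamined, docsExamined, nreturned, and planSummary; the threshold of 1.5 examined entries per returned document and the sample entries are hypothetical.

```javascript
// Flag operations that scan a collection or examine far more keys or
// documents than they return.
function inefficientOps(profileEntries, maxExaminedPerReturned = 1.5) {
  return profileEntries.filter((entry) => {
    const returned = Math.max(entry.nreturned ?? 0, 1);
    return (
      entry.planSummary.startsWith("COLLSCAN") ||
      entry.keysExamined / returned > maxExaminedPerReturned ||
      entry.docsExamined / returned > maxExaminedPerReturned
    );
  });
}

// Hypothetical entries shaped like system.profile documents.
const profileSample = [
  { planSummary: "IXSCAN { tenantId: 1, status: 1 }", keysExamined: 1, docsExamined: 1, nreturned: 1, millis: 0 },
  { planSummary: "IXSCAN { status: 1 }", keysExamined: 420, docsExamined: 420, nreturned: 3, millis: 2 },
  { planSummary: "COLLSCAN", keysExamined: 0, docsExamined: 50000, nreturned: 1, millis: 31 },
];

for (const op of inefficientOps(profileSample)) {
  console.log(op.planSummary, op.keysExamined, op.docsExamined, op.nreturned);
}
```

In mongosh the input would come from db.system.profile.find().toArray(). The millis field is reported in whole milliseconds, so operations near the 100 µs target need client-side timing to be measured accurately.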

For production environments where full profiling is too expensive, MongoDB’s $currentOp aggregation stage and the slow query log (configurable via slowms) provide targeted visibility into operations that are not meeting latency targets. Combine these with system-level metrics from tools like mongostat, mongotop, and your preferred APM platform to build a complete picture of where time is actually being spent.

At MinervaDB, we often find that the final jump from 500 µs to 100 µs requires addressing three or four distinct issues simultaneously — a slightly suboptimal index, a kernel setting that adds scheduling jitter, a connection pool that occasionally runs cold, and a few oversized documents in an otherwise lean collection. No single change is a silver bullet. The path to extreme performance is paved with careful measurement, incremental adjustment, and a willingness to look at every layer of the stack.

Closing Perspective

Pushing MongoDB from 1 ms to 100 µs average latency is not a configuration exercise — it is an engineering discipline. It requires a detailed understanding of WiredTiger’s internal behavior, the operating system’s interaction with memory and CPU, the storage subsystem’s I/O characteristics, and the application’s own connection and query patterns. The techniques described here represent the approaches that have consistently delivered results in our most demanding client engagements, but they must be applied in the context of your specific workload and infrastructure.

If your team is facing a MongoDB performance challenge — whether you are trying to break the 1 ms barrier or simply ensure that your 99th percentile latency stays below your SLA threshold — the MinervaDB MongoDB Support Team is available to help. We provide hands-on performance assessments, architecture reviews, and ongoing operational support for MongoDB deployments of all sizes. Reach out to us to discuss what extreme performance tuning looks like for your environment.
