When you shard your database with elastic clusters, you add another tier
to your database infrastructure. DocumentDB adds a request router layer that
is the first point of contact for each request. After the request router parses
the request, it will forward the request to the relevant shard(s) to process the
request and return the result to your client. Note that, because of this additional
request router tier, there may be a slight increase in overall latency when moving to
an elastic cluster.
In creating your elastic cluster, you’ll need to choose which collections are sharded.
For those that are unsharded, the entire collection will be located on a single shard
in your elastic cluster. For collections that are sharded, they will be split across the
instances according to a shard key that you specify.
Choosing a shard key for your collection is very important to see the benefits of
elastic clusters. You’ll want to choose a shard key that is evenly distributed across
your dataset so that the data ends up being well-distributed across your shards.
If you use an unbalanced shard key, you won’t get the full benefit of splitting up
your data.
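To make the effect of key balance concrete, here is a minimal sketch that assigns documents to shards by hashing a shard key value. The shard count, the key values, and the use of MD5 as the hash are all assumptions made for illustration; the text does not describe how DocumentDB maps key values to shards internally.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key value to a shard index with a stable hash.

    MD5 is only a stand-in here; it is not DocumentDB's actual scheme.
    """
    digest = hashlib.md5(key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribution(key_values, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(shard_for(v, num_shards) for v in key_values)


# High-cardinality key: one distinct value per document (for example, a user ID).
balanced = distribution(f"user-{i}" for i in range(10_000))

# Low-cardinality key: only two distinct values (for example, a status flag).
skewed = distribution("active" if i % 10 else "inactive" for i in range(10_000))

print("balanced:", sorted(balanced.items()))
print("skewed:  ", sorted(skewed.items()))
```

With the high-cardinality key, every shard receives a roughly equal share of documents. With the two-value key, at most two shards hold any data at all, and one of them holds at least 90% of it, so the remaining shards add cost without adding capacity.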
Additionally, you want to use a shard key that correlates with your access patterns.
Ideally you are using a shard key that contains an exact match in all or most of your
database operations. This way, your operations can be directed to a single shard to
handle the request rather than doing a scatter-gather operation across all of the
shards. Again, this will allow you to take full advantage of the decision to shard.
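The difference between a targeted operation and a scatter-gather operation can be sketched as a small routing function. This is a simplified model, not DocumentDB's request router: the shard key name `customer_id`, the shard count, and the hash are hypothetical, and the sketch assumes that only an exact match on the shard key can be sent to a single shard.

```python
import hashlib

NUM_SHARDS = 4             # hypothetical shard count
SHARD_KEY = "customer_id"  # hypothetical shard key field


def shard_for(value) -> int:
    """Stable stand-in hash from a shard key value to a shard index."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def target_shards(query_filter: dict) -> list[int]:
    """Return the shards a query must visit under this simplified model."""
    value = query_filter.get(SHARD_KEY)
    # An exact match on the shard key identifies exactly one shard.
    if value is not None and not isinstance(value, dict):
        return [shard_for(value)]
    # No shard key, or an operator expression such as {"$gt": ...}:
    # the router cannot narrow the request, so it fans out to every shard.
    return list(range(NUM_SHARDS))


print(target_shards({"customer_id": "c-42", "status": "open"}))  # one shard
print(target_shards({"status": "open"}))                         # all shards
```

A filter that pins `customer_id` to one value touches a single shard, while a filter on `status` alone has to be sent everywhere and merged. If most of your operations look like the second query, the shard key does not match your access patterns.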
Finally, elastic clusters can also be helpful in the rare case where you need to
increase the number of connections to your DocumentDB cluster. While a single
DocumentDB instance maxes out at 30,000 open connections, an elastic cluster
supports up to 300,000 open connections.
DocumentDB provides a number of mechanisms to scale your database to
meet your usage. In general, try to avoid sharding your database with elastic
clusters where you can, due to the extra latency and planning work that it requires.
In the event that you do need to scale via sharding, DocumentDB provides a
straightforward mechanism via elastic clusters.
Reduce I/O
A second advanced tip is to reduce your I/O consumption in DocumentDB. In
DocumentDB, I/O is cost. It is cost not only literally, in the sense that you are
charged directly for I/O consumption, but also in the sense that I/O reduces your
performance by consuming scarce resources.
To understand how to reduce I/O consumption, let’s first review some details about
DocumentDB’s underlying architecture. Then, we’ll look at some tips for optimizing
your I/O consumption.
Under the hood, DocumentDB is using a multi-version concurrency control (MVCC)
architecture. This is common to many database systems and can assist with
handling concurrent operations without the use of locks. An MVCC architecture may
have multiple versions of a particular document that existed at different times in
the database lifecycle.
To understand DocumentDB’s effects on I/O, you need to know both what happens
during individual write operations (inserts, updates, and deletes) and how the
garbage collection process works.
A caveat up front: this is an advanced topic that goes deep into DocumentDB
architecture. As you learn about these MVCC internals, keep in mind that they do
not affect the correctness of your results. You’ll read about multiple versions
of your document and the impact on indexes, but the query engine understands
how to handle these versions to return the proper result.
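As a rough mental model of MVCC, the toy store below keeps every version of a document, serves reads from the version visible at a given point in time, and reclaims old versions in a separate garbage collection step. It is a generic illustration under simplifying assumptions (a single logical clock, in-memory lists); the class and method names are invented here and do not reflect DocumentDB's actual storage engine.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    value: dict
    created_at: int                   # logical time of the write that created it
    deleted_at: Optional[int] = None  # logical time of the write that superseded it


@dataclass
class ToyMVCCStore:
    clock: int = 0
    versions: dict = field(default_factory=dict)  # doc_id -> list of Version

    def write(self, doc_id: str, value: dict) -> int:
        """Insert or update: append a new version instead of overwriting."""
        self.clock += 1
        chain = self.versions.setdefault(doc_id, [])
        if chain and chain[-1].deleted_at is None:
            chain[-1].deleted_at = self.clock  # old version stays on disk for now
        chain.append(Version(value, created_at=self.clock))
        return self.clock

    def read(self, doc_id: str, as_of: int) -> Optional[dict]:
        """Return the single version visible at logical time `as_of`."""
        for v in self.versions.get(doc_id, []):
            if v.created_at <= as_of and (v.deleted_at is None or as_of < v.deleted_at):
                return v.value
        return None

    def garbage_collect(self, oldest_active_snapshot: int) -> int:
        """Remove versions that no reader at or after the snapshot can still see."""
        removed = 0
        for doc_id, chain in self.versions.items():
            keep = [v for v in chain
                    if v.deleted_at is None or v.deleted_at > oldest_active_snapshot]
            removed += len(chain) - len(keep)
            self.versions[doc_id] = keep
        return removed


store = ToyMVCCStore()
t1 = store.write("order-1", {"status": "new"})
t2 = store.write("order-1", {"status": "shipped"})
print(store.read("order-1", as_of=t1))  # the older version is still readable
print(store.read("order-1", as_of=t2))  # the latest version
print(store.garbage_collect(oldest_active_snapshot=t2))  # versions reclaimed
```

Two things stand out in this model. Each update adds a version rather than replacing one, so writes and the later cleanup both generate work. And a read always resolves to exactly one version, which is why the extra versions do not change query results.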