Designing for Amazon DynamoDB

DynamoDB may be one of the simplest databases ever designed. Sadly, that does not make it the simplest database to design for. The small set of rules that governs DynamoDB is easy to grasp, but hard to apply to anything non-trivial. That does not make it impossible or a bad idea. When used and designed for correctly, DynamoDB is one of the best-performing, lowest-maintenance databases you will find. Instantaneous scaling to any demand, pay-per-request pricing and a completely serverless, zero-maintenance model are why you would go through the difficult process of designing for it.

To successfully design for DynamoDB we must understand its serverless nature, what building blocks are available to us, the seemingly simple rules that follow and how we can apply techniques to design for not-so-simple cases. In this blog we will show how you can do more than you believed you could within the limitations imposed on you. So without further ado, a gentle introduction to the complex world of DynamoDB design.

A serverless NoSQL database

AWS describes DynamoDB as a serverless “key-value and document database”. That is not a description that allows us to form an intuitive understanding, so let’s take a closer look.

DynamoDB is a database that fully leverages the advantages of the cloud by instantly scaling to any demand, requiring no maintenance and exposing no infrastructure in the form of servers or virtual machines. The basic unit of DynamoDB is a Table in which you place Items that contain Attributes. A technical description of DynamoDB can be as short as the following:

– DynamoDB is a NoSQL database.
– DynamoDB supports lookup of entries by key.
– DynamoDB supports storing more than a single value per key: you can store a *document* with columns.
– The columns can be typed.

This is still hard to imagine, so an example in the form of a simple table is warranted:

The table requires us to define a partition key; other attributes are dynamic and optional (hence NoSQL). You can look up an Item in the Table by querying on the partition key (lookup of entries by key). You can store multiple attributes instead of just a single value (hence document store). Each attribute can be typed as a Number, String, Boolean, nested JSON-like document (Map or List), etc. (hence typed columns). This is basically what you would expect of a modern NoSQL database, but there is more that makes DynamoDB one of a kind.
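To make the typed attributes concrete, here is a minimal sketch of how a single Item could look in DynamoDB's low-level attribute-value notation, written as a plain Python dictionary. The attribute names (`StoreId`, `Name`, `Employees`, `Open`, `Address`) and their values are made up for illustration; only the type descriptors (`S`, `N`, `BOOL`, `M`) come from DynamoDB itself.

```python
# A single Item in DynamoDB's low-level attribute-value notation.
# Attribute names and values are hypothetical; the type descriptors are DynamoDB's.
item = {
    "StoreId": {"S": "StoreA"},          # String, used here as the partition key
    "Name": {"S": "Store A Amsterdam"},  # String
    "Employees": {"N": "12"},            # Number (transferred as a string)
    "Open": {"BOOL": True},              # Boolean
    "Address": {                         # Map, a nested JSON-like document
        "M": {
            "City": {"S": "Amsterdam"},
            "Country": {"S": "NL"},
        }
    },
}


def attribute_types(item):
    """Return a mapping of attribute name to its DynamoDB type descriptor."""
    return {name: next(iter(value)) for name, value in item.items()}


print(attribute_types(item))
```

Only `StoreId` would have to be declared when creating the table; every other attribute may differ from Item to Item.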

The building blocks of DynamoDB

We can describe how DynamoDB works by going through a mere five features. Ready?

Every table must have a primary key

Every table must have a primary key. A primary key consists of one of the following two:

1. A partition key (PK)
2. A partition key & a sort key (SK)

The primary key must be unique for different items. If you go for option 1, your partition key must always be unique. If you go for option 2, the combination of partition key and sort key must be unique.
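Both options can be sketched as the `KeySchema` that a table is created with. The snippet below only builds the keyword arguments for a boto3 `create_table` call and does not contact AWS. The table name `Stores` and the attribute names `PK` and `SK` are assumptions chosen for this example, as is typing both keys as strings.

```python
# Option 1: partition key only.
simple_key_schema = [
    {"AttributeName": "PK", "KeyType": "HASH"},
]

# Option 2: partition key plus sort key.
composite_key_schema = [
    {"AttributeName": "PK", "KeyType": "HASH"},
    {"AttributeName": "SK", "KeyType": "RANGE"},
]


def create_table_params(table_name, key_schema):
    """Build keyword arguments for a boto3 `create_table` call (not executed here)."""
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        # Only key attributes are declared up front; all others stay dynamic.
        # Typing every key as a string ("S") is an assumption of this sketch.
        "AttributeDefinitions": [
            {"AttributeName": key["AttributeName"], "AttributeType": "S"}
            for key in key_schema
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


params = create_table_params("Stores", composite_key_schema)
# With boto3 installed and credentials configured, this could be passed on as:
#   boto3.client("dynamodb").create_table(**params)
```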

Partition keys determine the partition

To scale to any size DynamoDB splits data into so-called partitions. It uses the partition key in your item to decide which partition to use. When reading the item back, it uses the partition key in your query to quickly find the correct partition regardless of the total number of partitions. This allows for fast inserts and fast lookup, as long as you actually query by partition key.

DynamoDB passes your partition key through an internal hash function before using it to store your data. This is why the partition key is also called the hash key. This hashing spreads the data as uniformly as possible over your partitions.
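The idea of hashing a key to pick a partition can be illustrated with a small sketch. DynamoDB's actual hash function and partition management are internal and not public; the MD5 hash, the modulo step and the fixed partition count of 4 below are stand-ins chosen purely for illustration.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition number via a hash (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical partition keys; count how many land on each of 4 partitions.
keys = [f"Store{i}" for i in range(1000)]
spread = Counter(partition_for(key, 4) for key in keys)
print(sorted(spread.items()))
```

The same key always maps to the same partition, which is what makes a lookup by partition key cheap no matter how many partitions exist.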

Sort keys allow quick search within a partition

The second (optional) part of your primary key is the sort key. When you use both a partition- and sort key as a primary key, you can query by either the partition key alone, or both of them together. Querying by partition key will return all the items that are stored under that key, meaning all unique combinations of PK & SK. The sort key can be used to further narrow down your query. You can specify the first or last X number of items or use a between or begins_with type of selection. Because sort keys determine ranges of data they are also called range keys in the documentation.
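The selections described above can be sketched as parameters for a boto3 client `query` call. The helper `build_query`, the table name `Goods` and the key attribute names `PK` and `SK` are hypothetical; the snippet only builds dictionaries and does not contact AWS.

```python
def build_query(table_name, pk_value, sk_condition=None, sk_values=None,
                limit=None, descending=False):
    """Build keyword arguments for a boto3 client `query` call (not executed here)."""
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
    }
    if sk_condition:
        # The sort key condition is optional and narrows the result down.
        params["KeyConditionExpression"] += " AND " + sk_condition
        params["ExpressionAttributeValues"].update(sk_values or {})
    if limit is not None:
        params["Limit"] = limit
    if descending:
        # Read the partition in descending sort key order.
        params["ScanIndexForward"] = False
    return params


# Everything stored under one partition key.
all_items = build_query("Goods", "StoreA")

# Only sort keys that start with a given prefix.
by_prefix = build_query("Goods", "StoreA", "begins_with(SK, :p)",
                        {":p": {"S": "Electronics"}})

# Sort keys in an inclusive range.
in_range = build_query("Goods", "StoreA", "SK BETWEEN :lo AND :hi",
                       {":lo": {"S": "A"}, ":hi": {"S": "F"}})

# The last 10 items in sort key order.
last_ten = build_query("Goods", "StoreA", limit=10, descending=True)
```

Note that the partition key condition is always an equality; only the sort key accepts ranges and prefixes.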

Local secondary indices allow us to use multiple sort keys

A local secondary index (LSI) simply allows us to use another column as our sort key. Mind that the partition key must be unchanged. We can use an LSI when we need another way to sort our data, or our items are also uniquely described and queried by another combination of partition- and sort key. We will see examples later.
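As a rough sketch, an LSI is declared as an extra key schema that reuses the table's partition key and swaps in a different sort key. The helper `lsi_definition`, the index name `ByName` and the attribute names `PK` and `Name` are assumptions made for this example; the dictionary follows the shape of one entry in the `LocalSecondaryIndexes` list of a boto3 `create_table` call.

```python
def lsi_definition(index_name, partition_key, alternative_sort_key):
    """Build one entry for the `LocalSecondaryIndexes` list of `create_table`."""
    return {
        "IndexName": index_name,
        "KeySchema": [
            # Same partition key as the table itself.
            {"AttributeName": partition_key, "KeyType": "HASH"},
            # The alternative sort key that this index sorts by.
            {"AttributeName": alternative_sort_key, "KeyType": "RANGE"},
        ],
        # Copy all attributes into the index.
        "Projection": {"ProjectionType": "ALL"},
    }


# Hypothetical index that sorts the goods of a store by name.
by_name = lsi_definition("ByName", "PK", "Name")
print(by_name["KeySchema"])
```

The alternative sort key also has to be listed in the table's `AttributeDefinitions`, and an LSI has to be defined when the table is created.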

Global secondary indices allow us to use multiple partition keys

Global secondary indices (GSI’s) are a powerful tool when designing tables. They allow us to specify another partition key and then query by it. This effectively creates a second table for us and copies as much data as we want to it. We can copy the keys only, the keys and a number of attributes, or simply everything. This might seem like the trick that allows us to use DynamoDB as we see fit, but there are a number of limits:

– You pay for the copying to another table.
– You can have no more than 20 GSI’s per table.
– You must specify the GSI that you want to use when you query by a “secondary” partition key.
– Queries by GSI’s are eventually consistent.

Nothing we can’t work with, but it’s also not what you would call an alternative to indices from relational databases.
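How much data ends up in the index is controlled by its projection type: `KEYS_ONLY`, `INCLUDE` or `ALL`. The in-memory sketch below imitates that copying step for a single item. The function `project` and the attribute names are hypothetical, and the real replication happens inside DynamoDB.

```python
def project(item, table_keys, index_keys, projection_type, include=()):
    """Return the copy of `item` that would be visible through an index."""
    if any(key not in item for key in index_keys):
        return None  # items lacking the index key attributes are not copied
    if projection_type == "ALL":
        return dict(item)
    kept = set(table_keys) | set(index_keys)  # keys are always projected
    if projection_type == "INCLUDE":
        kept |= set(include)
    return {name: value for name, value in item.items() if name in kept}


# Hypothetical item; "Category" plays the role of the index partition key.
item = {"PK": "StoreA", "SK": "Electronics#Laptops", "Category": "Laptops",
        "Name": "Laptop 13", "Price": 1200}

keys_only = project(item, ("PK", "SK"), ("Category",), "KEYS_ONLY")
with_price = project(item, ("PK", "SK"), ("Category",), "INCLUDE", include=("Price",))
everything = project(item, ("PK", "SK"), ("Category",), "ALL")
```

Projecting fewer attributes keeps the copy small, at the price of needing a second read on the base table for anything that was left out.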

Rules for designing

DynamoDB has few building blocks and whatever blocks we have are well-defined and limited. Misuse of these blocks will lead to badly performing applications, so it is important to understand the rules for using them.

You must only query by Primary Key

You can retrieve data by any attribute, but you will only experience good performance as long as you query by the partition key of your table or of a GSI. If you deviate from this you trigger what is called a Table Scan, which reads every item in the table. And yes, it is as bad as it sounds.
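The difference can be made tangible with a tiny in-memory model that counts how many items each operation has to read. The class `TinyTable` is a hypothetical stand-in and not how DynamoDB is implemented; it only mirrors the idea that a query touches one partition while a scan touches everything.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table, grouping items by partition key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, item):
        self.partitions[item["PK"]].append(item)

    def query(self, pk):
        """Read one partition only; returns (items, number of items read)."""
        items = self.partitions.get(pk, [])
        return list(items), len(items)

    def scan(self, predicate):
        """Read every item in the table, then filter; returns (items, number read)."""
        everything = [i for items in self.partitions.values() for i in items]
        return [i for i in everything if predicate(i)], len(everything)


table = TinyTable()
for store in range(100):
    for good in range(10):
        table.put({"PK": f"Store{store}", "SK": f"Good{good}", "Price": good * 100})

found_by_query, read_by_query = table.query("Store7")
found_by_scan, read_by_scan = table.scan(lambda item: item["PK"] == "Store7")
print(read_by_query, read_by_scan)  # 10 items read versus 1000
```

Both calls return the same ten items, but the scan reads a hundred times as many to get there, and the gap grows with the size of the table.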

You must know your access patterns beforehand

Because of the limits imposed by the Primary Key, it is extremely important to know the way you will consume your data beforehand. This way you can use the rule-set to design a table that will serve this data with fast queries. GSI’s can be used to add an extra partition key ad-hoc, but this will only get you so far. Besides, replication between tables costs money and is eventually consistent, so a good design beforehand is necessary so that you do not incur cost and suffer consistency issues.

You must use a single table*

AWS mentions the single-table design principle often. Often enough that it would seem that it is the only approach to take. This is not a sentiment we agree with, because it can make tremendous sense to create multiple tables for your application. When applicable, the single-table design does make optimal use of the building blocks of DynamoDB and keeps the table performant at scale, so do not dismiss it either. The reality is that traditional techniques from relational databases work poorly with DynamoDB, and multi-table designs carry a risk of turning into relational systems.

The single- vs multi-table design topic is extremely nuanced. As with many other design questions the answer often ends up being ‘It depends’. That is not a useful answer here, so while oversimplified, the following rule-of-thumb can be used: prefer single-table design and deviate from it when it does not make sense anymore. The techniques described in the next section should make you more comfortable with the concept of single-table design.

Design techniques & examples

The rules imposed on us by DynamoDB seem to suggest that we can only design simple key-value stores, but that is luckily not the case. It is possible to store and query hierarchical data, store multiple entities in a single table and gain access to multiple complex access patterns by using GSI’s. We are going to demonstrate this by working out an example step by step. The situation we will be modelling is that of multiple stores that sell categorized goods. Every good has a name and a price. A seemingly simple example, but it covers our cases:

– Categories are typically hierarchical (Electronics -> Laptops -> Accessories). We want to query on each level.
– We have multiple types of entities (stores & goods).
– We want to retrieve items not only per store and per category, but also per price range.

We will start with solving the first problem: hierarchical data. The solution here is to use the concept of compound keys in DynamoDB. An example:

Note how we use store as a partition key, but construct the sort key from categories separated by hashes. The separation character does not have to be a hash; it could also be a slash or another character. Regardless of the separation character used, it now becomes possible to query the following:

– All goods for a store (query on partition key).
– All goods in a store for a particular category (query on partition key, with the sort key narrowed down by begins_with):
  – PK = StoreA + SK begins_with Electronics will return all electronics goods.
  – PK = StoreA + SK begins_with Electronics#Laptops#Accessories will return only laptop accessories.
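These hierarchical queries can be tried out with a small in-memory sketch. The goods, the helper names `sort_key` and `query`, and the use of `str.startswith` as a stand-in for DynamoDB's `begins_with` are all assumptions made for illustration.

```python
def sort_key(*categories, separator="#"):
    """Join a category hierarchy into a single compound sort key."""
    return separator.join(categories)


# Hypothetical goods; every item has a unique PK + SK combination.
goods = [
    {"PK": "StoreA", "SK": sort_key("Electronics", "Laptops"), "Name": "Laptop 13"},
    {"PK": "StoreA", "SK": sort_key("Electronics", "Laptops", "Accessories"), "Name": "Laptop sleeve"},
    {"PK": "StoreA", "SK": sort_key("Electronics", "Phones"), "Name": "Phone X"},
    {"PK": "StoreA", "SK": sort_key("Garden", "Tools"), "Name": "Spade"},
    {"PK": "StoreB", "SK": sort_key("Electronics", "Laptops"), "Name": "Laptop 15"},
]


def query(items, pk, sk_prefix=""):
    """Imitate a query: exact match on PK, begins_with on SK."""
    return [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]


all_store_a = query(goods, "StoreA")
electronics = query(goods, "StoreA", "Electronics")
accessories = query(goods, "StoreA", sort_key("Electronics", "Laptops", "Accessories"))
print([g["Name"] for g in electronics])
```

Because the prefix match runs from the left, the hierarchy has to be written from the broadest category to the narrowest.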

Our next problem is that this model does not lend itself to storing data about the store, since our partition- and sort key combination must always result in a good. We can solve this by using the concept of compound keys in the partition key:

We prefix each partition key with an identifier: STORE for store metadata and INV for inventory goods. Note how the sort key for store metadata is basically a copy of the partition key value. That is because we want to be able to query for exactly one item by providing both a partition- and a sort key in our query. We now have the following extra queries at our disposal:

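– Retrieve all stores by selecting on partition keys with the prefix STORE#. Mind that a query requires an exact partition key value, so this particular pattern needs a scan with a filter or a dedicated GSI.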
– Query for the location of a single store by querying on PK + SK with prefix STORE#.
– Query for store inventory by using the prefix INV# in the partition key.
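The prefixed keys can be produced by a couple of small helpers, sketched below. The function names, the table name `Shop` and the exact key layout are assumptions based on the description above; `get_item_params` only builds the arguments for a boto3 client `get_item` call and does not contact AWS.

```python
def store_keys(store_id):
    """Keys for a store metadata item: the sort key mirrors the partition key."""
    key = f"STORE#{store_id}"
    return {"PK": key, "SK": key}


def inventory_keys(store_id, *categories):
    """Keys for an inventory good: INV# prefix plus the category hierarchy."""
    return {"PK": f"INV#{store_id}", "SK": "#".join(categories)}


def get_item_params(table_name, keys):
    """Build keyword arguments for a boto3 client `get_item` call (not executed here)."""
    return {
        "TableName": table_name,
        "Key": {name: {"S": value} for name, value in keys.items()},
    }


store_lookup = get_item_params("Shop", store_keys("StoreA"))
laptop_keys = inventory_keys("StoreA", "Electronics", "Laptops")
print(store_lookup["Key"])
print(laptop_keys)
```

Keeping key construction in one place like this makes it harder for different parts of an application to disagree on prefixes or separators.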

This introduces many more access patterns already. We have one more to cover here, that is filtering by price range. It is here that we start using a GSI, because we have a need for another primary key. Let’s start with the example first:

This looks a lot more complex than it is. The best way to understand it is to start from the access pattern. The access pattern we want to achieve is to query on a category in a store, and then filter by price range. Filtering is best done in the sort key, so we want to position the price there. Our partition key must therefore be a concatenation of our store and the category hierarchy. That makes it possible to:

– Query all prices for a particular category in a store by querying on the GSI1 partition key.
– Filter by price for a particular category in a store by querying on GSI1 partition key + a range in the GSI1 sort key (e.g. price > 1000 && price < 1500).

To create a GSI we add two columns to our table. The names of the columns are GSI1PK and GSI1SK to mirror the fact that one is the partition- and the other is the sort key for our first GSI. The GSI can then be created by referring to both of these columns.
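Put together, the GSI could look roughly like the sketch below. The index name `GSI1`, the table name `Shop`, the helper names and the exact format of `GSI1PK` (store, a hash, then the category path) are assumptions made for this example. The snippet builds parameters in the shape expected by boto3 and does not contact AWS.

```python
def with_gsi1_keys(item):
    """Return a copy of an inventory item with GSI1PK and GSI1SK added."""
    store = item["PK"].split("#", 1)[1]  # "INV#StoreA" -> "StoreA"
    return {**item, "GSI1PK": f"{store}#{item['SK']}", "GSI1SK": item["Price"]}


# One entry for the `GlobalSecondaryIndexes` list of `create_table`.
gsi1_definition = {
    "IndexName": "GSI1",
    "KeySchema": [
        {"AttributeName": "GSI1PK", "KeyType": "HASH"},
        {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},
}


def price_range_query(table_name, store, category_path, low, high):
    """Build keyword arguments for a boto3 client `query` call on GSI1."""
    return {
        "TableName": table_name,
        "IndexName": "GSI1",  # the GSI has to be named explicitly
        "KeyConditionExpression": "GSI1PK = :pk AND GSI1SK BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{store}#{category_path}"},
            ":lo": {"N": str(low)},   # numbers are transferred as strings
            ":hi": {"N": str(high)},
        },
    }


laptop = with_gsi1_keys({"PK": "INV#StoreA", "SK": "Electronics#Laptops",
                         "Name": "Laptop 13", "Price": 1200})
query = price_range_query("Shop", "StoreA", "Electronics#Laptops", 1000, 1500)
```

`BETWEEN` is inclusive on both ends, so a strictly exclusive range such as price > 1000 and price < 1500 would need adjusted bounds or an extra check on the results.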

Summary

DynamoDB is a powerful serverless database in the cloud and a useful tool for your developer toolbox. A tricky tool maybe, but one that is extremely powerful if the situation allows for it. We took a look at the serverless nature that makes DynamoDB a compelling choice, the building blocks available to us, the rules that govern our design and the techniques that help us when designing more complex data models.

The rules that govern DynamoDB may appear limiting, but we can still do a lot within their confines. We can create non-trivial models by using techniques such as compound sort keys for filtering hierarchical data, compound partition keys for querying multiple entities and GSI’s for introducing reverse access patterns. We have only covered three so far, but these three are enough to model situations that are far more complex than the typical key-value retrieval.

Want to learn more? The DynamoDB Book by Alex DeBrie has helped us understand DynamoDB and is an excellent resource for further study. He covers the techniques outlined here and more, with examples along the way. Recommended reading for the holiday period.

Ilia Awakimjan

After completing his Master's degree, Ilia Awakimjan has been working at Profit4Cloud since 2017 as a Software Engineer specializing in AWS. Ilia holds the AWS Certified DevOps Professional, AWS Certified Security Specialty and AWS Certified Networking Specialty certifications.