The Pros and Cons of DynamoDB

I've been using DynamoDB on various projects for a little over 2 years (as of June 2019). It's a fantastic system and one that should be considered for a lot of use cases.

However, that's not to say it's perfect. I'll attempt to cover some of the pros and cons of DynamoDB and reach some conclusions about its stronger use-cases and when you shouldn't use it. If you disagree, please feel free to @me on Twitter using the link at the bottom.

Pro: It's simple to set up

It's stupid simple to set up.

  • CDK:
new Table(this, 'docs-catalog', {
  tableName: 'docs-catalog',
  partitionKey: { name: 'docId', type: AttributeType.STRING },
});
  • SAM:
Type: AWS::Serverless::SimpleTable
Properties:
  TableName: docs-catalog
  PrimaryKey:
    Name: docId
    Type: String
  • CloudFormation:
Type: AWS::DynamoDB::Table
Properties:
  TableName: docs-catalog
  AttributeDefinitions:
    - AttributeName: docId
      AttributeType: S
  KeySchema:
    - AttributeName: docId
      KeyType: HASH
  BillingMode: PAY_PER_REQUEST

Con: Weak querying model

Global and Local Secondary Indexes can only get you so far. If you know your access patterns, this is manageable. But if you don't, it's difficult, nigh impossible, to build an ad-hoc querying system like you can with a traditional RDBMS or a fuller-featured NoSQL system like MongoDB.
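
To make the constraint concrete, here's a minimal sketch using the AWS SDK for JavaScript's DocumentClient. The byAuthor index and authorId attribute are hypothetical (they aren't part of the table defined above); the point is that every efficient read has to name a key, and there's no equivalent of an arbitrary WHERE clause.

import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();

// Efficient reads must target a partition key on the table or an index
// (here a hypothetical 'byAuthor' GSI); anything else falls back to a Scan.
async function listDocsByAuthor(authorId: string) {
  const result = await docClient.query({
    TableName: 'docs-catalog',
    IndexName: 'byAuthor',
    KeyConditionExpression: 'authorId = :a',
    ExpressionAttributeValues: { ':a': authorId },
  }).promise();
  return result.Items;
}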

Pro: Non-hourly billing model

Many of the systems I've built in the past were internal, enterprise BPM-style applications with very low access requirements. They weren't serving millions of users; they were serving dozens. In these environments, a pricing model like DynamoDB's can drastically reduce hosting costs.

Con: Harder to predict costs

However, since you're paying for usage, and it's often hard to predict when that usage might spike, it's not unheard of to get caught with unexpected costs. AWS offers a lot under its free tier, but minor mistakes can blow right through it.

Earlier this month I was testing a process that used SQS messaging. A bug in my Lambda handler went unnoticed, causing messages to be re-queued repeatedly and blowing right through the first one million free messages.

Similar situations aren't uncommon during development, and with DynamoDB they can result in an unexpected bill, whereas with an RDS or Aurora instance you'd see the same cost each month regardless. To some, being consistent is better than being cheap.

Pro: Streams

DynamoDB supports streams, allowing other systems to react to data changes. Perhaps you want to render a materialized view, or update an aggregate.

This type of functionality is often handled with database triggers or in your application layer, and both pose significant challenges. Having DynamoDB Streams built into the system is very useful.
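
As a rough sketch of what that looks like, here's a Lambda handler subscribed to a table's stream. The handler shape is the standard DynamoDBStreamEvent; what you do with the change (materialized view, aggregate, whatever) is up to you, so it's just a placeholder here.

import { DynamoDBStreamEvent } from 'aws-lambda';
import { DynamoDB } from 'aws-sdk';

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    if (record.eventName === 'INSERT' || record.eventName === 'MODIFY') {
      // Stream records carry the item in DynamoDB's attribute-value format.
      const doc = DynamoDB.Converter.unmarshall(record.dynamodb!.NewImage as DynamoDB.AttributeMap);
      // React to the change here, e.g. refresh a materialized view or update an aggregate.
      console.log('document changed', doc);
    }
  }
};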

Con: Lack of server-side updates

Sometimes you have to make changes against a bulk set of records. Perhaps there was a change to your data model, like going from a .fullname field to .first_name and .last_name. To do this, you must update each record individually.

This may not seem too bad at first, but after a large enough set of records it becomes an untenable solution. Ultimately, you'll probably find it better to do data migrations on-demand in your API, as records are being read out of the system.
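
Here's a sketch of that lazy, read-time migration, using the fullname split described above. The getDoc accessor is hypothetical; the idea is that each item gets migrated the first time your API touches it.

import { DynamoDB } from 'aws-sdk';

const docClient = new DynamoDB.DocumentClient();

async function getDoc(docId: string) {
  const { Item } = await docClient.get({ TableName: 'docs-catalog', Key: { docId } }).promise();
  if (!Item) return undefined;

  // Old-shape record: split fullname into the new fields and write it back,
  // so the item is migrated the first time it's read.
  if (Item.fullname && !Item.first_name) {
    const [first_name, ...rest] = (Item.fullname as string).split(' ');
    Item.first_name = first_name;
    Item.last_name = rest.join(' ');
    delete Item.fullname;
    await docClient.put({ TableName: 'docs-catalog', Item }).promise();
  }
  return Item;
}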

Pro: Time-to-Live

There are a lot of situations where being able to give a record a Time-to-Live (TTL) can be very handy.

This is a built-in feature of the system and can replace a lot of use-cases that would normally require something like Redis.
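
A minimal sketch, assuming a hypothetical sessions table and expiresAt attribute: you tell DynamoDB which attribute holds the expiry timestamp, then write an epoch-seconds value into it on each item.

// Infrastructure (CDK): declare which attribute holds the expiry timestamp.
new Table(this, 'sessions', {
  partitionKey: { name: 'sessionId', type: AttributeType.STRING },
  timeToLiveAttribute: 'expiresAt',
});

// Runtime: items with an epoch-seconds expiry are deleted by DynamoDB
// some time after that moment passes.
async function createSession(docClient: DynamoDB.DocumentClient, sessionId: string) {
  const expiresAt = Math.floor(Date.now() / 1000) + 60 * 60; // one hour from now
  await docClient.put({ TableName: 'sessions', Item: { sessionId, expiresAt } }).promise();
}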

Con: Provisioned throughput and batch jobs don't work well together

Until recently, you had to tell DynamoDB up front what your read/write throughput would be, and that was how your table was sized and priced. However, if you run any recurring batch processes that do a large number of reads and writes in a short amount of time, your normal throughput levels will likely lead to write errors or throttled reads, neither of which is good.

You could work around this by scaling up your throughput before any known batch processes occur, and then scaling back down once done to reduce costs, but that assumes you'll always know when your batch processes will start.
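
If you go that route, the scaling step itself is just an UpdateTable call; the read/write numbers are placeholders, and keep in mind DynamoDB limits how often you can scale a table down in a given day.

import { DynamoDB } from 'aws-sdk';

const dynamo = new DynamoDB();

// Bump capacity before a known batch job, then call again with the normal
// values once it finishes. (Scale-downs are rate-limited per table per day.)
async function setThroughput(tableName: string, read: number, write: number) {
  await dynamo.updateTable({
    TableName: tableName,
    ProvisionedThroughput: { ReadCapacityUnits: read, WriteCapacityUnits: write },
  }).promise();
}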

DynamoDB's on-demand pricing definitely helps resolve this, if you're OK with that pricing model.

Conclusions

DynamoDB is a very simple, but powerful, database system. There aren't a lot of bells and whistles, but there don't need to be.

DynamoDB works well in a wide variety of scenarios, but you need to be aware of the downsides. Primarily, it doesn't allow for traditional SQL queries and access patterns, which means that for things like free-text search or ad-hoc queries you'll likely need to export data to a different system. You also need to model your data differently than you're used to with an RDBMS. One of the most common problems I run across is systems that try to fit a third normal form schema into DynamoDB, and it fails horribly every time. If you're coming from another NoSQL system, like Mongo, the modeling changes will feel more familiar.
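
As a hypothetical illustration of that modeling shift: instead of normalizing an order and its line items into separate tables and joining them, you store them as related items under one partition key and read them back with a single query.

// Hypothetical single-table layout: related items share a partition key.
const orderItems = [
  { pk: 'ORDER#1001', sk: 'METADATA', customer: 'user-123', placedAt: '2019-06-01' },
  { pk: 'ORDER#1001', sk: 'LINE#1', product: 'widget', qty: 2 },
  { pk: 'ORDER#1001', sk: 'LINE#2', product: 'gadget', qty: 1 },
];
// One Query on pk = 'ORDER#1001' returns the order and all of its lines together,
// where a third normal form design would need a join across tables.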

Finally, I recommend watching this video from re:Invent; it will really open your eyes to how DynamoDB is best leveraged.