Taken at Hack The North 25

  • Time sensitive data
  • Unpredictable news generation
  • Managing orders and reconciling them
  • Client latency does not matter!

Hacking the Speed of Light

About DynamoDB

DynamoDB is a key-value store.

Behind the scenes, Dynamo automatically splits your data into partitions. Each partition is stored on 3 replicas, with one elected leader.
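
As a rough mental model, here’s a toy sketch of that layout (the hash function, partition count, and AZ names are all made up for illustration; Dynamo’s real partitioning is more sophisticated):

```python
import hashlib

NUM_PARTITIONS = 4                    # made up; Dynamo sizes this automatically
AZS = ["az-1", "az-2", "az-3"]        # three availability zones in a region

def partition_of(key: str) -> int:
    """Hash the partition key to decide which partition owns the item."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Each partition keeps one replica per AZ; one of the three is the leader.
partitions = [
    {"replicas": {az: f"storage-node-{p}-{az}" for az in AZS}, "leader": "az-1"}
    for p in range(NUM_PARTITIONS)
]

p = partition_of("user#123")
print(p, partitions[p]["replicas"][partitions[p]["leader"]])
```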

We define “latency” as the time from when you send a request to when you receive the result.

AWS splits the world into geographic regions. In Canada there are two: one based in Montreal, another in Calgary. Within each region, there are 3 availability zones. This is important, as Amazon guarantees no two availability zones will fail at the same time.

When you ask for something, it goes to a Request Router. Requests are sent to a random request router for load balancing.

The Request Router has the responsibility to:

  • authenticate the request
  • route it to the right storage node. Recall there are 3 storage nodes per partition, one per availability zone, so a single power outage cannot take out multiple storage nodes.

If it’s a write, the request must be routed to the leader.
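
A sketch of that routing decision (node and AZ names are hypothetical; the real router is far more involved):

```python
import random

# One storage node per AZ for a given partition; assume az-1 holds the leader.
REPLICAS = {"az-1": "node-a", "az-2": "node-b", "az-3": "node-c"}
LEADER_AZ = "az-1"

def route(request_type: str) -> str:
    """Writes (and strongly consistent reads) must hit the leader;
    eventually consistent reads can go to any of the three replicas."""
    if request_type in ("write", "strong_read"):
        return REPLICAS[LEADER_AZ]
    return random.choice(list(REPLICAS.values()))

print(route("write"))          # always node-a
print(route("eventual_read"))  # any of the three
```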

GetItem - Strongly Consistent

  • Guarantees the most up-to-date data; this is routed to the leader.

GetItem - Eventually Consistent

  • Sent to any of the three replicas, leader or follower.
  • This option has a higher rate limit for customers.
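
With boto3, the choice is just the ConsistentRead flag on GetItem (the table and key names here are hypothetical):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Orders")  # hypothetical table

# Strongly consistent read: routed to the partition leader.
fresh = table.get_item(Key={"order_id": "123"}, ConsistentRead=True)

# Eventually consistent read (the default): any replica may answer,
# and it consumes half the read capacity of a strong read.
maybe_stale = table.get_item(Key={"order_id": "123"}, ConsistentRead=False)
```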

Speed of Light

Because of the speed of light, a hop between availability zones, which are about 100km apart (far enough that nothing bad can happen to multiple at once), takes about 1 ms. And even after that, the request likely has to cross to yet another availability zone.
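
A back-of-the-envelope check, assuming light travels through fiber at roughly 2/3 of c (about 200,000 km/s):

```python
SPEED_IN_FIBER_KM_PER_S = 200_000   # ~2/3 the speed of light in vacuum
DISTANCE_KM = 100                   # approximate inter-AZ distance

one_way_us = DISTANCE_KM / SPEED_IN_FIBER_KM_PER_S * 1e6
print(one_way_us)   # ~500 us one way, so ~1000 us for a round trip
```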

  1. Takes 1000us to get from one availability zone (likely the customer’s EC2 instance) to the request router
  2. Takes probably 200us to authenticate and route the request
  3. Then another 1000us to get to the right storage node in another availability zone
  4. Then 200us to serve that request

You see, of the 2.4 ms total, 2 ms is actually just the speed of light.

Ideally, the client (EC2), the request router, and the storage node would all be in the same availability zone. Then:

  1. 300us to send request
  2. 200us authenticate and route request
  3. 300us to send request to storage node
  4. 200us to serve
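
Adding up the two breakdowns (numbers taken straight from the notes above):

```python
cross_az = [1000, 200, 1000, 200]  # us: hop to router, auth+route, hop to node, serve
same_az  = [300, 200, 300, 200]    # us: the same steps, everything in one AZ

print(sum(cross_az))  # 2400 us, of which 2000 us is inter-AZ wire time
print(sum(same_az))   # 1000 us: better than a 2x improvement
```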

Split Horizon DNS can be used!

  • Every time you do a dig (DNS lookup), the answer is based on where you are. Whenever you look up the IP of some storage node, it’ll try to return the one in the same AZ (availability zone). This is good!
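
Conceptually, split-horizon DNS is just a resolver whose answer depends on who is asking (the hostname, AZ names, and IPs below are made up):

```python
# One storage-node address per AZ for each internal hostname.
RECORDS = {
    "storage.internal": {
        "az-1": "10.0.1.5",
        "az-2": "10.0.2.5",
        "az-3": "10.0.3.5",
    }
}

def resolve(hostname: str, client_az: str) -> str:
    """Answer with the address in the caller's own AZ when one exists."""
    records = RECORDS[hostname]
    return records.get(client_az, records["az-1"])

print(resolve("storage.internal", "az-2"))  # -> 10.0.2.5, no cross-AZ hop
```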

But the problem… is failure.

Why do we use 3 nodes? And why can only 1 fail?

Because if two fail, we cannot establish quorum, and thus no leader can be promoted. The last alive node doesn’t know whether the other two nodes are actually dead or whether its own network is the one that’s broken, so we wait.
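
The majority rule in one line (a sketch of the general idea, not DynamoDB’s actual election code):

```python
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1          # majority: 2 of 3

def can_elect_leader(alive: int) -> bool:
    """A new leader can only be promoted if a majority of replicas agrees."""
    return alive >= QUORUM

print(can_elect_leader(2))  # True: one failure is survivable
print(can_elect_leader(1))  # False: the lone survivor can't tell whether
                            # the others died or it's the one partitioned off
```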

Failure Handling

Let’s investigate a failing request router in the case where we randomly distribute.

  • 1/3 A
  • 1/3 B
  • 1/3 C

What if B fails? Then B’s 1/3 of the requests is redistributed evenly, 1/6 to A and 1/6 to C, so we get

  • 1/2 A
  • Fail
  • 1/2 C

In this scenario, the maximum load on any surviving server, even with a failure, is 1/2 of the total, so every server needs 50% headroom over its normal 1/3 share.
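
A quick check of the redistribution (exact fractions, so no floating-point noise):

```python
from fractions import Fraction

def loads_after_failure(loads, failed):
    """Spread the failed router's share evenly across the survivors."""
    extra = loads[failed] / (len(loads) - 1)
    return [0 if i == failed else share + extra
            for i, share in enumerate(loads)]

uniform = [Fraction(1, 3)] * 3
print(loads_after_failure(uniform, failed=1))  # [1/2, 0, 1/2]
```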

Ok, now we use split-horizon DNS, and every client prefers the request router in its own AZ. There might be some imbalance, like:

  • 1/4 A
  • 1/2 B
  • 1/4 C

Now consider A failing. Its 1/4 of the traffic splits evenly, 1/8 to B and 1/8 to C:

  • Fail (A)
  • 5/8 B
  • 3/8 C

All of a sudden, the hottest surviving server must handle 5/8 of the total load, which demands much more headroom.
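
The same check for the skewed case:

```python
from fractions import Fraction

skewed = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # A, B, C

# A fails; its 1/4 splits evenly, 1/8 each to the survivors.
b = skewed[1] + skewed[0] / 2
c = skewed[2] + skewed[0] / 2
print(b, c)   # 5/8 3/8: worse than the 1/2 worst case under random routing
```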

Side note: because Amazon is very big (Law of Large Numbers), and because there are customers who are not in any AZ and come in from the internet (which is about equally distant from all AZs, so their traffic can be used to rebalance), this isn’t as big a problem as it seems.

Notes to complete

  • then we basically load balance, 1/4 to one, then move 1/12 to others, or something like that
  • this drops the first 1000us hop to about 300us
  • the next idea is moving leaders and data so that the leader is likely in the same AZ as the request router

End

In a response about the CAP theorem, he mentioned this blog post from another AWS engineer: Let’s Consign CAP to the Cabinet of Curiosities - Marc’s Blog