So. A few friends and I got together a while ago to build an app for the newly launched APIs for Mxit. We didn’t plan on making it big, just having some fun. So we also decided to learn new stuff across the board. I’ve worked with Heroku in the past, but never EC2. So we got a completely new stack going: EC2, Node.js and DynamoDB.
The first thing that isn’t panning out as we had hoped is DynamoDB.
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
You pay for throughput. Amazon does the rest. They replicate the data, manage it, and let it scale automatically. If you need more throughput (reads/writes per second) you simply dial it up. This in itself is quite awesome and clever.
However, we have run into some problems that are a pain to iron out. Hopefully, by writing about them, I can help someone else avoid them.
There are four ways to fetch items from DynamoDB: Get, BatchGet, Query and Scan. For Get and Query, you can only use the hash key (think of the primary key in SQL databases) and the range key. If you want to fetch items based on any other values, you have to use a Scan and add your own conditions (here you can use any attribute, not just the hash and range key). The problem is: Scan is expensive. It reads the entire table and only then filters out what is relevant, so you pay read throughput for every item in the table, not just the ones you get back.
Cool. For now, we aren’t worried that much about performance. We are just playing around after all. WRONG. You can’t sort on Scans! FFFuuu.
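To make the difference concrete, here is a toy in-memory model in Python. It is only a sketch: `ToyTable` and its methods are made-up names, not the real DynamoDB API and not our actual code. It shows that a query touches just the items under one hash key and returns them ordered by range key, while a scan reads everything and promises no order.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a hash + range key table (hypothetical, not the DynamoDB API)."""

    def __init__(self):
        # hash key -> {range key -> item}
        self._partitions = defaultdict(dict)

    def put(self, hash_key, range_key, item):
        self._partitions[hash_key][range_key] = item

    def get(self, hash_key, range_key):
        # Get: exact lookup by full key, touches a single item.
        return self._partitions.get(hash_key, {}).get(range_key)

    def query(self, hash_key, ascending=True):
        # Query: touches only the items under one hash key, ordered by range key.
        partition = self._partitions.get(hash_key, {})
        keys = sorted(partition, reverse=not ascending)
        items = [partition[key] for key in keys]
        return items, len(items)  # (results, number of items read)

    def scan(self, predicate):
        # Scan: reads every item in the table, then filters. No ordering is promised.
        items_read = 0
        matches = []
        for partition in self._partitions.values():
            for item in partition.values():
                items_read += 1
                if predicate(item):
                    matches.append(item)
        return matches, items_read


table = ToyTable()
table.put("alice", 3, {"user": "alice", "score": 3, "level": "hard"})
table.put("alice", 1, {"user": "alice", "score": 1, "level": "easy"})
table.put("bob", 2, {"user": "bob", "score": 2, "level": "hard"})

sorted_items, query_reads = table.query("alice")
hard_items, scan_reads = table.scan(lambda item: item["level"] == "hard")
```

In this toy, the query reads two items to return two, while the scan reads all three to return two. On a real table with many items that gap is what eats your provisioned reads.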
You can only fetch sorted results based on the range key you specified for the table (i.e. one column), using a Query call. Not really knowing we had to do that (our own stupidity) means not one of our tables has a range key. You can’t retroactively add one. The only way to assign a new range key is to create a new table and port over all the data. So now, if we want to fetch any type of sorted data, we have to put aside a whole afternoon to code the table migration, etc. Not ideal.
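Roughly, the migration boils down to re-keying every item. Here is a minimal in-memory sketch in Python, assuming a made-up messages table: the attribute names (`id`, `room`, `sent_at`) and both functions are hypothetical, and a real migration would read from the old table and write to the new one through the DynamoDB API instead of plain dicts.

```python
def migrate(old_items, hash_attr, range_attr):
    """Copy items into a new layout: hash key -> {range key -> item}."""
    new_table = {}
    for item in old_items:
        partition = new_table.setdefault(item[hash_attr], {})
        range_key = item[range_attr]
        if range_key in partition:
            # Hash + range must be unique together; a clash would silently overwrite an item.
            raise ValueError(f"duplicate key: {item[hash_attr]!r}, {range_key!r}")
        partition[range_key] = item
    return new_table


def query_sorted(table, hash_value, descending=False):
    """Return one hash key's items ordered by range key, like a Query would."""
    partition = table.get(hash_value, {})
    return [partition[key] for key in sorted(partition, reverse=descending)]


# Hypothetical old table: keyed by "id" only, so there is nothing to sort on.
old_items = [
    {"id": "m1", "room": "lobby", "sent_at": 1700000300, "text": "third"},
    {"id": "m2", "room": "lobby", "sent_at": 1700000100, "text": "first"},
    {"id": "m3", "room": "games", "sent_at": 1700000200, "text": "other room"},
    {"id": "m4", "room": "lobby", "sent_at": 1700000200, "text": "second"},
]

new_table = migrate(old_items, hash_attr="room", range_attr="sent_at")
lobby = query_sorted(new_table, "lobby")
```

The duplicate check matters: two items with the same hash and range key cannot coexist, so the range key has to be chosen (or combined with something unique) so that it never collides within a hash key.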
Chris Moyer has a solution to this: create a new table with a new hash and range key that references the hash keys of the parent tables. Now you can ‘sort’ on any column. The problem is that you have to create new tables and provision throughput for them too, which costs more money.
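Here is how I picture that index-table idea, as an in-memory Python sketch. This is my own reading of the pattern, not Chris Moyer’s code: `IndexedStore`, its methods, and the `game`/`score` attributes are all hypothetical. The index’s range key combines the sort value with the parent key so that two items with the same value don’t collide.

```python
class IndexedStore:
    """Parent table plus a hand-maintained index table (hypothetical sketch, plain dicts)."""

    def __init__(self):
        self.parent = {}  # parent hash key -> item
        self.index = {}   # index hash key -> {(sort value, parent key): parent key}
        self.writes = 0   # counts writes, since every table write costs throughput

    def put(self, item_id, item, group_attr, sort_attr):
        # Write 1: the item itself, keyed by its own hash key.
        self.parent[item_id] = item
        self.writes += 1
        # Write 2: an index entry pointing back at the parent item.
        # Note: if an item's sort value changes later, the old index entry
        # would also need deleting; that is omitted here.
        range_key = (item[sort_attr], item_id)
        self.index.setdefault(item[group_attr], {})[range_key] = item_id
        self.writes += 1

    def query_sorted(self, group_value):
        # Walk the index in range key order, then fetch each parent item.
        entries = self.index.get(group_value, {})
        return [self.parent[entries[key]] for key in sorted(entries)]


store = IndexedStore()
store.put("s1", {"game": "chess", "score": 30}, "game", "score")
store.put("s2", {"game": "chess", "score": 10}, "game", "score")
store.put("s3", {"game": "snake", "score": 20}, "game", "score")
store.put("s4", {"game": "chess", "score": 10}, "game", "score")

chess_scores = [item["score"] for item in store.query_sorted("chess")]
```

Four items cost eight writes here, and every sorted read is an index query plus a fetch per item from the parent table. That is the extra throughput (and money) mentioned above.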
We have to do something about it soon, because we have been hitting throttled requests anyway (probably due to all the scans).
So, we have to decide now. Are we sticking with DynamoDB, or moving to our own MongoDB cluster? I think for our needs right now, DynamoDB is causing more headaches than not. We haven’t hit mass scale yet where DynamoDB’s provisioning model will help us, and we don’t want to be restricted by how DynamoDB handles the data.
So if you want to use DynamoDB, remember: pick your range keys before you create your tables!