How NOT to scale: Tweekly.fm in 2009.

Back in January of 2009, it was the university holidays and I had just finished my first year of university (studying BComm Computer Science). Being kinda bored and wanting to make something, I decided to whip up a mashup called Tweekly.fm (then childishly called T.W.A.T., for Top Weekly Artist Tweet).
It was simple. I wanted to learn how to use APIs and further my programming skills. It takes your most-listened-to artists for the week (from Last.fm) and posts them to Twitter every Sunday. This was an early version (circa March 2009).

Before I explain the rollercoaster ride, I want to explain my skills at that point:
- Java (in high school)
- C (first-year computer science)
- Game Maker
- Basic PHP/MySQL/HTML/CSS (self-taught in high school)
I had no sysadmin experience (no idea how to work with Linode or AWS whatsoever) and didn’t know about MVC frameworks. I just sort of winged it.
The nature of Tweekly.fm meant it could only grow virally. Each week, users sent out tweets and more users signed up. The week after that, even more people signed up, and so forth.
Now, for most of 2009, everything ran smoothly. Tweekly.fm was hosted on my shared hosting account and stayed within the provider’s limits. It wasn’t hitting the bandwidth cap, for example.
The biggest problem, however (given my lack of knowledge), was the weekly run. At 00:00 GMT+2 on Sunday morning, a cron job called a PHP script that pulled all the users from the DB and, for each user, queried their Last.fm profile and sent a tweet. A bottleneck quickly formed: the API calls were slowing the script down… a lot. I didn’t really care. I thought this was just how these things worked. 1 hour… 2 hours… 3 hours… and so it grew.
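To make the bottleneck concrete: when a job makes one slow API call per user, one after another, its runtime is roughly the number of users times the per-call latency. That means the runtime grows every week along with the user base. The sketch below is in Python rather than the original PHP. `fetch_top_artists` is a hypothetical stand-in that just sleeps to mimic network latency and is not the real Last.fm API. The sketch contrasts a sequential loop with a small thread pool, which is one common way to overlap the waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor

FAKE_LATENCY_S = 0.05  # assumed per-call network delay, for illustration only


def fetch_top_artists(username: str) -> list[str]:
    """Hypothetical stand-in for a Last.fm request: sleeps, then returns fake data."""
    time.sleep(FAKE_LATENCY_S)
    return [f"{username}-artist-{i}" for i in range(1, 4)]


def compose_tweet(username: str) -> str:
    artists = fetch_top_artists(username)
    return f"My top artists this week: {', '.join(artists)}"


def run_sequential(users: list[str]) -> list[str]:
    # Each call waits for the previous one, so total time ~ len(users) * FAKE_LATENCY_S.
    return [compose_tweet(u) for u in users]


def run_concurrent(users: list[str], workers: int = 8) -> list[str]:
    # Up to `workers` requests wait on the network at once; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compose_tweet, users))


if __name__ == "__main__":
    users = [f"user{i}" for i in range(40)]
    for run in (run_sequential, run_concurrent):
        start = time.perf_counter()
        run(users)
        print(f"{run.__name__}: {time.perf_counter() - start:.2f}s")
```

With 40 users, the sequential run should take a little over 2 s (40 × 0.05 s). The pooled run should take a fraction of that, roughly 5 batches of 0.05 s. Real APIs usually enforce rate limits, so the worker count is kept small rather than starting one thread per user.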
Then I had an idea. The tweets were going out at fairly random times anyway, so I would move the cron job earlier and have 00:00 fall in the middle of the four-hour run. Cool.
Then the madness started.
My shared hosting provider sent me an e-mail:
Your website has normally used to many resources after hours and we’ve let it slip until now but now during peak times you’ve used more resources than allowed and that’s why we were forced to stop the script from running.
Basically it seems like your opening lots of connections with your script instead of just running one script doing everything and that’s what is causing the extra load on the server.

Being in the middle of a uni term, I didn’t have time to sort it out properly (and my provider kept mailing me). Not knowing what to do, I started searching for better servers, with no idea what I was looking for. I had no money, either, so I started a donation drive among the users to fund it. I raised enough (thanks! you know who you are) to get a decent server going for 2-3 months (I didn’t think past that).
I started with Media Temple (I think it was the Grid Service option). It took me two days to transfer everything and get it up and running. At that stage, I didn’t have my own DNS nameserver, so every move meant pointing the domain at the new host’s nameservers. The change took about two days to propagate, so I had to “run” two websites simultaneously: the old one just said we were moving, and the new one actually worked. Also, moving servers meant dumping the DB into a 4 MB SQL script (through phpMyAdmin) and uploading all the files over FTP. I knew there had to be an easier way. I just didn’t know what it was.
It was up on Media Temple, and I was ready to see what was going to happen. Giving up my Saturday night, I sat by my PC…
…and then the server kept crashing.
It was running out of virtual memory extremely quickly, which shut the server down. The server then restarted by itself, but only a few tweets had been sent by then. So I searched Twitter, checked who the last user to get a tweet was, and found his position in the DB. Then I ran the script again, moving the MySQL result set’s row pointer to that position (with mysql_data_seek)… and then it crashed again. I kept at it for an hour, fell asleep, and finished the rest the next day. I couldn’t do it like this again. I had to find another way… within a week, while doing uni work. So I moved back to my shared hosting, hoping they just wouldn’t notice.
That week our family was off to visit my sister in the Karoo, so moving back to the shared hosting was my only option.
So there I was in the middle of the Karoo, having a break around the braai, drinking some brandy and enjoying my time with the family, when I realised I had forgotten to take the “mysql_data_seek” line out of the code…
Now, the Karoo is big, and in some places you don’t have phone reception. It also gets very cold at night.

So I called my friend. I had 30 minutes left before the script would start. In the nick of time, I talked him through FTPing in and deleting the line. SUCCESS.
It was short-lived, though. When I got back, I had to move servers again. This time I did more research, of course, and opted for Rackspace’s cloud service. It scaled automatically, so by definition the memory wouldn’t run out, right? Cool. Let’s go.
Come Saturday night, I sat staring at the screen again… Everything was going smoothly… until, 15 minutes in, the script just stopped. FFFUUUU. There was a hard limit on how long scripts could run on Rackspace. I gave up. I had no idea how to handle this. My skills weren’t up to scratch and it was 1 a.m. I felt hopeless, shed a tear, and went to sleep.
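A hard cap on script runtime doesn’t have to kill a long job. The work can be split across several shorter runs. Each run stops on its own before the limit and reports where the next run should resume. Below is a minimal sketch. It assumes the work is an ordered list and that the caller saves the returned index between runs, for example as a checkpoint in the database. The 15-minute figure is where the script was cut off in the story. The one-minute safety margin and the `handle` callback are assumptions for illustration.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

HARD_LIMIT_S = 15 * 60  # roughly where the script was cut off in the story
SAFETY_MARGIN_S = 60    # assumed headroom so the last item still finishes in time


def process_with_budget(
    items: Sequence[T],
    start: int,
    handle: Callable[[T], None],
    budget_s: float = HARD_LIMIT_S - SAFETY_MARGIN_S,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Handle items[start:] until the time budget is spent; return the index to resume from."""
    deadline = clock() + budget_s
    i = start
    # Only start a new item while budget remains; an item is never cut off halfway.
    while i < len(items) and clock() < deadline:
        handle(items[i])
        i += 1
    return i
```

A scheduler (cron, say) would call this repeatedly with the returned index until it equals `len(items)`. The budget is checked before each item, so the margin has to cover the slowest single item.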
…
The next week, I stopped the service while I figured out what to do. I tried selling Tweekly.fm to Last.fm, even talking to CBS over the phone, but had no luck. So I put it back on my shared hosting and made it manual. It looked like the final nail in the coffin of a fun experiment… until, a few weeks later, Scott (@dordotky) started a similar service. I mailed him, we worked out some terms, and we have been working on it together ever since. He has been a tremendous help in scaling the service. We don’t have much time to work on it (we both do other things full-time), but at least it is still going and running smoothly (not crashing servers).
If you want to scale today, don’t do what I did. Ask people for help (to those who did, thanks!).
What did I learn from the whole experience?
1) Don’t use shared hosting for anything more than a simple static site. This advice might seem old now, but hey, I didn’t know any better. Nowadays there are many more cloud services available that are easier, better, and cheaper.
2) If something seems overly complicated, it probably is. Even though I was working to a weekly schedule, it would’ve been better to skip a week’s tweets and take the time to learn about and research better options! I should’ve set up my own DNS server (managed or otherwise), and I should’ve learned a DVCS so that I didn’t have to upload files over slow FTP, etc.
3) Most of all, however: if you have no idea what you are doing, just wing it anyway.