scalability
Current Obsessions, June ’08
It’s a new feature! I spend way too much time reading interesting papers… I should not be so selfish as to not share them.
Steve Yegge’s ramblings will enlighten you or depress you, but at least they are enjoyable. He’s there, on my list, right under Joel Spolsky (all-time Nerd Superstar).
Clustering and Cloud Computing
The Helmer Cluster, but only because I can also be a hardware nerd.
Dr Queue, not the only one, but a queue manager that proved its worth.
Terracotta because I have a big Java project that needs scaling and I believe that scaling out is a more…er…scalable approach.
Performance
I mean raw performance; as in scaling up, this time.
Tokyo Cabinet seems to be the best tree-based database available these days. Takes me back to when I was using C-ISAM from Informix. NO, not all problems need a big OORDB thrown at them. In fact, transitioning some of your data to Tokyo Cabinet seems like a good application of “write first, optimize later.”
“How do they do it?” LinkedIn are kind enough to expose their architecture.
Twitter! Sorry, no link. Oh, well, maybe one. Oh, no, wait, it’s down. No, it’s back up! Wow, the Twitter API is currently capped at 10 requests per hour. Application developers: keep your calls to no more than one every 6 minutes…except for the public timeline, unless they decide to restrict its usage as well.
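For what it’s worth, here is a minimal sketch of how a client could space its calls to stay under that cap (10 requests per hour works out to one every 360 seconds). The `Throttle` class and its parameters are my own illustration, not part of any Twitter library; the clock and sleep functions are injectable only so the logic can be exercised without actually waiting.

```python
import time


class Throttle:
    """Spaces calls so that at most `max_calls` happen per `period_s` seconds.

    Hypothetical helper for illustration; not part of any Twitter client library.
    """

    def __init__(self, max_calls=10, period_s=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        # 3600 s / 10 calls = 360 s, i.e. one call every 6 minutes.
        self.min_interval_s = period_s / max_calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._clock()
        # Schedule the earliest moment the following call may go out.
        self._next_allowed = now + self.min_interval_s
        return waited
```

Calling `throttle.wait()` right before each API request is enough: the first call goes through immediately, and every later one is delayed until six minutes have passed since the previous one.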
Wireframing, Blueprinting and A/B Testing
I need to post links here. Keep an eye on this blog entry.
Anyway, Yahoo now conveniently provides stencils for web mockups.
Erlang
It’s beautiful. I’m just getting started, though.
Git
For so-called “virtual companies” this would be the ultimate in source code management. Even if you’re not “virtual”, you may want to give it a try. I absolutely love how it allows me to try the craziest framework modifications without having to suffer any consequence.
Easy Git — “git for mere mortals”
Interact with your Subversion code base – man, I want the same thing for Perforce!
Educate yourself. Well, seems to work for me…
The “If I had more time…” section
NestedVM: whatever your source language, cross-compile to MIPS .o files, then convert to pure Java.
Really optimized image colorization.
If you enjoyed this post, make sure you subscribe to my RSS feed!
The Twitter Exponential Argument
A quick note: I was reading TechCrunch earlier today, when I realized that I could not reconcile their post “Twitter at scale: Will it work?” with my views on how to build a scalable application.
Nick Cubrilovic contends that “Every new Twitter user and every new connection results in an exponentially greater computational requirement.”
And yet, I fail to see the exponential quality of it all.
It looks like Nick is saying: “I have U users, posting P posts, read by F followers. Hence, if I were to draw this on paper, I would end up with an exponential slope.”
That’s odd because, as I understand Twitter’s architecture, we indeed have U people posting P posts – BTW, P is unknown, and as Twitter goes I’d wager that the f(P) curve would be logarithmic; but I digress. Let’s, for simplicity’s sake, consider the total number of posts and call it X.
Now, Nick would obviously be referring to an f(F) curve. If F followers have to monitor X posts, then yes, I expect the slope to be exponential.
But that’s not how it works: Twitter is a pull service; each Twitter client regularly asks the server(s): “Do you have anything for me?” The server replies: “No” or “Yes, these x posts.” “x”, not “X” because only relevant posts are retrieved.
Since “F” is bound to be much bigger than “x”, and the overhead of retrieving multiple posts is very small compared to the time elapsed between two polls, it seems to me that the formula we should use here is that of a linear approximation: server load grows in proportion to the number of clients polling.
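To make the pull argument concrete, here is a toy cost model, and nothing more than that. Everything in it is an assumption of mine: the function name, the polling interval, the cost units, and above all the idea that x (posts returned per poll) stays roughly constant as the user base grows. It says nothing about how Twitter actually measures load.

```python
def hourly_load(clients, poll_interval_s=360.0,
                per_poll_cost=1.0, per_post_cost=0.01, posts_per_poll=3):
    """Toy estimate of server work per hour under a pull model.

    All parameters are hypothetical and in arbitrary cost units:
    - clients: number of clients polling (the "F" of the argument)
    - poll_interval_s: seconds between two polls from the same client
    - per_poll_cost: fixed cost of answering "Do you have anything for me?"
    - per_post_cost: extra cost of each relevant post returned
    - posts_per_poll: the small "x", assumed constant as clients grow
    """
    polls_per_hour = clients * (3600.0 / poll_interval_s)
    cost_per_poll = per_poll_cost + per_post_cost * posts_per_poll
    return polls_per_hour * cost_per_poll


if __name__ == "__main__":
    for clients in (1_000, 2_000, 4_000):
        print(clients, hourly_load(clients))
```

Under these assumptions, doubling the number of clients doubles the load, which is the linear behaviour described above. The model breaks down if x itself grows with the user base, so take it as a sketch of the argument and not as a measurement.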
Just my 2 cents. My math is *very* rusty but it seems to me that Nick’s argument doesn’t hold water.







