Where the alter-ego of codelust plays


Benchmarking CouchDB against MemcacheDB

This is by no means a scientific evaluation, nor should it be seen as the final word on what other tweaks could be made to extract maximum performance out of either MemcacheDB or CouchDB. The components are mostly running with out-of-the-box settings, and my aim was to get as close as possible to the needs of the production setup I deal with. You may have a different way of setting the infrastructure up.

Moreover, it is really not a brilliant idea to compare CouchDB with MemcacheDB. Both represent and do things differently and are, honestly, meant to be used for different things. I already have a setup in place which offloads a lot of front-end data crunching to Memcached, and it is the lack of persistence in Memcached that led me to look at MemcacheDB and CouchDB.

Again, it is not a fair comparison because CouchDB can, in one request, get all the data needed for a particular record, while MemcacheDB would require multiple requests to cobble together a similar data set. Thus, CouchDB can afford to be slower and a bit more expensive per request. Then there is also the additional penalty of having to write the logic that assembles the data for MemcacheDB, which won't show up in a test like this. But the test does throw up some very interesting results.

Proto.in 2008

Since the whales have taken over Twitter, back to blogging now.

Such industry events are massive echo chambers and fanboy factories with little actual knowledge being shared.

Term that comes to mind: oversimplification.

More evidence of shifting consumption patterns

Hitwise posts their list of hottest bands on social networks and blogs and the social network list makes for interesting reading. Other than three artists in the list of twenty, I have no clue about who or what the others are.

Robin makes the point clear with this:

As you can see, with the exception of Coldplay - who shot to the top of the list thanks to their new album - most of top searches sending traffic to social networking sites are up and coming new bands / artists.

I will readily accept that a single statistic cannot support an entire theory, but I do believe that we will increasingly see trends like this in the times to come.

Processing and scaling high volume social applications

This was a proposed architecture for a high-traffic, query-heavy social network application that was struggling to keep up with the load. Reposting it here so that it could be of use to anyone who is facing the same problems.

Event Queue: Stores all the incoming events from the application clients. It is a dumb queue that gets to operate on a first in, first out basis. The EQ also represents the first line of scaling that is possible in the system. A Queue Proxy can be deployed in front of an array of EQ stores to scale incoming requests to any level you would want to scale it to.
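As a rough illustration of the Event Queue and the Queue Proxy in front of it, here is a minimal in-memory sketch. The class names `EventQueue` and `QueueProxy` are made up for this example, and the round-robin distribution is an assumption; the original design does not say how the proxy picks a store.

```python
from collections import deque
from itertools import cycle


class EventQueue:
    """A dumb FIFO store: events leave in the order they arrived."""

    def __init__(self):
        self._events = deque()

    def push(self, event):
        self._events.append(event)

    def pop(self):
        # Return the oldest event, or None when the queue is empty.
        return self._events.popleft() if self._events else None

    def __len__(self):
        return len(self._events)


class QueueProxy:
    """Hypothetical proxy that spreads incoming events over several EQ stores."""

    def __init__(self, stores):
        self._stores = list(stores)
        # Round-robin is an assumption made for this sketch.
        self._next_store = cycle(self._stores)

    def push(self, event):
        next(self._next_store).push(event)


stores = [EventQueue() for _ in range(3)]
proxy = QueueProxy(stores)
for i in range(6):
    proxy.push({"id": i})
```

Adding capacity for incoming events then amounts to handing the proxy a longer list of stores.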

Dispatchers: These are persistent processes which pull data from the EQ. To prevent duplication, they will have to be mutex locked at the thread level. No two dispatcher threads or processes should access the same EQ at the same time.
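The locking rule can be sketched with plain Python threads. This is only an illustration, assuming one lock per EQ and an in-memory `deque` standing in for the real queue; the names `eq`, `eq_lock` and `seen` are invented for the example.

```python
import threading
from collections import deque

eq = deque(range(1000))      # stand-in for one event queue
eq_lock = threading.Lock()   # one lock per EQ (an assumed design)
seen = []                    # every event pulled, across all dispatchers


def dispatcher():
    while True:
        with eq_lock:        # only one dispatcher touches the EQ at a time
            if not eq:
                return
            event = eq.popleft()
            seen.append(event)


threads = [threading.Thread(target=dispatcher) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the pull happens inside the lock, each event is handed to exactly one dispatcher and none is processed twice.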


Dispatchers will only hit the EQ, pull out a preset number of events (max_events) at a time, and use a lookup table to figure out which shard queue a particular event should be sent to.


You can scale here to insane extents, with more dispatchers rolled in to handle extra load.

Data should not be polled here by any external process or thread; it has to be pushed to the next level.
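The dispatcher behaviour described above (pull up to max_events, look up the shard, push the event on) could look roughly like this. The lookup table contents, the choice of the user as the sharding key, and the function name `dispatch_once` are all assumptions made for the sketch.

```python
from collections import deque

MAX_EVENTS = 3  # batch size per pull; the value is illustrative

# Hypothetical lookup table; sharding by user is an assumption.
SHARD_LOOKUP = {"alice": "shard_a", "bob": "shard_b", "carol": "shard_a"}

event_queue = deque(
    {"user": user, "action": action}
    for user, action in [
        ("alice", "post"),
        ("bob", "like"),
        ("carol", "post"),
        ("bob", "post"),
    ]
)
shard_queues = {"shard_a": deque(), "shard_b": deque()}


def dispatch_once(eq, shards, lookup, max_events=MAX_EVENTS):
    """Pull at most max_events from the EQ and push each to its shard queue."""
    moved = 0
    while eq and moved < max_events:
        event = eq.popleft()
        # The dispatcher pushes; shard queues never poll the EQ themselves.
        shards[lookup[event["user"]]].append(event)
        moved += 1
    return moved


moved = dispatch_once(event_queue, shard_queues, SHARD_LOOKUP)
```

One call moves a single batch; a persistent dispatcher would simply call it in a loop.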

Shard Queue: Same as the event queue, only difference being that events are sharded here. It is a dumb queue once again, which does nothing more than maintain the queues.

Shard Application Server: This is a SQLite or BDB store, which keeps its sharded data in files.
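For the SQLite variant, a file-per-shard store might be sketched as follows. The table schema, the file naming and the helper names `open_shard` and `store_event` are assumptions for illustration only; a temporary directory stands in for the shard server's data directory.

```python
import os
import sqlite3
import tempfile

data_dir = tempfile.mkdtemp()  # stand-in for the shard server's data directory


def open_shard(shard_name):
    """Open (or create) the SQLite file that backs one shard."""
    conn = sqlite3.connect(os.path.join(data_dir, f"{shard_name}.db"))
    # The schema is hypothetical; the original post does not specify one.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(id INTEGER PRIMARY KEY, user TEXT, action TEXT)"
    )
    return conn


def store_event(conn, event):
    """Persist one event pulled off the shard queue."""
    conn.execute(
        "INSERT INTO events (user, action) VALUES (?, ?)",
        (event["user"], event["action"]),
    )
    conn.commit()


shard_a = open_shard("shard_a")
store_event(shard_a, {"user": "alice", "action": "post"})
rows = shard_a.execute("SELECT user, action FROM events").fetchall()
```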

Leadership quality checklist

A couple of points sent to a friend who asked for a checklist for ascertaining leadership qualities:

Ability to learn and adapt on the run: Leadership is 99% learning - people, tools, processes, skills and market realities. Once you know you are going to do something, what stands between you and the actual doing is how fast you can learn the skills it takes to hold it all together. If you do something right, improve it. If you do something wrong, catch it as early as possible and fix it. Rinse, repeat. Apologize without fear or prejudice and thank every contributor personally.

Ability to take decisions: It is easy to decide between a good choice and a bad choice. What you really need as a leader is the ability to decide between two really bad choices with your back against the wall.

Ability to see the larger picture: If you can perceive why the sum of benefits from 12 sub-optimal choices can often be more than the benefit of a single highly optimized choice, and understand why the right choice differs from picture to picture, you are already halfway there.

Ability to inspire people and share your learning: Teamwork is best explained in cycling. The sum of the parts is most often better than a single outstanding part. In fact, you can't win the Tour de France without a good team backing you. And every good team is an experience of endless give-and-take. But every team also has a leader who inspires the team and leads them to do things which they probably would not be able to do if they were on their own.