Processing and scaling high volume social applications

This was a proposed architecture for a high-traffic, query-heavy social network application that was struggling to keep up with its load. I am reposting it here in case it is of use to anyone facing the same problems.

Event Queue: Stores all incoming events from the application clients. It is a dumb queue that operates on a first-in, first-out basis. The EQ is also the first point at which the system can be scaled: a Queue Proxy can be deployed in front of an array of EQ stores to absorb as many incoming requests as you need.
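
As a rough illustration of the Queue Proxy idea, here is a minimal in-memory sketch. The EventQueue and QueueProxy classes and the round-robin policy are assumptions made for the example; the original design does not say how the proxy picks a store, and a real deployment would sit in front of networked queue servers.

```python
from collections import deque
from itertools import cycle


class EventQueue:
    """A dumb FIFO store: it only appends and pops, nothing else."""

    def __init__(self):
        self._events = deque()

    def push(self, event):
        self._events.append(event)

    def pop_batch(self, max_events):
        # Remove and return up to max_events events, oldest first.
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch

    def __len__(self):
        return len(self._events)


class QueueProxy:
    """Spreads incoming events across EQ stores in round-robin order
    (an assumed policy, chosen only because it is the simplest)."""

    def __init__(self, stores):
        self._stores = list(stores)
        self._next_store = cycle(self._stores)

    def push(self, event):
        next(self._next_store).push(event)


stores = [EventQueue() for _ in range(3)]
proxy = QueueProxy(stores)
for i in range(7):
    proxy.push({"id": i})

sizes = [len(store) for store in stores]
```

Note that with more than one store behind the proxy, first-in, first-out ordering only holds within each store, not across the whole array.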

Dispatchers: These are persistent processes that pull data from the EQ. To prevent duplicate processing, access has to be mutex-locked at the thread level: two or more dispatcher threads or processes should never read from the same EQ at the same time.
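
A minimal sketch of the locking rule, assuming dispatcher threads inside a single Python process and a deque standing in for one EQ store. The names eq, eq_lock, pull_batch and MAX_EVENTS are made up for the example. Separate dispatcher processes would need a cross-process lock instead (a file lock or a lock held by the queue server), which is not shown here.

```python
import threading
from collections import deque

eq = deque(range(1000))        # stand-in for one EQ store
eq_lock = threading.Lock()     # one mutex per EQ
MAX_EVENTS = 50                # batch size pulled per visit to the EQ

pulled = []                    # everything the dispatchers pulled
pulled_lock = threading.Lock()


def pull_batch():
    # Only one dispatcher thread may touch the EQ at a time.
    with eq_lock:
        batch = []
        while eq and len(batch) < MAX_EVENTS:
            batch.append(eq.popleft())
    return batch


def dispatcher():
    while True:
        batch = pull_batch()
        if not batch:
            return             # queue drained; a real dispatcher would wait
        with pulled_lock:
            pulled.extend(batch)


threads = [threading.Thread(target=dispatcher) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each batch is removed from the queue while the lock is held, no event can be handed to two dispatchers.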


Dispatchers only hit the EQ, pull out a preset number of events (max_events) at a time, and use a lookup table to figure out which shard queue each event should be sent to.
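
The lookup step could look something like the sketch below. The table contents, the choice of the user as the shard key, and the fallback queue name are all assumptions for illustration; the original design does not specify what events are sharded on.

```python
from collections import defaultdict, deque

# Hypothetical lookup table: shard key (here, a user name) -> shard queue.
SHARD_LOOKUP = {"alice": "sq-1", "bob": "sq-2", "carol": "sq-1"}
DEFAULT_SHARD = "sq-default"   # assumed fallback for unknown keys

# Stand-ins for the shard queues, created on first use.
shard_queues = defaultdict(deque)


def dispatch(events):
    """Push each event to the shard queue named by the lookup table."""
    for event in events:
        shard = SHARD_LOOKUP.get(event["user"], DEFAULT_SHARD)
        shard_queues[shard].append(event)


dispatch([
    {"user": "alice", "action": "post"},
    {"user": "bob", "action": "like"},
    {"user": "carol", "action": "follow"},
    {"user": "dave", "action": "post"},
])
```

The dispatcher does no other work on the event: it reads the key, looks up the queue, and pushes.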


You can scale to insane extents here, with more dispatchers rolled in to handle extra load.

Data should not be polled here by any external process or thread; it has to be pushed to the next level.

Shard Queue: The same as the event queue, the only difference being that events are sharded here. Once again it is a dumb queue that does nothing more than hold the queued events.

Shard Application Server: This is a sharded SQLite or BDB store that keeps its data in files.

Requests passed on to the SAS generate the XML/JSON files, which are then served via a reverse caching proxy at the Shard API Proxy level. (You could RSYNC the files to the SAPI too, but that could eventually result in hitting file inode limits.)

Data is polled from the SQs by the SAS rather than being pushed to it.
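
To make the SAS flow concrete, here is a small sketch of one poll cycle: pull a batch from a shard queue, store it in a SQLite file, and regenerate a JSON file per affected user. The table schema, the file layout, the event fields, and the function names are all assumptions for the example; the deque stands in for a real shard queue.

```python
import json
import sqlite3
import tempfile
from collections import deque
from pathlib import Path


def poll_shard_queue(shard_queue, max_events):
    # The SAS pulls from the SQ; nothing is pushed to it.
    batch = []
    while shard_queue and len(batch) < max_events:
        batch.append(shard_queue.popleft())
    return batch


def apply_events(db, events, out_dir):
    """Store the events, then rewrite one JSON file per affected user."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO events (user, action) VALUES (?, ?)",
            [(e["user"], e["action"]) for e in events],
        )
    for user in {e["user"] for e in events}:
        rows = db.execute(
            "SELECT action FROM events WHERE user = ? ORDER BY id", (user,)
        ).fetchall()
        payload = {"user": user, "actions": [row[0] for row in rows]}
        (out_dir / f"{user}.json").write_text(json.dumps(payload))


# Temporary directory standing in for the shard's data directory.
data_dir = Path(tempfile.mkdtemp())
db = sqlite3.connect(str(data_dir / "shard.db"))
db.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user TEXT, action TEXT)"
)

sq = deque([
    {"user": "alice", "action": "post"},
    {"user": "bob", "action": "like"},
    {"user": "alice", "action": "like"},
])
apply_events(db, poll_shard_queue(sq, max_events=10), data_dir)
alice = json.loads((data_dir / "alice.json").read_text())
```

The generated files are what the reverse caching proxy would serve, so reads never touch the SQLite store directly.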

Shard API Proxy: Can be Memcached or XML/JSON files. If your cache runs really hot, Memcached will only slow things down.

But using RSYNC or a URL-rule-based reverse caching proxy means that the SAS machines either have to be beefier or will hit inode limits, since all the files sit in one place.