Memcached: distributed cache for your application

Targets: projects with a large amount of fairly static data; projects with a large amount of data that is not critical if you lose it.

While reading today’s erlang-russia posts, I came across a comment about a custom implementation of gen_server that runs about 30% faster than the one built into Erlang. The post also mentioned the gen_server implementation in merle, an Erlang project that is a client for memcached. I read a bit about memcached and the idea got me interested.

Memcached is a server for caching key-value pairs from your application. Its main advantage is that it is a distributed cache: your application can keep data on several cache servers. In theory, this means that if you have a lot of data, you can spread it across servers. For me, the proof of concept was the list of services that use Memcached: LiveJournal, Wikipedia, Flickr, Bebo, Twitter, Typepad, Yellowbot, Youtube, Digg, WordPress.

So I started some testing. First, I downloaded the memcached sources and built them. It takes just a minute including the download, so it is very easy to install. Then I started 2 instances: one on the default port 11211 and a second on port 11212.
The Merle Erlang client showed good results for storing and fetching values: it stored 1 000 000 key-value pairs in 130 seconds and fetched 1 000 000 key-value pairs in about 110 seconds. BUT this client implementation is not distributed 🙁 . When I tried to connect to my second instance, it reported that the gen_server was already running. Well, it is version 0.3 of the implementation, so hopefully we’ll see it distributed in the near future; for the moment, the only use I can see for it is sharing data between several Erlang nodes.

Next I tried a Java client implementation. The first one I found was spymemcached. As it is implemented with ordinary Java classes, I had no doubt it would connect to several instances of memcached. You need to create several connections to do this, but that is not yet a distributed cache, because you have to work out manually which server a key’s hash maps to. The solution was http://code.google.com/p/hibernate-memcached/, a Hibernate cache implementation based on spymemcached. You can put a space-delimited list of memcached instances in host:port format into the Hibernate configuration, and you get a fully distributed resource. As spymemcached showed the same performance as the Erlang implementation, which was quite impressive, I think hibernate-memcached could be very useful in Java and Grails.
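To make the "which server holds this key" step concrete, here is a minimal sketch of how a client could map a key to one of several memcached instances by hashing it. This is only an illustration: the class name ServerSelector, the use of CRC32 and the simple modulo scheme are my assumptions, not what spymemcached or hibernate-memcached actually do internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper: picks a memcached instance for a key by hashing the key.
final class ServerSelector {
    private final List<String> servers; // entries in host:port format

    ServerSelector(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = List.copyOf(servers);
    }

    // The same key always maps to the same server while the list is unchanged.
    String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int index = (int) (crc.getValue() % servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        ServerSelector selector =
                new ServerSelector(List.of("localhost:11211", "localhost:11212"));
        for (String key : List.of("user:1", "user:2", "session:abc")) {
            System.out.println(key + " -> " + selector.serverFor(key));
        }
    }
}
```

With plain modulo hashing, adding or removing a server remaps most keys, which is acceptable for a cache of non-critical data but worth knowing before you change the server list.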
Meanwhile, there are memcached clients for PHP, Perl, Ruby, and C#. But it is not possible to read from the cache values inserted from another language. Well, theoretically it is possible: the Merle Erlang client is quite easy to change so that it can read simple strings written from Java. But “as is”, Merle and spymemcached are not compatible.
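The reason cross-language reads can work at all is that memcached itself stores values as opaque bytes plus a numeric flags field, so compatibility is a matter of two clients agreeing on an encoding. As a rough sketch, this is how a `set` request for a plain UTF-8 string is framed in the memcached text protocol. The class name TextProtocol and the convention of flags 0 for "plain string" are my assumptions; I have not checked which encoding or flags Merle and spymemcached use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames a memcached text-protocol "set" request.
final class TextProtocol {
    // Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    static byte[] setRequest(String key, String value, int flags, int exptimeSeconds) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        String header = "set " + key + " " + flags + " " + exptimeSeconds
                + " " + data.length + "\r\n";
        byte[] headerBytes = header.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headerBytes, 0, headerBytes.length);
        out.write(data, 0, data.length); // the value is stored exactly as sent
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] request = setRequest("greeting", "hello", 0, 0);
        System.out.print(new String(request, StandardCharsets.UTF_8));
    }
}
```

Any client that writes and expects raw strings like this can read the value back; a client that wraps values in its own serialization format cannot, unless the other side knows that format.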

Conclusion: memcached is an interesting technology; if your language has a proper client implementation, it will be interesting for you to try it in distributed mode.