Java Memcached Performance & Scalability

Mon Jul 31 17:15:47 UTC 2006

Sorry, just caught this.

On 26 Jul 2006, at 13:36, Prateek Mathur wrote:

> Folks
>  Some questions on performance and scalability:
>  1) How do you compare the Java version of Memcached with the C  
> versions in handling load and performance?

I don't.  There's almost no way I'd ever be willing to wrap the C-based
library in a Java API and do that much JNI boundary crossing across
hundreds of threads in our applications.  You can do so; forgive me if
I think you're completely insane for it.  :)

It's fast enough that I don't have to worry about how fast it is.
The cost is minuscule to nonexistent compared to the overhead we pay
for serializing our object hierarchies into the server.
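For a sense of where that serialization overhead comes from: before an object reaches the server, a client has to flatten the whole object graph into bytes, and turn those bytes back into objects on every read. The sketch below assumes a client that uses standard Java serialization (`ObjectOutputStream`). Whether a particular client does this or uses its own encoding is an assumption. `Order`, `LineItem`, and `SerializationCost` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes standing in for an application's object hierarchy.
class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final List<LineItem> items = new ArrayList<>();

    Order(String id) {
        this.id = id;
    }
}

class LineItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sku;
    final int quantity;

    LineItem(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Hypothetical helper: the encode/decode work done around each cache call,
// assuming the client relies on standard Java serialization.
final class SerializationCost {
    private SerializationCost() {}

    // Flattens the whole object graph into the bytes that would be sent to the server.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a fresh copy of the object graph from bytes read back from the server.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rough wall-clock cost of n encode/decode round trips in nanoseconds (no JIT warm-up).
    static long roundTripNanos(Serializable value, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            fromBytes(toBytes(value));
        }
        return System.nanoTime() - start;
    }
}
```

This work scales with the size and depth of the object graph, not with the cache. That fits the point above: the client's own network calls tend to disappear next to it.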

> Are there any figures from real production environment?

If I were worried about comparing it to something else, I'd bother
benchmarking it.  There's nothing worth comparing it to, so I don't.
On our system, loading objects from the database is orders of
magnitude more expensive than loading them from the memcached backend.

>  2) How do you compare the Java version of Memcached with other
> Java caches like EhCache, JBossCache, etc.?  Any figures?

They don't compare, IMO.  Personally, I wouldn't use the long-term
caching in most of those.  I don't like their implementations, and,
more importantly, the whole point of a cache like this is to make sure
that nothing going wrong inside the Java VM can damage the
availability and uptime of the caches themselves.

What we do here is a two-level read-through/write-through cache, with
an EhCache providing "short-term memory" caching and memcached
providing "long-term" caching.  Reads and updates first check the
short-term cache (bounded by entry count and/or TTL), then fall
through to the TTL'd memcached backend.  Updates clear the key in both
layers, so the next access falls through both caches and results in a
fresh read from the database.
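Here is a minimal sketch of that read path and the invalidate-on-update rule, under several assumptions so it stays self-contained. `RemoteCache` is a hypothetical interface standing in for a memcached client. Real clients offer comparable get/set/delete operations, but their exact names and signatures differ. The short-term layer is a size-bounded `LinkedHashMap` rather than EhCache, and the database is just a `Function<String, V>` loader. `TwoLevelCache` and `InMemoryRemoteCache` are illustrative names, not classes from any library.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a memcached client (assumed API, not a real library's).
interface RemoteCache {
    Object get(String key);
    void set(String key, Object value, int ttlSeconds);
    void delete(String key);
}

// In-memory fake used in place of a memcached server; it ignores the TTL.
class InMemoryRemoteCache implements RemoteCache {
    private final Map<String, Object> data = new HashMap<>();
    public synchronized Object get(String key) { return data.get(key); }
    public synchronized void set(String key, Object value, int ttlSeconds) { data.put(key, value); }
    public synchronized void delete(String key) { data.remove(key); }
}

class TwoLevelCache<V> {
    private final Map<String, V> shortTerm;     // "short-term memory" layer, bounded by entry count
    private final RemoteCache longTerm;         // memcached-backed "long-term" layer
    private final Function<String, V> database; // loader used when both layers miss
    private final int remoteTtlSeconds;

    TwoLevelCache(int maxLocalEntries, RemoteCache longTerm,
                  Function<String, V> database, int remoteTtlSeconds) {
        // Access-ordered map that evicts the least recently used entry once it is full.
        this.shortTerm = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxLocalEntries;
            }
        };
        this.longTerm = longTerm;
        this.database = database;
        this.remoteTtlSeconds = remoteTtlSeconds;
    }

    // Read path: short-term layer, then memcached, then the database.
    // Simplification: the lock is held during the database load.
    @SuppressWarnings("unchecked")
    synchronized V get(String key) {
        V local = shortTerm.get(key);
        if (local != null) {
            return local;
        }
        V remote = (V) longTerm.get(key);
        if (remote != null) {
            shortTerm.put(key, remote); // promote into the short-term layer
            return remote;
        }
        V fresh = database.apply(key);
        if (fresh != null) {
            longTerm.set(key, fresh, remoteTtlSeconds);
            shortTerm.put(key, fresh);
        }
        return fresh;
    }

    // Update path: clear the key in both layers so the next get() reloads from the database.
    synchronized void invalidate(String key) {
        shortTerm.remove(key);
        longTerm.delete(key);
    }
}
```

In this sketch, invalidating the short-term layer only affects the JVM that performed the update. Other application servers keep their local copy until it is evicted, which is one reason to bound that layer by TTL as well as by size.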

Simply layering a 'recently used' cache on top of the memcached access
API gets you a short-term cache with predictable performance, plus
long-term fallthrough to the memcached backends for data that needs a
longer TTL, all with very little overhead.
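The 'recently used' layer itself can be sketched with the JDK's `LinkedHashMap` in access order. Its `removeEldestEntry` hook evicts the least recently used entry once a size limit is passed. The per-entry TTL check and the injected clock are additions for this illustration. `RecentlyUsedCache` is a hypothetical name, not a class from EhCache or any memcached client.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical "recently used" cache bounded by entry count and by a per-entry TTL.
class RecentlyUsedCache<K, V> {
    // Value plus the clock time (in millis) at which it stops being served.
    private static final class CachedValue<T> {
        final T value;
        final long expiresAtMillis;

        CachedValue(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final LinkedHashMap<K, CachedValue<V>> entries;
    private final long ttlMillis;
    private final LongSupplier clock;

    RecentlyUsedCache(int maxEntries, long ttlMillis, LongSupplier clock) {
        // accessOrder = true: get() moves an entry to the most recently used end.
        this.entries = new LinkedHashMap<K, CachedValue<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CachedValue<V>> eldest) {
                return size() > maxEntries; // drop the least recently used entry when over the limit
            }
        };
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns null on a miss or an expired entry; the caller then falls through to memcached.
    synchronized V get(K key) {
        CachedValue<V> cached = entries.get(key);
        if (cached == null) {
            return null;
        }
        if (clock.getAsLong() >= cached.expiresAtMillis) {
            entries.remove(key); // expired: evict lazily on read
            return null;
        }
        return cached.value;
    }

    synchronized void put(K key, V value) {
        entries.put(key, new CachedValue<>(value, clock.getAsLong() + ttlMillis));
    }

    synchronized void remove(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In production the clock would simply be `System::currentTimeMillis`. Taking it as a `LongSupplier` only makes expiry testable. `LinkedHashMap` lookups and insertions take constant time on average, so this layer's cost stays predictable no matter how much data sits in memcached behind it.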

All of this is personal preference; feel free to use the disk caches
and server-to-server replication if you prefer.  It all depends on
what you're trying to accomplish.