HTTP ?

Greg Whalin gwhalin at meetup.com
Tue Nov 30 08:17:10 PST 2004


Gregory Block wrote:
> 
> The only, the *only* thing I can think of to wish for is a better 
> performing GzipOutputStream, and that's mostly because I haven't gone 
> looking for either a faster compression stream in Java, or something 
> backed by hardware to do that compression, and that's just a 
> nice-to-have rather than a killer.

This is the one thing I would like as well!  I played around with the
basic compression streams in Java and the performance was horrible.  Our
solution was probably not the most elegant.  Since the client supports a
threshold object size, below which no compression is attempted, we just
set this value fairly high (128K) and store only our largest objects
compressed.  Given that a typical memcached server is fairly cheap, we
just bought more hardware to run as servers.
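
For what it's worth, the threshold idea is roughly this (a minimal
sketch using the standard GZIPOutputStream; the names and the constant
are just illustrative, not the client's actual API):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class ThresholdCompressor {

     // Objects smaller than this are stored uncompressed (128K, as above).
     private static final int COMPRESS_THRESHOLD = 128 * 1024;

     // Returns gzipped bytes if the value is large enough to be worth
     // compressing, otherwise returns the original bytes untouched.
     // A real client would also set a flag on the stored item so reads
     // know whether to decompress.
     public static byte[] maybeCompress(byte[] value) throws IOException {
          if (value.length < COMPRESS_THRESHOLD)
               return value;
          ByteArrayOutputStream bos =
               new ByteArrayOutputStream(value.length / 2);
          GZIPOutputStream gz = new GZIPOutputStream(bos);
          gz.write(value);
          gz.finish();
          gz.close();
          return bos.toByteArray();
     }
}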

However, I would still love to find a fast Java compression stream 
(though I am not holding my breath on finding one anytime soon).  A 
related problem is with the new hashing algorithm being used by a few of 
the other clients (perl, php).  They switched to using ...

import java.util.zip.CRC32;

// Compatibility hash: matches the CRC32-based algorithm used by the
// perl and php clients, so keys map to the same servers across clients.
private static int newCompatHashingAlg(String key) {
     CRC32 checksum = new CRC32();
     checksum.update(key.getBytes());
     int crc = (int) checksum.getValue();
     // Keep bits 16-30 of the checksum: a 15-bit, always-positive hash.
     return (crc >> 16) & 0x7fff;
}

This is sloooooow in Java (due to the CRC32 checksumming) and really
fast in other languages like perl.  My solution was to allow the Java
client to specify the hashing algorithm to use.  It can run in a
compatibility mode, which uses the slower algorithm but hashes the same
as the other clients, but it defaults to a simple String.hashCode(),
which is blazing fast (largely because Java's String caches its hash
code after the first computation, so you generally only ever calculate
it once for a given String).
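
Roughly, the selectable hashing looks like this (a sketch with
hypothetical names; the real client's API may differ):

import java.util.zip.CRC32;

public class KeyHasher {

     public enum HashAlg { NATIVE, COMPAT }

     // NATIVE uses Java's fast built-in String.hashCode(); COMPAT
     // matches the CRC32-based hash of the perl/php clients so all
     // clients pick the same server for a given key.
     public static int hash(String key, HashAlg alg) {
          switch (alg) {
          case COMPAT:
               CRC32 checksum = new CRC32();
               checksum.update(key.getBytes());
               return (((int) checksum.getValue()) >> 16) & 0x7fff;
          default:
               return key.hashCode();
          }
     }

     // Server selection: mask off the sign bit (hashCode can be
     // negative), then index into the server list.
     public static int serverFor(String key, HashAlg alg, int numServers) {
          return (hash(key, alg) & 0x7fffffff) % numServers;
     }
}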


>  - If you've got connection fatigue, hold your connections around.
>  - If you can't solve that, you'll replace 
> connection-fatigue-with-memcached with 
> connection-fatigue-with-loadbalancer.
>  - Assuming you don't have the latter problem, you've probably got a 
> solution that could have been applied to fix the former, identical issue.
>  - If the cost of losing a cache is too high, distribute the load across 
> more caches, effectively reducing the impact of a single cache failure.  
> (And having said this, the uptimes on our caches are unbelievably long, 
> even under heavy use.)


We have only ever had to restart ours when we make enough code changes
to invalidate the majority of the data we have cached.  In that case,
restarting memcached is simply the fastest way to flush the cache.


>  - If the cost of accessing cache for everything is too high, split 
> cache into a "long term" cache which is retained in memcached, and a 
> "short term" cache which, like human short-term memory, is kept within 
> specific time/usage limits and holds a limited amount of information; 
> that raises the problem of needing to keep this content in sync, but 
> we're purposefully talking about small amounts of content here where 
> content is pushed simultaneously into the short-term and long-term 
> memory system for later recall/reuse.

We do retain some cache in our app servers' local memory.  Generally,
this is small data that does not change yet is accessed frequently; no
point in making the network hop for it.  Most data, though, we cache in
memcached.
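
For illustration, that short-term/long-term split can be as simple as a
bounded LRU map in front of the memcached client (a sketch; RemoteCache
is a hypothetical stand-in for the real client, and writes go through
both tiers to keep them in sync, as the quoted post suggests):

import java.util.LinkedHashMap;
import java.util.Map;

public class TwoTierCache {

     // Hypothetical stand-in for the real memcached client.
     public interface RemoteCache {
          Object get(String key);
          void set(String key, Object value);
     }

     private static final int LOCAL_MAX_ENTRIES = 1000;

     private final RemoteCache remote;

     // Bounded, access-ordered LRU map for small, hot, rarely-changing
     // data; avoids the network hop for the most frequent lookups.
     private final Map<String, Object> local =
          new LinkedHashMap<String, Object>(16, 0.75f, true) {
               protected boolean removeEldestEntry(Map.Entry<String, Object> e) {
                    return size() > LOCAL_MAX_ENTRIES;
               }
          };

     public TwoTierCache(RemoteCache remote) {
          this.remote = remote;
     }

     public synchronized Object get(String key) {
          Object v = local.get(key);
          if (v == null) {
               v = remote.get(key);          // fall back to memcached
               if (v != null)
                    local.put(key, v);
          }
          return v;
     }

     // Write-through: push to both tiers simultaneously.
     public synchronized void set(String key, Object value) {
          local.put(key, value);
          remote.set(key, value);
     }
}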

Greg

