[patch] Compression for Perl API

Justin Matlock jmat@shutdown.net
Fri, 15 Aug 2003 14:46:50 -0400


So what those results say is that it takes about 2 seconds longer to retrieve
compressed data?  I'm very surprised at the big increase in get times with
compression.  I'm not seeing anything near those numbers on the PHP side.

How big was your test data set, and what kind of data did it consist of?
(I'd like to run the same test on my code and see what comes out).

Something else you might want to consider for the Perl client:  I now do a
quick check of the first 10 or so characters of the data to be compressed,
and abort the compression if it turns out the data is a JPEG, PNG, or GIF.
You just have to look for '\x89PNG' for PNG, '\xff\xd8' for JPEG, and 'GIF'
for GIF files, starting at position 0.  I found that one of my
not-so-bright-about-compression semi-developers (he should just stick to
HTML) was trying to store 3k JPEG files in the memcache and couldn't figure
out why compression was taking so long.
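
A check along those lines only takes a few lines of Perl.  Something like
this (my sketch; the helper name and the exact set of signatures you test
for are up to you):

    # Return true if the data already looks compressed (image formats),
    # so the client can skip the gzip step entirely.
    sub is_precompressed {
        my $data = shift;
        my $head = substr($data, 0, 10);   # magic bytes live at offset 0
        return 1 if $head =~ /^\x89PNG/;      # PNG signature
        return 1 if $head =~ /^\xff\xd8/;     # JPEG SOI marker
        return 1 if $head =~ /^GIF8[79]a/;    # GIF87a / GIF89a
        return 0;
    }

Then the write path only bothers with gzip when is_precompressed() comes
back false.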

As soon as I get full power back here, I'm going to run some similar
benchmarks on the PHP client, as well as on the PHP5 client (which uses
streams) and the C/Zend API I've written, which will actually compile into
the PHP binary.  So far, the C version is only a little faster than the
pre-compiled PHP code (using the Zend Optimizer pre-compiler), leading me to
believe it's going to be better to leave it as plain PHP...

I just hope my development box comes back up when Con-Ed gets around to
plugging us back in.  I had stolen its UPS for one of my servers last week
and hadn't gotten around to replacing it yet (doh!).  It blew my 403-day
uptime... dangit. :)

J

----- Original Message ----- 
From: "Brad Whitaker" <whitaker@danga.com>
To: <memcached@lists.danga.com>
Sent: Friday, August 15, 2003 2:00 PM
Subject: [patch] Compression for Perl API


> I've had this on my todo list for some time, but after it was implemented
> in the PHP API, I couldn't let Perl fall behind. :P
>
> A diff to MemCachedClient.pm is attached.  It uses Compress::Zlib to do
> gzip compression on inserts and decompression on gets.  There is currently
> a constant minimum compression gain of 20%.  That is, if it does the GZIP
> on write and it saves less than 20%, it will insert the uncompressed
> version in an effort to speed up later gets.  This number should probably
> be tweaked later (and maybe be set by the user?), but for now this seems
> reasonable.
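>
> In other words, the write path does something like this (my sketch of the
> logic, not the actual diff):
>
>     use Compress::Zlib;
>
>     my $MIN_COMP_GAIN = 0.20;    # the 20% constant described above
>     my $zval = Compress::Zlib::memGzip($value);
>     if (defined $zval &&
>         length($zval) < length($value) * (1 - $MIN_COMP_GAIN)) {
>         $value = $zval;    # store compressed (and flag it for gets)
>     }                      # otherwise store the original, uncompressed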
>
> A new 'compress_threshold' key can now be passed to the constructor to
> determine the minimum size threshold before a value is compressed.
> Since compression is disabled by default, this key also serves as the
> global on/off switch.  To change its value later, there is also a
> set_compress_threshold() method.
>
> For instances when compression should be temporarily disabled regardless
> of data size (such as inserting compressed images), there is a new
> enable_compress() method which takes a true or false value.
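>
> Putting it all together, usage looks something like this (a sketch; check
> the module's POD for the exact constructor form, and the server address
> and keys here are just examples):
>
>     use MemCachedClient;
>
>     my $memc = MemCachedClient->new({
>         servers            => [ "10.0.0.1:11211" ],
>         compress_threshold => 10_000,   # gzip values of 10k and up
>     });
>
>     $memc->set("big_key", $big_value);   # compressed if it saves >= 20%
>
>     $memc->enable_compress(0);           # e.g. before storing a JPEG
>     $memc->set("img_key", $jpeg_data);
>     $memc->enable_compress(1);
>
>     $memc->set_compress_threshold(50_000);   # raise the bar later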
>
> Curious as to how compression affected speed, I wrote up a little test
> inserting a set of test data repeatedly.  Below is the output of 3 runs
> showing the number of seconds get/set actions took with and without
> compression:
>
> compress: no
> sets: => 5.33751511573792
> gets: => 6.15450966358185
> compress: yes
> sets: => 5.281329870224
> gets: => 8.92959928512573
>
> compress: no
> sets: => 5.56796407699585
> gets: => 6.13490855693817
> compress: yes
> sets: => 5.11762177944183
> gets: => 8.82273375988007
>
> compress: no
> sets: => 5.89584612846375
> gets: => 6.15535402297974
> compress: yes
> sets: => 5.19342517852783
> gets: => 8.98201024532318
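>
> The timing loop was roughly of this shape (a sketch; the iteration count
> and key names here are illustrative, not what the test actually used):
>
>     use Time::HiRes qw(time);
>
>     my $start = time();
>     $memc->set("test:$_", $data) for (1 .. 10_000);
>     print "sets: => ", time() - $start, "\n";
>
>     $start = time();
>     $memc->get("test:$_") for (1 .. 10_000);
>     print "gets: => ", time() - $start, "\n";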
>
> After writing this mail, I checked and saw that Brad committed this patch
> just before he left.  As a result, I've put the new Perl client live at
> test.livejournal.org and I'm watching to make sure everything continues to
> run smoothly.
>
> --
> Brad Whitaker
> whitaker@danga.com