[patch] Compression for Perl API

Brad Whitaker whitaker@danga.com
Fri, 15 Aug 2003 12:35:57 -0700 (PDT)


My data set was 40 rows, roughly 200k.  The entire data set was inserted
something like 20 times for each test (I forget exactly; I could run it
again).  The extra 2 seconds isn't so bad when you figure that it's spread
over many iterations, 800 in this case.

The benchmarks were a quick hack before I left for the day, and my
computer isn't very fast, so I'll likely play with them later to make the
numbers a bit more useful.

I'm about to leave for the weekend, so I'll have to fully respond to this
later.

Here are some quick compression ratio stats from our test machine
(bytes before => bytes after):

1093 => 657 (40% saved)
5808 => 1785 (69% saved)
1799 => 933 (48% saved)
8835 => 2091 (76% saved)
1232 => 733 (41% saved)
2783 => 1352 (51% saved)
1360 => 739 (46% saved)
1349 => 771 (43% saved)
1925 => 988 (49% saved)
2035 => 1090 (46% saved)
1051 => 616 (41% saved)
1706 => 966 (43% saved)
2911 => 1489 (49% saved)
1314 => 751 (43% saved)
3423 => 1680 (51% saved)
1477 => 638 (57% saved)
3693 => 952 (74% saved)
4391 => 2247 (49% saved)
4671 => 2186 (53% saved)
5096 => 2144 (58% saved)

Pretty huge savings.  I'm wondering if the 20% minimum savings constant
should be moved up closer to 30%?
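For anyone curious how the minimum-savings rule works, here is a rough
sketch of the decision (in Python with zlib for illustration; the actual
patch uses Perl's Compress::Zlib, and the 20% constant is the one being
discussed above):

```python
import zlib

MIN_SAVINGS = 0.20  # the 20% minimum-gain constant under discussion

def maybe_compress(value):
    """Compress value, but keep the compressed form only if it is at
    least MIN_SAVINGS smaller than the original; otherwise store the
    raw bytes so later gets can skip the decompression step."""
    compressed = zlib.compress(value)
    if len(compressed) < len(value) * (1 - MIN_SAVINGS):
        return compressed, True   # worth it: store compressed
    return value, False           # not worth it: store raw

# Repetitive data clears the 20% bar easily; random or
# already-compressed data does not and is stored as-is.
stored, was_compressed = maybe_compress(b"hello world " * 100)
```

Bumping MIN_SAVINGS to 0.30 would simply make the client more reluctant
to store the compressed form.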

--
Brad Whitaker
whitaker@danga.com

On Fri, 15 Aug 2003, Justin Matlock wrote:

> So what those results say is that it takes 2 seconds longer to retrieve
> compressed data?  I'm very surprised at the big increase in get times with
> compression; I'm not seeing anything near those numbers on the PHP side.
>
> How big was your test data set, and what kind of data did it consist of?
> (I'd like to run the same test on my code and see what comes out).
>
> Something else you might want to consider for the Perl client:  I now do a
> quick check of the first 10 or so characters of the data to be compressed,
> and abort the compression if it turns out the data is a JPEG, PNG, or GIF.
> You just have to look for '\x89PNG' for PNG, '\xff\xd8' for JPEG, and 'GIF'
> for GIF files, starting at position 0.  I found that one of my
> not-so-bright-about-compression semi-developers (he should just stick to
> HTML) was trying to store 3k JPEG files in memcache and couldn't figure
> out why compression was taking so long.
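Justin's magic-byte check can be sketched like so (Python for
illustration rather than the PHP or Perl clients; these are the standard
file signatures, checked at position 0):

```python
def looks_like_image(data):
    """Return True if data begins with a PNG, JPEG, or GIF signature,
    in which case gzip compression is pointless and can be skipped."""
    return (data.startswith(b"\x89PNG")       # PNG
            or data.startswith(b"\xff\xd8")   # JPEG
            or data.startswith(b"GIF"))       # GIF87a / GIF89a
```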
>
> As soon as I get full power back here, I'm going to run some similar
> benchmarks on the PHP client, also on the PHP5 client, which uses streams,
> and the C/Zend API I've written that will actually compile into the PHP
> binary.  So far, the C program is only a little faster than the pre-compiled
> PHP code (using the Zend Optimizer pre-compiler), leading me to believe it's
> going to be better to leave it as plain PHP...
>
> I just hope my development box comes back when Con-Ed gets around to
> plugging us back in.  I had stolen its UPS for one of my servers last week,
> and hadn't gotten around to replacing it yet (doh!).  It blew my 403-day
> uptime... dangit. :)
>
> J
>
> ----- Original Message -----
> From: "Brad Whitaker" <whitaker@danga.com>
> To: <memcached@lists.danga.com>
> Sent: Friday, August 15, 2003 2:00 PM
> Subject: [patch] Compression for Perl API
>
>
> > I've had this on my todo list for some time, but after it was implemented
> > in the PHP API, I couldn't let Perl get behind. :P
> >
> > A diff to MemCachedClient.pm is attached.  It uses Compress::Zlib to do
> > gzip compression on inserts and decompression on gets.  There is currently
> > a constant minimum compression gain of 20%.  That is, if it does the GZIP
> > on write and it saves less than 20%, it will insert the uncompressed
> > version in an effort to speed up later gets.  This number should probably
> > be tweaked later (and maybe be set by the user?), but for now this seems
> > reasonable.
> >
> > A new 'compress_threshold' key can now be passed to the constructor to
> > determine the minimum size threshold before a value is compressed.
> > Since compression is disabled by default, this key also serves as the
> > global on/off switch.  To change its value later there is also a
> > set_compress_threshold() method.
> >
> > For instances when compression should be temporarily disabled regardless
> > of data size (such as inserting compressed images), there is a new
> > enable_compress() method which takes a true or false value.
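To make the knobs above concrete, here is a hypothetical sketch of the
decision logic only (a Python stand-in, not the actual Perl client; the
method names mirror the ones described above):

```python
import zlib

class CompressingClient:
    """Sketch: compress_threshold is both the minimum value size and
    the global on/off switch; enable_compress() can temporarily turn
    compression off regardless of size."""

    def __init__(self, compress_threshold=None):
        self.compress_threshold = compress_threshold  # None/0 = off
        self.compress_enabled = True

    def set_compress_threshold(self, threshold):
        self.compress_threshold = threshold

    def enable_compress(self, enabled):
        self.compress_enabled = bool(enabled)

    def _should_compress(self, value):
        return (self.compress_enabled
                and bool(self.compress_threshold)
                and len(value) >= self.compress_threshold)

    def set(self, key, value):
        # A real client would send the (possibly compressed) bytes to
        # the server here; we just return them for illustration.
        if self._should_compress(value):
            return zlib.compress(value)
        return value
```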
> >
> > Curious as to how compression affected speed, I wrote up a little test
> > inserting a set of test data repeatedly.  Below is the output of 3 runs
> > showing the number of seconds get/set actions took with and without
> > compression:
> >
> > compress: no
> > sets: => 5.33751511573792
> > gets: => 6.15450966358185
> > compress: yes
> > sets: => 5.281329870224
> > gets: => 8.92959928512573
> >
> > compress: no
> > sets: => 5.56796407699585
> > gets: => 6.13490855693817
> > compress: yes
> > sets: => 5.11762177944183
> > gets: => 8.82273375988007
> >
> > compress: no
> > sets: => 5.89584612846375
> > gets: => 6.15535402297974
> > compress: yes
> > sets: => 5.19342517852783
> > gets: => 8.98201024532318
> >
> > After writing this mail, I checked and saw that Brad committed this patch
> > just before he left.  As a result, I've put the new Perl client live at
> > test.livejournal.org and I'm watching to make sure everything continues to
> > run smoothly.
> >
> > --
> > Brad Whitaker
> > whitaker@danga.com
>