[patch] Compression for Perl API

Justin Matlock jmat@shutdown.net
Fri, 15 Aug 2003 17:00:55 -0400


Have a good weekend -- just some notes from me before I get the hell out of
NYC for the next 3 days. Our power's gone out three more times in the past
hour, and I'm tired of it... :)

Here's what I just got back for the PHP API; the memcache server is running
on a Celeron 600MHz with 512MB RAM (which also serves as my load balancer,
so it's semi-loaded).  The machine running the test is an AMD Athlon 1800.
I'm a little confused as to how you did your test -- did you do 20 loads of
200k, or 20 loads of 40x200k?

Anyway, here are my stats -- the data set ended up being around 450k (50,000
random words -- I'm assuming that's mostly what you're storing @ LJ).
Setting with compressed data takes quite a bit longer, but getting with
compressed data is actually faster... it crosses that threshold where the
network transfer savings outweigh the decompression cost.  There's a second
run under the first: 5,000 words (45k).
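
For reference, the test loop itself is nothing fancy -- here's a rough Perl
equivalent of what the PHP benchmark does (a sketch only; the client
constructor, server address, and key name are assumptions, not the actual
test code):

    use strict;
    use warnings;
    use Time::HiRes qw(time);
    use MemCachedClient;

    # Build ~450k of random dictionary words, like the runs below.
    open my $fh, '<', '/usr/share/dict/words' or die "words: $!";
    chomp(my @words = <$fh>);
    close $fh;
    my $data = join ' ', map { $words[ rand @words ] } 1 .. 50_000;

    my $memc  = MemCachedClient->new({ servers => ['127.0.0.1:11211'] });
    my $loops = 20;
    my ($set_total, $get_total, $set_max, $get_max) = (0, 0, 0, 0);

    for (1 .. $loops) {
        my $t0 = time;
        $memc->set('benchmark_blob', $data);
        my $set = time - $t0;

        $t0 = time;
        my $copy = $memc->get('benchmark_blob');
        my $get = time - $t0;

        $set_total += $set;  $set_max = $set if $set > $set_max;
        $get_total += $get;  $get_max = $get if $get > $get_max;
    }

    printf "Average SET: %.8f\n", $set_total / $loops;
    printf "Average GET: %.8f\n", $get_total / $loops;
    printf "Maximum Set: %.10f\nMaximum Get: %.10f\n", $set_max, $get_max;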

..

Generating random data using 50000 words from /usr/share/dict/words.
Generated data is 450,454 bytes in size.

Starting test for UNCOMPRESSED data... doing 20 loops...
Test completed in 1.927 seconds.

Average SET without compression: 0.04581275
Average GET without compression: 0.05032785
Maximum Set: 0.0594160000
Maximum Get: 0.0597590000
Size of datablock sent/received: 450454 bytes
With compression OFF, for SET, we get about: 9,602 kB per second
With compression OFF, for GET, we get about: 8,741 kB per second

Starting test for COMPRESSED data... doing 20 loops...
Test completed in 4.242 seconds.

Average SET with compression: 0.16641615
Average GET with compression: 0.04549155
Maximum Set: 0.1825650000
Maximum Get: 0.0505000000
Size of datablock sent/received: 206821 bytes
With compression ON, for SET, we get about: 2,643 kB per second
With compression ON, for GET, we get about: 9,670 kB per second

Compressed data is 54 % smaller than uncompressed.

.............  and now just 5000 words (45k)... the speeds are almost
identical... but you're saving 51% of your memory in the cache server.

Generating data using 5000 words from wordlist...
Generated data is 44,780 bytes in size.

Starting test for UNCOMPRESSED data... doing 20 loops...
Test completed in 0.308 seconds.

Average SET without compression: 0.00834685
Average GET without compression: 0.00688475
Maximum Set: 0.0235120000
Maximum Get: 0.0109610000
Size of datablock sent/received: 44780 bytes
With compression OFF, for SET, we get about: 5,239 kB per second
With compression OFF, for GET, we get about: 6,352 kB per second

Turning Compression ON...
Starting test for COMPRESSED data... doing 20 loops...
Test completed in 0.478 seconds.

Average SET with compression: 0.0170215
Average GET with compression: 0.0066843
Maximum Set: 0.0217830000
Maximum Get: 0.0088190000
Size of datablock sent/received: 21853 bytes
With compression ON, for SET, we get about: 2,569 kB per second
With compression ON, for GET, we get about: 6,542 kB per second

Compressed data is 51 % smaller than uncompressed.

...... just for entertainment purposes (I hope you're not storing this
much)....

Generating data using 500000 words from wordlist...
Generated data is 4,507,165 bytes in size.

Starting test for UNCOMPRESSED data... doing 20 loops...
Test completed in 9.720 seconds.

Average SET without compression: 0.47382015
Average GET without compression: 0.0120007
Maximum Set: 0.4974120000
Maximum Get: 0.0165600000
Size of datablock sent/received: 4507165 bytes
With compression OFF, for SET, we get about: 9,289 kB per second
With compression OFF, for GET, we get about: 366,773 kB per second

Turning Compression ON...
Starting test for COMPRESSED data... doing 20 loops...
Test completed in 33.313 seconds.

Average SET with compression: 1.65426465
Average GET with compression: 0.01120345
Maximum Set: 1.7040400000
Maximum Get: 0.0135460000
Size of datablock sent/received: 2055760 bytes
With compression ON, for SET, we get about: 2,661 kB per second
With compression ON, for GET, we get about: 392,873 kB per second

Compressed data is 54 % smaller than uncompressed.

----- Original Message ----- 
From: "Brad Whitaker" <whitaker@danga.com>
To: "Justin Matlock" <jmat@shutdown.net>
Cc: <memcached@lists.danga.com>
Sent: Friday, August 15, 2003 3:35 PM
Subject: Re: [patch] Compression for Perl API


> My data set was 40 rows, roughly 200k.  The entire data set was inserted
> something like 20 times for each test (I forget exactly, I could run it
> again).  The extra 2 seconds isn't so bad when you figure that it's run
> over many iterations.  In this case 800.
>
> The benchmarks were a quick hack before I left for the day, and my
> computer isn't very fast, so I'll likely play with them later to make the
> numbers a bit more useful.
>
> I'm about to leave for the weekend, so I'll have to fully respond to this
> later.
>
> Here are some quick compression ratio stats from our test machine
> (original bytes => compressed bytes, with the percentage saved):
>
> 1093 => 657 (40% saved)
> 5808 => 1785 (69% saved)
> 1799 => 933 (48% saved)
> 8835 => 2091 (76% saved)
> 1232 => 733 (41% saved)
> 2783 => 1352 (51% saved)
> 1360 => 739 (46% saved)
> 1349 => 771 (43% saved)
> 1925 => 988 (49% saved)
> 2035 => 1090 (46% saved)
> 1051 => 616 (41% saved)
> 1706 => 966 (43% saved)
> 2911 => 1489 (49% saved)
> 1314 => 751 (43% saved)
> 3423 => 1680 (51% saved)
> 1477 => 638 (57% saved)
> 3693 => 952 (74% saved)
> 4391 => 2247 (49% saved)
> 4671 => 2186 (53% saved)
> 5096 => 2144 (58% saved)
>
> Pretty huge savings.  I'm wondering if the 20% minimum savings constant
> should be moved up closer to 30%?
>
> --
> Brad Whitaker
> whitaker@danga.com
>
> On Fri, 15 Aug 2003, Justin Matlock wrote:
>
> > So what those results say is that it takes 2 seconds longer to retrieve
> > compressed data?  I'm very surprised at the big increase in get times
> > with compression.  I'm not seeing anything near those numbers on the
> > PHP side.
> >
> > How big was your test data set, and what kind of data did it consist of?
> > (I'd like to run the same test on my code and see what comes out.)
> >
> > Something else you might want to consider for the Perl client: I now do
> > a quick check of the first 10 or so characters of the data to be
> > compressed, and abort the compression if it turns out the data is a
> > JPEG, PNG, or GIF.  You just have to look for "\x89PNG" for PNG,
> > "\xff\xd8" for JPEG, and "GIF" for GIF files, starting from position 0.
> > I found that one of my not-so-bright-about-compression semi-developers
> > (he should just stick to HTML) was trying to store 3k JPEG files in the
> > memcache and couldn't figure out why compression was taking so long.
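
(In Perl, that signature check is only a few lines -- a rough sketch, with
an illustrative helper name; the actual PHP code isn't shown here:)

    # Skip compression when the payload is already a compressed image
    # format -- gzip'ing it again just burns CPU for no gain.
    sub is_precompressed_image {
        my ($data) = @_;
        return 1 if substr($data, 0, 4) eq "\x89PNG";    # PNG signature
        return 1 if substr($data, 0, 2) eq "\xff\xd8";   # JPEG SOI marker
        return 1 if substr($data, 0, 3) eq 'GIF';        # GIF87a / GIF89a
        return 0;
    }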
> >
> > As soon as I get full power back here, I'm going to run some similar
> > benchmarks on the PHP client, also on the PHP5 client, which uses
> > streams, and on the C/Zend API I've written that will actually compile
> > into the PHP binary.  So far, the C program is just a little faster than
> > the pre-compiled PHP code (using the Zend Optimizer pre-compiler),
> > leading me to believe it's going to be better to leave it as plain PHP...
> >
> > I just hope my development box comes back when Con-Ed gets around to
> > plugging us back in.  I had stolen its UPS for one of my servers last
> > week, and hadn't gotten around to replacing it yet (doh!).  It blew my
> > 403-day uptime... dangit. :)
> >
> > J
> >
> > ----- Original Message -----
> > From: "Brad Whitaker" <whitaker@danga.com>
> > To: <memcached@lists.danga.com>
> > Sent: Friday, August 15, 2003 2:00 PM
> > Subject: [patch] Compression for Perl API
> >
> >
> > > I've had this on my todo list for some time, but after it was
> > > implemented in the PHP API, I couldn't let Perl get behind. :P
> > >
> > > A diff to MemCachedClient.pm is attached.  It uses Compress::Zlib to
> > > do gzip compression on inserts and decompression on gets.  There is
> > > currently a constant minimum compression gain of 20%.  That is, if it
> > > does the gzip on write and it saves less than 20%, it will insert the
> > > uncompressed version in an effort to speed up later gets.  This number
> > > should probably be tweaked later (and maybe be set by the user?), but
> > > for now this seems reasonable.
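
(The attached diff doesn't appear in the archive, but the insert-side
decision described above amounts to something like this -- a sketch using
Compress::Zlib's memGzip; the helper name and compressed-flag convention are
illustrative, not the actual patch:)

    use Compress::Zlib;   # provides memGzip()

    use constant MIN_SAVINGS => 0.20;   # the 20% constant discussed above

    # Given a value and the configured compress_threshold, return the bytes
    # to store plus a flag telling the get side whether to gunzip later.
    sub maybe_compress {
        my ($value, $compress_threshold) = @_;
        my $len = length $value;

        # Compression disabled, or value below the size threshold.
        return ($value, 0)
            unless $compress_threshold && $len >= $compress_threshold;

        my $gz = Compress::Zlib::memGzip($value);

        # Keep the gzip'd form only if it saves at least 20%; otherwise
        # store the original so later gets skip the gunzip entirely.
        return ($gz, 1)
            if defined $gz && length($gz) <= $len * (1 - MIN_SAVINGS);
        return ($value, 0);
    }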
> > >
> > > A new 'compress_threshold' key can now be passed to the constructor to
> > > determine the minimum size threshold before a value is compressed.
> > > Since compression is disabled by default, this key also serves as the
> > > global on/off switch.  To change its value later, there is also a
> > > set_compress_threshold() method.
> > >
> > > For instances when compression should be temporarily disabled
> > > regardless of data size (such as inserting compressed images), there
> > > is a new enable_compress() method which takes a true or false value.
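
(Based on that description, usage would look roughly like this; the server
address and payloads are made up:)

    use strict;
    use warnings;
    use MemCachedClient;

    # 'compress_threshold' both enables compression and sets the minimum
    # object size (in bytes) that will be considered for gzip'ing.
    my $memc = MemCachedClient->new({
        servers            => ['10.0.0.1:11211'],
        compress_threshold => 10_000,
    });

    my $big_text = 'word ' x 20_000;        # ~100k stand-in payload
    $memc->set('bigtext', $big_text);       # gzip'd if it saves >= 20%

    $memc->set_compress_threshold(25_000);  # raise the threshold later

    # Temporarily skip compression for data that's already compressed.
    $memc->enable_compress(0);
    open my $fh, '<', 'photo.jpg' or die $!;
    binmode $fh;
    my $jpeg_bytes = do { local $/; <$fh> };
    $memc->set('photo', $jpeg_bytes);
    $memc->enable_compress(1);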
> > >
> > > Curious as to how compression affected speed, I wrote up a little test
> > > inserting a set of test data repeatedly.  Below is the output of 3 runs
> > > showing the number of seconds get/set actions took with and without
> > > compression:
> > >
> > > compress: no
> > > sets: => 5.33751511573792
> > > gets: => 6.15450966358185
> > > compress: yes
> > > sets: => 5.281329870224
> > > gets: => 8.92959928512573
> > >
> > > compress: no
> > > sets: => 5.56796407699585
> > > gets: => 6.13490855693817
> > > compress: yes
> > > sets: => 5.11762177944183
> > > gets: => 8.82273375988007
> > >
> > > compress: no
> > > sets: => 5.89584612846375
> > > gets: => 6.15535402297974
> > > compress: yes
> > > sets: => 5.19342517852783
> > > gets: => 8.98201024532318
> > >
> > > After writing this mail, I checked and saw that Brad committed this
> > > patch just before he left.  As a result, I've put the new Perl client
> > > live at test.livejournal.org and I'm watching to make sure everything
> > > continues to run smoothly.
> > >
> > > --
> > > Brad Whitaker
> > > whitaker@danga.com
> >
>