binary protocol notes from the facebook hackathon

Tres Seaver tseaver at
Wed Jul 11 17:08:07 UTC 2007

Alex Stapleton wrote:
> On 11 Jul 2007, at 16:35, Tres Seaver wrote:
>> marc at wrote:
>>> Hi everyone,
>>> I'm happy to see a nice and compact result with zero bloat.  I'm also
>>> happy you guys kept alignment within the request/response struct and
>>> that would help performance.
>>> I see byte ordering is mentioned twice;  the length field both in the
>>> request and response.
>>> While network byte ordering (Big Endian) is traditionally the 'right'
>>> thing to do (or the default thing to do), in most cases it's a minor
>>> performance hit due to constant swapping.  Since we're implementing a
>>> binary protocol specifically to avoid/minimize minor performance hits
>>> and since this is a brand new protocol I would recommend to keep all
>>> values as Little Endian because:
>>> - It's easier if all values are kept to the same endianness; it reduces
>>> confusion.
>> Heh, agreed.  Any numeric value larger than one byte should be in
>> network order, which removes the confusion. ;)
>>> - Nowadays MOST (but obviously not all) servers are running little
>>> endian.  So this saves byte swapping for most people's cases and thus
>>> a few cycles are spared on each request -- isn't that the whole point? ;)
>> -1.  Burden of proof is on those wanting host order to show *measured*
>> overhead on real workloads.
> It's a single extra instruction (BSWAP) to convert each multibyte
> value, so the overhead is rather low.
> My quick benchmark on this managed to do 20,000,000,000 htonl() calls
> (implemented as BSWAP) in 0.88 seconds.
> On one hand it's almost no performance hit; on the other,
> intentionally adding any performance penalty seems like a bad call.
> It would make implementation somewhat simpler to only support network
> ordering, and supporting both orders is probably not going to be
> justified by the performance gains, which I imagine will be close to 0.

Thanks for quantifying.  44 picoseconds per swap (0.88 s divided by
20,000,000,000 calls) seems like pretty tolerable overhead to me.
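
For concreteness, here is a minimal sketch of what "network order only" could
look like for a 32-bit length field.  The helper names put_len32 and
get_len32 are hypothetical and not taken from the protocol notes; htonl() and
ntohl() are the standard POSIX calls from <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit length at buf in network
 * (big-endian) byte order. */
static void put_len32(unsigned char *buf, uint32_t len)
{
    uint32_t wire;

    assert(buf != NULL);
    wire = htonl(len);               /* byte swap on little-endian hosts,
                                        no-op on big-endian hosts */
    memcpy(buf, &wire, sizeof wire); /* memcpy avoids unaligned stores */
}

/* Hypothetical helper: read back a 32-bit length stored in network order. */
static uint32_t get_len32(const unsigned char *buf)
{
    uint32_t wire;

    assert(buf != NULL);
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);
}
```

Whatever the host's native order, the bytes on the wire come out the same, so
neither side needs to know anything about its peer.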

> +1 for network ordering only. (And I'm an Intel user ;)

Agreed.  Given the possibility of a pipeline stall on modern CPUs, it is
quite credible that a network-only implementation is faster, even on
Intel, than one which sniffs the magic byte to determine whether to *do*
the swapping.
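
To make the comparison concrete, here is a rough sketch of the two decode
paths.  The magic values and function names are invented for illustration
and do not come from the protocol notes, and the sketch makes no claim about
measured speed; it only shows where the extra data-dependent branch would sit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical magic bytes, chosen only for this illustration. */
enum { MAGIC_BIG = 0x42, MAGIC_LITTLE = 0x4C };

/* Assemble a 32-bit value from big-endian bytes. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 32-bit value from little-endian bytes. */
static uint32_t load_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Network-order only: one straight-line decode, no decision to make. */
static uint32_t decode_len_network_only(const unsigned char *field)
{
    return load_be32(field);
}

/* Dual-order: every packet pays for a branch on the magic byte. */
static uint32_t decode_len_sniffing(unsigned char magic,
                                    const unsigned char *field)
{
    assert(magic == MAGIC_BIG || magic == MAGIC_LITTLE);
    return (magic == MAGIC_BIG) ? load_be32(field) : load_le32(field);
}
```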

--
Tres Seaver          +1 540-429-0999          tseaver at
Palladion Software   "Excellence by Design"