binary protocol motivations

Tue Nov 13 10:09:15 UTC 2007

On Mon, Nov 12, 2007 at 14:43:11 -0800, Marc wrote:
> For the record, my motivation for a binary protocol was not computational
> efficiency but more efficient I/O, especially for large sets of small keys
> with small values, AND to reduce code complexity.

That's why you should be voting against the "tags" approach.  With tags,
the reply format looks like this:

  <tag> <data>
  <tag> <data>
  --- nothing for not found items
  <tag> <data>
  <tag> <data>
  <tag> <data>
  --- nothing for not found items
  <tag> <data>

Since you want to query lots of data, I guess <tag> is more than one
byte, right (otherwise you can query at most 256 items in one
streaming round)?  But what if we implement a one-to-one correspondence
between requested keys and responses?  Let's see:

  <found> <data>
  <found> <data>
  <not_found>
  <found> <data>
  <found> <data>
  <found> <data>
  <not_found>
  <found> <data>

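For comparison, a minimal sketch of the one-to-one scheme, again with a
hypothetical framing invented for illustration: a 1-byte status, and
only after FOUND a 4-byte length plus the data.  The i-th record always
answers the i-th key, so the client just walks the reply in order and
can hand each result (hit or miss) to the caller as soon as it is read,
with no tag-to-key lookup.

```python
import struct

# Hypothetical framing, for illustration only:
#   status : 1 byte, FOUND (1) or NOT_FOUND (0)
#   length : 4-byte big-endian value length, present only after FOUND
#   data   : `length` value bytes, present only after FOUND
FOUND, NOT_FOUND = 1, 0
LEN = struct.Struct(">I")

def encode_one_to_one(values):
    """Encode one record per request, in request order (None = miss)."""
    out = bytearray()
    for value in values:
        if value is None:
            out.append(NOT_FOUND)
        else:
            out.append(FOUND)
            out += LEN.pack(len(value)) + value
    return bytes(out)

def iter_one_to_one(keys, buf):
    """Yield (key, value) pairs in request order as records are read."""
    offset = 0
    for key in keys:
        status = buf[offset]
        offset += 1
        if status == NOT_FOUND:
            yield key, None  # a miss is known immediately
        elif status == FOUND:
            (length,) = LEN.unpack_from(buf, offset)
            offset += LEN.size
            yield key, buf[offset:offset + length]
            offset += length
        else:
            raise ValueError("bad status byte %d" % status)
```

Here list(iter_one_to_one(["a", "b", "c"], encode_one_to_one([b"1",
None, b"333"]))) yields ("a", b"1"), ("b", None), ("c", b"333"), and the
miss for "b" is reported without waiting for the rest of the reply.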
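where <found> and <not_found> are one byte each (a single bit would be
enough, if we could fold it into some other field).  So with
sizeof(tag) >= 2, the six hits above cost at least 2 * 6 = 12 meta
bytes, while one-to-one costs 1 * 8 = 8 for the eight requests.  In
general, with N requests and H hits, tags cost at least 2 * H bytes
versus N for one-to-one, so one-to-one wins whenever the hit ratio H/N
is above 50%.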
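The arithmetic can be checked with a few lines.  This sketch counts only
the per-item meta bytes discussed here (tag or status), assuming a
2-byte tag and a 1-byte status, and ignores any length fields both
schemes would share.  The 99-out-of-100 case is a hypothetical
get-heavy batch, not a measured workload.

```python
from fractions import Fraction

def tag_meta_bytes(hits, tag_size=2):
    """Tagged replies: one tag per hit, nothing at all for a miss."""
    return tag_size * hits

def status_meta_bytes(requests, status_size=1):
    """One-to-one replies: one status byte per request, hit or miss."""
    return status_size * requests

def break_even_hit_ratio(tag_size=2, status_size=1):
    """Hit ratio above which one-to-one needs fewer meta bytes than tags.

    tag_size * hits > status_size * requests
    <=> hits / requests > status_size / tag_size
    """
    return Fraction(status_size, tag_size)

if __name__ == "__main__":
    # The example above: 8 requested keys, 6 of them found.
    print(tag_meta_bytes(6), status_meta_bytes(8))     # 12 8
    print(break_even_hit_ratio())                      # 1/2
    # A hypothetical get-heavy batch: 100 keys, 99 hits.
    print(tag_meta_bytes(99), status_meta_bytes(100))  # 198 100
```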

It was said that Facebook has get-intensive 99-1 applications, so I
doubt you are optimizing for a hit rate <<50%.  In another mail I also
described the code complexity of matching keys/tags/whatever, compared
to simple sequential processing.

So, why would one want to have tags?

-- 
   Tomash Brechko