[PATCH] utf8 flag support on perl lib

Tim Bunce Tim.Bunce at pobox.com
Fri Jan 11 23:24:57 UTC 2008


On Fri, Jan 11, 2008 at 05:07:10PM +0300, Tomash Brechko wrote:
> On Fri, Jan 11, 2008 at 16:54:26 +0300, Tomash Brechko wrote:
> > I'd love to coordinate F_COMPRESSED flag.  C::M and C::M::F currently
> > use 0x2 (0x1 is F_STORABLE).
> 
> Thinking more about this, perhaps we may do the following.  As I
> understand most client libraries do not export flags to the user, but
> use them internally for bookkeeping.  There are 16/32 (which one?)

16 bits, per http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt

  <flags> is an arbitrary 16-bit unsigned integer (written out in
  decimal) that the server stores along with the data and sends back
  when the item is retrieved. Clients may use this as a bit field to
  store data-specific information; this field is opaque to the server.
  Note that in memcached 1.2.1 and higher, flags may be 32-bits, instead
  of 16, but you might want to restrict yourself to 16 bits for
  compatibility with older versions.

I wonder why that says "may". Does anyone know?

If it is now 32-bit then we'd know that the top 16 bits are very
unlikely to be used at the moment and so we could adopt those
for "informal standardisation" with little risk.
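
To make the idea concrete, here is a minimal sketch of how a client
might separate the two halves of a 32-bit flags value. The mask names,
the helper functions, and the choice of the top 16 bits as the shared
space are all assumptions for illustration; nothing here is defined by
memcached or by any existing client.

```python
# Hypothetical layout, for illustration only: these names and masks are
# not part of memcached or of any client library.
LEGACY_MASK = 0x0000FFFF  # low 16 bits: safe on servers limited to 16-bit flags
SHARED_MASK = 0xFFFF0000  # top 16 bits: candidate space for shared flags


def split_flags(flags):
    """Return (legacy_bits, shared_bits) for a 32-bit flags value."""
    if not 0 <= flags <= 0xFFFFFFFF:
        raise ValueError("flags must fit in 32 bits")
    legacy_bits = flags & LEGACY_MASK
    shared_bits = (flags & SHARED_MASK) >> 16
    return legacy_bits, shared_bits


def fits_old_server(flags):
    """True if no bit above the low 16 is set."""
    return (flags & SHARED_MASK) == 0
```

A client could call fits_old_server() before a store and refuse, or
warn, when shared bits would be lost on a server that only keeps 16.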

> flag bits total.  We may separate this space into three classes:
> 
>  1 common, shared among all clients.  F_COMPRESSED goes here, and we
>    additionally agree that the compression algorithm is deflate
>    (gzip).
> 
>  2 common to the language family.  F_STORABLE goes here for Perl
>    family.
> 
>  3 common to the particular client family, i.e. private for internal
>    client use.  Please put F_UTF8 here ;).

It's probably premature to get into this much detail. I will make one
suggestion though:

Since information about utf8 encoding is likely to be of general use,
I'd suggest using two bits in group #1:
One to indicate the data is known to be utf8 encoded, and another to
indicate the client supports utf8 encoded data but that this data isn't.

Only if both bits are off would a client need to consider using a
"treat as utf8 if it looks like utf8" heuristic.

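The two-bit scheme could be read on the retrieving side roughly as
follows. This is only a sketch: the bit values and the names F_UTF8,
F_UTF8_AWARE and decode_value are invented here, and the fallback
heuristic is simply "try to decode as utf8", which is one possible
choice among several.

```python
# Hypothetical bit assignments; the thread does not fix any values.
F_UTF8 = 0x4        # data is known to be utf8 encoded
F_UTF8_AWARE = 0x8  # writer supports utf8, but this data is not utf8


def decode_value(raw, flags):
    """Return text if the value is (or looks like) utf8, else raw bytes."""
    if flags & F_UTF8:
        # The writer vouched for the encoding, so decode strictly.
        return raw.decode("utf-8")
    if flags & F_UTF8_AWARE:
        # The writer understands utf8 and says this is not utf8.
        return raw
    # Both bits off: the writer may predate the scheme, so guess.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw
```

The point of the second bit is visible in the middle branch: bytes
that happen to look like utf8 are left alone when the writer has said
they are not text.
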
> Class boundaries will be decided once and for all.  Class 1 will be
> maintained by memcached maintainers.  Class 2 will be maintained by
> the corresponding language community.  And class 3 is up to the client
> author.

Could you start by making a list of the clients and getting in touch
with their maintainers?

Would also be good, but not essential, to get official approval from the
maintainers of memcached.

Tim.

