[PATCH] utf8 flag support on perl lib

Aaron Stone aaron at serendipity.cx
Sat Jan 12 07:23:03 UTC 2008


On Fri, 2008-01-11 at 23:24 +0000, Tim Bunce wrote:
> On Fri, Jan 11, 2008 at 05:07:10PM +0300, Tomash Brechko wrote:
> > On Fri, Jan 11, 2008 at 16:54:26 +0300, Tomash Brechko wrote:
> > > I'd love to coordinate F_COMPRESSED flag.  C::M and C::M::F currently
> > > use 0x2 (0x1 is F_STORABLE).
> > 
> > Thinking more about this, perhaps we may do the following.  As I
> > understand most client libraries do not export flags to the user, but
> > use them internally for bookkeeping.  There are 16/32 (which one?)
> 
> 16 bits, per http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
> 
>   <flags> is an arbitrary 16-bit unsigned integer (written out in
>   decimal) that the server stores along with the data and sends back
>   when the item is retrieved. Clients may use this as a bit field to
>   store data-specific information; this field is opaque to the server.
>   Note that in memcached 1.2.1 and higher, flags may be 32-bits, instead
>   of 16, but you might want to restrict yourself to 16 bits for
>   compatibility with older versions.
> 
> I wonder why that says "may". Does anyone know?
> 
> If it is now 32-bit then we'd know that the top 16 bits are very
> unlikely to be used at the moment and so we could adopt those
> for "informal standardisation" with little risk.

I don't personally know what the "may" means. In the binary protocol,
flags is 32 bits. I don't know about the text protocol.

"Informal standardization" works well enough if you're using a limited
set of clients that agree with each other. If the goal is to make sure
that all clients agree, so that you can read/write to the same memcache
with many language bindings, then we should just go all out on
standardizing the values and designating a live document to be the
repository of known and accepted values.

> > flag bits total.  We may separate this space into three classes:
> > 
> >  1 common, shared among all clients.  F_COMPRESSED goes here, and we
> >    additionally agree that the compression algorithm is deflate
> >    (gzip).
> > 
> >  2 common to the language family.  F_STORABLE goes here for the Perl
> >    family.
> > 
> >  3 common to the particular client family, i.e. private for internal
> >    client use.  Please put F_UTF8 here ;).
> 
> It's probably premature to get into this much detail. I will make one
> suggestion though:
> 
> Since information about utf8 encoding is likely to be of general use,
> I'd suggest using two bits in group #1:
> One to indicate the data is known to be utf8 encoded, and another to
> indicate the client supports utf8 encoded data but that this data isn't.

There's another 1 byte field in the binary protocol that's called "data
type" and it might make sense to reserve a value there.

> Only if both bits are off would a client need to consider using a
> "treat as utf8 if it looks like utf8" heuristic.
> 
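
A minimal sketch of the two-bit decision Tim describes above, written in
Python only for brevity. The bit values and the names F_UTF8, F_UTF8_AWARE,
looks_like_utf8 and decode_value are hypothetical; nothing in this thread
assigns them. The "looks like utf8" heuristic is assumed here to mean "the
bytes decode cleanly as utf8", which is one possible reading.

```python
# Hypothetical bit values; the thread does not assign any.
F_UTF8 = 0x4        # data is known to be utf8 encoded
F_UTF8_AWARE = 0x8  # writer supports utf8, but this data is not utf8


def looks_like_utf8(data: bytes) -> bool:
    """Assumed heuristic: the payload decodes cleanly as utf8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_value(data: bytes, flags: int):
    """Return str for utf8 data and raw bytes otherwise."""
    if flags & F_UTF8:
        # The writer marked the data as utf8 (checked first, so it wins
        # if a client mistakenly sets both bits).
        return data.decode("utf-8")
    if flags & F_UTF8_AWARE:
        # The writer understands utf8 and says this data is not utf8.
        return data
    # Both bits off: the writer was not utf8-aware, so guess.
    return data.decode("utf-8") if looks_like_utf8(data) else data
```

With these assumed values, only items written by clients that set neither
bit ever reach the heuristic; everything else is decided by the flags alone.
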
> > Class boundaries will be decided once and for all.  Class 1 will be
> > maintained by memcached maintainers.  Class 2 will be maintained by
> > the corresponding language community.  And class 3 is up to the client
> > author.
> 
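
One way the three classes could be laid out, sketched in Python under loud
assumptions: flags are treated as 32 bits, the low 16 bits are left to
whatever clients already do privately (F_STORABLE = 0x1 and
F_COMPRESSED = 0x2 are the values quoted earlier in the thread), and the
top 16 bits are carved into the three classes. The masks and the names
split_flags and needs_32_bit are hypothetical; no boundaries have been
agreed anywhere in this discussion.

```python
# Values quoted in the thread for C::M and C::M::F.
F_STORABLE = 0x1
F_COMPRESSED = 0x2

# Hypothetical layout of a 32-bit flags field (not an agreed standard).
LEGACY_MASK = 0x0000FFFF    # existing private use by current clients
COMMON_MASK = 0x00FF0000    # class 1: shared among all clients
LANGUAGE_MASK = 0x0F000000  # class 2: common to a language family
CLIENT_MASK = 0xF0000000    # class 3: private to one client family


def split_flags(flags: int) -> dict:
    """Split a flags value into the bits belonging to each class."""
    if not 0 <= flags <= 0xFFFFFFFF:
        raise ValueError("flags must fit in 32 bits")
    return {
        "legacy": flags & LEGACY_MASK,
        "common": flags & COMMON_MASK,
        "language": flags & LANGUAGE_MASK,
        "client": flags & CLIENT_MASK,
    }


def needs_32_bit(flags: int) -> bool:
    """True if the value does not fit in a 16-bit flags field."""
    return flags > 0xFFFF
```

A client could call needs_32_bit before a store to notice when a value
would not fit in the 16-bit field the protocol text describes for older
servers.
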
> Could you start by making a list of the clients and getting in touch
> with their maintainers?

That would be super, super helpful!

> Would also be good, but not essential, to get official approval from the
> maintainers of memcached.
> 
> Tim.


