[PATCH] utf8 flag support on perl lib
Aaron Stone
aaron at serendipity.cx
Sat Jan 12 07:23:03 UTC 2008
On Fri, 2008-01-11 at 23:24 +0000, Tim Bunce wrote:
> On Fri, Jan 11, 2008 at 05:07:10PM +0300, Tomash Brechko wrote:
> > On Fri, Jan 11, 2008 at 16:54:26 +0300, Tomash Brechko wrote:
> > > I'd love to coordinate F_COMPRESSED flag. C::M and C::M::F currently
> > > use 0x2 (0x1 is F_STORABLE).
> >
> > Thinking more about this, perhaps we may do the following. As I
> > understand most client libraries do not export flags to the user, but
> > use them internally for bookkeeping. There are 16/32 (which one?)
>
> 16 bits, per http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
>
> <flags> is an arbitrary 16-bit unsigned integer (written out in
> decimal) that the server stores along with the data and sends back
> when the item is retrieved. Clients may use this as a bit field to
> store data-specific information; this field is opaque to the server.
> Note that in memcached 1.2.1 and higher, flags may be 32-bits, instead
> of 16, but you might want to restrict yourself to 16 bits for
> compatibility with older versions.
>
> I wonder why that says "may". Does anyone know?
>
> If it is now 32-bit then we'd know that the top 16 bits are very
> unlikely to be used at the moment and so we could adopt those
> for "informal standardisation" with little risk.
I don't personally know what the "may" means. In the binary protocol,
flags is 32 bits. I don't know about the text protocol.
"Informal standardization" works well enough if you're using a limited
set of clients that agree with each other. If the goal is to make sure
that all clients agree, so that you can read/write to the same memcache
with many language bindings, then we should just go all out on
standardizing the values and designating a live document to be the
repository of known and accepted values.
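
To make the bit-field discussion concrete, here is a minimal sketch in Python of how a client might treat the flags value. The two flag values are the ones quoted above for C::M and C::M::F; the mask names and helper functions are hypothetical, made up only for illustration, and are not part of any existing client.

```python
# Flag values quoted in this thread for C::M and C::M::F.
F_STORABLE = 0x1
F_COMPRESSED = 0x2

# Hypothetical masks: the low 16 bits are the range protocol.txt recommends
# for compatibility; the high 16 bits are the range suggested above for
# "informal standardisation".
LOW16_MASK = 0x0000FFFF
HIGH16_MASK = 0xFFFF0000


def fits_16_bits(flags):
    """True if the value is safe for servers that only store 16-bit flags."""
    return 0 <= flags <= LOW16_MASK


def split_flags(flags):
    """Return (low 16 bits, high 16 bits) of a 32-bit flags value."""
    if not 0 <= flags <= 0xFFFFFFFF:
        raise ValueError("flags must fit in 32 bits")
    return flags & LOW16_MASK, (flags & HIGH16_MASK) >> 16


def describe(flags):
    """List the names of the known flags set in the value."""
    names = []
    if flags & F_STORABLE:
        names.append("F_STORABLE")
    if flags & F_COMPRESSED:
        names.append("F_COMPRESSED")
    return names
```

A client that wants to stay compatible with older servers could call fits_16_bits() before a store and refuse to set anything in the upper half.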
> > flag bits total. We may separate this space into three classes:
> >
> > 1 common, shared among all clients. F_COMPRESSED goes here, and we
> > additionally agree that the compression algorithm is deflate
> > (gzip).
> >
> > 2 common to the language family. F_STORABLE goes here for the Perl
> > family.
> >
> > 3 common to the particular client family, i.e. private for internal
> > client use. Please put F_UTF8 here ;).
>
> It's probably premature to get into this much detail. I will make one
> suggestion though:
>
> Since information about utf8 encoding is likely to be of general use,
> I'd suggest using two bits in group #1:
> One to indicate the data is known to be utf8 encoded, and another to
> indicate the client supports utf8 encoded data but that this data isn't.
There's another 1 byte field in the binary protocol that's called "data
type" and it might make sense to reserve a value there.
> Only if both bits are off would a client need to consider using a
> "treat as utf8 if it looks like utf8" heuristic.
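
The two-bit scheme described above could be read roughly as follows. This is only a sketch in Python: the bit values 0x4 and 0x8 and the names F_UTF8, F_UTF8_AWARE and decode_value are assumptions for illustration, since nothing has been agreed yet, and the fallback heuristic is just one possible reading of "treat as utf8 if it looks like utf8".

```python
# Hypothetical bit assignments; no values have been agreed in this thread.
F_UTF8 = 0x4        # data is known to be utf8 encoded
F_UTF8_AWARE = 0x8  # writer supports utf8, but this data is not utf8


def decode_value(raw, flags):
    """Return str for utf8 text, or the raw bytes otherwise."""
    if flags & F_UTF8:
        # The writer marked the data as utf8, so decode it strictly.
        return raw.decode("utf-8")
    if flags & F_UTF8_AWARE:
        # The writer understands utf8 and says this is not it: leave as bytes.
        return raw
    # Both bits off: the writer predates the scheme, so fall back to a
    # "looks like utf8" heuristic (here, simply whether it decodes cleanly).
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw
```

The second bit is what lets a reader skip the heuristic for data written by newer clients, so binary values that happen to be valid utf8 are not misread as text.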
>
> > Class boundaries will be decided once and for all. Class 1 will be
> > maintained by memcached maintainers. Class 2 will be maintained by
> > the corresponding language community. And class 3 is up to the client
> > author.
>
> Could you start by making a list of the clients and getting in touch
> with their maintainers?
That would be super, super helpful!
> Would also be good, but not essential, to get official approval from the
> maintainers of memcached.
>
> Tim.