[PATCH] utf8 flag support on perl lib
Tim Bunce
Tim.Bunce at pobox.com
Fri Jan 11 23:24:57 UTC 2008
On Fri, Jan 11, 2008 at 05:07:10PM +0300, Tomash Brechko wrote:
> On Fri, Jan 11, 2008 at 16:54:26 +0300, Tomash Brechko wrote:
> > I'd love to coordinate F_COMPRESSED flag. C::M and C::M::F currently
> > use 0x2 (0x1 is F_STORABLE).
>
> Thinking more about this, perhaps we may do the following. As I
> understand it, most client libraries do not export flags to the user, but
> use them internally for bookkeeping. There are 16/32 (which one?)
16 bits, per http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt:

    <flags> is an arbitrary 16-bit unsigned integer (written out in
    decimal) that the server stores along with the data and sends back
    when the item is retrieved. Clients may use this as a bit field to
    store data-specific information; this field is opaque to the server.
    Note that in memcached 1.2.1 and higher, flags may be 32-bits, instead
    of 16, but you might want to restrict yourself to 16 bits for
    compatibility with older versions.

I wonder why that says "may". Does anyone know?
If it is now 32-bit then we'd know that the top 16 bits are very
unlikely to be used at the moment and so we could adopt those
for "informal standardisation" with little risk.
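To make the idea concrete, here is a minimal sketch in Python. It is not taken from any existing client: the mask names and both helpers are hypothetical, and it assumes a server that really does store 32-bit flags. The 0x1 and 0x2 values are the ones quoted in this thread for F_STORABLE and F_COMPRESSED.

```python
# Hypothetical layout: the low 16 bits stay client-private (and keep working
# on servers older than 1.2.1); the high 16 bits are set aside for flags
# that clients agree to share.
LOW16_MASK = 0x0000FFFF
HIGH16_MASK = 0xFFFF0000

F_STORABLE = 0x1    # value quoted in this thread for C::M and C::M::F
F_COMPRESSED = 0x2  # value quoted in this thread for C::M and C::M::F


def split_flags(flags):
    """Return the (private, shared) halves of a 32-bit flags value."""
    if not 0 <= flags <= 0xFFFFFFFF:
        raise ValueError("flags must fit in 32 bits")
    return flags & LOW16_MASK, (flags & HIGH16_MASK) >> 16


def fits_old_server(flags):
    """True if the value uses only the low 16 bits."""
    return (flags & HIGH16_MASK) == 0
```

A client that has to talk to older servers could call fits_old_server() before storing and refuse, or drop the shared bits, when it returns False.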
> flag bits total. We may separate this space into three classes:
>
> 1 common, shared among all clients. F_COMPRESSED goes here, and we
> additionally agree that the compression algorithm is deflate
> (gzip).
>
> 2 common to the language family. F_STORABLE goes here for Perl
> family.
>
> 3 common to the particular client family, i.e. private for internal
> client use. Please put F_UTF8 here ;).
It's probably premature to get into this much detail. I will make one
suggestion though:
Since information about utf8 encoding is likely to be of general use,
I'd suggest using two bits in group #1:
One to indicate the data is known to be utf8 encoded, and another to
indicate the client supports utf8 encoded data but that this data isn't.
Only if both bits are off would a client need to consider using a
"treat as utf8 if it looks like utf8" heuristic.
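The read side of that two-bit scheme could look roughly like the Python sketch below. The bit values and the names F_UTF8, F_UTF8_AWARE and decode_value are made up for illustration; nothing in this thread assigns them, and the "looks like utf8" heuristic is assumed to be simply "it decodes cleanly as utf8".

```python
# Hypothetical bit values; the thread does not assign any.
F_UTF8 = 0x4        # data is known to be utf8 encoded
F_UTF8_AWARE = 0x8  # writer supports utf8, but this data is not utf8


def decode_value(data, flags):
    """Interpret bytes fetched from memcached using the two utf8 bits."""
    if flags & F_UTF8:
        # Writer vouched for the encoding, so decode strictly.
        return data.decode("utf-8")
    if flags & F_UTF8_AWARE:
        # Writer knew about utf8 and said this is not it: keep raw bytes.
        return data
    # Neither bit set: the writer predates the scheme, so fall back to the
    # "treat as utf8 if it looks like utf8" heuristic.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data
```

The point of the second bit is visible in the middle branch: data written by a utf8-aware client never reaches the guessing step.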
> Class boundaries will be decided once and for all. Class 1 will be
> maintained by memcached maintainers. Class 2 will be maintained by
> the corresponding language community. And class 3 is up to the client
> author.
Could you start by making a list of the clients and getting in touch
with their maintainers?
Would also be good, but not essential, to get official approval from the
maintainers of memcached.
Tim.