What is a valid key?
sgrimm at facebook.com
Thu Dec 20 19:36:14 UTC 2007
That seems fine to me, but we don't actually need to forbid 0x7F.
Memcached doesn't do anything special with that byte.
On Dec 20, 2007, at 11:34 AM, Aaron Stone wrote:
> This is pretty verbose, but hopefully will cut way down on this FAQ:
> Keys are limited in length to 250 octets. Octets in the key MUST NOT
> have value 0x20 or less, nor value 0x7F (corresponding to ASCII space
> and all control characters below it, and ASCII del, respectively).
> Octets MAY have their 'high bits' set.
> Note: The UTF-8 character encoding produces output octets which meet
> these requirements. Please be aware that some characters may be
> represented as more than one octet. Refer to your language's string
> length functions to ensure that you are producing keys of 250 or fewer
> _octets_ and not simply 250 or fewer _characters_.
> I forgot about that ASCII 127 deal until I re-read 'man ascii' just now.
> I assume we need to restrict that, too, so I put it in the text above.
> Do you think this text still inadvertently suggests that we require UTF-8?
> On Thu, Dec 20, 2007, Kieran Benton <kieran.benton at synchro.co.uk> wrote:
>> Point taken - that was something I hadn't considered.
>> I still think it's a good idea to add a footnote into that section of
>> the docs to note that UTF8 is a "safe" encoding to use since it is so
>> popular in western systems and many devs might not necessarily know
>> it fulfills the criteria (I certainly didn't from a brief scan).
>> This is of course if it's decided by the end of this thread that it can
>> be used generically! :)
>> -----Original Message-----
>> From: Steven Grimm [mailto:sgrimm at facebook.com]
>> Sent: 20 December 2007 18:32
>> To: Kieran Benton
>> Cc: Dustin Sallings; a.; memcached at lists.danga.com
>> Subject: Re: What is a valid key?
>> On Dec 20, 2007, at 10:14 AM, Kieran Benton wrote:
>>> Are we saying that as long as you use UTF-8 for the key, and that it is
>>> not longer than 250 bytes, then all is fine with both text and binary
>>> protocols? If so then I think we should update the docs to say so
>>> and be happy :)
>> It has nothing to do with UTF-8. There is no good reason to specify
>> that in the documentation. It's just a bunch of bytes (or octets, if
>> you prefer) with some specific byte values forbidden. The server does
>> not check the bytes in the key to make sure they form valid UTF-8
>> sequences. You can use ASCII or UTF-8 or ISO-8859-1 or ISO-8859-5 or
>> KOI-8 or GB-18030 or a random-number generator, so long as you avoid
>> the forbidden bytes. It does not even have to be a human-readable
>> string; it could be a raw hash value with certain bytes escaped. (Though
>> obviously that makes ad-hoc debugging a bit painful.)
>> If we say "keys can be UTF-8" in the documentation, then some poor
>> Russian programmer, say, who is otherwise working in KOI-8 encoding, is
>> going to add unnecessary code to a client library to transform KOI-8
>> to UTF-8 so as to comply with the protocol spec.
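
For readers who want the rule from this thread in executable form, here is a
minimal sketch in Python. It is not taken from memcached or from any client
library. The names is_valid_key, escape_key and forbid_del are hypothetical,
and percent-escaping is only an assumed scheme, since the thread says "certain
bytes escaped" without naming one. An empty key is not discussed in the thread
and is not checked here.

```python
MAX_KEY_OCTETS = 250  # length limit quoted in the thread


def is_valid_key(key: bytes, forbid_del: bool = False) -> bool:
    """Return True if `key` satisfies the octet rules discussed above.

    `forbid_del` is a hypothetical switch: the proposed wording forbids
    0x7F, while the reply says the server does nothing special with it.
    """
    if len(key) > MAX_KEY_OCTETS:  # counts octets, not characters
        return False
    for octet in key:
        if octet <= 0x20:  # ASCII space and the control characters below it
            return False
        if forbid_del and octet == 0x7F:  # ASCII DEL
            return False
    return True  # octets with the high bit set are allowed


def escape_key(raw: bytes) -> bytes:
    """Percent-escape forbidden octets (and '%' itself) in arbitrary bytes.

    Assumed scheme for illustration only; note the result can be up to
    three times longer than the input.
    """
    out = bytearray()
    for octet in raw:
        if octet <= 0x20 or octet == 0x7F or octet == 0x25:
            out += b"%%%02X" % octet
        else:
            out.append(octet)
    return bytes(out)


# The same 4-character Russian word yields different octet counts per
# encoding, and both encodings pass, because only the byte values matter.
word = "\u043a\u043b\u044e\u0447"
utf8_key = word.encode("utf-8")    # 8 octets
koi8_key = word.encode("koi8_r")   # 4 octets
print(len(utf8_key), is_valid_key(utf8_key))
print(len(koi8_key), is_valid_key(koi8_key))
print(escape_key(b"a b"))
```

The check works on bytes rather than on a text string so that the 250 limit
is measured in octets, which is the distinction the proposed note draws.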