What is a valid key?

Steven Grimm sgrimm at facebook.com
Thu Dec 20 19:36:14 UTC 2007


That seems fine to me, but we don't actually need to forbid 0x7F.  
Memcached doesn't do anything special with that byte.

-Steve
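
As an illustration only, the key rule discussed in this thread can be
sketched as a small client-side check. The function name, the
forbid_del switch, and the rejection of empty keys are assumptions
made for this sketch; they are not taken from memcached or from any
client library.

```python
MAX_KEY_OCTETS = 250  # length limit quoted in this thread


def is_valid_key(key: bytes, forbid_del: bool = False) -> bool:
    """Check a key against the rule proposed in this thread.

    forbid_del=True also rejects 0x7F, as in the draft wording;
    the default follows the reply that 0x7F need not be forbidden.
    """
    # Assumption: an empty key is rejected (the thread does not say).
    if not 0 < len(key) <= MAX_KEY_OCTETS:
        return False
    for octet in key:
        if octet <= 0x20:  # ASCII space and the control characters below it
            return False
        if forbid_del and octet == 0x7F:  # ASCII DEL
            return False
    return True  # octets with the high bit set are allowed
```

The check works on raw bytes, so the same function accepts a key
encoded as UTF-8, KOI-8, or anything else, as long as no forbidden
octet appears.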


On Dec 20, 2007, at 11:34 AM, Aaron Stone wrote:

> This is pretty verbose, but hopefully will cut way down on this FAQ:
>
>  Keys are limited in length to 250 octets. Octets in the key MUST NOT
>  have value 0x20 or less, nor value 0x7F (corresponding to ASCII space
>  and all control characters below it, and ASCII del, respectively).
>  Octets MAY have their 'high bits' set.
>
>  Note: The UTF-8 character encoding produces output octets which meet
>  these requirements. Please be aware that some characters may be
>  represented as more than one octet. Refer to your language's string
>  length functions to ensure that you are producing keys of 250 or
>  fewer _octets_ and not simply 250 or fewer _characters_.
>
> I forgot about that ASCII 127 deal until I re-read 'man ascii' just now.
> I assume we need to restrict that, too, so I put it in the text above.
>
> Do you think this text still inadvertently suggests that we require UTF-8?
>
> Aaron
>
>
> On Thu, Dec 20, 2007, Kieran Benton <kieran.benton at synchro.co.uk> said:
>
>> Point taken - that was something I hadn't considered.
>>
>> I still think it's a good idea to add a footnote to that section of
>> the docs noting that UTF-8 is a "safe" encoding to use, since it is so
>> popular in western systems and many devs might not necessarily know
>> whether it fulfills the criteria (I certainly didn't from a brief scan).
>>
>> This is of course if it's decided by the end of this thread that it
>> can be used generically! :)
>>
>> Cheers,
>> Kieran
>>
>> -----Original Message-----
>> From: Steven Grimm [mailto:sgrimm at facebook.com]
>> Sent: 20 December 2007 18:32
>> To: Kieran Benton
>> Cc: Dustin Sallings; a.; memcached at lists.danga.com
>> Subject: Re: What is a valid key?
>>
>> On Dec 20, 2007, at 10:14 AM, Kieran Benton wrote:
>>> Are we saying that as long as you use UTF-8 for the key, and that it
>>> is not longer than 250 bytes, then all is fine with both text and
>>> binary protocols? If so then I think we should update the docs to
>>> say so and be happy :)
>>
>> It has nothing to do with UTF-8. There is no good reason to specify
>> that in the documentation. It's just a bunch of bytes (or octets, if
>> you prefer) with some specific byte values forbidden. The server does
>> not check the bytes in the key to make sure they form valid UTF-8
>> sequences. You can use ASCII or UTF-8 or ISO-8859-1 or ISO-8859-5 or
>> KOI-8 or GB-18030 or a random-number generator, so long as you avoid
>> the forbidden bytes. It does not even have to be a human-readable
>> key; it could be a raw hash value with certain bytes escaped. (Though
>> obviously that makes ad-hoc debugging a bit painful.)
>>
>> If we say "keys can be UTF-8" in the documentation, then some poor
>> Russian programmer, say, who is otherwise working in KOI-8 encoding
>> is going to add unnecessary code to a client library to transform
>> KOI-8 to UTF-8 so as to comply with the protocol spec.
>>
>> -Steve
>>
>

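The draft note quoted above warns that the 250 limit counts octets,
not characters. A minimal sketch of the difference, assuming a client
that holds keys as Python strings and encodes them before sending;
the helper name key_octet_length is hypothetical.

```python
def key_octet_length(key: str, encoding: str = "utf-8") -> int:
    """Return the number of octets the key occupies once encoded."""
    return len(key.encode(encoding))


key = "ключ" * 40  # 160 Cyrillic characters

chars = len(key)                             # counts characters
utf8_octets = key_octet_length(key)          # two octets per letter here
koi8_octets = key_octet_length(key, "koi8-r")  # one octet per letter

print(chars, utf8_octets, koi8_octets)  # prints: 160 320 160
```

The same 160-character key fits the limit when encoded as KOI8-R but
exceeds it when encoded as UTF-8, which is why the length has to be
measured after encoding.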

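The remark above about using "a raw hash value with certain bytes
escaped" could look roughly like the following. The escaping scheme
('%' followed by two hex digits) and both function names are
hypothetical choices for this sketch; the thread does not prescribe
any particular scheme.

```python
import hashlib

ESCAPE = 0x25  # '%', a hypothetical choice of escape octet


def escape_key(raw: bytes) -> bytes:
    """Replace forbidden octets (0x20 and below) and '%' with %XX."""
    out = bytearray()
    for octet in raw:
        if octet <= 0x20 or octet == ESCAPE:
            out += b"%%%02X" % octet
        else:
            out.append(octet)
    return bytes(out)


def unescape_key(escaped: bytes) -> bytes:
    """Invert escape_key."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == ESCAPE:
            out.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(escaped[i])
            i += 1
    return bytes(out)


digest = hashlib.md5(b"user:42").digest()  # 16 raw octets
key = escape_key(digest)                   # at most 48 octets after escaping
```

Because each input octet expands to at most three output octets, a
16-octet digest stays well under the 250-octet limit.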
