What is a valid key?

Thu Dec 20 19:34:33 UTC 2007

This is pretty verbose, but hopefully will cut way down on this FAQ:

  Keys are limited in length to 250 octets. Octets in the key MUST NOT
  have value 0x20 or less, nor value 0x7F (corresponding to ASCII space
  and all control characters below it, and ASCII del, respectively).
  Octets MAY have their 'high bits' set.

  Note: Text in the UTF-8 character encoding meets these requirements
  as long as it contains no spaces or control characters. Please be
  aware that some characters may be represented as more than one octet.
  Refer to your language's string length functions to ensure that you
  are producing keys of 250 or fewer _octets_ and not simply 250 or
  fewer _characters_.
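
For anyone who wants to check keys on the client side, here is a rough
sketch of the rules above in Python. The name is_valid_key is made up for
illustration and is not part of any client library. Rejecting the empty
key is my own assumption; the text above does not address it.

```python
MAX_KEY_OCTETS = 250  # limit quoted in the text above


def is_valid_key(key: bytes) -> bool:
    """Return True if `key` obeys the octet rules described above."""
    # Assumption: an empty key is rejected (the text does not say).
    if not 1 <= len(key) <= MAX_KEY_OCTETS:
        return False
    # Forbidden octets: 0x00-0x20 (controls and space) and 0x7F (DEL).
    # Octets with the high bit set (0x80-0xFF) are allowed.
    return all(b > 0x20 and b != 0x7F for b in key)


if __name__ == "__main__":
    print(is_valid_key(b"user:42"))   # plain ASCII, no forbidden octets
    print(is_valid_key(b"user 42"))   # contains a space (0x20)
    # 200 characters, but 400 octets once encoded as UTF-8.
    print(is_valid_key(("\u00e9" * 200).encode("utf-8")))
```

Note that the length test runs on the encoded bytes, not on the string,
which is the octets-versus-characters distinction the note is about.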

I forgot about that ascii 127 deal until I re-read 'man ascii' just now.
I assume we need to restrict that, too, so I put it in the text above.

Do you think this text still inadvertently suggests that we require UTF-8?

Aaron

On Thu, Dec 20, 2007, Kieran Benton <kieran.benton at synchro.co.uk> said:

> Point taken - that was something I hadn't considered.
> 
> I still think it's a good idea to add a footnote into that section of
> the docs to note that UTF8 is a "safe" encoding to use since it is so
> popular in western systems and many devs might not necessarily know if
> it fulfills the criteria (I certainly didn't from a brief scan). 
> 
> This is of course if it's decided by the end of this thread that it can
> be used generically! :)
> 
> Cheers,
> Kieran
> 
> -----Original Message-----
> From: Steven Grimm [mailto:sgrimm at facebook.com] 
> Sent: 20 December 2007 18:32
> To: Kieran Benton
> Cc: Dustin Sallings; a.; memcached at lists.danga.com
> Subject: Re: What is a valid key?
> 
> On Dec 20, 2007, at 10:14 AM, Kieran Benton wrote:
>> Are we saying that as long as you use UTF-8 for the key, and that it
>> is not longer than 250 bytes, then all is fine with both text and
>> binary protocols? If so then I think we should update the docs to say
>> so and be happy :)
> 
> It has nothing to do with UTF-8. There is no good reason to specify  
> that in the documentation. It's just a bunch of bytes (or octets, if  
> you prefer) with some specific byte values forbidden. The server does  
> not check the bytes in the key to make sure they form valid UTF-8  
> sequences. You can use ASCII or UTF-8 or ISO-8859-1 or ISO-8859-5 or  
> KOI-8 or GB-18030 or a random-number generator, so long as you avoid  
> the forbidden bytes. It does not even have to be a human-readable key;  
> it could be a raw hash value with certain bytes escaped. (Though  
> obviously that makes ad-hoc debugging a bit painful.)
> 
> If we say "keys can be UTF-8" in the documentation, then some poor  
> Russian programmer, say, who is otherwise working in KOI-8 encoding is  
> going to add unnecessary code to a client library to transform KOI-8  
> to UTF-8 so as to comply with the protocol spec.
> 
> -Steve
> 
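
To illustrate Steve's two points (any encoding works, and a raw hash works
once certain bytes are escaped), here is a small Python sketch. The
percent-style escaping and the name escape_key are assumptions chosen for
the example; the thread does not prescribe any particular escaping scheme.

```python
import hashlib


def escape_key(raw: bytes) -> bytes:
    """Percent-escape the octets that are forbidden in a key.

    Hypothetical scheme: each forbidden octet, and '%' itself so the
    result stays unambiguous, becomes '%' plus two hex digits.
    """
    out = bytearray()
    for b in raw:
        if b <= 0x20 or b == 0x7F or b == 0x25:  # 0x25 is '%'
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)


def has_forbidden_octet(key: bytes) -> bool:
    """True if any octet is 0x20 or less, or equals 0x7F."""
    return any(b <= 0x20 or b == 0x7F for b in key)


if __name__ == "__main__":
    # A KOI8-R encoded Cyrillic word needs no conversion to UTF-8.
    koi8_key = "ключ".encode("koi8_r")
    print(has_forbidden_octet(koi8_key))

    # A raw SHA-1 digest may contain forbidden octets until escaped.
    digest = hashlib.sha1(b"some object id").digest()
    print(has_forbidden_octet(escape_key(digest)))
```

A 20-octet digest grows to at most 60 octets under this scheme, so it
stays well inside the 250-octet limit.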
