[PATCH] utf8 flag support on perl lib

Tomash Brechko tomash.brechko at gmail.com
Thu Jan 10 12:56:08 UTC 2008

On Thu, Jan 10, 2008 at 12:49:28 +0100, Peter J. Holzer wrote:
> > Cache::Memcached::Fast doesn't preserve UTF-8 and tainted flags
> > either.
> Losing the utf8 flag changes the value,

Not really.  If you had a UTF-8 string and reset the UTF-8 flag, you
still have the same byte sequence.  The only difference is that Perl
now treats the string as bytes, so length($str) and regexp classes
won't work character-wise.  You may call Encode::_utf8_on($str), and
everything gets back to normal.
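
A minimal sketch of that point, assuming only the core Encode module
(the variable names and the sample string are made up for
illustration):

```perl
use strict;
use warnings;
use Encode ();

# Four Cyrillic characters, written with escapes so the source stays ASCII.
my $chars = "\x{442}\x{435}\x{441}\x{442}";

# encode() returns the UTF-8 byte sequence with the UTF-8 flag off,
# which is what a client that drops the flag hands back.
my $bytes = Encode::encode('UTF-8', $chars);
print length($bytes), "\n";    # 8: counted as bytes

# Turning the flag back on does not touch the bytes, only how Perl
# interprets them.
my $copy = $bytes;
Encode::_utf8_on($copy);
print length($copy), "\n";     # 4: counted as characters
print $copy eq $chars ? "same string\n" : "different\n";
```

Note that Encode::_utf8_on() does not validate anything, so it is only
safe when the bytes are known to be well-formed UTF-8.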

The reason this "feature" of C::M is seldom noticed is that most
scripts just pass data back and forth, not performing any
character-wise manipulations on it.  The scripts that do care may call
utf8::decode() (or Encode::_utf8_on() when sure).  Note that it's
actually dangerous to enable the UTF-8 flag in scripts that do not "use
utf8;" or "use encoding 'utf-8';".  There may actually be a mix of
scripts, each using its own encoding (the most common case is a
script that does not use any encoding and treats everything as bytes).
That's why DBI enables UTF-8 only when requested.
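
The "only when requested" approach could look roughly like this.  It
is a sketch, not the API of C::M or DBI: the %cache hash stands in for
a real client, and fetch() with its utf8 option is a hypothetical
helper.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for a cache client: values come back as plain bytes.
# These are the UTF-8 bytes of a six-letter Cyrillic word.
my %cache = (
    greeting => "\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82",
);

# Hypothetical helper: decode only when the caller asks for characters.
sub fetch {
    my ($key, %opt) = @_;
    my $value = $cache{$key};    # a copy, so the stored bytes stay intact
    return $value unless defined $value && $opt{utf8};
    # FB_CROAK makes decode() die on malformed input instead of
    # silently trusting it, unlike Encode::_utf8_on().
    return Encode::decode('UTF-8', $value, Encode::FB_CROAK);
}

my $as_bytes = fetch('greeting');               # byte semantics
my $as_chars = fetch('greeting', utf8 => 1);    # character semantics
print length($as_bytes), " ", length($as_chars), "\n";    # 12 6
```

A script that treats everything as bytes never passes the option and
sees no change in behaviour.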

> > Besides, more often than not you use memcached client together with
> > some other means to get the data if it's missing from the cache.
> > While you may fix C::M(::F), not every other backend preserves these
> > flags automatically.
> These are the primary means for storing your data. If they can't handle
> your data you've got a problem :-). When you design your application you
> know (or should know) what data you want to store and choose your data
> model and storage system accordingly. If MySQL can't do it, use Oracle
> (or vice versa); if a varchar column can't do it, use a blob; etc.

Being UTF-8 or not is not a property of the data, but a property of
how you work with this data, so MySQL or Oracle has nothing to do with
this.  Just enable the flag if you want to work with characters, do
nothing if bytes are fine.  But it's up to the script to decide which
one is desired.

   Tomash Brechko
