[PATCH] utf8 flag support on perl lib
Tomash Brechko
tomash.brechko at gmail.com
Thu Jan 10 17:03:13 UTC 2008
On Thu, Jan 10, 2008 at 17:29:42 +0100, Peter J. Holzer wrote:
> The same byte sequence, but not the same value. In C (on many systems)
> the single precision floating point number 3.1415927 and the integer
> 1078530011 have the same byte sequence (0xdb 0xf 0x49 0x40 on little
> endian systems), but they hardly have the same value.
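The coincidence in the quoted example can be checked from Perl. This is only a sketch: it assumes a platform with IEEE 754 single-precision floats and a perl recent enough (5.10 or later) to accept the `<` byte-order modifier in pack templates. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Pack the float as a little-endian single-precision value and the
# integer as a little-endian unsigned 32-bit value, then compare bytes.
my $float_bytes = pack("f<", 3.1415927);
my $int_bytes   = pack("V", 1078530011);

printf "float: %s\n", unpack("H*", $float_bytes);
printf "int:   %s\n", unpack("H*", $int_bytes);
print  "same bytes\n" if $float_bytes eq $int_bytes;
```

Both lines should show the same four bytes, which is exactly why the bytes alone cannot say which value was meant.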
OK, I've got your point, though it's more a question of terminology.
Let me put it another way: my opinion is that C::M (and C::M::F)
itself should not save/restore the UTF-8 flag. Instead, it should work
the same way other Perl data streams work. If you write a string to a
file, no magic flags are stored anywhere. Instead, when you _read_
it back you say, "alright, please set the UTF-8 flag on the data if it
looks like a UTF-8 string". DBI works the same way (yes, the DBD
backends actually, thanks for pointing that out, but this doesn't make
much difference).
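To make the read-side idea concrete, here is a minimal sketch using the core utf8::decode() function, which converts a byte string in place only if it is valid UTF-8 and returns false otherwise. The helper name decode_if_utf8 is hypothetical and not part of C::M or C::M::F; the byte strings stand in for whatever a get() would return.

```perl
use strict;
use warnings;

# Hypothetical helper: return a character string if the bytes are
# valid UTF-8, otherwise return the bytes unchanged.
sub decode_if_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    utf8::decode($copy);    # in place; leaves $copy alone on failure
    return $copy;
}

# Six Cyrillic letters encoded as 12 bytes of UTF-8.
my $stored = "\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82";
my $text   = decode_if_utf8($stored);
print length($stored), " bytes -> ", length($text), " characters\n";
print utf8::is_utf8($text) ? "flag set\n" : "flag not set\n";

# Arbitrary binary data is not valid UTF-8 and comes back as is.
my $binary = "\xff\xfe\x00\x01";
my $same   = decode_if_utf8($binary);
print $same eq $binary ? "binary unchanged\n" : "binary changed\n";
```

Nothing is stored alongside the data here; the decision is made entirely by the reader, at read time.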
Actually, it's possible to store this flag in memcached, so that _when
asked_ to set the UTF-8 flag back, no string scan would be necessary
to see if the string is really UTF-8. However, I think such an
optimization is not worth the risk of missing some UTF-8 data that was
uploaded through some other memcached client that doesn't set any
special flag, or of setting the UTF-8 flag on a string that was messed
with by append/prepend. You correctly pointed out that this flag is
part of Perl's internals, so it's better not to set it without
additional precautions.
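The append/prepend risk can be sketched without a real server. Everything below is an assumption made for illustration: the %server hash stands in for memcached, F_UTF8 is an invented flag bit, and set_value/append_value are hypothetical helpers, not the C::M API. The one property taken from memcached itself is that append changes the data but leaves the stored flags as they were.

```perl
use strict;
use warnings;

use constant F_UTF8 => 1;    # assumed flag bit, for illustration only
my %server;                  # in-memory stand-in for memcached

sub set_value {
    my ($key, $bytes, $flags) = @_;
    $server{$key} = { bytes => $bytes, flags => $flags };
}

sub append_value {
    my ($key, $bytes) = @_;
    $server{$key}{bytes} .= $bytes;    # flags stay as they were
}

# Client A stores UTF-8 text and marks it with the flag.
set_value('greeting', "\xc3\xa9t\xc3\xa9", F_UTF8);
# Client B appends a single Latin-1 byte, unaware of the flag.
append_value('greeting', "\xe9");

my $item    = $server{greeting};
my $trusted = ($item->{flags} & F_UTF8) ? 1 : 0;
my $copy    = $item->{bytes};
my $valid   = utf8::decode($copy) ? 1 : 0;    # the scan the flag would skip
print "flag says UTF-8: $trusted, bytes are valid UTF-8: $valid\n";
```

A client that trusted the stored flag and skipped the scan would mark a malformed string as UTF-8.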
Of course, if the person responsible for C::M accepts
Tatsuki-san's patch, I'll reluctantly add the same functionality to
C::M::F. So let's see how it goes ;).
--
Tomash Brechko