[PATCH] utf8 flag support on perl lib

Thu Jan 10 17:03:13 UTC 2008

On Thu, Jan 10, 2008 at 17:29:42 +0100, Peter J. Holzer wrote:
> The same byte sequence, but not the same value. In C (on many systems)
> the single precision floating point number 3.1415927 and the integer
> 1078530011 have the same byte sequence (0xdb 0xf 0x49 0x40 on little
> endian systems), but they hardly have the same value.
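Peter's float/int example can be checked with a short sketch. It is written in Python rather than C. It uses the standard `struct` module to pack 3.1415927 as a little-endian single-precision float, then rereads the same four bytes as an unsigned 32-bit integer. `float_bits_as_int` is a hypothetical helper name.

```python
import struct


def float_bits_as_int(x):
    """Return (raw bytes, same bytes read as uint32) for x as a little-endian float32."""
    # Pack x as a little-endian IEEE 754 single-precision float (4 bytes).
    raw = struct.pack("<f", x)
    # Reinterpret exactly the same 4 bytes as a little-endian unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", raw)
    return raw, as_int


raw, as_int = float_bits_as_int(3.1415927)
print(raw.hex(" "))  # db 0f 49 40
print(as_int)        # 1078530011
```

The bytes are identical in both readings. Only the type used to interpret them differs.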

OK, I take your point, though it's more a question of terminology.

Let me put it another way: my opinion is that C::M (and C::M::F)
should not save or restore the UTF-8 flag itself.  Instead, it should
work the same way other Perl data streams work.  If you write a string
to a file, no magic flags are stored anywhere.  Instead, when you _read_
it back you say, "alright, please set the UTF-8 flag on the data if it
looks like a UTF-8 string".  DBI works the same way (strictly speaking
it's the DBD backends, thanks for pointing that out, but this makes
little difference).
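The read-side approach described above can be sketched as follows. This is a Python analogue, not C::M or C::M::F code. Python's `bytes`/`str` split is not the same thing as Perl's UTF-8 flag, and `read_value` is a hypothetical name. Nothing is stored alongside the value. On read, the bytes are scanned and turned into text only if they are well-formed UTF-8, which is loosely comparable to calling Perl's `utf8::decode` on the fetched data.

```python
def read_value(raw):
    """Return text if raw is well-formed UTF-8, otherwise return the bytes unchanged.

    No flag is consulted: the decision is made by scanning the bytes on read.
    """
    try:
        # Strict decoding: raises UnicodeDecodeError on any malformed sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so hand back the raw octets untouched.
        return raw


print(read_value("café".encode("utf-8")))  # café
print(read_value(b"\xff\xfe"))             # b'\xff\xfe'
```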

Actually, it's possible to store this flag in memcached, and then, _when
asked_ to set UTF-8 back, no string scan would be needed to check that
the string really is UTF-8.  However, I think such an optimization is
not worth the risk of missing UTF-8 data that was uploaded through
some other memcached client that doesn't set any special flag, or of
setting the UTF-8 flag on a string that was altered with append/prepend.
You correctly pointed out that this flag is part of Perl's internals, so
it's better not to set it without additional precautions.
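The trade-off in the previous paragraph can be illustrated with a toy model. Everything here is hypothetical. A Python dict stands in for the memcached server, and `UTF8_FLAG` is an invented client-side flag bit. `set_value`, `append`, `is_text_by_flag` and `is_text_by_scan` are illustrative names, not the C::M or C::M::F API. The one detail taken from memcached itself is that append/prepend ignore the item flags, so a flag recorded at store time is never updated.

```python
# Hypothetical in-memory stand-in for a memcached server: key -> (flags, raw bytes).
store = {}

# Hypothetical client-side flag bit meaning "this value was stored as text".
# It is NOT a constant defined by memcached or by any Perl client.
UTF8_FLAG = 0x1


def set_value(key, value):
    """Store a value the way a flag-recording client might."""
    if isinstance(value, str):
        store[key] = (UTF8_FLAG, value.encode("utf-8"))
    else:
        store[key] = (0, bytes(value))


def append(key, data):
    """Server-side append: concatenates raw bytes and leaves the flags alone."""
    flags, raw = store[key]
    store[key] = (flags, raw + data)


def is_text_by_flag(key):
    """Cheap check: trust whatever flag was stored at set time."""
    flags, _ = store[key]
    return bool(flags & UTF8_FLAG)


def is_text_by_scan(key):
    """O(n) check: accept the bytes as text only if they are valid UTF-8."""
    _, raw = store[key]
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Case 1: text stored with the flag, then a stray byte appended later.
set_value("a", "café")
append("a", b"\xff")
print(is_text_by_flag("a"), is_text_by_scan("a"))  # True False

# Case 2: UTF-8 bytes written by another client that sets no flag.
store["b"] = (0, "café".encode("utf-8"))
print(is_text_by_flag("b"), is_text_by_scan("b"))  # False True
```

The flag check costs nothing per read but gives the wrong answer in both cases. The scan is linear in the value size but reflects the bytes actually stored.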

Of course, if the person responsible for C::M accepts Tatsuki-san's
patch, I'll reluctantly add the same functionality to C::M::F.  So
let's see how it goes ;).

-- 
   Tomash Brechko