[PATCH] utf8 flag support on perl lib

Sat Jan 12 16:09:08 UTC 2008

On Sat, Jan 12, 2008 at 13:38:38 +0100, Peter J. Holzer wrote:
> * A floating point value: Same as the integers, except that on at least
>   some perl implementations the FP -> string -> FP conversion may lose
>   some precision (but that's a bug in perl).

It's not a bug in Perl, but a common user misunderstanding.  Being a
dynamically typed language is different from having automatic
conversion into strings.  When you store floats to a file you can
either stringify them or pack() them, depending on what you are trying
to achieve.  _Stringification_ is simply not a substitute for
_serialization_.  C::M could just use Storable in all cases, but there
are actually two APIs in it: one is raw, byte-oriented, Perl-unaware
(passing scalar _buffers_), and one is Perl-aware (passing references,
including references to scalars; they are not an exception).  It's a
mere coincidence that some scalar types are automatically converted to
such "octet buffers".  And that's why the proposed solution has to call
Perl internal functions: it solves the wrong problem in the wrong way.
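
To make the stringify-versus-pack() distinction concrete, here is a
minimal sketch.  It assumes nothing about C::M itself; the variable
names are only illustrative.  The 'F' template packs a native NV, so
the bytes are exact but not portable between machines.

```perl
use strict;
use warnings;

my $x = 0.1 + 0.2;

# Stringification: formatted for humans, with a limited number of digits.
my $as_string = "$x";

# Serialization: the raw bytes of the native floating point value (NV).
my $as_bytes = pack('F', $x);

# Read both forms back.
my $from_string  = $as_string + 0;
my ($from_bytes) = unpack('F', $as_bytes);

printf "string form: %s\n", $as_string;
printf "via string:  %s\n", $from_string == $x ? 'exact' : 'changed';
printf "via pack():  %s\n", $from_bytes  == $x ? 'exact' : 'changed';
```

Whether the string round trip loses digits depends on the value and
on how Perl was built; the pack() round trip on the same machine does
not.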

But this misunderstanding is very common in the Perl world, and I
myself put $memd->set('key', 123); in the example section of C::M::F
(thankfully it isn't 1.23 :)).  And since reality is how one observes
it, there's no point in trying to change this.  If there are users who
demand the functionality, let's have it, wrong as it is ;).  As long
as it is disabled by default, everyone should be satisfied.

I'm adding the following to my TODO list for C::M::F (can't help with
C::M, sorry):

  Add constructor parameter

    encoding => 'preserve' | 'force' (default: undef, i.e. neither)

'preserve' would mean "save on store and restore on fetch", 'force'
would mean "forcefully make Perl think it's a text string" (this would
be needed for the scenarios I outlined earlier, i.e. when the side
that does the store doesn't set any flag but the fetching side is
confident that the fetched data is text in the right encoding).
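
A rough sketch of how the two modes could behave, limited to UTF-8
and using only the core utf8:: functions.  Everything named here is
hypothetical: F_UTF8, encode_for_store() and decode_after_fetch() are
not part of C::M::F, and which API to use is not settled.

```perl
use strict;
use warnings;

# Hypothetical flag bit stored alongside the value; not a real constant.
use constant F_UTF8 => 0x4;

# Store side for 'preserve': return (octets, flags) for a scalar.
sub encode_for_store {
    my ($value) = @_;              # a copy, the caller's scalar is untouched
    my $flags = 0;
    if (utf8::is_utf8($value)) {
        utf8::encode($value);      # characters -> UTF-8 octets
        $flags |= F_UTF8;          # remember that it was a text string
    }
    return ($value, $flags);
}

# Fetch side: $mode is 'preserve', 'force' or undef (neither).
sub decode_after_fetch {
    my ($octets, $flags, $mode) = @_;
    $mode = '' unless defined $mode;
    if ($mode eq 'force' or ($mode eq 'preserve' and $flags & F_UTF8)) {
        utf8::decode($octets);     # octets -> characters, if valid UTF-8
    }
    return $octets;
}

my $text = "\x{263A} smile";
my ($octets, $flags) = encode_for_store($text);
my $back = decode_after_fetch($octets, $flags, 'preserve');
print "round trip ok\n" if $back eq $text;
```

With 'force' the stored flags are ignored, which covers the case
where the storing side never set them.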

Peter, could you please enlighten me with your expert opinion: should
I use Encode::is_utf8 and Encode::_utf8_on like Tatsuki-san did, or do
I have to use utf8::is_utf8(), utf8::upgrade(), utf8::downgrade() to
get _any_ encoding to work, not just UTF-8?  It would be nice if you
could outline the algorithm then.  Thanks in advance!

Finally, I'd like to note that an addition to the TODO list is not a
promise to implement the feature.  The matter requires more
consideration.

-- 
   Tomash Brechko