Field separators, again

Wed Jun 8 03:39:55 PDT 2005

Evan Martin wrote:
> On 6/7/05, Paul Crowley <paul at ciphergoth.org> wrote:
> 
>>[...] I think that's
>>overkill for now; everyone should be able to parse the format here with
>>hardly any code.
> 
> 
> But see, e.g., http://www.xml.com/pub/a/2003/10/15/dive.html , which
> lists difficulty in parsing a similar format as a flaw in the LJ API.
> 

I think that most of the objections to the LiveJournal format don't
apply here:
* They don't like the request being URLencoded, but there's little
choice here since the request is described in the query string.
* They don't like the different newline formats. We're mandating plain
old \n. (must be careful with Perl on Windows treating \n as \r\n when
not in binmode, though)
* Can't represent fields with newlines in them. We have no need to do
that here.

A simple FSA-style parser could parse that format in a few dozen lines
of code maximum, and most languages have at least *some* facilities to
make that easier.
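
To make that concrete, here is a rough sketch of such a state-machine
parser in Python. It assumes the format is nothing more than "key:value"
lines separated by a plain \n, that only the first ':' on a line is the
separator, and that blank lines can be ignored. The name parse_fields and
the error handling are my own choices for illustration, not part of any
spec.

```python
def parse_fields(text):
    # Two-state machine: we are either reading a key or reading a value.
    # Assumed format: "key:value" lines separated by a plain "\n".
    KEY, VALUE = 0, 1
    state = KEY
    key, value = [], []
    fields = {}
    for ch in text:
        if state == KEY:
            if ch == ':':
                # The first colon ends the key; later colons belong to the value.
                state = VALUE
            elif ch == '\n':
                if key:
                    raise ValueError("line without ':' separator: %r" % ''.join(key))
                # Otherwise it is a blank line, which we skip.
            else:
                key.append(ch)
        else:
            if ch == '\n':
                # End of line: store the pair and start over.
                fields[''.join(key)] = ''.join(value)
                key, value = [], []
                state = KEY
            else:
                value.append(ch)
    # Cope with a final line that has no trailing newline.
    if state == VALUE:
        fields[''.join(key)] = ''.join(value)
    elif key:
        raise ValueError("line without ':' separator: %r" % ''.join(key))
    return fields
```

Calling parse_fields("a:1\nb:http://x\n") gives a dict with "a" mapped to
"1" and "b" mapped to "http://x". Going character by character is
deliberately naive; the point is only that nothing cleverer than two
states is needed.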

> 
>>In Python, for example, you can do it with
>>
>>dict([(lambda x, y: (x, y.rstrip()))(*l.split(':',1)) for l in f])
>>
>>What do other people think?
> 
> Golf time!  Haskell can almost do it with just
> map (break (':' ==)) $ lines f
> ;)
> 

map{split(/:/,$_,2)}split(/\n/,$f)
;)

(removing the whitespace is cheating a little, but Perl has a reputation
for being hard to read, no?)