btree support

Sat Dec 11 16:19:55 PST 2004

On 11 Dec 2004, at 20:37, Kevin A. Burton wrote:
> I've been thinking (blue sky) about the possibility of implementing 
> Lucene on top of memcached.  This would support storing posts directly 
> in memory and spanning the query across multiple machines.

Warning:  Lucene will not like losing part of its index.  :)

> With just a basic hashtable you can only do keyword query "Show me all 
> posts with 'linux'" but not a full prefix query "Show me all posts 
> with linux%".

You wouldn't need to; Lucene indexes based on suffixes and the like, 
not based on query types.  As such, it would in theory be possible to 
build a set of indexes on specific stored terms and those terms' 
variant types; however, this is not how the actual Lucene API for 
writing indexes operates.

It thinks in files, and jumps around a binary file format.  That's all 
the granularity you get.

So what you could do is the same thing that gets done in the SQL 
implementation of the Lucene store - you could store the contents in 
keys, and update using those mechanisms, where the key is built from...

indexName + fileName + 'sector' + byterange...
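
To make that concrete, here is a rough sketch of that key scheme in 
Python.  Everything in it is an assumption for illustration: a plain 
dict stands in for memcached, and chunk_key, store_file and SECTOR_SIZE 
are made-up names, not anything from Lucene or memcached.

```python
# Hypothetical sketch: a plain dict stands in for the memcached client.
SECTOR_SIZE = 1024  # assumed chunk size, chosen arbitrarily


def chunk_key(index_name, file_name, sector, sector_size=SECTOR_SIZE):
    """Build a key from indexName + fileName + sector + byte range."""
    start = sector * sector_size
    end = start + sector_size - 1
    return f"{index_name}/{file_name}/sector{sector}/{start}-{end}"


def store_file(cache, index_name, file_name, data, sector_size=SECTOR_SIZE):
    """Split one index file into fixed-size chunks, one cache key each."""
    keys = []
    for sector, offset in enumerate(range(0, len(data), sector_size)):
        key = chunk_key(index_name, file_name, sector, sector_size)
        cache[key] = data[offset:offset + sector_size]
        keys.append(key)
    return keys
```

Each index file ends up spread over many independent keys, and the 
cache is free to evict any one of them on its own.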

But then the moment anything drops out of cache, you're buggered - 
because you've just lost part of a file that it depended on existing.
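
A minimal sketch of that failure, again with a dict playing the part of 
the cache and hypothetical helper names (key_for, write_file, 
read_file); a deleted key stands in for an eviction:

```python
# Hypothetical sketch: deleting a dict entry simulates a cache eviction.
SECTOR_SIZE = 4  # tiny assumed chunk size so the example stays readable


def key_for(index_name, file_name, sector):
    return f"{index_name}/{file_name}/{sector}"


def write_file(cache, index_name, file_name, data):
    """Store a file as fixed-size chunks, one cache entry per chunk."""
    for sector, offset in enumerate(range(0, len(data), SECTOR_SIZE)):
        cache[key_for(index_name, file_name, sector)] = data[offset:offset + SECTOR_SIZE]


def read_file(cache, index_name, file_name, length):
    """Reassemble the file; fail if any chunk has dropped out of the cache."""
    parts = []
    sectors = (length + SECTOR_SIZE - 1) // SECTOR_SIZE
    for sector in range(sectors):
        key = key_for(index_name, file_name, sector)
        chunk = cache.get(key)
        if chunk is None:
            raise OSError(f"missing chunk {key}: index file is incomplete")
        parts.append(chunk)
    return b"".join(parts)


cache = {}
write_file(cache, "idx", "segments", b"abcdefghij")
intact = read_file(cache, "idx", "segments", 10)  # all chunks present

del cache[key_for("idx", "segments", 1)]  # simulate one chunk being evicted
# read_file(cache, "idx", "segments", 10) would now raise OSError
```

There is nothing to fall back on at that point: the remaining chunks 
are useless without the one that went missing.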

Not a good match, architecturally.

> BTW this is the same path that MySQL took with their memory tables.  
> They initially were just hashtables but in MySQL 4.1.x they've added 
> btree indexing for sorts, etc.
>
> Just thinking out loud.
>

The FS API in Lucene exists at completely the wrong level for this kind 
of implementation, even theoretically; and the loss of part of that 
information due to cache timeout would be catastrophic in any case.