+------------------------------------------------+ | Suffix tree implementation library version 1.2 | +------------------------------------------------+
Fabien Menemenlis 2001-07-19
nihilist@dead-inside.org
This archive contains an implementation of a fast indexing library based on suffix trees. It features the basic functions you can expect from this kind of library: storing, retrieving, deleting and dumping/loading the database. All work is done on disk except for a custom cache used to save a few read/writes on nodes and thus requires fast disks but little memory. A performance of about 500 inserts per second has been achieved on an array of 3 * 18 GB 10000 rpm SCSI disks. Indexing time is also linear and lookup is extremely fast (requires only n + 2 disk access at most when looking up a word of n letters).
It has been developped on FreeBSD/gcc but should be fairly portable.
The source code "testsfx.c" show an example of how to use the library both for inserting, retrieving, and deleting data. There aren't many functions and comments should be enough to give you an idea of how to use the library. (read the header of the source file)
You should edit sfxdisk.h to suit your needs: you can change the alphabet size and the offset type. It should be OK to use "long long" 64 bits ints instead of long, in fact I tested it succesfully but haven't gone to the point of filling more than 2 GB of data (needless to say you need a 64 bits filesystem).
Two "tools" come with the library (new with version 1.2): dumpsfx and loadsfx.
dumpsfx is used to dump the database: dumpsfx <suffix tree file> [-s separator]
if you want to output the result as readable text or
dumpsfx <file.sfx> -h
to output it for reloading with loadsfx.
dumpsfx outputs on stdout and loadsfx reads from stdin.
loadsfx <suffix tree file to create> < dumped_file
Please note that deleting does "free" any space for future inserts, dumping
and reloading the database is necessary to retrieve freed space. This could
change in the future.
There are still a few french variables names, sorry about that, I'll translate
them for the next release.
There's still room for improvement... like partial key lookup. Contribute if
you can!
Any feedback will be greatly appreciated, mail me at nihilist@dead-inside.org
The library is provided under the GPL license. See COPYING file for more info.
