LDSE stands for "Local Domain Search Engine".
This program is a result of my diploma thesis (with the same title). It is a distributed search engine. Each node is responsible for indexing and querying a local domain (ideally a single web server). The nodes are connected in a hierarchical way. Every super node can execute a query against its own index, and it can query all (or a subset) of its sub nodes. This is done by determining which nodes can give the best results for the query.
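The hierarchical querying described above can be illustrated with a small sketch. It is not the code of LDSE itself: the class name `Node`, the `max_sub_nodes` parameter and the scoring rule (count how many query terms a sub node knows) are assumptions made only for this example. The thesis describes the real selection method.

```python
class Node:
    """One search node with its own documents and optional sub nodes (hypothetical)."""

    def __init__(self, name, documents, sub_nodes=()):
        self.name = name
        self.documents = documents        # doc id -> text, stands in for the local index
        self.sub_nodes = list(sub_nodes)

    def vocabulary(self):
        """All words known to this node and its sub nodes (a crude summary)."""
        words = set()
        for text in self.documents.values():
            words.update(text.lower().split())
        for sub in self.sub_nodes:
            words |= sub.vocabulary()
        return words

    def query(self, terms, max_sub_nodes=1):
        """Search the own index, then forward the query to the best sub nodes."""
        terms = {t.lower() for t in terms}
        # 1. Search this node's own documents.
        results = [
            (self.name, doc_id)
            for doc_id, text in self.documents.items()
            if terms & set(text.lower().split())
        ]
        # 2. Score each sub node by the number of query terms it knows
        #    (assumed scoring rule) and drop sub nodes that know none.
        scored = [(len(terms & sub.vocabulary()), sub) for sub in self.sub_nodes]
        scored = [(score, sub) for score, sub in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # 3. Forward the query only to the most promising sub nodes.
        for _, sub in scored[:max_sub_nodes]:
            results.extend(sub.query(terms, max_sub_nodes))
        return results


web = Node("web", {"w1": "apache http server manual"})
db = Node("db", {"d1": "oracle database manual", "d2": "sql query tuning"})
root = Node("root", {"r1": "site overview"}, [web, db])

print(root.query(["database", "tuning"]))
```

In this sketch the super node forwards the query only to the `db` node, because the `web` node knows none of the query terms. A real node would keep a compact summary of each sub node instead of rebuilding the vocabulary on every query.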
A complete description can be found in my diploma thesis.
Special features are:
- distributed search engine
- uses any relational database
  - tested with InstantDB and Oracle Lite
- tolerant of spelling errors and other word forms
- separated data server and data gatherer
- can support many file formats via plugin mechanism, supported are
  - plain text
  - ZIP and gzip files
- can gather data via HTTP or from local file system
- HTTP spider is resistant against loops (if a document links to itself, but under another path)
- HTTP spider is resistant against HTML errors (like missing "'s in parameters or non-quoted &'s)
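One possible way to make a spider resistant against such loops is to remember a fingerprint of every document it has already processed. The sketch below shows that idea only. It is an assumption, not necessarily how LDSE detects loops, and the names `LoopGuard` and `should_process` are made up for this example.

```python
import hashlib


class LoopGuard:
    """Remembers visited URLs and content fingerprints (hypothetical helper)."""

    def __init__(self):
        self.seen_urls = set()
        self.seen_digests = set()

    def should_process(self, url, content):
        """Return True only for a new URL whose content (bytes) was not seen yet."""
        digest = hashlib.sha256(content).hexdigest()
        if url in self.seen_urls or digest in self.seen_digests:
            return False
        self.seen_urls.add(url)
        self.seen_digests.add(digest)
        return True


guard = LoopGuard()
page = b"<html><a href='a/index.html'>again</a></html>"
first = guard.should_process("http://example.org/a/index.html", page)
# The same document reached again under a deeper path is skipped.
second = guard.should_process("http://example.org/a/a/index.html", page)
print(first, second)
```

A content fingerprint catches the case where the same document is reachable under ever longer paths, at the cost of also skipping genuine duplicates.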
For the fault tolerance, a so-called "trigram index" is used. This index takes
all trigrams (3-letter combinations) that a word contains and stores this
information in a reverse index. From the words to the documents there is another
reverse index. This gives high query speed and a tolerance of misspelled words.
It can also find substrings in words.
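A minimal sketch of such a two-level index follows. It keeps everything in memory, while LDSE stores its index in a relational database. The class name `TrigramIndex`, the method names and the `min_share` threshold are assumptions made for this example.

```python
from collections import Counter, defaultdict


def trigrams(word):
    """Return the set of 3-letter substrings of a word (lower-cased)."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


class TrigramIndex:
    def __init__(self):
        self.trigram_to_words = defaultdict(set)  # trigram -> words containing it
        self.word_to_docs = defaultdict(set)      # word -> documents containing it

    def add(self, doc_id, text):
        """Index every whitespace-separated word of a document."""
        for word in text.lower().split():
            self.word_to_docs[word].add(doc_id)
            for tri in trigrams(word):
                self.trigram_to_words[tri].add(word)

    def similar_words(self, term, min_share=0.5):
        """Indexed words that contain at least min_share of the term's trigrams."""
        wanted = trigrams(term)
        if not wanted:
            return set()
        hits = Counter()
        for tri in wanted:
            hits.update(self.trigram_to_words.get(tri, ()))
        return {word for word, n in hits.items() if n / len(wanted) >= min_share}

    def search(self, term, min_share=0.5):
        """Documents containing a word similar to the (possibly misspelled) term."""
        docs = set()
        for word in self.similar_words(term, min_share):
            docs |= self.word_to_docs[word]
        return docs


index = TrigramIndex()
index.add("doc1", "distributed search engine")
index.add("doc2", "relational database")

print(index.search("databse"))  # misspelled word
print(index.search("base"))     # substring of an indexed word
```

Both queries find "doc2": the misspelled "databse" still shares three of its five trigrams with "database", and every trigram of "base" occurs in "database". Note that a term shorter than three letters has no trigrams and finds nothing in this sketch.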
For ideas, bug reports and everything else don't hesitate to contact me: email@example.com
The official web page can be found at www.hendriklipka.de.
This program is distributed under the GPL (see License).