LDSE stands for "Local Domain Search Engine".

This program is the result of my diploma thesis (of the same title). It is a distributed search engine. Each node is responsible for indexing and querying its local domain (ideally a single web server). The nodes are connected hierarchically. Every super node can execute a query against its own index, and it can forward the query to all (or a subset) of its sub nodes. The subset is chosen by determining which nodes can give the best results for the query.
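The routing idea can be illustrated with a small sketch. This is not LDSE's actual code: the `Node` class, the `fan_out` parameter and the scoring rule (counting index entries for the query terms in a sub tree) are assumptions made for illustration only. How the nodes are really ranked is described in the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search node. All names are hypothetical, not LDSE's API."""
    name: str
    index: dict[str, set[str]] = field(default_factory=dict)  # term -> document ids
    children: list["Node"] = field(default_factory=list)

    def add_document(self, doc_id: str, text: str) -> None:
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)

    def summary_score(self, terms: list[str]) -> int:
        # Assumed heuristic: number of index entries for the query terms
        # in this node and in every node below it.
        own = sum(len(self.index.get(t, ())) for t in terms)
        return own + sum(c.summary_score(terms) for c in self.children)

    def query(self, text: str, fan_out: int = 2) -> set[str]:
        terms = text.lower().split()
        # Answer from the local index first: documents containing any term.
        hits: set[str] = set()
        for t in terms:
            hits |= self.index.get(t, set())
        # Forward the query only to the most promising sub nodes.
        ranked = sorted(self.children,
                        key=lambda c: c.summary_score(terms), reverse=True)
        for child in ranked[:fan_out]:
            if child.summary_score(terms) > 0:
                hits |= child.query(text, fan_out)
        return hits


root, a, b, c = Node("root"), Node("a"), Node("b"), Node("c")
root.children = [a, b, c]
root.add_document("root/1", "distributed search")
a.add_document("a/1", "trigram index search")
b.add_document("b/1", "cooking recipes")
c.add_document("c/1", "search engine")
c.add_document("c/2", "search")
```

With `fan_out=1` the root answers from its own index and asks only sub node `c`, which holds the most entries for "search". A real super node would keep a compact summary of each sub node instead of walking the sub tree for every query.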

A complete description can be found in my diploma thesis.

Special features are:
- distributed search engine
- uses any relational database
  • tested with InstantDB and Oracle Lite
- tolerant of spelling errors and other word forms
- separated data server and data gatherer
- can support many file formats via a plugin mechanism; supported are
  • PDF
  • HTML
  • plain text
  • ZIP and gzip files
- can gather data via HTTP or from the local file system
  • the HTTP spider is resistant to loops (if a document links to itself, but under another path)
  • the HTTP spider is resistant to HTML errors (like missing "'s in parameters or non-quoted &'s)
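
One common way to make a spider resistant to the kind of loop mentioned in the list above is to remember a fingerprint of every page body, so that the same document reached under a different path is not followed again. The sketch below assumes this technique; it is not taken from LDSE, and `crawl`, `fetch` and `extract_links` are hypothetical names. A tiny in-memory "site" stands in for real HTTP requests.

```python
import hashlib
from collections import deque


def crawl(start, fetch, extract_links):
    """Breadth-first crawl that skips URLs and page bodies it has seen before.

    fetch(url) returns the page body as bytes; extract_links(url, body)
    returns the URLs the page links to. Both are hypothetical callbacks.
    """
    seen_urls = {start}
    seen_content = set()
    indexed = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        body = fetch(url)
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen_content:
            continue  # same document under another path: do not follow it again
        seen_content.add(digest)
        indexed.append(url)
        for link in extract_links(url, body):
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return indexed


# Fake site: every unknown path returns the same page, which links to a
# deeper copy of itself. A naive spider would never terminate here.
pages = {"/": b"home", "/doc": b"doc"}


def fetch(url):
    return pages.get(url, b"loop page")


def extract_links(url, body):
    if body == b"home":
        return ["/doc", "/loop"]
    if body == b"loop page":
        return [url + "/loop"]
    return []


result = crawl("/", fetch, extract_links)
```

The crawl stops after `/loop`, because `/loop/loop` has the same fingerprint. Exact hashing only catches byte-identical copies; pages that differ in a timestamp or session id would need a fuzzier comparison.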

For fault tolerance, a so-called "trigram index" is used. This index takes all trigrams (3-letter combinations) that a word contains and stores them in a reverse index, which maps each trigram to the words containing it. A second reverse index maps the words to the documents.
This gives fast queries and tolerance of misspelled words. It can also find substrings within words.
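
A minimal in-memory sketch of the two reverse indexes follows. It illustrates the idea only: LDSE stores its index in a relational database, and the class name, the method names and the `threshold` scoring rule (share of the query's trigrams that a word must contain) are assumptions, not the program's actual design.

```python
from collections import defaultdict


def trigrams(word):
    """All 3-letter substrings of a word: 'search' -> sea, ear, arc, rch."""
    word = word.lower()
    return {word[i:i + 3] for i in range(len(word) - 2)}


class TrigramIndex:
    def __init__(self):
        self.words_by_trigram = defaultdict(set)  # trigram -> words
        self.docs_by_word = defaultdict(set)      # word -> document ids

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.docs_by_word[word].add(doc_id)
            for tri in trigrams(word):
                self.words_by_trigram[tri].add(word)

    def similar_words(self, query, threshold=0.5):
        """Words containing at least `threshold` of the query's trigrams."""
        wanted = trigrams(query)
        if not wanted:
            return set()  # queries shorter than 3 letters have no trigrams
        counts = defaultdict(int)
        for tri in wanted:
            for word in self.words_by_trigram.get(tri, ()):
                counts[word] += 1
        return {w for w, n in counts.items() if n / len(wanted) >= threshold}

    def search(self, query, threshold=0.5):
        docs = set()
        for word in self.similar_words(query, threshold):
            docs |= self.docs_by_word[word]
        return docs


index = TrigramIndex()
index.add("d1", "distributed search engine")
index.add("d2", "trigram index")
```

Here `index.search("distribted")` still finds `d1`, because the misspelled query shares six of its eight trigrams with "distributed". With `threshold=1.0` the same lookup acts as a substring search: `index.search("gram", threshold=1.0)` finds `d2` through "trigram".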

For ideas, bug reports and everything else don't hesitate to contact me:

The official web page can be found under

This program is distributed under the GPL (see License).

Hendrik Lipka

Copyright 2006 All rights reserved.