Thursday, December 01, 2005

dotLucene Search Engine using C# and IFilter

Creating a search engine for a website using open-source software is far more painful than it needs to be.

I recently set myself the task of creating a search engine for an internal website for basically no money. Because the application will be built in ASP.NET, the code had to be in C#.

Searching around the net, I found a promising Apache project called "Lucene". Lucene looked very powerful and, best of all, was open source. The only problem with Apache's Lucene is that it's written entirely in Java (which I don't want to use).

So, doing a search for "Lucene C#", I came across a couple of projects, both of which are dead.
It took me a lot more searching to find "dotLucene", which is still open source and still active.

To cut a long story short, dotLucene did not work for me out of the box, and its programming interfaces appear to have inherited the pain of the Java version. Plus, dotLucene could not index Word or PDF files out of the box (it only handled HTML).

I tried a number of things to get dotLucene to work.
Through much trial and error, I finally got something that appears to "Just Work".


Note: this source code uses sources taken from all over the place. I do not assert any intellectual property rights of my own over any of it.
