dotLucene Search Engine using C# and IFilter
Creating a search engine for a website with open-source tools is highly painful, even though it doesn't need to be.
I recently set myself the task of creating a search engine for an internal website for basically no money. Because the application was to be built in ASP.NET, the code had to be in C#.
Searching around the net, I found a promising Apache project called "Lucene". Lucene looked very powerful and, best of all, was open source. The only problem with Apache's Lucene is that it's all in Java (which I don't want to use).
So, searching for "Lucene C#", I came across two projects, both of which are dead ends:
- Lucene.Net is now commercial software
- NLucene is basically dead.
To cut a long story short, I ended up with dotLucene, which did not work for me out of the box; its programming interfaces appear to have inherited the pain of the Java version. Plus, dotLucene could not index Word or PDF files out of the box (it only did HTML).
Things I tried to get dotLucene to work:
- read a promising article by dotLucene's author
- downloaded the dotLucene Demo
- read the documentation on parsing Word and PDF files
- rammed my head against a thick wall
Note: this source code uses sources taken from all over the place. I do not assert any intellectual property rights of my own over any of it.