Posts Tagged ‘lucene’

Editing an index created by Nutch

Thursday, November 20th, 2008

I have started to work with the Lucene Java search engine and its Nutch Web crawler. My needs are a little bit special, so after having Nutch crawl Web sites I want to run my own program that removes irrelevant documents from the index.

Using the Lucene API jar, that is pretty straightforward, but you have to be careful to use the right version of Lucene. I couldn’t find any information about the supported Lucene version on the Nutch site, but after trying several, 2.1.0 (http://archive.apache.org/dist/lucene/java/lucene-2.1.0-src.zip) seems to be the correct one for Nutch 0.9.
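To give an idea of what such a cleanup program can look like, here is a minimal sketch against the Lucene 2.1.0 API. It is not my actual program: the class name `IndexCleaner`, the `isIrrelevant` rule, and the default index path `crawl/index` are made up for illustration, and it assumes the Nutch index stores each page's address in a field called `url`. Adjust all of these to your own crawl.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

public class IndexCleaner {

    // Hypothetical relevance rule: treat any URL containing "/tmp/" as irrelevant.
    // Replace this with whatever test fits your own crawl.
    static boolean isIrrelevant(String url) {
        return url != null && url.contains("/tmp/");
    }

    public static void main(String[] args) throws Exception {
        // Assumed location of the index; pass the real path as the first argument.
        String indexDir = args.length > 0 ? args[0] : "crawl/index";

        IndexReader reader = IndexReader.open(indexDir);
        int removed = 0;
        try {
            for (int i = 0; i < reader.maxDoc(); i++) {
                // Skip slots that are already marked as deleted.
                if (reader.isDeleted(i)) {
                    continue;
                }
                Document doc = reader.document(i);
                // "url" is assumed to be a stored field in the Nutch index.
                if (isIrrelevant(doc.get("url"))) {
                    reader.deleteDocument(i);
                    removed++;
                }
            }
        } finally {
            // Closing the reader writes the pending deletions to the index.
            reader.close();
        }
        System.out.println("Marked " + removed + " documents as deleted");
    }
}
```

Compile and run it with the Lucene 2.1.0 core jar on the classpath. Deleting through an `IndexReader` only marks the documents as deleted, so they stop showing up in searches, but the space is not reclaimed until the index segments are merged or optimized.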