Separate Nutch (crawl) and Lucene (index/search)

sb101h

2010-04-24


I need to index and search file system contents (the contents of my local
server) and also crawl a select set of websites, so that a single search
query covers both.

I already have search over my local file system implemented with Lucene. I
would like Nutch to only crawl the websites and produce the content, so that
my Lucene search application can index and search the web content as well. I
want to use standalone Lucene for indexing and searching the web content too,
because I want the same analyzer for both sources and more control over the
search results, for example applying different boosts to local content vs.
web content. In short, I want to use the Nutch code for crawling and for
retrieving the web links for search results, but do all indexing, searching,
and analysis in Lucene itself.
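
To make the requirement concrete, here is a minimal plain-Java sketch of the
behaviour I am after: one analyzer shared by both sources, and a per-source
boost applied at scoring time. It deliberately uses no Lucene or Nutch calls.
The class, the `analyze` and `search` methods, the term-count scoring, and the
sample documents are all hypothetical stand-ins, not how either library works
internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the search application; no Lucene or Nutch APIs are used.
class UnifiedSearchSketch {

    // Where a document came from: the local file system or the web crawl.
    enum Source { LOCAL, WEB }

    static final class Doc {
        final String id;      // file path or URL
        final Source source;
        final String text;
        Doc(String id, Source source, String text) {
            this.id = id;
            this.source = source;
            this.text = text;
        }
    }

    static final class Hit {
        final String id;
        final Source source;
        final double score;
        Hit(String id, Source source, double score) {
            this.id = id;
            this.source = source;
            this.score = score;
        }
    }

    // The single shared "analyzer": lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Score = number of document tokens that match a query term, times the source boost.
    static List<Hit> search(List<Doc> docs, String query, Map<Source, Double> boosts) {
        Set<String> queryTerms = new HashSet<>(analyze(query));
        List<Hit> hits = new ArrayList<>();
        for (Doc doc : docs) {
            int matches = 0;
            for (String token : analyze(doc.text)) {
                if (queryTerms.contains(token)) {
                    matches++;
                }
            }
            if (matches > 0) {
                double boost = boosts.getOrDefault(doc.source, 1.0);
                hits.add(new Hit(doc.id, doc.source, matches * boost));
            }
        }
        // Highest score first.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("/srv/docs/notes.txt", Source.LOCAL, "Lucene analyzer notes"),
            new Doc("http://example.com/lucene", Source.WEB, "Lucene, Lucene analyzer tutorial"));
        Map<Source, Double> boosts = new EnumMap<>(Source.class);
        boosts.put(Source.LOCAL, 2.0);
        boosts.put(Source.WEB, 1.0);
        for (Hit hit : search(docs, "Lucene analyzer", boosts)) {
            System.out.println(hit.source + " " + hit.id + " score=" + hit.score);
        }
    }
}
```

With a boost of 2.0 on local content, the local file ranks above the web page
even though the web page matches more terms. With equal boosts the order
flips. That is the kind of control I want to keep on the Lucene side.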

Is there a solution where only the crawling part of Nutch is used and
integrated with Lucene?
--
Sent from the Nutch - User mailing list archive at Nabble.com.