Java Mailing List Archive

http://www.java2.5341.com/

Parsing html

nachonieto3

2010-05-04

Good afternoon,

Having solved my problem with the other formats, I'm now trying to figure
out how to solve another one.
I'm able to parse .html files, but I get the ParseText on a single line. I
would like to preserve at least the paragraphs of the original document. Does
anyone know how to do it?
Thank you in advance.
--
Sent from the Nutch - User mailing list archive at Nabble.com.