Java Mailing List Archive


Parsing HTML




Good afternoon,

Now that I have solved my problem with the other formats, I'm trying to figure
out how to solve another one.
I can parse .html files, but the resulting ParseText comes out as a single line. I would
like to preserve at least the paragraphs of the original document. Does anyone
know how to do this?
Thank you in advance.
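One possible approach, sketched here under stated assumptions: instead of concatenating every text node into one string, walk the parsed DOM yourself and emit a line break whenever you enter or leave a block-level element such as `<p>`, `<div>`, `<br>` or a heading. The class and method names (`ParagraphTextExtractor`, `extract`, `getText`) and the exact tags in `BLOCK_TAGS` are hypothetical choices for illustration, not part of Nutch. To keep the example self-contained it parses well-formed XHTML with the JDK's XML parser. Real-world HTML would need a lenient HTML parser that produces an `org.w3c.dom` tree, and hooking this into Nutch's parsing step is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class ParagraphTextExtractor {

    // Hypothetical choice of tags treated as line/paragraph boundaries.
    private static final Set<String> BLOCK_TAGS = Set.of(
            "p", "div", "br", "li", "tr", "pre", "blockquote",
            "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "table");

    // Demo only: parses well-formed XHTML and returns its text, one block per line.
    static String extract(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return getText(doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    // Walks any DOM node (e.g. one built by an HTML parser) and keeps block breaks.
    static String getText(Node root) {
        StringBuilder sb = new StringBuilder();
        collect(root, sb);
        // Trim each line and drop blank ones left over from whitespace-only text nodes.
        return Arrays.stream(sb.toString().split("\n"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining("\n"));
    }

    private static void collect(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Collapse whitespace runs inside a text node, roughly as a browser would.
            sb.append(node.getNodeValue().replaceAll("\\s+", " "));
            return;
        }
        if (type != Node.ELEMENT_NODE && type != Node.DOCUMENT_NODE) {
            return; // skip comments, processing instructions, etc.
        }
        String name = node.getNodeName().toLowerCase(Locale.ROOT);
        if (name.equals("script") || name.equals("style")) {
            return; // not visible text
        }
        boolean block = BLOCK_TAGS.contains(name);
        if (block) breakLine(sb);
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
        if (block) breakLine(sb);
    }

    // Appends a newline unless the buffer is empty or already ends with one.
    private static void breakLine(StringBuilder sb) {
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
            sb.append('\n');
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1>"
                + "<p>First  paragraph.</p><p>Second <b>bold</b> one.</p></body></html>";
        System.out.println(extract(html));
    }
}
```

Running `main` should print the heading and the two paragraphs on separate lines. If you want a blank line between blocks rather than a single line break, joining with `"\n\n"` in `getText` is a small change.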
Sent from the Nutch - User mailing list archive at
©2008 - Jax Systems, LLC, U.S.A.