Java Mailing List Archive

http://www.java2.5341.com/


Parsing .ppt, .xls, .rtf and .doc

nachonieto3

2010-04-29


Hello everyone,

I'm using Nutch v0.9. I'm able to crawl, fetch and parse HTML and .pdf files.
When I try .ppt, .xls, .rtf and .doc files, the crawl runs without any
errors, but when I use SegmentReader to inspect each URL I don't find any
parse text for these formats. I have configured the parser plugins and
enabled them. This is the result I get with an .xls file:
http://n3.nabble.com/forum/FileDownload.jtp?type=n&id=765912&name=untitled2.bmp

Any suggestions about what I'm doing wrong? How can I check whether the
plugins are actually parsing these files?

Thank you in advance
--
Sent from the Nutch - User mailing list archive at Nabble.com.
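
One possible way to check the question above (whether any parse text was stored for a given URL) is to dump a segment to text with SegmentReader, for example `bin/nutch readseg -dump <segment> <outdir>`, and scan the resulting file. The sketch below is only an illustration under assumptions: it assumes the dump marks each record with a `Recno::` line, gives the address on a `URL::` line, and introduces sections with header lines such as `ParseText::`. The function names are hypothetical, and the headers may need adjusting to match the actual dump.

```python
from typing import Dict, Iterable, List

# Assumed section headers of a readseg text dump (adjust to the real output).
SECTION_HEADERS = ("CrawlDatum::", "Content::", "ParseData::", "ParseText::")


def parse_text_by_url(lines: Iterable[str]) -> Dict[str, str]:
    """Map each URL found in the dump to its ParseText (possibly empty)."""
    texts: Dict[str, List[str]] = {}
    url = None
    in_parse_text = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Recno::"):
            # A new record starts: forget the previous URL and section.
            url, in_parse_text = None, False
        elif line.startswith("URL::"):
            url = line[len("URL::"):].strip()
            texts.setdefault(url, [])
            in_parse_text = False
        elif line.strip() in SECTION_HEADERS:
            # Only lines that follow a ParseText:: header are collected.
            in_parse_text = line.strip() == "ParseText::"
        elif in_parse_text and url is not None:
            texts[url].append(line)
    return {u: "\n".join(t).strip() for u, t in texts.items()}


def urls_without_parse_text(lines: Iterable[str]) -> List[str]:
    """Return the URLs whose ParseText section is missing or blank."""
    return [u for u, t in parse_text_by_url(lines).items() if not t]
```

To use it, open the dump file and pass the file object to `urls_without_parse_text`. If the .xls, .ppt, .rtf and .doc URLs show up in the returned list while the HTML and .pdf URLs do not, the parsers for those formats are probably not producing text for the segment.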