nutch-user.lucene

Error Parsing JavaScript

Mohamed Parvez

2009-09-11

I am getting this error while crawling:
--------------------------------
fetching http://business.verizon.net/SMBPortalWeb/resources/js/helpSupport.js
Error parsing: http://business.verizon.net/SMBPortalWeb/resources/js/helpSupport.js: UNKNOWN!(-53,0): Content not JavaScript: 'application/javascript'
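
One possible reading of this message, sketched in Java: the parser may compare the declared content type against the legacy type "application/x-javascript" only, so a server that sends "application/javascript" is rejected even though parse-plugins.xml maps both types. The class and method names below (JsContentTypeCheck, strictAccepts, relaxedAccepts) are hypothetical and are not taken from the Nutch source; this is only an illustration of that kind of check, not the plugin's actual code.

```java
import java.util.Locale;

public class JsContentTypeCheck {

    // Drops any parameters such as "; charset=UTF-8" and lower-cases the type.
    static String baseType(String contentType) {
        String type = contentType.trim().toLowerCase(Locale.ROOT);
        int semicolon = type.indexOf(';');
        return semicolon >= 0 ? type.substring(0, semicolon).trim() : type;
    }

    // Hypothetical strict check: only the legacy "application/x-javascript" passes.
    static boolean strictAccepts(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return true; // no declared type, so there is nothing to reject
        }
        return baseType(contentType).equals("application/x-javascript");
    }

    // Hypothetical relaxed check: also accepts the other common JavaScript types.
    static boolean relaxedAccepts(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return true;
        }
        String type = baseType(contentType);
        return type.equals("application/x-javascript")
                || type.equals("application/javascript")
                || type.equals("text/javascript");
    }

    public static void main(String[] args) {
        String type = "application/javascript";
        if (!strictAccepts(type)) {
            // Mirrors the wording of the message in the crawl log.
            System.out.println("Content not JavaScript: '" + type + "'");
        }
        System.out.println("relaxed check accepts it: " + relaxedAccepts(type));
    }
}
```

If the check really is of the strict kind, changing parse-plugins.xml alone would not help, because the rejection would happen inside the parser after the plugin has already been selected.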


I have this in the file parse-plugins.xml:
---------------------------------------------------------
  <mimeType name="application/x-javascript">
    <plugin id="parse-js" />
  </mimeType>

  <mimeType name="application/javascript">
    <plugin id="parse-js" />
  </mimeType>


I have this in nutch-site.xml:
------------------------------------------------
<property>
<name>plugin.includes</name>
<value>field-add|protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|parse-js|suffix-urlfilter</value>
</property>
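
To rule out a typo in the plugin.includes value, it can be tested as a regular expression against a plugin id. The sketch below assumes the value is matched against the whole plugin id; the class name PluginIncludesCheck and the method isIncluded are hypothetical helpers, not part of Nutch.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // The plugin.includes value copied from nutch-site.xml above.
    static final String PLUGIN_INCLUDES =
            "field-add|protocol-http|urlfilter-regex|parse-(text|html|js)"
            + "|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)"
            + "|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)"
            + "|parse-js|suffix-urlfilter";

    // Assumption: a plugin is enabled when the whole id matches the expression.
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(PLUGIN_INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        for (String id : new String[] {"parse-js", "parse-html", "parse-pdf"}) {
            System.out.println(id + " included: " + isIncluded(id));
        }
    }
}
```

Under that assumption, parse-js is already matched by the parse-(text|html|js) alternative, so the separate parse-js entry near the end of the value is redundant but harmless.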

I am running this command:
-------------------------------------
bin/nutch crawl urls -depth 10 >crawl.log


I have this in urls/seed.txt:
---------------------------------------------------
http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_nfpb=true&_pageLabel=SMBPortal_page_main_support

Thanks/Regards,
Parvez