1. java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:108)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 11 more

The official documentation notes the following:

N.B. It's possible to encounter the following exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration; this is caused by the fact that sometimes the hbase TEST jar is deployed in the lib dir. To resolve this just copy the lib over from your installed HBase dir into the build lib dir. (This issue is currently in progress).

Solution:

Copy all of the jars under $HBASE_HOME/lib into $NUTCH_HOME/runtime/local/lib, then run the crawl again.

2. java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V

Official HBase JIRA issue: HBASE-8273

This problem was introduced by HBASE-5357: that change made HColumnDescriptor.setMaxVersions return HColumnDescriptor instead of void, which altered the method's signature. Code compiled against the old API still looks up setMaxVersions(int) with a void return type at runtime, so it cannot find the method even though setMaxVersions(int) still exists.

Cloudera's documentation states:

  Column family manipulations are binary-incompatible between CDH4.2 and CDH4.0/CDH4.1
  Because of HBASE-5357, code compiled against CDH4.0 and CDH4.1 will fail with java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V, if used with the CDH4.2 libraries. The reason is that the setter methods in HColumnDescriptor were modified to return HColumnDescriptor instead of void, which changes their signature. Code that only does data manipulations, using the HTable class, will still work without recompilation.
  Bug: HBASE-8273
  Severity: Medium
  Anticipated Resolution: None planned; use workaround.
  Workaround: Code compiled against CDH4.0 and 4.1 that uses HColumnDescriptor must be recompiled against CDH4.2 in order to work with the CDH4.2 libraries. Code compiled against CDH4.0 and CDH4.1 running with those libraries does not have this problem.

Cause: in my setup Hadoop and HBase themselves start without any problem, so the issue lies in the gora-hbase plugin.

Solution:

Recompiling the gora-hbase plugin code that uses HColumnDescriptor (against the HBase libraries you are actually running) resolves the problem.

The specific classes that need to be recompiled will be listed in a follow-up.
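
To make the incompatibility concrete, here is a minimal, self-contained sketch. The class name SetMaxVersionsSketch and the standalone main method are illustrative only; the real call sites live inside gora-hbase's HBaseStore.

import org.apache.hadoop.hbase.HColumnDescriptor;

public class SetMaxVersionsSketch {
    public static void main(String[] args) {
        // When this call is compiled against an HBase version where
        // setMaxVersions returns void, the bytecode records the descriptor
        // setMaxVersions(I)V. Newer HBase changed the return type to
        // HColumnDescriptor, so the call site compiled earlier no longer
        // matches any method at runtime and the JVM throws
        // java.lang.NoSuchMethodError: ...setMaxVersions(I)V.
        // The source line is valid against both versions, which is why
        // recompiling (not editing) the gora-hbase code fixes it.
        HColumnDescriptor family = new HColumnDescriptor("f");
        family.setMaxVersions(3);
        System.out.println("max versions: " + family.getMaxVersions());
    }
}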

3. java.lang.ClassNotFoundException: org.apache.gora.hbase.store.HBaseStore

hadoop@nutch1:/data/projects/apache-nutch-2.2.1/runtime/local$ bin/nutch crawl urls/seed.txt -dir crawl -depth 3 -topN 5
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.gora.hbase.store.HBaseStore
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:188)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:89)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:73)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Solutions:

Method 1: download gora-0.3, build the gora-hbase module in that source tree to produce gora-hbase.jar, and copy the jar into $NUTCH_HOME/runtime/local/lib.

Method 2: edit $NUTCH_HOME/ivy/ivy.xml.

Uncomment the line <dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default" />, then rebuild Nutch. Ivy will then pull in the gora-hbase plugin for you.

4. java.lang.NullPointerException

 java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

Line 100 of GeneratorReducer is:

batchId = new Utf8(conf.get(GeneratorJob.BATCH_ID));

So the exception happens while reading GeneratorJob.BATCH_ID, i.e. the generate.batch.id property, and that value comes back null.
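
The failure is easy to reproduce in isolation. A minimal sketch, assuming the Hadoop and Avro jars from the Nutch runtime are on the classpath (the class name BatchIdNpeDemo is hypothetical):

import org.apache.avro.util.Utf8;
import org.apache.hadoop.conf.Configuration;

public class BatchIdNpeDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // generate.batch.id was never set, so conf.get(...) returns null ...
        String batchId = conf.get("generate.batch.id");
        // ... and Avro's Utf8 constructor dereferences its argument, which is
        // the same NullPointerException thrown in GeneratorReducer.setup.
        Utf8 value = new Utf8(batchId);  // throws NullPointerException here
        System.out.println(value);
    }
}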

Solutions:

Method 1:
Add a generate.batch.id property to nutch-site.xml; any non-empty value will do. This is not a great approach, though: in the source code batchId is normally generated with a random component, and other code may rely on that.
Method 2:

Modify the public Map<String,Object> run(Map<String,Object> args) method in GeneratorJob.

Add the following lines (curTime is the crawl-time value already available in that method; Random is java.util.Random):

 // generate batchId
int randomSeed = Math.abs(new Random().nextInt());
String batchId = (curTime / 1000) + "-" + randomSeed;
getConf().set(BATCH_ID, batchId);
