[Apache Nutch Series] Nutch 2.0 Configuration and Installation: A Collection of Exceptions
1. java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:108)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 11 more
The official documentation explains:
N.B. It's possible to encounter the following exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration; this is caused by the fact that sometimes the hbase TEST jar is deployed in the lib dir. To resolve this just copy the lib over from your installed HBase dir into the build lib dir. (This issue is currently in progress).
Solution:
Copy all the jars under $HBASE_HOME/lib into $NUTCH_HOME/runtime/local/lib, then run the crawl again.
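If you would rather script this step, here is a minimal Java sketch of the same copy. It assumes HBASE_HOME and NUTCH_HOME are exported as environment variables; the class name and the copyJars helper are hypothetical, chosen for illustration. It copies only files ending in .jar and overwrites any jar with the same file name already in the target directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyHBaseJars {
    /** Copies every *.jar in srcDir into dstDir (overwriting same-named jars); returns the number copied. */
    static int copyJars(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        int copied = 0;
        // Glob filter: only regular jar files are picked up, other files are ignored.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(srcDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, dstDir.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: both variables are set in the environment, as in the text above.
        String hbaseHome = System.getenv("HBASE_HOME");
        String nutchHome = System.getenv("NUTCH_HOME");
        if (hbaseHome == null || nutchHome == null) {
            System.err.println("Set HBASE_HOME and NUTCH_HOME first");
            return;
        }
        Path src = Paths.get(hbaseHome, "lib");
        Path dst = Paths.get(nutchHome, "runtime", "local", "lib");
        System.out.println("Copied " + copyJars(src, dst) + " jars into " + dst);
    }
}
```

The sketch only adds or overwrites files; it never deletes anything from the Nutch lib directory.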
2. java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V
Official HBase JIRA issue: HBASE-8273
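This problem was introduced by HBASE-5357, which changed HColumnDescriptor.setMaxVersions to return HColumnDescriptor instead of void and thereby changed the method's signature. Code compiled against the old API still looks for setMaxVersions(int) returning void, and that method no longer exists in the newer HBase jars.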
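The (I)V in the error is a JVM method descriptor: one int parameter, void return. Because the return type is part of the descriptor a compiled caller links against, changing it breaks already-compiled callers even though their source still compiles unchanged. The sketch below illustrates this with two hypothetical stand-in classes (not the real HBase class) and a small helper that builds the descriptor via reflection.

```java
import java.lang.reflect.Method;

public class DescriptorDemo {
    // Hypothetical stand-in for the pre-HBASE-5357 shape: the setter returns void.
    static class OldColumnDescriptor {
        private int maxVersions;
        public void setMaxVersions(int maxVersions) { this.maxVersions = maxVersions; }
    }

    // Hypothetical stand-in for the post-HBASE-5357 shape: the setter returns this (fluent style).
    static class NewColumnDescriptor {
        private int maxVersions;
        public NewColumnDescriptor setMaxVersions(int maxVersions) {
            this.maxVersions = maxVersions;
            return this;
        }
    }

    /** JVM descriptor of a single type, e.g. int -> "I", void -> "V", String -> "Ljava/lang/String;". */
    static String typeDescriptor(Class<?> c) {
        if (c == void.class) return "V";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c.isArray()) return c.getName().replace('.', '/'); // e.g. "[I", "[Ljava/lang/String;"
        return "L" + c.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameter descriptors in parentheses, then the return descriptor. */
    static String methodDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) {
            sb.append(typeDescriptor(p));
        }
        return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
    }

    /** Descriptor of owner.setMaxVersions(int). */
    static String setterDescriptor(Class<?> owner) {
        try {
            return methodDescriptor(owner.getMethod("setMaxVersions", int.class));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setterDescriptor(OldColumnDescriptor.class)); // (I)V
        System.out.println(setterDescriptor(NewColumnDescriptor.class)); // (I)L...NewColumnDescriptor;
    }
}
```

A caller compiled against the old shape records (I)V in its bytecode; run against the new shape, the JVM finds no method with exactly that descriptor and throws NoSuchMethodError, which is why recompiling, rather than editing source, is the fix.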
Cloudera's documentation states:
Column family manipulations are binary-incompatible between CDH4.2 and CDH4.0/CDH4.1
Because of HBASE-5357, code compiled against CDH4.0 and CDH4.1 will fail with java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V, if used with the CDH4.2 libraries. The reason is that the setter methods in HColumnDescriptor were modified to return HColumnDescriptor instead of void, which changes their signature. Code that only does data manipulations, using the HTable class, will still work without recompilation.
Bug: HBASE-8273
Severity: Medium
Anticipated Resolution: None planned; use workaround.
Workaround: Code compiled against CDH4.0 and 4.1 that uses HColumnDescriptor must be recompiled against CDH4.2 in order to work with the CDH4.2 libraries. Code compiled against CDH4.0 and CDH4.1 running with those libraries does not have this problem.
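Cause: in my setup Hadoop and HBase themselves start without problems, so the fault lies in the gora-hbase plugin, whose binaries were built against the older HBase API.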
Solution:
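Recompile the gora-hbase code that uses HColumnDescriptor against the HBase version you are actually running.
I will list exactly which classes need recompiling in a follow-up post.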
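Before recompiling, it can help to confirm which API variant is actually on the runtime classpath. Here is a minimal sketch using plain reflection; returnTypeOf is a hypothetical helper, and the class and method it probes in main are the ones named in the error above. It loads the class without initializing it and reports the return type of setMaxVersions(int): void for the old API, the descriptor class itself for the new one.

```java
import java.lang.reflect.Method;

public class SignatureProbe {
    /**
     * Returns the name of the return type of className.methodName(params),
     * or null if the class or the public method cannot be found on the current classpath.
     */
    static String returnTypeOf(String className, String methodName, Class<?>... params) {
        try {
            // initialize=false: load the class only, do not run its static initializers.
            Class<?> owner = Class.forName(className, false, SignatureProbe.class.getClassLoader());
            Method m = owner.getMethod(methodName, params);
            return m.getReturnType().getName(); // "void" for void methods
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String rt = returnTypeOf("org.apache.hadoop.hbase.HColumnDescriptor", "setMaxVersions", int.class);
        if (rt == null) {
            System.out.println("HColumnDescriptor.setMaxVersions(int) not found on the classpath");
        } else if (rt.equals("void")) {
            System.out.println("Old API: setMaxVersions(int) returns void");
        } else {
            System.out.println("New API: setMaxVersions(int) returns " + rt);
        }
    }
}
```

Run it with the same classpath Nutch uses (for example the jars in runtime/local/lib) so the answer reflects what the crawler actually loads.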
3. java.lang.ClassNotFoundException: org.apache.gora.hbase.store.HBaseStore
hadoop@nutch1:/data/projects/apache-nutch-2.2.1/runtime/local$ bin/nutch crawl urls/seed.txt -dir crawl -depth 3 -topN 5
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.gora.hbase.store.HBaseStore
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:188)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:89)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:73)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Solution:
Option 1: Download Gora 0.3, build the gora-hbase module it contains to produce gora-hbase.jar, and put that jar into $NUTCH_HOME/runtime/local/lib.
Option 2: Edit $NUTCH_HOME/ivy/ivy.xml and uncomment
<dependency org="org.apache.gora" name="gora-hbase" rev="0.3" conf="*->default" />
then rebuild Nutch (for example with ant runtime). Ivy will then fetch the gora-hbase dependency for you.
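Whichever option you choose, you can check that a jar in the lib directory really contains the class the error complains about. Here is a minimal sketch using java.util.jar.JarFile; JarClassCheck and jarContainsClass are hypothetical names, and the lib directory (for example $NUTCH_HOME/runtime/local/lib) is passed as the only command-line argument.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

public class JarClassCheck {
    /** Returns true if the jar contains the compiled class file for the fully qualified className. */
    static boolean jarContainsClass(File jarPath, String className) throws IOException {
        // org.apache.gora.hbase.store.HBaseStore -> org/apache/gora/hbase/store/HBaseStore.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassCheck <lib-dir>");
            return;
        }
        File libDir = new File(args[0]);
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            System.err.println("Not a readable directory: " + libDir);
            return;
        }
        String target = "org.apache.gora.hbase.store.HBaseStore";
        boolean found = false;
        for (File jar : jars) {
            if (jarContainsClass(jar, target)) {
                System.out.println(target + " found in " + jar.getName());
                found = true;
            }
        }
        if (!found) {
            System.out.println("No jar in " + libDir + " contains " + target);
        }
    }
}
```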
4. java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
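Line 100 of GeneratorReducer reads:
batchId = new Utf8(conf.get(GeneratorJob.BATCH_ID));
So the exception is thrown while reading GeneratorJob.BATCH_ID, i.e. the generate.batch.id property: it is null at this point, and the null is passed straight into the Utf8 constructor.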
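A guard like the following would turn this into a readable error instead of a bare NPE inside Utf8. It is a sketch only: it uses a java.util.Map as a stand-in for Hadoop's Configuration, and requireBatchId is a hypothetical helper, not part of Nutch.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchIdGuard {
    // Property name from the text above (the value of GeneratorJob.BATCH_ID).
    static final String BATCH_ID = "generate.batch.id";

    /** Returns the batch id, failing fast with a descriptive message when it is missing. */
    static String requireBatchId(Map<String, String> conf) {
        String batchId = conf.get(BATCH_ID);
        if (batchId == null) {
            throw new IllegalStateException(BATCH_ID + " is not set; GeneratorJob must set it before the reducer runs");
        }
        return batchId;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // stand-in for the job configuration
        try {
            requireBatchId(conf);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
        conf.put(BATCH_ID, "1700000000-12345");
        System.out.println("batchId = " + requireBatchId(conf));
    }
}
```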
Solution:
Modify the public Map<String,Object> run(Map<String,Object> args) method in GeneratorJob by adding the following three lines:
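// generate batchId (uses the curTime variable of run(); needs java.util.Random)
// nextInt(Integer.MAX_VALUE) is never negative, whereas Math.abs(nextInt()) is negative for Integer.MIN_VALUE
int randomSeed = new Random().nextInt(Integer.MAX_VALUE);
String batchId = (curTime / 1000) + "-" + randomSeed;
getConf().set(BATCH_ID, batchId);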
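The batch ID format produced by this patch, "<seconds since epoch>-<random int>", can be tried out in isolation. This sketch wraps it in a hypothetical newBatchId helper and takes the Random as a parameter, so a seeded instance gives reproducible output; it draws the seed so that it is always non-negative.

```java
import java.util.Random;

public class BatchIdDemo {
    /** Builds "<curTimeMillis / 1000>-<non-negative random int>". */
    static String newBatchId(long curTimeMillis, Random random) {
        int randomSeed = random.nextInt(Integer.MAX_VALUE); // always in [0, Integer.MAX_VALUE)
        return (curTimeMillis / 1000) + "-" + randomSeed;
    }

    public static void main(String[] args) {
        System.out.println(newBatchId(System.currentTimeMillis(), new Random()));
    }
}
```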