测试flume,将数据送到hive表中,首先建表。

create table order_flume(
order_id string,
user_id string,
eval_set string,
order_number string,
order_dow string,
order_hour_of_day string,
days_since_prior_order string)
clustered by (order_id) into 5 buckets
stored as orc; # 记得加 orc,不然后面会出错

  flume conf 配置如下:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /root/code/flume_exec_test.txt # Describe the sink
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://master:9083
a1.sinks.k1.hive.database = hw
a1.sinks.k1.hive.table = order_flume
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
a1.sinks.k1.serializer.serdeSeparator = ','
a1.sinks.k1.serializer.fieldnames = order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000 # 设置大一点,默认是1000
a1.channels.c1.transactionCapacity = 1000 # 设置大一点,默认是100 # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

  这个时候如果启动flume的话会报错,需要将hive中的jar包移动到flume 中。

/usr/local/src/apache-hive-1.2.2-bin/hcatalog/share/hcatalog/* $FLUME_HOME/lib

/usr/local/src/apache-hive-1.2.2-bin/lib/* $FLUME_HOME/lib

  此时,在修改修改 hive-site.xml,将下面的值进行修改。

      <property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>

  上面的配置完成之后,先启动 hive metastore,在启动 flume。

# 一定要重启
# 先重启 mysql
# 再重启 hadoop
# 再启动 hive metstore
# 再启动 flume service mysql restart start-dfs.sh
start-yarn.sh hive --service metastore flume-ng agent -c conf -f conf/hive.conf -n a1 -Dflume.root.logger=INFO,console

  一开始调试的时候,配置的都是对的,无论怎么跑都是连不上,都提示失败。

org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://master:9083', database='hw', table='order_flume', partitionVals=[] }
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:343)
at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.sink.hive.HiveWriter$TxnBatchException: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://master:9083', database='hw', table='oreder_flume', partitionVals=[] }
at org.apache.flume.sink.hive.HiveWriter.nextTxnBatch(HiveWriter.java:400)
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:90)
... 6 more
Caused by: org.apache.hive.hcatalog.streaming.TransactionBatchUnAvailable: Unable to acquire transaction batch on end point: {metaStoreUri='thrift://master:9083', database='hw', table='order_flume', partitionVals=[] }
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:514)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:464)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:351)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:331)
at org.apache.flume.sink.hive.HiveWriter$9.call(HiveWriter.java:395)
at org.apache.flume.sink.hive.HiveWriter$9.call(HiveWriter.java:392)
at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:428)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_open_txns(ThriftHiveMetastore.java:4195)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.open_txns(ThriftHiveMetastore.java:4182)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxns(HiveMetaStoreClient.java:1988)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.openTxnImpl(HiveEndPoint.java:523)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:507)
... 10 more

  重启了几次还是同样的错误,后面尝试先启动 hive,去看一下其中表的情况,发现启动hive失败。

FAILED: LockException [Error 10280]: Error communicating with the metastore

  再次尝试重启 metastore,在尝试启动 hive,一开始尝试2次,发现还是同样的错误,结果过了几分钟,再次重新启动后,发现它自己好了,一直没有弄明白为什么,可能是之前没重启成功,重新编辑一下 hive-site.xml文件,在次保存退出,在重启可能会好。

最终flume 链接 hive 成功,信息如下。

2019-07-20 12:26:28,968 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:342)] k1: Creating Writer to Hive end point : {metaStoreUri='thrift://master:9083', database='hw', table='order_flume', partitionVals=[] }
2019-07-20 12:26:30,747 (hive-k1-call-runner-0) [INFO - org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:377)] Trying to connect to metastore with URI thrift://master:9083
2019-07-20 12:26:30,865 (hive-k1-call-runner-0) [INFO - org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:473)] Connected to metastore.
2019-07-20 12:26:31,494 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:377)] Trying to connect to metastore with URI thrift://master:9083
2019-07-20 12:26:31,516 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:473)] Connected to metastore.
2019-07-20 12:26:35,677 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hive.HiveWriter.nextTxnBatch(HiveWriter.java:335)] Acquired Txn Batch TxnIds=[1...100] on endPoint = {metaStoreUri='thrift://master:9083', database='hw', table='order_flume', partitionVals=[] }. Switching to first txn
2019-07-20 12:26:38,808 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hive.HiveWriter.commitTxn(HiveWriter.java:275)] Committing Txn 1 on EndPoint: {metaStoreUri='thrift://master:9083', database='hw', table='order_flume', partitionVals=[] }

  

flume 测试 hive sink的更多相关文章

  1. Flume + HDFS + Hive日志收集系统

    最近一段时间,负责公司的产品日志埋点与收集工作,搭建了基于Flume+HDFS+Hive日志搜集系统. 一.日志搜集系统架构: 简单画了一下日志搜集系统的架构图,可以看出,flume承担了agent与 ...

  2. Flume 测试 Kafka 案例

    Flume Kafka 测试案例,Flume 的配置. a1.sources = s1 a1.channels = c1 a1.sinks = k1 a1.sources.s1.type = netc ...

  3. 简单的Flume和hive的结合

    1. 日志格式 #Software: Microsoft Internet Information Services 6.0 #Version: 1.0 #Date: -- :: #Fields: d ...

  4. Flume配置Failover Sink Processor

    1 官网内容 2 看一张图一目了然 3 详细配置 source配置文件 #配置文件: a1.sources= r1 a1.sinks= k1 k2 a1.channels= c1 #负载平衡 a1.s ...

  5. 自定义flume的hbase sink 的序列化程序

    package com.hello.hbase; import java.nio.charset.Charset; import java.text.SimpleDateFormat; import ...

  6. 从0到1搭建基于Kafka、Flume和Hive的海量数据分析系统(一)数据收集应用

    大数据时代,一大技术特征是对海量数据采集.存储和分析的多组件解决方案.而其中对来自于传感器.APP的SDK和各类互联网应用的原生日志数据的采集存储则是基本中的基本.本系列文章将从0到1,概述一下搭建基 ...

  7. 流量分析系统---flume(测试flume+kafka)

    1.在flume官方网站下载最新的flume     wget http://124.205.69.169/files/A1540000011ED5DB/mirror.bit.edu.cn/apach ...

  8. Flume的Avro Sink和Avro Source研究之二 : Avro Sink

    啊,AvroSink要复杂好多:< 好吧,先确定主要问题: AvroSink为啥这么多代码?有必要吗?它都有哪些逻辑需要实现? 你看,avro-rpc-quickstart里是这么建client ...

  9. Hadoop实战-Flume之自定义Sink(十九)

    import java.io.File; import java.io.FileNotFoundException; import java.io.FileOutputStream; import j ...

随机推荐

  1. 11g包dbms_parallel_execute在海量数据处理过程中的应用

    11g包dbms_parallel_execute在海量数据处理过程中的应用 一.1  BLOG文档结构图 一.2  前言部分 一.2.1  导读 各位技术爱好者,看完本文后,你可以掌握如下的技能,也 ...

  2. PHP、JS 中 encode/decode

    PHP : urlencode() urldecode() JS : encodeURIComponent() decodeURIComponent() 同一字符串,编码后的结果一样 1

  3. thinkphp5.x命令执行漏洞复现及环境搭建

    楼主Linux环境是Centos7,LAMP怎么搭不用我废话吧,别看错了 一.thinkphp5.X系列 1.安装composer yum -y install composer 安装php拓展 yu ...

  4. Centos7服务器搭建部署显卡计算环境以及常用软件的安装使用

    安装好anaconda的服务器上会more你已经安装好jupyter notebook,执行下面的命令可以提供链接地址允许远程浏览器打开并访问: jupyter notebook --no-brows ...

  5. CentOS7:sorry,that didn't work.please try again!

    参考以下解决方案,重点是vi etc/selinux/config 把 enforcing 改为 disable 应用场景 linux管理员忘记root密码,需要进行找回操作.注意事项:本文基于cen ...

  6. 后台返回的Json为null的字段不显示的方法

    如果引入的是谷歌的gson的话,需要引入依赖: <dependency> <groupId>com.fasterxml.jackson.core</groupId> ...

  7. socket mac终端调试工具 nc netcat

    今天想学点socket ,因此搜索socket 工具,找到了netCat工具.可以打开两个终端window ,实现终端之间的socket的收发信息,为以后学习socket调试做准备用吧.两个终端分别打 ...

  8. Thinkphp下记录和统计时间(微秒)和内存使用情况

    * 记录和统计时间(微秒)和内存使用情况 * 使用方法: * <code> * G('begin'); // 记录开始标记位 * // ... 区间运行代码 * G('end'); // ...

  9. 给jenkins更换工作空间

    如果使用jenkins的默认工作空间,它默认安放在 /var/lib/jenkins 目录下,但这个在分配Linux磁盘的时候,一般为40G,时间长或者项目多的话,很容易将磁盘空间占满,所以我们需要将 ...

  10. 小米BL不解锁刷机

    关于小米NOTE顶配近期解锁的问题中发现还有很多人不会用9008模式刷机,现出个简单教程方便米粉们救砖.硬件:小米NOTE顶配手机 win10系统的电脑 手机与电脑相连的数据线软件:老版本的mifla ...