kudu tserver占用内存过高后会拒绝部分写请求,日志如下: 19/06/01 13:34:12 INFO AsyncKuduClient: Invalidating location 34b1c13d04664cc8bae6689d39b08b77($kudu_tserver:7050) for tablet 858c055c456549569af77d14eaf997e5: Service unavailable: Soft memory limit exceeded (at 92.3…
核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,follower是FOLLOWING,leader是LEADING,observer是OBSERVING: public enum LearnerType { PARTICIPANT, OBSERVER; } 简单来说,zookeeper启动的核心类是QuorumPeerMain,启动之后会加载配置,…
kudu 1.7 官方:https://kudu.apache.org/ 一 简介 kudu有很多概念,有分布式文件系统(HDFS),有一致性算法(Zookeeper),有Table(Hive Table),有Tablet(Hive Table Partition),有列式存储(Parquet),有顺序和随机读取(HBase),所以看起来kudu是一个轻量级的 HDFS + Zookeeper + Hive + Parquet + HBase,除此之外,kudu还有自己的特点,快速写入+读取,使…
kudu加减数据盘不能直接修改配置fs_data_dirs后重启,否则会报错: Check failed: _s.ok() Bad status: Already present: FS layout already exists; not overwriting existing layout: FSManager roots already exist: /data0/kudu/data 官方解释如下: When Kudu starts, it checks each configured…
spark2.4.3+kudu1.9 1 批量读 val df = spark.read.format("kudu") .options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test_db.test_table")) .load df.createOrReplaceTempView("tmp_tabl…
应用一:kafka数据同步到kudu 1 准备kafka topic # bin/kafka-topics.sh --zookeeper $zk:2181/kafka -create --topic test_sync --partitions 2 --replication-factor 2 WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could coll…
对文件进行词频统计,是一个大数据领域的hello word级别的应用,来看下实现有多简单: 1 Linux单机处理 egrep -o "\b[[:alpha:]]+\b" test_word.log|sort|uniq -c|sort -rn|head -10 2 Spark分布式处理(Scala) val sparkConf = new SparkConf() val sc = new SparkContext(sparkConf) sc.textFile("test_wo…
impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon. impala是hadoop上的开源分析性数据库:C++和java语言开发: Do BI-style Queries on Hadoop Im…
一 架构 Impala is a massively-parallel query execution engine, which runs on hundreds of machines in existing Hadoop clusters. It is decoupled from the underlying storage engine, unlike traditional relational database management systems where the query…
tpc 官方:http://www.tpc.org/ 一 简介 The TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. TPC(The Transaction Processing Perform…