[Original] Uncle's Experience Notes (70): Marathon app stays in waiting state after restart

After Marathon restarts an app, the app stays in the waiting state. Check the Marathon log with # journalctl -u marathon -f, which shows entries like: Jun 14 12:58:38 DataOne-002 marathon[15801]: [2019-06-14 12:58:38,321] INFO Offer [2a0fb98b-f8df-44e8-965c-54ad7203fa45-O1623]. Constraints for run spec [/app] not satisfied.…

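[Original] Uncle's Experience Notes (81): app on Marathon cannot be restarted

After restarting an app through the Marathon API, a deployment appears but the app never restarts. The configuration is: "constraints": [ [ "hostname", "UNIQUE" ], [ "hostname", "LIKE", "HOST-00[12]" ] ] This restricts the app to two servers, with at most one instance per server. The fix is: "upgradeS…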
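The excerpt is cut off before the fix, so the following is only a sketch of one plausible reading of why the deployment stalls: if both allowed hosts already run an instance, the two constraints leave no host for a replacement until an old instance is stopped. The helper `eligible_hosts`, the host `HOST-003`, and the assumption that the LIKE pattern must match the whole hostname are illustrative, not taken from Marathon itself.

```python
import re


def eligible_hosts(all_hosts, running_hosts, like_pattern):
    """Return hosts where one more instance could be placed.

    Models the two constraints from the excerpt (hypothetical helper):
      ["hostname", "LIKE", pattern] -> hostname must match the regex
      ["hostname", "UNIQUE"]        -> at most one instance per host
    """
    pattern = re.compile(like_pattern)
    return [
        host
        for host in all_hosts
        if pattern.fullmatch(host) and host not in running_hosts
    ]


hosts = ["HOST-001", "HOST-002", "HOST-003"]  # HOST-003 is made up

# Both allowed hosts are occupied: nowhere to start a replacement.
print(eligible_hosts(hosts, {"HOST-001", "HOST-002"}, "HOST-00[12]"))

# Once one old instance is gone, its host becomes eligible again.
print(eligible_hosts(hosts, {"HOST-001"}, "HOST-00[12]"))
```

Under these assumptions the first call returns an empty list, which matches the symptom of a deployment that never makes progress.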

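[Original] Experience sharing: how did one tiny emoji end up dragging in so much?

Preface: I have shared quite a few lessons from pitfalls at work before: Thoughts on a production issue: how does a Eureka registry cluster implement client request load balancing and failover? [Original] Experience sharing: a bloodbath caused by a Content-Length (almost....) Today I'll share another real case from work: the product review list page shows the details of each user's review. To protect user privacy, only the first and last characters of the user's nickname may be displayed, and the rest are replaced with ※. For example, input: , output: *** It looked like an unremarkable requirement and I did not think much of it. The server stores users' review data in the db, and the review-list API simply reads from the database that product's…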
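The masking rule described above (keep the first and last character, replace the rest) can be sketched as follows. The excerpt is cut off before it says what went wrong, so the emoji angle here is an assumption based on the title: an emoji can occupy more than one UTF-16 code unit, so indexing by code unit may cut it in half. The function name `mask_nickname` and the choice to leave nicknames of one or two characters unchanged are hypothetical.

```python
def mask_nickname(name: str, mask: str = "※") -> str:
    """Keep the first and last character, replace everything between.

    Python strings index by Unicode code point, so a single-code-point
    emoji counts as one character. Emoji built from several code points
    (for example ZWJ sequences) would still be split by this sketch.
    """
    chars = list(name)
    if len(chars) <= 2:
        # Assumption: nothing to hide in very short nicknames.
        return name
    return chars[0] + mask * (len(chars) - 2) + chars[-1]


print(mask_nickname("abcde"))
print(mask_nickname("😀bc😀"))
```

In a language whose strings are sequences of UTF-16 code units, the same logic would need to iterate by code point rather than by `char`.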

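[Original] Uncle's Experience Notes (88): Jenkins hangs

After Jenkins is installed and started, systemctl is used to supervise the process: # systemctl enable jenkins Even so, the Jenkins process often dies and is not restarted automatically. systemctl reports the status as: # systemctl status jenkins ● jenkins.service - LSB: Jenkins Automation Server Loaded: loaded (/etc/rc.d/init.d/jenkins; bad; vendor preset: disabl…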

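[Original] Uncle's Experience Notes (87): service unavailable while Marathon restarts an app

Marathon offers several kinds of health check; the commonly used ones are TCP and HTTP. TCP checks whether the port is open, and if it is, the instance is considered healthy. HTTP checks the HTTP status code returned by a given URL, and if the code is normal (2xx or 3xx), the instance is considered healthy. The two behave differently during a restart: 1) TCP: there is a gap between the port opening and the service becoming available. During that gap the new instance is considered 'healthy' but cannot serve requests, while the old instance is stopped, so the service is unavailable for a period. 2) HTTP: a normal status code presupposes that the service is available. If the service must stay available throughout a restart, use HTTP health checks:…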
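The difference between the two checks can be illustrated with a small sketch. This is not Marathon's implementation; `tcp_check` and `http_status_is_healthy` are hypothetical helpers that mirror the rules described above (port open versus 2xx/3xx status).

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP-style check: healthy as soon as the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status_is_healthy(status_code: int) -> bool:
    """HTTP-style check: healthy only for a 2xx or 3xx status code."""
    return 200 <= status_code < 400


# A socket that listens but never answers a request: the TCP check
# already passes, although no request could be served yet.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print("tcp check:", tcp_check("127.0.0.1", port))
print("http check for 503:", http_status_is_healthy(503))
server.close()
```

The listening socket stands in for an application that has bound its port but has not finished starting up, which is the gap the TCP check cannot see.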

[Original] Uncle's Experience Notes (50): accessing MySQL from Hue (librdbms)

After installing Hue with Cloudera Manager, enabling access to MySQL (librdbms) requires configuration here (hue_safety_valve.ini). Add the following: [librdbms] # The RDBMS app can have any number of databases configured in the databases # section. A database is known by its section name # (IE sqlite, mysql, psql, and o…

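[Original] Uncle's Experience Notes (46): errors when a user submits a job to YARN

A user submitting a job to YARN may hit the following errors: 1) Requested user anything is not whitelisted and has id 980,which is below the minimum allowed 1000 This happens because YARN is configured with min.user.id=1000; YARN treats users with an id below 1000 as superusers and forbids superusers from submitting jobs: Each account must have a user ID that is greater than or equ…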

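[Original] Uncle's Experience Notes (21): viewing each application's real-time memory and CPU usage in YARN

The application detail page in YARN, http://resourcemanager/cluster/app/$applicationId, and the application command, yarn application -status $applicationId, only show resource*time totals accumulated since the application started, for example: Aggregate Resource Allocation : 3962853 MB-seconds, 1466 vcore-seconds The application's current, real-time resource usage is nowhere to be found, such as…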
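To see why the aggregate figure cannot stand in for live usage, here is a minimal sketch that parses the line quoted above. The helper `parse_aggregate` is hypothetical. The only quantity that can be derived from the two totals is a lifetime, time-weighted average of memory per vcore, which says nothing about what the application holds right now.

```python
import re


def parse_aggregate(line: str) -> tuple[int, int]:
    """Extract (MB-seconds, vcore-seconds) from the status line."""
    match = re.search(r"(\d+) MB-seconds, (\d+) vcore-seconds", line)
    if match is None:
        raise ValueError("no aggregate resource allocation found")
    return int(match.group(1)), int(match.group(2))


line = "Aggregate Resource Allocation : 3962853 MB-seconds, 1466 vcore-seconds"
mb_seconds, vcore_seconds = parse_aggregate(line)

# Lifetime average only: MB held per vcore, weighted by time.
avg_mb_per_vcore = mb_seconds / vcore_seconds
print(f"average MB per vcore over the app's lifetime: {avg_mb_per_vcore:.0f}")
```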

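[Original] Uncle's Experience Notes (18): no progress information when running SQL through beeline since Hive 2.0

I. Problem: In Hive 1.2, running SQL with either hive or beeline shows progress information. After upgrading to Hive 2.0, only hive still shows progress; beeline runs SQL in complete silence, so while waiting for the result there is no way to tell how far execution has got. 1 Running SQL with hive (progress shown) hive> select count(1) from test_table; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the…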

[Original] Uncle's Experience Notes (16): Context namespace element 'component-scan' and its parser class [org.springframework.context.annotation.ComponentScanBeanDefinitionParser] are only available on JDK 1.5 and higher

Today I tried to run an ancient project. After configuration it compiled fine, but it failed at runtime with: org.springframework.beans.factory.BeanDefinitionStoreException: Unexpected exception parsing XML document from class path resource [applicationContext.xml]; nested exception is java.lang.IllegalStateException: Context…

[Original] Uncle's Experience Notes (12): how to programmatically kill SQL submitted to Spark Thrift

spark 2.1.1 SQL that is running in Hive is easy to abort, because the application id on YARN can be read from the console output and the job can then be killed, WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or usin…

[Original] Uncle's Experience Notes (11): Python module import fails with ImportError: No module named pandas numpy

Python applications usually need some libraries, such as numpy and pandas. Installation is simple, directly through pip: # pip install numpy Requirement already satisfied: numpy in /export/App/anaconda2/lib/python2.7/site-packages # pip install pandas Requirement already satisfied: pandas in /export/App/anaconda2/lib/python2…

[Original] Uncle's Experience Notes (6): how to view the logs of jobs Oozie submits to YARN

The oozie job id can be used to view the workflow details with: oozie job -info 0012077-180830142722522-oozie-hado-W The workflow details look like: Job ID : 0012077-180830142722522-oozie-hado-W --------------------------------------------------------------------------------------------------------------…

[Original] Uncle's Experience Notes (41): after enabling Kerberos on HDFS, error Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled

After Kerberos is enabled on HDFS, the namenode reports errors and cannot connect to the journalnode: 2019-03-15 18:54:46,504 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/server-03.bj@TEST.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.securi…

[Original] Uncle's Experience Notes (38): impersonate error when beeline connects to hiveserver2

beeline fails to connect to hiveserver2: Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.Authorizatio…

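[Original] Uncle's Experience Notes (37): freeing disk space on CM

Periodically clean up disk space on the Cloudera Manager server: 1 Stop Service Monitor and Host Monitor 2 Delete the logs # /bin/rm /var/lib/cloudera-host-monitor/ts/*/partition*/* -rf # /bin/rm /var/lib/cloudera-service-monitor/ts/*/partition*/* -rf 3 Restart Service Monitor and Host Monitor, then confirm that new log and metadata files are generated…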

[Original] Uncle's Experience Notes (36): deploying Kafka with CM

1 Download the kafka parcel http://archive.cloudera.com/kafka/parcels/latest/KAFKA-3.1.1-1.3.1.1.p0.2-el7.parcel KAFKA-3.1.1-1.3.1.1.p0.2-el7.parcel.sha1 # mv KAFKA-3.1.1-1.3.1.1.p0.2-el7.parcel.sha1 KAFKA-3.1.1-1.3.1.1.p0.2-el7.parcel.sha # cp KAFKA-3.1.1-1.3.1.…

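[Original] Uncle's Experience Notes (31): CM Canary errors

CM Canary reports errors: 1 HDFS Canary: the canary test failed to create the parent directory for /tmp/.cloudera_health_monitoring_canary_files. 2 Hive Metastore Canary: the Hive Metastore canary failed to create the hue hdfs home directory. Checks: 1) whether hdfs is in safemode; normally it is off # hdfs dfsadmin -safemode get Safe mode is OFF 2) whether the hdfs datanodes are healthy,…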

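[Original] Uncle's Experience Notes (30): enabling Kerberos in CM

For Kerberos installation see: https://www.cnblogs.com/barneywill/p/10394164.html I. Create a user for CM # kadmin.local -q "addprinc scm/admin" The name and password are arbitrary and are used later in the configuration. II. CM configuration steps 1 Enable Kerberos 2 Select all 3 Fill in according to /etc/krb5.conf 4 5 Enter the username and password just created 6 Next 7 Next 8 You can view the users CM created # kadmin.local -q 'lis…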

[Original] Uncle's Experience Notes (53): Kudu error unable to find SASL plugin: PLAIN

After installation Kudu does not run properly: the master cannot find any tserver, and the tserver log contains many errors: Failed to heartbeat to master:7051: Invalid argument: Failed to ping master at master:7051: Client connection negotiation failed: client connection to master:7051: unable to find SASL plugin: PLAIN…

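[Original] Uncle's Experience Notes (52): error when changing configuration in Cloudera Manager

Changing configuration in Cloudera Manager may fail with: Incorrect string value: '\xE7\xA8\x8B\xE5\xBA\x8F...' for column 'MESSAGE' at row 1 This is a MySQL character set problem, most likely caused by creating the scm database with the default latin1 encoding. The table involved is: CREATE TABLE `REVISIONS` ( `REVISION_ID` bigint(20) NOT NULL, `OPTIMISTIC_LOCK_VERSION`…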
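The bytes in the error message can be decoded to see what MySQL was asked to store. The sketch below uses Python's `latin-1` codec as a stand-in for MySQL's latin1 character set (an approximation, since the two are not identical), and `fits_latin1` is a hypothetical helper.

```python
# The visible bytes from the error message, before the "..." cut-off.
raw = b"\xE7\xA8\x8B\xE5\xBA\x8F"

# They are valid UTF-8 and decode to two Chinese characters.
text = raw.decode("utf-8")
print(text)


def fits_latin1(value: str) -> bool:
    """Return True if every character can be stored as latin-1."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1(text))
print(fits_latin1("plain ascii"))
```

A column declared with a single-byte Western character set has no representation for these characters, which is consistent with the "Incorrect string value" error.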

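[Original] Uncle's Experience Notes (89): jmap fails when running openjdk in Docker

After starting openjdk in Docker, the process can be listed: # docker exec -it XXX jps 10 XXX.jar The Java process id is always 10, so JVM commands can be run against it, for example # docker exec -it XXX jstack 10 # docker exec -it XXX jstat -gcutil 10 # docker exec -it XXX jmap -histo 10 But running jmap -heap or -dump fails: Attaching to process…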

[Original] Uncle's Experience Notes (76): configuring Confluence and Jira

I. Download confluence https://product-downloads.atlassian.com/software/confluence/downloads/atlassian-confluence-6.15.6-x64.bin jira https://product-downloads.atlassian.com/software/jira/downloads/atlassian-jira-software-8.2.2-x64.bin II. Install III. Configure 1 If you need to, within confl…

[Original] Uncle's Experience Notes (67): Spring Boot startup error

Spring Boot fails at startup with: Caused by: java.lang.IllegalArgumentException: LoggerFactory is not a Logback LoggerContext but Logback is on the classpath. Either remove Logback or the competing implementation (class org.slf4j.impl.Log4jLoggerFactory loaded from…

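[Original] Uncle's Experience Notes (63): Kudu vs Parquet

I. Comparison Storage space comparison: Query performance comparison: II. Design Split the data into historical data (hdfs+parquet+snappy) + recent data (kudu), which combines the advantages of both: 1) overall disk usage below 10%; 2) shorter query time; 3) recent data updated in real time; 4) recent data can be modified; 5) kudu cluster restart time reduced by 90%; 6) impala parallel scan: scan kudu + scan hdfs. III. Migration Use a view: create view v_table as select * from parquet_tabl…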

[Original] Uncle's Experience Notes (54): Kudu client errors after the Flume Kudu sink runs for a while

After running for a while, the Flume Kudu sink reports: 19/05/05 10:15:56 WARN client.ConnectToCluster: Error receiving a response from: master:7051 org.apache.kudu.client.RecoverableException: [Peer master-master:7051] Connection disconnected at org.apache.kudu.client.TabletClien…

[Original] Uncle's Experience Notes (51): Docker error Exited (137)

A docker container fails to start with the error Exited (137) *** ago, for example Exited (137) 16 seconds ago. docker logs shows nothing, and on Mesos the only relevant line in stderr is I0409 16:56:26.408077 8583 executor.cpp:736] Container exited with status 137 docker inspect shows the container state as "State": { "…
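By the usual shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9. The sketch below decodes that; `describe_exit` is a hypothetical helper, and the signal names assume a POSIX system. It does not tell you who sent the signal, which the excerpt is cut off before explaining.

```python
import signal


def describe_exit(status: int) -> str:
    """Describe an exit status using the 128 + signal convention."""
    if status > 128:
        number = status - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {number}"
        return f"killed by {name}"
    return f"exited with status {status}"


print(describe_exit(137))
print(describe_exit(0))
```

Because a process killed this way gets no chance to log anything, an empty `docker logs` output is consistent with this reading.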

[Original] Uncle's Experience Notes (49): Hue fails to access HDFS / Hue hangs on the Oozie editor page

Accessing HDFS in Hue as the hue user (hue admin) fails with: Cannot access: /. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup". Another symptom: the Oozie editor page hangs. The checks are as follows: 1 hdfs configuration hadoop.proxyuser.hue.hosts=* hadoop.proxyuse…

[Original] Uncle's Experience Notes (48): running Impala through a shell script in Oozie

Impala is run from Oozie through a shell script, as follows: $ cat test_impala.sh #!/bin/sh /usr/bin/kinit -kt /tmp/impala.keytab impala/server04 /usr/bin/impala-shell -i server04:21000 -q 'show databases' Running the shell script directly works, but running it in Oozie fails with: Traceback (most recent call last): File "/usr/lib/…

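[Original] Uncle's Experience Notes (47): enabling log aggregation in YARN

To enable log aggregation in YARN, besides setting yarn.log-aggregation-enable=true you also need to check that the /tmp/logs directory exists and has the right permissions. Especially after Kerberos is enabled, some directories may fail to be created automatically and must be created by hand: $ hdfs dfs -mkdir /tmp $ hdfs dfs -chmod 777 /tmp The HDFS log directory of each application is: /tmp/logs/$user/logs/$applicationId…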

More articles related to "[Original] Uncle's Experience Notes (70): Marathon app stays in waiting state after restart"