[Warning] Aborted connection 11203 to db: 'ide' user: 'nuc' host: 'prd01.mb.com' (Got an error writi
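PS: one physical machine is split into three virtual machines: the master DB, a master standby, and a slave standby.
When switching over to 0301: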
Sep 6 09:16:16 prddb0301 mysqld: 130906 9:16:16 [Warning] Aborted connection 11203 to db: 'ide' user: 'nuc' host: 'prd01.mb.com' (Got an error writing communication packets)
Sep 6 09:16:16 prddb0301 mysqld: 130906 9:16:16 [Warning] Aborted connection 12498 to db: 'ide' user: 'nuc' host: 'prd02.mb.com' (Got an error writing communication packets)
Sep 6 09:16:16 prddb0301 mysqld: 130906 9:16:16 [Warning] Aborted connection 13503 to db: 'ide' user: 'nuc' host: 'prd03.mb.com' (Got an error writing communication packets)
Sep 6 09:16:17 prddb0301 mysqld: 130906 9:16:17 [Warning] Aborted connection 6681 to db: 'ide' user: 'nuc' host: 'prd11.mb.com' (Got an error writing communication packets)
Sep 6 09:16:18 prddb0301 mysqld: 130906 9:16:18 [Warning] Aborted connection 15070 to db: 'ide' user: 'nuc' host: 'prd12.mb.com' (Got an error writing communication packets)
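Just now, when we switched the writer over to 0301 and the connection count reached about 1800, the error log started reporting the warnings above.
Probable impact: in the worst case a stuck shard stalls every app server, so even a single shard can have a large impact, which is worrying.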
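To see at a glance which app hosts are affected, the warnings can be grouped by host and reason. The following is a minimal sketch, not the tooling used during the incident; the function name `summarize_aborted` and the regular expression are my own assumptions, built only from the log lines quoted above.

```python
import re
from collections import Counter

# Pattern matching the "Aborted connection" warnings quoted above.
ABORTED_RE = re.compile(
    r"Aborted connection (?P<conn_id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>[^)]*)\)"
)


def summarize_aborted(lines):
    """Count aborted connections per (host, reason) pair."""
    counts = Counter()
    for line in lines:
        match = ABORTED_RE.search(line)
        if match:
            counts[(match["host"], match["reason"])] += 1
    return counts


# Two of the log lines from this post, used as sample input.
sample = [
    "Sep 6 09:16:16 prddb0301 mysqld: 130906 9:16:16 [Warning] Aborted connection 11203 "
    "to db: 'ide' user: 'nuc' host: 'prd01.mb.com' (Got an error writing communication packets)",
    "Sep 6 09:16:16 prddb0301 mysqld: 130906 9:16:16 [Warning] Aborted connection 12498 "
    "to db: 'ide' user: 'nuc' host: 'prd02.mb.com' (Got an error writing communication packets)",
]

summary = summarize_aborted(sample)
for (host, reason), n in sorted(summary.items()):
    print(host, reason, n)
```

If every app host shows the same write error at the same second, that points at the DB side rather than at one misbehaving client.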
1 check: analyze with show engine innodb status\G;
Observed: all threads are in the state "estimating records in index range", and many similar SQL statements are waiting:
select ENTITLEMENT_ID, USER_ID, PRODUCT_ID, GRANT_DATE, EXPIRATION_DATE, DATE_CREATED, STATUS, CREATED_BY, MODIFIED_BY, DATE_MODIFIED, STATUS_REASON_CODE, ENTITLEMENT_TAG, VERSION, PRODUCT_CATALOG, USE_COUNT, GROUP_ID, ENTITLEMENT_SOURCE, ENTITLEMENT_TYPE, PROJECT_ID, DEVICE_ID, MANAGED_LIFECYCLE, CONSUMABLE, ORIGIN_PERMISSIONS, EXTERNAL_TYPE, EXTERNAL_ID from ide.entilT as ent where 1 = 1 and STATUS = 1 and USER_ID = 2331523206069 and DATE_MODIFIED >= '2013-09-06 01:39:00' order by ENTITLEMENT_ID DESC limit 0, 5000
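I still think this is a memory problem, so let's remove hugepages. The current behavior is that the machine is not busy, yet all of its resources are being spent in iowait, which causes the hang, and memory is exactly what we had just changed.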
http://www.51testing.com/?uid-225738-action-viewspace-itemid-235472
2 check memory
[ed@prdkvm35 ~]$ free -g
             total       used       free     shared    buffers     cached
Mem:           125        124          1          0          1          0
-/+ buffers/cache:        122          3
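The physical host has only about 3G of memory left, so we shrank the memory of the VMs on it a little to leave more for the host itself. After the change we restarted the VMs, started MySQL, and failed the writer over from 0302 to 0301.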
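The reading of the free output can be sketched in code. This is only an illustration under my own assumptions: the helper names `parse_free` and `is_low_memory` and the 5% threshold are hypothetical, and the parser only understands the old-style layout with a "-/+ buffers/cache" row shown above.

```python
# The `free -g` output from the physical host, as shown above.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:           125        124          1          0          1          0
-/+ buffers/cache:        122          3
"""


def parse_free(output):
    """Parse old-style `free` output into a dict of integer fields."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            result["total"] = int(fields[1])
            result["used"] = int(fields[2])
            result["free"] = int(fields[3])
        elif fields[0] == "-/+":
            # Row layout: "-/+ buffers/cache: <used> <free>"
            result["app_used"] = int(fields[2])
            result["app_free"] = int(fields[3])
    return result


def is_low_memory(info, min_free_ratio=0.05):
    """True when memory free after buffers/cache is below the given ratio.

    The 5% default is an arbitrary illustrative threshold, not a recommendation.
    """
    return info["app_free"] / info["total"] < min_free_ratio


info = parse_free(FREE_OUTPUT)
low = is_low_memory(info)
print(info["app_free"], "of", info["total"], "GB free; low memory:", low)
```

Here 3 out of 125 GB is about 2.4%, which is why the host itself had almost nothing left once the three VMs took their share.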
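3 too many connections after failover
After the failover, the number of client connections on the 0301 DB surged past 2000, about four times the normal level of roughly 500, and kept growing. We immediately failed the writer back from 0301 to 0302, and connections returned to normal.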
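The "fail back immediately" decision above can be written down as a simple rule. This is a hedged sketch, not what we actually ran: the function `should_fail_back`, the factor of 2, and the sample numbers are assumptions for illustration. Only the baseline of about 500 and the surge past 2000 come from the incident. In practice the samples could be collected by polling `SHOW GLOBAL STATUS LIKE 'Threads_connected'`.

```python
def should_fail_back(samples, baseline=500, factor=2.0):
    """Decide whether to fail the writer back, given connection-count samples.

    samples:  connection counts in chronological order
    baseline: normal connection count (about 500 in this incident)
    factor:   hypothetical multiple of the baseline treated as abnormal
    """
    if not samples:
        return False
    latest = samples[-1]
    # Treat a single sample as "not recovering"; otherwise compare with the previous one.
    still_growing = len(samples) < 2 or latest >= samples[-2]
    return latest > baseline * factor and still_growing


# Illustrative samples: a surge like the one seen on 0301, and a normal period.
surge = [600, 1200, 2000]
normal = [480, 510, 500]
print(should_fail_back(surge))   # surge beyond 2x baseline and still growing
print(should_fail_back(normal))  # within the normal range
```

Requiring the count to be both high and still rising avoids failing back on a short reconnect spike that is already settling down.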
4 check the my.cnf configuration file on 0301
Removed login audit, then reduced innodb_buffer_pool_size from 60G to 48G. Then failed the writer over from 0302 to 0301 again.
5 check again
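After the failover, connections again surged past 2000, about four times the normal level of roughly 500, and kept growing. We immediately failed the writer back from 0301 to 0302, and connections returned to normal.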
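6 reduce memory further
Reduced the memory of the master-standby VM and the slave-standby VM again, taking 4G off each. After the change, restarted the VMs and the MySQL service on them, then failed the writer over from 0302 to 0301. The check showed connections again surging past 2000, about four times the normal level of roughly 500, and IO wait was very high. So we failed the writer back from 0301 to 0302.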
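7 IO is very high every time, so check whether the InnoDB write-related parameters in my.cnf, such as innodb_flush_log_at_trx_commit, are reasonable.
Found innodb_flush_log_at_trx_commit = 1 in my.cnf. That looks like the culprit, so I immediately changed my.cnf to innodb_flush_log_at_trx_commit = 0.
Then failed the writer over from 0302 to 0301 again and checked the client connections: they surged to a bit over 800, held there for about a minute, and then settled at around 500. OK, done!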
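For context on why this setting matters for IO: with innodb_flush_log_at_trx_commit = 1 InnoDB writes and flushes the redo log to disk at every commit, with 2 it writes at every commit but flushes about once per second, and with 0 it writes and flushes about once per second. Values 0 and 2 reduce flush load but can lose up to about a second of transactions in a crash. The sketch below reads the option from a my.cnf fragment. The fragment is a hand-written example using the values mentioned in this post, the function names are hypothetical, and Python's configparser is only an approximation of the my.cnf format (it does not handle directives such as !includedir).

```python
import configparser

# Hand-written my.cnf fragment for illustration; not the real production file.
MY_CNF = """\
[mysqld]
innodb_buffer_pool_size = 48G
innodb_flush_log_at_trx_commit = 1
"""

# Behavior of each value of innodb_flush_log_at_trx_commit.
FLUSH_POLICIES = {
    "0": "redo log written and flushed about once per second",
    "1": "redo log written and flushed at every commit",
    "2": "redo log written at every commit, flushed about once per second",
}


def read_mysqld_options(text):
    """Return the [mysqld] options of a my.cnf-style text as a dict."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare options such as skip-name-resolve
        strict=False,         # tolerate repeated options
        interpolation=None,   # do not treat % specially
    )
    parser.read_string(text)
    return dict(parser["mysqld"]) if parser.has_section("mysqld") else {}


def describe_flush_policy(options):
    """Describe the redo log flush behavior; the server default is 1."""
    value = options.get("innodb_flush_log_at_trx_commit", "1")
    return FLUSH_POLICIES.get(value, "unknown value: " + str(value))


options = read_mysqld_options(MY_CNF)
print(describe_flush_policy(options))
```

A check like this makes the durability trade-off explicit before changing the value on a production writer.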
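PS: the my.cnf on this prod machine was set up by the previous DBA. At every failover, Seconds_Behind_Master must be kept at 0.
The slave was catching up too slowly and the NOC called me again. I saw that sync_binlog was set to 1 and changed it to 0, after which Seconds_Behind_Master quickly dropped to 0.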
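The "Seconds_Behind_Master must be 0 before failover" rule can be checked mechanically from the text output of SHOW SLAVE STATUS\G. This is a minimal sketch: the function names are hypothetical and the status text is a shortened, hand-written sample rather than output captured from these servers. Note that the field is NULL when replication is not running, which should not be treated as "caught up".

```python
import re

# Shortened, hand-written sample in the SHOW SLAVE STATUS\G layout.
SLAVE_STATUS = """\
        Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
   Seconds_Behind_Master: 0
"""

LAG_RE = re.compile(r"^\s*Seconds_Behind_Master:\s*(\S+)\s*$", re.MULTILINE)


def seconds_behind_master(status_text):
    """Return the replication lag in seconds, or None if unknown or NULL."""
    match = LAG_RE.search(status_text)
    if match is None or not match.group(1).isdigit():
        return None
    return int(match.group(1))


def ready_for_failover(status_text):
    """True only when the slave reports a lag of exactly 0 seconds."""
    return seconds_behind_master(status_text) == 0


print(ready_for_failover(SLAVE_STATUS))
```

Returning None for NULL keeps a stopped replication thread from passing the check by accident.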