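I was in the weekly meeting this afternoon when an SMS alert arrived: the machine at X.X.X.X had stopped answering ping. After a round of checks it turned out that one of our database servers was down. I held off on rebooting it and asked Tencent support first...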

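Well, how about that: the motherboard of the host machine had failed. An hour later the host was back up, so let's see whether the database came back with it.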

[root@VM_145_57_tlinux ~]# mysql -uroot -p1234
ERROR (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# mysql -uroot -p
Enter password:
ERROR (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root : ? :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : ? :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: grep mysql
[root@VM_145_57_tlinux ~]# mysql -uroot -p1234 -S /tmp/mysql.sock
ERROR (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[root@VM_145_57_tlinux ~]# ls /tmp/mys^C
[root@VM_145_57_tlinux ~]# /etc/init.d/mysqld restart
Stopping mysqld: [ OK ]
Starting mysqld: [ OK ]
[root@VM_145_57_tlinux ~]# ps -ef | msyql
-bash: msyql: command not found
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root : ? :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : ? :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : pts/ :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: grep mysql

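Now it gets interesting: there are two sets of mysqld processes, and I have no explanation for it... [In hindsight, the next step was a mistake.]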

[root@VM_145_57_tlinux ~]# kill -
[root@VM_145_57_tlinux ~]# kill -
[root@VM_145_57_tlinux ~]# ps -ef |grep mysql
root : pts/ :: /bin/sh /usr/bin/mysqld_safe --datadir=/data/mysql/var --socket=/tmp/mysql.sock --pid-file=/data/mysql/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql : pts/ :: /usr/libexec/mysqld --basedir=/usr --datadir=/data/mysql/var --user=mysql --log-error=/data/mysql/mysqld/mysqld.log --pid-file=/data/mysql/mysqld/mysqld.pid --socket=/tmp/mysql.sock --port=
root : pts/ :: grep mysql

After I killed the earlier processes with kill -9, I could connect to the database, but the InnoDB storage engine had not started.

mysql> show engines;
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                   | Transactions | XA   | Savepoints |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                     | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                        | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance    | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO   | NO         |
+------------+---------+-----------------------------------------------------------+--------------+------+------------+
4 rows in set (0.00 sec)

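The picture is starting to make sense: the process was up, the socket had not been created, and the storage engine had not started, which suggests InnoDB was still busy in its background threads. Let's check the error log to find out: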

 :: mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
:: InnoDB: Initializing buffer pool, size = .0G
:: InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn
:: InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number

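In fact, the process that had not yet created its socket was in the middle of crash recovery: restoring pages from the doublewrite buffer and scanning the redo log. Scrolling further down: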

InnoDB: Doing recovery: scanned up to log sequence number
:: mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
:: InnoDB: Initializing buffer pool, size = .0G
:: InnoDB: Error: cannot allocate bytes of
InnoDB: memory with malloc! Total allocated memory
InnoDB: by InnoDB bytes. Operating system errno:
InnoDB: Check if you should increase the swap file or
InnoDB: ulimits of your operating system.
InnoDB: On FreeBSD check you have compiled the OS with
InnoDB: a big enough maximum process size.
InnoDB: Note that in most -bit computers the process
InnoDB: memory space is limited to GB or GB.
InnoDB: We keep retrying the allocation for seconds...
InnoDB: Doing recovery: scanned up to log sequence number

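This is the error produced when I forcibly started MySQL while InnoDB was still running recovery: the new instance reported that it could not allocate enough memory, presumably because the first mysqld was still holding its own buffer pool. Scrolling further down: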

InnoDB: Doing recovery: scanned up to log sequence number
::03InnoDB: Fatal error: cannot allocate the memory for the buffer pool
:: [ERROR] Plugin 'InnoDB' init function returned error.
:: [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
:: [Note] Event Scheduler: Loaded events
:: [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61' socket: '/tmp/mysql.sock' port: Source distribution
InnoDB: Doing recovery: scanned up to log sequence number 3612187648
[several lines omitted]
InnoDB: Doing recovery: scanned up to log sequence number
:: [Note] /usr/libexec/mysqld: Normal shutdown :: [Note] Event Scheduler: Purging the queue. events
:: [Note] /usr/libexec/mysqld: Shutdown complete :: mysqld_safe mysqld from pid file /data/mysql/mysqld/mysqld.pid ended
:: mysqld_safe Starting mysqld daemon with databases from /data/mysql/var
:: InnoDB: Initializing buffer pool, size = .0G
:: InnoDB: Completed initialization of buffer pool
InnoDB: Log scan progressed past the checkpoint lsn 1808 1368235506 [the result of kill -9]
:: InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number

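Only after reading the errors above did it dawn on me how dangerous my earlier actions had been. If InnoDB's crash recovery were not so robust, I would probably have lost the data this way. That was close.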
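One way to avoid this kind of accident is to check whether a mysqld is already running on the data directory before starting or killing anything. The following is a minimal sketch, not what was actually run during the incident. The helper name `count_mysqld` is made up for illustration, and it assumes input in the format produced by `ps -eo pid,args`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count mysqld server processes that use a given datadir.
# Reads `ps -eo pid,args` style lines on stdin. mysqld_safe wrappers, the ps
# header line, and grep processes are not counted.
count_mysqld() {
    local datadir="$1"
    awk -v opt="--datadir=${datadir}" '
        {
            # $2 is the executable path; keep only the server binary itself
            n = split($2, parts, "/")
            if (parts[n] != "mysqld") next
            for (i = 3; i <= NF; i++) {
                if ($i == opt) { count++; break }
            }
        }
        END { print count + 0 }
    '
}

# Usage on a live system (not executed here):
#   ps -eo pid,args | count_mysqld /data/mysql/var
```

If the count is already 1 or more, starting another instance on the same data directory is the situation described above; a count of 2 matches the "double process" listing.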

InnoDB: Doing recovery: scanned up to log sequence number
InnoDB: Doing recovery: scanned up to log sequence number
:: InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents:
InnoDB: Apply batch completed
:: InnoDB: Started; log sequence number
:: [Note] Event Scheduler: Loaded events
:: [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.61' socket: '/tmp/mysql.sock' port: Source distribution

Recovery finished, and the service started up happily.
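The error log already said what mysqld was doing; I just did not read it before acting. As a rough sketch, a small helper can summarize the state of the most recent start from the log. The name `startup_state` and its output labels are hypothetical, and it assumes only one mysqld is writing to the log, which was precisely not the case once I had started a second instance:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the most recent mysqld start in an error log.
# Prints one of: starting, recovering, ready, ready_without_innodb, unknown.
# Assumption: a single mysqld instance writes to this log file.
startup_state() {
    local logfile="$1"
    awk '
        /mysqld_safe Starting mysqld daemon/ { state = "starting"; failed = 0 }
        /InnoDB: Starting crash recovery/    { state = "recovering" }
        # "." stands in for the single quotes around the plugin name
        /Plugin .InnoDB. init function returned error/ { failed = 1 }
        /ready for connections/ {
            state = failed ? "ready_without_innodb" : "ready"
        }
        END { print (state == "" ? "unknown" : state) }
    ' "$logfile"
}

# Usage (path taken from the --log-error option shown in the ps output):
#   startup_state /data/mysql/mysqld/mysqld.log
```

A result of `recovering` means the right move is to wait and keep watching the log, not to restart or kill anything.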

Now a few words about innodb_log_file_size and its role in InnoDB crash recovery:

Following the blog post at http://www.cnblogs.com/zuoxingyu/archive/2012/10/25/2738864.html , let's work out how large innodb_log_file_size should be. [As a rough rule, make the log large enough to hold up to about an hour of log writes.]

mysql> pager grep sequence
PAGER set to 'grep sequence'
mysql> show engine innodb status\G select sleep(); show engine innodb status\G
Log sequence number
row in set (0.00 sec)
row in set ( min 0.00 sec)
Log sequence number
row in set (0.00 sec)

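The difference between the two log sequence numbers is the amount of redo written during the sample. By my calculation, (411580730-446445181)/1024/1024/60=1999M, which rounds up to 2G. Since there are 2 log files by default, 1G per file is the best setting for the current peak load.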
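The sizing rule can be sketched as a small calculation: take the bytes of redo written during the sample, scale them to one hour, and divide by the number of log files. The helper name `redo_log_file_size_mb` and the sample LSN values below are made up for illustration. The sketch assumes each LSN is a single byte count; older outputs that print the LSN as two numbers would need to be combined into one value first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a per-file redo log size in MB.
#   $1, $2  LSN at the start and end of the sample (byte counts)
#   $3      sample length in seconds
#   $4      number of log files (innodb_log_files_in_group, default 2)
redo_log_file_size_mb() {
    local lsn_start="$1" lsn_end="$2" seconds="$3" files="${4:-2}"
    local mb=$((1024 * 1024))
    # Redo bytes written during the sample, scaled to one hour
    local bytes_per_hour=$(( (lsn_end - lsn_start) * 3600 / seconds ))
    # Round up to whole megabytes, then split across the log files
    local total_mb=$(( (bytes_per_hour + mb - 1) / mb ))
    echo $(( (total_mb + files - 1) / files ))
}

# Made-up sample: 35 MB of redo written during a 60-second sleep
redo_log_file_size_mb 5000000000 5036700160 60 2   # prints 1050
```

With these sample values the hourly volume is 2100 MB, so each of the 2 files would be about 1050 MB, which one would then round to a convenient value such as 1G.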
