科技知识动态:从库crash一直自动重启(mysqld got signal 11)问题解决

导读 跟大家讲解下有关从库crash一直自动重启(mysqld got signal 11)问题解决,相信小伙伴们对这个话题应该也很关注吧,现在就为小伙伴们说说

跟大家讲解下有关从库crash一直自动重启(mysqld got signal 11)问题解决,相信小伙伴们对这个话题应该也很关注吧,现在就为小伙伴们说说从库crash一直自动重启(mysqld got signal 11)问题解决,小编也收集到了有关从库crash一直自动重启(mysqld got signal 11)问题解决的相关资料,希望大家看到了会喜欢。

一:问题描述 今天收到邮件报警,遂进数据库查看slave状态,发现io进程和sql进程都为NO. mysql show slave status \G;*************************** 1. row*************************** Slave_IO_State: Master_Host: 此处不予显示,哈哈 Master_User: replic

一:问题描述

今天收到邮件报警,遂进数据库查看slave状态,发现io进程和sql进程都为NO.

mysql> show slave status \G;*************************** 1. row*************************** Slave_IO_State: Master_Host: 此处不予显示,哈哈 Master_User: replica Master_Port: 3306 Connect_Retry: 60 Master_Log_File:master1-bin.001191 Read_Master_Log_Pos: 29214749 Relay_Log_File:web_appdb_10-relay-bin.000663 Relay_Log_Pos: 29213639 Relay_Master_Log_File: master1-bin.001191 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: ccda.%,eip_fileservice.% Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 29213491 Relay_Log_Space: 29215212 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 01 row in set (0.01 sec) ERROR:No query specified尝试启动,然后再次查看状态,竟然报错,说连不上数据库。

mysql> start slave;Query OK, 0 rows affected (0.00 sec) mysql> show slave status \G;ERROR 2006 (HY000): MySQL server has goneawayNo connection. Trying to reconnect...ERROR 2002 (HY000): Can't connect to localMySQL server through socket '/tmp/mysql.sock' (2)ERROR:Can't connect to the server ERROR:No query specified

连续尝试多次,可以登录数据库了,再次查询复制,发现状态还是NO.

mysql> show slave status \G;No connection. Trying to reconnect...Connection id: 1Current database: *** NONE *** *************************** 1. row*************************** Slave_IO_State: Master_Host: 10.0.3.34 Master_User: replica Master_Port: 3306 Connect_Retry: 60 Master_Log_File:master1-bin.001191 Read_Master_Log_Pos: 29214749 Relay_Log_File:web_appdb_10-relay-bin.000663 Relay_Log_Pos: 29213639 Relay_Master_Log_File: master1-bin.001191 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: ccda.%,eip_fileservice.% Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 29213491 Relay_Log_Space: 29215426 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 01 row in set (0.00 sec) ERROR:No query specified

发现只要start slave,该服务器数据库就会自动重启。

而且start slave io_thread没问题,当start slave sql_thread时,才会导致数据库自动重启。

查看错误日志:

160429 9:09:00 [Note] Event Scheduler: Loaded 0 events160429 9:09:00 [Note] /usr/local/mysql/bin/mysqld: ready for connections.Version: '5.5.19-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution160429 11:04:47 [Note] Slave SQL threadinitialized, starting replication in log 'master1-bin.001191' at position29213491, relay log './web_appdb_10-relay-bin.000663' position: 29213639160429 11:04:47 - mysqld got signal 11 ;This could be because you hit a bug. It isalso possible that this binaryor one of the libraries it was linkedagainst is corrupt, improperly built,or misconfigured. This error can also becaused by malfunctioning hardware.We will try our best to scrape up some infothat will hopefully help diagnosethe problem, but since we have alreadycrashed, something is definitely wrongand this may fail. key_buffer_size=268435456read_buffer_size=6291456max_used_connections=3max_threads=2000thread_count=2connection_count=2It is possible that mysqld could use up tokey_buffer_size + (read_buffer_size +sort_buffer_size)*max_threads = 20764878 Kbytes of memoryHope that's ok; if not, decrease somevariables in the equation. Thread pointer: 0x2ab2f1b54740Attempting backtrace. You can use thefollowing information to find outwhere mysqld died. If you see no messagesafter this, something wentterribly wrong...stack_bottom = 0x594310e8 thread_stack0x30000/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x33)[0x765df3]/usr/local/mysql/bin/mysqld(handle_segfault+0x36e)[0x4ee4fe]/lib64/libpthread.so.0[0x31a640ebe0]/usr/local/mysql/bin/mysqld(_ZNK9table_def15compatible_withEP3THDP14Relay_log_infoP5TABLEPS5_+0x31a)[0x74c29a]/usr/local/mysql/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0xcdc)[0x6f0d3c]/usr/local/mysql/bin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x14d)[0x5021ed]/usr/local/mysql/bin/mysqld[0x504b19]/usr/local/mysql/bin/mysqld(handle_slave_sql+0xc0a)[0x5061ea]/lib64/libpthread.so.0[0x31a640677d]/lib64/libc.so.6(clone+0x6d)[0x31a54d49ad] Trying to get some variables.Some pointers may be invalid and cause thedump to abort.Query ((nil)): is an invalid pointerConnection ID (thread ID): 353Status: NOT_KILLED The manual page athttp://dev.mysql.com/doc/mysql/en/crashing.html containsinformation that should help you find outwhat is causing the crash.160429 11:04:48 mysqld_safe Number ofprocesses running now: 0160429 11:04:48 mysqld_safe mysqldrestarted160429 11:04:48 InnoDB: The InnoDB memoryheap is disabled160429 11:04:48 InnoDB: Mutexes andrw_locks use GCC atomic builtins160429 11:04:48 InnoDB: Compressed tablesuse zlib 1.2.3160429 11:04:48 InnoDB: Initializing bufferpool, size = 32.0G160429 11:04:50 InnoDB: Completedinitialization of buffer pool160429 11:04:50 InnoDB: highest supportedfile format is Barracuda.InnoDB: The log sequence number in ibdatafiles does not matchInnoDB: the log sequence number in theib_logfiles!160429 11:04:50 InnoDB: Database was not shut down normally!InnoDB: Starting crash recovery.InnoDB: Reading tablespace information fromthe .ibd files...InnoDB: Restoring possible half-writtendata pages from the doublewriteInnoDB: buffer...InnoDB: Last MySQL binlog file position 0112571, file name ./mysql-bin.048292160429 11:04:52 InnoDB: Waiting for the background threads tostart160429 11:04:53 InnoDB: 1.1.8 started; logsequence number 5992159806777160429 11:04:53 [Note] Recovering after acrash using mysql-bin160429 11:04:53 [Note] Starting crashrecovery...160429 11:04:53 [Note] Crash recoveryfinished.160429 11:04:53 [Warning] Neither--relay-log nor --relay-log-index were used; so replication may break when thisMySQL server acts as a slave and has his hostname changed!! Please use'--relay-log=web_appdb_10-relay-bin' to avoid this problem.160429 11:04:53 [Note] Event Scheduler:Loaded 0 events160429 11:04:53 [Note]/usr/local/mysql/bin/mysqld: ready for connections.Version: '5.5.19-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution

对于这个错误”mysqld got signal 11”,我在网上查了,有的说是磁盘空间满了,有的说是内存问题,也有可能是硬件错误,也有可能是中继日志重放位置的sql导致的。

查看中继日志该位置执行的语句:

Relay_Log_File:web_appdb_10-relay-bin.000663

Relay_Log_Pos: 29213639

# at 29213639#160428 21:29:32 server id 1 end_log_pos 29213559 Query thread_id=624506 exec_time=0 error_code=0SET TIMESTAMP=1461850172;;SET@@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=33;BEGIN;# at 29213707#160428 21:29:32 server id 1 end_log_pos 29213657 Table_map: `ccda`.`ess_accounting_relation`mapped to number 311993# at 29213805#160428 21:29:32 server id 1 end_log_pos 29213757 Table_map:`ccda`.`ess_accounting_relation_1` mapped to number 311994# at 29213905#160428 21:29:32 server id 1 end_log_pos 29214051 Update_rows: table id 311993 flags:STMT_END_F BINLOG 'PBAiVxMBAAAAYgAAANnDvQEAALnCBAAAAAEABGNjZGEAF2Vzc19hY2NvdW50aW5nX3JlbGF0aW9uAA4DDw8PDw8PDw8PDw8MAxYsATwAPAAGACwBPAAsAZYAlgCWAJYA/D8=PBAiVxMBAAAAZAAAAD3EvQEAALrCBAAAAAEABGNjZGEAGWVzc19hY2NvdW50aW5nX3JlbGF0aW9uXzEADgMPDw8PDw8PDw8PDwwDFiwBPAA8AAYALAE8ACwBlgCWAJYAlgD8Pw==PBAiVxgBAAAAJgEAAGPFvQEAALnCBAAAAAEADv////8Q4Dnl7wATADEyMTEwMTAxMTE1MTEwMDE2MzUS5bqU5LuY5LuY5qy+5Yet6K+BA0VBUwAACeaKpei0puWNlRYAemhhbmd5MTA0NzE1MTExODEzMTYyMhcxMTExMDExNTExMTg1MDUyMzA2MjE2MQAAAI3GveRVEgAAEOA55e8AEwAxMjExMDEwMTExNTExMDAxNjM1EuW6lOS7mOS7mOasvuWHreivgQNFQVMUADExMTEwMTE1MTEwOTEwMzgwMDg5CeaKpei0puWNlRYAemhhbmd5MTA0NzE1MTExODEzMTYyMhcxMTExMDExNTExMTg1MDUyMzA2MjE2MQAAAI3GveRVEgAA';### UPDATE `ccda`.`ess_accounting_relation`### WHERE### @1=15721785### @2='1211010111511001635'### @3='应付付款凭证'### @4='EAS'### @5=NULL### @6=''### @7='报账单'### @8='zhangy1047151118131622'### @9='11110115111850523062161'### @10=''### @11=''### @12=''### @13=2016-01-19 16:25:09### @14=NULL### SET### @1=15721785### @2='1211010111511001635'### @3='应付付款凭证'### @4='EAS'### @5=NULL### @6='11110115110910380089'### @7='报账单'### @8='zhangy1047151118131622'### @9='11110115111850523062161'### @10=''### @11=''### @12=''### @13=2016-01-19 16:25:09### @14=NULL

先备份一下该记录,然后手动在从库上更新一下,看是否报错。

UPDATE `ccda`.`ess_accounting_relation`

SET attachId='11110115110910380089'

WHERE id = 15721785;

结果发现在从库上也可以正常update呀。

后来我想查看下该表表结构,结果出现错误:

mysql> show create table`ccda`.`ess_accounting_relation` \G;

ERROR 144 (HY000): Table'./ccda/ess_accounting_relation_1' is marked as crashed and last (automatic?)repair failed

ERROR:

No query specified

mysql> select count(*) fromccda.ess_accounting_relation_1;

ERROR 144 (HY000): Table'./ccda/ess_accounting_relation_1' is marked as crashed and last (automatic?)repair failed

诡异,刚才还能更新呢,现在却又不能正常访问了。

然后,尝试修复出问题的表:

check table ccda.ess_accounting_relation_1;

repair table ccda.ess_accounting_relation_1;

修复成功后,查看ccda.ess_accounting_relation 表结构,发现该表是个合并表,ess_accounting_relation_1是myisam引擎:

CREATE TABLE `ess_accounting_relation` (

……)

ENGINE=MRG_MyISAM DEFAULT CHARSET=utf8 INSERT_METHOD=LASTUNION=(`ess_accounting_relation_1`)

原本以为修复该表成功后,start slave,就正常了。结果还是会导致数据库重启。

此时,再检查ess_accounting_relation_1,也是正常的,没有显示崩溃。

我试验,在从库跳过该表的操作(用change master to或者set global sql_slave_skip_counter=n),当执行其他表的操作时,并没有导致从库重启。

我试验在从库配置文件里添加参数:replicate_ignore_table=ccda.ess_accounting_relation过滤掉这个表,然后重启数据库,再start slave,没有导致从库重启。

最后,我大胆试验下,在主库直接操作ess_accounting_relation_1的某条数据(前提是已经注释掉了上面的参数replicate_ignore_table),发现从库在应用相应数据时,并没有导致重启。

所以,问题就出在了这个mrg_myisam存储引擎。

这个表,其实每天也都有update,可是为什么最近才出现了这个问题,那就不知道了。因为这个表引用的子表数据量太大了吗?该表大概1600万数据。不晓得。

二:出错原因

一个mrg_myisam存储引擎的合并表,引用了一个myisam引擎的子表,更新前者导致slave数据库一直自动重启,且偶尔子表也会发生崩溃。

这估计是mysql的一个bug吧。

三:解决办法

由于ess_accounting_relation表数据只来源于ccda.ess_accounting_relation_1这一个表,实际上并没有合并的意义,而且,通过了解发现,这个myisam表经常更新。myisam容易崩溃,且不支持行锁,故建议将ccda.ess_accounting_relation_1改成innodb存储引擎(可以先mysqldump备份下这个表,然后在备份文件里将MyISAM改成innodb),删掉ccda.ess_accounting_relation,将ccda.ess_accounting_relation_1重命名为ccda.ess_accounting_relation。

--关于mrg_myisam介绍,请参考http://blog.csdn.net/yabingshi_tech/article/details/51320701

--备注:mysql版本5.5.19

来源:php中文网

免责声明:本文由用户上传,如有侵权请联系删除!