How to fix the kernel BUG XFS error?

CentOS(RHEL)

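1 Introduction

1.1 System crash log

Some of our servers recently rebooted for no apparent reason. We inspected the crash log saved by kdump with the following command: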

cat /var/crash/127.0.0.1-2020-11-29-17\:29\:26/vmcore-dmesg.txt

It contains the following:

#...
[6163981.330162] ------------[ cut here ]------------
[6163981.330210] kernel BUG at fs/xfs/xfs_aops.c:1062!
[6163981.330246] invalid opcode: 0000 [#1] SMP 
[6163981.330280] Modules linked in: binfmt_misc xt_REDIRECT nf_nat_redirect ip_vs_rr xt_ipvs ip_vs xt_nat veth vxlan ip6_udp_tunnel udp_tunnel iptable_mangle xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) ext4 mbcache jbd2 arc4 md4 nls_utf8 cifs dns_resolver team_mode_activebackup team intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd bnx2x ipmi_devintf iTCO_wdt iTCO_vendor_support sg ipmi_ssif pcspkr ptp sb_edac pps_core ioatdma hpilo hpwdt mdio edac_core lpc_ich dca shpchp ipmi_si wmi ipmi_msghandler acpi_power_meter pcc_cpufreq
[6163981.330887]  ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul crct10dif_common sysimgblt crc32c_intel fb_sys_fops ttm serio_raw ata_piix drm libata hpsa i2c_core scsi_transport_sas fjes dm_mirror dm_region_hash dm_log dm_mod
[6163981.331136] CPU: 5 PID: 11005 Comm: kworker/u129:2 Tainted: G        W      ------------ T 3.10.0-514.el7.x86_64 #1
[6163981.331205] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
[6163981.331257] Workqueue: writeback bdi_writeback_workfn (flush-253:3)
[6163981.331305] task: ffff88040426bec0 ti: ffff88187c964000 task.ti: ffff88187c964000
[6163981.331356] RIP: 0010:[<ffffffffa023b2fb>]  [<ffffffffa023b2fb>] xfs_vm_writepage+0x58b/0x5d0 [xfs]
[6163981.331465] RSP: 0018:ffff88187c967948  EFLAGS: 00010246
[6163981.331503] RAX: 006011c30000002d RBX: ffff882fd0b37e48 RCX: 000000000000000c
[6163981.331552] RDX: 0000000000000008 RSI: ffff88187c967c40 RDI: ffffea007ba1b480
[6163981.331601] RBP: ffff88187c9679f0 R08: 0000000000000000 R09: 000000000001a100
[6163981.331652] R10: ffff88303ffd9000 R11: 0000000000000000 R12: ffff882fd0b37e48
[6163981.331700] R13: ffff88187c967c40 R14: ffff882fd0b37cf8 R15: ffffea007ba1b480
[6163981.331750] FS:  0000000000000000(0000) GS:ffff8817df740000(0000) knlGS:0000000000000000
[6163981.331805] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6163981.331845] CR2: 00007f7c48aefb10 CR3: 00000000019ba000 CR4: 00000000001407e0
[6163981.331894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[6163981.331944] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[6163981.331992] Stack:
[6163981.332009]  0000000000008000 ffff88187c967988 ffff88187c967c40 ffff8817def46ca8
[6163981.332067]  ffff8817def46ca8 ffffea007ba1b480 0000000000001000 00007f2537b1e000
[6163981.332124]  0000000000001000 ffffffff811ba911 0000000000000000 0000000000000000
[6163981.332181] Call Trace:
[6163981.332208]  [<ffffffff811ba911>] ? page_mkclean+0x1b1/0x1f0
[6163981.332252]  [<ffffffff8118b3b3>] __writepage+0x13/0x50
[6163981.332292]  [<ffffffff8118bed1>] write_cache_pages+0x251/0x4d0
[6163981.332335]  [<ffffffff8118b3a0>] ? global_dirtyable_memory+0x70/0x70
[6163981.332382]  [<ffffffff8118c19d>] generic_writepages+0x4d/0x80
[6163981.332450]  [<ffffffffa023a063>] xfs_vm_writepages+0x53/0x90 [xfs]
[6163981.332496]  [<ffffffff8118d24e>] do_writepages+0x1e/0x40
[6163981.332537]  [<ffffffff81228730>] __writeback_single_inode+0x40/0x210
[6163981.332584]  [<ffffffff8122941e>] writeback_sb_inodes+0x25e/0x420
[6163981.332629]  [<ffffffff8122967f>] __writeback_inodes_wb+0x9f/0xd0
[6163981.334794]  [<ffffffff81229ec3>] wb_writeback+0x263/0x2f0
[6163981.336631]  [<ffffffff8121878c>] ? get_nr_inodes+0x4c/0x70
[6163981.338464]  [<ffffffff8122bebb>] bdi_writeback_workfn+0x2cb/0x460
[6163981.340304]  [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
[6163981.342098]  [<ffffffff810a8d76>] worker_thread+0x126/0x410
[6163981.343847]  [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
[6163981.345597]  [<ffffffff810b052f>] kthread+0xcf/0xe0
[6163981.347787]  [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[6163981.349309]  [<ffffffff81696518>] ret_from_fork+0x58/0x90
[6163981.350783]  [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[6163981.352227] Code: e0 80 3d 4d b4 06 00 00 0f 85 a4 fe ff ff be d7 03 00 00 48 c7 c7 4a a0 28 a0 e8 61 a6 e4 e0 c6 05 2f b4 06 00 01 e9 87 fe ff ff <0f> 0b 8b 4d a4 e9 e8 fb ff ff 41 b9 01 00 00 00 e9 69 fd ff ff 
[6163981.355534] RIP  [<ffffffffa023b2fb>] xfs_vm_writepage+0x58b/0x5d0 [xfs]
[6163981.357018]  RSP <ffff88187c967948>
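
The line to look for in a dump like this is the one containing "kernel BUG at", which names the source file and line where the kernel stopped. A minimal sketch for pulling that line out of every saved dump, assuming the dumps live under /var/crash as in the path above (list_kernel_bugs is a name made up for this example):

```shell
# Print every "kernel BUG at" line from the vmcore-dmesg.txt files below a
# crash directory, prefixed with the file it was found in.
#   $1: crash directory (defaults to /var/crash, the location used above)
list_kernel_bugs() {
    find "${1:-/var/crash}" -type f -name vmcore-dmesg.txt \
        -exec grep -H 'kernel BUG at' {} + 2>/dev/null
}

# Example: list_kernel_bugs /var/crash
```

Called without an argument it scans /var/crash. Seeing the same file and line across several dumps suggests the reboots share one cause.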

1.2 Current kernel version

Query the running kernel version with the following command:

uname -a

The output looks like this:

Linux hd07.cmdschool.org 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

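2 Best practice

2.1 Configure the system repository

Install the centos-release package, which provides the yum repository definitions: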

yum install -y http://centos.sae.com.hk/7.3.1611/os/x86_64/Packages/centos-release-7-3.1611.el7.centos.x86_64.rpm

2.2 Update system packages

yum update -y

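2.3 Confirm the kernel update

Reboot so that the newly installed kernel is running, then query the kernel version:

uname -a

The output looks like this: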

Linux hd07.cmdschool.org 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Note: kernel-3.10.0-514.16.1.el7 or any later version resolves this issue.
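
To check a host against that threshold without reading the version by eye, the numeric part of the release string can be compared field by field. The sketch below is a hypothetical helper, not an official tool: kernel_at_least is an invented name, everything from ".el" onward is ignored, and the result is only meaningful when comparing EL7 3.10.0 kernels with each other.

```shell
# Return success if kernel release $1 is at least release $2.
# Only the numeric fields before ".el" are compared, e.g. 3.10.0-514.26.2.
kernel_at_least() {
    have=${1%%.el*}
    want=${2%%.el*}
    awk -v a="$have" -v b="$want" 'BEGIN {
        na = split(a, x, /[.-]/)
        nb = split(b, y, /[.-]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                              # all fields equal
    }'
}

# 3.10.0-514.16.1.el7 is the threshold named in the note above.
if kernel_at_least "$(uname -r)" 3.10.0-514.16.1.el7; then
    echo "running kernel is at or above the fixed release"
else
    echo "running kernel predates the fixed release"
fi
```

With the two releases shown in this article, 3.10.0-514.el7.x86_64 fails the check (its missing fields count as 0, below 16) and 3.10.0-514.26.2.el7.x86_64 passes it.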

2.4 Roll back to the old kernel

2.4.1 Check the current default kernel

grub2-editenv list

The output looks like this:

saved_entry=CentOS Linux (3.10.0-514.26.2.el7.x86_64) 7 (Core)

2.4.2 List the available kernels

grep ^menuentry /boot/grub2/grub.cfg

The output looks like this:

menuentry 'CentOS Linux (3.10.0-514.26.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-514.el7.x86_64-advanced-aaff4895-29ee-4c9d-a83b-b3b809f94972' {
menuentry 'CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-514.el7.x86_64-advanced-aaff4895-29ee-4c9d-a83b-b3b809f94972' {
menuentry 'CentOS Linux (0-rescue-510cd1c4afcf450d964b198693353422) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-510cd1c4afcf450d964b198693353422-advanced-aaff4895-29ee-4c9d-a83b-b3b809f94972' {
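
Each title in that listing is the text between the first pair of single quotes, and grub2-set-default accepts either such a title or the entry's zero-based position. A small sketch that prints both, assuming the BIOS path /boot/grub2/grub.cfg used above (list_menuentries is a hypothetical helper name; on UEFI installs the file usually lives under /boot/efi/EFI/centos/ instead):

```shell
# Print "index<TAB>title" for each top-level menuentry in a grub.cfg file.
#   $1: path to grub.cfg (defaults to the BIOS location used above)
list_menuentries() {
    awk -F"'" '/^menuentry / { printf "%d\t%s\n", n++, $2 }' \
        "${1:-/boot/grub2/grub.cfg}"
}

# Example: list_menuentries /boot/grub2/grub.cfg
```

For the listing above this should number the 3.10.0-514.26.2 entry 0 and the 3.10.0-514 entry 1, so passing the index 1 to grub2-set-default would be an alternative to typing the full title.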

2.4.3 Set the old kernel as the default

grub2-set-default "CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)"

Then confirm the setting with the following command:

grub2-editenv list

The output looks like this:

saved_entry=CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)
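
Before rebooting, it can be worth confirming that the saved title matches one of the menu entries exactly. The following is only a sketch of such a check: saved_entry_is_valid is an illustrative name, the default grub.cfg path is the BIOS location used earlier, and it assumes saved_entry holds a title rather than a numeric index or an entry id.

```shell
# Succeed only if the saved_entry title matches a menuentry title in grub.cfg.
#   $1: text in the format printed by `grub2-editenv list`
#   $2: path to grub.cfg (default is an assumption for BIOS installs)
saved_entry_is_valid() {
    title=$(printf '%s\n' "$1" | sed -n 's/^saved_entry=//p')
    [ -n "$title" ] && grep -qF "menuentry '$title'" "${2:-/boot/grub2/grub.cfg}"
}

# Example (run as root on the host itself):
#   saved_entry_is_valid "$(grub2-editenv list)" && echo "saved entry found"
```

The match is a fixed-string comparison, so the parentheses and dots in the title need no escaping.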

Then reboot the system for the change to take effect:

reboot

References
=================
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e83b79b2d6c78bf1b4aa227938a214dcbddc83f
https://bugs.centos.org/view.php?id=14073
https://bugzilla.redhat.com/show_bug.cgi?id=1396941
https://access.redhat.com/solutions/2779111
