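1 Introduction
1.1 System crash log
We recently found that some servers were rebooting for no apparent reason, so we inspected the system crash log with the following command: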
cat /var/crash/127.0.0.1-2020-11-29-17\:29\:26/vmcore-dmesg.txt
The log contains the following:
#...
[6163981.330162] ------------[ cut here ]------------
[6163981.330210] kernel BUG at fs/xfs/xfs_aops.c:1062!
[6163981.330246] invalid opcode: 0000 [#1] SMP
[6163981.330280] Modules linked in: binfmt_misc xt_REDIRECT nf_nat_redirect ip_vs_rr xt_ipvs ip_vs xt_nat veth vxlan ip6_udp_tunnel udp_tunnel iptable_mangle xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) ext4 mbcache jbd2 arc4 md4 nls_utf8 cifs dns_resolver team_mode_activebackup team intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd bnx2x ipmi_devintf iTCO_wdt iTCO_vendor_support sg ipmi_ssif pcspkr ptp sb_edac pps_core ioatdma hpilo hpwdt mdio edac_core lpc_ich dca shpchp ipmi_si wmi ipmi_msghandler acpi_power_meter pcc_cpufreq
[6163981.330887] ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul crct10dif_common sysimgblt crc32c_intel fb_sys_fops ttm serio_raw ata_piix drm libata hpsa i2c_core scsi_transport_sas fjes dm_mirror dm_region_hash dm_log dm_mod
[6163981.331136] CPU: 5 PID: 11005 Comm: kworker/u129:2 Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1
[6163981.331205] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
[6163981.331257] Workqueue: writeback bdi_writeback_workfn (flush-253:3)
[6163981.331305] task: ffff88040426bec0 ti: ffff88187c964000 task.ti: ffff88187c964000
[6163981.331356] RIP: 0010:[<ffffffffa023b2fb>] [<ffffffffa023b2fb>] xfs_vm_writepage+0x58b/0x5d0 [xfs]
[6163981.331465] RSP: 0018:ffff88187c967948 EFLAGS: 00010246
[6163981.331503] RAX: 006011c30000002d RBX: ffff882fd0b37e48 RCX: 000000000000000c
[6163981.331552] RDX: 0000000000000008 RSI: ffff88187c967c40 RDI: ffffea007ba1b480
[6163981.331601] RBP: ffff88187c9679f0 R08: 0000000000000000 R09: 000000000001a100
[6163981.331652] R10: ffff88303ffd9000 R11: 0000000000000000 R12: ffff882fd0b37e48
[6163981.331700] R13: ffff88187c967c40 R14: ffff882fd0b37cf8 R15: ffffea007ba1b480
[6163981.331750] FS: 0000000000000000(0000) GS:ffff8817df740000(0000) knlGS:0000000000000000
[6163981.331805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6163981.331845] CR2: 00007f7c48aefb10 CR3: 00000000019ba000 CR4: 00000000001407e0
[6163981.331894] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[6163981.331944] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[6163981.331992] Stack:
[6163981.332009] 0000000000008000 ffff88187c967988 ffff88187c967c40 ffff8817def46ca8
[6163981.332067] ffff8817def46ca8 ffffea007ba1b480 0000000000001000 00007f2537b1e000
[6163981.332124] 0000000000001000 ffffffff811ba911 0000000000000000 0000000000000000
[6163981.332181] Call Trace:
[6163981.332208] [<ffffffff811ba911>] ? page_mkclean+0x1b1/0x1f0
[6163981.332252] [<ffffffff8118b3b3>] __writepage+0x13/0x50
[6163981.332292] [<ffffffff8118bed1>] write_cache_pages+0x251/0x4d0
[6163981.332335] [<ffffffff8118b3a0>] ? global_dirtyable_memory+0x70/0x70
[6163981.332382] [<ffffffff8118c19d>] generic_writepages+0x4d/0x80
[6163981.332450] [<ffffffffa023a063>] xfs_vm_writepages+0x53/0x90 [xfs]
[6163981.332496] [<ffffffff8118d24e>] do_writepages+0x1e/0x40
[6163981.332537] [<ffffffff81228730>] __writeback_single_inode+0x40/0x210
[6163981.332584] [<ffffffff8122941e>] writeback_sb_inodes+0x25e/0x420
[6163981.332629] [<ffffffff8122967f>] __writeback_inodes_wb+0x9f/0xd0
[6163981.334794] [<ffffffff81229ec3>] wb_writeback+0x263/0x2f0
[6163981.336631] [<ffffffff8121878c>] ? get_nr_inodes+0x4c/0x70
[6163981.338464] [<ffffffff8122bebb>] bdi_writeback_workfn+0x2cb/0x460
[6163981.340304] [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
[6163981.342098] [<ffffffff810a8d76>] worker_thread+0x126/0x410
[6163981.343847] [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
[6163981.345597] [<ffffffff810b052f>] kthread+0xcf/0xe0
[6163981.347787] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[6163981.349309] [<ffffffff81696518>] ret_from_fork+0x58/0x90
[6163981.350783] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[6163981.352227] Code: e0 80 3d 4d b4 06 00 00 0f 85 a4 fe ff ff be d7 03 00 00 48 c7 c7 4a a0 28 a0 e8 61 a6 e4 e0 c6 05 2f b4 06 00 01 e9 87 fe ff ff <0f> 0b 8b 4d a4 e9 e8 fb ff ff 41 b9 01 00 00 00 e9 69 fd ff ff
[6163981.355534] RIP [<ffffffffa023b2fb>] xfs_vm_writepage+0x58b/0x5d0 [xfs]
[6163981.357018] RSP <ffff88187c967948>
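When several crash directories have accumulated under /var/crash, it can help to pull out only the "kernel BUG at" line from each log to see whether every reboot hit the same assertion. The following is a minimal sketch; the function name list_kernel_bugs is made up for illustration, and it assumes GNU grep and log files named vmcore-dmesg.txt, as in the path shown earlier.

```shell
#!/usr/bin/env bash
# list_kernel_bugs FILE...
# Print "file:kernel BUG at <source location>" for every BUG line found
# in the given vmcore-dmesg.txt files (hypothetical helper, GNU grep assumed).
list_kernel_bugs() {
    # -H prefixes each match with the file name, -o prints only the match.
    grep -H -o 'kernel BUG at [^ ]*' "$@"
}

# Example call (not executed here):
#   list_kernel_bugs /var/crash/*/vmcore-dmesg.txt
```

If every crash reports fs/xfs/xfs_aops.c:1062, the reboots most likely share the same cause.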
1.2 Current kernel version
Check the running kernel version with the following command:
uname -a
The output is:
Linux hd07.cmdschool.org 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
2 Best practice
2.1 Configure the package repository
yum install -y http://centos.cmdschool.org/7.3.1611/os/x86_64/Packages/centos-release-7-3.1611.el7.centos.x86_64.rpm
2.2 Update system packages
yum update -y
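2.3 Confirm the kernel update
uname reports the running kernel, so reboot into the updated kernel first, then check the version with the following command:
uname -a
The output is: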
Linux hd07.cmdschool.org 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Note: "kernel-3.10.0-514.16.1.el7" or any later version resolves this issue.
2.4 Roll back to the old kernel
2.4.1 View the current default kernel
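The version check in the note above can be scripted. The sketch below is a simplified numeric comparison, not RPM's full version-ordering algorithm: it assumes kernel names of the form shown in this article (numeric fields followed by ".el7..."), drops that suffix, and compares the remaining fields left to right, treating missing fields as 0. The function names numeric_part and kernel_at_least are made up for illustration.

```shell
#!/usr/bin/env bash
# numeric_part VERSION
# Drop an optional "kernel-" prefix and the ".elN..." suffix, then turn
# "-" into "." so that only dot-separated numeric fields remain.
numeric_part() {
    local v="${1#kernel-}"
    v="${v%%.el[0-9]*}"
    printf '%s\n' "${v//-/.}"
}

# kernel_at_least HAVE WANT
# Return 0 if HAVE is the same as or newer than WANT, 1 otherwise.
kernel_at_least() {
    local -a have want
    IFS=. read -r -a have <<< "$(numeric_part "$1")"
    IFS=. read -r -a want <<< "$(numeric_part "$2")"
    local n=$(( ${#have[@]} > ${#want[@]} ? ${#have[@]} : ${#want[@]} ))
    local i h w
    for (( i = 0; i < n; i++ )); do
        h=${have[i]:-0}   # missing fields count as 0
        w=${want[i]:-0}
        if (( 10#$h > 10#$w )); then return 0; fi
        if (( 10#$h < 10#$w )); then return 1; fi
    done
    return 0              # all fields equal
}

# Example call on an EL7 host (not executed here):
#   kernel_at_least "$(uname -r)" "3.10.0-514.16.1.el7" && echo "fixed kernel"
```

With the versions from this article, 3.10.0-514.26.2.el7.x86_64 passes the check and 3.10.0-514.el7.x86_64 does not.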
grub2-editenv list
The output is:
saved_entry=CentOS Linux (3.10.0-514.26.2.el7.x86_64) 7 (Core)
2.4.2 List the available kernels
grep ^menuentry /boot/grub2/grub.cfg
The output is:
menuentry 'CentOS Linux (3.10.0-514.26.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-514.el7.x86_64-advanced-aaff4895-29ee-4c9d-a83b-b3b809f94972' {
menuentry 'CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-514.el7.x86_64-advanced-aaff4895-29ee-4c9d-a83b-b3b809f94972' {
menuentry 'CentOS Linux (0-rescue-510cd1c4afcf450d964b198693353422) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-510cd1c4afcf450d964b198693353422-advanced-aaff4895-29ee-4c9d-a83b-b3b809f94972' {
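The raw menuentry lines are long, and only the quoted title matters when choosing a default. As a sketch, the titles can be extracted with awk by splitting each line on single quotes; the function name list_menu_titles is made up for illustration, and the default path assumes the BIOS layout used in this article.

```shell
#!/usr/bin/env bash
# list_menu_titles [GRUB_CFG]
# Print "index : title" for each top-level menuentry in GRUB_CFG
# (hypothetical helper; defaults to /boot/grub2/grub.cfg).
list_menu_titles() {
    local cfg="${1:-/boot/grub2/grub.cfg}"
    # Split on single quotes: field 2 is the menu entry title.
    awk -F"'" '/^menuentry/ { print n++ " : " $2 }' "$cfg"
}

# Example call (not executed here):
#   list_menu_titles
```

Either the printed title or its index can be passed to grub2-set-default.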
2.4.3 Switch back to the old kernel
grub2-set-default "CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)"
Then confirm the change with the following command:
grub2-editenv list
The output is:
saved_entry=CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)
Finally, reboot the system for the setting to take effect:
reboot
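Because the title passed to grub2-set-default in 2.4.3 is a free-form string, a typo is easy to miss until the next boot. The sketch below checks that the title really exists in grub.cfg before setting it; the function name menu_title_exists is made up for illustration, and the default path is the one used in 2.4.2.

```shell
#!/usr/bin/env bash
# menu_title_exists TITLE [GRUB_CFG]
# Succeed only if GRUB_CFG has a top-level menuentry whose title is
# exactly TITLE (hypothetical helper; defaults to /boot/grub2/grub.cfg).
menu_title_exists() {
    local title="$1" cfg="${2:-/boot/grub2/grub.cfg}"
    awk -F"'" -v want="$title" '
        /^menuentry/ && $2 == want { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$cfg"
}

# Example call (not executed here):
#   title="CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)"
#   menu_title_exists "$title" && grub2-set-default "$title"
```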
References
=================
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e83b79b2d6c78bf1b4aa227938a214dcbddc83f
https://bugs.centos.org/view.php?id=14073
https://bugzilla.redhat.com/show_bug.cgi?id=1396941
https://access.redhat.com/solutions/2779111