Case environment

Operating system: Oracle Linux Server release 5.7, 64-bit (virtual machine)

Hardware: the physical host is a DELL R720

Resources: 8 GB RAM, Intel(R) Xeon(R) CPU E5-2690, 8 cores

Case description

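In the morning I found that a Linux server (a virtual machine) at the Guilin site could not be pinged, so I asked the system administrator there to share his desktop with me over Lync. After logging in through the VMware vSphere Client on his computer, I found that the console did not respond either: I could not log in, and keyboard input had no effect. In other words, the system had hung. With no other option, I restarted the Linux server from the virtual machine's "Power" menu using "Power Off" and then "Power On", and then restarted the Tomcat service and the Oracle database service. I checked the Oracle database alert log and found no errors. After analyzing the Linux system log, my manager found that at about 02:22 on August 1 the system ran out of memory and the Linux kernel, as a protective measure, began killing processes. The log entries are shown below (the server name has been obfuscated):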

Aug  1 01:36:09 G*******LNX01 ntpd[3555]: kernel time sync enabled 4001

Aug  1 01:53:13 G*******LNX01 ntpd[3555]: kernel time sync enabled 0001

Aug  1 02:22:36 G*******LNX01 kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0

Aug  1 02:22:37 G*******LNX01 kernel: hald cpuset=/ mems_allowed=0

Aug  1 02:22:37 G*******LNX01 kernel: Pid: 3408, comm: hald Not tainted 2.6.32-200.13.1.el5uek #1

Aug  1 02:22:37 G*******LNX01 kernel: Call Trace:

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810a0b66>] ? cpuset_print_task_mems_allowed+0x92/0x9e

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810d9fbc>] ? select_bad_process+0xbc/0x102

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810da03f>] __out_of_memory+0x3d/0x86

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810da30f>] out_of_memory+0xfc/0x195

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810dd75e>] __alloc_pages_nodemask+0x487/0x595

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff811075ac>] alloc_page_vma+0xb9/0xc8

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810ff0a7>] read_swap_cache_async+0x52/0xf1

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810ff1a3>] swapin_readahead+0x5d/0x9c

Aug  1 02:22:38 G*******LNX01 kernel:  [<ffffffff810d725a>] ? find_get_page+0x22/0x69

Aug  1 02:22:38 G*******LNX01 kernel:  [<ffffffff810f1ea3>] handle_mm_fault+0x44b/0x80f

Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81043696>] ? should_resched+0xe/0x2f

Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81456006>] do_page_fault+0x210/0x299

Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81453fd5>] page_fault+0x25/0x30

Aug  1 02:22:40 G*******LNX01 kernel: Mem-Info:

Aug  1 02:22:43 G*******LNX01 kernel: Node 0 DMA per-cpu:

Aug  1 02:22:44 G*******LNX01 kernel: CPU    0: hi:    0, btch:   1 usd:   0

Aug  1 02:22:44 G*******LNX01 kernel: CPU    1: hi:    0, btch:   1 usd:   0

Aug  1 08:51:04 G*******LNX01 syslogd 1.4.1: restart.

Aug  1 08:51:04 G*******LNX01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuset

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpu

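Put simply, the OOM Killer is a protection mechanism that keeps a memory shortage from turning into a more serious failure. When memory is critically low, the kernel picks a process and kills it to free memory so that the system can keep running. This protection is limited, however, and cannot guarantee that any particular process keeps running.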
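To locate such events quickly in a large /var/log/messages file, the "invoked oom-killer" lines can be picked out with a regular expression. The following is a minimal sketch, not part of the original investigation. The function name `find_oom_events` and the pattern are illustrative assumptions, and the pattern only targets the line format shown in the excerpt above (2.6.32-era kernel, with `oom_adj`); other kernel versions may log different fields.

```python
import re

# Matches the syslog line written when the OOM killer is invoked, for example:
# "Aug  1 02:22:36 host kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0"
# Assumption: the format is the one seen in this log; other kernels may differ.
OOM_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+"
    r"(?P<process>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+), "
    r"order=(?P<order>\d+), oom_adj=(?P<oom_adj>-?\d+)"
)


def find_oom_events(lines):
    """Return one dict per 'invoked oom-killer' line found in the given lines."""
    events = []
    for line in lines:
        match = OOM_LINE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# Sample input copied from the log excerpt above.
SAMPLE_LOG = [
    "Aug  1 01:53:13 G*******LNX01 ntpd[3555]: kernel time sync enabled 0001",
    "Aug  1 02:22:36 G*******LNX01 kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0",
    "Aug  1 02:22:37 G*******LNX01 kernel: hald cpuset=/ mems_allowed=0",
]

for event in find_oom_events(SAMPLE_LOG):
    print(event["timestamp"], event["process"], event["gfp_mask"])
```

Because `find_oom_events` accepts any iterable of lines, an open file handle for /var/log/messages can be passed in place of the sample list.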

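However, the out-of-memory event occurred shortly after 2 a.m., so I kept reading /var/log/messages and found that the message "Phoenix BIOS detected: BIOS may corrupt low RAM, working around it" appeared during system startup. I then searched Google for this message.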

Aug  1 08:51:04 G*******LNX01 syslogd 1.4.1: restart.

Aug  1 08:51:04 G*******LNX01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuset

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpu

Aug  1 08:51:04 G*******LNX01 kernel: Linux version 2.6.32-200.13.1.el5uek (mockbuild@ca-build9.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Wed Jul 27 21:02:33 EDT 2011

Aug  1 08:51:04 G*******LNX01 kernel: Command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet

Aug  1 08:51:04 G*******LNX01 kernel: KERNEL supported cpus:

Aug  1 08:51:04 G*******LNX01 kernel:   Intel GenuineIntel

Aug  1 08:51:04 G*******LNX01 kernel:   AMD AuthenticAMD

Aug  1 08:51:04 G*******LNX01 kernel:   Centaur CentaurHauls

Aug  1 08:51:04 G*******LNX01 kernel: BIOS-provided physical RAM map:

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000000100000 - 00000000bfee0000 (usable)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bfee0000 - 00000000bfeff000 (ACPI data)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bfeff000 - 00000000bff00000 (ACPI NVS)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bff00000 - 00000000c0000000 (usable)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000100000000 - 0000000240000000 (usable)

Aug  1 08:51:04 G*******LNX01 kernel: DMI present.

Aug  1 08:51:04 G*******LNX01 kernel: Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.

Aug  1 08:51:04 G*******LNX01 kernel: last_pfn = 0x240000 max_arch_pfn = 0x400000000

Aug  1 08:51:04 G*******LNX01 kernel: x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106

Aug  1 08:51:04 G*******LNX01 kernel: total RAM covered: 8192M

Aug  1 08:51:04 G*******LNX01 kernel: Found optimal setting for mtrr clean up

Aug  1 08:51:04 G*******LNX01 kernel:  gran_size: 64K  chunk_size: 64K         num_reg: 4      lose cover RAM: 0G

Aug  1 08:51:04 G*******LNX01 kernel: x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106

Aug  1 08:51:04 G*******LNX01 kernel: last_pfn = 0xc0000 max_arch_pfn = 0x400000000

Aug  1 08:51:04 G*******LNX01 kernel: init_memory_mapping: 0000000000000000-00000000c0000000

Aug  1 08:51:04 G*******LNX01 kernel: init_memory_mapping: 0000000100000000-0000000240000000

Aug  1 08:51:04 G*******LNX01 kernel: RAMDISK: 37c4d000 - 37fef894

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: RSDP 00000000000f6940 00024 (v02 PTLTD )

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: XSDT 00000000bfeefddc 0005C (v01 INTEL  440BX    06040000 VMW  01324272)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: FACP 00000000bfefee98 000F4 (v04 INTEL  440BX    06040000 PTL  000F4240)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: DSDT 00000000bfef0230 0EC68 (v01 PTLTD  Custom   06040000 MSFT 03000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: FACS 00000000bfefffc0 00040

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: BOOT 00000000bfef0208 00028 (v01 PTLTD  $SBFTBL$ 06040000  LTP 00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: APIC 00000000bfef0156 000B2 (v01 PTLTD  ? APIC   06040000  LTP 00000000)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: MCFG 00000000bfef011a 0003C (v01 PTLTD  $PCITBL$ 06040000  LTP 00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: SRAT 00000000bfeefed8 00128 (v02 VMWARE MEMPLUG  06040000 VMW  00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: HPET 00000000bfeefea0 00038 (v01 VMWARE VMW HPET 06040000 VMW  00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: WAET 00000000bfeefe78 00028 (v01 VMWARE VMW WAET 06040000 VMW  00000001)

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 0 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 1 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 2 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 3 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 4 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 5 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 6 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 7 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 0-a0000

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 100000-c0000000

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 100000000-240000000

Aug  1 08:51:04 G*******LNX01 kernel: Bootmem setup node 0 0000000000000000-0000000240000000

Aug  1 08:51:04 G*******LNX01 kernel:   NODE_DATA [000000000001b840 - 000000000003183f]

Aug  1 08:51:04 G*******LNX01 kernel:   bootmap [0000000000032000 -  0000000000079fff] pages 48

Aug  1 08:51:04 G*******LNX01 kernel: (9 early reservations) ==> bootmem [0000000000 - 0240000000]

Aug  1 08:51:04 G*******LNX01 kernel:   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]

Aug  1 08:51:04 G*******LNX01 kernel:   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]

Aug  1 08:51:04 G*******LNX01 kernel:   #2 [0001000000 - 000224b73c]    TEXT DATA BSS ==> [0001000000 - 000224b73c]

Aug  1 08:51:04 G*******LNX01 kernel:   #3 [0037c4d000 - 0037fef894]          RAMDISK ==> [0037c4d000 - 0037fef894]

Aug  1 08:51:04 G*******LNX01 kernel:   #4 [000009f800 - 0000100000]    BIOS reserved ==> [000009f800 - 0000100000]

Aug  1 08:51:04 G*******LNX01 kernel:   #5 [000224c000 - 000224c1e8]              BRK ==> [000224c000 - 000224c1e8]

Aug  1 08:51:04 G*******LNX01 kernel:   #6 [0000010000 - 0000012000]          PGTABLE ==> [0000010000 - 0000012000]

Aug  1 08:51:04 G*******LNX01 kernel:   #7 [0000012000 - 0000017000]          PGTABLE ==> [0000012000 - 0000017000]

Aug  1 08:51:04 G*******LNX01 kernel:   #8 [0000017000 - 000001b840]       MEMNODEMAP ==> [0000017000 - 000001b840]

Aug  1 08:51:04 G*******LNX01 kernel: found SMP MP-table at [ffff8800000f69b0] f69b0

Aug  1 08:51:04 G*******LNX01 kernel: Zone PFN ranges:

Aug  1 08:51:04 G*******LNX01 kernel:   DMA      0x00000010 -> 0x00001000

Aug  1 08:51:04 G*******LNX01 kernel:   DMA32    0x00001000 -> 0x00100000

Aug  1 08:51:04 G*******LNX01 kernel:   Normal   0x00100000 -> 0x00240000

Aug  1 08:51:04 G*******LNX01 kernel: Movable zone start PFN for each node

Aug  1 08:51:04 G*******LNX01 kernel: early_node_map[4] active PFN ranges

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00000010 -> 0x0000009f

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00000100 -> 0x000bfee0

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x000bff00 -> 0x000c0000

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00100000 -> 0x00240000

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: PM-Timer IO Port: 0x1008

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])

Aug  1 08:51:04 G*******LNX01 kernel: IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)

Aug  1 08:51:04 G*******LNX01 kernel: Using ACPI (MADT) for SMP configuration information

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: HPET id: 0x8086af01 base: 0xfed00000

Aug  1 08:51:04 G*******LNX01 kernel: SMP: Allowing 8 CPUs, 0 hotplug CPUs

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 000000000009f000 - 00000000000a0000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000a0000 - 00000000000ca000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000ca000 - 00000000000cc000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000cc000 - 00000000000dc000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000dc000 - 0000000000100000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000bfee0000 - 00000000bfeff000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000bfeff000 - 00000000bff00000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000c0000000 - 00000000e0000000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fee01000 - 00000000fffe0000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fffe0000 - 0000000100000000

Aug  1 08:51:04 G*******LNX01 kernel: Allocating PCI resources starting at c0000000 (gap: c0000000:20000000)

Aug  1 08:51:04 G*******LNX01 kernel: Booting paravirtualized kernel on bare hardware

Aug  1 08:51:04 G*******LNX01 kernel: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:8 nr_node_ids:1

Aug  1 08:51:04 G*******LNX01 kernel: PERCPU: Embedded 29 pages/cpu @ffff880028200000 s88280 r8192 d22312 u262144

Aug  1 08:51:04 G*******LNX01 kernel: pcpu-alloc: s88280 r8192 d22312 u262144 alloc=1*2097152

Aug  1 08:51:04 G*******LNX01 kernel: pcpu-alloc: [0] 0 1 2 3 4 5 6 7 

Aug  1 08:51:04 G*******LNX01 kernel: Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2064641

Aug  1 08:51:04 G*******LNX01 kernel: Policy zone: Normal

Aug  1 08:51:04 G*******LNX01 kernel: Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet

Aug  1 08:51:04 G*******LNX01 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#0

Aug  1 08:51:04 G*******LNX01 kernel: xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340

Aug  1 08:51:04 G*******LNX01 kernel: Checking aperture...

Aug  1 08:51:04 G*******LNX01 kernel: No AGP bridge found

Aug  1 08:51:04 G*******LNX01 kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

Aug  1 08:51:04 G*******LNX01 kernel: Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000

Aug  1 08:51:04 G*******LNX01 kernel: software IO TLB at phys 0x20000000 - 0x24000000

Aug  1 08:51:04 G*******LNX01 kernel: Memory: 8183576k/9437184k available (4454k kernel code, 1049156k absent, 204452k reserved, 7191k data, 1720k init)

Aug  1 08:51:04 G*******LNX01 kernel: Hierarchical RCU implementation.

Aug  1 08:51:04 G*******LNX01 kernel: NR_IRQS:4352 nr_irqs:472

Aug  1 08:51:04 G*******LNX01 kernel: Extended CMOS year: 2000

Aug  1 08:51:04 G*******LNX01 kernel: Console: colour VGA+ 80x25

Aug  1 08:51:04 G*******LNX01 kernel: console [tty0] enabled

Aug  1 08:51:04 G*******LNX01 kernel: allocated 83886080 bytes of page_cgroup

Aug  1 08:51:04 G*******LNX01 kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups

Aug  1 08:51:04 G*******LNX01 kernel: TSC freq read from hypervisor : 2900.001 MHz

Aug  1 08:51:04 G*******LNX01 kernel: Detected 2900.001 MHz processor.

Aug  1 08:51:04 G*******LNX01 kernel: Calibrating delay loop (skipped) preset value.. 5800.00 BogoMIPS (lpj=2900001)

Aug  1 08:51:04 G*******LNX01 kernel: Security Framework initialized

Aug  1 08:51:04 G*******LNX01 kernel: SELinux:  Initializing.

Aug  1 08:51:04 G*******LNX01 kernel: Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)

Aug  1 08:51:04 G*******LNX01 kernel: Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)

Aug  1 08:51:04 G*******LNX01 kernel: Mount-cache hash table entries: 256

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys ns

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuacct

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys memory

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys devices

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys freezer

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys net_cls

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 0/0x0 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: Performance Events: Nehalem/Corei7 events, Intel PMU driver.

Aug  1 08:51:04 G*******LNX01 kernel: ... version:                3

Aug  1 08:51:04 G*******LNX01 kernel: ... bit width:              48

Aug  1 08:51:04 G*******LNX01 kernel: ... generic registers:      4

Aug  1 08:51:04 G*******LNX01 kernel: ... value mask:             0000ffffffffffff

Aug  1 08:51:04 G*******LNX01 kernel: ... max period:             000000007fffffff

Aug  1 08:51:04 G*******LNX01 kernel: ... fixed-purpose events:   3

Aug  1 08:51:04 G*******LNX01 kernel: ... event mask:             000000070000000f

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: Core revision 20090903

Aug  1 08:51:04 G*******LNX01 kernel: ftrace: converting mcount calls to 0f 1f 44 00 00

Aug  1 08:51:04 G*******LNX01 kernel: ftrace: allocating 26665 entries in 105 pages

Aug  1 08:51:04 G*******LNX01 kernel: Setting APIC routing to flat

Aug  1 08:51:04 G*******LNX01 kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1

Aug  1 08:51:04 G*******LNX01 kernel: CPU0: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 1 APIC 0x1 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 1/0x1 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU1: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Skipping synchronization checks as TSC is reliable.

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 2 APIC 0x2 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#2

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 2/0x2 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU2: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 3 APIC 0x3 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#3

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 3/0x3 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU3: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 4 APIC 0x4 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#4

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 2

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 4/0x4 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU4: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07


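I read through a lot of material, and while I was still digging, the server became unreachable again: it had gone down a second time. I had no access rights of my own, and the Lync shared desktop was painfully slow, so I asked our local system administrator (who has access to the virtual machine hosts at that site) for help. When he connected with the vSphere Client and checked, he finally found a cause that was both frustrating and startling.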

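The physical DELL R720 host has only 32 GB of memory and runs one Linux system and several Windows systems. Because the administrator at that site had added a test server, the memory allocated to all the systems added up to 43 GB, which exceeded the 32 GB of physical memory on the host. The virtual machines ended up competing for memory, memory ran short, and eventually the Linux system went down.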
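The arithmetic behind this finding is simple enough to check routinely. The sketch below compares the memory configured for all virtual machines with the host's physical RAM. The function name `overcommit_report` and the VM labels are hypothetical; only the 32 GB host, the 8 GB Linux VM, and the 43 GB total come from this case. Hypervisors such as VMware can deliberately run with more configured than physical memory, so a positive excess is a warning sign to investigate rather than proof of a fault.

```python
def overcommit_report(host_ram_gb, vm_ram_gb):
    """Compare the memory configured for all VMs with the host's physical RAM."""
    allocated = sum(vm_ram_gb.values())
    return {
        "host_gb": host_ram_gb,
        "allocated_gb": allocated,
        "excess_gb": max(0, allocated - host_ram_gb),
        "overcommitted": allocated > host_ram_gb,
    }


# Figures from this case: 32 GB host, 8 GB Linux VM, 43 GB allocated in total.
# The 35 GB entry is just the remainder (43 - 8); the real per-VM split is unknown.
report = overcommit_report(32, {"linux-vm": 8, "all-other-vms": 35})
print(report)
```

With the figures from this case, the configured memory exceeds the physical memory by 11 GB.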

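There were two reasons this happened. First, the administrator was careless and did not check how much memory had actually been allocated. Second, I did not have enough information: because of permission restrictions, I knew nothing about the physical host and its virtual machines at that site. A system that had been running fine suddenly failed, so I confined my search to the database, application, and operating system layers instead of stepping back and looking at the architecture and resource level. As a result, I kept missing the root cause.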

Solution

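We shut down the test server to free up enough memory, and the problem was resolved. The system has been running since August 1 without the problem recurring.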

