Case environment

Operating system: Oracle Linux Server release 5.7, 64-bit (virtual machine)

Hardware: the physical host is a DELL R720

Resources: 8 GB RAM, Intel(R) Xeon(R) CPU E5-2690, 8 cores

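Case description

In the morning I found that a Linux server (a virtual machine) in Guilin could not be pinged, so I contacted the system administrator there, who shared his desktop with me over Lync. After logging in through VMware vSphere Client on his computer, I found that the console was unresponsive as well: I could not log in, and keyboard input had no effect. In other words, the system had hung. With no other option, I power-cycled the Linux server using "Power Off" and "Power On" under the virtual machine's "Power" menu, then restarted the Tomcat service and the Oracle database service. I checked the Oracle database alert log and found no errors. After analyzing the Linux system log, my manager found that at about 02:22 in the early morning of August 1 the system ran out of memory and the Linux kernel, as a protective measure, started killing processes it considered expendable. The relevant log entries are shown below (the server name has been obfuscated).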

Aug  1 01:36:09 G*******LNX01 ntpd[3555]: kernel time sync enabled 4001

Aug  1 01:53:13 G*******LNX01 ntpd[3555]: kernel time sync enabled 0001

Aug  1 02:22:36 G*******LNX01 kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0

Aug  1 02:22:37 G*******LNX01 kernel: hald cpuset=/ mems_allowed=0

Aug  1 02:22:37 G*******LNX01 kernel: Pid: 3408, comm: hald Not tainted 2.6.32-200.13.1.el5uek #1

Aug  1 02:22:37 G*******LNX01 kernel: Call Trace:

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810a0b66>] ? cpuset_print_task_mems_allowed+0x92/0x9e

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810d9fbc>] ? select_bad_process+0xbc/0x102

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810da03f>] __out_of_memory+0x3d/0x86

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810da30f>] out_of_memory+0xfc/0x195

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810dd75e>] __alloc_pages_nodemask+0x487/0x595

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff811075ac>] alloc_page_vma+0xb9/0xc8

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810ff0a7>] read_swap_cache_async+0x52/0xf1

Aug  1 02:22:37 G*******LNX01 kernel:  [<ffffffff810ff1a3>] swapin_readahead+0x5d/0x9c

Aug  1 02:22:38 G*******LNX01 kernel:  [<ffffffff810d725a>] ? find_get_page+0x22/0x69

Aug  1 02:22:38 G*******LNX01 kernel:  [<ffffffff810f1ea3>] handle_mm_fault+0x44b/0x80f

Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81043696>] ? should_resched+0xe/0x2f

Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81456006>] do_page_fault+0x210/0x299

Aug  1 02:22:40 G*******LNX01 kernel:  [<ffffffff81453fd5>] page_fault+0x25/0x30

Aug  1 02:22:40 G*******LNX01 kernel: Mem-Info:

Aug  1 02:22:43 G*******LNX01 kernel: Node 0 DMA per-cpu:

Aug  1 02:22:44 G*******LNX01 kernel: CPU    0: hi:    0, btch:   1 usd:   0

Aug  1 02:22:44 G*******LNX01 kernel: CPU    1: hi:    0, btch:   1 usd:   0

Aug  1 08:51:04 G*******LNX01 syslogd 1.4.1: restart.

Aug  1 08:51:04 G*******LNX01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuset

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpu

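The OOM Killer is, put simply, a protection mechanism that keeps the system from failing outright when memory runs out. When memory is critically low, the kernel selects a process and kills it to free memory, which relieves the shortage and lets the system keep running. The victim is chosen by a heuristic score rather than by how much the process matters to you, so this protection is limited and cannot guarantee that any particular process keeps running.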
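To locate such events quickly in a large /var/log/messages file, a small script can help. The following is a minimal sketch, not a complete log analyzer: it assumes the syslog line format seen in the excerpt above, and the names `OOM_RE` and `find_oom_events` are hypothetical helpers introduced only for illustration. Note that the process named on the "invoked oom-killer" line is the one whose allocation triggered the OOM Killer, which is not necessarily the process that gets killed.

```python
import re

# Matches syslog lines of the form seen in the excerpt above, e.g.:
#   Aug  1 02:22:36 HOST kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
# The pattern is an assumption based on that excerpt; other kernels may log differently.
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<comm>\S+) invoked oom-killer: (?P<detail>.*)$"
)


def find_oom_events(lines):
    """Return one dict per 'invoked oom-killer' line found in `lines`."""
    events = []
    for line in lines:
        match = OOM_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# Sample lines copied from the log excerpt in this case.
sample = [
    "Aug  1 01:53:13 G*******LNX01 ntpd[3555]: kernel time sync enabled 0001",
    "Aug  1 02:22:36 G*******LNX01 kernel: hald invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0",
    "Aug  1 02:22:37 G*******LNX01 kernel: hald cpuset=/ mems_allowed=0",
]

for event in find_oom_events(sample):
    # Print when the OOM Killer was invoked and which process triggered it.
    print(event["ts"], event["comm"], event["detail"])
```

On a real server the same function could be fed an open file object, for example `find_oom_events(open("/var/log/messages"))`, since it only iterates over lines.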

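However, this happened shortly after 2 a.m., so I kept going through /var/log/messages and found that the following message appeared at system startup:

"Phoenix BIOS detected: BIOS may corrupt low RAM, working around it". So I searched Google for this message.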

Aug  1 08:51:04 G*******LNX01 syslogd 1.4.1: restart.

Aug  1 08:51:04 G*******LNX01 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuset

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpu

Aug  1 08:51:04 G*******LNX01 kernel: Linux version 2.6.32-200.13.1.el5uek (mockbuild@ca-build9.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Wed Jul 27 21:02:33 EDT 2011

Aug  1 08:51:04 G*******LNX01 kernel: Command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet

Aug  1 08:51:04 G*******LNX01 kernel: KERNEL supported cpus:

Aug  1 08:51:04 G*******LNX01 kernel:   Intel GenuineIntel

Aug  1 08:51:04 G*******LNX01 kernel:   AMD AuthenticAMD

Aug  1 08:51:04 G*******LNX01 kernel:   Centaur CentaurHauls

Aug  1 08:51:04 G*******LNX01 kernel: BIOS-provided physical RAM map:

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000000100000 - 00000000bfee0000 (usable)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bfee0000 - 00000000bfeff000 (ACPI data)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bfeff000 - 00000000bff00000 (ACPI NVS)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000bff00000 - 00000000c0000000 (usable)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)

Aug  1 08:51:04 G*******LNX01 kernel:  BIOS-e820: 0000000100000000 - 0000000240000000 (usable)

Aug  1 08:51:04 G*******LNX01 kernel: DMI present.

Aug  1 08:51:04 G*******LNX01 kernel: Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.

Aug  1 08:51:04 G*******LNX01 kernel: last_pfn = 0x240000 max_arch_pfn = 0x400000000

Aug  1 08:51:04 G*******LNX01 kernel: x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106

Aug  1 08:51:04 G*******LNX01 kernel: total RAM covered: 8192M

Aug  1 08:51:04 G*******LNX01 kernel: Found optimal setting for mtrr clean up

Aug  1 08:51:04 G*******LNX01 kernel:  gran_size: 64K  chunk_size: 64K         num_reg: 4      lose cover RAM: 0G

Aug  1 08:51:04 G*******LNX01 kernel: x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106

Aug  1 08:51:04 G*******LNX01 kernel: last_pfn = 0xc0000 max_arch_pfn = 0x400000000

Aug  1 08:51:04 G*******LNX01 kernel: init_memory_mapping: 0000000000000000-00000000c0000000

Aug  1 08:51:04 G*******LNX01 kernel: init_memory_mapping: 0000000100000000-0000000240000000

Aug  1 08:51:04 G*******LNX01 kernel: RAMDISK: 37c4d000 - 37fef894

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: RSDP 00000000000f6940 00024 (v02 PTLTD )

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: XSDT 00000000bfeefddc 0005C (v01 INTEL  440BX    06040000 VMW  01324272)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: FACP 00000000bfefee98 000F4 (v04 INTEL  440BX    06040000 PTL  000F4240)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: DSDT 00000000bfef0230 0EC68 (v01 PTLTD  Custom   06040000 MSFT 03000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: FACS 00000000bfefffc0 00040

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: BOOT 00000000bfef0208 00028 (v01 PTLTD  $SBFTBL$ 06040000  LTP 00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: APIC 00000000bfef0156 000B2 (v01 PTLTD  ? APIC   06040000  LTP 00000000)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: MCFG 00000000bfef011a 0003C (v01 PTLTD  $PCITBL$ 06040000  LTP 00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: SRAT 00000000bfeefed8 00128 (v02 VMWARE MEMPLUG  06040000 VMW  00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: HPET 00000000bfeefea0 00038 (v01 VMWARE VMW HPET 06040000 VMW  00000001)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: WAET 00000000bfeefe78 00028 (v01 VMWARE VMW WAET 06040000 VMW  00000001)

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 0 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 1 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 2 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 3 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 4 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 5 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 6 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: PXM 0 -> APIC 7 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 0-a0000

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 100000-c0000000

Aug  1 08:51:04 G*******LNX01 kernel: SRAT: Node 0 PXM 0 100000000-240000000

Aug  1 08:51:04 G*******LNX01 kernel: Bootmem setup node 0 0000000000000000-0000000240000000

Aug  1 08:51:04 G*******LNX01 kernel:   NODE_DATA [000000000001b840 - 000000000003183f]

Aug  1 08:51:04 G*******LNX01 kernel:   bootmap [0000000000032000 -  0000000000079fff] pages 48

Aug  1 08:51:04 G*******LNX01 kernel: (9 early reservations) ==> bootmem [0000000000 - 0240000000]

Aug  1 08:51:04 G*******LNX01 kernel:   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]

Aug  1 08:51:04 G*******LNX01 kernel:   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]

Aug  1 08:51:04 G*******LNX01 kernel:   #2 [0001000000 - 000224b73c]    TEXT DATA BSS ==> [0001000000 - 000224b73c]

Aug  1 08:51:04 G*******LNX01 kernel:   #3 [0037c4d000 - 0037fef894]          RAMDISK ==> [0037c4d000 - 0037fef894]

Aug  1 08:51:04 G*******LNX01 kernel:   #4 [000009f800 - 0000100000]    BIOS reserved ==> [000009f800 - 0000100000]

Aug  1 08:51:04 G*******LNX01 kernel:   #5 [000224c000 - 000224c1e8]              BRK ==> [000224c000 - 000224c1e8]

Aug  1 08:51:04 G*******LNX01 kernel:   #6 [0000010000 - 0000012000]          PGTABLE ==> [0000010000 - 0000012000]

Aug  1 08:51:04 G*******LNX01 kernel:   #7 [0000012000 - 0000017000]          PGTABLE ==> [0000012000 - 0000017000]

Aug  1 08:51:04 G*******LNX01 kernel:   #8 [0000017000 - 000001b840]       MEMNODEMAP ==> [0000017000 - 000001b840]

Aug  1 08:51:04 G*******LNX01 kernel: found SMP MP-table at [ffff8800000f69b0] f69b0

Aug  1 08:51:04 G*******LNX01 kernel: Zone PFN ranges:

Aug  1 08:51:04 G*******LNX01 kernel:   DMA      0x00000010 -> 0x00001000

Aug  1 08:51:04 G*******LNX01 kernel:   DMA32    0x00001000 -> 0x00100000

Aug  1 08:51:04 G*******LNX01 kernel:   Normal   0x00100000 -> 0x00240000

Aug  1 08:51:04 G*******LNX01 kernel: Movable zone start PFN for each node

Aug  1 08:51:04 G*******LNX01 kernel: early_node_map[4] active PFN ranges

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00000010 -> 0x0000009f

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00000100 -> 0x000bfee0

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x000bff00 -> 0x000c0000

Aug  1 08:51:04 G*******LNX01 kernel:     0: 0x00100000 -> 0x00240000

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: PM-Timer IO Port: 0x1008

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])

Aug  1 08:51:04 G*******LNX01 kernel: IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)

Aug  1 08:51:04 G*******LNX01 kernel: Using ACPI (MADT) for SMP configuration information

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: HPET id: 0x8086af01 base: 0xfed00000

Aug  1 08:51:04 G*******LNX01 kernel: SMP: Allowing 8 CPUs, 0 hotplug CPUs

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 000000000009f000 - 00000000000a0000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000a0000 - 00000000000ca000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000ca000 - 00000000000cc000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000cc000 - 00000000000dc000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000000dc000 - 0000000000100000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000bfee0000 - 00000000bfeff000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000bfeff000 - 00000000bff00000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000c0000000 - 00000000e0000000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fee01000 - 00000000fffe0000

Aug  1 08:51:04 G*******LNX01 kernel: PM: Registered nosave memory: 00000000fffe0000 - 0000000100000000

Aug  1 08:51:04 G*******LNX01 kernel: Allocating PCI resources starting at c0000000 (gap: c0000000:20000000)

Aug  1 08:51:04 G*******LNX01 kernel: Booting paravirtualized kernel on bare hardware

Aug  1 08:51:04 G*******LNX01 kernel: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:8 nr_node_ids:1

Aug  1 08:51:04 G*******LNX01 kernel: PERCPU: Embedded 29 pages/cpu @ffff880028200000 s88280 r8192 d22312 u262144

Aug  1 08:51:04 G*******LNX01 kernel: pcpu-alloc: s88280 r8192 d22312 u262144 alloc=1*2097152

Aug  1 08:51:04 G*******LNX01 kernel: pcpu-alloc: [0] 0 1 2 3 4 5 6 7 

Aug  1 08:51:04 G*******LNX01 kernel: Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2064641

Aug  1 08:51:04 G*******LNX01 kernel: Policy zone: Normal

Aug  1 08:51:04 G*******LNX01 kernel: Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet

Aug  1 08:51:04 G*******LNX01 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#0

Aug  1 08:51:04 G*******LNX01 kernel: xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340

Aug  1 08:51:04 G*******LNX01 kernel: Checking aperture...

Aug  1 08:51:04 G*******LNX01 kernel: No AGP bridge found

Aug  1 08:51:04 G*******LNX01 kernel: PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

Aug  1 08:51:04 G*******LNX01 kernel: Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000

Aug  1 08:51:04 G*******LNX01 kernel: software IO TLB at phys 0x20000000 - 0x24000000

Aug  1 08:51:04 G*******LNX01 kernel: Memory: 8183576k/9437184k available (4454k kernel code, 1049156k absent, 204452k reserved, 7191k data, 1720k init)

Aug  1 08:51:04 G*******LNX01 kernel: Hierarchical RCU implementation.

Aug  1 08:51:04 G*******LNX01 kernel: NR_IRQS:4352 nr_irqs:472

Aug  1 08:51:04 G*******LNX01 kernel: Extended CMOS year: 2000

Aug  1 08:51:04 G*******LNX01 kernel: Console: colour VGA+ 80x25

Aug  1 08:51:04 G*******LNX01 kernel: console [tty0] enabled

Aug  1 08:51:04 G*******LNX01 kernel: allocated 83886080 bytes of page_cgroup

Aug  1 08:51:04 G*******LNX01 kernel: please try 'cgroup_disable=memory' option if you don't want memory cgroups

Aug  1 08:51:04 G*******LNX01 kernel: TSC freq read from hypervisor : 2900.001 MHz

Aug  1 08:51:04 G*******LNX01 kernel: Detected 2900.001 MHz processor.

Aug  1 08:51:04 G*******LNX01 kernel: Calibrating delay loop (skipped) preset value.. 5800.00 BogoMIPS (lpj=2900001)

Aug  1 08:51:04 G*******LNX01 kernel: Security Framework initialized

Aug  1 08:51:04 G*******LNX01 kernel: SELinux:  Initializing.

Aug  1 08:51:04 G*******LNX01 kernel: Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)

Aug  1 08:51:04 G*******LNX01 kernel: Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)

Aug  1 08:51:04 G*******LNX01 kernel: Mount-cache hash table entries: 256

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys ns

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys cpuacct

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys memory

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys devices

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys freezer

Aug  1 08:51:04 G*******LNX01 kernel: Initializing cgroup subsys net_cls

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 0/0x0 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: Performance Events: Nehalem/Corei7 events, Intel PMU driver.

Aug  1 08:51:04 G*******LNX01 kernel: ... version:                3

Aug  1 08:51:04 G*******LNX01 kernel: ... bit width:              48

Aug  1 08:51:04 G*******LNX01 kernel: ... generic registers:      4

Aug  1 08:51:04 G*******LNX01 kernel: ... value mask:             0000ffffffffffff

Aug  1 08:51:04 G*******LNX01 kernel: ... max period:             000000007fffffff

Aug  1 08:51:04 G*******LNX01 kernel: ... fixed-purpose events:   3

Aug  1 08:51:04 G*******LNX01 kernel: ... event mask:             000000070000000f

Aug  1 08:51:04 G*******LNX01 kernel: ACPI: Core revision 20090903

Aug  1 08:51:04 G*******LNX01 kernel: ftrace: converting mcount calls to 0f 1f 44 00 00

Aug  1 08:51:04 G*******LNX01 kernel: ftrace: allocating 26665 entries in 105 pages

Aug  1 08:51:04 G*******LNX01 kernel: Setting APIC routing to flat

Aug  1 08:51:04 G*******LNX01 kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1

Aug  1 08:51:04 G*******LNX01 kernel: CPU0: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 1 APIC 0x1 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 1/0x1 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU1: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Skipping synchronization checks as TSC is reliable.

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 2 APIC 0x2 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#2

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 2/0x2 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU2: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 3 APIC 0x3 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#3

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 1

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 3/0x3 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU3: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

Aug  1 08:51:04 G*******LNX01 kernel: Booting processor 4 APIC 0x4 ip 0x6000

Aug  1 08:51:04 G*******LNX01 kernel: Initializing CPU#4

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Physical Processor ID: 2

Aug  1 08:51:04 G*******LNX01 kernel: CPU: Processor Core ID: 0

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L1 I cache: 32K, L1 D cache: 32K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L2 cache: 256K

Aug  1 08:51:04 G*******LNX01 kernel: CPU: L3 cache: 20480K

Aug  1 08:51:04 G*******LNX01 kernel: CPU 4/0x4 -> Node 0

Aug  1 08:51:04 G*******LNX01 kernel: mce: CPU supports 0 MCE banks

Aug  1 08:51:04 G*******LNX01 kernel: CPU4: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz stepping 07

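I searched through a lot of material, and while I was still digging, the server became unreachable again: it had hung a second time. Since I had no permissions on that environment and the Lync desktop sharing was painfully slow, I asked the system administrator on our side (who has permissions on the virtual machine servers there) for help. When he connected with vSphere Client and took a look, he finally found a cause that was both frustrating and startling.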

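The physical host, a DELL R720, has only 32 GB of memory and runs one Linux system and several Windows systems. Because the administrator there had added test servers, the memory allocated to all systems totaled 43 GB, exceeding the 32 GB of physical memory on the host. This caused the systems to compete for resources and run short of memory, and in the end the Linux system simply hung.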
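The arithmetic behind this finding is easy to automate as a sanity check before adding a virtual machine to a host. The sketch below is illustrative only: the function `overcommit_report` and the per-VM names and sizes are hypothetical, chosen so that they add up to the 43 GB total reported in this case. Only the 32 GB host size, the 43 GB total, and the 8 GB of the Linux guest come from the case itself.

```python
def overcommit_report(host_ram_gb, vm_allocations_gb):
    """Compare the memory configured for all VMs with the host's physical RAM."""
    allocated = sum(vm_allocations_gb.values())
    return {
        "allocated_gb": allocated,
        # How far the configured memory exceeds physical RAM (0 if it fits).
        "excess_gb": max(0, allocated - host_ram_gb),
        # Values above 1.0 mean more memory is configured than physically exists.
        "ratio": allocated / host_ram_gb,
    }


# Hypothetical per-VM split: the names and individual sizes are assumptions
# (apart from the 8 GB Linux guest); together they match the 43 GB total.
vms = {
    "linux-db": 8,
    "win-1": 8,
    "win-2": 8,
    "win-3": 8,
    "test-1": 6,
    "test-2": 5,
}

report = overcommit_report(32, vms)
print(f"allocated {report['allocated_gb']} GB on a 32 GB host, "
      f"excess {report['excess_gb']} GB, ratio {report['ratio']:.2f}")
```

A check like this counts only configured memory. It says nothing about how much each guest actually uses or about hypervisor overhead, so it is best read as an early warning rather than a precise capacity model.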

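There were two reasons this happened. First, the administrator was careless and did not keep track of how memory was actually allocated. Second, I was working with too little information: because of permission restrictions, I knew nothing about the physical host and the virtual machines there. When a system that had been running fine suddenly hit this problem, I confined my search to the database, application, and operating-system layers instead of stepping back and looking at the architecture and resource level. As a result, I kept missing the root cause.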

Solution

Shut down the test servers to free up enough memory. That resolved the problem, and it has not recurred since August 1.

References:

http://dbanotes.net/database/linux_outofmemory_oom_killer.html

http://www.huomo.cn/os/article-16bb4.html
