android_kernel_google_msm/arch/x86
Feng Tang b9909d5051 x86/reboot: Fix a warning message triggered by stop_other_cpus()
commit 55c844a4dd upstream.

When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
always see this warning msg:

Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:125
native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
Modules linked in: igb [last unloaded: scsi_wait_scan]
Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
Call Trace:
 <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
 [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
 [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
 [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
 [<ffffffff81036768>] update_process_times+0x60/0x70
 [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
 [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
 [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
 [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
 [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
 [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
 <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
 [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
 [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
 [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
 [<ffffffff81018926>] machine_shutdown+0xa/0xc
 [<ffffffff8101897e>] native_machine_restart+0x20/0x32
 [<ffffffff810189b0>] machine_restart+0xa/0xc
 [<ffffffff8103b196>] kernel_restart+0x47/0x4c
 [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
 [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
 [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
 [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
 [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
 [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
---[ end trace 320af5cb1cb60c5b ]---

The root cause seems to be the
default_send_IPI_mask_allbutself_phys() takes quite some time (I
measured it could be several ms) to complete sending NMIs to all
the other 23 CPUs, and for HZ=250/1000 system, the time is long
enough for a timer interrupt to happen, which will in turn
trigger to kick load balance to a stopped CPU and cause this
warning in native_smp_send_reschedule().

So disabling the local irq before stop_other_cpu() can fix this
problem (tested 25 times reboot ok), and it is fine as there
should be nobody caring the timer interrupt in such reboot
stage.

The latest 3.4 kernel slightly changes this behavior by sending
REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR
fails, and this patch is still needed to prevent the problem.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Vinson Lee <vlee@twopensource.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19 11:40:31 +08:00
..
boot x86, build: Pass in additional -mno-mmx, -mno-sse options 2014-06-07 16:02:08 -07:00
configs x86/kconfig: Remove CONFIG_TR=y from the defconfigs 2012-03-24 08:18:03 +01:00
crypto crypto: aesni - fix memory usage in GCM decryption 2015-06-19 11:40:28 +08:00
ia32 x86-64: Replace left over sti/cli in ia32 audit exit code 2013-02-11 08:47:18 -08:00
include/asm x86, cpu, amd: Add workaround for family 16h, erratum 793 2015-04-14 17:34:03 +08:00
kernel x86/reboot: Fix a warning message triggered by stop_other_cpus() 2015-06-19 11:40:31 +08:00
kvm KVM: emulate: fix CMPXCHG8B on 32-bit hosts 2015-06-19 11:40:19 +08:00
lguest x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal 2013-04-16 21:27:27 -07:00
lib x86-64: Fix the failure case in copy_user_handle_tail() 2013-03-28 12:12:26 -07:00
math-emu x86: Rename trap_no to trap_nr in thread_struct 2012-03-13 06:24:09 +01:00
mm x86, mm/ASLR: Fix stack randomization on 64-bit systems 2015-04-14 17:33:58 +08:00
net x86: bpf_jit: support negative offsets 2014-03-30 21:40:30 -07:00
oprofile oprofile, x86: Fix wrapping bug in op_x86_get_ctrl() 2012-10-28 10:14:13 -07:00
pci xen/pci: We don't do multiple MSI's. 2013-03-14 11:29:41 -07:00
platform x86/efi: Fix dummy variable buffer allocation 2014-06-07 16:02:10 -07:00
power perf,x86: fix kernel crash with PEBS/BTS after suspend/resume 2013-03-20 13:04:59 -07:00
syscalls x86, x32: Use compat shims for io_{setup,submit} 2014-06-30 20:01:33 -07:00
tools x86, relocs: Add jiffies and jiffies_64 to the relative whitelist 2012-06-01 15:18:26 +08:00
um x86, um: actually mark system call tables readonly 2015-04-14 17:33:49 +08:00
vdso x86/vdso: Fix the build on GCC5 2015-06-19 11:40:25 +08:00
video
xen xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline 2014-03-11 16:10:06 -07:00
.gitignore
Kbuild
Kconfig x86, espfix: Make it possible to disable 16-bit support 2014-08-07 12:00:11 -07:00
Kconfig.cpu x86: Tighten dependencies of CPU_SUP_*_32 2012-03-08 10:57:34 +01:00
Kconfig.debug
Makefile kbuild: Fix gcc -x syntax 2012-10-13 05:38:37 +09:00
Makefile.um um: fix linker script generation 2012-04-09 13:59:00 -04:00
Makefile_32.cpu