android_kernel_google_msm/kernel
Ben Greear b674b0adae Fix lockup related to stop_machine being stuck in __do_softirq.
commit 34376a50fb upstream.

The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.

To work around this, re-add the max_restart counter in __do_softirq and
stop processing softirqs after 10 restarts.
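
As a rough illustration of why the restart cap matters, here is a minimal
userspace sketch, not the kernel code. The struct, the function name
run_passes, TIME_BUDGET_TICKS and SIM_SAFETY_LIMIT are all hypothetical;
only the cap of 10 restarts comes from the text above. It models a handler
that keeps re-raising work while the tick counter (a stand-in for jiffies)
may be frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RESTART        10    /* restart cap named in the commit message */
#define TIME_BUDGET_TICKS  2     /* hypothetical stand-in for the jiffies limit */
#define SIM_SAFETY_LIMIT   1000  /* hypothetical; only lets the simulation end */

/* Hypothetical simulation state, not a kernel data structure. */
struct sim {
    unsigned long ticks;  /* stand-in for jiffies */
    bool ticks_advance;   /* false models the lockup: the clock is frozen */
    bool reraise;         /* true: every pass leaves more work pending */
};

/*
 * Run processing passes until one of the bail-out conditions fires.
 * Returns the number of passes executed. A return value equal to
 * SIM_SAFETY_LIMIT means no real bail-out fired, i.e. the loop would
 * have spun forever outside the simulation.
 */
int run_passes(struct sim *s, bool use_restart_cap)
{
    assert(s != NULL);

    unsigned long end = s->ticks + TIME_BUDGET_TICKS;
    int max_restart = MAX_RESTART;
    int passes = 0;

    while (passes < SIM_SAFETY_LIMIT) {
        passes++;                      /* one pass over the pending work */

        if (s->ticks_advance)
            s->ticks++;                /* time moves only if the clock is alive */

        if (!s->reraise)
            break;                     /* nothing pending any more: done */

        if (s->ticks >= end)
            break;                     /* time-based bail-out */

        if (use_restart_cap && --max_restart == 0)
            break;                     /* counter-based bail-out */
    }
    return passes;
}
```

With a frozen clock and a handler that keeps re-raising work, the
time-based check alone never fires and the sketch runs into its safety
limit, while the counter ends the loop after 10 passes regardless of
whether time advances.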

Thanks to Tejun Heo, Rusty Russell, and others for helping me track
this down.

The lockup was introduced in 3.9 by commit c10d73671a ("softirq: reduce
latencies").

It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.

The hang stack traces look something like this:

    ------------[ cut here ]------------
    WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
    Watchdog detected hard LOCKUP on cpu 2
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
    Call Trace:
     <NMI>   warn_slowpath_common+0x85/0x9f
      warn_slowpath_fmt+0x46/0x48
      watchdog_overflow_callback+0x9c/0xa7
      __perf_event_overflow+0x137/0x1cb
      perf_event_overflow+0x14/0x16
      intel_pmu_handle_irq+0x2dc/0x359
      perf_event_nmi_handler+0x19/0x1b
      nmi_handle+0x7f/0xc2
      do_nmi+0xbc/0x304
      end_repeat_nmi+0x1e/0x2e
     <<EOE>>
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0
    ---[ end trace 4947dfa9b0a4cec3 ]---
    BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    irq event stamp: 835637905
    hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
    hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
    softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
    softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
    CPU 1
    Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: tasklet_hi_action+0xf0/0xf0
    Process migration/1
    Call Trace:
     <IRQ>
      __do_softirq+0x117/0x257
      irq_exit+0x5f/0xbb
      smp_apic_timer_interrupt+0x8a/0x98
      apic_timer_interrupt+0x72/0x80
     <EOI>
      printk+0x4d/0x4f
      stop_machine_cpu_stop+0x22c/0x274
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0

Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Pekka Riikonen <priikone@iki.fi>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[xr: Backported to 3.4: Adjust context]
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19 11:40:32 +08:00
debug kdb: fix incorrect counts in KDB summary command output 2015-06-19 11:40:17 +08:00
events perf: Fix irq_work 'tail' recursion 2015-06-19 11:40:28 +08:00
gcov
irq genirq: Prevent proc race against freeing of irq descriptors 2015-04-14 17:33:46 +08:00
power PM / Sleep: fix recovery during resuming from hibernation 2015-02-02 17:05:05 +08:00
sched sched: Fix RLIMIT_RTTIME when PI-boosting to RT 2015-06-19 11:40:29 +08:00
time ntp: Fixup adjtimex freq validation on 32-bit systems 2015-04-14 17:34:04 +08:00
trace ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled 2015-06-19 11:40:23 +08:00
.gitignore
acct.c
async.c Fix a dead loop in async_synchronize_full() 2012-10-02 10:30:35 -07:00
audit.c audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE 2014-04-14 06:44:15 -07:00
audit.h
audit_tree.c audit: keep inode pinned 2015-02-02 17:05:19 +08:00
audit_watch.c
auditfilter.c
auditsc.c auditsc: audit_krule mask accesses need bounds checking 2014-06-16 13:45:46 -07:00
backtracetest.c
bounds.c
capability.c
cgroup.c move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
cgroup_freezer.c cgroup: cgroup_subsys->fork() should be called after the task is added to css_set 2014-03-11 16:10:03 -07:00
compat.c compat: Fix RT signal mask corruption via sigprocmask 2012-05-10 08:58:33 -07:00
configs.c
cpu.c sched: Fix hotplug vs. set_cpus_allowed_ptr() 2014-06-11 12:04:11 -07:00
cpu_pm.c
cpuset.c cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags 2014-12-01 18:02:38 +08:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c introduce for_each_thread() to replace the buggy while_each_thread() 2015-02-02 17:04:55 +08:00
extable.c
fork.c introduce for_each_thread() to replace the buggy while_each_thread() 2015-02-02 17:04:55 +08:00
freezer.c freezer: Do not freeze tasks killed by OOM killer 2015-02-02 17:04:54 +08:00
futex.c futex: Fix a race condition between REQUEUE_PI and task death 2015-02-02 17:05:05 +08:00
futex_compat.c futex: Revert "futex: Mark get_robust_list as deprecated" 2013-02-28 06:59:01 -08:00
groups.c
hrtimer.c hrtimer: Set expiry time before switch_hrtimer_base() 2014-06-07 16:02:01 -07:00
hung_task.c
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c
jump_label.c
kallsyms.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c
kfifo.c
kmod.c usermodehelper: check subprocess_info->path != NULL 2013-05-19 10:54:50 -07:00
kprobes.c
ksysfs.c
kthread.c kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed 2012-10-02 10:30:40 -07:00
latencytop.c
lockdep.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
Makefile
module.c ftrace/module: Hardcode ftrace_module_init() call into load_module() 2014-06-07 16:02:00 -07:00
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c
padata.c
panic.c panic: fix a possible deadlock in panic() 2013-04-12 09:38:47 -07:00
params.c
pid.c
pid_namespace.c
posix-cpu-timers.c posix-cpu-timers: Fix nanosleep task_struct leak 2013-02-28 06:58:59 -08:00
posix-timers.c posix-timers: Fix stack info leak in timer_create() 2015-02-02 17:05:05 +08:00
printk.c console: Fix console name size mismatch 2015-06-19 11:40:22 +08:00
profile.c
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-29 10:50:34 -08:00
range.c
rcu.h
rcupdate.c
rcutiny.c
rcutiny_plugin.h
rcutorture.c
rcutree.c rcu: Fix batch-limit size problem 2012-12-17 10:37:46 -08:00
rcutree.h
rcutree_plugin.h
rcutree_trace.c
relay.c splice: fix racy pipe->buffers uses 2012-07-16 09:04:42 -07:00
res_counter.c
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2013-02-14 10:48:53 -08:00
rtmutex-debug.c
rtmutex-debug.h rtmutex: Handle deadlock detection smarter 2014-07-17 15:39:50 -07:00
rtmutex-tester.c
rtmutex.c rtmutex: Plug slow unlock race 2014-07-17 15:39:50 -07:00
rtmutex.h rtmutex: Handle deadlock detection smarter 2014-07-17 15:39:50 -07:00
rtmutex_common.h
rwsem.c
seccomp.c
semaphore.c
signal.c kernel/signal.c: stop info leak via the tkill and the tgkill syscalls 2013-04-25 21:19:54 -07:00
smp.c smp: Fix SMP function call empty cpu mask race 2013-02-03 18:24:42 -06:00
softirq.c Fix lockup related to stop_machine being stuck in __do_softirq. 2015-06-19 11:40:32 +08:00
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys.c reboot: rigrate shutdown/reboot to boot cpu 2013-06-20 11:58:45 -07:00
sys_ni.c
sysctl.c hung_task: check the value of "sysctl_hung_task_timeout_sec" 2014-05-06 07:51:45 -07:00
sysctl_binary.c sysctl: fix null checking in bin_dn_node_address() 2013-03-04 06:06:41 +08:00
taskstats.c
test_kprobes.c
time.c time: settimeofday: Validate the values of tv from user 2015-04-14 17:33:50 +08:00
timeconst.pl timeconst.pl: Eliminate Perl warning 2013-02-28 06:58:58 -08:00
timer.c timer: Prevent overflow in apply_slack 2014-06-07 16:02:00 -07:00
tracepoint.c tracepoint: Do not waste memory on mods with no tracepoints 2014-05-18 05:25:56 -07:00
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
wait.c
watchdog.c watchdog: using u64 in get_sample_period() 2012-12-03 11:47:17 -08:00
workqueue.c workqueue: cond_resched() after processing each work item 2014-04-14 06:44:16 -07:00
workqueue_sched.h