android_kernel_samsung_msm8976

Commit Graph

Author	SHA1	Message	Date
Viresh Kumar	407b158b99	workqueue: Add system wide power_efficient workqueues This patch adds system wide workqueues aligned towards power saving. This is done by allocating them with WQ_UNBOUND flag if 'wq_power_efficient' is set to 'true'. tj: updated comments a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-07-27 22:11:01 +02:00
Viresh Kumar	df8a9d39e7	workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues Workqueues can be performance or power-oriented. Currently, most workqueues are bound to the CPU they were created on. This gives good performance (due to cache effects) at the cost of potentially waking up otherwise idle cores (Idle from scheduler's perspective. Which may or may not be physically idle) just to process some work. To save power, we can allow the work to be rescheduled on a core that is already awake. Workqueues created with the WQ_UNBOUND flag will allow some power savings. However, we don't change the default behaviour of the system. To enable power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to be turned on. This option can also be overridden by the workqueue.power_efficient boot parameter. tj: Updated config description and comments. Renamed CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-07-27 22:11:01 +02:00
Tejun Heo	80677e6dda	workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq [ Upstream commit 637fdbae60d6cb9f6e963c1079d7e0445c86ff7d ] If queue_delayed_work() gets called with NULL @wq, the kernel will oops asynchronuosly on timer expiration which isn't too helpful in tracking down the offender. This actually happened with smc. __queue_delayed_work() already does several input sanity checks synchronously. Add NULL @wq check. Reported-by: Dave Jones <davej@codemonkey.org.uk> Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-07-27 21:45:23 +02:00
Tejun Heo	ff318b068c	workqueue: implicit ordered attribute should be overridable commit 0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream. 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automatically enabled ordered attribute for unbound workqueues w/ max_active == 1. Because ordered workqueues reject max_active and some attribute changes, this implicit ordered mode broke cases where the user creates an unbound workqueue w/ max_active == 1 and later explicitly changes the related attributes. This patch distinguishes explicit and implicit ordered setting and overrides from attribute changes if implict. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") Cc: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2019-07-27 21:44:39 +02:00
Tejun Heo	d5551f602f	workqueue: restore WQ_UNBOUND/max_active==1 to be ordered commit 5c0338c68706be53b3dc472e4308961c36e4ece1 upstream. The combination of WQ_UNBOUND and max_active == 1 used to imply ordered execution. After NUMA affinity `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues"), this is no longer true due to per-node worker pools. While the right way to create an ordered workqueue is alloc_ordered_workqueue(), the documentation has been misleading for a long time and people do use WQ_UNBOUND and max_active == 1 for ordered workqueues which can lead to subtle bugs which are very difficult to trigger. It's unlikely that we'd see noticeable performance impact by enforcing ordering on WQ_UNBOUND / max_active == 1 workqueues. Let's automatically set __WQ_ORDERED for those workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: Alexei Potashnik <alexei@purestorage.com> Fixes: `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Willy Tarreau <w@1wt.eu>	2019-07-27 21:44:15 +02:00
Kees Cook	9232b43518	time: Remove CONFIG_TIMER_STATS Currently CONFIG_TIMER_STATS exposes process information across namespaces: kernel/time/timer_list.c print_timer(): SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); /proc/timer_list: #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570 Given that the tracer can give the same information, this patch entirely removes CONFIG_TIMER_STATS. Change-Id: Ice26d74094d3ad563808342c1604ad444234844b Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: linux-doc@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Xing Gao <xgao01@email.wm.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jessica Frazelle <me@jessfraz.com> Cc: kernel-hardening@lists.openwall.com Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Olof Johansson <olof@lixom.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-22 23:02:59 +02:00
Luca Stefani	ff1ebfd98d	This is the 3.10.102 stable release -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXXS5iAAoJEE44bZycYXAvDj8P/jbhmGAgW6tw2cnS90QIZDqG M/nclEId61jICNvbfP6zsioKeWyrmzr5G7NjqTThsSNhCo/DXs3ddMqLy3pOaFdq mytXtHIUpwZoplEib+ODinW40CMqnu11XSWEcee2nrsPuGNsnc7BY0wmFBa6UVCV rOZef9SN9lJcZSYY/auvgLDXOXdQ+NMxp5hau30aF5HBO8hTDXStjPRcUwCvz7aR govTQJHlS4HzLH3JOYS3Dt8IYFDOrKhQIby2nFdw7eiUxHCRy2F0asabTh3DzCw1 iLvFroozjyVXwozfWMqLCvMa+514MXJy8Nkva6xiAHraC8UrgfPtcNsTdgtkdH9T V2Am9b0L7yiBdG6hsZLxkU3akk7vU/0dtppwzvudANT6i2tGcDSBeaZq3T2pAv7B 7coY53GzHZdQnbdTZbYeS1fxebxyXw50D5OJkF8DyLhoL7Uj2Dvv0QdjKv+U/e5D VQ+ZyGcBdCLuOzflXysI10E01y0/M3FrkubgGBM4Oh0eYKCHJaHG/NCZy5JY/qxy S0phem8RbeZPbcL14z+5buWIi1lUkTiCIMG8c32ZEmDh84drnICqABA0RzKmqdkj ucQa+PzkMQ1DyhAMUl/CwpBfSqf1Zs3agLo78Kp5MTGfeAA90m0SeVqhmDgWhwqG HhSlsPFfMfmJl5S0uJpQ =UhFl -----END PGP SIGNATURE----- Merge tag 'v3.10.102' into HEAD This is the 3.10.102 stable release Change-Id: Ic7d338fb190966b26aa151361fc37414f701d8b2	2017-04-18 17:22:08 +02:00
Luca Stefani	062311b2df	This is the 3.10.99 stable release -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJW2MPMAAoJEDjbvchgkmk+xlkP/3pLYC8OxOPz11sBFOWDB6Jn +/La3kW252TMcY7K8Z6R2UJC93HxXaCySAZTrLrwUL6mqpSStiHhX/1HQMI6If4c jMtsbgWpU+HZzprPzY8IK6rdrZJKz+Nxu3LMuV0pYTAFKLnCa4d9bSYZ52UArVnC w13KGpk/gnWTO7A6ZNx4dcRpMqYHWcG+eJsT9zdExmyk65qBCxhhUxXh+DijmSn7 QXrFJ4zjWr1kIdsk6Moat/HCTt/zvwMiWuHdnqYIzUSmvWZWbaQsqGw0cFvKM2hL pOJ3zf3fgUY6fsV0vG+SFdrMmL6RtL/v0c2EGM5ZlYCIPbUZcK+XMlaEqOe6UAHz hITIE+r03l2zqagWVb/2HOen8liHIxnfqUPYgHd6vmXz2qWXg9sWTsOhr3ZAQQLA tf0JDjmx/KCyBmiA7ZyhRLeyhx0jD/csxxo14YME8N3tJCyw5gEIOgXlOLNxhWRu uCqSN27FDnnf6ppbX1euMeWxzqi4DCZFMDJQT743V5sJIz10BsVR9HJS6mwyUioN ia4qVc99JfSEsXuawlZhC44Ht+Z/tTSxQPcZjWMHvftGVfxS9AZVf85BM5zNa91t 52mtJivT25N7JxHE41iEQA9t4V1shCjGmEUKD4cVMKgC18cpXD/awDlJ1Or1YuAO ro6ElZeHj+O3YETFp31/ =GlVi -----END PGP SIGNATURE----- Merge tag 'v3.10.99' into HEAD This is the 3.10.99 stable release Change-Id: I8113e58a5519664be2acc502462633d6d2f9ebf5	2017-04-18 17:17:46 +02:00
Luca Stefani	82b37d9f2f	Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD Change-Id: Ic2fe24529f029909ddd96490bd6d885d60f88be2	2017-04-18 17:02:28 +02:00
LuK1337	fc9499e55a	Import latest Samsung release * Package version: T713XXU2BQCO Change-Id: I293d9e7f2df458c512d59b7a06f8ca6add610c99	2017-04-18 03:43:52 +02:00
Roman Pen	8a872b18df	workqueue: fix ghost PENDING flag while doing MQ IO commit 346c09f80459a3ad97df1816d6d606169a51001a upstream. The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list with the following backtrace: [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds. [ 601.347574] Tainted: G O 4.4.5-1-storage+ #6 [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.348142] kworker/u129:5 D ffff880803077988 0 1636 2 0x00000000 [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server] [ 601.348999] ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000 [ 601.349662] ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0 [ 601.350333] ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38 [ 601.350965] Call Trace: [ 601.351203] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.351444] [<ffffffff815b01d5>] schedule+0x35/0x80 [ 601.351709] [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230 [ 601.351958] [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220 [ 601.352208] [<ffffffff810bd737>] ? ktime_get+0x37/0xa0 [ 601.352446] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.352688] [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110 [ 601.352951] [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10 [ 601.353196] [<ffffffff815b093b>] bit_wait_io+0x1b/0x70 [ 601.353440] [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90 [ 601.353689] [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0 [ 601.353958] [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40 [ 601.354200] [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140 [ 601.354441] [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30 [ 601.354688] [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70 [ 601.354932] [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50 [ 601.355193] [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0 [ 601.355432] [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100 [ 601.355679] [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0 [ 601.355925] [<ffffffff81198379>] vfs_write+0xa9/0x1a0 [ 601.356164] [<ffffffff811c59d8>] kernel_write+0x38/0x50 The underlying device is a null_blk, with default parameters: queue_mode = MQ submit_queues = 1 Verification that nullb0 has something inflight: root@pserver8:~# cat /sys/block/nullb0/inflight 0 1 root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \; ... /sys/block/nullb0/mq/0/cpu2/rq_list CTX pending: ffff8838038e2400 ... During debug it became clear that stalled request is always inserted in the rq_list from the following path: save_stack_trace_tsk + 34 blk_mq_insert_requests + 231 blk_mq_flush_plug_list + 281 blk_flush_plug_list + 199 wait_on_page_bit + 192 __filemap_fdatawait_range + 228 filemap_fdatawait_range + 20 filemap_write_and_wait_range + 63 blkdev_fsync + 27 vfs_fsync_range + 73 blkdev_write_iter + 202 __vfs_write + 170 vfs_write + 169 kernel_write + 56 So blk_flush_plug_list() was called with from_schedule == true. If from_schedule is true, that means that finally blk_mq_insert_requests() offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue, i.e. it calls kblockd_schedule_delayed_work_on(). That means, that we race with another CPU, which is about to execute __blk_mq_run_hw_queue() work. Further debugging shows the following traces from different CPUs: CPU#0 CPU#1 ---------------------------------- ------------------------------- reqeust A inserted STORE hctx->ctx_map[0] bit marked kblockd_schedule...() returns 1 <schedule to kblockd workqueue> request B inserted STORE hctx->ctx_map[1] bit marked kblockd_schedule...() returns 0 * WORK PENDING bit is cleared * flush_busy_ctxs() is executed, but bit 1, set by CPU#1, is not observed As a result request B pended forever. This behaviour can be explained by speculative LOAD of hctx->ctx_map on CPU#0, which is reordered with clear of PENDING bit and executed _before_ actual STORE of bit 1 on CPU#1. The proper fix is an explicit full barrier <mfence>, which guarantees that clear of PENDING bit is to be executed before all possible speculative LOADS or STORES inside actual work function. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:51 +02:00
Tejun Heo	49662cfccb	Revert "workqueue: make sure delayed work run in local cpu" commit 041bd12e272c53a35c54c13875839bcb98c999ce upstream. This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76. Workqueue used to implicity guarantee that work items queued without explicit CPU specified are put on the local CPU. Recent changes in timer broke the guarantee and led to vmstat breakage which was fixed by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU we need it to run on"). vmstat is the most likely to expose the issue and it's quite possible that there are other similar problems which are a lot more difficult to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") was applied to restore the local CPU guarnatee. Unfortunately, the change exposed a bug in timer code which got fixed by 22b886dd1018 ("timers: Use proper base migration in add_timer_on()"). Due to code restructuring, the commit couldn't be backported beyond certain point and stable kernels which only had 874bbfe600a6 started crashing. The local CPU guarantee was accidental more than anything else and we want to get rid of it anyway. As, with the vmstat case fixed, 874bbfe600a6 is causing more problems than it's fixing, it has been decided to take the chance and officially break the guarantee by reverting the commit. A debug feature will be added to force foreign CPU assignment to expose cases relying on the guarantee and fixes for the individual cases will be backported to stable as necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shli@fb.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-03 15:06:24 -08:00
Shaohua Li	334c94072f	workqueue: make sure delayed work run in local cpu commit 874bbfe600a660cba9c776b3957b1ce393151b76 upstream. My system keeps crashing with below message. vmstat_update() schedules a delayed work in current cpu and expects the work runs in the cpu. schedule_delayed_work() is expected to make delayed work run in local cpu. The problem is timer can be migrated with NO_HZ. __queue_work() queues work in timer handler, which could run in a different cpu other than where the delayed work is scheduled. The end result is the delayed work runs in different cpu. The patch makes __queue_delayed_work records local cpu earlier. Where the timer runs doesn't change where the work runs with the change. [ 28.010131] ------------[ cut here ]------------ [ 28.010609] kernel BUG at ../mm/vmstat.c:1392! [ 28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [ 28.011860] Modules linked in: [ 28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G W4.3.0-rc3+ #634 [ 28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014 [ 28.014160] Workqueue: events vmstat_update [ 28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000 [ 28.015445] RIP: 0010:[<ffffffff8115f921>] [<ffffffff8115f921>]vmstat_update+0x31/0x80 [ 28.016282] RSP: 0018:ffff8800ba42fd80 EFLAGS: 00010297 [ 28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000 [ 28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d [ 28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000 [ 28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640 [ 28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000 [ 28.020071] FS: 0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000 [ 28.020071] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0 [ 28.020071] Stack: [ 28.020071] ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88 [ 28.020071] ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8 [ 28.020071] ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340 [ 28.020071] Call Trace: [ 28.020071] [<ffffffff8106bd88>] process_one_work+0x1c8/0x540 [ 28.020071] [<ffffffff8106bd0b>] ? process_one_work+0x14b/0x540 [ 28.020071] [<ffffffff8106c214>] worker_thread+0x114/0x460 [ 28.020071] [<ffffffff8106c100>] ? process_one_work+0x540/0x540 [ 28.020071] [<ffffffff81071bf8>] kthread+0xf8/0x110 [ 28.020071] [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200 [ 28.020071] [<ffffffff81a6522f>] ret_from_fork+0x3f/0x70 [ 28.020071] [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200 Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-10-27 09:44:50 +09:00
Ian Maund	068b0551a9	This is the 3.10.73 stable release -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVFBE+AAoJEDjbvchgkmk+oTkP/j2ipSvgXghFEipZbOJUQkqC fa8elfoF7riTKpKOuDtDU2WI1ttCGYs5gmTNpd4KaEt23eJOQgVqIpV8GhAkW5Af NVyGhjF3dXNqpBkxnyuIkk5OLrNKGRNS2xpz1U254iGObYrK+tr62IzGPxEcPAhX Y+58xPVSjLtNdTJW3YLT3DohUbnbHG6Br9geI1IHtlxg1oDiTxtnX2FmOFzzDpP5 qu8gnPIekg/+1EE46nEiq0C59AwC3aCzNxwlYe1Kd41SY3LUFF1eZMzmOnnwyI5K 3FslAzT6x/sOmGJFTYrKjFA4GKsW67xHVkB/hp/Mu768RqxiQCxV4kgmPsAFLbXb D5qbNwr3i0iQ/9AaD7h8HJkxC/KHmszMux00L/mgZ3SGdGMEIBxHg+oP8+nP8V6C WfXKSWA94dpdRyULEfWdnKnUnp2860C7kt7ASTkOl8rIgU8HgaRqeu+U/KPM2ovD ZJtXPVB5UXCRuVAhZwbvvrLOY8UMZTnv2auAaeLYG8YptcvGeN5Z398/8qdV/z7c A9kOsgebs74X+lR3rbVgSDPQaq2AEiuIvtX77SfmrWXBXGmc99i9+PikuFggRprz cJm5bCM9DaHu/3b77X9Fwl7vnpReB0zPHiwTdH/p7OPMf5m1uQt7SqegC6btLPHs iYgjLd4oW+6uiV/2X1Vx =L+mC -----END PGP SIGNATURE----- Merge commit 'v3.10.73' into msm-3.10 This merge brings us up to date with upstream kernel.org tag v3.10.73. As part of the conflict resolution, changes introduced by commit `72684eae7` ("arm64: Fix up /proc/cpuinfo") have been intentionally dropped, as they conflict with Android changes msm-3.10 kernel to solve the problems in a different way. Since userspace readers of this file may depend on the existing msm-3.10 implementation, it's left as-is for now. The commit may later be introduced if it is found to not impact userspaces paired with this kernel. * commit 'v3.10.73' (264 commits): Linux 3.10.73 target: Allow Write Exclusive non-reservation holders to READ target: Allow AllRegistrants to re-RESERVE existing reservation target: Fix R_HOLDER bit usage for AllRegistrants target/pscsi: Fix NULL pointer dereference in get_device_type iscsi-target: Avoid early conn_logout_comp for iser connections target: Fix reference leak in target_get_sess_cmd() error path ARM: at91: pm: fix at91rm9200 standby ipvs: rerouting to local clients is not needed anymore ipvs: add missing ip_vs_pe_put in sync code powerpc/smp: Wait until secondaries are active & online x86/vdso: Fix the build on GCC5 x86/fpu: Drop_fpu() should not assume that tsk equals current x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() crypto: aesni - fix memory usage in GCM decryption libsas: Fix Kernel Crash in smp_execute_task xen-pciback: limit guest control of command register nilfs2: fix deadlock of segment constructor during recovery regulator: core: Fix enable GPIO reference counting regulator: Only enable disabled regulators on resume ALSA: hda - Treat stereo-to-mono mix properly ALSA: hda - Add workaround for MacBook Air 5,2 built-in mic ALSA: hda - Set single_adc_amp flag for CS420x codecs ALSA: hda - Don't access stereo amps for mono channel widgets ALSA: hda - Fix built-in mic on Compaq Presario CQ60 ALSA: control: Add sanity checks for user ctl id name string spi: pl022: Fix race in giveback() leading to driver lock-up tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE can: add missing initialisations in CAN related skbuffs Change email address for 8250_pci virtio_console: init work unconditionally fuse: notify: don't move pages fuse: set stolen page uptodate drm/radeon: drop setting UPLL to sleep mode drm/radeon: do a posting read in rs600_set_irq drm/radeon: do a posting read in si_set_irq drm/radeon: do a posting read in r600_set_irq drm/radeon: do a posting read in r100_set_irq drm/radeon: do a posting read in evergreen_set_irq drm/radeon: fix DRM_IOCTL_RADEON_CS oops tcp: make connect() mem charging friendly net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour tcp: fix tcp fin memory accounting Revert "net: cx82310_eth: use common match macro" rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() caif: fix MSG_OOB test in caif_seqpkt_recvmsg() inet_diag: fix possible overflow in inet_diag_dump_one_icsk() rds: avoid potential stack overflow net: sysctl_net_core: check SNDBUF and RCVBUF for min length sparc64: Fix several bugs in memmove(). sparc: Touch NMI watchdog when walking cpus and calling printk sparc: perf: Make counting mode actually work sparc: perf: Remove redundant perf_pmu_{en\|dis}able calls sparc: semtimedop() unreachable due to comparison error sparc32: destroy_context() and switch_mm() needs to disable interrupts. Linux 3.10.72 ath5k: fix spontaneus AR5312 freezes ACPI / video: Load the module even if ACPI is disabled drm/radeon: fix 1 RB harvest config setup for TN/RL Drivers: hv: vmbus: incorrect device name is printed when child device is unregistered HID: fixup the conflicting keyboard mappings quirk HID: input: fix confusion on conflicting mappings staging: comedi: cb_pcidas64: fix incorrect AI range code handling dm snapshot: fix a possible invalid memory access on unload dm: fix a race condition in dm_get_md dm io: reject unsupported DISCARD requests with EOPNOTSUPP dm mirror: do not degrade the mirror on discard error staging: comedi: comedi_compat32.c: fix COMEDI_CMD copy back clk: sunxi: Support factor clocks with N factor starting not from 0 fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit. nilfs2: fix potential memory overrun on inode IB/qib: Do not write EEPROM sg: fix read() error reporting ALSA: hda - Add pin configs for ASUS mobo with IDT 92HD73XX codec ALSA: pcm: Don't leave PREPARED state after draining tty: fix up atime/mtime mess, take four sunrpc: fix braino in ->poll() procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation USB: serial: fix potential use-after-free after failed probe TTY: fix tty_wait_until_sent on 64-bit machines USB: serial: fix infinite wait_until_sent timeout net: irda: fix wait_until_sent poll timeout xhci: fix reporting of 0-sized URBs in control endpoint xhci: Allocate correct amount of scratchpad buffers usb: ftdi_sio: Add jtag quirk support for Cyber Cortex AV boards USB: usbfs: don't leak kernel data in siginfo USB: serial: cp210x: Adding Seletek device id's KVM: MIPS: Fix trace event to save PC directly KVM: emulate: fix CMPXCHG8B on 32-bit hosts Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. Btrfs: fix data loss in the fast fsync path btrfs: fix lost return value due to variable shadowing iio: imu: adis16400: Fix sign extension x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization PM / QoS: remove duplicate call to pm_qos_update_target target: Check for LBA + sectors wrap-around in sbc_parse_cdb mm/memory.c: actually remap enough memory mm/compaction: fix wrong order check in compact_finished() mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() mm/hugetlb: add migration entry check in __unmap_hugepage_range team: don't traverse port list using rcu in team_set_mac_address udp: only allow UFO for packets from SOCK_DGRAM sockets usb: plusb: Add support for National Instruments host-to-host cable macvtap: make sure neighbour code can push ethernet header net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg team: fix possible null pointer dereference in team_handle_frame net: reject creation of netdev names with colons ematch: Fix auto-loading of ematch modules. net: phy: Fix verification of EEE support in phy_init_eee ipv4: ip_check_defrag should not assume that skb_network_offset is zero ipv4: ip_check_defrag should correctly check return value of skb_copy_bits gen_stats.c: Duplicate xstats buffer for later use rtnetlink: call ->dellink on failure when ->newlink exists ipv6: fix ipv6_cow_metrics for non DST_HOST case rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY Linux 3.10.71 libceph: fix double __remove_osd() problem libceph: change from BUG to WARN for __remove_osd() asserts libceph: assert both regular and lingering lists in __remove_osd() MIPS: Export FP functions used by lose_fpu(1) for KVM x86, mm/ASLR: Fix stack randomization on 64-bit systems blk-throttle: check stats_cpu before reading it from sysfs jffs2: fix handling of corrupted summary length md/raid1: fix read balance when a drive is write-mostly. md/raid5: Fix livelock when array is both resyncing and degraded. metag: Fix KSTK_EIP() and KSTK_ESP() macros gpio: tps65912: fix wrong container_of arguments arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endian hx4700: regulator: declare full constraints KVM: x86: update masterclock values on TSC writes KVM: MIPS: Don't leak FPU/DSP to guest ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE ntp: Fixup adjtimex freq validation on 32-bit systems kdb: fix incorrect counts in KDB summary command output ARM: pxa: add regulator_has_full_constraints to poodle board file ARM: pxa: add regulator_has_full_constraints to corgi board file vt: provide notifications on selection changes usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN USB: fix use-after-free bug in usb_hcd_unlink_urb() USB: cp210x: add ID for RUGGEDCOM USB Serial Console tty: Prevent untrappable signals from malicious program axonram: Fix bug in direct_access cfq-iosched: fix incorrect filing of rt async cfqq cfq-iosched: handle failure of cfq group allocation iscsi-target: Drop problematic active_ts_list usage NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args Added Little Endian support to vtpm module tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send tpm: Fix NULL return in tpm_ibmvtpm_get_desired_dma tpm_tis: verify interrupt during init ARM: 8284/1: sa1100: clear RCSR_SMR on resume tracing: Fix unmapping loop in tracing_mark_write MIPS: KVM: Deliver guest interrupts after local_irq_disable() nfs: don't call blocking operations while !TASK_RUNNING mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles power_supply: 88pm860x: Fix leaked power supply on probe fail ALSA: hdspm - Constrain periods to 2 on older cards ALSA: off by one bug in snd_riptide_joystick_probe() lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb cpufreq: speedstep-smi: enable interrupts when waiting PCI: Fix infinite loop with ROM image of size 0 PCI: Generate uppercase hex for modalias var in uevent HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events iwlwifi: mvm: always use mac color zero iwlwifi: mvm: fix failure path when power_update fails in add_interface iwlwifi: mvm: validate tid and sta_id in ba_notif iwlwifi: pcie: disable the SCD_BASE_ADDR when we resume from WoWLAN fsnotify: fix handling of renames in audit xfs: set superblock buffer type correctly xfs: inode unlink does not set AGI buffer type xfs: ensure buffer types are set correctly Bluetooth: ath3k: workaround the compatibility issue with xHCI controller Linux 3.10.70 rbd: drop an unsafe assertion media/rc: Send sync space information on the lirc device net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param ppp: deflate: never return len larger than output buffer ipv4: tcp: get rid of ugly unicast_sock tcp: ipv4: initialize unicast_sock sk_pacing_rate bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too ping: Fix race in free in receive path udp_diag: Fix socket skipping within chain ipv4: try to cache dst_entries which would cause a redirect net: sctp: fix slab corruption from use after free on INIT collisions netxen: fix netxen_nic_poll() logic ipv6: stop sending PTB packets for MTU < 1280 net: rps: fix cpu unplug ip: zero sockaddr returned on error queue Linux 3.10.69 crypto: crc32c - add missing crypto module alias x86,kvm,vmx: Preserve CR4 across VM entry kvm: vmx: handle invvpid vm exit gracefully smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread() ALSA: ak411x: Fix stall in work callback ASoC: sgtl5000: add delay before first I2C access ASoC: atmel_ssc_dai: fix start event for I2S mode lib/checksum.c: fix build for generic csum_tcpudp_nofold ext4: prevent bugon on race between write/fcntl arm64: Fix up /proc/cpuinfo nilfs2: fix deadlock of segment constructor over I_SYNC flag lib/checksum.c: fix carry in csum_tcpudp_nofold mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range MIPS: Fix kernel lockup or crash after CPU offline/online MIPS: IRQ: Fix disable_irq on CPU IRQs PCI: Add NEC variants to Stratus ftServer PCIe DMI check gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low gpio: sysfs: fix memory leak in gpiod_export_link Linux 3.10.68 target: Drop arbitrary maximum I/O size limit iser-target: Fix implicit termination of connections iser-target: Handle ADDR_CHANGE event for listener cm_id iser-target: Fix connected_handler + teardown flow race iser-target: Parallelize CM connection establishment iser-target: Fix flush + disconnect completion handling iscsi,iser-target: Initiate termination only once vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion tcm_loop: Fix wrong I_T nexus association vhost-scsi: Take configfs group dependency during VHOST_SCSI_SET_ENDPOINT ib_isert: Add max_send_sge=2 minimum for control PDU responses IB/isert: Adjust CQ size to HW limits workqueue: fix subtle pool management issue which can stall whole worker_pool gpio: squelch a compiler warning efi-pstore: Make efi-pstore return a unique id pstore/ram: avoid atomic accesses for ioremapped regions pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz pstore: skip zero size persistent ram buffer in traverse pstore: clarify clearing of _read_cnt in ramoops_context pstore: d_alloc_name() doesn't return an ERR_PTR pstore: Fail to unlink if a driver has not defined pstore_erase ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear ARM: DMA: ensure that old section mappings are flushed from the TLB ARM: 7931/1: Correct virt_addr_valid ARM: fix asm/memory.h build error ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg(). ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h ARM: lpae: fix definition of PTE_HWTABLE_PTRS ARM: fix type of PHYS_PFN_OFFSET to unsigned long ARM: LPAE: use phys_addr_t in alloc_init_pud() ARM: LPAE: use signed arithmetic for mask definitions ARM: mm: correct pte_same behaviour for LPAE. ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables drivers: net: cpsw: discard dual emac default vlan configuration regulator: core: fix race condition in regulator_put() spi/pxa2xx: Clear cur_chip pointer before starting next message dm cache: fix missing ERR_PTR returns and handling dm thin: don't allow messages to be sent to a pool target in READ_ONLY or FAIL mode nl80211: fix per-station group key get/del and memory leak NFSv4.1: Fix an Oops in nfs41_walk_client_list nfs: fix dio deadlock when O_DIRECT flag is flipped Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857) ALSA: seq-dummy: remove deadlock-causing events on close powerpc/xmon: Fix another endiannes issue in RTAS call from xmon can: kvaser_usb: Fix state handling upon BUS_ERROR events can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT can: kvaser_usb: Send correct context to URB completion can: kvaser_usb: Do not sleep in atomic context ASoC: wm8960: Fix capture sample rate from 11250 to 11025 spi: dw-mid: fix FIFO size Signed-off-by: Ian Maund <imaund@codeaurora.org>	2015-04-24 18:14:57 -07:00
Matt Wagantall	a56ef0d58f	Revert "workqueue: fix subtle pool management issue which can stall whole worker_pool" This reverts commit `6515b297d5`. Drop this change in preparation for picking up its upstream linux-stable version, in which merge conflicts have been differently resolved. This will allow for a cleaner merge of linux-stable v3.10.73. Change-Id: I6cc115983a00f0153914a9b8fa1378c8fbae07f8 Signed-off-by: Matt Wagantall <mattw@codeaurora.org>	2015-04-24 18:05:13 -07:00
Tejun Heo	d8bee0e3ab	workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 upstream. cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com v3: bit_waitqueue() can't be used for work items defined in vmalloc area. Switched to custom wake function which matches the target work item and exclusive wait and wakeup. v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if the target bit waitqueue has wait_bit_queue's on it. Use DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu Vizoso. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com> Tested-by: Jesper Nilsson <jesper.nilsson@axis.com> Tested-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-26 15:00:58 +01:00
Tejun Heo	6515b297d5	workqueue: fix subtle pool management issue which can stall whole worker_pool A worker_pool's forward progress is guaranteed by the fact that the last idle worker assumes the manager role to create more workers and summon the rescuers if creating workers doesn't succeed in timely manner before proceeding to execute work items. This manager role is implemented in manage_workers(), which indicates whether the worker may proceed to work item execution with its return value. This is necessary because multiple workers may contend for the manager role, and, if there already is a manager, others should proceed to work item execution. Unfortunately, the function also indicates that the worker may proceed to work item execution if need_to_create_worker() is false at the head of the function. need_to_create_worker() tests the following conditions. pending work items && !nr_running && !nr_idle The first and third conditions are protected by pool->lock and thus won't change while holding pool->lock; however, nr_running can change asynchronously as other workers block and resume and while it's likely to be zero, as someone woke this worker up in the first place, some other workers could have become runnable inbetween making it non-zero. If this happens, manage_worker() could return false even with zero nr_idle making the worker, the last idle one, proceed to execute work items. If then all workers of the pool end up blocking on a resource which can only be released by a work item which is pending on that pool, the whole pool can deadlock as there's no one to create more workers or summon the rescuers. This patch fixes the problem by removing the early exit condition from maybe_create_worker() and making manage_workers() return false iff there's already another manager, which ensures that the last worker doesn't start executing work items. We can leave the early exit condition alone and just ignore the return value but the only reason it was put there is because the manage_workers() used to perform both creations and destructions of workers and thus the function may be invoked while the pool is trying to reduce the number of workers. Now that manage_workers() is called only when more workers are needed, the only case this early exit condition is triggered is rare race conditions rendering it pointless. Tested with simulated workload and modified workqueue code which trigger the pool deadlock reliably without this patch. Change-Id: If0ca1b86c06ee17b7fbb670b0e70b421a256b2b7 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Eric Sandeen <sandeen@sandeen.net> Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net Cc: Dave Chinner <david@fromorbit.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: stable@vger.kernel.org CRs-fixed: 783758 Git-commit: 29187a9eeaf362d8422e62e17a22a6e115277a49 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [joonwoop@codeaurora.org: fixed conflicts in comments and manage_workers() to keep maybe_destroy_workers()] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-02-13 10:15:14 -08:00
Tejun Heo	85be16bad7	workqueue: fix subtle pool management issue which can stall whole worker_pool commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream. A worker_pool's forward progress is guaranteed by the fact that the last idle worker assumes the manager role to create more workers and summon the rescuers if creating workers doesn't succeed in timely manner before proceeding to execute work items. This manager role is implemented in manage_workers(), which indicates whether the worker may proceed to work item execution with its return value. This is necessary because multiple workers may contend for the manager role, and, if there already is a manager, others should proceed to work item execution. Unfortunately, the function also indicates that the worker may proceed to work item execution if need_to_create_worker() is false at the head of the function. need_to_create_worker() tests the following conditions. pending work items && !nr_running && !nr_idle The first and third conditions are protected by pool->lock and thus won't change while holding pool->lock; however, nr_running can change asynchronously as other workers block and resume and while it's likely to be zero, as someone woke this worker up in the first place, some other workers could have become runnable inbetween making it non-zero. If this happens, manage_worker() could return false even with zero nr_idle making the worker, the last idle one, proceed to execute work items. If then all workers of the pool end up blocking on a resource which can only be released by a work item which is pending on that pool, the whole pool can deadlock as there's no one to create more workers or summon the rescuers. This patch fixes the problem by removing the early exit condition from maybe_create_worker() and making manage_workers() return false iff there's already another manager, which ensures that the last worker doesn't start executing work items. We can leave the early exit condition alone and just ignore the return value but the only reason it was put there is because the manage_workers() used to perform both creations and destructions of workers and thus the function may be invoked while the pool is trying to reduce the number of workers. Now that manage_workers() is called only when more workers are needed, the only case this early exit condition is triggered is rare race conditions rendering it pointless. Tested with simulated workload and modified workqueue code which trigger the pool deadlock reliably without this patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Eric Sandeen <sandeen@sandeen.net> Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net Cc: Dave Chinner <david@fromorbit.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-05 22:35:40 -08:00
Ian Maund	6440f462f9	Merge upstream tag 'v3.10.49' into msm-3.10 * commit 'v3.10.49': (529 commits) Linux 3.10.49 ACPI / battery: Retry to get battery information if failed during probing x86, ioremap: Speed up check for RAM pages Score: Modify the Makefile of Score, remove -mlong-calls for compiling Score: The commit is for compiling successfully. Score: Implement the function csum_ipv6_magic score: normalize global variables exported by vmlinux.lds rtmutex: Plug slow unlock race rtmutex: Handle deadlock detection smarter rtmutex: Detect changes in the pi lock chain rtmutex: Fix deadlock detector for real ring-buffer: Check if buffer exists before polling drm/radeon: stop poisoning the GART TLB drm/radeon: fix typo in golden register setup on evergreen ext4: disable synchronous transaction batching if max_batch_time==0 ext4: clarify error count warning messages ext4: fix unjournalled bg descriptor while initializing inode bitmap dm io: fix a race condition in the wake up code for sync_io Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code clk: spear3xx: Use proper control register offset ... In addition to bringing in upstream commits, this merge also makes minor changes to mainitain compatibility with upstream: The definition of list_next_entry in qcrypto.c and ipa_dp.c has been removed, as upstream has moved the definition to list.h. The implementation of list_next_entry was identical between the two. irq.c, for both arm and arm64 architecture, has had its calls to __irq_set_affinity_locked updated to reflect changes to the API upstream. Finally, as we have removed the sleep_length member variable of the tick_sched struct, all changes made by upstream commit `ec804bd` do not apply to our tree and have been removed from this merge. Only kernel/time/tick-sched.c is impacted. Change-Id: I63b7e0c1354812921c94804e1f3b33d1ad6ee3f1 Signed-off-by: Ian Maund <imaund@codeaurora.org>	2014-08-20 13:23:09 -07:00
Yasuaki Ishimatsu	3e24998c8a	workqueue: zero cpumask of wq_numa_possible_cpumask on init commit 5a6024f1604eef119cf3a6fa413fe0261a81a8f3 upstream. When hot-adding and onlining CPU, kernel panic occurs, showing following call trace. BUG: unable to handle kernel paging request at 0000000000001d08 IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10 PGD 0 Oops: 0000 [#1] SMP ... Call Trace: [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50 [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0 [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0 [<ffffffff811926f1>] new_slab+0x91/0x300 [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482 [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0 [<ffffffff810a3c78>] ? load_balance+0x218/0x890 [<ffffffff8101a679>] ? sched_clock+0x9/0x10 [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10 [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200 [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0 [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60 [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140 [<ffffffff8105d0ec>] do_fork+0xbc/0x360 [<ffffffff8105d3b6>] kernel_thread+0x26/0x30 [<ffffffff81086652>] kthreadd+0x2c2/0x300 [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60 [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60 In my investigation, I found the root cause is wq_numa_possible_cpumask. All entries of wq_numa_possible_cpumask is allocated by alloc_cpumask_var_node(). And these entries are used without initializing. So these entries have wrong value. When hot-adding and onlining CPU, wq_update_unbound_numa() is called. wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq() calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set as follow: 3592 /* if cpumask is contained inside a NUMA node, we belong to that node */ 3593 if (wq_numa_enabled) { 3594 for_each_node(node) { 3595 if (cpumask_subset(pool->attrs->cpumask, 3596 wq_numa_possible_cpumask[node])) { 3597 pool->node = node; 3598 break; 3599 } 3600 } 3601 } But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong node is selected. As a result, kernel panic occurs. By this patch, all entries of wq_numa_possible_cpumask are allocated by zalloc_cpumask_var_node to initialize them. And the panic disappeared. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `bce903809a` ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-07-17 15:58:00 -07:00
Maxime Bizon	7c36f88c2f	workqueue: fix dev_set_uevent_suppress() imbalance commit bddbceb688c6d0decaabc7884fede319d02f96c8 upstream. Uevents are suppressed during attributes registration, but never restored, so kobject_uevent() does nothing. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `226223ab3c` Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-07-17 15:57:59 -07:00
Lai Jiangshan	f56fb0d42b	workqueue: make rescuer_thread() empty wq->maydays list before exiting commit 4d595b866d2c653dc90a492b9973a834eabfa354 upstream. After a @pwq is scheduled for emergency execution, other workers may consume the affectd work items before the rescuer gets to them. This means that a workqueue many have pwqs queued on @wq->maydays list while not having any work item pending or in-flight. If destroy_workqueue() executes in such condition, the rescuer may exit without emptying @wq->maydays. This currently doesn't cause any actual harm. destroy_workqueue() can safely destroy all the involved data structures whether @wq->maydays is populated or not as nobody access the list once the rescuer exits. However, this is nasty and makes future development difficult. Let's update rescuer_thread() so that it empties @wq->maydays after seeing should_stop to guarantee that the list is empty on rescuer exit. tj: Updated comment and patch description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-06-07 13:25:37 -07:00
Lai Jiangshan	aac8b37ffa	workqueue: fix a possible race condition between rescuer and pwq-release commit 77668c8b559e4fe2acf2a0749c7c83cde49a5025 upstream. There is a race condition between rescuer_thread() and pwq_unbound_release_workfn(). Even after a pwq is scheduled for rescue, the associated work items may be consumed by any worker. If all of them are consumed before the rescuer gets to them and the pwq's base ref was put due to attribute change, the pwq may be released while still being linked on @wq->maydays list making the rescuer dereference already freed pwq later. Make send_mayday() pin the target pwq until the rescuer is done with it. tj: Updated comment and patch description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-06-07 13:25:37 -07:00
Daeseok Youn	55a3dfcc84	workqueue: fix bugs in wq_update_unbound_numa() failure path commit 77f300b198f93328c26191b52655ce1b62e202cf upstream. wq_update_unbound_numa() failure path has the following two bugs. - alloc_unbound_pwq() is called without holding wq->mutex; however, if the allocation fails, it jumps to out_unlock which tries to unlock wq->mutex. - The function should switch to dfl_pwq on failure but didn't do so after alloc_unbound_pwq() failure. Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on alloc_unbound_pwq() failure. Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-06-07 13:25:37 -07:00
Ian Maund	356fb13538	Merge upstream linux-stable v3.10.36 into msm-3.10 * commit 'v3.10.36': (494 commits) Linux 3.10.36 netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages mm: close PageTail race net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE x86: fix boot on uniprocessor systems Input: cypress_ps2 - don't report as a button pads Input: synaptics - add manual min/max quirk for ThinkPad X240 Input: synaptics - add manual min/max quirk Input: mousedev - fix race when creating mixed device ext4: atomically set inode->i_flags in ext4_set_inode_flags() Linux 3.10.35 sched/autogroup: Fix race with task_groups list e100: Fix "disabling already-disabled device" warning xhci: Fix resume issues on Renesas chips in Samsung laptops Input: wacom - make sure touch_max is set for touch devices KVM: VMX: fix use after free of vmx->loaded_vmcs KVM: x86: handle invalid root_hpa everywhere KVM: MMU: handle invalid root_hpa at __direct_map Input: elantech - improve clickpad detection ARM: highbank: avoid L2 cache smc calls when PL310 is not present ... Change-Id: Ib68f565291702c53df09e914e637930c5d3e5310 Signed-off-by: Ian Maund <imaund@codeaurora.org>	2014-04-23 16:23:49 -07:00
Ian Maund	f1b32d4e47	Merge upstream linux-stable v3.10.28 into msm-3.10 The following commits have been reverted from this merge, as they are known to introduce new bugs and are currently incompatible with our audio implementation. Investigation of these commits is ongoing, and they are expected to be brought in at a later time: `86e6de7` ALSA: compress: fix drain calls blocking other compress functions (v6) `16442d4` ALSA: compress: fix drain calls blocking other compress functions This merge commit also includes a change in block, necessary for compilation. Upstream has modified elevator_init_fn to prevent race conditions, requring updates to row_init_queue and test_init_queue. * commit 'v3.10.28': (1964 commits) Linux 3.10.28 ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init() serial: amba-pl011: use port lock to guard control register access mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL md/raid5: Fix possible confusion when multiple write errors occur. md/raid10: fix two bugs in handling of known-bad-blocks. md/raid10: fix bug when raid10 recovery fails to recover a block. md: fix problem when adding device to read-only array with bitmap. drm/i915: fix DDI PLLs HW state readout code nilfs2: fix segctor bug that causes file system corruption thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only ftrace/x86: Load ftrace_ops in parameter not the variable holding it SELinux: Fix possible NULL pointer dereference in selinux_inode_permission() writeback: Fix data corruption on NFS hwmon: (coretemp) Fix truncated name of alarm attributes vfs: In d_path don't call d_dname on a mount point staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq() staging: comedi: addi_apci_1032: fix subdevice type/flags bug mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully GFS2: Increase i_writecount during gfs2_setattr_chown perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h perf scripting perl: Fix build error on Fedora 12 ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic Linux 3.10.27 sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper SCSI: sd: Reduce buffer size for vpd request intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters. mac80211: move "bufferable MMPDU" check to fix AP mode scan ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS ACPI / TPM: fix memory leak when walking ACPI namespace mfd: rtsx_pcr: Disable interrupts before cancelling delayed works clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock clk: samsung: exynos4: Correct SRC_MFC register clk: clk-divider: fix divisor > 255 bug ahci: add PCI ID for Marvell 88SE9170 SATA controller parisc: Ensure full cache coherency for kmap/kunmap drm/nouveau/bios: make jump conditional ARM: shmobile: mackerel: Fix coherent DMA mask ARM: shmobile: armadillo: Fix coherent DMA mask ARM: shmobile: kzm9g: Fix coherent DMA mask ARM: dts: exynos5250: Fix MDMA0 clock number ARM: fix "bad mode in ... handler" message for undefined instructions ARM: fix footbridge clockevent device net: Loosen constraints for recalculating checksum in skb_segment() bridge: use spin_lock_bh() in br_multicast_set_hash_max netpoll: Fix missing TXQ unlock and and OOPS. net: llc: fix use after free in llc_ui_recvmsg virtio-net: fix refill races during restore virtio_net: don't leak memory or block when too many frags virtio-net: make all RX paths handle errors consistently virtio_net: fix error handling for mergeable buffers vlan: Fix header ops passthru when doing TX VLAN offload. net: rose: restore old recvmsg behavior rds: prevent dereference of a NULL device ipv6: always set the new created dst's from in ip6_rt_copy net: fec: fix potential use after free hamradio/yam: fix info leak in ioctl drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() net: inet_diag: zero out uninitialized idiag_{src,dst} fields ip_gre: fix msg_name parsing for recvfrom/recvmsg net: unix: allow bind to fail on mutex lock ipv6: fix illegal mac_header comparison on 32bit netvsc: don't flush peers notifying work during setting mtu tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 net: unix: allow set_peek_off to fail net: drop_monitor: fix the value of maxattr ipv6: don't count addrconf generated routes against gc limit packet: fix send path when running with proto == 0 virtio: delete napi structures from netdev before releasing memory macvtap: signal truncated packets tun: update file current position macvtap: update file current position macvtap: Do not double-count received packets rds: prevent BUG_ON triggered on congestion update to loopback net: do not pretend FRAGLIST support IPv6: Fixed support for blackhole and prohibit routes HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue"" gpio-rcar: R-Car GPIO IRQ share interrupt clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast irqchip: renesas-irqc: Fix irqc_probe error handling Linux 3.10.26 sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c ext4: fix bigalloc regression arm64: Use Normal NonCacheable memory for writecombine arm64: Do not flush the D-cache for anonymous pages arm64: Avoid cache flushing in flush_dcache_page() ARM: KVM: arch_timers: zero CNTVOFF upon return to host ARM: hyp: initialize CNTVOFF to zero clocksource: arch_timer: use virtual counters arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S arm64: dts: Reserve the memory used for secondary CPU release address arm64: check for number of arguments in syscall_get/set_arguments() arm64: fix possible invalid FPSIMD initialization state ... Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106 Signed-off-by: Ian Maund <imaund@codeaurora.org>	2014-03-24 14:28:34 -07:00
Lai Jiangshan	4403be9e25	workqueue: ensure @task is valid across kthread_stop() commit 5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb upstream. When a kworker should die, the kworkre is notified through WORKER_DIE flag instead of kthread_should_stop(). This, IIRC, is primarily to keep the test synchronized inside worker_pool lock. WORKER_DIE is first set while holding pool->lock, the lock is dropped and kthread_stop() is called. Unfortunately, this means that there's a slight chance that the target kworker may see WORKER_DIE before kthread_stop() finishes and exits and frees the target task before or during kthread_stop(). Fix it by pinning the target task before setting WORKER_DIE and putting it after kthread_stop() is done. tj: Improved patch description and comment. Moved pinning above WORKER_DIE for better signify what it's protecting. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-03-06 21:30:11 -08:00
Tejun Heo	ced4ac9285	workqueue: fix ordered workqueues in NUMA setups commit 8a2b75384444488fc4f2cbb9f0921b6a0794838f upstream. An ordered workqueue implements execution ordering by using single pool_workqueue with max_active == 1. On a given pool_workqueue, work items are processed in FIFO order and limiting max_active to 1 enforces the queued work items to be processed one by one. Unfortunately, `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues") accidentally broke this guarantee by applying NUMA affinity to ordered workqueues too. On NUMA setups, an ordered workqueue would end up with separate pool_workqueues for different nodes. Each pool_workqueue still limits max_active to 1 but multiple work items may be executed concurrently and out of order depending on which node they are queued to. Fix it by using dedicated ordered_wq_attrs[] when creating ordered workqueues. The new attrs match the unbound ones except that no_numa is always set thus forcing all NUMA nodes to share the default pool_workqueue. While at it, add sanity check in workqueue creation path which verifies that an ordered workqueues has only the default pool_workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Libin <huawei.libin@huawei.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-12-04 10:57:16 -08:00
Tejun Heo	6ff96f7340	workqueue: cond_resched() after processing each work item commit b22ce2785d97423846206cceec4efee0c4afd980 upstream. If !PREEMPT, a kworker running work items back to back can hog CPU. This becomes dangerous when a self-requeueing work item which is waiting for something to happen races against stop_machine. Such self-requeueing work item would requeue itself indefinitely hogging the kworker and CPU it's running on while stop_machine would wait for that CPU to enter stop_machine while preventing anything else from happening on all other CPUs. The two would deadlock. Jamie Liu reports that this deadlock scenario exists around scsi_requeue_run_queue() and libata port multiplier support, where one port may exclude command processing from other ports. With the right timing, scsi_requeue_run_queue() can end up requeueing itself trying to execute an IO which is asked to be retried while another device has an exclusive access, which in turn can't make forward progress due to stop_machine. Fix it by invoking cond_resched() after executing each work item. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jamie Liu <jamieliu@google.com> References: http://thread.gmane.org/gmane.linux.kernel/1552567 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-09-07 22:09:58 -07:00
Syed Rameez Mustafa	7b49b86d3a	kernel/lib: add additional debug capabilites for data corruption Data corruptions in the kernel often end up in system crashes that are easier to debug closer to the time of detection. Specifically, if we do not panic immediately after lock or list corruptions have been detected, the problem context is lost in the ensuing system mayhem. Add support for allowing system crash immediately after such corruptions are detected. The CONFIG option controls the enabling/disabling of the feature. Change-Id: I9b2eb62da506a13007acff63e85e9515145909ff Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2013-08-22 18:08:50 -07:00
Shaohua Li	73b8bd6de8	workqueue: copy workqueue_attrs with all fields commit 2865a8fb44cc32420407362cbda80c10fa09c6b2 upstream. $echo '0' > /sys/bus/workqueue/devices/xxx/numa $cat /sys/bus/workqueue/devices/xxx/numa I got 1. It should be 0, the reason is copy_workqueue_attrs() called in apply_workqueue_attrs() doesn't copy no_numa field. Fix it by making copy_workqueue_attrs() copy ->no_numa too. This would also make get_unbound_pool() set a pool's ->no_numa attribute according to the workqueue attributes used when the pool was created. While harmelss, as ->no_numa isn't a pool attribute, this is a bit confusing. Clear it explicitly. tj: Updated description and comments a bit. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-08-11 18:35:25 -07:00
Tejun Heo	1be0c25da5	workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init() wq_numa_init() builds per-node cpumasks which are later used to make unbound workqueues NUMA-aware. The cpumasks are allocated using alloc_cpumask_var_node() for all possible nodes. Unfortunately, on machines with off-line nodes, this leads to NUMA-aware allocations on existing bug offline nodes, which in turn triggers BUG in the memory allocation code. Fix it by using NUMA_NO_NODE for cpumask allocations for offline nodes. kernel BUG at include/linux/gfp.h:323! invalid opcode: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1 Hardware name: ProLiant BL465c G7, BIOS A19 12/10/2011 task: ffff880234608000 ti: ffff880234602000 task.ti: ffff880234602000 RIP: 0010:[<ffffffff8117495d>] [<ffffffff8117495d>] new_slab+0x2ad/0x340 RSP: 0000:ffff880234603bf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880237404b40 RCX: 00000000000000d0 RDX: 0000000000000001 RSI: 0000000000000003 RDI: 00000000002052d0 RBP: ffff880234603c28 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000001 R11: ffffffff812e3aa8 R12: 0000000000000001 R13: ffff8802378161c0 R14: 0000000000030027 R15: 00000000000040d0 FS: 0000000000000000(0000) GS:ffff880237800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88043fdff000 CR3: 00000000018d5000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880234603c28 0000000000000001 00000000000000d0 ffff8802378161c0 ffff880237404b40 ffff880237404b40 ffff880234603d28 ffffffff815edba1 ffff880237816140 0000000000000000 ffff88023740e1c0 Call Trace: [<ffffffff815edba1>] __slab_alloc+0x330/0x4f2 [<ffffffff81174b25>] kmem_cache_alloc_node_trace+0xa5/0x200 [<ffffffff812e3aa8>] alloc_cpumask_var_node+0x28/0x90 [<ffffffff81a0bdb3>] wq_numa_init+0x10d/0x1be [<ffffffff81a0bec8>] init_workqueues+0x64/0x341 [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff819f1f31>] kernel_init_freeable+0xb7/0x1ec [<ffffffff815d50de>] kernel_init+0xe/0xf0 [<ffffffff815ff89c>] ret_from_fork+0x7c/0xb0 Code: 45 84 ac 00 00 00 f0 41 80 4d 00 40 e9 f6 fe ff ff 66 0f 1f 84 00 00 00 00 00 e8 eb 4b ff ff 49 89 c5 e9 05 fe ff ff <0f> 0b 4c 8b 73 38 44 89 ff 81 cf 00 00 20 00 4c 89 f6 48 c1 ee Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-Tested-by: Lingzhu Xiang <lxiang@redhat.com>	2013-05-15 14:24:24 -07:00
Marc Dionne	ad7b1f841f	workqueue: Make schedule_work() available again to non GPL modules Commit `8425e3d5bd` ("workqueue: inline trivial wrappers") changed schedule_work() and schedule_delayed_work() to inline wrappers, but these rely on some symbols that are EXPORT_SYMBOL_GPL, while the original functions were EXPORT_SYMBOL. This has the effect of changing the licensing requirement for these functions and making them unavailable to non GPL modules. Make them available again by removing the restriction on the required symbols. Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-05-14 11:52:51 -07:00
Joonsoo Kim	8f174b1175	workqueue: correct handling of the pool spin_lock When we fail to mutex_trylock(), we release the pool spin_lock and do mutex_lock(). After that, we should regrab the pool spin_lock, but, regrabbing is missed in current code. So correct it. Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-05-14 11:48:15 -07:00
Tejun Heo	d325185916	workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into node number `df2d5ae499` ("workqueue: map an unbound workqueues to multiple per-node pool_workqueues") made unbound workqueues to map to multiple per-node pool_workqueues and accordingly updated workqueue_contested() so that, for unbound workqueues, it maps the specified @cpu to the NUMA node number to obtain the matching pool_workqueue to query the congested state. Before this change, workqueue_congested() ignored @cpu for unbound workqueues as there was only one pool_workqueue and some users (fscache) called it with WORK_CPU_UNBOUND. After the commit, this causes the following oops as WORK_CPU_UNBOUND gets translated to garbage by cpu_to_node(). BUG: unable to handle kernel paging request at ffff8803598d98b8 IP: [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa PGD 2421067 PUD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 2689 Comm: cat Tainted: GF 3.9.0-fsdevel+ #4 task: ffff88003d801040 ti: ffff880025806000 task.ti: ffff880025806000 RIP: 0010:[<ffffffff81043b7e>] [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa RSP: 0018:ffff880025807ad8 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800388a2400 RCX: 0000000000000003 RDX: ffff880025807fd8 RSI: ffffffff81a31420 RDI: ffff88003d8016e0 RBP: ffff880025807ae8 R08: ffff88003d801730 R09: ffffffffa00b4898 R10: ffffffff81044217 R11: ffff88003d801040 R12: 0000000064206e97 R13: ffff880036059d98 R14: ffff880038cc8080 R15: ffff880038cc82d0 FS: 00007f21afd9c740(0000) GS:ffff88003d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8803598d98b8 CR3: 000000003df49000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff8800388a2400 0000000000000002 ffff880025807b18 ffffffff810442ce ffffffff81044217 ffff880000000002 ffff8800371b4080 ffff88003d112ec0 ffff880025807b38 ffffffffa00810b0 ffff880036059d88 ffff880036059be8 Call Trace: [<ffffffff810442ce>] workqueue_congested+0xb7/0x12c [<ffffffffa00810b0>] fscache_enqueue_object+0xb2/0xe8 [fscache] [<ffffffffa007facd>] __fscache_acquire_cookie+0x3b9/0x56c [fscache] [<ffffffffa00ad8fe>] nfs_fscache_set_inode_cookie+0xee/0x132 [nfs] [<ffffffffa009e112>] do_open+0x9/0xd [nfs] [<ffffffff810e804a>] do_dentry_open+0x175/0x24b [<ffffffff810e8298>] finish_open+0x41/0x51 Fix it by using smp_processor_id() if @cpu is WORK_CPU_UNBOUND. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Howells <dhowells@redhat.com> Tested-and-Acked-by: David Howells <dhowells@redhat.com>	2013-05-10 11:10:17 -07:00
Tejun Heo	3d1cb2059d	workqueue: include workqueue info when printing debug dump of a worker task One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. This patch implements set_worker_desc() which can be called from any workqueue work function to set its description. When the worker task is dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion and so on - the description will be printed out together with the workqueue name and the worker function pointer. The printing side is implemented by print_worker_info() which is called from functions in task dump paths - sched_show_task() and dump_stack_print_info(). print_worker_info() can be safely called on any task in any state as long as the task struct itself is accessible. It uses probe_*() functions to access worker fields. It may print garbage if something went very wrong, but it wouldn't cause (another) oops. The description is currently limited to 24bytes including the terminating \0. worker->desc_valid and workder->desc[] are added and the 64 bytes marker which was already incorrect before adding the new fields is moved to the correct position. Here's an example dump with writeback updated to set the bdi name as worker desc. Hardware name: Bochs Modules linked in: Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1 Workqueue: writeback bdi_writeback_workfn (flush-8:0) ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8 ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0 ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08 Call Trace: [<ffffffff81c61845>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81200150>] bdi_writeback_workfn+0x2a0/0x3b0 ... Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Wei Yongjun	cece95dfe5	workqueue: use kmem_cache_free() instead of kfree() memory allocated by kmem_cache_alloc() should be freed using kmem_cache_free(), not kfree(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-09 11:33:40 -07:00
Lai Jiangshan	5c529597e9	workqueue: avoid false negative WARN_ON() in destroy_workqueue() destroy_workqueue() performs several sanity checks before proceeding with destruction of a workqueue. One of the checks verifies that refcnt of each pwq (pool_workqueue) is over 1 as at that point there should be no in-flight work items and the only holder of pwq refs is the workqueue itself. This worked fine as a workqueue used to hold only one reference to its pwqs; however, since `4c16bd327c` ("workqueue: implement NUMA affinity for unbound workqueues"), a workqueue may hold multiple references to its default pwq triggering this sanity check spuriously. Fix it by not triggering the pwq->refcnt assertion on default pwqs. An example spurious WARN trigger follows. WARNING: at kernel/workqueue.c:4201 destroy_workqueue+0x6a/0x13e() Hardware name: 4286C12 Modules linked in: sdhci_pci sdhci mmc_core usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video Pid: 361, comm: umount Not tainted 3.9.0-rc5+ #29 Call Trace: [<c04314a7>] warn_slowpath_common+0x7c/0x93 [<c04314e0>] warn_slowpath_null+0x22/0x24 [<c044796a>] destroy_workqueue+0x6a/0x13e [<c056dc01>] ext4_put_super+0x43/0x2c4 [<c04fb7b8>] generic_shutdown_super+0x4b/0xb9 [<c04fb848>] kill_block_super+0x22/0x60 [<c04fb960>] deactivate_locked_super+0x2f/0x56 [<c04fc41b>] deactivate_super+0x2e/0x31 [<c050f1e6>] mntput_no_expire+0x103/0x108 [<c050fdce>] sys_umount+0x2a2/0x2c4 [<c050fe0e>] sys_oldumount+0x1e/0x20 [<c085ba4d>] sysenter_do_call+0x12/0x38 tj: Rewrote description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com>	2013-04-04 07:54:01 -07:00
Tejun Heo	229641a6f1	Linux 3.9-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRWLTrAAoJEHm+PkMAQRiGe8oH/iMy48mecVWvxVZn74Tx3Cef xmW/PnAIj28EhSPqK49N/Ow6AfQToFKf7AP0ge20KAf5teTq95AY+tH74DAANt8F BjKXXTZiR5xwBvRkq7CR5wDcCvEcBAAz8fgTEd6SEDB2d2VXFf5eKdKUqt1avTCh Z6Hup5kuwX+ddtwY2DCBXtp2n6fL0Rm5yLzY1A3OOBye1E7VyLTF7M5BR603Q44P 4kRLxn8+R7jy3hTuZIhAeoS8TKUoBwVk7DmKxEzrhTHZVOmvwE9lEHybRnIyOpd/ k1JnbRbiPsLsCVFOn10SQkGDAIk00lro3tuWP2C1ljERiD/OOh5Ui9nXYAhMkbI= =q15K -----END PGP SIGNATURE----- Merge tag 'v3.9-rc5' into wq/for-3.10 Writeback conversion to workqueue will be based on top of wq/for-3.10 branch to take advantage of custom attrs and NUMA support for unbound workqueues. Mainline currently contains two commits which result in non-trivial merge conflicts with wq/for-3.10 and because block/for-3.10/core is based on v3.9-rc3 which contains one of the conflicting commits, we need a pre-merge-window merge anyway. Let's pull v3.9-rc5 into wq/for-3.10 so that the block tree doesn't suffer from workqueue merge conflicts. The two conflicts and their resolutions: * `e68035fb65` ("workqueue: convert to idr_alloc()") in mainline changes worker_pool_assign_id() to use idr_alloc() instead of the old idr interface. worker_pool_assign_id() goes through multiple locking changes in wq/for-3.10 causing the following conflict. static int worker_pool_assign_id(struct worker_pool pool) { int ret; <<<<<<< HEAD lockdep_assert_held(&wq_pool_mutex); do { if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL)) return -ENOMEM; ret = idr_get_new(&worker_pool_idr, pool, &pool->id); } while (ret == -EAGAIN); ======= mutex_lock(&worker_pool_idr_mutex); ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL); if (ret >= 0) pool->id = ret; mutex_unlock(&worker_pool_idr_mutex); >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89 return ret < 0 ? ret : 0; } We want locking from the former and idr_alloc() usage from the latter, which can be combined to the following. static int worker_pool_assign_id(struct worker_pool pool) { int ret; lockdep_assert_held(&wq_pool_mutex); ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL); if (ret >= 0) { pool->id = ret; return 0; } return ret; } * `eb2834285c` ("workqueue: fix possible pool stall bug in wq_unbind_fn()") updated wq_unbind_fn() such that it has single larger for_each_std_worker_pool() loop instead of two separate loops with a schedule() call inbetween. wq/for-3.10 renamed pool->assoc_mutex to pool->manager_mutex causing the following conflict (earlier function body and comments omitted for brevity). static void wq_unbind_fn(struct work_struct work) { ... spin_unlock_irq(&pool->lock); <<<<<<< HEAD mutex_unlock(&pool->manager_mutex); } ======= mutex_unlock(&pool->assoc_mutex); >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89 schedule(); <<<<<<< HEAD for_each_cpu_worker_pool(pool, cpu) ======= >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89 atomic_set(&pool->nr_running, 0); spin_lock_irq(&pool->lock); wake_up_worker(pool); spin_unlock_irq(&pool->lock); } } The resolution is mostly trivial. We want the control flow of the latter with the rename of the former. static void wq_unbind_fn(struct work_struct work) { ... spin_unlock_irq(&pool->lock); mutex_unlock(&pool->manager_mutex); schedule(); atomic_set(&pool->nr_running, 0); spin_lock_irq(&pool->lock); wake_up_worker(pool); spin_unlock_irq(&pool->lock); } } Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-01 18:45:36 -07:00
Tejun Heo	d55262c4d1	workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity Unbound workqueues are now NUMA aware. Let's add some control knobs and update sysfs interface accordingly. * Add kernel param workqueue.numa_disable which disables NUMA affinity globally. * Replace sysfs file "pool_id" with "pool_ids" which contain node:pool_id pairs. This change is userland-visible but "pool_id" hasn't seen a release yet, so this is okay. * Add a new sysf files "numa" which can toggle NUMA affinity on individual workqueues. This is implemented as attrs->no_numa whichn is special in that it isn't part of a pool's attributes. It only affects how apply_workqueue_attrs() picks which pools to use. After "pool_ids" change, first_pwq() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:38 -07:00
Tejun Heo	4c16bd327c	workqueue: implement NUMA affinity for unbound workqueues Currently, an unbound workqueue has single current, or first, pwq (pool_workqueue) to which all new work items are queued. This often isn't optimal on NUMA machines as workers may jump around across node boundaries and work items get assigned to workers without any regard to NUMA affinity. This patch implements NUMA affinity for unbound workqueues. Instead of mapping all entries of numa_pwq_tbl[] to the same pwq, apply_workqueue_attrs() now creates a separate pwq covering the intersecting CPUs for each NUMA node which has online CPUs in @attrs->cpumask. Nodes which don't have intersecting possible CPUs are mapped to pwqs covering whole @attrs->cpumask. As CPUs come up and go down, the pool association is changed accordingly. Changing pool association may involve allocating new pools which may fail. To avoid failing CPU_DOWN, each workqueue always keeps a default pwq which covers whole attrs->cpumask which is used as fallback if pool creation fails during a CPU hotplug operation. This ensures that all work items issued on a NUMA node is executed on the same node as long as the workqueue allows execution on the CPUs of the node. As this maps a workqueue to multiple pwqs and max_active is per-pwq, this change the behavior of max_active. The limit is now per NUMA node instead of global. While this is an actual change, max_active is already per-cpu for per-cpu workqueues and primarily used as safety mechanism rather than for active concurrency control. Concurrency is usually limited from workqueue users by the number of concurrently active work items and this change shouldn't matter much. v2: Fixed pwq freeing in apply_workqueue_attrs() error path. Spotted by Lai. v3: The previous version incorrectly made a workqueue spanning multiple nodes spread work items over all online CPUs when some of its nodes don't have any desired cpus. Reimplemented so that NUMA affinity is properly updated as CPUs go up and down. This problem was spotted by Lai Jiangshan. v4: destroy_workqueue() was putting wq->dfl_pwq and then clearing it; however, wq may be freed at any time after dfl_pwq is put making the clearing use-after-free. Clear wq->dfl_pwq before putting it. v5: apply_workqueue_attrs() was leaking @tmp_attrs, @new_attrs and @pwq_tbl after success. Fixed. Retry loop in wq_update_unbound_numa_attrs() isn't necessary as application of new attrs is excluded via CPU hotplug. Removed. Documentation on CPU affinity guarantee on CPU_DOWN added. All changes are suggested by Lai Jiangshan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:36 -07:00
Tejun Heo	dce90d47c4	workqueue: introduce put_pwq_unlocked() Factor out lock pool, put_pwq(), unlock sequence into put_pwq_unlocked(). The two existing places are converted and there will be more with NUMA affinity support. This is to prepare for NUMA affinity support for unbound workqueues and doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	1befcf3073	workqueue: introduce numa_pwq_tbl_install() Factor out pool_workqueue linking and installation into numa_pwq_tbl[] from apply_workqueue_attrs() into numa_pwq_tbl_install(). link_pwq() is made safe to call multiple times. numa_pwq_tbl_install() links the pwq, installs it into numa_pwq_tbl[] at the specified node and returns the old entry. @last_pwq is removed from link_pwq() as the return value of the new function can be used instead. This is to prepare for NUMA affinity support for unbound workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	e50aba9aea	workqueue: use NUMA-aware allocation for pool_workqueues Use kmem_cache_alloc_node() with @pool->node instead of kmem_cache_zalloc() when allocating a pool_workqueue so that it's allocated on the same node as the associated worker_pool. As there's no no kmem_cache_zalloc_node(), move zeroing to init_pwq(). This was suggested by Lai Jiangshan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	f147f29eb7	workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() Break init_and_link_pwq() into init_pwq() and link_pwq() and move unbound-workqueue specific handling into apply_workqueue_attrs(). Also, factor out unbound pool and pool_workqueue allocation into alloc_unbound_pwq(). This reorganization is to prepare for NUMA affinity and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	df2d5ae499	workqueue: map an unbound workqueues to multiple per-node pool_workqueues Currently, an unbound workqueue has only one "current" pool_workqueue associated with it. It may have multple pool_workqueues but only the first pool_workqueue servies new work items. For NUMA affinity, we want to change this so that there are multiple current pool_workqueues serving different NUMA nodes. Introduce workqueue->numa_pwq_tbl[] which is indexed by NUMA node and points to the pool_workqueue to use for each possible node. This replaces first_pwq() in __queue_work() and workqueue_congested(). numa_pwq_tbl[] is currently initialized to point to the same pool_workqueue as first_pwq() so this patch doesn't make any behavior changes. v2: Use rcu_dereference_raw() in unbound_pwq_by_node() as the function may be called only with wq->mutex held. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	2728fd2f09	workqueue: move hot fields of workqueue_struct to the end Move wq->flags and ->cpu_pwqs to the end of workqueue_struct and align them to the cacheline. These two fields are used in the work item issue path and thus hot. The scheduled NUMA affinity support will add dispatch table at the end of workqueue_struct and relocating these two fields will allow us hitting only single cacheline on hot paths. Note that wq->pwqs isn't moved although it currently is being used in the work item issue path for unbound workqueues. The dispatch table mentioned above will replace its use in the issue path, so it will become cold once NUMA support is implemented. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	ecf6881ff3	workqueue: make workqueue->name[] fixed len Currently workqueue->name[] is of flexible length. We want to use the flexible field for something more useful and there isn't much benefit in allowing arbitrary name length anyway. Make it fixed len capping at 24 bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:34 -07:00
Tejun Heo	6029a91829	workqueue: add workqueue->unbound_attrs Currently, when exposing attrs of an unbound workqueue via sysfs, the workqueue_attrs of first_pwq() is used as that should equal the current state of the workqueue. The planned NUMA affinity support will make unbound workqueues make use of multiple pool_workqueues for different NUMA nodes and the above assumption will no longer hold. Introduce workqueue->unbound_attrs which records the current attrs in effect and use it for sysfs instead of first_pwq()->attrs. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:34 -07:00
Tejun Heo	f3f90ad469	workqueue: determine NUMA node of workers accourding to the allowed cpumask When worker tasks are created using kthread_create_on_node(), currently only per-cpu ones have the matching NUMA node specified. All unbound workers are always created with NUMA_NO_NODE. Now that an unbound worker pool may have an arbitrary cpumask associated with it, this isn't optimal. Add pool->node which is determined by the pool's cpumask. If the pool's cpumask is contained inside a NUMA node proper, the pool is associated with that node, and all workers of the pool are created on that node. This currently only makes difference for unbound worker pools with cpumask contained inside single NUMA node, but this will serve as foundation for making all unbound pools NUMA-affine. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:34 -07:00

1 2 3 4 5 ...

453 Commits