Commit Graph

379806 Commits

Author SHA1 Message Date
Davidlohr Bueso 3079c1e6ef ipc,mqueue: remove limits for the amount of system-wide queues
commit f3713fd9cff733d9df83116422d8e4af6e86b2bb upstream.

Commit 93e6f119c0 ("ipc/mqueue: cleanup definition names and
locations") added global hardcoded limits to the amount of message
queues that can be created.  While these limits are per-namespace,
reality is that it ends up breaking userspace applications.
Historically users have, at least in theory, been able to create up to
INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
for some workloads and use cases.  For instance, Madars reports:

 "This update imposes bad limits on our multi-process application.  As
  our app uses approaches that each process opens its own set of queues
  (usually something about 3-5 queues per process).  In some scenarios
  we might run up to 3000 processes or more (which of-course for linux
  is not a problem).  Thus we might need up to 9000 queues or more.  All
  processes run under one user."

Other affected users can be found in launchpad bug #1155695:
  https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695

Instead of increasing this limit, revert it entirely and fallback to the
original way of dealing queue limits -- where once a user's resource
limit is reached, and all memory is used, new queues cannot be created.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Madars Vitolins <m@silodev.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:12 -08:00
Jan Kara 4480120a6c quota: Fix race between dqput() and dquot_scan_active()
commit 1362f4ea20fa63688ba6026e586d9746ff13a846 upstream.

Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:

CPU1					CPU2
  dqput()
    spin_lock(&dq_list_lock);
    if (atomic_read(&dquot->dq_count) > 1) {
     - not taken
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
      spin_unlock(&dq_list_lock);
      ->release_dquot(dquot);
        if (atomic_read(&dquot->dq_count) > 1)
         - not taken
					  dquot_scan_active()
					    spin_lock(&dq_list_lock);
					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
					     - not taken
					    atomic_inc(&dquot->dq_count);
					    spin_unlock(&dq_list_lock);
        - proceeds to release dquot
					    ret = fn(dquot, priv);
					     - called for inactive dquot

Fix the problem by making sure possible ->release_dquot() is finished by
the time we call the callback and new calls to it will notice reference
dquot_scan_active() has taken and bail out.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:12 -08:00
Eric Paris 3e66969eab SELinux: bigendian problems with filename trans rules
commit 9085a6422900092886da8c404e1c5340c4ff1cbf upstream.

When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian.  On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu.  So the values are all screwed up.  Write the values in le
format like it should have been to start.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:12 -08:00
Max Filippov c9f87229e1 xtensa: introduce spill_registers_kernel macro
commit e2fd1374c705abe4661df3fb6fadb3879c7c1846 upstream.

Most in-kernel users want registers spilled on the kernel stack and
don't require PS.EXCM to be set. That means that they don't need fixup
routine and could reuse regular window overflow mechanism for that,
which makes spill routine very simple.

Suggested-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Takashi Iwai f5556a68da ALSA: hda - Add a fixup for HP Folio 13 mute LED
commit 37c367ecdb9a01c9acc980e6e17913570a1788a7 upstream.

HP Folio 13 may have a broken BIOS that doesn't set up the mute LED
GPIO properly, and the driver guesses it wrongly, too.  Add a new
fixup entry for setting the GPIO pin statically for this laptop.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70991
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Peter Zijlstra 35d1c83324 perf: Fix hotplug splat
commit e3703f8cdfcf39c25c4338c3ad8e68891cca3731 upstream.

Drew Richardson reported that he could make the kernel go *boom* when hotplugging
while having perf events active.

It turned out that when you have a group event, the code in
__perf_event_exit_context() fails to remove the group siblings from
the context.

We then proceed with destroying and freeing the event, and when you
re-plug the CPU and try and add another event to that CPU, things go
*boom* because you've still got dead entries there.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Denis CIOCCA 55c830e876 iio:gyro: bug on L3GD20H gyroscope support
commit a0657716416f834ef7710a9044614d50a36c3bdc upstream.

The driver was not able to manage the sensor: during probe function
and wai check, the driver stops and writes: "device name and WhoAmI mismatch."
The correct value of L3GD20H wai is 0xd7 instead of 0xd4.
Dropped support for the sensor.

Signed-off-by: Denis Ciocca <denis.ciocca@st.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Arve Hjønnevåg b3a59abff4 staging: binder: Fix death notifications
commit e194fd8a5d8e0a7eeed239a8534460724b62fe2d upstream.

The change (008fa749e0) that moved the
node release code to a separate function broke death notifications in
some cases. When it encountered a reference without a death
notification request, it would skip looking at the remaining
references, and therefore fail to send death notifications for them.

Cc: Colin Cross <ccross@android.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Lai Jiangshan 4403be9e25 workqueue: ensure @task is valid across kthread_stop()
commit 5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb upstream.

When a kworker should die, the kworkre is notified through WORKER_DIE
flag instead of kthread_should_stop().  This, IIRC, is primarily to
keep the test synchronized inside worker_pool lock.  WORKER_DIE is
first set while holding pool->lock, the lock is dropped and
kthread_stop() is called.

Unfortunately, this means that there's a slight chance that the target
kworker may see WORKER_DIE before kthread_stop() finishes and exits
and frees the target task before or during kthread_stop().

Fix it by pinning the target task before setting WORKER_DIE and
putting it after kthread_stop() is done.

tj: Improved patch description and comment.  Moved pinning above
    WORKER_DIE for better signify what it's protecting.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Guenter Roeck 755ac7af96 hwmon: (max1668) Fix writing the minimum temperature
commit 500a91571f0a5d0d3242d83802ea2fd1faccc66e upstream.

When trying to set the minimum temperature, the driver was erroneously
writing the maximum temperature into the chip.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:11 -08:00
Chao Bi 2bc744aa8b mei: set client's read_cb to NULL when flow control fails
commit accb884b32e82f943340688c9cd30290531e73e0 upstream.

In mei_cl_read_start(), if it fails to send flow control request, it
will release "cl->read_cb" but forget to set pointer to NULL, leaving
"cl->read_cb" still pointing to random memory, next time this client is
operated like mei_release(), it has chance to refer to this wrong pointer.

Fixes:  PANIC at kfree in mei_release()

[228781.826904] Call Trace:
[228781.829737]  [<c16249b8>] ? mei_cl_unlink+0x48/0xa0
[228781.835283]  [<c1624487>] mei_io_cb_free+0x17/0x30
[228781.840733]  [<c16265d8>] mei_release+0xa8/0x180
[228781.845989]  [<c135c610>] ? __fsnotify_parent+0xa0/0xf0
[228781.851925]  [<c1325a69>] __fput+0xd9/0x200
[228781.856696]  [<c1325b9d>] ____fput+0xd/0x10
[228781.861467]  [<c125cae1>] task_work_run+0x81/0xb0
[228781.866821]  [<c1242e53>] do_exit+0x283/0xa00
[228781.871786]  [<c1a82b36>] ? kprobe_flush_task+0x66/0xc0
[228781.877722]  [<c124eeb8>] ? __dequeue_signal+0x18/0x1a0
[228781.883657]  [<c124f072>] ? dequeue_signal+0x32/0x190
[228781.889397]  [<c1243744>] do_group_exit+0x34/0xa0
[228781.894750]  [<c12517b6>] get_signal_to_deliver+0x206/0x610
[228781.901075]  [<c12018d8>] do_signal+0x38/0x100
[228781.906136]  [<c1626d1c>] ? mei_read+0x42c/0x4e0
[228781.911393]  [<c12600a0>] ? wake_up_bit+0x30/0x30
[228781.916745]  [<c16268f0>] ? mei_poll+0x120/0x120
[228781.922001]  [<c1324be9>] ? vfs_read+0x89/0x160
[228781.927158]  [<c16268f0>] ? mei_poll+0x120/0x120
[228781.932414]  [<c133ca34>] ? fget_light+0x44/0xe0
[228781.937670]  [<c1324e58>] ? SyS_read+0x68/0x80
[228781.942730]  [<c12019f5>] do_notify_resume+0x55/0x70
[228781.948376]  [<c1a7de5d>] work_notifysig+0x29/0x30
[228781.953827]  [<c1a70000>] ? bad_area+0x5/0x3e

Signed-off-by: Chao Bi <chao.bi@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Joerg Dorchain a4d9fb6ef9 USB: ftdi_sio: add Cressi Leonardo PID
commit 6dbd46c849e071e6afc1e0cad489b0175bca9318 upstream.

Hello,

the following patch adds an entry for the PID of a Cressi Leonardo
diving computer interface to kernel 3.13.0.
It is detected as FT232RL.
Works with subsurface.

Signed-off-by: Joerg Dorchain <joerg@dorchain.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Stanislaw Gruszka 2efc229a0e usb: ehci: fix deadlock when threadirqs option is used
commit a1227f3c1030e96ebc51d677d2f636268845c5fb upstream.

ehci_irq() and ehci_hrtimer_func() can deadlock on ehci->lock when
threadirqs option is used. To prevent the deadlock use
spin_lock_irqsave() in ehci_irq().

This change can be reverted when hrtimer callbacks become threaded.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Aleksander Morgado 9021ee52db USB: serial: option: blacklist interface 4 for Cinterion PHS8 and PXS8
commit 12df84d4a80278a5b1abfec3206795291da52fc9 upstream.

This interface is to be handled by the qmi_wwan driver.

CC: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
CC: Christian Schmiedl <christian.schmiedl@gemalto.com>
CC: Nicolaus Colberg <nicolaus.colberg@gemalto.com>
CC: David McCullough <david.mccullough@accelecon.com>
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Florian Fainelli a4c937c5b0 usb: gadget: bcm63xx_udc: fix build failure on DMA channel code
commit 2d1f7af3d60dd09794e0738a915d272c6c27abc5 upstream.

Commit 3dc6475 ("bcm63xx_enet: add support Broadcom BCM6345 Ethernet")
changed the ENETDMA[CS] macros such that they are no longer macros, but
actual register offset definitions. The bcm63xx_udc driver was not
updated, and as a result, causes the following build error to pop up:

 CC      drivers/usb/gadget/u_ether.o
drivers/usb/gadget/bcm63xx_udc.c: In function 'iudma_write':
drivers/usb/gadget/bcm63xx_udc.c:642:24: error: called object '0' is not
a function
drivers/usb/gadget/bcm63xx_udc.c: In function 'iudma_reset_channel':
drivers/usb/gadget/bcm63xx_udc.c:698:46: error: called object '0' is not
a function
drivers/usb/gadget/bcm63xx_udc.c:700:49: error: called object '0' is not
a function

Fix this by updating usb_dmac_{read,write}l and usb_dmas_{read,write}l to
take an extra channel argument, and use the channel width
(ENETDMA_CHAN_WIDTH) to offset the register we want to access, hence
doing again what the macro implicitely did for us.

Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Matthieu CASTET 4ebd089823 usb: chipidea: need to mask when writting endptflush and endptprime
commit 5bf5dbeda2454296f1984adfbfc8e6f5965ac389 upstream.

ENDPTFLUSH and ENDPTPRIME registers are set by software and clear
by hardware. There is a bit for each endpoint. When we are setting
a bit for an endpoint we should make sure we do not touch other
endpoint bit. There is a race condition if the hardware clear the
bit between the read and the write in hw_write.

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Matthieu CASTET <matthieu.castet@parrot.com>
Tested-by: Michael Grzeschik <mgrzeschik@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Olivier Sobrie 2551dadbe3 can: kvaser_usb: check number of channels returned by HW
commit 862474f8b46f6c1e600d4934e40ba40646c696ec upstream.

It is needed to check the number of channels returned by the HW because it
cannot be greater than MAX_NET_DEVICES otherwise it will crash.

Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:10 -08:00
Lan Tianyu 44ae49aa88 ACPI / processor: Rework processor throttling with work_on_cpu()
commit f3ca4164529b875374c410193bbbac0ee960895f upstream.

acpi_processor_set_throttling() uses set_cpus_allowed_ptr() to make
sure that the (struct acpi_processor)->acpi_processor_set_throttling()
callback will run on the right CPU.  However, the function may be
called from a worker thread already bound to a different CPU in which
case that won't work.

Make acpi_processor_set_throttling() use work_on_cpu() as appropriate
instead of abusing set_cpus_allowed_ptr().

Reported-and-tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Hans de Goede 3cb947fdcf ACPI / video: Filter the _BCL table for duplicate brightness values
commit bd8ba20597f0cfef3ef65c3fd2aa92ab23d4c8e1 upstream.

Some devices have duplicate entries in there brightness levels table, ie
on my Dell Latitude E6430 the table looks like this:

[    3.686060] acpi backlight index   0, val 80
[    3.686095] acpi backlight index   1, val 50
[    3.686122] acpi backlight index   2, val 5
[    3.686147] acpi backlight index   3, val 5
[    3.686172] acpi backlight index   4, val 5
[    3.686197] acpi backlight index   5, val 5
[    3.686223] acpi backlight index   6, val 5
[    3.686248] acpi backlight index   7, val 5
[    3.686273] acpi backlight index   8, val 6
[    3.686332] acpi backlight index   9, val 7
[    3.686356] acpi backlight index  10, val 8
[    3.686380] acpi backlight index  11, val 9
etc.

Notice that brightness values 0-5 are all mapped to 5. This means that
if userspace writes any value between 0 and 5 to the brightness sysfs attribute
and then reads it, it will always return 0, which is somewhat unexpected.

This is a problem for ie gnome-settings-daemon, which uses read-modify-write
logic when the users presses the brightness up or down keys. This is done
this way to take brightness changes from other sources into account.

On this specific laptop what happens once the brightness has been set to 0,
is that gsd reads 0, adds 5, writes 5, and on the next brightness up key press
again reads 0, so things get stuck at the lowest brightness setting.

Filtering out the duplicate table entries, makes any write to brightness
read back as the written value as one would expect, fixing this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Jean Delvare 6b96200e1d i7core_edac: Fix PCI device reference count
commit c0f5eeed0f4cef4f05b74883a7160e7edde58b6a upstream.

The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.

In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.

This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Tomasz Nowicki 63b5b009bd ACPI / PCI: Fix memory leak in acpi_pci_irq_enable()
commit b685f3b1744061aa9ad822548ba9c674de5be7c6 upstream.

acpi_pci_link_allocate_irq() can return negative gsi even if
entry != NULL.  For that case we have a memory leak, so free
entry before returning from acpi_pci_irq_enable() for gsi < 0.

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Bjorn Helgaas 9942fc0221 PCI: Enable INTx if BIOS left them disabled
commit 1f42db786b14a31bf807fc41ee5583a00c08fcb1 upstream.

Some firmware leaves the Interrupt Disable bit set even if the device uses
INTx interrupts.  Clear Interrupt Disable so we get those interrupts.

Based on the report mentioned below, if the user selects the "EHCI only"
option in the Intel Baytrail BIOS, the EHCI device is handed off to the OS
with the PCI_COMMAND_INTX_DISABLE bit set.

Link: http://lkml.kernel.org/r/20140114181721.GC12126@xanatos
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70601
Reported-by: Chris Cheng <chris.cheng@atrustcorp.com>
Reported-and-tested-by: Jamie Chen <jamie.chen@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Srivatsa S. Bhat 1bcccca64c cpufreq: powernow-k8: Initialize per-cpu data-structures properly
commit c3274763bfc3bf1ececa269ed6e6c4d7ec1c3e5e upstream.

The powernow-k8 driver maintains a per-cpu data-structure called
powernow_data that is used to perform the frequency transitions.
It initializes this data structure only for the policy->cpu. So,
accesses to this data structure by other CPUs results in various
problems because they would have been uninitialized.

Specifically, if a cpu (!= policy->cpu) invokes the drivers' ->get()
function, it returns 0 as the KHz value, since its per-cpu memory
doesn't point to anything valid. This causes problems during
suspend/resume since cpufreq_update_policy() tries to enforce this
(0 KHz) as the current frequency of the CPU, and this madness gets
propagated to adjust_jiffies() as well. Eventually, lots of things
start breaking down, including the r8169 ethernet card, in one
particularly interesting case reported by Pierre Ossman.

Fix this by initializing the per-cpu data-structures of all the CPUs
in the policy appropriately.

References: https://bugzilla.kernel.org/show_bug.cgi?id=70311
Reported-by: Pierre Ossman <pierre@ossman.eu>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Tejun Heo 599b45bd7f sata_sil: apply MOD15WRITE quirk to TOSHIBA MK2561GSYN
commit 9f9c47f00ce99329b1a82e2ac4f70f0fe3db549c upstream.

It's a bit odd to see a newer device showing mod15write; however, the
reported behavior is highly consistent and other factors which could
contribute seem to have been verified well enough.  Also, both
sata_sil itself and the drive are fairly outdated at this point making
the risk of this change fairly low.  It is possible, probably likely,
that other drive models in the same family have the same problem;
however, for now, let's just add the specific model which was tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: matson <lists-matsonpa@luxsci.me>
References: http://lkml.kernel.org/g/201401211912.s0LJCk7F015058@rs103.luxsci.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:09 -08:00
Denis V. Lunev 69f554c94a ata: enable quirk from jmicron JMB350 for JMB394
commit efb9e0f4f43780f0ae0c6428d66bd03e805c7539 upstream.

Without the patch the kernel generates the following error.

 ata11.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
 ata11.15: Port Multiplier vendor mismatch '0x197b' != '0x123'
 ata11.15: PMP revalidation failed (errno=-19)
 ata11.15: failed to recover PMP after 5 tries, giving up

This patch helps to bypass this error and the device becomes
functional.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: <linux-ide@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Peter Zijlstra f84d45345a perf/x86: Fix event scheduling
commit 26e61e8939b1fe8729572dabe9a9e97d930dd4f6 upstream.

Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.

This is I think the relevant bit:

	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]).

At this point we should have:

  n_events = 2, n_added = 1, n_txn = 1

	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.

	group_sched_in()
	  pmu->start_txn() /* nop - BP pmu */
	  event_sched_in()
	     event->pmu->add()

So here we should end up with:

  0: n_events = 3, n_added = 2, n_txn = 2
  4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.

Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.

But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have
succeeded!  Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:

	event_sched_out()
	  event->pmu->del()

on 0 and the BP event.

Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:

 n_event = 2, n_added = 2, n_txn = 2

	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Marek Szyprowski a9710605da x86: dma-mapping: fix GFP_ATOMIC macro usage
commit c091c71ad2218fc50a07b3d1dab85783f3b77efd upstream.

GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other
flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller
wants to perform an atomic allocation, the code must test for a lack of the
__GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Levente Kurusa aa5b8c4513 ahci: disable NCQ on Samsung pci-e SSDs on macbooks
commit 67809f85d31eac600f6b28defa5386c9d2a13b1d upstream.

Samsung's pci-e SSDs with device ID 0x1600 which are found on some
macbooks time out on NCQ commands.  Blacklist NCQ on the device so
that the affected machines can at least boot.

Original-patch-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60731
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Laurent Dufour fb22dbab12 powerpc/crashdump : Fix page frame number check in copy_oldmem_page
commit f5295bd8ea8a65dc5eac608b151386314cb978f1 upstream.

In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
decide if the page is backed or not, is not valid when the memory layout is
not continuous.

This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
in the memory. In that case max_pfn points to the end of RTAS, and a hole
between the end of the kdump kernel and RTAS is not backed by PTEs. As a
consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
in a direct way the pages in that hole.

This fix relies on the memblock's service memblock_is_region_memory to
check if the read page is part or not of the directly accessible memory.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Tony Breeds 64747d3d2f powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
commit 41dd03a94c7d408d2ef32530545097f7d1befe5c upstream.

Currently we're storing a host endian RTAS token in
rtas_stop_self_args.token.  We then pass that directly to rtas.  This is
fine on big endian however on little endian the token is not what we
expect.

This will typically result in hitting:
	panic("Alas, I survived.\n");

To fix this we always use the stop-self token in host order and always
convert it to be32 before passing this to rtas.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Trond Myklebust 830d5bd83a SUNRPC: Fix races in xs_nospace()
commit 06ea0bfe6e6043cb56a78935a19f6f8ebc636226 upstream.

When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.

Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52ee1 (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:08 -08:00
Lars-Peter Clausen 6319b13bb7 ASoC: wm8958-dsp: Fix firmware block loading
commit 548da08fc1e245faf9b0d7c41ecd8e07984fc332 upstream.

The codec->control_data contains a pointer to the device's regmap struct. But
wm8994_bulk_write() expects a pointer to the parent wm8998 device.

The issue was introduced in commit d9a7666f ("ASoC: Remove ASoC-specific
WM8994 I/O code").

Fixes: d9a7666f ("ASoC: Remove ASoC-specific WM8994 I/O code")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:07 -08:00
Takashi Iwai 76a94d6a33 ASoC: sta32x: Fix array access overflow
commit 025c3fa9256d4c54506b7a29dc3befac54f5c68d upstream.

Preset EQ enum of sta32x codec driver declares too many number of
items and it may lead to the access over the actual array size.

Use SOC_ENUM_SINGLE_DECL() helper and it's automatically fixed.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:07 -08:00
Takashi Iwai 27b5a374da ASoC: sta32x: Fix wrong enum for limiter2 release rate
commit b3619b288b621e63f66908045f48495869a996a6 upstream.

There is a typo in the Limiter2 Release Rate control, a wrong enum for
Limiter1 is assigned.  It must point to Limiter2.
Spotted by a compile warning:

In file included from sound/soc/codecs/sta32x.c:34:0:
sound/soc/codecs/sta32x.c:223:29: warning: ‘sta32x_limiter2_release_rate_enum’ defined but not used [-Wunused-variable]
 static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
                             ^
include/sound/soc.h:275:18: note: in definition of macro ‘SOC_ENUM_DOUBLE_DECL’
  struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
                  ^
sound/soc/codecs/sta32x.c:223:8: note: in expansion of macro ‘SOC_ENUM_SINGLE_DECL’
 static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
        ^

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:07 -08:00
Lars-Peter Clausen df87c8141c ASoC: sta32x: Fix cache sync
commit 70ff00f82a6af0ff68f8f7b411738634ce2f20d0 upstream.

codec->control_data contains a pointer to the regmap struct of the device, not
to the device private data. Use snd_soc_codec_get_drvdata() instead.

The issue was introduced in commit 29fdf4fbbe ("ASoC: sta32x: Convert to
regmap").

Fixes: 29fdf4fbbe (ASoC: sta32x: Convert to regmap)
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:07 -08:00
Mark Brown 8efd034580 ASoC: da732x: Mark DC offset control registers volatile
commit 75306820248e26d15d84acf4e297b9fb27dd3bb2 upstream.

The driver reads from the DC offset control registers during callibration
but since the registers are marked as volatile and there is a register
cache the values will not be read from the hardware after the first reading
rendering the callibration ineffective.

It appears that the driver was originally written for the ASoC level
register I/O code but converted to regmap prior to merge and this issue
was missed during the conversion as the framework level volatile register
functionality was not being used.

Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:07 -08:00
Takashi Iwai 1276754ca6 ASoC: wm8770: Fix wrong number of enum items
commit 7a6c0a58dc824523966f212c76322d47c5b0e6fe upstream.

wm8770 codec driver defines ain_enum with a wrong number of items.

Use SOC_ENUM_DOUBLE_DECL() macro and it's automatically fixed.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:07 -08:00
Dylan Reid ecbda0477f ASoC: max98090: sync regcache on entering STANDBY
commit c42c8922c46d33ed769e99618bdfba06866a0c72 upstream.

Sync regcache when entering STANDBY from OFF.  ON isn't entered with
OFF as the current state, so the registers were not being re-synced
after suspend/resume.

The 98088 and 98095 already call regcache_sync from STANDBY.

Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Andrew Honig 5a03bc087c kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
commit a08d3b3b99efd509133946056531cdf8f3a0c09b upstream.

The problem occurs when the guest performs a pusha with the stack
address pointing to an mmio address (or an invalid guest physical
address) to start with, but then extending into an ordinary guest
physical address.  When doing repeated emulated pushes
emulator_read_write sets mmio_needed to 1 on the first one.  On a
later push when the stack points to regular memory,
mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.

As a result, KVM exits to userspace, and then returns to
complete_emulated_mmio.  In complete_emulated_mmio
vcpu->mmio_cur_fragment is incremented.  The termination condition of
vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
The code bounces back and fourth to userspace incrementing
mmio_cur_fragment past it's buffer.  If the guest does nothing else it
eventually leads to a a crash on a memcpy from invalid memory address.

However if a guest code can cause the vm to be destroyed in another
vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
can be used by the guest to control the data that's pointed to by the
call to cancel_work_item, which can be used to gain execution.

Fixes: f78146b0f9
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Hui Wang 80c2450409 ALSA: hda - Enable front audio jacks on one HP desktop model
commit 1de7ca5e844866f56bebb2fc47fa18e090677e88 upstream.

The front headphone and mic jackes on a HP desktop model (Vendor Id:
0x111d76c7 Subsystem Id: 0x103c2b17) can not work, the codec on this
machine has 8 physical ports, 6 of them are routed to rear jackes
and all of them work very well, while the remaining 2 ports are
routed to front headphone and mic jackes, but the corresponding
pin complex node are not defined correctly.

After apply this fix, the front audio jackes can work very well.

[trivial fix of enum definition by tiwai]

BugLink: https://bugs.launchpad.net/bugs/1282369
Cc: David Henningsson <david.henningsson@canonical.com>
Tested-by: Gerald Yang <gerald.yang@canonical.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Hsin-Yu Chao 3d92e99d70 ALSA: hda/ca0132 - Fix recording from mode id 0x8
commit 13c12dbe3a2ce17227f7ddef652b6a53c78fa51f upstream.

Incorrect ADC is picked in ca0132_capture_pcm_prepare(),
where it assumes multiple streams while there is one stream
per ADC. Note that ca0132_capture_pcm_cleanup() already does
the right thing.

The Chromebook Pixel has a microphone under the keyboard that
is attached to node id 0x8. Before this fix, recording would
always go to the main internal mic (node id 0x7).

Signed-off-by: Hsin-Yu Chao <hychao@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Hsin-Yu Chao 6e52b4fd38 ALSA: hda/ca0132 - setup/cleanup streams
commit 28fba95087a7f3d107a3a6728aef7dbfaf3fd782 upstream.

When a HDMI stream is opened with the same stream tag
as a following opened stream to ca0132, audio will be
heard from two ports simultaneously.
Fix this issue by change to use snd_hda_codec_setup_stream
and snd_hda_codec_cleanup_stream instead, so that an
inactive stream can be marked as 'dirty' when found
with a conflict stream tag, and then get purified.

Signed-off-by: Hsin-Yu Chao <hychao@chromium.org>
Reviewed-by: Chih-Chung Chang <chihchung@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Clemens Ladisch 19ee64e6c2 ALSA: usb-audio: work around KEF X300A firmware bug
commit 624aef494f86ed0c58056361c06347ad62b26806 upstream.

When the driver tries to access Function Unit 10, the KEF X300A
speakers' firmware apparently locks up, making even PCM streaming
impossible.  Work around this by ignoring this FU.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Christoph Hellwig d5685be133 fs: fix iversion handling
commit dff6efc326a4d5f305797d4a6bba14f374fdd633 upstream.

Currently notify_change directly updates i_version for size updates,
which not only is counter to how all other fields are updated through
struct iattr, but also breaks XFS, which need inode updates to happen
under its own lock, and synchronized to the structure that gets written
to the log.

Remove the update in the common code, and it to btrfs and ext4,
XFS already does a proper updaste internally and currently gets a
double update with the existing code.

IMHO this is 3.13 and -stable material and should go in through the XFS
tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:06 -08:00
Michal Hocko 5fb67b91df memcg: fix endless loop caused by mem_cgroup_iter
commit ecc736fc3c71c411a9d201d8588c9e7e049e5d8c upstream.

Hugh has reported an endless loop when the hardlimit reclaim sees the
same group all the time.  This might happen when the reclaim races with
the memcg removal.

shrink_zone
                                                [rmdir root]
  mem_cgroup_iter(root, NULL, reclaim)
    // prev = NULL
    rcu_read_lock()
    mem_cgroup_iter_load
      last_visited = iter->last_visited   // gets root || NULL
      css_tryget(last_visited)            // failed
      last_visited = NULL                 [1]
    memcg = root = __mem_cgroup_iter_next(root, NULL)
    mem_cgroup_iter_update
      iter->last_visited = root;
    reclaim->generation = iter->generation

 mem_cgroup_iter(root, root, reclaim)
   // prev = root
   rcu_read_lock
    mem_cgroup_iter_load
      last_visited = iter->last_visited   // gets root
      css_tryget(last_visited)            // failed
    [1]

The issue seemed to be introduced by commit 5f57816197 ("memcg: relax
memcg iter caching") which has replaced unconditional css_get/css_put by
css_tryget/css_put for the cached iterator.

This patch fixes the issue by skipping css_tryget on the root of the
tree walk in mem_cgroup_iter_load and symmetrically doesn't release it
in mem_cgroup_iter_update.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org>	[3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:05 -08:00
Eric Dumazet a9e3d78962 net: use __GFP_NORETRY for high order allocations
[ Upstream commit ed98df3361f059db42786c830ea96e2d18b8d4db ]

sock_alloc_send_pskb() & sk_page_frag_refill()
have a loop trying high order allocations to prepare
skb with low number of fragments as this increases performance.

Problem is that under memory pressure/fragmentation, this can
trigger OOM while the intent was only to try the high order
allocations, then fallback to order-0 allocations.

We had various reports from unexpected regressions.

According to David, setting __GFP_NORETRY should be fine,
as the asynchronous compaction is still enabled, and this
will prevent OOM from kicking as in :

CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0,
oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled
CFSClientEventm

Call Trace:
 [<ffffffff8043766c>] dump_header+0xe1/0x23e
 [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
 [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
 [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
 [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
 [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
 [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
 [<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0
 [<ffffffff802a5037>] inet_sendmsg+0x67/0x100
 [<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90
 [<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0
 [<ffffffff802847b6>] __sys_sendmsg+0x136/0x430
 [<ffffffff80284ec8>] sys_sendmsg+0x88/0x110
 [<ffffffff80711472>] system_call_fastpath+0x16/0x1b
Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:05 -08:00
Florian Westphal d868190cc2 net: ip, ipv6: handle gso skbs in forwarding path
commit fe6cc55f3a9a053482a76f5a6b2257cee51b4663 upstream.

Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:05 -08:00
Florian Westphal a999dd5c18 net: core: introduce netif_skb_dev_features
commit d206940319c41df4299db75ed56142177bb2e5f6 upstream.

Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:05 -08:00
Florian Westphal 3fb03b59b4 net: add and use skb_gso_transport_seglen()
commit de960aa9ab4decc3304959f69533eef64d05d8e8 upstream.

[ no skb_gso_seglen helper in 3.10, leave tbf alone ]

This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by upcoming ip forwarding path patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:05 -08:00
Daniel Borkmann b306713b63 net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode
[ Upstream commit ffd5939381c609056b33b7585fb05a77b4c695f3 ]

SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit
emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of
'struct sctp_getaddrs_old' which includes a struct sockaddr pointer,
sizeof(param) check will always fail in kernel as the structure in
64bit kernel space is 4bytes larger than for user binaries compiled
in 32bit mode. Thus, applications making use of sctp_connectx() won't
be able to run under such circumstances.

Introduce a compat interface in the kernel to deal with such
situations by using a 'struct compat_sctp_getaddrs_old' structure
where user data is copied into it, and then sucessively transformed
into a 'struct sctp_getaddrs_old' structure with the help of
compat_ptr(). That fixes sctp_connectx() abi without any changes
needed in user space, and lets the SCTP test suite pass when compiled
in 32bit and run on 64bit kernels.

Fixes: f9c67811eb ("sctp: Fix regression introduced by new sctp_connectx api")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:05 -08:00