Commit Graph

2860 Commits

Author SHA1 Message Date
Peter Zijlstra c5ac12693f arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 4e857c58efeb99393cba5a5d0d8ec7117183137c
[joonwoop@codeaurora.org: fixed trivial merge conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-08-15 11:45:28 -07:00
Amir Samuelov 70b031848c platform: msm: fix PFT for 64-bit
Fix compile warning for 64-bit platform.

Change-Id: Ie19f331a893f3265eef70e70945e1edd3268c4b0
Signed-off-by: Amir Samuelov <amirs@codeaurora.org>
2014-07-06 13:51:12 +03:00
Amir Samuelov 8307869921 platform: msm: fix PFT when using direct-io
When read/write a file using dircet-io (O_DIRECT),
we can't get the inode from a bio by walking the path
bio->bi_io_vec->bv_page->mapping->host
since the page is anonymous and not mapped.
On the other more typical cases when not using O_DIRECT,
the page cache is used and the bv_page is available.

Change-Id: I349cad54a978ed9919f960d55f0f95c1e53262e5
Signed-off-by: Amir Samuelov <amirs@codeaurora.org>
2014-06-23 21:38:40 +03:00
Ian Maund 491fb5c232 Merge upstream tag 'v3.10.40' into msm-3.10
* commit 'v3.10.40': (203 commits)
  Linux 3.10.40
  ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe
  drm: cirrus: add power management support
  Input: synaptics - add min/max quirk for ThinkPad Edge E431
  Input: synaptics - add min/max quirk for ThinkPad T431s, L440, L540, S1 Yoga and X1
  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
  dm thin: fix dangling bio in process_deferred_bios error path
  dm transaction manager: fix corruption due to non-atomic transaction commit
  Skip intel_crt_init for Dell XPS 8700
  mtd: sm_ftl: heap corruption in sm_create_sysfs_attributes()
  mtd: nuc900_nand: NULL dereference in nuc900_nand_enable()
  mtd: atmel_nand: Disable subpage NAND write when using Atmel PMECC
  tgafb: fix data copying
  gpio: mxs: Allow for recursive enable_irq_wake() call
  rtlwifi: rtl8188ee: initialize packet_beacon
  rtlwifi: rtl8192se: Fix regression due to commit 1bf4bbb
  rtlwifi: rtl8192se: Fix too long disable of IRQs
  rtlwifi: rtl8192cu: Fix too long disable of IRQs
  rtlwifi: rtl8188ee: Fix too long disable of IRQs
  rtlwifi: rtl8723ae: Fix too long disable of IRQs
  ...

Change-Id: If5388cf980cb123e35e1b29275ba288c89c5aa18
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-18 13:10:54 -07:00
Linux Build Service Account 0ccc34fcb6 Merge "dm-crypt: sort writes" 2014-06-11 10:56:56 -07:00
Linux Build Service Account 057c6edf81 Merge "dm-crypt: offload writes to thread" 2014-06-11 10:56:55 -07:00
Linux Build Service Account 3e924cec74 Merge "dm-crypt: remove io_pool" 2014-06-11 10:56:54 -07:00
Linux Build Service Account 382821f43c Merge "dm-crypt: avoid deadlock in mempools" 2014-06-11 10:56:53 -07:00
Linux Build Service Account 416b849eab Merge "dm-crypt: don't allocate pages for a partial request" 2014-06-11 10:56:52 -07:00
Linux Build Service Account cfeac85536 Merge "dm-crypt: use unbound workqueue for request processing" 2014-06-11 10:56:51 -07:00
Linux Build Service Account 2a2d1f4e98 Merge "dm-crypt: use per-bio data" 2014-06-11 10:56:50 -07:00
Linux Build Service Account 07b061120d Merge "dm-crypt: remove per-cpu structure" 2014-06-11 10:56:48 -07:00
Mikulas Patocka 364a180149 dm-crypt: sort writes
Write requests are sorted in a red-black tree structure and are submitted
in the sorted order.

In theory the sorting should be performed by the underlying disk scheduler,
however, in practice the disk scheduler accepts and sorts only 128 requests.
In order to sort more requests, we need to implement our own sorting.

CRs-fixed: 670391
Change-Id: Iffd9345fa1253f5cf2a556893ed36e08f1ac51aa
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Patch-mainline: dm-devel @ 04/05/14, 14:09
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:53 -07:00
Mikulas Patocka 980c01db22 dm-crypt: offload writes to thread
Submitting write bios directly in the encryption thread caused serious
performance degradation. On multiprocessor machine encryption requests
finish in a different order than they were submitted in. Consequently, write
requests would be submitted in a different order and it could cause severe
performance degradation.

This patch moves submitting write requests to a separate thread so that
the requests can be sorted before submitting.

Sorting is implemented in the next patch.

Note: it is required that a previous patch "dm-crypt: don't allocate pages
for a partial request." is applied before applying this patch. Without
that, this patch could introduce a crash.

CRs-fixed: 670391
Change-Id: I886ed2da0ff174d3539ea18e27170d7fd1062680
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Patch-mainline: dm-devel @ 04/05/14, 14:08
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:53 -07:00
Mikulas Patocka 30147f3526 dm-crypt: remove io_pool
Remove io_pool and _crypt_io_pool because they are unused.

CRs-fixed: 670391
Change-Id: I71400ecda66902c56c3af981e8b739f156db1e27
Signed-off-by: Mikulas Patocka <mpatock@redhat.com>
Patch-mainline: dm-devel @ 04/05/14, 14:08
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:52 -07:00
Mikulas Patocka e1254d136d dm-crypt: avoid deadlock in mempools
This patch fixes a theoretical deadlock introduced in the previous patch.

The function crypt_alloc_buffer may be called concurrently. If we allocate
from the mempool concurrently, there is a possibility of deadlock.
For example, if we have mempool of 256 pages, two processes, each wanting 256,
pages allocate from the mempool concurrently, it may deadlock in a situation
where both processes have allocated 128 pages and the mempool is exhausted.

In order to avoid this scenarios, we allocate the pages under a mutex.

In order to not degrade performance with excessive locking, we try
non-blocking allocations without a mutex first and if it fails, we fallback
to a blocking allocation with a mutex.

CRs-fixed: 670391
Change-Id: I6c391dece4ba44fe0b2e9b75ea2b9235bf1b525b
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Patch-mainline: dm-devel @ 04/05/14, 14:07
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:52 -07:00
Mikulas Patocka db4c8ba31c dm-crypt: don't allocate pages for a partial request
This patch changes crypt_alloc_buffer so that it always allocates pages for
a full request.

This change enables further simplification and removing of one refcounts
in the next patches.

Note: the next patch is needed to fix a theoretical deadlock

CRs-fixed: 670391
Change-Id: I7bcadac8b3450976366c701fceb1fee7cb18df85
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[joonwoop@codeaurora.org: resolve trivial merge conflicts]
Patch-mainline: dm-devel @ 04/05/14, 14:07
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:52 -07:00
Mikulas Patocka 0a03cc03c9 dm-crypt: use unbound workqueue for request processing
Use unbound workqueue so that work is automatically ballanced between
available CPUs.

CRs-fixed: 670391
Change-Id: I169099d0b5b27535633c9d3aaab2037b5fea6aa9
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[joonwoop@codeaurora.org: resolve trivial merge conflict]
Patch-mainline: dm-devel @ 04/05/14, 14:06
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:52 -07:00
Mikulas Patocka 5b03d4b7bb dm-crypt: use per-bio data
This patch changes dm-crypt so that it uses auxiliary data allocated with
the bio.

Dm-crypt requires two allocations per request - struct dm_crypt_io and
struct ablkcipher_request (with other data appended to it). It used
mempool for the allocation.

Some requests may require more dm_crypt_ios and ablkcipher_requests,
however most requests need just one of each of these two structures to
complete.

This patch changes it so that the first dm_crypt_io and ablkcipher_request
and allocated with the bio (using target per_bio_data_size option). If the
request needs additional values, they are allocated from the mempool.

CRs-fixed: 670391
Change-Id: I8abc48a021391398f3b35bdd4ac9efbbec3a9fa5
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Patch-mainline: dm-devel @ 04/05/14, 14:05
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:52 -07:00
Mikulas Patocka 58423ac034 dm-crypt: remove per-cpu structure
Dm-crypt used per-cpu structures to hold pointers to ablkcipher_request.
The code assumed that the work item keeps executing on a single CPU, so it
used no synchronization when accessing this structure.

When we disable a CPU by writing zero to
/sys/devices/system/cpu/cpu*/online, the work item could be moved to
another CPU. This causes crashes in dm-crypt because the code starts using
a wrong ablkcipher_request.

This patch fixes this bug by removing the percpu definition. The structure
ablkcipher_request is accessed via a pointer from convert_context.
Consequently, if the work item is rescheduled to a different CPU, the
thread still uses the same ablkcipher_request.

CRs-fixed: 670391
Change-Id: Ib94281c719aedb1d53b853d38085d2e3f884665b
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Patch-mainline: dm-devel @ 04/05/14, 14:04
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-06-05 16:45:51 -07:00
AnilKumar Chimata 02a4f15149 crypto: msm: Move qcrypto.h header file
Move header file from architecture folder to include/linux
folder.

Change-Id: I20a653b272ec21419706cb02bc7c1beac20802eb
Signed-off-by: AnilKumar Chimata <anilc@codeaurora.org>
2014-06-06 04:04:23 +05:30
Amir Samuelov da92eea47b dm: dm-req-crypt: add support for Per-File-Encryption
Support Per-File-Encryption (PFE) based on file tagging.
The Per-File-Tagger (PFT) reports if a file should be encrypted or not.
The Per-File-Encryption can be used after Full-Disk-Encryption (FDE) was
completed or without FDE.
The PFE and FDE uses different keys, which are managed by the Trust-Zone.

Change-Id: I727ef11e252649f895a5e3f8a49ca848cea50795
Signed-off-by: Amir Samuelov <amirs@codeaurora.org>
2014-06-03 09:55:10 +03:00
Dinesh K Garg 40acb5587b Setting correct HW crypto driver instance for FDE
HW crypto provides different instance for different use cases.
Dm-req-crypt should use the one for full disk encryption use case.

Change-Id: I22d3f6ab1dd6f83b0d5b83a681236719e2340934
Signed-off-by: Dinesh K Garg <dineshg@codeaurora.org>
2014-05-15 14:48:36 -07:00
Mike Snitzer e2e8845036 dm thin: fix dangling bio in process_deferred_bios error path
commit fe76cd88e654124d1431bb662a0fc6e99ca811a5 upstream.

If unable to ensure_next_mapping() we must add the current bio, which
was removed from the @bios list via bio_list_pop, back to the
deferred_bios list before all the remaining @bios.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-13 13:59:46 +02:00
Joe Thornber ff3296214c dm transaction manager: fix corruption due to non-atomic transaction commit
commit a9d45396f5956d0b615c7ae3b936afd888351a47 upstream.

The persistent-data library used by dm-thin, dm-cache, etc is
transactional.  If anything goes wrong, such as an io error when writing
new metadata or a power failure, then we roll back to the last
transaction.

Atomicity when committing a transaction is achieved by:

a) Never overwriting data from the previous transaction.
b) Writing the superblock last, after all other metadata has hit the
   disk.

This commit and the following commit ("dm: take care to copy the space
map roots before locking the superblock") fix a bug associated with (b).
When committing it was possible for the superblock to still be written
in spite of an io error occurring during the preceeding metadata flush.
With these commits we're careful not to take the write lock out on the
superblock until after the metadata flush has completed.

Change the transaction manager's semantics for dm_tm_commit() to assume
all data has been flushed _before_ the single superblock that is passed
in.

As a prerequisite, split the block manager's block unlocking and
flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush().  Now
the unlocking must be done separately.

This issue was discovered by forcing io errors at the crucial time
using dm-flakey.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-13 13:59:45 +02:00
Ian Maund 356fb13538 Merge upstream linux-stable v3.10.36 into msm-3.10
* commit 'v3.10.36': (494 commits)
  Linux 3.10.36
  netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages
  mm: close PageTail race
  net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE
  x86: fix boot on uniprocessor systems
  Input: cypress_ps2 - don't report as a button pads
  Input: synaptics - add manual min/max quirk for ThinkPad X240
  Input: synaptics - add manual min/max quirk
  Input: mousedev - fix race when creating mixed device
  ext4: atomically set inode->i_flags in ext4_set_inode_flags()
  Linux 3.10.35
  sched/autogroup: Fix race with task_groups list
  e100: Fix "disabling already-disabled device" warning
  xhci: Fix resume issues on Renesas chips in Samsung laptops
  Input: wacom - make sure touch_max is set for touch devices
  KVM: VMX: fix use after free of vmx->loaded_vmcs
  KVM: x86: handle invalid root_hpa everywhere
  KVM: MMU: handle invalid root_hpa at __direct_map
  Input: elantech - improve clickpad detection
  ARM: highbank: avoid L2 cache smc calls when PL310 is not present
  ...

Change-Id: Ib68f565291702c53df09e914e637930c5d3e5310
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-04-23 16:23:49 -07:00
Dinesh K Garg def50e6b5e dm: Clean up dm-req-crypt
dm-req-crypt defines optional functions from device mapper. These
functions are not required for dm-req-crypt. dm-req-crypt has few
memory de-allocation issues. This change removes unnecessary
functions and fixes memory deallocation issues as well.

Change-Id: Id36bd13989940ce41571a8a10277730048df6f6d
Signed-off-by: Dinesh K Garg <dineshg@codeaurora.org>
2014-04-14 21:49:31 -07:00
Ian Maund f1b32d4e47 Merge upstream linux-stable v3.10.28 into msm-3.10
The following commits have been reverted from this merge, as they are
known to introduce new bugs and are currently incompatible with our
audio implementation. Investigation of these commits is ongoing, and
they are expected to be brought in at a later time:

86e6de7 ALSA: compress: fix drain calls blocking other compress functions (v6)
16442d4 ALSA: compress: fix drain calls blocking other compress functions

This merge commit also includes a change in block, necessary for
compilation. Upstream has modified elevator_init_fn to prevent race
conditions, requring updates to row_init_queue and test_init_queue.

* commit 'v3.10.28': (1964 commits)
  Linux 3.10.28
  ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  serial: amba-pl011: use port lock to guard control register access
  mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
  md/raid5: Fix possible confusion when multiple write errors occur.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md: fix problem when adding device to read-only array with bitmap.
  drm/i915: fix DDI PLLs HW state readout code
  nilfs2: fix segctor bug that causes file system corruption
  thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
  ftrace/x86: Load ftrace_ops in parameter not the variable holding it
  SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
  writeback: Fix data corruption on NFS
  hwmon: (coretemp) Fix truncated name of alarm attributes
  vfs: In d_path don't call d_dname on a mount point
  staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq()
  staging: comedi: addi_apci_1032: fix subdevice type/flags bug
  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
  GFS2: Increase i_writecount during gfs2_setattr_chown
  perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
  perf scripting perl: Fix build error on Fedora 12
  ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic
  Linux 3.10.27
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
  netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
  SCSI: sd: Reduce buffer size for vpd request
  intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
  mac80211: move "bufferable MMPDU" check to fix AP mode scan
  ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
  ACPI / TPM: fix memory leak when walking ACPI namespace
  mfd: rtsx_pcr: Disable interrupts before cancelling delayed works
  clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks
  clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock
  clk: samsung: exynos4: Correct SRC_MFC register
  clk: clk-divider: fix divisor > 255 bug
  ahci: add PCI ID for Marvell 88SE9170 SATA controller
  parisc: Ensure full cache coherency for kmap/kunmap
  drm/nouveau/bios: make jump conditional
  ARM: shmobile: mackerel: Fix coherent DMA mask
  ARM: shmobile: armadillo: Fix coherent DMA mask
  ARM: shmobile: kzm9g: Fix coherent DMA mask
  ARM: dts: exynos5250: Fix MDMA0 clock number
  ARM: fix "bad mode in ... handler" message for undefined instructions
  ARM: fix footbridge clockevent device
  net: Loosen constraints for recalculating checksum in skb_segment()
  bridge: use spin_lock_bh() in br_multicast_set_hash_max
  netpoll: Fix missing TXQ unlock and and OOPS.
  net: llc: fix use after free in llc_ui_recvmsg
  virtio-net: fix refill races during restore
  virtio_net: don't leak memory or block when too many frags
  virtio-net: make all RX paths handle errors consistently
  virtio_net: fix error handling for mergeable buffers
  vlan: Fix header ops passthru when doing TX VLAN offload.
  net: rose: restore old recvmsg behavior
  rds: prevent dereference of a NULL device
  ipv6: always set the new created dst's from in ip6_rt_copy
  net: fec: fix potential use after free
  hamradio/yam: fix info leak in ioctl
  drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl()
  net: inet_diag: zero out uninitialized idiag_{src,dst} fields
  ip_gre: fix msg_name parsing for recvfrom/recvmsg
  net: unix: allow bind to fail on mutex lock
  ipv6: fix illegal mac_header comparison on 32bit
  netvsc: don't flush peers notifying work during setting mtu
  tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0
  net: unix: allow set_peek_off to fail
  net: drop_monitor: fix the value of maxattr
  ipv6: don't count addrconf generated routes against gc limit
  packet: fix send path when running with proto == 0
  virtio: delete napi structures from netdev before releasing memory
  macvtap: signal truncated packets
  tun: update file current position
  macvtap: update file current position
  macvtap: Do not double-count received packets
  rds: prevent BUG_ON triggered on congestion update to loopback
  net: do not pretend FRAGLIST support
  IPv6: Fixed support for blackhole and prohibit routes
  HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""
  gpio-rcar: R-Car GPIO IRQ share interrupt
  clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
  irqchip: renesas-irqc: Fix irqc_probe error handling
  Linux 3.10.26
  sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
  ext4: fix bigalloc regression
  arm64: Use Normal NonCacheable memory for writecombine
  arm64: Do not flush the D-cache for anonymous pages
  arm64: Avoid cache flushing in flush_dcache_page()
  ARM: KVM: arch_timers: zero CNTVOFF upon return to host
  ARM: hyp: initialize CNTVOFF to zero
  clocksource: arch_timer: use virtual counters
  arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S
  arm64: dts: Reserve the memory used for secondary CPU release address
  arm64: check for number of arguments in syscall_get/set_arguments()
  arm64: fix possible invalid FPSIMD initialization state
  ...

Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-03-24 14:28:34 -07:00
Heinz Mauelshagen bd86e7cb77 dm cache: fix access beyond end of origin device
commit e893fba90c09f9b57fb97daae204ea9cc2c52fa5 upstream.

In order to avoid wasting cache space a partial block at the end of the
origin device is not cached.  Unfortunately, the check for such a
partial block at the end of the origin device was flawed.

Fix accesses beyond the end of the origin device that occured due to
attempted promotion of an undetected partial block by:

- initializing the per bio data struct to allow cache_end_io to work properly
- recognizing access to the partial block at the end of the origin device
- avoiding out of bounds access to the discard bitset

Otherwise, users can experience errors like the following:

 attempt to access beyond end of device
 dm-5: rw=0, want=20971520, limit=20971456
 ...
 device-mapper: cache: promotion failed; couldn't copy block

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:18 -07:00
Heinz Mauelshagen e88217a8ee dm cache: fix truncation bug when copying a block to/from >2TB fast device
commit 8b9d96666529a979acf4825391efcc7c8a3e9f12 upstream.

During demotion or promotion to a cache's >2TB fast device we must not
truncate the cache block's associated sector to 32bits.  The 32bit
temporary result of from_cblock() caused a 32bit multiplication when
calculating the sector of the fast device in issue_copy_real().

Use an intermediate 64bit type to store the 32bit from_cblock() to allow
for proper 64bit multiplication.

Here is an example of how this bug manifests on an ext4 filesystem:

 EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt.
 JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23 21:38:18 -07:00
Mike Snitzer fe8ee730af dm thin: fix the error path for the thin device constructor
commit 1acacc0784aab45627b6009e0e9224886279ac0b upstream.

dm_pool_close_thin_device() must be called if dm_set_target_max_io_len()
fails in thin_ctr().  Otherwise __pool_destroy() will fail because the
pool will still have an open thin device:

 device-mapper: thin metadata: attempt to close pmd when 1 device(s) are still open
 device-mapper: thin: __pool_destroy: dm_pool_metadata_close() failed.

Also, must establish error code if failing thin_ctr() because the pool
is in fail_io mode.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:14 -08:00
Mike Snitzer 4f48d3328e dm thin: avoid metadata commit if a pool's thin devices haven't changed
commit 4d1662a30dde6e545086fe0e8fd7e474c4e0b639 upstream.

Commit 905e51b ("dm thin: commit outstanding data every second")
introduced a periodic commit.  This commit occurs regardless of whether
any thin devices have made changes.

Fix the periodic commit to check if any of a pool's thin devices have
changed using dm_pool_changed_this_transaction().

Reported-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:13 -08:00
Hannes Reinecke f4124bc34f dm mpath: fix stalls when handling invalid ioctls
commit a1989b330093578ea5470bea0a00f940c444c466 upstream.

An invalid ioctl will never be valid, irrespective of whether multipath
has active paths or not.  So for invalid ioctls we do not have to wait
for multipath to activate any paths, but can rather return an error
code immediately.  This fix resolves numerous instances of:

 udevd[]: worker [] unexpectedly returned with status 0x0100

that have been seen during testing.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06 21:30:13 -08:00
Dinesh K Garg d9b69c0618 dm: Adding support of AES 256 bits in dm-req-crypt
dm-req-crypt currently uses AES 128 bit. Updating dm-req-crypt
to use AES 256 bit instead.

Change-Id: I2e8a4484b25496d53f1f9aa83ec6c0ed7b947901
Signed-off-by: Dinesh K Garg <dineshg@codeaurora.org>
2014-02-26 11:27:03 -08:00
Oleg Nesterov 4d4ef86d44 md/raid5: Fix CPU hotplug callback registration
commit 789b5e0315284463617e106baad360cb9e8db3ac upstream.

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Interestingly, the raid5 code can actually prevent double initialization and
hence can use the following simplified form of callback registration:

	register_cpu_notifier(&foobar_cpu_notifier);

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	put_online_cpus();

A hotplug operation that occurs between registering the notifier and calling
get_online_cpus(), won't disrupt anything, because the code takes care to
perform the memory allocations only once.

So reorganize the code in raid5 this way to fix the deadlock with callback
registration.

Cc: linux-raid@vger.kernel.org
Fixes: 36d1c6476b
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[Srivatsa: Fixed the unregister_cpu_notifier() deadlock, added the
free_scratch_buffer() helper to condense code further and wrote the changelog.]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22 12:41:29 -08:00
NeilBrown 9f2d289933 md/raid1: restore ability for check and repair to fix read errors.
commit 1877db75589a895bbdc4c4c3f23558e57b521141 upstream.

commit 30bc9b53878a9921b02e3b5bc4283ac1c6de102a
    md/raid1: fix bio handling problems in process_checks()

Move the bio_reset() to a point before where BIO_UPTODATE is checked,
so that check now always report that the bio is uptodate, even if it is not.

This causes process_check() to sometimes treat read-errors as
successful matches so the good data isn't written out.

This patch preserves the flag until it is needed.

Bug was introduced in 3.11, but backported to 3.10-stable (as it fixed
an even worse bug).  So suitable for any -stable since 3.10.

Reported-and-tested-by: Michael Tokarev <mjt@tls.msk.ru>
Fixed: 30bc9b53878a9921b02e3b5bc4283ac1c6de102a
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-22 12:41:29 -08:00
Mikulas Patocka 4f66403630 dm sysfs: fix a module unload race
commit 2995fa78e423d7193f3b57835f6c1c75006a0315 upstream.

This reverts commit be35f48610 ("dm: wait until embedded kobject is
released before destroying a device") and provides an improved fix.

The kobject release code that calls the completion must be placed in a
non-module file, otherwise there is a module unload race (if the process
calling dm_kobject_release is preempted and the DM module unloaded after
the completion is triggered, but before dm_kobject_release returns).

To fix this race, this patch moves the completion code to dm-builtin.c
which is always compiled directly into the kernel if BLK_DEV_DM is
selected.

The patch introduces a new dm_kobject_holder structure, its purpose is
to keep the completion and kobject in one place, so that it can be
accessed from non-module code without the need to export the layout of
struct mapped_device to that code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:02 -08:00
Joe Thornber 88972eec55 dm space map metadata: fix bug in resizing of thin metadata
commit fca028438fb903852beaf7c3fe1cd326651af57d upstream.

This bug was introduced in commit 7e664b3dec431e ("dm space map metadata:
fix extending the space map").

When extending a dm-thin metadata volume we:

- Switch the space map into a simple bootstrap mode, which allocates
  all space linearly from the newly added space.
- Add new bitmap entries for the new space
- Increment the reference counts for those newly allocated bitmap
  entries
- Commit changes to disk
- Switch back out of bootstrap mode.

But, the disk commit may allocate space itself, if so this fact will be
lost when switching out of bootstrap mode.

The bug exhibited itself as an error when the bitmap_root, with an
erroneous ref count of 0, was subsequently decremented as part of a
later disk commit.  This would cause the disk commit to fail, and thinp
to enter read_only mode.  The metadata was not damaged (thin_check
passed).

The fix is to put the increments + commit into a loop, running until
the commit has not allocated extra space.  In practise this loop only
runs twice.

With this fix the following device mapper testsuite test passes:
 dmtest run --suite thin-provisioning -n thin_remove_works_after_resize

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:01 -08:00
Joe Thornber f15396a369 dm space map metadata: fix extending the space map
commit 7e664b3dec431eebf0c5df5ff704d6197634cf35 upstream.

When extending a metadata space map we should do the first commit whilst
still in bootstrap mode -- a mode where all blocks get allocated in the
new area.

That way the commit overhead is allocated from the newly added space.
Otherwise we risk running out of space.

With this fix, and the previous commit "dm space map common: make sure
new space is used during extend", the following device mapper testsuite
test passes:
 dmtest run --suite thin-provisioning -n /resize_metadata_no_io/

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:01 -08:00
Joe Thornber 247373410a dm space map common: make sure new space is used during extend
commit 12c91a5c2d2a8e8cc40a9552313e1e7b0a2d9ee3 upstream.

When extending a low level space map we should update nr_blocks at
the start so the new space is used for the index entries.

Otherwise extend can fail, e.g.: sm_metadata_extend call sequence
that fails:
 -> sm_ll_extend
    -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block
    => returns -ENOSPC because smm->begin == smm->ll.nr_blocks

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:01 -08:00
Mikulas Patocka eef2b6df86 dm: wait until embedded kobject is released before destroying a device
commit be35f486108227e10fe5d96fd42fb2b344c59983 upstream.

There may be other parts of the kernel holding a reference on the dm
kobject.  We must wait until all references are dropped before
deallocating the mapped_device structure.

The dm_kobject_release method signals that all references are dropped
via completion.  But dm_kobject_release doesn't free the kobject (which
is embedded in the mapped_device structure).

This is the sequence of operations:
* when destroying a DM device, call kobject_put from dm_sysfs_exit
* wait until all users stop using the kobject, when it happens the
  release method is called
* the release method signals the completion and should return without
  delay
* the dm device removal code that waits on the completion continues
* the dm device removal code drops the dm_mod reference the device had
* the dm device removal code frees the mapped_device structure that
  contains the kobject

Using kobject this way should avoid the module unload race that was
mentioned at the beginning of this thread:
https://lkml.org/lkml/2014/1/4/83

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:01 -08:00
Mike Snitzer 025d61e0f1 dm thin: initialize dm_thin_new_mapping returned by get_next_mapping
commit 16961b042db8cc5cf75d782b4255193ad56e1d4f upstream.

As additional members are added to the dm_thin_new_mapping structure
care should be taken to make sure they get initialized before use.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:01 -08:00
Joe Thornber 614319dfb0 dm thin: fix discard support to a previously shared block
commit 19fa1a6756ed9e92daa9537c03b47d6b55cc2316 upstream.

If a snapshot is created and later deleted the origin dm_thin_device's
snapshotted_time will have been updated to reflect the snapshot's
creation time.  The 'shared' flag in the dm_thin_lookup_result struct
returned from dm_thin_find_block() is an approximation based on
snapshotted_time -- this is done to avoid 0(n), or worse, time
complexity.  In this case, the shared flag would be true.

But because the 'shared' flag reflects an approximation a block can be
incorrectly assumed to be shared (e.g. false positive for 'shared'
because the snapshot no longer exists).  This could result in discards
issued to a thin device not being passed down to the pool's underlying
data device.

To fix this we double check that a thin block is really still in-use
after a mapping is removed using dm_pool_block_is_used().  If the
reference count for a block is now zero the discard is allowed to be
passed down.

Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
structure -- reflects that the 'shared' flag in the response from
dm_thin_find_block() can only be held as definitive if false is
returned.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:01 -08:00
Kent Overstreet f4ac67e8e7 bcache: Data corruption fix
commit ef71ec00002d92a08eb27e9d036e3d48835b6597 upstream.

The code that handles overlapping extents that we've just read back in from disk
was depending on the behaviour of the code that handles overlapping extents as
we're inserting into a btree node in the case of an insert that forced an
existing extent to be split: on insert, if we had to split we'd also insert a
new extent to represent the top part of the old extent - and then that new
extent would get written out.

The code that read the extents back in thus not bother with splitting extents -
if it saw an extent that ovelapped in the middle of an older extent, it would
trim the old extent to only represent the bottom part, assuming that the
original insert would've inserted a new extent to represent the top part.

I still haven't figured out _how_ it can happen, but I'm now pretty convinced
(and testing has confirmed) that there's some kind of an obscure corner case
(probably involving extent merging, and multiple overwrites in different sets)
that breaks this. The fix is to change the mergesort fixup code to split extents
itself when required.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06 11:08:16 -08:00
NeilBrown 6ba854e9bf md/raid5: fix long-standing problem with bitmap handling on write failure.
commit 9f97e4b128d2ea90a5f5063ea0ee3b0911f4c669 upstream.

Before a write starts we set a bit in the write-intent bitmap.
When the write completes we clear that bit if the write was successful
to all devices.  However if the write wasn't fully successful we
should not clear the bit.  If the faulty drive is subsequently
re-added, the fact that the bit is still set ensure that we will
re-write the data that is missing.

This logic is mediated by the STRIPE_DEGRADED flag - we only clear the
bitmap bit when this flag is not set.
Currently we correctly set the flag if a write starts when some
devices are failed or missing.  But we do *not* set the flag if some
device failed during the write attempt.
This is wrong and can result in clearing the bit inappropriately.

So: set the flag when a write fails.

This bug has been present since bitmaps were introduces, so the fix is
suitable for any -stable kernel.

Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06 11:08:12 -08:00
Dinesh K Garg 4aab49333d dm: updating crypto algorithm used by dm-req-crypt
dm-req-crypt uses AES-XTS algorithm implemented by HW crypto
engine. Crypto driver renamed the aes-xts algorithm to avoid
conflict with SW based implementation of AES-XTS. dm-req-crypt
must use the new name for AES-XTS provided by crypto driver.

Change-Id: I286b6435af33fd64673c7c1af4bb2447f3115868
Signed-off-by: Dinesh K Garg <dineshg@codeaurora.org>
2014-02-03 09:28:19 -08:00
NeilBrown c44bc4496d md/raid5: Fix possible confusion when multiple write errors occur.
commit 1cc03eb93245e63b0b7a7832165efdc52e25b4e6 upstream.

commit 5d8c71f9e5
    md: raid5 crash during degradation

Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.

commit 14a75d3e07
    md/raid5: preferentially read from replacement device if possible.

Fixed the same bug if a more effective way, so we can now revert
the original commit.

Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 5d8c71f9e5
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-25 08:27:12 -08:00
NeilBrown ddd618aa9e md/raid10: fix two bugs in handling of known-bad-blocks.
commit b50c259e25d9260b9108dc0c2964c26e5ecbe1c1 upstream.

If we discover a bad block when reading we split the request and
potentially read some of it from a different device.

The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
   seen by comparison with raid1.c

This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.

Fixes: 856e08e237
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-25 08:27:12 -08:00
NeilBrown 16342c21c9 md/raid10: fix bug when raid10 recovery fails to recover a block.
commit e8b849158508565e0cd6bc80061124afc5879160 upstream.

commit e875ecea26
    md/raid10 record bad blocks as needed during recovery.

added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.

So move the freeing of r10bio down where it is safe.

Fixes: e875ecea26
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-25 08:27:12 -08:00
NeilBrown bb4a65df30 md: fix problem when adding device to read-only array with bitmap.
commit 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 upstream.

If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update.  We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag.  If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <andreiw@vmware.com>
Fixes: d70ed2e4fa
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-25 08:27:12 -08:00