android_kernel_samsung_msm8976/mm
Ming Lei 21caf2fc19 mm: teach mm by current context info to not do I/O during memory allocation
This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task to avoid
doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device, and
the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
  GFP_KERNEL is called inside runtime resume callback of any one of its
  ancestors(or the block device itself), the deadlock may be triggered
  inside the memory allocation since it might not complete until the block
  device becomes active and the involed page I/O finishes.  The situation
  is pointed out first by Alan Stern.  It is not a good approach to
  convert all GFP_KERNEL[1] in the path into GFP_NOIO because several
  subsystems may be involved(for example, PCI, USB and SCSI may be
  involved for usb mass stoarage device, network devices involved too in
  the iSCSI case)

- during block device runtime suspend, because runtime resume need to
  wait for completion of concurrent runtime suspend.

- during error handling of usb mass storage deivce, USB bus reset will
  be put on the device, so there shouldn't have any memory allocation with
  GFP_KERNEL during USB bus reset, otherwise the deadlock similar with
  above may be triggered.  Unfortunately, any usb device may include one
  mass storage interface in theory, so it requires all usb interface
  drivers to handle the situation.  In fact, most usb drivers don't know
  how to handle bus reset on the device and don't provide .pre_set() and
  .post_reset() callback at all, so USB core has to unbind and bind driver
  for these devices.  So it is still not practical to resort to GFP_NOIO
  for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

It is not a good idea to convert all these GFP_KERNEL in the affected
path into GFP_NOIO because these functions doing that may be implemented
as library and will be called in many other contexts.

In fact, memalloc_noio_flags() can convert some of current static
GFP_NOIO allocation into GFP_KERNEL back in other non-affected contexts,
at least almost all GFP_NOIO in USB subsystem can be converted into
GFP_KERNEL after applying the approach and make allocation with GFP_NOIO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.

[1], several GFP_KERNEL allocation examples in runtime resume path

- pci subsystem
acpi_os_allocate
	<-acpi_ut_allocate
		<-ACPI_ALLOCATE_ZEROED
			<-acpi_evaluate_object
				<-__acpi_bus_set_power
					<-acpi_bus_set_power
						<-acpi_pci_set_power_state
							<-platform_pci_set_power_state
								<-pci_platform_power_transition
									<-__pci_complete_power_transition
										<-pci_set_power_state
											<-pci_restore_standard_config
												<-pci_pm_runtime_resume
- usb subsystem
usb_get_status
	<-finish_port_resume
		<-usb_port_resume
			<-generic_resume
				<-usb_resume_device
					<-usb_resume_both
						<-usb_runtime_resume

- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....

That is just what I have found.  Unfortunately, this allocation can only
be found by human being now, and there should be many not found since
any function in the resume path(call tree) may allocate memory with
GFP_KERNEL.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Decotigny <david.decotigny@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:16 -08:00
..
backing-dev.c bdi: allow block devices to say that they require stable page writes 2013-02-21 17:22:19 -08:00
balloon_compaction.c
bootmem.c
bounce.c block: optionally snapshot page contents to provide stable pages during write 2013-02-21 17:22:20 -08:00
cleancache.c
compaction.c mm: remove MIGRATE_ISOLATE check in hotpath 2013-02-23 17:50:15 -08:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap.c mm: only enforce stable page writes if the backing device requires it 2013-02-21 17:22:19 -08:00
filemap_xip.c
fremap.c mm: introduce VM_POPULATE flag to better deal with racy userspace programs 2013-02-23 17:50:11 -08:00
frontswap.c
highmem.c
huge_memory.c mm/huge_memory.c: use new hashtable implementation 2013-02-23 17:50:10 -08:00
hugetlb.c mm/hugetlb.c: convert to pr_foo() 2013-02-23 17:50:09 -08:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: directly use __mlock_vma_pages_range() in find_extend_vma() 2013-02-23 17:50:11 -08:00
interval_tree.c
Kconfig memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap 2013-02-23 17:50:12 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c mm/ksm.c: use new hashtable implementation 2013-02-23 17:50:10 -08:00
maccess.c
madvise.c mm: make madvise(MADV_WILLNEED) support swap file prefetch 2013-02-23 17:50:10 -08:00
Makefile
memblock.c mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region(). 2013-02-23 17:50:14 -08:00
memcontrol.c mm/memcontrol.c: convert printk(KERN_FOO) to pr_foo() 2013-02-23 17:50:09 -08:00
memory-failure.c mm/memory-failure.c: fix wrong num_poisoned_pages in handling memory error on thp 2013-02-23 17:50:15 -08:00
memory.c mm: directly use __mlock_vma_pages_range() in find_extend_vma() 2013-02-23 17:50:11 -08:00
memory_hotplug.c mm: increase totalram_pages when free pages allocated by bootmem allocator 2013-02-23 17:50:15 -08:00
mempolicy.c mempolicy: fix is_valid_nodemask() 2013-02-23 17:50:13 -08:00
mempool.c
migrate.c
mincore.c
mlock.c mm: introduce VM_POPULATE flag to better deal with racy userspace programs 2013-02-23 17:50:11 -08:00
mm_init.c
mmap.c mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool 2013-02-23 17:50:11 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c mm: use mm_populate() for mremap() of VM_LOCKED vmas 2013-02-23 17:50:11 -08:00
msync.c
nobootmem.c
nommu.c mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool 2013-02-23 17:50:11 -08:00
oom_kill.c memcg, oom: provide more precise dump info while memcg oom happening 2013-02-23 17:50:08 -08:00
page-writeback.c block: optionally snapshot page contents to provide stable pages during write 2013-02-21 17:22:20 -08:00
page_alloc.c mm: teach mm by current context info to not do I/O during memory allocation 2013-02-23 17:50:16 -08:00
page_cgroup.c
page_io.c
page_isolation.c
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c s390/mm: implement software dirty bits 2013-02-14 15:55:23 +01:00
shmem.c
slab.c
slab.h
slab_common.c
slob.c
slub.c
sparse-vmemmap.c
sparse.c memory-failure: use num_poisoned_pages instead of mce_bad_pages 2013-02-23 17:50:15 -08:00
swap.c
swap_state.c
swapfile.c
truncate.c
util.c mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool 2013-02-23 17:50:11 -08:00
vmalloc.c
vmscan.c mm: teach mm by current context info to not do I/O during memory allocation 2013-02-23 17:50:16 -08:00
vmstat.c mm: don't wait on congested zones in balance_pgdat() 2013-02-23 17:50:15 -08:00