android_kernel_google_msm/kernel
Willy Tarreau dcaffd537f pipe: limit the per-user amount of pages allocated in pipes
On not-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.
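
The policy described above can be sketched as a small decision function. This is a minimal illustration and not the code from fs/pipe.c: the enum, the function names, and the parameters are hypothetical, and treating "at or above the limit" as over the limit is an assumption of this sketch. A limit of zero means the check is disabled.

```c
#include <stdbool.h>

/* Hypothetical outcome of a pipe-creation request; names are illustrative. */
enum pipe_decision {
    PIPE_DENY,         /* hard limit reached: no new pipe for this user */
    PIPE_SINGLE_PAGE,  /* soft limit reached: new pipe capped to one page */
    PIPE_DEFAULT_SIZE  /* under both limits: pipe gets the default size */
};

/* A limit of 0 disables the check (assumption: ">=" counts as over). */
static bool over_limit(unsigned long user_pages, unsigned long limit)
{
    return limit != 0 && user_pages >= limit;
}

/* Decide how a new pipe is sized, given the pages the user already holds. */
enum pipe_decision decide_new_pipe(unsigned long user_pages,
                                   unsigned long soft_limit,
                                   unsigned long hard_limit)
{
    if (over_limit(user_pages, hard_limit))
        return PIPE_DENY;
    if (over_limit(user_pages, soft_limit))
        return PIPE_SINGLE_PAGE;
    return PIPE_DEFAULT_SIZE;
}
```

The hard limit is checked first so that a user over both limits is refused outright rather than handed another single-page pipe.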

The limits are controlled by two new sysctls: pipe-user-pages-soft and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (e.g. for splicing).
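
The worst-case figure above can be reproduced with a short helper. This is only a sketch of the arithmetic in the message: the function and parameter names are hypothetical, all sizes are in kB, and, like the formula above, it counts one pipe per file descriptor.

```c
/*
 * Worst-case pipe memory for one user, in kB.
 * full_pipes is the number of default-sized pipes created before the
 * soft limit is reached; every later pipe is limited to a single page.
 * Parameter names are illustrative, not kernel identifiers.
 */
unsigned long worst_case_pipe_kb(unsigned long processes,
                                 unsigned long fds_per_process,
                                 unsigned long full_pipes,
                                 unsigned long default_pipe_kb,
                                 unsigned long page_kb)
{
    unsigned long total_pipes = processes * fds_per_process;

    /* There cannot be more full-size pipes than pipes overall. */
    if (full_pipes > total_pipes)
        full_pipes = total_pipes;

    return full_pipes * default_pipe_kb
         + (total_pipes - full_pipes) * page_kb;
}
```

With the values from the message, `worst_case_pipe_kb(256, 1024, 1024, 64, 4)` gives 1110016 kB, which is 1084 MB: 64 MB from the 1024 full-size pipes plus 1020 MB from the 261120 single-page pipes.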

Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	Documentation/sysctl/fs.txt
	fs/pipe.c
	include/linux/sched.h

Change-Id: Ic7c678af18129943e16715fdaa64a97a7f0854be
2016-10-29 23:12:35 +08:00
debug debug: add parameters to prevent entering debug mode on errors 2012-05-18 17:03:10 -07:00
events FROMLIST: security,perf: Allow further restriction of perf_event_open 2016-06-20 19:00:29 +00:00
gcov
irq random: remove rand_initialize_irq() 2013-09-09 17:01:42 -07:00
power Power: Changes the permission to read only for sysfs file 2014-08-05 19:00:47 +00:00
sched f2fs: avoid hungtask problem caused by losing wake_up 2016-10-29 23:12:32 +08:00
time tick: Cleanup NOHZ per cpu data on cpu down 2016-10-29 23:12:18 +08:00
trace tracing/syscalls: Ignore numbers outside NR_syscalls' range 2016-10-29 23:12:11 +08:00
.gitignore kernel/hz.bc: ignore. 2016-10-29 23:12:15 +08:00
acct.c
async.c
audit.c
audit.h
audit_tree.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2015-07-13 11:17:46 -07:00
audit_watch.c
auditfilter.c
auditsc.c seccomp: remove duplicated failure logging 2014-10-31 19:46:13 -07:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() 2014-12-01 16:09:15 -08:00
cgroup_freezer.c
compat.c compat: Fix RT signal mask corruption via sigprocmask 2012-05-10 08:58:33 -07:00
configs.c
cpu.c Move x86_64 idle notifiers to generic 2012-04-09 13:57:52 -07:00
cpu_pm.c
cpuset.c Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 2012-04-02 08:53:24 -07:00
crash_dump.c
cred.c cred: copy_process() should clear child->replacement_session_keyring 2012-04-11 08:20:11 -07:00
delayacct.c
dma.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
elfcore.c
exec_domain.c
exit.c flo: Put device-specific code behind #ifndef CONFIG_UML. 2015-05-20 15:22:06 +09:00
extable.c
fork.c introduce for_each_thread() to replace the buggy while_each_thread() 2014-10-31 19:46:30 -07:00
freezer.c freezer: skip waking up tasks with PF_FREEZER_SKIP set 2013-07-12 14:22:56 -07:00
futex.c futex: Make lookup_pi_state more robust 2014-06-11 15:16:22 -07:00
futex_compat.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
groups.c
hrtimer.c hrtimer: Prevent remote enqueue of leftmost timers 2016-10-29 23:12:34 +08:00
hung_task.c
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c
kallsyms.c vsprintf: Fix %ps on non symbols when using kallsyms 2013-02-08 15:14:22 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
Kconfig.preempt locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
kexec.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-28 17:19:28 -07:00
kfifo.c
kmod.c PM / Sleep: Mitigate race between the freezer and request_firmware() 2012-03-28 23:30:28 +02:00
kprobes.c
ksysfs.c rcu: Add a module parameter to force use of expedited RCU primitives 2016-10-29 23:12:17 +08:00
kthread.c
latencytop.c
lglock.c brlocks/lglocks: turn into functions 2015-07-13 11:17:40 -07:00
lockdep.c lockdep: remove task argument from debug_check_no_locks_held 2013-07-12 14:22:56 -07:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
Makefile kernel: Replace timeconst.pl with a bc script 2016-10-29 23:12:15 +08:00
module.c module: Remove module size limit 2012-03-26 12:50:53 +10:30
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c vfs: Add a user namespace reference from struct mnt_namespace 2015-07-13 11:17:54 -07:00
padata.c padata: Fix cpu hotplug 2012-03-29 19:52:46 +08:00
panic.c panic: resume console if panic after console suspend. 2013-09-09 17:16:14 -07:00
params.c params: <level>_initcall-like kernel parameters 2012-03-26 12:50:51 +10:30
pid.c proc: Usable inode numbers for the namespace file descriptors. 2015-07-13 11:18:01 -07:00
pid_namespace.c proc: Usable inode numbers for the namespace file descriptors. 2015-07-13 11:18:01 -07:00
posix-cpu-timers.c
posix-timers.c
printk.c flo: Put device-specific code behind #ifndef CONFIG_UML. 2015-05-20 15:22:06 +09:00
profile.c
ptrace.c __ptrace_may_access() should not deny sub-threads 2016-10-29 23:12:26 +08:00
range.c
rcu.h rcu: Add a module parameter to force use of expedited RCU primitives 2016-10-29 23:12:17 +08:00
rcupdate.c rcu: Make exit_rcu() more precise and consolidate 2016-10-29 23:12:17 +08:00
rcutiny.c
rcutiny_plugin.h rcu: Make exit_rcu() more precise and consolidate 2016-10-29 23:12:17 +08:00
rcutorture.c
rcutree.c rcu: Fix batch-limit size problem 2016-10-29 23:12:32 +08:00
rcutree.h rcu: Make rcu_barrier() less disruptive 2016-10-29 23:12:18 +08:00
rcutree_plugin.h rcu: Precompute RCU_FAST_NO_HZ timer offsets 2016-10-29 23:12:18 +08:00
rcutree_trace.c rcu: Make rcu_barrier() less disruptive 2016-10-29 23:12:18 +08:00
relay.c
res_counter.c
resource.c kernel: Restrict permissions of /proc/iomem. 2016-06-03 11:56:04 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
seccomp.c seccomp: Use atomic operations that are present in kernel 3.4. 2014-10-31 19:46:31 -07:00
semaphore.c
signal.c signal, x86: add SIGSYS info and make it synchronous. 2014-10-31 19:46:15 -07:00
smp.c smp: add func to IPI cpus based on parameter func 2012-03-28 17:14:35 -07:00
softirq.c softirq: reduce latencies 2016-10-29 23:12:18 +08:00
spinlock.c locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
srcu.c
stacktrace.c
stop_machine.c
sys.c prctl: make PR_SET_TIMERSLACK_PID pid namespace aware 2016-10-29 23:12:27 +08:00
sys_ni.c seccomp: add "seccomp" syscall 2014-10-31 19:46:27 -07:00
sysctl.c pipe: limit the per-user amount of pages allocated in pipes 2016-10-29 23:12:35 +08:00
sysctl_binary.c msm: 8x55: put reason for boot in procfs from SMEM 2013-02-08 15:14:28 -08:00
taskstats.c
test_kprobes.c
time.c jiffies: Fix timeval conversion to jiffies 2016-10-29 23:12:15 +08:00
timeconst.bc kernel: Replace timeconst.pl with a bc script 2016-10-29 23:12:15 +08:00
timer.c timer: Fix mod_timer_pinned() header comment 2016-10-29 23:12:18 +08:00
tracepoint.c
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c proc: Usable inode numbers for the namespace file descriptors. 2015-07-13 11:18:01 -07:00
user_namespace.c proc: Usable inode numbers for the namespace file descriptors. 2015-07-13 11:18:01 -07:00
utsname.c proc: Usable inode numbers for the namespace file descriptors. 2015-07-13 11:18:01 -07:00
utsname_sysctl.c
wait.c
watchdog.c kernel/watchdog.c: add comment to watchdog() exit path 2012-03-23 16:58:32 -07:00
workqueue.c workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active 2013-03-04 12:48:24 -08:00
workqueue_sched.h