Commit graph

306680 commits

Author SHA1 Message Date
Will Drewry
05acc0b62e samples/seccomp: fix dependencies on arch macros
This change fixes the compilation error reported for
i386 allmodconfig here:
  http://kisskb.ellerman.id.au/kisskb/buildresult/6123842/

Logic attempting to predict the host architecture has been
removed from the Makefile.  Instead, the bpf-direct sample
should now compile on any architecture, but if the architecture
is not supported, it will compile a minimal main() function.

This change also ensures the samples are not compiled when
there is no seccomp filter support.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
2014-10-31 19:46:20 -07:00
Will Drewry
4444d9b1b4 seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER
If both audit and seccomp filter support are disabled, 'ret' is marked
as unused.

If just seccomp filter support is disabled, data and skip are considered
unused.

This change fixes those build warnings.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-10-31 19:46:19 -07:00
Will Drewry
01bc71e06c seccomp: ignore secure_computing return values
This change is inspired by
  https://lkml.org/lkml/2012/4/16/14
which fixes the build warnings for arches that don't support
CONFIG_HAVE_ARCH_SECCOMP_FILTER.

In particular, there is no requirement for the return value of
secure_computing() to be checked unless the architecture supports
seccomp filter.  Instead of silencing the warnings with (void)
a new static inline is added to encode the expected behavior
in a compiler and human friendly way.

v2: - cleans things up with a static inline
    - removes sfr's signed-off-by since it is a different approach
v1: - matches sfr's original change

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-10-31 19:46:19 -07:00
Stephen Rothwell
60d3a9c7ee seccomp: use a static inline for a function stub
Fixes this error message when CONFIG_SECCOMP is not set:

arch/powerpc/kernel/ptrace.c: In function 'do_syscall_trace_enter':
arch/powerpc/kernel/ptrace.c:1713:2: error: statement with no effect [-Werror=unused-value]

Signed-off-by: Stephen Rothwell <sfr@ozlabs.au.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-10-31 19:46:18 -07:00
Will Drewry
bc3f6efeb7 Documentation: prctl/seccomp_filter
Documents how system call filtering using Berkeley Packet
Filter programs works and how it may be used.
Includes an example for x86 and a semi-generic
example using a macro-based code generator.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>

v18: - added acked by
     - update no new privs numbers
v17: - remove @compat note and add Pitfalls section for arch checking
       (keescook@chromium.org)
v16: -
v15: -
v14: - rebase/nochanges
v13: - rebase on to 88ebdda615
v12: - comment on the ptrace_event use
     - update arch support comment
     - note the behavior of SECCOMP_RET_DATA when there are multiple filters
       (keescook@chromium.org)
     - lots of samples/ clean up incl 64-bit bpf-direct support
       (markus@chromium.org)
     - rebase to linux-next
v11: - overhaul return value language, updates (keescook@chromium.org)
     - comment on do_exit(SIGSYS)
v10: - update for SIGSYS
     - update for new seccomp_data layout
     - update for ptrace option use
v9: - updated bpf-direct.c for SIGILL
v8: - add PR_SET_NO_NEW_PRIVS to the samples.
v7: - updated for all the new stuff in v7: TRAP, TRACE
    - only talk about PR_SET_SECCOMP now
    - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com)
    - adds dropper.c: a simple system call disabler
v6: - tweak the language to note the requirement of
      PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu)
v5: - update sample to use system call arguments
    - adds a "fancy" example using a macro-based generator
    - cleaned up bpf in the sample
    - update docs to mention arguments
    - fix prctl value (eparis@redhat.com)
    - language cleanup (rdunlap@xenotime.net)
v4: - update for no_new_privs use
    - minor tweaks
v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
    - document use of tentative always-unprivileged
    - guard sample compilation for i386 and x86_64
v2: - move code to samples (corbet@lwn.net)
2014-10-31 19:46:17 -07:00
Sasha Levitskiy
b91885c38c Change-Id: I7c9d49079d4e18390c2d520513a4afd55e6eaa3e 2014-10-31 19:46:17 -07:00
Will Drewry
2d338c789a ptrace,seccomp: Add PTRACE_SECCOMP support
This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.

When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
results in a BPF program returning SECCOMP_RET_TRACE.  The 16-bit
SECCOMP_RET_DATA mask of the BPF program return value will be passed as
the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.

If the subordinate process is not using seccomp filter, then no
system call notifications will occur even if the option is specified.

If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
is returned, the system call will not be executed and an -ENOSYS errno
will be returned to userspace.

This change adds a dependency on the system call slow path.  Any future
efforts to use the system call fast path for seccomp filter will need to
address this restriction.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - rebase
     - comment fatal_signal check
     - acked-by
     - drop secure_computing_int comment
v17: - ...
v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
       [note PT_TRACE_MASK disappears in linux-next]
v15: - add audit support for non-zero return codes
     - clean up style (indan@nul.nu)
v14: - rebase/nochanges
v13: - rebase on to 88ebdda615
       (Brings back a change to ptrace.c and the masks.)
v12: - rebase to linux-next
     - use ptrace_event and update arch/Kconfig to mention slow-path dependency
     - drop all tracehook changes and inclusion (oleg@redhat.com)
v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
       (indan@nul.nu)
v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
v9:  - n/a
v8:  - guarded PTRACE_SECCOMP use with an ifdef
v7:  - introduced
2014-10-31 19:46:16 -07:00
Will Drewry
f0a1e625bd seccomp: Add SECCOMP_RET_TRAP
Adds a new return value to seccomp filters that triggers a SIGSYS to be
delivered with the new SYS_SECCOMP si_code.

This allows in-process system call emulation, including just specifying
an errno or cleanly dumping core, rather than just dying.

Suggested-by: Markus Gutschke <markus@chromium.org>
Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - acked-by, rebase
     - don't mention secure_computing_int() anymore
v15: - use audit_seccomp/skip
     - pad out error spacing; clean up switch (indan@nul.nu)
v14: - n/a
v13: - rebase on to 88ebdda615
v12: - rebase on to linux-next
v11: - clarify the comment (indan@nul.nu)
     - s/sigtrap/sigsys
v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
       note suggested-by (though original suggestion had other behaviors)
v9:  - changes to SIGILL
v8:  - clean up based on changes to dependent patches
v7:  - introduction
2014-10-31 19:46:15 -07:00
Will Drewry
abdbfff1ae signal, x86: add SIGSYS info and make it synchronous.
This change enables SIGSYS, defines _sigfields._sigsys, and adds
x86 (compat) arch support.  _sigsys defines fields which allow
a signal handler to receive the triggering system call number,
the relevant AUDIT_ARCH_* value for that number, and the address
of the callsite.

SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it
to have setup_frame() called for it. The goal is to ensure that
ucontext_t reflects the machine state from the time-of-syscall and not
from another signal handler.

The first consumer of SIGSYS would be seccomp filter.  In particular,
a filter program could specify a new return value, SECCOMP_RET_TRAP,
which would result in the system call being denied and the calling
thread signaled.  This also means that implementing arch-specific
support can be dependent upon HAVE_ARCH_SECCOMP_FILTER.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - added acked by, rebase
v17: - rebase and reviewed-by addition
v14: - rebase/nochanges
v13: - rebase on to 88ebdda615
v12: - reworded changelog (oleg@redhat.com)
v11: - fix dropped words in the change description
     - added fallback copy_siginfo support.
     - added __ARCH_SIGSYS define to allow stepped arch support.
v10: - first version based on suggestion
2014-10-31 19:46:15 -07:00
Will Drewry
c548af9d53 seccomp: add SECCOMP_RET_ERRNO
This change adds the SECCOMP_RET_ERRNO as a valid return value from a
seccomp filter.  Additionally, it makes the first use of the lower
16-bits for storing a filter-supplied errno.  16-bits is more than
enough for the errno-base.h calls.

Returning errors instead of immediately terminating processes that
violate seccomp policy allow for broader use of this functionality
for kernel attack surface reduction.  For example, a linux container
could maintain a whitelist of pre-existing system calls but drop
all new ones with errnos.  This would keep a logically static attack
surface while providing errnos that may allow for graceful failure
without the downside of do_exit() on a bad call.

This change also changes the signature of __secure_computing.  It
appears the only direct caller is the arm entry code and it clobbers
any possible return value (register) immediately.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - fix up comments and rebase
     - fix bad var name which was fixed in later revs
     - remove _int() and just change the __secure_computing signature
v16-v17: ...
v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
     - clean up and pad out return codes (indan@nul.nu)
v14: - no change/rebase
v13: - rebase on to 88ebdda615
v12: - move to WARN_ON if filter is NULL
       (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
     - return immediately for filter==NULL (keescook@chromium.org)
     - change evaluation to only compare the ACTION so that layered
       errnos don't result in the lowest one being returned.
       (keeschook@chromium.org)
v11: - check for NULL filter (keescook@chromium.org)
v10: - change loaders to fn
 v9: - n/a
 v8: - update Kconfig to note new need for syscall_set_return_value.
     - reordered such that TRAP behavior follows on later.
     - made the for loop a little less indent-y
 v7: - introduced
2014-10-31 19:46:14 -07:00
Kees Cook
929e7b93d1 seccomp: remove duplicated failure logging
This consolidates the seccomp filter error logging path and adds more
details to the audit log.

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: make compat= permanent in the record
v15: added a return code to the audit_seccomp path by wad@chromium.org
     (suggested by eparis@redhat.com)
v*: original by keescook@chromium.org
2014-10-31 19:46:13 -07:00
Will Drewry
cab80ebeec seccomp: add system call filtering using BPF
[This patch depends on luto@mit.edu's no_new_privs patch:
   https://lkml.org/lkml/2012/1/30/264
 The whole series including Andrew's patches can be found here:
   https://github.com/redpig/linux/tree/seccomp
 Complete diff here:
   https://github.com/redpig/linux/compare/1dc65fed...seccomp
]

This patch adds support for seccomp mode 2.  Mode 2 introduces the
ability for unprivileged processes to install system call filtering
policy expressed in terms of a Berkeley Packet Filter (BPF) program.
This program will be evaluated in the kernel for each system call
the task makes and computes a result based on data in the format
of struct seccomp_data.

A filter program may be installed by calling:
  struct sock_fprog fprog = { ... };
  ...
  prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);

The return value of the filter program determines if the system call is
allowed to proceed or denied.  If the first filter program installed
allows prctl(2) calls, then the above call may be made repeatedly
by a task to further reduce its access to the kernel.  All attached
programs must be evaluated before a system call will be allowed to
proceed.

Filter programs will be inherited across fork/clone and execve.
However, if the task attaching the filter is unprivileged
(!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
ensures that unprivileged tasks cannot attach filters that affect
privileged tasks (e.g., setuid binary).

There are a number of benefits to this approach. A few of which are
as follows:
- BPF has been exposed to userland for a long time
- BPF optimization (and JIT'ing) are well understood
- Userland already knows its ABI: system call numbers and desired
  arguments
- No time-of-check-time-of-use vulnerable data accesses are possible.
- system call arguments are loaded on access only to minimize copying
  required for system call policy decisions.

Mode 2 support is restricted to architectures that enable
HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
syscall_get_arguments().  The full desired scope of this feature will
add a few minor additional requirements expressed later in this series.
Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
the desired additional functionality.

No architectures are enabled in this patch.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Indan Zupancic <indan@nul.nu>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - rebase to v3.4-rc2
     - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
     - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
     - add a comment for get_u32 regarding endianness (akpm@)
     - fix other typos, style mistakes (akpm@)
     - added acked-by
v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
     - tighten return mask to 0x7fff0000
v16: - no change
v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
       size (indan@nul.nu)
     - drop the max insns to 256KB (indan@nul.nu)
     - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
     - move IP checks after args (indan@nul.nu)
     - drop !user_filter check (indan@nul.nu)
     - only allow explicit bpf codes (indan@nul.nu)
     - exit_code -> exit_sig
v14: - put/get_seccomp_filter takes struct task_struct
       (indan@nul.nu,keescook@chromium.org)
     - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
     - add seccomp_bpf_load for use by net/core/filter.c
     - lower max per-process/per-hierarchy: 1MB
     - moved nnp/capability check prior to allocation
       (all of the above: indan@nul.nu)
v13: - rebase on to 88ebdda615
v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
     - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
     - reworded the prctl_set_seccomp comment (indan@nul.nu)
v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
     - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
     - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
     - pare down Kconfig doc reference.
     - extra comment clean up
v10: - seccomp_data has changed again to be more aesthetically pleasing
       (hpa@zytor.com)
     - calling convention is noted in a new u32 field using syscall_get_arch.
       This allows for cross-calling convention tasks to use seccomp filters.
       (hpa@zytor.com)
     - lots of clean up (thanks, Indan!)
 v9: - n/a
 v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
     - Lots of fixes courtesy of indan@nul.nu:
     -- fix up load behavior, compat fixups, and merge alloc code,
     -- renamed pc and dropped __packed, use bool compat.
     -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
        dependencies
 v7:  (massive overhaul thanks to Indan, others)
     - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
     - merged into seccomp.c
     - minimal seccomp_filter.h
     - no config option (part of seccomp)
     - no new prctl
     - doesn't break seccomp on systems without asm/syscall.h
       (works but arg access always fails)
     - dropped seccomp_init_task, extra free functions, ...
     - dropped the no-asm/syscall.h code paths
     - merges with network sk_run_filter and sk_chk_filter
 v6: - fix memory leak on attach compat check failure
     - require no_new_privs || CAP_SYS_ADMIN prior to filter
       installation. (luto@mit.edu)
     - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
     - cleaned up Kconfig (amwang@redhat.com)
     - on block, note if the call was compat (so the # means something)
 v5: - uses syscall_get_arguments
       (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
      - uses union-based arg storage with hi/lo struct to
        handle endianness.  Compromises between the two alternate
        proposals to minimize extra arg shuffling and account for
        endianness assuming userspace uses offsetof().
        (mcgrathr@chromium.org, indan@nul.nu)
      - update Kconfig description
      - add include/seccomp_filter.h and add its installation
      - (naive) on-demand syscall argument loading
      - drop seccomp_t (eparis@redhat.com)
 v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
      - now uses current->no_new_privs
        (luto@mit.edu,torvalds@linux-foundation.com)
      - assign names to seccomp modes (rdunlap@xenotime.net)
      - fix style issues (rdunlap@xenotime.net)
      - reworded Kconfig entry (rdunlap@xenotime.net)
 v3:  - macros to inline (oleg@redhat.com)
      - init_task behavior fixed (oleg@redhat.com)
      - drop creator entry and extra NULL check (oleg@redhat.com)
      - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
      - adds tentative use of "always_unprivileged" as per
        torvalds@linux-foundation.org and luto@mit.edu
 v2:  - (patch 2 only)
2014-10-31 19:46:13 -07:00
Will Drewry
b88dfd9fe6 arch/x86: add syscall_get_arch to syscall.h
Add syscall_get_arch() to export the current AUDIT_ARCH_* based on system call
entry path.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: - update comment about x32 tasks
     - rebase to v3.4-rc2
v17: rebase and reviewed-by
v14: rebase/nochanges
v13: rebase on to 88ebdda615
2014-10-31 19:46:12 -07:00
Will Drewry
00be32c23f asm/syscall.h: add syscall_get_arch
Adds a stub for a function that will return the AUDIT_ARCH_* value
appropriate to the supplied task based on the system call convention.

For audit's use, the value can generally be hard-coded at the
audit-site.  However, for other functionality not inlined into syscall
entry/exit, this makes that information available.  seccomp_filter is
the first planned consumer and, as such, the comment indicates a tie to
CONFIG_HAVE_ARCH_SECCOMP_FILTER.

Suggested-by: Roland McGrath <mcgrathr@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: comment and change reword and rebase.
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v11: fixed improper return type
v10: introduced
2014-10-31 19:46:12 -07:00
Will Drewry
9742b8d995 seccomp: kill the seccomp_t typedef
Replaces the seccomp_t typedef with struct seccomp to match modern
kernel style.

Signed-off-by: Will Drewry <wad@chromium.org>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: rebase
...
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v8-v11: no changes
v7: struct seccomp_struct -> struct seccomp
v6: original inclusion in this series.
2014-10-31 19:46:11 -07:00
Will Drewry
06b878fc11 net/compat.c,linux/filter.h: share compat_sock_fprog
Any other users of bpf_*_filter that take a struct sock_fprog from
userspace will need to be able to also accept a compat_sock_fprog
if the arch supports compat calls.  This change allows the existing
compat_sock_fprog be shared.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: tasered by the apostrophe police
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v11: introduction
2014-10-31 19:46:10 -07:00
Will Drewry
5171cc5332 sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W
Introduces a new BPF ancillary instruction that all LD calls will be
mapped through when skb_run_filter() is being used for seccomp BPF.  The
rewriting will be done using a secondary chk_filter function that is run
after skb_chk_filter.

The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
along with the seccomp_bpf_load() function later in this series.

This is based on http://lkml.org/lkml/2012/3/2/141

Suggested-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: rebase
...
v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
v14: First cut using a single additional instruction
... v13: made bpf functions generic.
2014-10-31 19:46:10 -07:00
John Johansen
7f86aad709 Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVS
Add support for AppArmor to explicitly fail requested domain transitions
if NO_NEW_PRIVS is set and the task is not unconfined.

Transitions from unconfined are still allowed because this always results
in a reduction of privileges.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>

v18: new acked-by, new description
2014-10-31 19:46:09 -07:00
Andy Lutomirski
14434eef82 Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs
With this change, calling
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
disables privilege granting operations at execve-time.  For example, a
process will not be able to execute a setuid binary to change their uid
or gid if this bit is set.  The same is true for file capabilities.

Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
LSMs respect the requested behavior.

To determine if the NO_NEW_PRIVS bit is set, a task may call
  prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
It returns 1 if set and 0 if it is not set. If any of the arguments are
non-zero, it will return -1 and set errno to -EINVAL.
(PR_SET_NO_NEW_PRIVS behaves similarly.)

This functionality is desired for the proposed seccomp filter patch
series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
system call behavior for itself and its child tasks without being
able to impact the behavior of a more privileged task.

Another potential use is making certain privileged operations
unprivileged.  For example, chroot may be considered "safe" if it cannot
affect privileged tasks.

Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
set and AppArmor is in use.  It is fixed in a subsequent patch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Paris <eparis@redhat.com>

v18: updated change desc
v17: using new define values as per 3.4

Conflicts:
	include/linux/prctl.h
	kernel/sys.c
2014-10-31 19:46:07 -07:00
hsuan-chih_chen
fa84206008 mmc: add 5.0 emmc support
bug: 17968808 Kernel change for new eMMC v5.0 parts for FLO/DEB

Change-Id: Ia18152457fe3ff70401b199c267fa37374b9d544
Signed-off-by: hsuan-chih_chen <hsuan-chih_chen@asus.com>
2014-10-14 10:19:32 +08:00
Naveen Ramaraj
0368de3446 mm: Increase number of GFP masks
The __GFP_CMA mask is now placed after all available GFP masks.
With this we need to increase the total number of GFP flags.
Do so accordingly.

Bug: 17494249
CRs-Fixed: 648978
Change-Id: I53f5f064ac16a50ee10c84ff2bb50fdb7e085bd0
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Naveen Ramaraj <nramaraj@codeaurora.org>
2014-10-03 16:20:45 -07:00
Naveen Ramaraj
b72125bf65 lowmemorykiller: Account for highmem during kswapd reclaim
Currenlty most memory reclaim is done through kswapd.
Since kswapd uses a gfp mask of GFP_KERNEL, and because
the lowmemorykiller is zone aware, the lowmemorykiller will
ignore highmem most of the time.
This results in the lowmemorykiller being overly aggressive.

The fix to this issue is to allow the lowmemorykiller to
count highmem when being called by the kswapd if the lowmem
watermarks are satisfied.

Bug: 17494249
Change-Id: I938644584f374763d10d429d835e74daa4854a38
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Naveen Ramaraj <nramaraj@codeaurora.org>
2014-10-03 16:20:24 -07:00
Harsh Vardhan Dwivedi
2dda0e5bde msm: kgsl: Add a flag to mark user mapped GPU buffers
Add a flag in /sys/kernel/debug/kgsl/kgsl-3do/proc/<pid>/mem
file to indicate for each GPU buffer if it is mapped to userspace
in <pid> or not.

CRs-fixed: 634962
Change-Id: I8abda74ef5656aca6b1c0315af8deb77460fa5a9
Signed-off-by: Harsh Vardhan Dwivedi <hdwivedi@codeaurora.org>
2014-08-16 11:45:54 -07:00
Jordan Crouse
78af1e178f msm: kgsl: Mark mmapped objects with VM_DONTCOPY
Add VM_DONTCOPY to the default set of mmap flags to keep VM objects
from being copied on fork() and causing issues.  KGSL file descriptors
copied to a child are not expected to be usable.

Change-Id: Ic0dedbad85c07118a931ccb9f7a6fd0507da3e5a
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-08-16 11:45:53 -07:00
Jordan Crouse
c240a9bc86 msm: kgsl: Don't set VM_IO on mmap()ed GPU memory objects
VM_IO prevents mapped memory from being peeked by ptrace(). That
kind of protection isn't really needed for nominal GPU buffers.
A process given itself up to ptrace() already expects to be
examined so there is no additional risk to let the parent examine
GPU buffers too.  This is done universally now, but there is no
reason why we wouldn't let the process choose which buffers to
keep private in the future.

That said; there is more of a concern about including GPU buffers
in a core dump since that is a more permanent and less secure
record of the memory so add VM_DONTDUMP for all GPU buffers to
protect against that.

CRs-Fixed: 654751
Change-Id: Ic0dedbade91a2ec458bcb27eff3312d4ec6e4389
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-08-16 11:45:52 -07:00
Jordan Crouse
cd2236ad5a msm: kgsl: Use a constant mask for the memops vmflags
We don't need to define a function to return a constant.  Save
ourselves some source code and some .text space too.

Change-Id: Ic0dedbadc72c2fdd473cd4369ed772c84a923a15
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-08-16 11:45:51 -07:00
Shrenuj Bansal
048bc4cbd2 msm: kgsl: Add CMA memory feature to support NOMMU for GPU
Reserves CMA memory for kgsl driver early during bootup and then
uses dma_alloc_coherent() to allocate physically contiguous memory
instead of using the MMU

Change-Id: Ica9b244fe9b9d8a902d670293a0bec2edf81bd5d
Signed-off-by: Shrenuj Bansal <shrenujb@codeaurora.org>
2014-08-16 11:45:50 -07:00
Carter Cooper
9ba9485e95 msm: kgsl: Check constraint state coming out of slumber
Only set the default constraint when we come out of slumber if there
is no current constraint set. The current behavior will always override
the constraint that was set when coming out of slumber.

Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Change-Id: I58a5e2338bbee64e885edf697e83869820be2c22
2014-08-16 11:45:49 -07:00
Carter Cooper
c89e7307de msm: kgsl: Respect power constraints for a full interval
No matter what type of constraint leave it in effect until
the constraint has expired.

Change-Id: I75118823fd484f87dda8e0f26fa3fe1ae12ca07d
Signed-off-by: Lucille Sylester <lsylvest@codeaurora.org>
2014-08-16 11:45:49 -07:00
Carter Cooper
8ee5bb778f msm: kgsl: Add missing power constraint trace
Add missing trace to ensure debug logs show everything

Change-Id: I5da21b15ba498e1266d6c96b700c6c18135f92e9
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
2014-08-16 11:45:48 -07:00
Oleg Perelet
95d88d6199 msm: kgsl: remove power constraint on context destroy
Remove power constraint if parent context is
deleted before constraint expires.

Change-Id: I6a28fec842132733b2e9015333cc4d14c77daa8e
Signed-off-by: Oleg Perelet <operelet@codeaurora.org>
2014-08-16 11:45:47 -07:00
Shrenuj Bansal
de8fe0466c msm: kgsl: Clear the memstore while destroying the context
When creating a context, we add the event group much before
initializing the memstore for that context. Between these events,
its possible that events are registered and retired and the
timestamp read in retire_events() gets us the last timestamp of
the last destroyed context. This results in the processed
timestamp to be greater than the actual retired timestamp in the
memstore which is very problematic for us.

CRs-Fixed: 640550
Change-Id: I2ace6d99e2ce417ba38f6bbbeeb787478eb4e372
Signed-off-by: Shrenuj Bansal <shrenujb@codeaurora.org>
2014-08-16 11:45:46 -07:00
Ed Tam
ea4c32f57c prima: release v3.2.3.22
git://codeaurora.org/external/wlan/prima.git

    d3d0022 wlan : Revision 3.2.3.22
    060e06d wlan: Send directed Probe Request frames only for hidden SSIDs.

Signed-off-by: Ed Tam <etam@google.com>
2014-08-13 11:32:19 -07:00
Minsung Kim
4e847771ec cpufreq: fix sleeping in atomic context when realloc freq_table for all_time_in_state
Commit 40cf2f8 (cpufreq: Persist cpufreq time in state data across hotplug)
causes the following call trace to be spit on boot:

BUG: sleeping function called from invalid context at mm/slub.c:936
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 3.10.9-20140624.172707-eng-gd6c0f69-dirty #50
Backtrace:
[<c0012270>] (dump_backtrace+0x0/0x10c) from [<c001256c>] (show_stack+0x18/0x1c)
 r6:ffff1788 r5:c0c020c0 r4:e609c000 r3:00000000
[<c0012554>] (show_stack+0x0/0x1c) from [<c07a2970>] (dump_stack+0x20/0x28)
[<c07a2950>] (dump_stack+0x0/0x28) from [<c0057678>] (__might_sleep+0x104/0x120)
[<c0057574>] (__might_sleep+0x0/0x120) from [<c00ff000>] (__kmalloc_track_caller+0x144/0x274)
 r6:00000000 r5:e609c000 r4:e6802140
[<c00feebc>] (__kmalloc_track_caller+0x0/0x274) from [<c00da098>] (krealloc+0x58/0xb0)
[<c00da040>] (krealloc+0x0/0xb0) from [<c050266c>] (cpufreq_allstats_create+0x120/0x204)
 r8:e4c4ff00 r7:c0d266b8 r6:0013d620 r5:e4c4e600 r4:00000001
r3:e535d6d0
[<c050254c>] (cpufreq_allstats_create+0x0/0x204) from [<c0502e38>] (cpufreq_stat_notifier_policy+0xb8/0xd0)
[<c0502d80>] (cpufreq_stat_notifier_policy+0x0/0xd0) from [<c00517cc>] (notifier_call_chain+0x4c/0x8c)
 r5:00000000 r4:fffffffe
[<c0051780>] (notifier_call_chain+0x0/0x8c) from [<c00519fc>] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0cd4d00 r7:00000002 r6:e609dd7c r5:ffffffff r4:c0d25a4c
r3:ffffffff
[<c00519ac>] (__blocking_notifier_call_chain+0x0/0x68) from [<c0051a34>] (blocking_notifier_call_chain+0x20/0x28)
 r7:c0e24f30 r6:00000000 r5:e53e1e00 r4:e609dd7c
[<c0051a14>] (blocking_notifier_call_chain+0x0/0x28) from [<c0500fec>] (__cpufreq_set_policy+0xc0/0x1d0)
[<c0500f2c>] (__cpufreq_set_policy+0x0/0x1d0) from [<c0501308>] (cpufreq_add_dev_interface+0x20c/0x270)
 r7:00000008 r6:00000000 r5:e53e1e00 r4:e53e1e58
[<c05010fc>] (cpufreq_add_dev_interface+0x0/0x270) from [<c05016a8>] (cpufreq_add_dev+0x33c/0x420)
[<c050136c>] (cpufreq_add_dev+0x0/0x420) from [<c03604a4>] (subsys_interface_register+0x80/0xbc)
[<c0360424>] (subsys_interface_register+0x0/0xbc) from [<c050035c>] (cpufreq_register_driver+0x8c/0x194)

Change-Id: If77a656d0ea60a8fc4083283d104509fa6c07f8f
Signed-off-by: Minsung Kim <ms925.kim@samsung.com>
2014-08-05 19:08:52 +00:00
Ruchi Kandoi
4c944cad3d cpufreq: Persist cpufreq time in state data across hotplug
Cpufreq time_in_state data for all CPUs is made persistent across
hotplug and exposed to userspace via sysfs file
/sys/devices/system/cpu/cpufreq/all_time_in_state

Change-Id: I97cb5de24b6de16189bf8b5df9592d0a6e6ddf32
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
2014-08-05 19:08:33 +00:00
Ruchi Kandoi
451076288b Power: Changes the permission to read only for sysfs file
/sys/kernel/wakeup_reasons/last_resume_reason

Change-Id: If25e8e416ee9726996518b58b6551a61dc1591e3
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
2014-08-05 19:00:47 +00:00
Stephen Smalley
9041172907 Enable setting security contexts on rootfs inodes.
rootfs (ramfs) can support setting of security contexts
by userspace due to the vfs fallback behavior of calling
the security module to set the in-core inode state
for security.* attributes when the filesystem does not
provide an xattr handler.  No xattr handler required
as the inodes are pinned in memory and have no backing
store.

This is useful in allowing early userspace to label individual
files within a rootfs while still providing a policy-defined
default via genfs.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
2014-07-25 15:09:14 -07:00
Hans de Goede
565b635ce8 cgroup: Fix use after free of cgrp (cgrp->css_sets)
Running a 3.4 kernel + Fedora-18 (systemd) userland on my Allwinner A10
(arm cortex a8), I'm seeing repeated, reproducable list_del list corruption
errors when build with CONFIG_DEBUG_LIST, and the backtrace always shows
free_css_set_work as the function making the problematic list_del call.

I've tracked this doen to a use after free of the cgrp struct, specifically
of the cgrp->css_sets list_head, which gets cleared by free_css_set_work.

Since free_css_set_work runs form a workqueue, it is possible for it to not be
done with clearing the list when the cgrp gets free-ed. To avoid this the code
adding the links increases cgrp->count, and the freeing code running from the
workqueue decreases cgrp->count *after* doing list_del, and then if the count
goes to 0 calls cgroup_wakeup_rmdir_waiter().

However cgroup_rmdir() is missing a check for cgrp->count != 0, causing it
to still continue with the rmdir (which leads to the free-ing of the cgrp),
before free_css_set_work is done. Sometimes the free-ed memory is re-used
before free_css_set_work gets around to unlinking link->cgrp_link_list,
triggering the list_del list corruption messages.

This patch fixes this by properly checking for cgrp->count != 0 and waiting
for the cgroup_rmdir_waitq in that case.

Change-Id: I9dbc02a0a75d5dffa1b65d67456e00139dea57c3
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2014-07-24 15:38:09 -07:00
Hans de Goede
0504ca2d5f cgroup: Take css_set_lock from cgroup_css_sets_empty()
As indicated in the comment above cgroup_css_sets_empty it needs the
css_set_lock. But neither of the 2 call points have it, so rather then fixing
the callers just take the lock inside cgroup_css_sets_empty().

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Change-Id: If7aea71824f6d0e3f2cc6c1ce236c3ae6be2037b
2014-07-24 15:37:35 -07:00
Sasha Levin
f7820fdf25 net/l2tp: don't fall back on UDP [get|set]sockopt
The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

  "If we wanted this to work, it'd have to look up the tunnel and then
   use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Phil Turnbull <phil.turnbull@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ed Tam <etam@google.com>
2014-07-24 15:36:39 -07:00
Lucille Sylvester
6dbd07cf20 msm: kgsl: fix MIN power constraints in non-trivial cases
Switching from a MAX to a MIN power constraint in the same
context is broken.  So is switching from a MAX context constraint
to a MIN constraint on a different context.  Clarify constraint
ownership, update the timestamps accordingly, and reset constraints
when changed by the context.

Change-Id: Id27dbc20e2aac994101817af3857451c1703219e
Signed-off-by: Lucille Sylvester <lsylvest@codeaurora.org>
2014-07-18 16:04:27 -06:00
Jordan Crouse
8501cd2ddb msm: kgsl: Protect CP_STATE_DEBUG_INDEX
Put CP_STATE_DEBUG_INDEX and CP_STATE_DEBUG_DATA under protection
to keep it from being written from an IB1. Doing so however opens
up a subtle "feature" in the microcode: memory read opcodes turn off
protected mode in the microcode to do the read and then turns it
back on regardless of the initial state. This is a problem if the
memory read happens while protected mode is turned off and then we
try to access a protected register which then complains and goes boom.

To account for this irregularity explicitly turn back off protected
mode in all the places where we know this will be a problem.

Change-Id: Ic0dedbad1397ca9b80132241ac006560a615e042
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-07-18 16:04:27 -06:00
Jordan Crouse
442fa84040 msm: kgsl: Print the register that causes a protected mode error
When we get a protected mode error print out the register information
that caused the exception.

Change-Id: Ic0dedbad4f586c5715669226619b51665ef9681f
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-07-18 16:04:26 -06:00
Jordan Crouse
92a940c808 msm: kgsl: Protect SMMU registers from being written by the GPU
Put the SMMU register range in protected mode to shield them from
IB1/IB2 writes from userspace.

CRs-Fixed: 599971
Change-Id: Ic0dedbad8c03fc1c54ff73221231e2440d3c34dd
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-07-18 16:04:26 -06:00
Jordan Crouse
7810303daf msm: kgsl: Enable protected register mode for A3XX
Turn on protected register mode for the A3XX GPU family and add 0x63
(RBBM_INT_0_MASK) to the list of protected registers.

Change-Id: Ic0dedbad10ebfa6eb6d3d815b5aa9b6b6f0e8e35
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-07-18 16:04:26 -06:00
Jordan Crouse
407efd4e1c msm: kgsl: Mark the IOMMU setstate memory as read only
Mark the IOMMU setstate memory as read only in the pagetable.

Change-Id: Ic0dedbadb19e499c749cd744c3e89be3bcb4c2a2
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-07-18 16:04:26 -06:00
Jeff Boody
16356049f5 msm: kgsl: update KGSL to match most recent version
Over time chery-picks for KGSL have been skipped or
have been resolved differently between branches. As
a result, this branch of KGSL has become increasingly
difficult to maintain due to merge conflicts. With a
few exceptions KGSL should match the msm-3.4 mainline
exactly. To rectify the situation, this change brings
KGSL up-to-date with the msm-3.4 mainline as a bulk
change because cherry-picks are not practical.

Change-Id: I53f9f7fbf4942e147dea486ff5dbf179af75ea8c
Signed-off-by: Jeff Boody <jboody@codeaurora.org>
2014-07-18 16:04:26 -06:00
Linus Torvalds
edcc061a1b Fix a few incorrectly checked [io_]remap_pfn_range() calls
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
really should use the vm_iomap_memory() helper.  This trivially converts
two of them to the helper, and comments about why the third one really
needs to continue to use remap_pfn_range(), and adds the missing size
check.

CRs-Fixed: 570735
Change-Id: I927a67ea80fea5ed706749ead9defb1e72633952
Reported-by: Nico Golde <nico@ngolde.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org.
Git-commit: 7314e613d5
Git-repo: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
[pratibha@codeaurora.org:resolve trivial merge conflicts]
Signed-off-by: Pratibhasagar V <pratibha@codeaurora.org>

Signed-off-by: Ed Tam <etam@google.com>
2014-07-15 17:06:52 -07:00
Uwe Kleine-König
cd59f95175 uio: provide vm access to UIO_MEM_PHYS maps
This makes it possible to let gdb access mappings of the process that is
being debugged.

uio_mmap_logical was moved and uio_vm_ops renamed to group related code
and differentiate to new stuff.

CRs-Fixed: 570735
Change-Id: I8a5ff343727cc58fedfeb73f3466cc9a7f153e84
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Git-commit: 7294151d05
Git-repo: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Pratibhasagar V <pratibha@codeaurora.org>
Signed-off-by: Ed Tam <etam@google.com>
2014-07-15 17:06:19 -07:00
Mekala Natarajan
93d5429eb1 board-flo-storage: Add support for UIO devices for RemoteFS
The RemoteFS server now uses the UIO driver. Add the UIO
device for APQ8064.

Bug: 12784954

Signed-off-by: Mekala Natarajan <mekalan@codeaurora.org>
2014-07-16 00:04:03 +00:00