Commit graph

307680 commits

Author SHA1 Message Date
Daniel Rosenberg
3edd0f78ef sdcardfs: Add gid and mask to private mount data
Adds support for mount2, remount2, and the functions
to allocate/clone/copy the private data

The next patch will switch over to actually using it.

Change-Id: I8a43da26021d33401f655f0b2784ead161c575e3
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:08 +03:00
Daniel Rosenberg
8967b8cde8 sdcardfs: User new permission2 functions
Change-Id: Ic7e0fb8fdcebb31e657b079fe02ac834c4a50db9
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:08 +03:00
Daniel Rosenberg
1620d1d7d4 vfs: Add setattr2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.

Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:07 +03:00
Daniel Rosenberg
19a3f7c232 vfs: Add permission2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.

Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:07 +03:00
Daniel Rosenberg
2686aceaa7 vfs: Allow filesystems to access their private mount data
Now we pass the vfsmount when mounting and remounting.
This allows the filesystem to actually set up the mount
specific data, although we can't quite do anything with
it yet. show_options is expanded to include data that
lives with the mount.

To avoid changing existing filesystems, these have
been added as new vfs functions.

Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:06 +03:00
Daniel Rosenberg
ccbd24c7a0 mnt: Add filesystem private data to mount points
This starts to add private data associated directly
to mount points. The intent is to give filesystems
a sense of where they have come from, as a means of
letting a filesystem take different actions based on
this information.

Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:06 +03:00
Daniel Rosenberg
313cdb2651 ANDROID: sdcardfs: Fix backport issue for 3.10
Don't use make_kuid/make_guid

Bug: 30954918
Change-Id: I56de640771872aeeae5a69c42bf2ce8a5cfa413f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:05 +03:00
Daniel Rosenberg
87b619e0cb sdcardfs: Move directory unlock before touch
This removes a deadlock under low memory conditions.
filp_open can call lookup_slow, which will attempt to
lock the parent.

Change-Id: I940643d0793f5051d1e79a56f4da2fa8ca3d8ff7
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:05 +03:00
alvin_liang
b94d192e3e sdcardfs: fix external storage exporting incorrect uid
Symptom: App cannot write into per-app folder
Root Cause: sdcardfs exports incorrect uid
Solution: fix uid
Project: All
Note:
Test done by RD: passed

Change-Id: Iff64f6f40ba4c679f07f4426d3db6e6d0db7e3ca
2017-09-22 19:12:04 +03:00
Daniel Rosenberg
00fb55c68d sdcardfs: Added top to sdcardfs_inode_info
Adding packages to the package list and moving files
takes a large amount of locks, and is currently a
heavy operation. This adds a 'top' field to the
inode_info, which points to the inode for the top
most directory whose owner you would like to match.

On permission checks and get_attr, we look up the
owner based on the information at top. When we change
a package mapping, we need only modify the information
in the corresponding top inode_info's. When renaming,
we must ensure top is set correctly in all children.
This happens when an app specific folder gets moved
outside of the folder for that app.

Change-Id: Ib749c60b568e9a45a46f8ceed985c1338246ec6c
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:04 +03:00
Daniel Rosenberg
853f8e0523 sdcardfs: Switch package list to RCU
Switched the package id hashmap to use RCU.

Change-Id: I9fdcab279009005bf28536247d11e13babab0b93
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:03 +03:00
Daniel Rosenberg
b7ae873970 sdcardfs: Fix locking for permission fix up
Iterating over d_subdirs requires taking d_lock.
Removed several unneeded locks.

Change-Id: I5b1588e54c7e6ee19b756d6705171c7f829e2650
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:03 +03:00
Daniel Rosenberg
82f20f9741 sdcardfs: Check for other cases on path lookup
This fixes a bug where the first lookup of a
file or folder created under a different view
would not be case insensitive. It will now
search through for a case insensitive match
if the initial lookup fails.

Bug:28024488
Change-Id: I4ff9ce297b9f2f9864b47540e740fd491c545229
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:03 +03:00
Daniel Rosenberg
feccf66542 sdcardfs: override umask on mkdir and create
The mode on files created on the lower fs should
not be affected by the umask of the calling
task's fs_struct. Instead, we create a copy
and modify it as needed. This also lets us avoid
the string shenanigans around .nomedia files.

Bug: 27992761
Change-Id: Ia3a6e56c24c6e19b3b01c1827e46403bb71c2f4c
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:02 +03:00
Julia Lawall
b46375c331 ANDROID: sdcardfs: fix itnull.cocci warnings
List_for_each_entry has the property that the first argument is always
bound to a real list element, never NULL, so testing dentry is not needed.

Generated by: scripts/coccinelle/iterators/itnull.cocci

Change-Id: I51033a2649eb39451862b35b6358fe5cfe25c5f5
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2017-09-22 19:12:02 +03:00
Daniel Rosenberg
ccf7c04945 sdcardfs: Truncate packages_gid.list on overflow
packages_gid.list was improperly returning the wrong
count. Use scnprintf instead, and inform the user that
the list was truncated if it is.

Bug: 30013843
Change-Id: Ida2b2ef7cd86dd87300bfb4c2cdb6bfe2ee1650d
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:01 +03:00
Daniel Rosenberg
611237fca8 vfs: change d_canonical_path to take two paths
bug: 23904372
Change-Id: I4a686d64b6de37decf60019be1718e1d820193e6
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:01 +03:00
Daniel Rosenberg
f5fea6a938 fuse: Add support for d_canonical_path
Allows FUSE to report to inotify that it is acting
as a layered filesystem. The userspace component
returns a string representing the location of the
underlying file. If the string cannot be resolved
into a path, the top level path is returned instead.

bug: 23904372
Change-Id: Iabdca0bbedfbff59e9c820c58636a68ef9683d9f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:00 +03:00
Daniel Rosenberg
10db9d59bf sdcardfs: remove unneeded __init and __exit
Change-Id: I2a2d45d52f891332174c3000e8681c5167c1564f
2017-09-22 19:12:00 +03:00
Daniel Rosenberg
679046d6e7 sdcardfs: Remove unused code
Change-Id: Ie97cba27ce44818ac56cfe40954f164ad44eccf6
2017-09-22 19:12:00 +03:00
Daniel Rosenberg
158b6be502 sdcardfs: remove effectless config option
CONFIG_SDCARD_FS_CI_SEARCH only guards a define for
LOOKUP_CASE_INSENSITIVE, which is never used in the
kernel. Remove both, along with the option matching
that supports it.

Change-Id: I363a8f31de8ee7a7a934d75300cc9ba8176e2edf
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:11:59 +03:00
Daniel Rosenberg
49e4c3ad85 inotify: Fix erroneous update of bit count
Patch "vfs: add d_canonical_path for stacked filesystem support"
erroneously updated the ALL_INOTIFY_BITS count. This changes it back

Change-Id: Idb04edc736da276159d30f04c40cff9d6b1e070f
2017-09-22 19:11:59 +03:00
Daniel Rosenberg
3bf0ab5c8d sdcardfs: Add support for d_canonicalize
Change-Id: I5d6f0e71b8ca99aec4b0894412f1dfd1cfe12add
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:11:58 +03:00
Daniel Rosenberg
9f8d208e0e vfs: add d_canonical_path for stacked filesystem support
Inotify does not currently know when a filesystem
is acting as a wrapper around another fs. This means
that inotify watchers will miss any modifications to
the base file, as well as any made in a separate
stacked fs that points to the same file.
d_canonical_path solves this problem by allowing the fs
to map a dentry to a path in the lower fs. Inotify
can use it to find the appropriate place to watch to
be informed of all changes to a file.

Change-Id: I09563baffad1711a045e45c1bd0bd8713c2cc0b6
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:11:58 +03:00
Daniel Rosenberg
8e6b0f6c8c sdcardfs: Bring up to date with Android M permissions:
In M, the workings of sdcardfs were changed significantly.
This brings sdcardfs into line with the changes.

Change-Id: I10e91a84a884c838feef7aa26c0a2b21f02e052e
2017-09-22 19:11:57 +03:00
Daniel Campello
1dbc72eb35 sdcardfs: Changed type-cast in packagelist management
Change-Id: Ic8842de2d7274b7a5438938d2febf5d8da867148
2017-09-22 19:11:57 +03:00
fluxi
fd2464db10 sdcardfs: Port to 3.4
Analog port to 3.10 by Daniel Campello <campello@google.com>.

Change-Id: I0b05890cdd4332c5cfc2ffdf66a3f3a7890cce35
2017-09-22 19:11:56 +03:00
Daniel Campello
6b980874d5 Included sdcardfs source code for kernel 3.0
Only included the source code as is for kernel 3.0. Following patches
take care of porting this file system to version 3.10.

Change-Id: I09e76db77cd98a059053ba5b6fd88572a4b75b5b
Signed-off-by: Daniel Campello <campello@google.com>
2017-09-22 19:11:56 +03:00
Al Viro
be084ce9f1 move d_rcu from overlapping d_child to overlapping d_alias
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Change-Id: I85366e6ce0423ec9620bcc9cd3e7695e81aa1171
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[lizf: Backported to 3.4:
 - adjust context
 - need one more name change in debugfs]
2017-09-22 19:11:55 +03:00
Al Viro
f787204b2f get rid of kern_path_parent()
all callers want the same thing, actually - a kinda-sorta analog of
kern_path_create().  I.e. they want parent vfsmount/dentry (with
->i_mutex held, to make sure the child dentry is still their child)
+ the child dentry.

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>

Change-Id: I58cc7b0a087646516db9af69962447d27fb3ee8b
2017-09-22 19:11:54 +03:00
Artem Borisov
722370c1a0 flo: defconfig: don't disable NET_NS
Change-Id: I4082679c3f5d5d0f48f9a257232f06b066b86c5f
2017-09-14 21:07:54 +03:00
David S. Miller
9d814faae9 net: Do delayed neigh confirmation.
When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Change-Id: I75acf86d11c9125c8ce2c2144e4a0fb717b29f03
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 13:38:10 +03:00
Eldad Zack
0c612305c0 include/net/dst.h: neaten asterisk placement
Fix code style - place the asterisk where it belongs.

Change-Id: Icf8c4e16bfa4fa4a10843202e2ad842dff08f94c
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 13:38:10 +03:00
Eric Dumazet
4998f1cbf7 tcp: fix possible NULL dereference in tcp_vX_send_reset()
After commit ca777eff51 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.

If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.

Change-Id: I1de6666d0c0503076cb9ca28498b36ea7fb3305b
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <jasa.bartelj@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: ca777eff51 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 13:38:09 +03:00
Adrian DC
a2b7343846 defconfig: flo: enable CONFIG_QUOTA
* Follows I7cb6b641f72085e69b90dca11d2ea68adcd02390

Change-Id: Ifbbadf90d91ac8e40dfbf17f7e1ed30c774939fc
2017-09-01 13:38:09 +03:00
Vladimir Kondratiev
45b0684fd1 BACKPORT: {nl,cfg}80211: support high bitrates
Until now, a u16 value was used to represent bitrate value.
With VHT bitrates this becomes too small.

Introduce a new 32-bit bitrate attribute. nl80211 will report
both the new and the old attribute, unless the bitrate doesn't
fit into the old u16 attribute in which case only the new one
will be reported.

User space tools encouraged to prefer the 32-bit attribute, if
available (since it won't be available on older kernels.)

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
[reword commit message and comments a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

[AdrianDC] Includes if () improvements from commit
           "nl80211/cfg80211: add VHT MCS support"
           by Johannes Berg, Fri, 9 Nov 2012

Change-Id: Ib5823475ba368954e51ac4129b400a58fd74e28d
Signed-off-by: Adrian DC <radian.dc@gmail.com>
2017-09-01 13:38:08 +03:00
Andy Lutomirski
b6dd4f6779 UPSTREAM: capabilities: ambient capabilities
Credit where credit is due: this idea comes from Christoph Lameter with
a lot of valuable input from Serge Hallyn.  This patch is heavily based
on Christoph's patch.

===== The status quo =====

On Linux, there are a number of capabilities defined by the kernel.  To
perform various privileged tasks, processes can wield capabilities that
they hold.

Each task has four capability masks: effective (pE), permitted (pP),
inheritable (pI), and a bounding set (X).  When the kernel checks for a
capability, it checks pE.  The other capability masks serve to modify
what capabilities can be in pE.

Any task can remove capabilities from pE, pP, or pI at any time.  If a
task has a capability in pP, it can add that capability to pE and/or pI.
If a task has CAP_SETPCAP, then it can add any capability to pI, and it
can remove capabilities from X.

Tasks are not the only things that can have capabilities; files can also
have capabilities.  A file can have no capabilty information at all [1].
If a file has capability information, then it has a permitted mask (fP)
and an inheritable mask (fI) as well as a single effective bit (fE) [2].
File capabilities modify the capabilities of tasks that execve(2) them.

A task that successfully calls execve has its capabilities modified for
the file ultimately being excecuted (i.e.  the binary itself if that
binary is ELF or for the interpreter if the binary is a script.) [3] In
the capability evolution rules, for each mask Z, pZ represents the old
value and pZ' represents the new value.  The rules are:

  pP' = (X & fP) | (pI & fI)
  pI' = pI
  pE' = (fE ? pP' : 0)
  X is unchanged

For setuid binaries, fP, fI, and fE are modified by a moderately
complicated set of rules that emulate POSIX behavior.  Similarly, if
euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
(primary, fP and fI usually end up being the full set).  For nonroot
users executing binaries with neither setuid nor file caps, fI and fP
are empty and fE is false.

As an extra complication, if you execute a process as nonroot and fE is
set, then the "secure exec" rules are in effect: AT_SECURE gets set,
LD_PRELOAD doesn't work, etc.

This is rather messy.  We've learned that making any changes is
dangerous, though: if a new kernel version allows an unprivileged
program to change its security state in a way that persists cross
execution of a setuid program or a program with file caps, this
persistent state is surprisingly likely to allow setuid or file-capped
programs to be exploited for privilege escalation.

===== The problem =====

Capability inheritance is basically useless.

If you aren't root and you execute an ordinary binary, fI is zero, so
your capabilities have no effect whatsoever on pP'.  This means that you
can't usefully execute a helper process or a shell command with elevated
capabilities if you aren't root.

On current kernels, you can sort of work around this by setting fI to
the full set for most or all non-setuid executable files.  This causes
pP' = pI for nonroot, and inheritance works.  No one does this because
it's a PITA and it isn't even supported on most filesystems.

If you try this, you'll discover that every nonroot program ends up with
secure exec rules, breaking many things.

This is a problem that has bitten many people who have tried to use
capabilities for anything useful.

===== The proposed change =====

This patch adds a fifth capability mask called the ambient mask (pA).
pA does what most people expect pI to do.

pA obeys the invariant that no bit can ever be set in pA if it is not
set in both pP and pI.  Dropping a bit from pP or pI drops that bit from
pA.  This ensures that existing programs that try to drop capabilities
still do so, with a complication.  Because capability inheritance is so
broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
then calling execve effectively drops capabilities.  Therefore,
setresuid from root to nonroot conditionally clears pA unless
SECBIT_NO_SETUID_FIXUP is set.  Processes that don't like this can
re-add bits to pA afterwards.

The capability evolution rules are changed:

  pA' = (file caps or setuid or setgid ? 0 : pA)
  pP' = (X & fP) | (pI & fI) | pA'
  pI' = pI
  pE' = (fE ? pP' : pA')
  X is unchanged

If you are nonroot but you have a capability, you can add it to pA.  If
you do so, your children get that capability in pA, pP, and pE.  For
example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
automatically bind low-numbered ports.  Hallelujah!

Unprivileged users can create user namespaces, map themselves to a
nonzero uid, and create both privileged (relative to their namespace)
and unprivileged process trees.  This is currently more or less
impossible.  Hallelujah!

You cannot use pA to try to subvert a setuid, setgid, or file-capped
program: if you execute any such program, pA gets cleared and the
resulting evolution rules are unchanged by this patch.

Users with nonzero pA are unlikely to unintentionally leak that
capability.  If they run programs that try to drop privileges, dropping
privileges will still work.

It's worth noting that the degree of paranoia in this patch could
possibly be reduced without causing serious problems.  Specifically, if
we allowed pA to persist across executing non-pA-aware setuid binaries
and across setresuid, then, naively, the only capabilities that could
leak as a result would be the capabilities in pA, and any attacker
*already* has those capabilities.  This would make me nervous, though --
setuid binaries that tried to privilege-separate might fail to do so,
and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
unexpected side effects.  (Whether these unexpected side effects would
be exploitable is an open question.) I've therefore taken the more
paranoid route.  We can revisit this later.

An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
ambient capabilities.  I think that this would be annoying and would
make granting otherwise unprivileged users minor ambient capabilities
(CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
it is with this patch.

===== Footnotes =====

[1] Files that are missing the "security.capability" xattr or that have
unrecognized values for that xattr end up with has_cap set to false.
The code that does that appears to be complicated for no good reason.

[2] The libcap capability mask parsers and formatters are dangerously
misleading and the documentation is flat-out wrong.  fE is *not* a mask;
it's a single bit.  This has probably confused every single person who
has tried to use file capabilities.

[3] Linux very confusingly processes both the script and the interpreter
if applicable, for reasons that elude me.  The results from thinking
about a script's file capabilities and/or setuid bits are mostly
discarded.

Preliminary userspace code is here, but it needs updating:
https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2

Here is a test program that can be used to verify the functionality
(from Christoph):

/*
 * Test program for the ambient capabilities. This program spawns a shell
 * that allows running processes with a defined set of capabilities.
 *
 * (C) 2015 Christoph Lameter <cl@linux.com>
 * Released under: GPL v3 or later.
 *
 *
 * Compile using:
 *
 *	gcc -o ambient_test ambient_test.o -lcap-ng
 *
 * This program must have the following capabilities to run properly:
 * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
 *
 * A command to equip the binary with the right caps is:
 *
 *	setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
 *
 *
 * To get a shell with additional caps that can be inherited by other processes:
 *
 *	./ambient_test /bin/bash
 *
 *
 * Verifying that it works:
 *
 * From the bash spawed by ambient_test run
 *
 *	cat /proc/$$/status
 *
 * and have a look at the capabilities.
 */

/*
 * Definitions from the kernel header files. These are going to be removed
 * when the /usr/include files have these defined.
 */

static void set_ambient_cap(int cap)
{
	int rc;

	capng_get_caps_process();
	rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
	if (rc) {
		printf("Cannot add inheritable cap\n");
		exit(2);
	}
	capng_apply(CAPNG_SELECT_CAPS);

	/* Note the two 0s at the end. Kernel checks for these */
	if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
		perror("Cannot set cap");
		exit(1);
	}
}

int main(int argc, char **argv)
{
	int rc;

	set_ambient_cap(CAP_NET_RAW);
	set_ambient_cap(CAP_NET_ADMIN);
	set_ambient_cap(CAP_SYS_NICE);

	printf("Ambient_test forking shell\n");
	if (execv(argv[1], argv + 1))
		perror("Cannot exec");

	return 0;
}

Signed-off-by: Christoph Lameter <cl@linux.com> # Original author
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Aaron Jones <aaronmdjones@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Markku Savela <msa@moth.iki.fi>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 58319057b7847667f0c9585b9de0e8932b0fdb08)

Bug: 31038224
Change-Id: I88bc5caa782dc6be23dc7e839ff8e11b9a903f8c
Signed-off-by: Jorge Lucangeli Obes <jorgelo@google.com>
2017-09-01 13:38:08 +03:00
Lorenzo Colitti
966db7310d net: core: add UID to flows, rules, and routes
- Define a new FIB rule attributes, FRA_UID_RANGE, to describe a
  range of UIDs.
- Define a RTA_UID attribute for per-UID route lookups and dumps.
- Support passing these attributes to and from userspace via
  rtnetlink. The value INVALID_UID indicates no UID was
  specified.
- Add a UID field to the flow structures.

[Backport of net-next 622ec2c9d52405973c9f1ca5116eb1c393adfc7d]

Bug: 16355602
Change-Id: Iea98e6fedd0fd4435a1f4efa3deb3629505619ab
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 13:38:07 +03:00
Eric W. Biederman
d621e5dc9c userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Change-Id: Ifa426537c47cce669099cc96e80b17e1d814457b
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-09-01 13:38:07 +03:00
Eric W. Biederman
8b4596c91f userns: Add kuid_t and kgid_t and associated infrastructure in uidgid.h
Start distinguishing between internal kernel uids and gids and
values that userspace can use.  This is done by introducing two
new types: kuid_t and kgid_t.  These types and their associated
functions are infrastructure are declared in the new header
uidgid.h.

Ultimately there will be a different implementation of the mapping
functions for use with user namespaces.  But to keep it simple
we introduce the mapping functions first to separate the meat
from the mechanical code conversions.

Export overflowuid and overflowgid so we can use from_kuid_munged
and from_kgid_munged in modular code.

Change-Id: Id934f981c91aa9c656e045d4f45f958c7fc2f1b0
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-09-01 13:38:06 +03:00
Simon Shields
f33f0fc3ed Revert "net: core: Support UID-based routing."
This reverts commit dbadd302523011ab746840885b93e8bd0b2ff499.

Change-Id: I8d996d5a84a3203443987c844d10534cd0dfc2c1
2017-08-27 19:09:20 +03:00
Simon Shields
8ab95e11f6 nl80211: include NL80211_ATTR_MAC in iface info
This is expected by AOSP 8.0, and was added in commit
98104fdeda upstream.

Change-Id: Ia93df0f8d549d8ce1a62a481a027640f6d2e7ff4
2017-08-27 19:07:23 +03:00
Greg Hackmann
84a377cc19 timerfd: support CLOCK_BOOTTIME clock
Add CLOCK_BOOTTIME support to timerfd

Change-Id: I14dee6d1104f15a05f463a632268ac4564753faf
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2017-08-27 19:07:23 +03:00
Todd Poynor
9ae9589197 timerfd: add alarm timers
Add support for clocks CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM.

Change-Id: Iafc8445d3d7ffb35110c860f1607bf03f1edb895
Signed-off-by: Todd Poynor <toddpoynor@google.com>
2017-08-27 19:07:22 +03:00
Adrian DC
b37223073f defconfig: flo: Enable CONFIG_ANDROID_BINDER_IPC_32BIT
* Use the old 32-bits binder API as our kernel runs on
    ARM 32bits, it should not compile  elements in 64bits,
    which would end in failures with following output:
    "undefined reference to `__get_user_bad'"

Change-Id: Ib27c00f19847d19ad574e07d6f3c0a53cfeebb32
2017-08-27 19:06:59 +03:00
Artem Borisov
c01d8676bd drivers: staging: binder: add missing SIZE_MAX declaration
Change-Id: I3139d434257b909386da1bd0d892310b2970fb40
2017-08-25 20:00:24 +03:00
Artem Borisov
305347c3b1 drivers: rtc: remove leftovers of old alarm drivers
Change-Id: I9954b5dfaa8611b225246a435098f43d5c7fe7dc
2017-08-25 20:00:24 +03:00
John Stultz
5d231460e1 hrtimer: Provide clock_was_set_delayed()
This is a backport of f55a6faa38

clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
  rid of all this ifdeffery ASAP ]

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-25 20:00:23 +03:00
Feng Tang
21278d1f24 clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NONSTOP
Some x86 processors have a TSC clocksource, which continues to run
even when system is suspended. Also most OMAP platforms have a
32 KHz timer which has similar capability. Add a feature flag so that
it could be utilized.

CRs-fixed: 536881
Change-Id: I863678fec5a3021158240f2c5b144c552ba0b16a
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 5caf463625
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2017-08-25 20:00:23 +03:00
Thomas Gleixner
0cf6cbf5c2 tick: Upstream fixes
* Addresses the issue of timers being scheduled on offline CPUs.

tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline

commit 5b39939a4 (nohz: Move ts->idle_calls incrementation into strict
idle logic) moved code out of tick_nohz_stop_sched_tick() and missed
to bail out when the cpu is offline. That's causing subsequent
failures as an offline CPU is supposed to die and not to fiddle with
nohz magic.

Return false in can_stop_idle_tick() if the cpu is offline.

Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timers: Consolidate base->next_timer update

Another bunch of mindlessly copied code. All callers of
internal_add_timer() except the recascading code updates
base->next_timer.

Move this into internal_add_timer() and let the cascading code call
__internal_add_timer().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20120525214819.189946224@linutronix.de

timers: Create detach_if_pending() and use it

Most callers of detach_timer() have the same pattern around
them. Check whether the timer is pending and eventually updating
base->next_timer.

Create detach_if_pending() and replace the duplicated code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20120525214819.131246037@linutronix.de

timers: Add accounting of non deferrable timers

The code in get_next_timer_interrupt() is suboptimal as it has to run
through the cascade to find the next expiring timer. On a completely
idle core we should only do that when there is an active timer
enqueued and base->next_timer does not give us a fast answer.

Add accounting of the active timers to the now consolidated
attach/detach code. I deliberately avoided sanity checks because the
code is fully symetric and any fiddling with timers w/o using the API
functions will lead to cute explosions anyway. ulong is big enough
even on 32bit and if we really run into the situation to have more
than 1<<32 timers enqueued there, then we are definitely not in a
state to go idle and run through that code.

This allows us to fix another shortcoming of get_next_timer_interrupt().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20120525214819.236377028@linutronix.de

timers: Improve get_next_timer_interrupt()

Gilad reported at

 http://lkml.kernel.org/r/1336056962-10465-2-git-send-email-gilad@benyossef.com

"Current timer code fails to correctly return a value meaning that
 there is no future timer event, with the result that the timer keeps
 getting re-armed in HZ one shot mode even when we could turn it off,
 generating unneeded interrupts.

 What is happening is that when __next_timer_interrupt() wishes
 to return a value that signifies "there is no future timer
 event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA).

 However, the code in tick_nohz_stop_sched_tick(), which called
 __next_timer_interrupt() via get_next_timer_interrupt(),
 compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA)
 to see if the timer needs to be re-armed.

 base->timer_jiffies != last_jiffies and so tick_nohz_stop_sched_tick()
 interperts the return value as indication that there is a distant
 future event 12 days from now and programs the timer to fire next
 after KTIME_MAX nsecs instead of avoiding to arm it. This ends up
 causing a needless interrupt once every KTIME_MAX nsecs."

Fix this by using the new active timer accounting. This avoids scans
when no active timer is enqueued completely, so we don't have to rely
on base->timer_next and base->timer_jiffies anymore.

Change-Id: I874ee5e5f837a228cebf5e6a084baf520d33cd0f
Reported-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20120525214819.317535385@linutronix.de
2017-08-25 20:00:22 +03:00