Commit graph

1717 commits

Author SHA1 Message Date
Trond Myklebust
6c6dff6553 SUNRPC: Prevent races in xs_abort_connection()
commit 4bc1e68ed6 upstream.

The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-31 10:02:57 -07:00
Trond Myklebust
dedf1c2d17 Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
commit b9d2bb2ee5 upstream.

This reverts commit 55420c24a0.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-31 10:02:57 -07:00
Trond Myklebust
ea2887242a SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT
commit d0bea455dd upstream.

This is needed to ensure that we call xprt_connect() upon the next
call to call_connect().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-31 10:02:57 -07:00
Trond Myklebust
9f659caf90 SUNRPC: Get rid of the xs_error_report socket callback
commit f878b657ce upstream.

Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.

Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-31 10:02:56 -07:00
Trond Myklebust
9b62355a6d SUNRPC: Fix a UDP transport regression
commit f39c1bfb5a and
commit 84e28a307e upstream.

Commit 43cedbf0e8 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28 10:14:13 -07:00
Sasha Levin
baecc6ef79 SUNRPC: Prevent kernel stack corruption on long values of flush
commit 212ba90696 upstream.

The buffer size in read_flush() is too small for the longest possible values
for it. This can lead to a kernel stack corruption:

[   43.047329] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff833e64b4
[   43.047329]
[   43.049030] Pid: 6015, comm: trinity-child18 Tainted: G        W    3.5.0-rc7-next-20120716-sasha #221
[   43.050038] Call Trace:
[   43.050435]  [<ffffffff836c60c2>] panic+0xcd/0x1f4
[   43.050931]  [<ffffffff833e64b4>] ? read_flush.isra.7+0xe4/0x100
[   43.051602]  [<ffffffff810e94e6>] __stack_chk_fail+0x16/0x20
[   43.052206]  [<ffffffff833e64b4>] read_flush.isra.7+0xe4/0x100
[   43.052951]  [<ffffffff833e6500>] ? read_flush_pipefs+0x30/0x30
[   43.053594]  [<ffffffff833e652c>] read_flush_procfs+0x2c/0x30
[   43.053596]  [<ffffffff812b9a8c>] proc_reg_read+0x9c/0xd0
[   43.053596]  [<ffffffff812b99f0>] ? proc_reg_write+0xd0/0xd0
[   43.053596]  [<ffffffff81250d5b>] do_loop_readv_writev+0x4b/0x90
[   43.053596]  [<ffffffff81250fd6>] do_readv_writev+0xf6/0x1d0
[   43.053596]  [<ffffffff812510ee>] vfs_readv+0x3e/0x60
[   43.053596]  [<ffffffff812511b8>] sys_readv+0x48/0xb0
[   43.053596]  [<ffffffff8378167d>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28 10:14:13 -07:00
Trond Myklebust
b77a7a0e9f SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT
commit a519fc7a70 upstream.

Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-21 09:27:58 -07:00
J. Bruce Fields
234c04ccc3 svcrpc: sends on closed socket should stop immediately
commit f06f00a24d upstream.

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14 10:00:19 -07:00
J. Bruce Fields
973caa9ec6 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
commit d10f27a750 upstream.

The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14 10:00:19 -07:00
J. Bruce Fields
389aec3383 svcrpc: fix BUG() in svc_tcp_clear_pages
commit be1e44441a upstream.

Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY.  However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-14 10:00:19 -07:00
Stanislav Kinsbursky
d281dc7448 SUNRPC: return negative value in case rpcbind client creation error
commit caea33da89 upstream.

Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15 08:10:05 -07:00
Joe Perches
613975f7a6 sunrpc: clnt: Add missing braces
commit cac5d07e3c upstream.

Add a missing set of braces that commit 4e0038b6b2
("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
forgot.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15 08:10:04 -07:00
Jeff Layton
032da6abbc nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
commit 5cf02d09b5 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09 08:31:40 -07:00
Stanislav Kinsbursky
ee92389156 SUNRPC: move per-net operations from svc_destroy()
upstream commit 786185b5f8.

The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.

Notes:

1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.

2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 09:04:39 -07:00
Stanislav Kinsbursky
10762419ca SUNRPC: new svc_bind() routine introduced
upstream commit 9793f7c889.

This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16 09:04:39 -07:00
Jeff Layton
b7013d0a16 rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
commit 92123e068e upstream.

In the event that we don't have a dentry for a rpc_pipefs pipe, we still
need to allow the queue_timeout job to clean out the queue. There's just
no waitq to wake up in that event.

Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl>
Reported-by: Joerg Platte <jplatte@naasa.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 11:36:55 -07:00
Trond Myklebust
5f1d81e9fb NFSv4.1: Fix a request leak on the back channel
commit b3b02ae586 upstream.

If the call to svc_process_common() fails, then the request
needs to be freed before we can exit bc_svc_process.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 11:36:54 -07:00
Trond Myklebust
c7faf5bbae sunrpc: fix loss of task->tk_status after rpc_delay call in xprt_alloc_slot
commit 1afeaf5c29 upstream.

xprt_alloc_slot will call rpc_delay() to make the task wait a bit before
retrying when it gets back an -ENOMEM error from xprt_dynamic_alloc_slot.
The problem is that rpc_delay will clear the task->tk_status, causing
call_reserveresult to abort the task.

The solution is simply to let call_reserveresult handle the ENOMEM error
directly.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10 00:36:09 +09:00
Steve Dickson
7bdf7415a6 auth_gss: the list of pseudoflavors not being parsed correctly
gss_mech_list_pseudoflavors() parses a list of registered mechanisms.
On that list contains a list of pseudo flavors which was not being
parsed correctly, causing only the first pseudo flavor to be found.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-03 12:35:33 -04:00
Trond Myklebust
cbbb34498f SUNRPC: RPC client must use the current utsname hostname string
Now that the rpc client is namespace aware, it needs to use the
utsname of the process that created it instead of using the
init_utsname. Both rpc_new_client and rpc_clone_client need to
be fixed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-04-30 11:58:51 -04:00
Stanislav Kinsbursky
ea8cfa0679 SUNRPC: traverse clients tree on PipeFS event
v2: recursion was replaced by loop

If client is a clone, then it's parent can not be in the list.
But parent's Pipefs dentries have to be created and destroyed.

Note: event skip helper for clients introduced

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:00 -04:00
Stanislav Kinsbursky
37629b572c SUNRPC: set per-net PipeFS superblock before notification
There can be a case, when on MOUNT event RPC client (after it's dentries were
created) is not longer hold by anyone except notification callback.
I.e. on release this client will be destoroyed. And it's dentries have to be
destroyed as well. Which in turn requires per-net PipeFS superblock to be set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:00 -04:00
Stanislav Kinsbursky
7aab449e5a SUNRPC: skip clients with program without PipeFS entries
1) This is sane.
2) Otherwise there will be soft lockup:

do {
	rpc_get_client_for_event (clnt->cl_dentry == NULL ==> choose)
	__rpc_pipefs_event (clnt->cl_program->pipe_dir_name == NULL ==> return)
} while (1)

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:59 -04:00
Stanislav Kinsbursky
a4dff1bc49 SUNRPC: skip dead but not buried clients on PipeFS events
These clients can't be safely dereferenced if their counter in 0.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:59 -04:00
Stanislav Kinsbursky
adae0fe0ea SUNRPC: register PipeFS file system after pernet sybsystem
PipeFS superblock creation routine relays on SUNRPC pernet data presense, which
is created on register_pernet_subsys() call in SUNRPC module init function.
Registering of PipeFS filesystem prior to registering of per-net subsystem
leads to races (mount of PipeFS can dereference uninitialized data).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-18 11:05:48 -04:00
Linus Torvalds
71db34fc43 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:

Highlights:
 - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
   moving us closer to a complete 4.1 implementation.
 - Bernd Schubert fixed a long-standing problem with readdir cookies on
   ext2/3/4.
 - Jeff Layton performed a long-overdue overhaul of the server reboot
   recovery code which will allow us to deprecate the current code (a
   rather unusual user of the vfs), and give us some needed flexibility
   for further improvements.
 - Like the client, we now support numeric uid's and gid's in the
   auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.

Plus miscellaneous bugfixes and cleanup.

Thanks to everyone!

There are also some delegation fixes waiting on vfs review that I
suppose will have to wait for 3.5.  With that done I think we'll finally
turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
symbolic as it's been on by default in distro's for a while).

And the list of 4.1 todo's should be achievable for 3.5 as well:

   http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

though we may still want a bit more experience with it before turning it
on by default.

* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
  nfsd4: use auth_unix unconditionally on backchannel
  nfsd: fix NULL pointer dereference in cld_pipe_downcall
  nfsd4: memory corruption in numeric_name_to_id()
  sunrpc: skip portmap calls on sessions backchannel
  nfsd4: allow numeric idmapping
  nfsd: don't allow legacy client tracker init for anything but init_net
  nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
  nfsd: add the infrastructure to handle the cld upcall
  nfsd: add a header describing upcall to nfsdcld
  nfsd: add a per-net-namespace struct for nfsd
  sunrpc: create nfsd dir in rpc_pipefs
  nfsd: add nfsd4_client_tracking_ops struct and a way to set it
  nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
  NFSD: Fix nfs4_verifier memory alignment
  NFSD: Fix warnings when NFSD_DEBUG is not defined
  nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
  nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
  ext4: return 32/64-bit dir name hash according to usage type
  fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
  ...
2012-03-29 14:53:25 -07:00
Linus Torvalds
58df9b387c NFS client bugfixes for Linux 3.4
Highlights include:
 - Fix infinite loops in the mount code
 - Fix a userspace buffer overflow in __nfs4_get_acl_uncached
 - Fix a memory leak due to a double reference count in rpcb_getport_async()
 
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPc8DPAAoJEGcL54qWCgDyxAkP/2YMxAnZ+8kzuhKi5J073y/K
 G9H8v6zV6WaMut9YeX+8sX/4xJrFPSZJoRALSdEbI9wpIGMMHjIxQD8RuY4imt1L
 +YwB3rAke+mr2YAvW/2Q6HVTV75/00Kui+Jkgsa2wm/4Fyz8PfCHe71bBLG4UPJg
 ZgauN9rCIHIQpT/sbuN5mPU5C4jbOZa959CogWyUKeMnbuts1Z3qbS+aFB0wcxAW
 279E88oqDISYLo5XQiC/NJzLKmKJ3iDl1UbjqS6T2g74i1zX+lyxl0K9MdTFmAmm
 9UQ1pjIFIMySKJI/LahY6ZXM76hGLW4TbwO0q+fmhuv4cFBM3LWZPUm3ggPyzAp1
 O8TEeMNIfkNVGRYVnC7GcPFjnUlt9ahiPzk0lQ7l0RzWfXiDt8y/EUOwhqdxULqx
 S5SwLmSKDw9bQXFiHJtqe7lnB9hrVWNUvHqE1iTC20YTtQYovYGWaT8YLP7/y/Iz
 4jSLv8VeOzA6BP1jfQkacsBucxpn0fhPjwuCOZl2ooa+8jldsr8in8nwo0nVgrJF
 PGzJvyvGSR2BT2BL+QSUPX0Pn8qe7eAFePlS8zFS/p3X8rkDNxYI51VwuyXOQQl3
 lJHUyTbdTd/G4H5ObEU+ClIhxIvqgDrawIdXrbiMonoShsaFoVkuX7IFviPiiPDE
 s7KGNtpLh45+GZsBVe6M
 =yiL7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes for Linux 3.4 from Trond Myklebust

Highlights include:
- Fix infinite loops in the mount code
- Fix a userspace buffer overflow in __nfs4_get_acl_uncached
- Fix a memory leak due to a double reference count in rpcb_getport_async()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

* tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
  NFSv4.1: Fix layoutcommit error handling
  NFSv4: Fix two infinite loops in the mount code
  SUNRPC: Use the already looked-up xprt in rpcb_getport_async()
  NFS4.1: remove duplicate variable declaration in filelayout_clear_request_commit
  Fix length of buffer copied in __nfs4_get_acl_uncached
2012-03-28 19:02:35 -07:00
Linus Torvalds
0195c00244 Disintegrate and delete asm/system.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
 u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
 ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
 rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
 eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
 /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
 FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
 NypQthI85pc=
 =G9mT
 -----END PGP SIGNATURE-----

Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system

Pull "Disintegrate and delete asm/system.h" from David Howells:
 "Here are a bunch of patches to disintegrate asm/system.h into a set of
  separate bits to relieve the problem of circular inclusion
  dependencies.

  I've built all the working defconfigs from all the arches that I can
  and made sure that they don't break.

  The reason for these patches is that I recently encountered a circular
  dependency problem that came about when I produced some patches to
  optimise get_order() by rewriting it to use ilog2().

  This uses bitops - and on the SH arch asm/bitops.h drags in
  asm-generic/get_order.h by a circuituous route involving asm/system.h.

  The main difficulty seems to be asm/system.h.  It holds a number of
  low level bits with no/few dependencies that are commonly used (eg.
  memory barriers) and a number of bits with more dependencies that
  aren't used in many places (eg.  switch_to()).

  These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

        Move memory barriers here.  This already done for MIPS and Alpha.

    (2) asm/switch_to.h

        Move switch_to() and related stuff here.

    (3) asm/exec.h

        Move arch_align_stack() here.  Other process execution related bits
        could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

        Move xchg() and cmpxchg() here as they're full word atomic ops and
        frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

        Move die() and related bits.

    (6) asm/auxvec.h

        Move AT_VECTOR_SIZE_ARCH here.

  Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that.  We'll find out anything that got broken and fix it..

* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
2012-03-28 15:58:21 -07:00
David Howells
9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Bryan Schumaker
864cf9bf99 SUNRPC: Use the already looked-up xprt in rpcb_getport_async()
rbcb_getport_async() was looking up the rpc_xprt (reference++) and then
later looking it up again (reference++) to pass through the
rpcbind_args.  The xprt would only be dereferenced once, when we were
done with the rpcbind_args (reference--).  This leaves an extra
reference to the transport that would never go away.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-27 16:33:35 -04:00
J. Bruce Fields
92769108f5 sunrpc: skip portmap calls on sessions backchannel
There's obviously no point to doing portmap calls over the sessions
backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:48 -04:00
Jeff Layton
b3537c35c2 sunrpc: create nfsd dir in rpc_pipefs
Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid
upcall.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:47 -04:00
J. Bruce Fields
1df00640c9 Merge nfs containerization work from Trond's tree
The nfs containerization work is a prerequisite for Jeff Layton's reboot
recovery rework.
2012-03-26 11:48:54 -04:00
Linus Torvalds
f63d395d47 NFS client updates for Linux 3.4
New features include:
 - Add NFS client support for containers.
   This should enable most of the necessary functionality, including
   lockd support, and support for rpc.statd, NFSv4 idmapper and
   RPCSEC_GSS upcalls into the correct network namespace from
   which the mount system call was issued.
 - NFSv4 idmapper scalability improvements
   Base the idmapper cache on the keyring interface to allow concurrent
   access to idmapper entries. Start the process of migrating users from
   the single-threaded daemon-based approach to the multi-threaded
   request-key based approach.
 - NFSv4.1 implementation id.
   Allows the NFSv4.1 client and server to mutually identify each other
   for logging and debugging purposes.
 - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
   having to use the more counterintuitive 'vers=4,minorversion=1'.
 - SUNRPC tracepoints.
   Start the process of adding tracepoints in order to improve debugging
   of the RPC layer.
 - pNFS object layout support for autologin.
 
 Important bugfixes include:
 - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail
   to wake up all tasks when applied to priority waitqueues.
 - Ensure that we handle read delegations correctly, when we try to
   truncate a file.
 - A number of fixes for NFSv4 state manager loops (mostly to do with
   delegation recovery).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPalZbAAoJEGcL54qWCgDyCi4P+QHcmzQhJO7HWx3Pzjs67bFT
 xMSYaKHGWS4AJKUBVl5OKBxUExfrMHBNbElV3IKUIwBlDx8RVtnwfptKSe146iki
 dn4TrRO5es8nmI4hRDcGMlzJDZq4y0Qg//qiUFmojiNW/Avw0ljfMoVUejJJ09FV
 oeDk4EGtcxkEyH+g48ZjYbyspRnG8qtD3atf70Z3lYE0ELdG/B5Dyzw1RDrA5p73
 xJX3lqy8p/4ROzw/dmNoxdAXOrr3Q4/T58Bvp/lUglPy/EHyPmWzFoH0MU0C/PFu
 5VnAl6QDbNCTcIw9FvJlX/mIyErpNG9eKzUskUc9L9SA+B+J/i4rIap4KATRN3nH
 7QhE5qUacPuJnvxml7MPmlQTuft3fkAQ7NhKIWrbRi1QS9FmJC5NxctIb8loqlFn
 yIXdKeLfMshB+NyuFS9uzStX7SmV3eMgVd+5ZxRjYxm+PKJLw2KXeudArL6M5mHK
 3QeKZpqwaYQ3RfaTNpvAp0doiXHCO5UbWfI0Pe8xQs/QcMCNReffqV2G4IJKFAu6
 WpoN2UDQC9LCBifLw2nS7kku8+ZVXLQU8OC1NVl3TG15xD9cNLXuk3/y5llPGq4O
 odo52uLFpJohbDaHMj5RTKOfchTQCm2iyuVmxZEeAySypMSiAXmW7COSKHs/HxI1
 VBm+EI00Pvmm5+fUjIlp
 =LuHE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates for Linux 3.4 from Trond Myklebust:
 "New features include:
   - Add NFS client support for containers.

     This should enable most of the necessary functionality, including
     lockd support, and support for rpc.statd, NFSv4 idmapper and
     RPCSEC_GSS upcalls into the correct network namespace from which
     the mount system call was issued.

   - NFSv4 idmapper scalability improvements

     Base the idmapper cache on the keyring interface to allow
     concurrent access to idmapper entries.  Start the process of
     migrating users from the single-threaded daemon-based approach to
     the multi-threaded request-key based approach.

   - NFSv4.1 implementation id.

     Allows the NFSv4.1 client and server to mutually identify each
     other for logging and debugging purposes.

   - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
     having to use the more counterintuitive 'vers=4,minorversion=1'.

   - SUNRPC tracepoints.

     Start the process of adding tracepoints in order to improve
     debugging of the RPC layer.

   - pNFS object layout support for autologin.

  Important bugfixes include:

   - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
     fail to wake up all tasks when applied to priority waitqueues.

   - Ensure that we handle read delegations correctly, when we try to
     truncate a file.

   - A number of fixes for NFSv4 state manager loops (mostly to do with
     delegation recovery)."

* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-23 08:53:47 -07:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Tom Tucker
9b78145c0f xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
The xprtrdma FRMR mapping logic assumes that a segment is <= PAGE_SIZE.
This is not true for NFS4.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:47 -04:00
Tom Tucker
4a6862b364 xprtrdma: The transport should not bug-check when a dup reply is received
The client side RDMA transport will bug check if it receives a duplicate
reply, instead we should simply drop the duplicate reply.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:47 -04:00
Trond Myklebust
ffa94db604 SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
Stephen Rothwell reports:
net/sunrpc/rpcb_clnt.c: In function 'rpcb_enc_mapping':
net/sunrpc/rpcb_clnt.c:820:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_getport':
net/sunrpc/rpcb_clnt.c:837:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_set':
net/sunrpc/rpcb_clnt.c:860:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_enc_getaddr':
net/sunrpc/rpcb_clnt.c:892:19: warning: unused variable 'task' [-Wunused-variable]
net/sunrpc/rpcb_clnt.c: In function 'rpcb_dec_getaddr':
net/sunrpc/rpcb_clnt.c:914:19: warning: unused variable 'task' [-Wunused-variable]
fs/lockd/svclock.c:49:20: warning: 'nlmdbg_cookie2a' declared 'static' but never defined [-Wunused-function]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:44 -04:00
Al Viro
48fde701af switch open-coded instances of d_make_root() to new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:35 -04:00
Trond Myklebust
e27d359e9b SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
This allows us to turn on/off the dprintk() debugging interfaces for
those distributions that don't ship the 'rpcdebug' utility.
It also allows us to add Kbuild dependencies. Specifically, we already
know that dprintk() in general relies on CONFIG_SYSCTL. Now it turns out
that the NFS dprintks depend on CONFIG_CRC32 after we added support
for the filehandle hash.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20 13:08:26 -04:00
Cong Wang
b854178601 sunrpc: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:28 +08:00
Trond Myklebust
540a0f7584 SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.

Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all functions on priority wait queues, which can result
in some nasty hangs.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-19 14:15:02 -04:00
Trond Myklebust
0097143c12 SUNRPC: Don't use variable length automatic arrays in kernel code
Replace the variable length array in the RPCSEC_GSS crypto code with
a fixed length one. The size should be bounded by the variable
GSS_KRB5_MAX_BLOCKSIZE, so use that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-12 13:37:16 -04:00
Trond Myklebust
09acfea5d8 SUNRPC: Fix a few sparse warnings
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
 - svc_partial_recvfrom now takes a struct kvec, so the variable
   save_iovbase needs to be an ordinary (void *)

Make a bunch of variables in net/sunrpc/xprtsock.c static

Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.

Fix a couple of conflicting function declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 19:30:02 -04:00
Chuck Lever
2e738fdce2 SUNRPC: Add API to acquire source address
NFSv4.0 clients must send endpoint information for their callback
service to NFSv4.0 servers during their first contact with a server.
Traditionally on Linux, user space provides the callback endpoint IP
address via the "clientaddr=" mount option.

During an NFSv4 migration event, it is possible that an FSID may be
migrated to a destination server that is accessible via a different
source IP address than the source server was.  The client must update
callback endpoint information on the destination server so that it can
maintain leases and allow delegation.

Without a new "clientaddr=" option from user space, however, the
kernel itself must construct an appropriate IP address for the
callback update.  Provide an API in the RPC client for upper layer
RPC consumers to acquire a source address for a remote.

The mechanism used by the mount.nfs command is copied: set up a
connected UDP socket to the designated remote, then scrape the source
address off the socket.  We are careful to select the correct network
namespace when setting up the temporary UDP socket.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:53 -05:00
Trond Myklebust
4e0038b6b2 SUNRPC: Move clnt->cl_server into struct rpc_xprt
When the cl_xprt field is updated, the cl_server field will also have
to change.  Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:41 -05:00
Trond Myklebust
2446ab6070 SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field
A migration event will replace the rpc_xprt used by an rpc_clnt.  To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().

Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: fix lockdep splats and layering violations ]
[ cel: forward ported to 3.4 ]
[ cel: remove rpc_max_reqs(), add rpc_net_ns() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:38 -05:00
Stanislav Kinsbursky
591ad7feae SUNRPC: move waitq from RPC pipe to RPC inode
Currently, wait queue, used for polling of RPC pipe changes from user-space,
is a part of RPC pipe. But the pipe data itself can be released on NFS umount
prior to dentry-inode pair, connected to it (is case of this pair is open by
some process).
This is not a problem for almost all pipe users, because all PipeFS file
operations checks pipe reference prior to using it.
Except evenfd. This thing registers itself with "poll" file operation and thus
has a reference to pipe wait queue. This leads to oopses on destroying eventfd
after NFS umount (like rpc_idmapd do) since not pipe data left to the point
already.
The solution is to wait queue from pipe data to internal RPC inode data. This
looks more logical, because this wiat queue used only for user-space processes,
which already holds inode reference.

Note: upcalls have to get pipe->dentry prior to dereferecing wait queue to make
sure, that mount point won't disappear from underneath us.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-27 13:37:48 -05:00
Stanislav Kinsbursky
2c9030eef9 SUNRPC: check RPC inode's pipe reference before dereferencing
There are 2 tightly bound objects: pipe data (created for kernel needs, has
reference to dentry, which depends on PipeFS mount/umount) and PipeFS
dentry/inode pair (created on mount for user-space needs). They both
independently may have or have not a valid reference to each other.
This means, that we have to make sure, that pipe->dentry reference is valid on
upcalls, and dentry->pipe reference is valid on downcalls. The latter check is
absent - my fault.
IOW, PipeFS dentry can be opened by some process (rpc.idmapd for example), but
it's pipe data can belong to NFS mount, which was unmounted already and thus
pipe data was destroyed.
To fix this, pipe reference have to be set to NULL on rpc_unlink() and checked
on PipeFS file operations instead of pipe->dentry check.

Note: PipeFS "poll" file operation will be updated in next patch, because it's
logic is more complicated.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-27 13:37:09 -05:00
Stanislav Kinsbursky
da3b462296 SUNRPC: release per-net clients lock before calling PipeFS dentries creation
v3:
1) Lookup for client is performed from the beginning of the list on each PipeFS
event handling operation.

Lockdep is sad otherwise, because inode mutex is taken on PipeFS dentry
creation, which can be called on mount notification, where this per-net client
lock is taken on clients list walk.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-27 13:36:16 -05:00