net: reset NAPI bit if IPI failed

During hotplug, if an RPS CPU goes offline there is a
possibility that IPI delivery to the RPS core fails.
This happens when unruly drivers use the netif_rx()
API from the wrong context.

This happens for two reasons:

a) Using the netif_rx() API from preemptible (process)
context introduces enough latency that IPI delivery to
an RPS core can fail, because the softirq trigger
becomes unpredictable.

b) With netif_rx() it becomes an architectural issue:
we try to do two things in two different contexts. We
set the NAPI bit in one context and send the IPI in
another. Since a context switch is allowed in between,
the remote CPU is free to finish its hotplug transition
(see the sketch after this list).
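
To illustrate the context rules, here is a minimal sketch with
hypothetical driver functions (demo_isr, demo_work_fn and the
demo_fetch_rx_skb helper are placeholders, not part of this patch):
netif_rx() is the variant for hard-IRQ context, while netif_rx_ni()
is the process-context variant.

#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper that produces a received skb. */
struct sk_buff *demo_fetch_rx_skb(void *ctx);

/* Hard-IRQ path: preemption is already disabled here,
 * so plain netif_rx() is the correct call. */
static irqreturn_t demo_isr(int irq, void *dev_id)
{
	struct sk_buff *skb = demo_fetch_rx_skb(dev_id);

	netif_rx(skb);
	return IRQ_HANDLED;
}

/* Process context (e.g. a workqueue): the _ni variant must be
 * used so preemption stays disabled across the backlog enqueue
 * and the softirq kick. */
static void demo_work_fn(struct work_struct *work)
{
	struct sk_buff *skb = demo_fetch_rx_skb(work);

	netif_rx_ni(skb);
}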

If there were no context switch in the first place,
which is what happens when either the correct variant
of netif_rx() is used or the driver is moved to the
NAPI framework, then the remote CPU would not be allowed
to reach the CPU_DOWN state. This is by design: the
hotplug framework makes the dying CPU wait until at
least one context switch has happened on all other CPUs.
If preemption is disabled, the dying CPU has to wait
until preemption is enabled and a context switch happens.
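
For reference, netif_rx_ni() in kernels of this era had roughly the
following shape (paraphrased, tracepoints omitted; not a verbatim
copy): because preemption stays disabled across the enqueue and the
softirq run, the hotplug framework cannot observe a context switch
on this CPU, and the dying CPU keeps waiting.

int netif_rx_ni(struct sk_buff *skb)
{
	int err;

	preempt_disable();
	err = netif_rx_internal(skb);	/* set NAPI bit, enqueue to backlog */
	if (local_softirq_pending())
		do_softirq();		/* run NET_RX before a switch can happen */
	preempt_enable();

	return err;
}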

This patch catches these unruly drivers and handles
IPI misses by clearing the NAPI state on the remote
RPS CPUs.

Please refer here for more documentation on the hotplug
and preemption cases: https://lwn.net/Articles/569686/

Change-Id: I072f91bdb4d7e444e3624e8e010ef1b66a67b1ed
Acked-by: Abhishek Chauhan <abchauha@qti.qualcomm.com>
Signed-off-by: Ravinder Konka <rkonka@codeaurora.org>
---
@@ -4007,9 +4007,16 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 		while (remsd) {
 			struct softnet_data *next = remsd->rps_ipi_next;
 
-			if (cpu_online(remsd->cpu))
+			if (cpu_online(remsd->cpu)) {
 				__smp_call_function_single(remsd->cpu,
 							   &remsd->csd, 0);
+			} else {
+				pr_err("%s(), cpu was offline and IPI was not "
+				       "delivered so clean up NAPI", __func__);
+				rps_lock(remsd);
+				remsd->backlog.state = 0;
+				rps_unlock(remsd);
+			}
 			remsd = next;
 		}
 	} else
@@ -6016,6 +6023,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 	if (action != CPU_DEAD && action != CPU_DEAD_FROZEN)
 		return NOTIFY_OK;
 
 	local_irq_disable();
 	cpu = smp_processor_id();
 	sd = &per_cpu(softnet_data, cpu);