Commit graph

42 commits

Author SHA1 Message Date
Linas Vepstas
4980d5eb75 [POWERPC] EEH: restructure multi-function support
Rework how multi-function PCI devices are identified and traversed.
This fixes a bug with multi-function recovery on Power4 that was
introduced by a recent Power4 EEH patch.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:57 +11:00
Linas Vepstas
fa1be476a2 [POWERPC] EEH: verify state change
After requesting a state change, verify that the state change
actually ocurred, and the system ends up in the expected state.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:56 +11:00
Linas Vepstas
d0ab95ca98 [POWERPC] EEH: rm un-needed data
The EEH event notification system passes around data that is
not needed or at least, not used properly. Stop passing this
data; get it in a more reliable fashion.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:55 +11:00
Linas Vepstas
9c547768e7 [POWERPC] EEH: wait for slot status
Modify routine that returns PCI slot status to wait for slot status
to become available. This is needed, as slots that are in some remote
card cage may go offline for extended periods of time. New users for
this routine in following patches.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:54 +11:00
Linas Vepstas
90375f5396 [POWERPC] EEH: handle reset state high
Some firmware versions will return a slot reset state of "1"
when a slot is EEH frozen. Recognize this as a state that can be
handled.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:54 +11:00
Linas Vepstas
147d6a3750 [POWERPC] EEH: support ibm,get-config-addr-info2 RTAS call
Provide support for the new ibm,get-config-addr-info2 RTAS token,
whenever it is actually available.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:52 +11:00
Linas Vepstas
2fd30be8da [POWERPC] EEH: Tolerate high mmio
Some drivers will attempt to perform a lot of mmio even after
an EEH event was detected. This is especially the case for fast cpu's
and PCI-E slots. Be a bit more lenient in allowing this.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:51 +11:00
Linas Vepstas
39d16e2959 [POWERPC] EEH: modify order of EEH state checking
Change the order in which pci error state is examined;
the "capabilites" is not valid if "reset state" is 5.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-03-22 22:52:49 +11:00
Arjan van de Ven
5dfe4c964a [PATCH] mark struct file_operations const 2
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:44 -08:00
Linas Vepstas
25c4a46f0e [POWERPC] pSeries: EEH improperly enabled for some Power4 systems
It appears that EEH is improperly enabled for some Power4 systems.
On these systems, the ibm,set-eeh-option returns a value of success
even when EEH is not supported on the given node. Thus, an explicit
check for support is required.

During boot, on power4, without this patch, one sees messages
similar to:

EEH: event on unsupported device, rc=0 dn=/pci@400000000110/IBM,sp@1
EEH: event on unsupported device, rc=0 dn=/pci@400000000110/pci@2
EEH: event on unsupported device, rc=0 dn=/pci@400000000110/pci@2,2
etc.

The patch makes these go away.

Without this patch, EEH recovery does seem to work correctly for
at least some devices (I tested ethernet e1000), but fails to
recover others (the Emulex LightPulse LPFC, most notably).
Off the top of my head, I don't remember why some devices are
affected, but not others.

The PAPR indicates that the correct way to test for EEH is as
done in this patch; its not clear to me if this was in the PAPR
all along, or recently added; if it was there all along, its not
clear to me why this hadn't been fixed long ago. I suspect only
certain firmware levels are affected.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-07 14:03:17 +11:00
Linas Vepstas
d0e70341c0 [POWERPC] EEH recovery tweaks
If one attempts to create a device driver recovery sequence that
does not depend on a hard reset of the device, but simply just
attempts to resume processing, then one discovers that the
recovery sequence implemented on powerpc is not quite right.
This patch fixes this up.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-08 17:10:18 +11:00
Linas Vepstas
022d51b1b2 [POWERPC] EEH failure to mark pci slot as frozen.
Bug fix: when marking a slot as frozen, we forgot to mark
pci device itself as frozen. (we did manage to mark the
pci children, but forget the parent itself.)

This is needed so that some device drivers can check the
pci status in critical sections (e.g. in spin loops with
interrupts disabled).

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-09-26 15:41:03 +10:00
Linas Vepstas
e102926385 [POWERPC] EEH: Power4 systems sometimes need multiple resets.
On detection of an EEH error, some Power4 systems seem to occasionally
want to be reset twice before they report themselves as fully recovered.
This patch re-arranges the code to attempt additional resets if the first
one doesn't take.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-09-22 15:19:58 +10:00
Linas Vepstas
47b5c838af [POWERPC] EEH: enable MMIO/DMA on frozen slot
Add wrapper around the rtas call to enable MMIO or DMA on a frozen pci
slot.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-09-21 22:59:14 +10:00
Linas Vepstas
cb5b562444 [POWERPC] EEH: code comment cleanup
Clean up subroutine documentation; mostly formatting changes, with
some new content.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-09-21 22:59:10 +10:00
Jeremy Kerr
954a46e2d5 [POWERPC] pseries: Constify & voidify get_property()
Now that get_property() returns a void *, there's no need to cast its
return value. Also, treat the return value as const, so we can
constify get_property later.

pseries platform changes.

Built for pseries_defconfig

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-07-31 15:55:04 +10:00
Linas Vepstas
b055a9e10f [PATCH] powerpc/pseries: bugfix: balance calls to pci_device_put
Repeated calls to eeh_remove_device() can result in multiple
(and thus unbalanced) calls to pci_dev_put(). Make sure the
pci_device_put() is called only once (since there was only
one call to the matching pci_device_get()).

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-13 09:34:15 -07:00
Nathan Fontenot
794e085e56 [PATCH] powerpc/pseries: EEH Cleanup
This patch removes unnecessary exports, marks functions as static when
possible, and simplifies some list-related code.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01 22:37:11 +11:00
Benjamin Herrenschmidt
e8222502ee [PATCH] powerpc: Kill _machine and hard-coded platform numbers
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism.  With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants.  This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 23:15:54 +11:00
John Rose
827c1a6c1a [PATCH] powerpc: fix dynamic PCI probe regression
Some hotplug driver functions were migrated to the kernel for use by EEH
in commit 2bf6a8fa21.

Previously, the PCI Hotplug module had been changed to use the new
OFDT-based PCI probe when appropriate:
5fa80fcdca

When rpaphp_pci_config_slot() was moved from the rpaphp driver to the
new kernel function pcibios_add_pci_devices(), the OFDT-based probe
stuff was dropped.  This patch restores it.

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-02-28 16:25:54 +11:00
Olof Johansson
ea183a957a [PATCH] powerpc: remove warning in EEH code
Remove warning in eeh code about mixed variables and code.

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-12 20:09:29 +11:00
linas
0f17574a65 [PATCH] powerpc/pseries: dlpar-add crash on null pointer deref
This fixes a crash on null-pointer deref during dlpar slot addition.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 1c87c0f84943fbbc91826967ff4fea1b059a526f commit)
2006-01-10 15:32:54 +11:00
Linas Vepstas
257ffc64a6 [PATCH] powerpc: get rid of per_cpu EEH counters
242-eeh-no-percpu-counters.patch

Remove per-cpu counters from the EEH code.  These statistics counters
are incremented at a very low frequency, and the performance gains of
per-cpu variables are negligable.  By contrast, the counters weren't
safe against cpu off/online operations, and its not worth the effort
to make them so (other than to turn them into plain globals).

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from be3b5d1be053ccb41e91fa5a6f43ef5db301357d commit)
2006-01-10 15:30:48 +11:00
Linas Vepstas
7684b40cb5 [PATCH] powerpc: Save device BARs much earlier in the boot sequence
241-eeh-save-bars-earlier.patch

Save the PCI device bars *before* any PCI probing is done.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 76c902b919098860f3d4e125f847abcc4cb1782a commit)
2006-01-10 15:30:39 +11:00
Linas Vepstas
3914ac7b0e [PATCH] powerpc: handle multifunction PCI devices properly
239-eeh-multifunction-consolidate.patch

New-style firmware will often place multiple different functions
under a non-EEH-aware parent.  However, these devices might share
a common PE "partition endpoint" and config address, ad thus any
EEH events will affect all of the devices in common.  This patch
makes the effort to find all of these common devices and handle
them together.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)
2006-01-10 15:30:23 +11:00
Linas Vepstas
b6495c0c8f [PATCH] powerpc: Don't continue with PCI Error recovery if slot reset failed.
238-eeh-stop-if-reset_failed.patch

If the firmware is unable to reset the PCI slot for some reason, then
don't attempt any further recovery steps after that point.  Instead,
mark the device as permanently failed.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
2006-01-10 15:30:14 +11:00
Linas Vepstas
21e464dd7c [PATCH] powerpc: set up the RTAS token just like the rest of them.
237-eeh-bridge-token.patch

Minor: the rtas-bridge token should be set up the same way that all
the other rtas tokens are set up.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 78379b6c5fc17b6666c40b05988e6708e98479c0 commit)
2006-01-10 15:30:05 +11:00
Linas Vepstas
fcb7543e3d [PATCH] powerpc: Use PE configuration address consistently
236-eeh-config-addr.patch

The PE configuration address wasn't being cnsistently used in all locations
where a config address is called for.  This patch adds it to the places it
should have appeared in.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from c2bc904a28095aca0b04a37854b63b78622a032e commit)
2006-01-10 15:29:55 +11:00
Linas Vepstas
9fb40eb883 [PATCH] powerpc: Remove duplicate code
234-eeh-find-pe.patch

The find_device_pe() routine is duplicated in two files. Remove one of
the two copies, declare the other extern.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
2006-01-10 15:29:33 +11:00
Linas Vepstas
f751f84164 [PATCH] powerpc: remove bogus printk
233-eeh-buid-fix.patch

Remove un-desired warning print from EEH code.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 241239e6aff69788a177d97c5d06fe9995c74cca commit)
2006-01-10 15:29:25 +11:00
Linas Vepstas
25e591f6dd [PATCH] powerpc: Add "partitionable endpoint" support
26-eeh-partition-endpoint.patch

New versions of firmware introduce a new method by which the
"partitionable endpoint" (the point at which the pci bus is cut)
should be located.  This code adds the support for this (mandatory)
new feature.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 9fcfb5d35b5294659f9299aa9cae6fd16325c07e commit)
2006-01-10 15:29:14 +11:00
Linas Vepstas
5d5a0936b3 [PATCH] powerpc: Split out PCI address cache to its own file
25-pci-address-cache.patch

The core EEH file is rather large. This patch splits out a self-contained
chunk of it into its own file.  This is the chunk that performes the
caching and lookup of pci devices based on the i/o addresses of thier
resoures.  This code is almos architecture-independent and could be
used by any system that wanted to find a pci device based only on
the i/o address used by the device.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from b0b291d59906d4a9a89ed9e34d9fd684c7188924 commit)
2006-01-10 15:29:04 +11:00
Linas Vepstas
77bd741561 [PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines
Various PCI bus errors can be signaled by newer PCI controllers.  The
core error recovery routines are architecture dependent.  This patch adds
a recovery infrastructure for the  PPC64 pSeries systems.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
2006-01-10 15:28:32 +11:00
Linas Vepstas
e2a296eeaa [PATCH] powerpc: PCI hotplug common code elimination
20-rpaphp-eeh-cleanup.patch

This patch move some code from the rpaphp directory, to the powerpc
directory, where it should have been all along (Among other things, I
need it in the powerpc directory for the PCI error recovery.)

Please note that patch affects TWO maintainers: Paul, after applying
the powerpc part, please ask that GregKH appli the PCI part. It is safe
to have the powerpc part go in first. It would be bad to have the
PCI part go in first.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:51:05 +11:00
David Woodhouse
1e28a7ddd3 [PATCH] Avoid use of uninitialised spinlock in EEH.
If the kernel supports both G5 and pSeries, and CONFIG_EEH is enabled,
eeh_init() is (quite reasonably) never called when we boot on a G5. Yet
eeh_check_failure() still gets called. We should avoid doing that if
!eeh_subsystem_enabled.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-17 16:53:38 +11:00
Linas Vepstas
d9564ad114 [PATCH] ppc64: mark failed devices
17-eeh-slot-marking-bug.patch

A device that experiences a PCI outage may be just one deivce out
of many that was affected. In order to avoid repeated reports of
a failure, the entire tree of affected devices should be marked
as failed. This patch marks up the entire tree.

Signed-off-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 16:00:32 +11:00
Paul Mackerras
0a5cab42a1 powerpc: Fix compile error in EEH code with gcc4
Gcc 4 doesn't like being told to inline a recursive function...

Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 14:23:54 +11:00
Paul Mackerras
799d6046d3 [PATCH] powerpc: merge code values for identifying platforms
This patch merges platform codes.  systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 along as ppc32.  Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 13:37:51 +11:00
Linas Vepstas
8b553f32db [PATCH] ppc64: Save & restore of PCI device BARS
14-eeh-device-bar-save.patch

After a PCI device has been resest, the device BAR's and other config
space info must be restored to the same state as they were in when
the firmware first handed us this device.  This will allow the
PCI device driver, when restarted, to correctly recognize and set up
the device.

Tis patch saves the device config space as early as reasonable after
the firmware has handed over the device.  Te state resore funcion
is inteded for use by the EEH recovery routines.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 11:38:14 +11:00
Linas Vepstas
6dee3fb940 [PATCH] ppc64: PCI reset support routines
13-eeh-recovery-support-routines.patch

EEH Recovery support routines

This patch adds routines required to help drive the recovery of
EEH-frozen slots.  The main function is to drive the PCI #RST
signal line high for a qurter of a second, and then allow for
a second & a half of settle time.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 11:38:11 +11:00
Linas Vepstas
172ca92618 [PATCH] ppc64: PCI error event dispatcher
12-eeh-event-dispatcher.patch

ppc64: EEH Recovery dispatcher thread

This patch adds a mechanism to create recovery threads when an
EEH event is received.  Since an EEH freeze state may be detected
within an interrupt context, we need to get out of the interrupt
context before starting recovery. This dispatcher does this in
two steps: first, it uses a workqueue to get out, and then
lanuches a kernel thread, so that the recovery routine can
sleep for exteded periods without upseting the keventd.

A kernel thread is created with each EEH event, rather than
having one long-running daemon started at boot time.  This is
because it is anticipated that EEH events will be very rare
(very very rare, ideally) and so its pointless to cluter the
process tables with a daemon that will almost never run.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 11:38:05 +11:00
Linas Vepstas
7f79da7acc [PATCH] ppc64: move eeh.c to powerpc directory from ppc64
11-eeh-move-to-powerpc.patch

Move arch/ppc64/kernel/eeh.c to arch//powerpc/platforms/pseries/eeh.c
No other changes (except for Makefile to build it)

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 11:37:59 +11:00
Renamed from arch/ppc64/kernel/eeh.c (Browse further)