Commit graph

425519 commits

Author SHA1 Message Date
Mel Gorman
ddff702505 mm: vmscan: stall page reclaim after a list of pages have been processed
Commit "mm: vmscan: Block kswapd if it is encountering pages under
writeback" blocks page reclaim if it encounters pages under writeback
marked for immediate reclaim.  It blocks while pages are still isolated
from the LRU which is unnecessary.  This patch defers the blocking until
after the isolated pages have been processed and tidies up some of the
comments.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: b1a6f21e3b2315d46ae8af88a8f4eb8ea2763107
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ia6da0949d7bf81cd7c8d3951a7f9c723131b9037
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
8a1c1c901a mm: vmscan: stall page reclaim and writeback pages based on dirty/writepage pages encountered
Further testing of the "Reduce system disruption due to kswapd"
discovered a few problems.  First and foremost, it's possible for pages
under writeback to be freed which will lead to badness.  Second, as
pages were not being swapped the file LRU was being scanned faster and
clean file pages were being reclaimed.  In some cases this results in
increased read IO to re-read data from disk.  Third, more pages were
being written from kswapd context which can adversly affect IO
performance.  Lastly, it was observed that PageDirty pages are not
necessarily dirty on all filesystems (buffers can be clean while
PageDirty is set and ->writepage generates no IO) and not all
filesystems set PageWriteback when the page is being written (e.g.
ext3).  This disconnect confuses the reclaim stalling logic.  This
follow-up series is aimed at these problems.

The tests were based on three kernels

vanilla:	kernel 3.9 as that is what the current mmotm uses as a baseline
mmotm-20130522	is mmotm as of 22nd May with "Reduce system disruption due to
		kswapd" applied on top as per what should be in Andrew's tree
		right now
lessdisrupt-v7r10 is this follow-up series on top of the mmotm kernel

The first test used memcached+memcachetest while some background IO was
in progress as implemented by the parallel IO tests implement in MM
Tests.  memcachetest benchmarks how many operations/second memcached can
service.  It starts with no background IO on a freshly created ext4
filesystem and then re-runs the test with larger amounts of IO in the
background to roughly simulate a large copy in progress.  The
expectation is that the IO should have little or no impact on
memcachetest which is running entirely in memory.

parallelio
                                             3.9.0                       3.9.0                       3.9.0
                                           vanilla          mm1-mmotm-20130522       mm1-lessdisrupt-v7r10
Ops memcachetest-0M             23117.00 (  0.00%)          22780.00 ( -1.46%)          22763.00 ( -1.53%)
Ops memcachetest-715M           23774.00 (  0.00%)          23299.00 ( -2.00%)          22934.00 ( -3.53%)
Ops memcachetest-2385M           4208.00 (  0.00%)          24154.00 (474.00%)          23765.00 (464.76%)
Ops memcachetest-4055M           4104.00 (  0.00%)          25130.00 (512.33%)          24614.00 (499.76%)
Ops io-duration-0M                  0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops io-duration-715M               12.00 (  0.00%)              7.00 ( 41.67%)              6.00 ( 50.00%)
Ops io-duration-2385M             116.00 (  0.00%)             21.00 ( 81.90%)             21.00 ( 81.90%)
Ops io-duration-4055M             160.00 (  0.00%)             36.00 ( 77.50%)             35.00 ( 78.12%)
Ops swaptotal-0M                    0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-715M             140138.00 (  0.00%)             18.00 ( 99.99%)             18.00 ( 99.99%)
Ops swaptotal-2385M            385682.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-4055M            418029.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-0M                       0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-715M                   144.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-2385M               134227.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-4055M               125618.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops minorfaults-0M            1536429.00 (  0.00%)        1531632.00 (  0.31%)        1533541.00 (  0.19%)
Ops minorfaults-715M          1786996.00 (  0.00%)        1612148.00 (  9.78%)        1608832.00 (  9.97%)
Ops minorfaults-2385M         1757952.00 (  0.00%)        1614874.00 (  8.14%)        1613541.00 (  8.21%)
Ops minorfaults-4055M         1774460.00 (  0.00%)        1633400.00 (  7.95%)        1630881.00 (  8.09%)
Ops majorfaults-0M                  1.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops majorfaults-715M              184.00 (  0.00%)            167.00 (  9.24%)            166.00 (  9.78%)
Ops majorfaults-2385M           24444.00 (  0.00%)            155.00 ( 99.37%)             93.00 ( 99.62%)
Ops majorfaults-4055M           21357.00 (  0.00%)            147.00 ( 99.31%)            134.00 ( 99.37%)

memcachetest is the transactions/second reported by memcachetest. In
        the vanilla kernel note that performance drops from around
        23K/sec to just over 4K/second when there is 2385M of IO going
        on in the background. With current mmotm, there is no collapse
	in performance and with this follow-up series there is little
	change.

swaptotal is the total amount of swap traffic. With mmotm and the follow-up
	series, the total amount of swapping is much reduced.

                                 3.9.0       3.9.0       3.9.0
                               vanillamm1-mmotm-20130522mm1-lessdisrupt-v7r10
Minor Faults                  11160152    10706748    10622316
Major Faults                     46305         755         678
Swap Ins                        260249           0           0
Swap Outs                       683860          18          18
Direct pages scanned                 0         678        2520
Kswapd pages scanned           6046108     8814900     1639279
Kswapd pages reclaimed         1081954     1172267     1094635
Direct pages reclaimed               0         566        2304
Kswapd efficiency                  17%         13%         66%
Kswapd velocity               5217.560    7618.953    1414.879
Direct efficiency                 100%         83%         91%
Direct velocity                  0.000       0.586       2.175
Percentage direct scans             0%          0%          0%
Zone normal velocity          5105.086    6824.681     671.158
Zone dma32 velocity            112.473     794.858     745.896
Zone dma velocity                0.000       0.000       0.000
Page writes by reclaim     1929612.000 6861768.000   32821.000
Page writes file               1245752     6861750       32803
Page writes anon                683860          18          18
Page reclaim immediate            7484          40         239
Sector Reads                   1130320       93996       86900
Sector Writes                 13508052    10823500    11804436
Page rescued immediate               0           0           0
Slabs scanned                    33536       27136       18560
Direct inode steals                  0           0           0
Kswapd inode steals               8641        1035           0
Kswapd skipped wait                  0           0           0
THP fault alloc                      8          37          33
THP collapse alloc                 508         552         515
THP splits                          24           1           1
THP fault fallback                   0           0           0
THP collapse fail                    0           0           0

There are a number of observations to make here

1. Swap outs are almost eliminated. Swap ins are 0 indicating that the
   pages swapped were really unused anonymous pages. Related to that,
   major faults are much reduced.

2. kswapd efficiency was impacted by the initial series but with these
   follow-up patches, the efficiency is now at 66% indicating that far
   fewer pages were skipped during scanning due to dirty or writeback
   pages.

3. kswapd velocity is reduced indicating that fewer pages are being scanned
   with the follow-up series as kswapd now stalls when the tail of the
   LRU queue is full of unqueued dirty pages. The stall gives flushers a
   chance to catch-up so kswapd can reclaim clean pages when it wakes

4. In light of Zlatko's recent reports about zone scanning imbalances,
   mmtests now reports scanning velocity on a per-zone basis. With mainline,
   you can see that the scanning activity is dominated by the Normal
   zone with over 45 times more scanning in Normal than the DMA32 zone.
   With the series currently in mmotm, the ratio is slightly better but it
   is still the case that the bulk of scanning is in the highest zone. With
   this follow-up series, the ratio of scanning between the Normal and
   DMA32 zone is roughly equal.

5. As Dave Chinner observed, the current patches in mmotm increased the
   number of pages written from kswapd context which is expected to adversly
   impact IO performance. With the follow-up patches, far fewer pages are
   written from kswapd context than the mainline kernel

6. With the series in mmotm, fewer inodes were reclaimed by kswapd. With
   the follow-up series, there is less slab shrinking activity and no inodes
   were reclaimed.

7. Note that "Sectors Read" is drastically reduced implying that the source
   data being used for the IO is not being aggressively discarded due to
   page reclaim skipping over dirty pages and reclaiming clean pages. Note
   that the reducion in reads could also be due to inode data not being
   re-read from disk after a slab shrink.

                       3.9.0       3.9.0       3.9.0
                     vanillamm1-mmotm-20130522mm1-lessdisrupt-v7r10
Mean sda-avgqz        166.99       32.09       33.44
Mean sda-await        853.64      192.76      185.43
Mean sda-r_await        6.31        9.24        5.97
Mean sda-w_await     2992.81      202.65      192.43
Max  sda-avgqz       1409.91      718.75      698.98
Max  sda-await       6665.74     3538.00     3124.23
Max  sda-r_await       58.96      111.95       58.00
Max  sda-w_await    28458.94     3977.29     3148.61

In light of the changes in writes from reclaim context, the number of
reads and Dave Chinner's concerns about IO performance I took a closer
look at the IO stats for the test disk. Few observations

1. The average queue size is reduced by the initial series and roughly
   the same with this follow up.

2. Average wait times for writes are reduced and as the IO
   is completing faster it at least implies that the gain is because
   flushers are writing the files efficiently instead of page reclaim
   getting in the way.

3. The reduction in maximum write latency is staggering. 28 seconds down
   to 3 seconds.

Jan Kara asked how NFS is affected by all of this. Unstable pages can
be taken into account as one of the patches in the series shows but it
is still the case that filesystems with unusual handling of dirty or
writeback could still be treated better.

Tests like postmark, fsmark and largedd showed up nothing useful. On my test
setup, pages are simply not being written back from reclaim context with or
without the patches and there are no changes in performance. My test setup
probably is just not strong enough network-wise to be really interesting.

I ran a longer-lived memcached test with IO going to NFS instead of a local disk

parallelio
                                             3.9.0                       3.9.0                       3.9.0
                                           vanilla          mm1-mmotm-20130522       mm1-lessdisrupt-v7r10
Ops memcachetest-0M             23323.00 (  0.00%)          23241.00 ( -0.35%)          23321.00 ( -0.01%)
Ops memcachetest-715M           25526.00 (  0.00%)          24763.00 ( -2.99%)          23242.00 ( -8.95%)
Ops memcachetest-2385M           8814.00 (  0.00%)          26924.00 (205.47%)          23521.00 (166.86%)
Ops memcachetest-4055M           5835.00 (  0.00%)          26827.00 (359.76%)          25560.00 (338.05%)
Ops io-duration-0M                  0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops io-duration-715M               65.00 (  0.00%)             71.00 ( -9.23%)             11.00 ( 83.08%)
Ops io-duration-2385M             129.00 (  0.00%)             94.00 ( 27.13%)             53.00 ( 58.91%)
Ops io-duration-4055M             301.00 (  0.00%)            100.00 ( 66.78%)            108.00 ( 64.12%)
Ops swaptotal-0M                    0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swaptotal-715M              14394.00 (  0.00%)            949.00 ( 93.41%)             63.00 ( 99.56%)
Ops swaptotal-2385M            401483.00 (  0.00%)          24437.00 ( 93.91%)          30118.00 ( 92.50%)
Ops swaptotal-4055M            554123.00 (  0.00%)          35688.00 ( 93.56%)          63082.00 ( 88.62%)
Ops swapin-0M                       0.00 (  0.00%)              0.00 (  0.00%)              0.00 (  0.00%)
Ops swapin-715M                  4522.00 (  0.00%)            560.00 ( 87.62%)             63.00 ( 98.61%)
Ops swapin-2385M               169861.00 (  0.00%)           5026.00 ( 97.04%)          13917.00 ( 91.81%)
Ops swapin-4055M               192374.00 (  0.00%)          10056.00 ( 94.77%)          25729.00 ( 86.63%)
Ops minorfaults-0M            1445969.00 (  0.00%)        1520878.00 ( -5.18%)        1454024.00 ( -0.56%)
Ops minorfaults-715M          1557288.00 (  0.00%)        1528482.00 (  1.85%)        1535776.00 (  1.38%)
Ops minorfaults-2385M         1692896.00 (  0.00%)        1570523.00 (  7.23%)        1559622.00 (  7.87%)
Ops minorfaults-4055M         1654985.00 (  0.00%)        1581456.00 (  4.44%)        1596713.00 (  3.52%)
Ops majorfaults-0M                  0.00 (  0.00%)              1.00 (-99.00%)              0.00 (  0.00%)
Ops majorfaults-715M              763.00 (  0.00%)            265.00 ( 65.27%)             75.00 ( 90.17%)
Ops majorfaults-2385M           23861.00 (  0.00%)            894.00 ( 96.25%)           2189.00 ( 90.83%)
Ops majorfaults-4055M           27210.00 (  0.00%)           1569.00 ( 94.23%)           4088.00 ( 84.98%)

1. Performance does not collapse due to IO which is good. IO is also completing
   faster. Note with mmotm, IO completes in a third of the time and faster again
   with this series applied

2. Swapping is reduced, although not eliminated. The figures for the follow-up
   look bad but it does vary a bit as the stalling is not perfect for nfs
   or filesystems like ext3 with unusual handling of dirty and writeback
   pages

3. There are swapins, particularly with larger amounts of IO indicating
   that active pages are being reclaimed. However, the number of much
   reduced.

                                 3.9.0       3.9.0       3.9.0
                               vanillamm1-mmotm-20130522mm1-lessdisrupt-v7r10
Minor Faults                  36339175    35025445    35219699
Major Faults                    310964       27108       51887
Swap Ins                       2176399      173069      333316
Swap Outs                      3344050      357228      504824
Direct pages scanned              8972       77283       43242
Kswapd pages scanned          20899983     8939566    14772851
Kswapd pages reclaimed         6193156     5172605     5231026
Direct pages reclaimed            8450       73802       39514
Kswapd efficiency                  29%         57%         35%
Kswapd velocity               3929.743    1847.499    3058.840
Direct efficiency                  94%         95%         91%
Direct velocity                  1.687      15.972       8.954
Percentage direct scans             0%          0%          0%
Zone normal velocity          3721.907     939.103    2185.142
Zone dma32 velocity            209.522     924.368     882.651
Zone dma velocity                0.000       0.000       0.000
Page writes by reclaim     4082185.000  526319.000  537114.000
Page writes file                738135      169091       32290
Page writes anon               3344050      357228      504824
Page reclaim immediate            9524         170     5595843
Sector Reads                   8909900      861192     1483680
Sector Writes                 13428980     1488744     2076800
Page rescued immediate               0           0           0
Slabs scanned                    38016       31744       28672
Direct inode steals                  0           0           0
Kswapd inode steals                424           0           0
Kswapd skipped wait                  0           0           0
THP fault alloc                     14          15         119
THP collapse alloc                1767        1569        1618
THP splits                          30          29          25
THP fault fallback                   0           0           0
THP collapse fail                    8           5           0
Compaction stalls                   17          41         100
Compaction success                   7          31          95
Compaction failures                 10          10           5
Page migrate success              7083       22157       62217
Page migrate failure                 0           0           0
Compaction pages isolated        14847       48758      135830
Compaction migrate scanned       18328       48398      138929
Compaction free scanned        2000255      355827     1720269
Compaction cost                      7          24          68

I guess the main takeaway again is the much reduced page writes
from reclaim context and reduced reads.

                       3.9.0       3.9.0       3.9.0
                     vanillamm1-mmotm-20130522mm1-lessdisrupt-v7r10
Mean sda-avgqz         23.58        0.35        0.44
Mean sda-await        133.47       15.72       15.46
Mean sda-r_await        4.72        4.69        3.95
Mean sda-w_await      507.69       28.40       33.68
Max  sda-avgqz        680.60       12.25       23.14
Max  sda-await       3958.89      221.83      286.22
Max  sda-r_await       63.86       61.23       67.29
Max  sda-w_await    11710.38      883.57     1767.28

And as before, write wait times are much reduced.

This patch:

The patch "mm: vmscan: Have kswapd writeback pages based on dirty pages
encountered, not priority" decides whether to writeback pages from reclaim
context based on the number of dirty pages encountered.  This situation is
flagged too easily and flushers are not given the chance to catch up
resulting in more pages being written from reclaim context and potentially
impacting IO performance.  The check for PageWriteback is also misplaced
as it happens within a PageDirty check which is nonsense as the dirty may
have been cleared for IO.  The accounting is updated very late and pages
that are already under writeback, were reactivated, could not unmapped or
could not be released are all missed.  Similarly, a page is considered
congested for reasons other than being congested and pages that cannot be
written out in the correct context are skipped.  Finally, it considers
stalling and writing back filesystem pages due to encountering dirty
anonymous pages at the tail of the LRU which is dumb.

This patch causes kswapd to begin writing filesystem pages from reclaim
context only if page reclaim found that all filesystem pages at the tail
of the LRU were unqueued dirty pages.  Before it starts writing filesystem
pages, it will stall to give flushers a chance to catch up.  The decision
on whether wait_iff_congested is also now determined by dirty filesystem
pages only.  Congested pages are based on whether the underlying BDI is
congested regardless of the context of the reclaiming process.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: e2be15f6c3eecedfbe1550cca8d72c5057abbbd2
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I2c8aee00da5e3e9562984e792d16f9e11bd4a435
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
6fe90c0c0c mm: vmscan: move logic from balance_pgdat() to kswapd_shrink_zone()
balance_pgdat() is very long and some of the logic can and should be
internal to kswapd_shrink_zone().  Move it so the flow of
balance_pgdat() is marginally easier to follow.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 7c954f6de6b630de30f265a079aad359f159ebe9
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I6c4e76e6e132c5982c228863c99195d7ad7768bc
[vinmenon@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
bc6bd99ed4 mm: vmscan: check if kswapd should writepage once per pgdat scan
Currently kswapd checks if it should start writepage as it shrinks each
zone without taking into consideration if the zone is balanced or not.
This is not wrong as such but it does not make much sense either.  This
patch checks once per pgdat scan if kswapd should be writing pages.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: b7ea3c417b6c2e74ca1cb051568f60377908928d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[vinmenon@codeaurora.org: resolve trivial merge conflicts]
Change-Id: I7cb0fb685f8346f07d0fc4810f6c593334cd1590
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
488cb10a86 mm: vmscan: block kswapd if it is encountering pages under writeback
Historically, kswapd used to congestion_wait() at higher priorities if
it was not making forward progress.  This made no sense as the failure
to make progress could be completely independent of IO.  It was later
replaced by wait_iff_congested() and removed entirely by commit 258401a6
(mm: don't wait on congested zones in balance_pgdat()) as it was
duplicating logic in shrink_inactive_list().

This is problematic.  If kswapd encounters many pages under writeback
and it continues to scan until it reaches the high watermark then it
will quickly skip over the pages under writeback and reclaim clean young
pages or push applications out to swap.

The use of wait_iff_congested() is not suited to kswapd as it will only
stall if the underlying BDI is really congested or a direct reclaimer
was unable to write to the underlying BDI.  kswapd bypasses the BDI
congestion as it sets PF_SWAPWRITE but even if this was taken into
account then it would cause direct reclaimers to stall on writeback
which is not desirable.

This patch sets a ZONE_WRITEBACK flag if direct reclaim or kswapd is
encountering too many pages under writeback.  If this flag is set and
kswapd encounters a PageReclaim page under writeback then it'll assume
that the LRU lists are being recycled too quickly before IO can complete
and block waiting for some IO to complete.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 283aba9f9e0e4882bf09bd37a2983379a6fae805
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ib34f1959c0e5265242152f98cc52c62ab7015993
[vinmenon@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
ba3a73862f mm: vmscan: have kswapd writeback pages based on dirty pages encountered, not priority
Currently kswapd queues dirty pages for writeback if scanning at an
elevated priority but the priority kswapd scans at is not related to the
number of unqueued dirty encountered.  Since commit "mm: vmscan: Flatten
kswapd priority loop", the priority is related to the size of the LRU
and the zone watermark which is no indication as to whether kswapd
should write pages or not.

This patch tracks if an excessive number of unqueued dirty pages are
being encountered at the end of the LRU.  If so, it indicates that dirty
pages are being recycled before flusher threads can clean them and flags
the zone so that kswapd will start writing pages until the zone is
balanced.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: d43006d503ac921c7df4f94d13c17db6f13c9d26
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I565caf3aef9f3e5f59cda1adc70207412719a2ed
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
64fe14e549 mm: vmscan: do not allow kswapd to scan at maximum priority
Page reclaim at priority 0 will scan the entire LRU as priority 0 is
considered to be a near OOM condition.  Kswapd can reach priority 0
quite easily if it is encountering a large number of pages it cannot
reclaim such as pages under writeback.  When this happens, kswapd
reclaims very aggressively even though there may be no real risk of
allocation failure or OOM.

This patch prevents kswapd reaching priority 0 and trying to reclaim the
world.  Direct reclaimers will still reach priority 0 in the event of an
OOM situation.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 9aa41348a8d11427feec350b21dcdd4330fd20c4
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I6bd5891e9f2b670b3c495cfad26d69af92e6d856
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:47 +05:30
Mel Gorman
89e36de3c4 mm: vmscan: decide whether to compact the pgdat based on reclaim progress
In the past, kswapd makes a decision on whether to compact memory after
the pgdat was considered balanced.  This more or less worked but it is
late to make such a decision and does not fit well now that kswapd makes
a decision whether to exit the zone scanning loop depending on reclaim
progress.

This patch will compact a pgdat if at least the requested number of
pages were reclaimed from unbalanced zones for a given priority.  If any
zone is currently balanced, kswapd will not call compaction as it is
expected the necessary pages are already available.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 2ab44f434586b8ccb11f781b4c2730492e6628f5
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: Ie490e6df9576de1de1bc0c3c1b634618394dcf8e
[vinmenon@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:46 +05:30
Mel Gorman
54715ebd2e mm: vmscan: flatten kswapd priority loop
kswapd stops raising the scanning priority when at least
SWAP_CLUSTER_MAX pages have been reclaimed or the pgdat is considered
balanced.  It then rechecks if it needs to restart at DEF_PRIORITY and
whether high-order reclaim needs to be reset.  This is not wrong per-se
but it is confusing to follow and forcing kswapd to stay at DEF_PRIORITY
may require several restarts before it has scanned enough pages to meet
the high watermark even at 100% efficiency.  This patch irons out the
logic a bit by controlling when priority is raised and removing the
"goto loop_again".

This patch has kswapd raise the scanning priority until it is scanning
enough pages that it could meet the high watermark in one shrink of the
LRU lists if it is able to reclaim at 100% efficiency.  It will not
raise the scanning prioirty higher unless it is failing to reclaim any
pages.

To avoid infinite looping for high-order allocation requests kswapd will
not reclaim for high-order allocations when it has reclaimed at least
twice the number of pages as the allocation request.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: b8e83b942a16eb73e63406592d3178207a4f07a1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I93ee675006800f2805408f2865150182bfd4b22b
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:46 +05:30
Mel Gorman
ece8cc6ba1 mm: vmscan: obey proportional scanning requirements for kswapd
Simplistically, the anon and file LRU lists are scanned proportionally
depending on the value of vm.swappiness although there are other factors
taken into account by get_scan_count().  The patch "mm: vmscan: Limit
the number of pages kswapd reclaims" limits the number of pages kswapd
reclaims but it breaks this proportional scanning and may evenly shrink
anon/file LRUs regardless of vm.swappiness.

This patch preserves the proportional scanning and reclaim.  It does
mean that kswapd will reclaim more than requested but the number of
pages will be related to the high watermark.

[mhocko@suse.cz: Correct proportional reclaim for memcg and simplify]
[kamezawa.hiroyu@jp.fujitsu.com: Recalculate scan based on target]
[hannes@cmpxchg.org: Account for already scanned pages properly]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: e82e0561dae9f3ae5a21fc2d3d3ccbe69d90be46
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I9dc9b73c0d73c27cda72181b4eb3f625e491f114
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2014-12-09 13:25:46 +05:30
Linux Build Service Account
6c644c0f75 Merge "scsi: ufs: avoid exception event handler racing with PM callbacks" 2014-12-08 16:42:06 -08:00
Linux Build Service Account
2ddda43da9 Merge "ARM: dts: msm: msm8992: Add device tree for Silabs FM on 8992" 2014-12-08 16:42:04 -08:00
Linux Build Service Account
ef97c0fc08 Merge "ARM: dts: msm: msm8992: Add BT node, GPIO regulator" 2014-12-08 16:42:03 -08:00
Linux Build Service Account
e824fa42a4 Merge "wil6210: Fix potential memory leaks on error paths" 2014-12-08 16:42:01 -08:00
Linux Build Service Account
6f259226b6 Merge "msm-core: Set up thresholds only until the hotplug temperature point" 2014-12-08 16:41:58 -08:00
Linux Build Service Account
3e1dab5909 Merge "msm: kgsl: Elevate the priority of the thread during adreno start" 2014-12-08 16:41:57 -08:00
Linux Build Service Account
4f042c4413 Merge "ARM: dts: msm: Account fudge factor for honest BW voting" 2014-12-08 16:41:55 -08:00
Linux Build Service Account
cff9aa5e25 Merge "mm: vmscan: limit the number of pages kswapd reclaims at each priority" 2014-12-08 16:41:53 -08:00
Linux Build Service Account
247909f4aa Merge "ARM: dts: msm: update LPM parameter values for msm8909 platform" 2014-12-08 13:08:34 -08:00
Linux Build Service Account
38dd9d4431 Merge "Ultrasound: Add new proximity format" 2014-12-08 13:08:28 -08:00
Linux Build Service Account
245e868589 Merge "ARM: dts: msm: Enable BMS controlled charging for msm8909 skuc" 2014-12-08 13:08:23 -08:00
Linux Build Service Account
14d7a764d9 Merge "power: bcl: Update to fix BCL frequency mitigation" 2014-12-08 13:08:21 -08:00
Linux Build Service Account
45f6110af2 Merge "USB: EHCI: Add a quirk to skip the suspend during test mode selection" 2014-12-08 13:08:20 -08:00
Linux Build Service Account
6c8812ccc8 Merge "power: qpnp-linear-charger: Add debugfs for devicetree parameters" 2014-12-08 13:08:19 -08:00
Linux Build Service Account
157f6fcddb Merge "ARM: dts: msm: Move model and compatible property to dts file" 2014-12-08 13:08:17 -08:00
Linux Build Service Account
03de82e29f Merge "ARM: dts: msm: audio support for MSM8909 with pm8916" 2014-12-08 13:08:11 -08:00
Linux Build Service Account
92b69a1701 Merge "ARM: dts: msm: Add support for qHD panel for MSM8909" 2014-12-08 13:08:08 -08:00
Linux Build Service Account
a1a6fb2b52 Merge "ARM: dts: msm: Add support for MSM8909 1GB on RCM" 2014-12-08 13:08:07 -08:00
Linux Build Service Account
1ec2c754aa Merge "msm: lmh_lite: Add memory barrier before reading data from trustzone" 2014-12-08 13:08:05 -08:00
Linux Build Service Account
26463157f4 Merge "ARM: dts: msm: add coresight components for msmtellurium" 2014-12-08 13:08:04 -08:00
Linux Build Service Account
933d757856 Merge "ARM: dts: msm: specify the quotient offset fuse parameters for msm8992" 2014-12-08 13:07:58 -08:00
Linux Build Service Account
56d3092045 Merge "regulator: cpr-regulator: Add support for fuse based quotient offset" 2014-12-08 13:07:57 -08:00
Linux Build Service Account
e9afd9db31 Merge "ARM: dts: msm: Modify vdd_mx supply handle for MSM8909" 2014-12-08 13:07:56 -08:00
Bhakthavatsala Raghavendra
9e902fc57c ARM: dts: msm: msm8992: Add device tree for Silabs FM on 8992
Add device tree support to enumerate drivers
for I2C based Silabs FM chip on 8992.

Change-Id: If3f8325eb34a84db422d8dcbcc6be51ce9835b02
Signed-off-by: Bhakthavatsala Raghavendra <braghave@codeaurora.org>
2014-12-08 10:09:17 -08:00
Bhakthavatsala Raghavendra
0cb6aafcfe ARM: dts: msm: msm8992: Add BT node, GPIO regulator
Add BT node, PMIC GPIO 18 and 19 in the device tree.

BT chip on MSM8992 board is powered via an external
regulator controlled by PM8994 GPIO 9. This external
regulator outputs a constant of 3.3 V.

Change-Id: Ibc2edad48d3c37c0e8596aafd080b36a5694f917
Signed-off-by: Bhakthavatsala Raghavendra <braghave@codeaurora.org>
2014-12-08 10:05:06 -08:00
Jordan Crouse
3068ca73f0 msm: kgsl: Elevate the priority of the thread during adreno start
It turns out that having lots of spurious threads causes race
conditions. Who knew? Instead of spawning a new high prority
thread just bump the priority of the current thread up
while adreno_start is running.

Change-Id: Ic0dedbadf65da11c800578c8769ae844f010e925
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2014-12-08 08:47:30 -07:00
Ritesh Harjani
92a97b3052 ARM: dts: msm: Account fudge factor for honest BW voting
There has been fudge factor which is added at system level,
thus all the clients(sdhc in the patch) remove this
factor(1.53) from their BW voting values.

CRs-fixed: 759444
Change-Id: I9207b870ffbf766bfd4802208c9e8bb8e3eb9a82
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
2014-12-08 19:13:37 +05:30
Anil Kumar Mamidala
3681de5ae7 ARM: dts: msm: update LPM parameter values for msm8909 platform
PM driver takes LPM parameters as one of the input while deciding
to enter low power modes. Current LPM parameter values are not
optimal and this may lead to more power consumption for application
processor usecases and there might be stability issues as well.
This change is to tune the values of the low power mode parameter
values such as "latency", "time overhead", "steady state power" and
"energy overhead" for msm8909 platform after measuring on the target.

Change-Id: Ia471a9b7de2f5e61600c99fa8c25ae3656e080f5
Signed-off-by: Anil Kumar Mamidala <amami@codeaurora.org>
2014-12-08 03:45:51 -08:00
Maria Yu
17ee0f5341 ARM: dts: msm: Move model and compatible property to dts file
Model and compatible property is unique for each board.
So move model and compatible property to dts file.

Change-Id: I0a0fc1c186f365e5105705c5c0b8f6729727eaf6
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
2014-12-08 18:30:35 +08:00
Laxminath Kasam
eeac5edb38 ARM: dts: msm: audio support for MSM8909 with pm8916
Add support from audio for MSM8909 coupled
with PMIC8916. Update both CDP and MTP dtsi files.

Change-Id: I86c53aa79ffe11bfb4366121bdff2c6286e3e0d7
Signed-off-by: Laxminath Kasam <lkasam@codeaurora.org>
2014-12-08 15:44:41 +05:30
Shiju Mathew
2184e4fd8f power: bcl: Update to fix BCL frequency mitigation
This patch will fix an issue where the frequency
mitigation was not removed when BCL mitigation
conditions are lifted.

Change-Id: Ibddd492bef8e4382cb5e91cf2d1a871f6cfb237d
Signed-off-by: Shiju Mathew <shijum@codeaurora.org>
2014-12-07 22:30:51 -08:00
Jie Cheng
fccc03eef5 power: qpnp-linear-charger: Add debugfs for devicetree parameters
Add the debugfs support for listing the configurable devicetree parameters.
Also print these parameters with pr_debug during probe.

Change-Id: I88641bcda09b17aa76f81d74af57325693dddd5d
Signed-off-by: Jie Cheng <rockiec@codeaurora.org>
2014-12-08 11:32:19 +08:00
Xiaogang Cui
17bd53c4e1 ARM: dts: msm: add coresight components for msmtellurium
Add CoreSight components including trace sinks, replicator, funnels
and STM to the device tree required for STM tracing.

Change-Id: I706b366aeeed06ee1804d97e1e61dcbd1ed353cb
Signed-off-by: Xiaogang Cui <xiaogang@codeaurora.org>
2014-12-08 11:07:41 +08:00
Chunmei Cai
4223f7a919 ARM: dts: msm: Enable BMS controlled charging for msm8909 skuc
Add the param to enable BMS to control the recharge for msm8909
skuc.

Change-Id: Ie092459b83cee1d0ebc418fd17308c335316d1a0
Signed-off-by: Chunmei Cai <ccai@codeaurora.org>
2014-12-08 10:54:45 +08:00
Linux Build Service Account
b9e32abd7d Merge "soc: qcom: glink: remove userspace buffer argument" 2014-12-07 09:17:53 -08:00
Linux Build Service Account
6cddcb524e Merge "ARM: dts: msm: enable haptics on MSM8992 CDP and MTP" 2014-12-07 09:17:52 -08:00
Linux Build Service Account
ed9a8f4dd2 Merge "arm64: dma-mapping: swap arguments to __get_dma_pgprot" 2014-12-07 09:17:42 -08:00
Linux Build Service Account
5a74deef22 Merge "usb: bam: Allow USB to attempt LPM after disconnecting IPA" 2014-12-07 09:17:41 -08:00
Lior Barenboim
7668527652 Ultrasound: Add new proximity format
Add a new stream format for ultrasound proximity, which matches
the stream format in the aDSP.

Change-Id: Idd488b90ee082fcc58ff9d1dddf5087cba9af034
Signed-off-by: Lior Barenboim <liorb@codeaurora.org>
2014-12-07 17:19:44 +02:00
Linux Build Service Account
8ccb0778a7 Merge "msm: fsm9900: Enable uio access to multiple memory region" 2014-12-07 06:17:07 -08:00