Commit graph

441007 commits

Author SHA1 Message Date
Sergey Senozhatsky
9d7ed54d83 UPSTREAM: zsmalloc: fix a null pointer dereference in destroy_handle_cache()
(cherry-pick from commit 02f7b4145da113683ad64c74bf64605e16b71789)

If zs_create_pool()->create_handle_cache()->kmem_cache_create() or
pool->name allocation fails, zs_create_pool()->destroy_handle_cache()
will dereference the NULL pool->handle_cachep.

Modify destroy_handle_cache() to avoid this.

Bug: 25951511

Change-Id: Ie4ea82fe34ac02e6e2548e6ed47257366d7b92f5
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:56 +05:30
Weijie Yang
56d5ec9b3f UPSTREAM: zram: clear disk io accounting when reset zram device
(cherry-pick from commit d7ad41a1c498729b7584c257710b1b437a0c1470)

Clear zram disk io accounting when resetting the zram device.  Otherwise
the residual io accounting stat will affect the diskstat in the next
zram active cycle.

Bug: 25951511

Change-Id: I0afc7ba11fb65fad352e3e195944d8887b348bef
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:56 +05:30
Julia Lawall
64bc8fa4c1 UPSTREAM: zram: fix error return code
(cherry-pick from commit 201c7b72f0bf38d7f31fd229a01de035d0f10cd1)

Return a negative error code on failure.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Bug: 25951511

Change-Id: Ida80be368c03e51058e0f23ec5cfef43009464df
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:56 +05:30
Sergey Senozhatsky
f609c3c2bc UPSTREAM: zsmalloc: remove extra cond_resched() in __zs_compact
(cherry-pick from commit 160a117f0864871ae1bab26554a985a1d2861afd)

Do not perform cond_resched() before the busy compaction loop in
__zs_compact(), because this loop does it when needed.

Bug: 25951511

Change-Id: I3b20b46f3a4fb44a2bf6ccb17264acf30deb7111
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:56 +05:30
Heesub Shin
9594f03013 UPSTREAM: zsmalloc: fix fatal corruption due to wrong size class selection
(cherry-pick from commit 81da9b13f73653bf5f38c63af8029fc459198ac0)

There is no point in overriding the size class below.  It causes fatal
corruption on the next chunk on the 3264-bytes size class, which is the
last size class that is not huge.

For example, if the requested size was exactly 3264 bytes, current
zsmalloc allocates and returns a chunk from the size class of 3264 bytes,
not 4096.  User access to this chunk may overwrite head of the next
adjacent chunk.

Here is the panic log captured when freelist was corrupted due to this:

    Kernel BUG at ffffffc00030659c [verbose debug info unavailable]
    Internal error: Oops - BUG: 96000006 [#1] PREEMPT SMP
    Modules linked in:
    exynos-snapshot: core register saved(CPU:5)
    CPUMERRSR: 0000000000000000, L2MERRSR: 0000000000000000
    exynos-snapshot: context saved(CPU:5)
    exynos-snapshot: item - log_kevents is disabled
    CPU: 5 PID: 898 Comm: kswapd0 Not tainted 3.10.61-4497415-eng #1
    task: ffffffc0b8783d80 ti: ffffffc0b71e8000 task.ti: ffffffc0b71e8000
    PC is at obj_idx_to_offset+0x0/0x1c
    LR is at obj_malloc+0x44/0xe8
    pc : [<ffffffc00030659c>] lr : [<ffffffc000306604>] pstate: a0000045
    sp : ffffffc0b71eb790
    x29: ffffffc0b71eb790 x28: ffffffc00204c000
    x27: 000000000001d96f x26: 0000000000000000
    x25: ffffffc098cc3500 x24: ffffffc0a13f2810
    x23: ffffffc098cc3501 x22: ffffffc0a13f2800
    x21: 000011e1a02006e3 x20: ffffffc0a13f2800
    x19: ffffffbc02a7e000 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000feb
    x15: 0000000000000000 x14: 00000000a01003e3
    x13: 0000000000000020 x12: fffffffffffffff0
    x11: ffffffc08b264000 x10: 00000000e3a01004
    x9 : ffffffc08b263fea x8 : ffffffc0b1e611c0
    x7 : ffffffc000307d24 x6 : 0000000000000000
    x5 : 0000000000000038 x4 : 000000000000011e
    x3 : ffffffbc00003e90 x2 : 0000000000000cc0
    x1 : 00000000d0100371 x0 : ffffffbc00003e90

Bug: 25951511

Change-Id: I0c82f61aa779ddf906212ab6e47e16c088fe683c
Reported-by: Sooyong Suk <s.suk@samsung.com>
Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
Tested-by: Sooyong Suk <s.suk@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:55 +05:30
Minchan Kim
3610a81ab2 UPSTREAM: zsmalloc: remove unnecessary insertion/removal of zspage in compaction
(cherry-pick from commit 839373e645d12613308d9148041c4bd967bce8d5)

In putback_zspage, we don't need to insert a zspage into list of zspage
in size_class again to just fix fullness group. We could do directly
without reinsertion so we could save some instuctions.

Bug: 25951511

Change-Id: I07ad8bac6d2f5dc90ac0d492626e067a02699979
Reported-by: Heesub Shin <heesub.shin@samsung.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Juneho Choi <juno.choi@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:55 +05:30
Sergey Senozhatsky
f3dbbaaa41 UPSTREAM: zsmalloc: micro-optimize zs_object_copy()
(cherry-pick from commit 495819ead5ad02174208994ca610852a7791a2f2)

A micro-optimization.  Avoid additional branching and reduce (a bit)
registry pressure (f.e.  s_off += size; d_off += size; may be calculated
twise: first for >= PAGE_SIZE check and later for offset update in "else"
clause).

scripts/bloat-o-meter shows some improvement

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
function                          old     new   delta
zs_object_copy                    550     540     -10

Bug: 25951511

Change-Id: Ie3255d79246493fc755e6256f12082e692c0fc3c
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:55 +05:30
Sergey Senozhatsky
4c117f8df8 UPSTREAM: zsmalloc: remove synchronize_rcu from zs_compact()
(cherry-pick from commit 1ec7cfb13acb8047ae5baafb43d2cd6b64ac85b9)

Do not synchronize rcu in zs_compact(). Neither zsmalloc not
zram use rcu.

Bug: 25951511

Change-Id: I2f2d1a81dac561ddfabb861bedcbb1ba773f207f
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:55 +05:30
Sergey Senozhatsky
44be037856 UPSTREAM: zram: deprecate zram attrs sysfs nodes
(cherry-pick from commit 8f7d282c717acaae25245c61b6b60e8995ec4ef4)

Add Documentation/ABI/obsolete/sysfs-block-zram file and list obsolete and
deprecated attributes there.  The patch also adds additional information
to zram documentation and describes the basic strategy:

- the existing RW nodes will be downgraded to WO nodes (in 4.11)
- deprecated RO sysfs nodes will eventually be removed (in 4.11)

Users will be additionally notified about deprecated attr usage by
pr_warn_once() (added to every deprecated attr _show()), as suggested by
Minchan Kim.

User space is advised to use zram<id>/stat, zram<id>/io_stat and
zram<id>/mm_stat files.

Bug: 25951511

Change-Id: I62476ead805c1485e5653d62f5e6495345fb4a59
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:54 +05:30
Sergey Senozhatsky
b38dfdda62 BACKPORT: zram: export new 'mm_stat' sysfs attrs
(cherry-pick from commit 4f2109f60881585dc04fa0b5657a60556576625c)

Per-device `zram<id>/mm_stat' file provides mm statistics of a particular
zram device in a format similar to block layer statistics.  The file
consists of a single line and represents the following stats (separated by
whitespace):

        orig_data_size
        compr_data_size
        mem_used_total
        mem_limit
        mem_used_max
        zero_pages
        num_migrated

Bug: 25951511

Change-Id: I47537dd3aa5749b5db212f62c6adff5f0ff2c858
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:54 +05:30
Sergey Senozhatsky
162e32c927 BACKPORT: zram: export new 'io_stat' sysfs attrs
(cherry-pick from commit 2f6a3bed7347ee94fe57b3501fddaa646a26d890)

Per-device `zram<id>/io_stat' file provides accumulated I/O statistics of
particular zram device in a format similar to block layer statistics.  The
file consists of a single line and represents the following stats
(separated by whitespace):

        failed_reads
        failed_writes
        invalid_io
        notify_free

Bug: 25951511

Change-Id: I0041225192b8935983790dacc26b78bc18c2531c
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:54 +05:30
Sergey Senozhatsky
3ddcf3d74c UPSTREAM: zram: describe device attrs in documentation
(cherry-pick from commit 77ba015f9d5c584226a634753e9b318cb272cd41)

Briefly describe exported device stat attrs in zram documentation.  We
will eventually get rid of per-stat sysfs nodes and, thus, clean up
Documentation/ABI/testing/sysfs-block-zram file, which is the only source
of information about device sysfs nodes.

Add `num_migrated' description, since there is no independent
`num_migrated' sysfs node (and no corresponding sysfs-block-zram entry),
it will be exported via zram<id>/mm_stat file.

At this point we can provide minimal description, because sysfs-block-zram
still contains detailed information.

Bug: 25951511

Change-Id: I1c15f99f6b011a80468a48f02975dd8da753d4e9
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:54 +05:30
Sergey Senozhatsky
6f8275af64 UPSTREAM: zram: remove `num_migrated' device attr
(cherry-pick from commit 10447b60bee52f026bdbc5fe2aca52d0492fc91d)

This patch introduces rework to zram stats.  We have per-stat sysfs nodes,
and it makes things a bit hard to use in user space: it doesn't give an
immediate stats 'snapshot', it requires user space to use more syscalls -
open, read, close for every stat file, with appropriate error checks on
every step, etc.

First, zram now accounts block layer statistics, available in
/sys/block/zram<id>/stat and /proc/diskstats files.  So some new stats are
available (see Documentation/block/stat.txt), besides, zram's activities
now can be monitored by sysstat's iostat or similar tools.

Example:
cat /sys/block/zram0/stat
248     0    1984    0   251029     0  2008232   5120   0   5116   5116

Second, group currently exported on per-stat basis nodes into two
categories (files):

-- zram<id>/io_stat
accumulates device's IO stats, that are not accounted by block layer,
and contains:
        failed_reads
        failed_writes
        invalid_io
        notify_free

Example:
cat /sys/block/zram0/io_stat
0        0        0   652572

-- zram<id>/mm_stat
accumulates zram mm stats and contains:
        orig_data_size
        compr_data_size
        mem_used_total
        mem_limit
        mem_used_max
        zero_pages
        num_migrated

Example:
cat /sys/block/zram0/mm_stat
434634752 270288572 279158784        0 579895296    15060        0

per-stat sysfs nodes are now considered to be deprecated and we plan to
remove them (and clean up some of the existing stat code) in two years (as
of now, there is no warning printed to syslog about deprecated stats being
used).  User space is advised to use the above mentioned 3 files.

This patch (of 7):

Remove sysfs `num_migrated' attribute.  We are moving away from per-stat
device attrs towards 3 stat files that will accumulate io and mm stats in
a format similar to block layer statistics in /sys/block/<dev>/stat.  That
will be easier to use in user space, and reduce the number of syscalls
needed to read zram device statistics.

`num_migrated' will return back in zram<id>/mm_stat file.

Bug: 25951511

Change-Id: I94da1a236a61d0890e4a88bc7e7b0c1f33fedf9e
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:53 +05:30
Yinghao Xie
f3ff102b86 UPSTREAM: mm/zsmalloc.c: fix comment for get_pages_per_zspage
(cherry-pick from commit 888fa374e625f3ee8f3cc2efebf52939d3bb6da1)

Bug: 25951511

Change-Id: I94fc95aff6dcab3999c3dcb275ca65b8cd4b3565
Signed-off-by: Yinghao Xie <yinghao.xie@sumsung.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:53 +05:30
Minchan Kim
588deb99ca UPSTREAM: zsmalloc: zsmalloc documentation
(cherry-pick from commit d02be50dba649b4246e0c1c4b7cb5d8a8d49de9a)

Create zsmalloc doc which explains design concept and stat information.

Bug: 25951511

Change-Id: Ie1e5ac914186558feabaf8fbea29154395267dda
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:53 +05:30
Minchan Kim
e2d2259128 UPSTREAM: zsmalloc: add fullness into stat
(cherry-pick from commit 248ca1b053c82fa22427d22b33ac51a24c88a86d)

During investigating compaction, fullness information of each class is
helpful for investigating how the compaction works well.  With that, we
could know how compaction works well more clear on each size class.

Bug: 25951511

Change-Id: Idc07b265d005b680abb55b7dc61341a3de43a62c
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:53 +05:30
Minchan Kim
2a0ff415c7 UPSTREAM: zsmalloc: record handle in page->private for huge object
(cherry-pick from commit 7b60a68529b0d827d26ea3426c2addd071bff789)

We store handle on header of each allocated object so it increases the
size of each object by sizeof(unsigned long).

If zram stores 4096 bytes to zsmalloc(ie, bad compression), zsmalloc needs
4104B-class to add handle.

However, 4104B-class has 1-pages_per_zspage so wasted size by internal
fragment is 8192 - 4104, which is terrible.

So this patch records the handle in page->private on such huge object(ie,
pages_per_zspage == 1 && maxobj_per_zspage == 1) instead of header of each
object so we could use 4096B-class, not 4104B-class.

Bug: 25951511

Change-Id: I392eed4a0e0db5a940bc8a97ef56c26a7397b0f9
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:52 +05:30
Minchan Kim
d14f8d29f7 BACKPORT: zram: support compaction
(cherry-pick from commit 4e3ba87845420e0bfa21e6c4f7f81897aed38f8c)

Now that zsmalloc supports compaction, zram can use it.  For the first
step, this patch exports compact knob via sysfs so user can do compaction
via "echo 1 > /sys/block/zram0/compact".

Bug: 25951511

Change-Id: Id197e61879e41accb159f7b8ba037629eb4aa579
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:52 +05:30
Minchan Kim
bcddd9e649 UPSTREAM: zsmalloc: adjust ZS_ALMOST_FULL
(cherry-pick from commit d3d07c92ff69f784bb8c3279fa87678bfa2f7f6f)

Curretly, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if the zspage has
under 1/4 used objects(ie, fullness_threshold_frac).  It could make result
in loose packing since zsmalloc migrates only ZS_ALMOST_EMPTY zspage out.

This patch changes the rule so that zsmalloc makes zspage which has above
3/4 used object ZS_ALMOST_FULL so it could make tight packing.

Bug: 25951511

Change-Id: I9283cd6e8ce9916ea7213b724946664e2a6f32cb
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:52 +05:30
Minchan Kim
50767af2e9 UPSTREAM: zsmalloc: support compaction
(cherry-pick from commit 312fcae227037619dc858c9ccd362c7b847730a2)

This patch provides core functions for migration of zsmalloc.  Migraion
policy is simple as follows.

for each size class {
        while {
                src_page = get zs_page from ZS_ALMOST_EMPTY
                if (!src_page)
                        break;
                dst_page = get zs_page from ZS_ALMOST_FULL
                if (!dst_page)
                        dst_page = get zs_page from ZS_ALMOST_EMPTY
                if (!dst_page)
                        break;
                migrate(from src_page, to dst_page);
        }
}

For migration, we need to identify which objects in zspage are allocated
to migrate them out.  We could know it by iterating of freed objects in a
zspage because first_page of zspage keeps free objects singly-linked list
but it's not efficient.  Instead, this patch adds a tag(ie,
OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check
whether the object is allocated easily.

This patch adds another status bit in handle to synchronize between user
access through zs_map_object and migration.  During migration, we cannot
move objects user are using due to data coherency between old object and
new object.

Bug: 25951511

Change-Id: Ideb5295570cc1f6c4fcb18a8f8609c63a38c86e4
[akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:52 +05:30
Minchan Kim
fdc42f6133 UPSTREAM: zsmalloc: factor out obj_[malloc|free]
(cherry-pick from commit c78062612fb525430b775a0bef4d3cc07e512da0)

In later patch, migration needs some part of functions in zs_malloc and
zs_free so this patch factor out them.

Bug: 25951511

Change-Id: I6079cbc1d3d107bc39f9dbb3412d9eb9039875ad
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:51 +05:30
Minchan Kim
f22049e8eb UPSTREAM: zsmalloc: decouple handle and object
(cherry-pick from commit 2e40e163a25af3bd35d128d3e2e005916de5cce6)

Recently, we started to use zram heavily and some of issues
popped.

1) external fragmentation

I got a report from Juneho Choi that fork failed although there are plenty
of free pages in the system.  His investigation revealed zram is one of
the culprit to make heavy fragmentation so there was no more contiguous
16K page for pgd to fork in the ARM.

2) non-movable pages

Other problem of zram now is that inherently, user want to use zram as
swap in small memory system so they use zRAM with CMA to use memory
efficiently.  However, unfortunately, it doesn't work well because zRAM
cannot use CMA's movable pages unless it doesn't support compaction.  I
got several reports about that OOM happened with zram although there are
lots of swap space and free space in CMA area.

3) internal fragmentation

zRAM has started support memory limitation feature to limit memory usage
and I sent a patchset(https://lkml.org/lkml/2014/9/21/148) for VM to be
harmonized with zram-swap to stop anonymous page reclaim if zram consumed
memory up to the limit although there are free space on the swap.  One
problem for that direction is zram has no way to know any hole in memory
space zsmalloc allocated by internal fragmentation so zram would regard
swap is full although there are free space in zsmalloc.  For solving the
issue, zram want to trigger compaction of zsmalloc before it decides full
or not.

This patchset is first step to support above issues.  For that, it adds
indirect layer between handle and object location and supports manual
compaction to solve 3th problem first of all.

After this patchset got merged, next step is to make VM aware of zsmalloc
compaction so that generic compaction will move zsmalloced-pages
automatically in runtime.

In my imaginary experiment(ie, high compress ratio data with heavy swap
in/out on 8G zram-swap), data is as follows,

Before =
zram allocated object :      60212066 bytes
zram total used:     140103680 bytes
ratio:         42.98 percent
MemFree:          840192 kB

Compaction

After =
frag ratio after compaction
zram allocated object :      60212066 bytes
zram total used:      76185600 bytes
ratio:         79.03 percent
MemFree:          901932 kB

Juneho reported below in his real platform with small aging.
So, I think the benefit would be bigger in real aging system
for a long time.

- frag_ratio increased 3% (ie, higher is better)
- memfree increased about 6MB
- In buddy info, Normal 2^3: 4, 2^2: 1: 2^1 increased, Highmem: 2^1 21 increased

frag ratio after swap fragment
used :        156677 kbytes
total:        166092 kbytes
frag_ratio :  94
meminfo before compaction
MemFree:           83724 kB
Node 0, zone   Normal  13642   1364     57     10     61     17      9      5      4      0      0
Node 0, zone  HighMem    425     29      1      0      0      0      0      0      0      0      0

num_migrated :  23630
compaction done

frag ratio after compaction
used :        156673 kbytes
total:        160564 kbytes
frag_ratio :  97
meminfo after compaction
MemFree:           89060 kB
Node 0, zone   Normal  14076   1544     67     14     61     17      9      5      4      0      0
Node 0, zone  HighMem    863     50      1      0      0      0      0      0      0      0      0

This patchset adds more logics(about 480 lines) in zsmalloc but when I
tested heavy swapin/out program, the regression for swapin/out speed is
marginal because most of overheads were caused by compress/decompress and
other MM reclaim stuff.

This patch (of 7):

Currently, handle of zsmalloc encodes object's location directly so it
makes support of migration hard.

This patch decouples handle and object via adding indirect layer.  For
that, it allocates handle dynamically and returns it to user.  The handle
is the address allocated by slab allocation so it's unique and we could
keep object's location in the memory space allocated for handle.

With it, we can change object's position without changing handle itself.

Bug: 25951511

Change-Id: Id50a98341f63c4e1bb39589ca992661486469dca
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Gunho Lee <gunho.lee@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:51 +05:30
Joonsoo Kim
4230a7b840 UPSTREAM: zram: use proper type to update max_used_pages
(cherry-pick from commit 2ea55a2caee016daee511431217861fb04767b27)

max_used_pages is defined as atomic_long_t so we need to use unsigned
long to keep temporary value for it rather than int which is smaller
than unsigned long in a 64 bit system.

Bug: 25951511

Change-Id: Iacae8b3edd074247d8416db9e47b5f3b4d80fb9f
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:51 +05:30
Ganesh Mahendran
9b71df804c BACKPORT: mm/zsmalloc: add statistics support
(cherry-pick from commit 0f050d997e275cf0e47ddc7006284eaa3c6fe049)

Keeping fragmentation of zsmalloc in a low level is our target.  But now
we still need to add the debug code in zsmalloc to get the quantitative
data.

This patch adds a new configuration CONFIG_ZSMALLOC_STAT to enable the
statistics collection for developers.  Currently only the objects
statatitics in each class are collected.  User can get the information via
debugfs.

     cat /sys/kernel/debug/zsmalloc/zram0/...

For example:

After I copied "jdk-8u25-linux-x64.tar.gz" to zram with ext4 filesystem:
 class  size obj_allocated   obj_used pages_used
     0    32             0          0          0
     1    48           256         12          3
     2    64            64         14          1
     3    80            51          7          1
     4    96           128          5          3
     5   112            73          5          2
     6   128            32          4          1
     7   144             0          0          0
     8   160             0          0          0
     9   176             0          0          0
    10   192             0          0          0
    11   208             0          0          0
    12   224             0          0          0
    13   240             0          0          0
    14   256            16          1          1
    15   272            15          9          1
    16   288             0          0          0
    17   304             0          0          0
    18   320             0          0          0
    19   336             0          0          0
    20   352             0          0          0
    21   368             0          0          0
    22   384             0          0          0
    23   400             0          0          0
    24   416             0          0          0
    25   432             0          0          0
    26   448             0          0          0
    27   464             0          0          0
    28   480             0          0          0
    29   496            33          1          4
    30   512             0          0          0
    31   528             0          0          0
    32   544             0          0          0
    33   560             0          0          0
    34   576             0          0          0
    35   592             0          0          0
    36   608             0          0          0
    37   624             0          0          0
    38   640             0          0          0
    40   672             0          0          0
    42   704             0          0          0
    43   720            17          1          3
    44   736             0          0          0
    46   768             0          0          0
    49   816             0          0          0
    51   848             0          0          0
    52   864            14          1          3
    54   896             0          0          0
    57   944            13          1          3
    58   960             0          0          0
    62  1024             4          1          1
    66  1088            15          2          4
    67  1104             0          0          0
    71  1168             0          0          0
    74  1216             0          0          0
    76  1248             0          0          0
    83  1360             3          1          1
    91  1488            11          1          4
    94  1536             0          0          0
   100  1632             5          1          2
   107  1744             0          0          0
   111  1808             9          1          4
   126  2048             4          4          2
   144  2336             7          3          4
   151  2448             0          0          0
   168  2720            15         15         10
   190  3072            28         27         21
   202  3264             0          0          0
   254  4096         36209      36209      36209

 Total               37022      36326      36288

We can calculate the overall fragentation by the last line:
    Total               37022      36326      36288
    (37022 - 36326) / 37022 = 1.87%

Also by analysing objects alocated in every class we know why we got so
low fragmentation: Most of the allocated objects is in <class 254>.  And
there is only 1 page in class 254 zspage.  So, No fragmentation will be
introduced by allocating objs in class 254.

And in future, we can collect other zsmalloc statistics as we need and
analyse them.

Bug: 25951511

Change-Id: I117604ad5fec8be5c10810ea93049140fd51f80b
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:51 +05:30
Ganesh Mahendran
45797ad2eb BACKPORT: mm/zpool: add name argument to create zpool
(cherry-pick from commit 3eba0c6a56c04f2b017b43641a821f1ebfb7fb4c)

Currently the underlay of zpool: zsmalloc/zbud, do not know who creates
them.  There is not a method to let zsmalloc/zbud find which caller they
belong to.

Now we want to add statistics collection in zsmalloc.  We need to name the
debugfs dir for each pool created.  The way suggested by Minchan Kim is to
use a name passed by caller(such as zram) to create the zsmalloc pool.

    /sys/kernel/debug/zsmalloc/zram0

This patch adds an argument `name' to zs_create_pool() and other related
functions.

Bug: 25951511

Change-Id: Ib71e8e63c71e808795073bd08c0aab14b43b4c35
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	drivers/block/zram/zram_drv.c
2016-05-18 14:35:51 +05:30
Sergey Senozhatsky
a43c5c80c2 UPSTREAM: zram: remove request_queue from struct zram
(cherry-pick from commit ee98016010ae036a5b27300d83bd99ef3fd5776e)

`struct zram' contains both `struct gendisk' and `struct request_queue'.
the latter can be deleted, because zram->disk carries ->queue pointer, and
->queue carries zram pointer:

create_device()
	zram->queue->queuedata = zram
	zram->disk->queue = zram->queue
	zram->disk->private_data = zram

so zram->queue is not needed, we can access all necessary data anyway.

Bug: 25951511

Change-Id: If5bcc26c6b1ec69843afdadf745acaa506efc5e6
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	drivers/block/zram/zram_drv.c
2016-05-18 14:35:50 +05:30
Minchan Kim
6ad88f32b1 BACKPORT: zram: remove init_lock in zram_make_request
(cherry-pick from commit 08eee69fcf6baea543a2b4d2a2fcba0e61aa3160)

Admin could reset zram during I/O operation going on so we have used
zram->init_lock as read-side lock in I/O path to prevent sudden zram
meta freeing.

However, the init_lock is really troublesome.  We can't do call
zram_meta_alloc under init_lock due to lockdep splat because
zram_rw_page is one of the function under reclaim path and hold it as
read_lock while other places in process context hold it as write_lock.
So, we have used allocation out of the lock to avoid lockdep warn but
it's not good for readability and fainally, I met another lockdep splat
between init_lock and cpu_hotplug from kmem_cache_destroy during working
zsmalloc compaction.  :(

Yes, the ideal is to remove horrible init_lock of zram in rw path.  This
patch removes it in rw path and instead, add atomic refcount for meta
lifetime management and completion to free meta in process context.
It's important to free meta in process context because some of resource
destruction needs mutex lock, which could be held if we releases the
resource in reclaim context so it's deadlock, again.

As a bonus, we could remove init_done check in rw path because
zram_meta_get will do a role for it, instead.

Bug: 25951511

Change-Id: I6a9ec171f003eeb31f76dd08c70d4c84a3c5f7a3
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	drivers/block/zram/zram_drv.c
	drivers/block/zram/zram_drv.h
2016-05-18 14:35:50 +05:30
Minchan Kim
1233c44f3f UPSTREAM: zram: check bd_openers instead of bd_holders
(cherry-pick from commit 2b269ce6fcbfafc6cae37254cab4bf2309bfed0e)

bd_holders is increased only when user open the device file as FMODE_EXCL
so if something opens zram0 as !FMODE_EXCL and request I/O while another
user reset zram0, we can see following warning.

  zram0: detected capacity change from 0 to 64424509440
  Buffer I/O error on dev zram0, logical block 180823, lost async page write
  Buffer I/O error on dev zram0, logical block 180824, lost async page write
  Buffer I/O error on dev zram0, logical block 180825, lost async page write
  Buffer I/O error on dev zram0, logical block 180826, lost async page write
  Buffer I/O error on dev zram0, logical block 180827, lost async page write
  Buffer I/O error on dev zram0, logical block 180828, lost async page write
  Buffer I/O error on dev zram0, logical block 180829, lost async page write
  Buffer I/O error on dev zram0, logical block 180830, lost async page write
  Buffer I/O error on dev zram0, logical block 180831, lost async page write
  Buffer I/O error on dev zram0, logical block 180832, lost async page write
  ------------[ cut here ]------------
  WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210()
  Modules linked in:
  CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x45/0x57
    warn_slowpath_common+0x8a/0xc0
    warn_slowpath_null+0x1a/0x20
    __blkdev_put+0x1d7/0x210
    blkdev_put+0x50/0x130
    blkdev_close+0x25/0x30
    __fput+0xdf/0x1e0
    ____fput+0xe/0x10
    task_work_run+0xa7/0xe0
    do_notify_resume+0x49/0x60
    int_signal+0x12/0x17
  ---[ end trace 274fbbc5664827d2 ]---

The warning comes from bdev_write_node in blkdev_put path.

   static void bdev_write_inode(struct inode *inode)
   {
        spin_lock(&inode->i_lock);
        while (inode->i_state & I_DIRTY) {
                spin_unlock(&inode->i_lock);
                WARN_ON_ONCE(write_inode_now(inode, true)); <========= here.
                spin_lock(&inode->i_lock);
        }
        spin_unlock(&inode->i_lock);
   }

The reason is dd process encounters I/O fails due to sudden block device
disappear so in filemap_check_errors in __writeback_single_inode returns
-EIO.

If we check bd_openers instead of bd_holders, we could address the
problem.  When I see the brd, it already have used it rather than
bd_holders so although I'm not a expert of block layer, it seems to be
better.

I can make following warning with below simple script.  In addition, I
added msleep(2000) below set_capacity(zram->disk, 0) after applying your
patch to make window huge(Kudos to Ganesh!)

script:

   echo $((60<<30)) > /sys/block/zram0/disksize
   setsid dd if=/dev/zero of=/dev/zram0 &
   sleep 1
   setsid echo 1 > /sys/block/zram0/reset

Bug: 25951511

Change-Id: Ic938f9b091525ff4db68035d0e1b95332837cd31
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:50 +05:30
Sergey Senozhatsky
ac694eb8d6 UPSTREAM: zram: rework reset and destroy path
(cherry-pick from commit a096cafc31862c54da0b56c8441dc14023437008)

We need to return set_capacity(disk, 0) from reset_store() back to
zram_reset_device(), a catch by Ganesh Mahendran.  Potentially, we can
race set_capacity() calls from init and reset paths.

The problem is that zram_reset_device() is also getting called from
zram_exit(), which performs operations in misleading reversed order -- we
first create_device() and then init it, while zram_exit() perform
destroy_device() first and then does zram_reset_device().  This is done to
remove sysfs group before we reset device, so we can continue with device
reset/destruction not being raced by sysfs attr write (f.e.  disksize).

Apart from that, destroy_device() releases zram->disk (but we still have
->disk pointer), so we cannot acces zram->disk in later
zram_reset_device() call, which may cause additional errors in the future.

So, this patch rework and cleanup destroy path.

1) remove several unneeded goto labels in zram_init()

2) factor out zram_init() error path and zram_exit() into
   destroy_devices() function, which takes the number of devices to
   destroy as its argument.

3) remove sysfs group in destroy_devices() first, so we can reorder
   operations -- reset device (as expected) goes before disk destroy and
   queue cleanup.  So we can always access ->disk in zram_reset_device().

4) and, finally, return set_capacity() back under ->init_lock.

Bug: 25951511

Change-Id: I35c6555b3812249ebe1f03b281055e82a15a6ae1
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	drivers/block/zram/zram_drv.c
2016-05-18 14:35:50 +05:30
Sergey Senozhatsky
7d99dc0502 UPSTREAM: zram: fix umount-reset_store-mount race condition
(cherry-pick from commit ba6b17d68c8e3aa8d55d0474299cb931965c5ea5)

Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex to
avoid ->bd_holders race condition:

        CPU0                            CPU1
umount /* zram->init_done is true */
reset_store()
bdev->bd_holders == 0                   mount
...                                     zram_make_request()
zram_reset_device()

However, his solution required some considerable amount of code movement,
which we can avoid.

Apart from using bdev->bd_mutex in reset_store(), this patch also
simplifies zram_reset_device().

zram_reset_device() has a bool parameter reset_capacity which tells it
whether disk capacity and itself disk should be reset.  There are two
zram_reset_device() callers:

-- zram_exit() passes reset_capacity=false
-- reset_store() passes reset_capacity=true

So we can move reset_capacity-sensitive work out of zram_reset_device()
and perform it unconditionally in reset_store().  This also lets us drop
reset_capacity parameter from zram_reset_device() and pass zram pointer
only.

Bug: 25951511

Change-Id: I009e5303f133d60fb022fef9397863f823bd1e33
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:49 +05:30
Ganesh Mahendran
8459eaff31 UPSTREAM: zram: free meta table in zram_meta_free
(cherry-pick from commit 1fec117281d9f5349c35279c9521f4096fa33357)

zram_meta_alloc() and zram_meta_free() are a pair.  In
zram_meta_alloc(), meta table is allocated.  So it it better to free it
in zram_meta_free().

Bug: 25951511

Change-Id: Idfc0f20c7b0756318d884bcebd2decc66371ed64
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	drivers/block/zram/zram_drv.c
2016-05-18 14:35:49 +05:30
Sergey Senozhatsky
96f9ab5003 UPSTREAM: zram: clean up zram_meta_alloc()
(cherry-pick from commit b8179958327a1f513efca095ba782a1986c7c4fb)

A trivial cleanup of zram_meta_alloc() error handling.

Bug: 25951511

Change-Id: I73b05e63dc4580191fc50857af0e4f37b56c0f05
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:49 +05:30
Ganesh Mahendran
bc7a7b9717 BACKPORT: mm/zsmalloc: adjust order of functions
(cherry-pick from commit 66cdef663cd7a97aff6bbbf41a81a0205dc81ba2)

Currently functions in zsmalloc.c does not arranged in a readable and
reasonable sequence.  With the more and more functions added, we may
meet below inconvenience.  For example:

Current functions:

    void zs_init()
    {
    }

    static void get_maxobj_per_zspage()
    {
    }

Then I want to add a func_1() which is called from zs_init(), and this
new added function func_1() will used get_maxobj_per_zspage() which is
defined below zs_init().

    void func_1()
    {
        get_maxobj_per_zspage()
    }

    void zs_init()
    {
        func_1()
    }

    static void get_maxobj_per_zspage()
    {
    }

This will cause compiling issue. So we must add a declaration:

    static void get_maxobj_per_zspage();

before func_1() if we do not put get_maxobj_per_zspage() before
func_1().

In addition, puting module_[init|exit] functions at the bottom of the
file conforms to our habit.

So, this patch ajusts function sequence as:

    /* helper functions */
    ...
    obj_location_to_handle()
    ...

    /* Some exported functions */
    ...

    zs_map_object()
    zs_unmap_object()

    zs_malloc()
    zs_free()

    zs_init()
    zs_exit()

Bug: 25951511

Change-Id: I68377a213ade041b34e99a4280ebd57a933dfa83
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:49 +05:30
Ganesh Mahendran
caeae3dbac UPSTREAM: mm/zsmalloc: allocate exactly size of struct zs_pool
(cherry-pick from commit 181366561ac1e1a7bc3b91dbe45e7614a2f758b9)

In zs_create_pool(), we allocate memory more then sizeof(struct zs_pool)
  ovhd_size = roundup(sizeof(*pool), PAGE_SIZE);

This patch allocate memory of exactly needed size.

Bug: 25951511

Change-Id: Iae1216f7ab304164ea7d687e0037574d6738c94e
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:48 +05:30
Ganesh Mahendran
0349162dc3 UPSTREAM: mm/zsmalloc: avoid duplicate assignment of prev_class
(cherry-pick from commit df8b5bb998f10cfc040ad30300f9a9ea4592ff82)

In zs_create_pool(), prev_class is assigned (ZS_SIZE_CLASSES - 1) times.
And the prev_class only references to the previous size_class.  So we do
not need unnecessary assignement.

This patch assigns *prev_class* when a new size_class structure is
allocated and uses prev_class to check whether the first class has been
allocated.

Bug: 25951511

Change-Id: Ie5e4be867976af0e9ce786a58d1ee0147b7fb0ad
[akpm@linux-foundation.org: remove now-unused ZS_SIZE_CLASSES]
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:48 +05:30
Mahendran Ganesh
7c6112c356 UPSTREAM: mm/zram: correct ZRAM_ZERO flag bit position
(cherry-pick from commit d49b1c254c997195872a9e8913660a788298921e)

In struct zram_table_entry, the element *value* contains obj size and obj
zram flags.  Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent obj size, and
bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent obj
zram_flags.  So the first zram flag(ZRAM_ZERO) should be from
ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1).

This patch fixes this cosmetic issue.

Also fix a typo, "page in now accessed" -> "page is now accessed"

Bug: 25951511

Change-Id: I3e64045aa288d1e0bb7c1f5b4d4c5ac30d887b89
Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:48 +05:30
Mahendran Ganesh
3685436a0c UPSTREAM: mm/zsmalloc: support allocating obj with size of ZS_MAX_ALLOC_SIZE
(cherry-pick from commit 40f9fb8cffc6a20ae269e3b43dfba7a4f65d7f50)

I sent a patch [1] for unnecessary check in zsmalloc.  And Minchan Kim
found zsmalloc even does not support allocating an obj with the size of
ZS_MAX_ALLOC_SIZE in some situations.

For example:
   In system with 64KB PAGE_SIZE and 32 bit of physical addr. Then:
   ZS_MIN_ALLOC_SIZE is 32 bytes which is calculated by:
      MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
   ZS_MAX_ALLOC_SIZE is 64KB(in current code, is PAGE_SIZE)
   ZS_SIZE_CLASS_DELTA is 256 bytes
   So, ZS_SIZE_CLASSES = (ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) /
                          ZS_SIZE_CLASS_DELTA + 1
                       = 256

   In zs_create_pool(), the max size obj which can be allocated will be:
      ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA = 32 + 255*256 = 65312

   We can see that 65312 < 65536 (ZS_MAX_ALLOC_SIZE). So we can NOT
   allocate objs with size ZS_MAX_ALLOC_SIZE(65536) which we promise upper
   users we can do.

 [1]  http://lkml.iu.edu/hypermail/linux/kernel/1411.2/03835.html
 [2]  http://lkml.iu.edu/hypermail/linux/kernel/1411.2/04534.html

This patch fixes this issue by dynamiclly calculating zs_size_classes when
module is loaded, allocates buffer with size ZS_MAX_ALLOC_SIZE.  Then the
max obj(size is ZS_MAX_ALLOC_SIZE) can be stored in it.

Bug: 25951511

Change-Id: Ia35e3456e94ebaf14c65a13dde8b471ebe1095ab
[akpm@linux-foundation.org: restore ZS_SIZE_CLASSES to fix bisectability]
Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:48 +05:30
Minchan Kim
7062763e70 UPSTREAM: zsmalloc: correct fragile [kmap|kunmap]_atomic use
(cherry-pick from commit af4ee5e977acb150371c28bd85cb7e34cac48b13)

The kunmap_atomic should use virtual address getting by kmap_atomic.
However, some pieces of code in zsmalloc uses modified address, not the
one got by kmap_atomic for kunmap_atomic.

It's okay for working because zsmalloc modifies the address inner
PAGE_SIZE bounday so it works with current kmap_atomic's implementation.
But it's still fragile with potential changing of kmap_atomic so let's
correct it.

I got a subtle bug when I implemented a new feature of zsmalloc
(compaction) due to a link's mishandling (the link was over page
boundary).  Although it was totally my mistake, it took a while to find
the cause because an unpredictable kmapped address was unmapped causing an
almost random crash.

Bug: 25951511

Change-Id: I9337684d102af93ec600077bf4c9658a942c8d09
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:47 +05:30
Sergey Senozhatsky
2f57d4ef53 UPSTREAM: zsmalloc: fix zs_init cpu notifier error handling
(cherry-pick from commit b1b00a5b8a6cf32e3973507decf1216709b55072)

Mahendran Ganesh reported that zpool-enabled zsmalloc should not call
zpool_unregister_driver() from zs_init() if cpu notifier registration has
failed, because error handling is performed before we register the driver
via zpool_register_driver() call.

Factor out cpu notifier registration and unregistration code and fix
zs_init() error handling.

Bug: 25951511

Change-Id: I9311d16de84accd9c5d3f2a333b30fe189a37222
link: http://lkml.iu.edu//hypermail/linux/kernel/1411.1/04156.html
[akpm@linux-foundation.org: squash bogus gcc warning]
[akpm@linux-foundation.org: use __init and __exit]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:47 +05:30
karam.lee
3d9892d18f BACKPORT: zram: change parameter from vaild_io_request()
(cherry-pick from commit 54850e73e86e3bc092680d1bdb84eb322f982ab1)

This patch changes parameter of valid_io_request for common usage.  The
purpose of valid_io_request() is to determine if bio request is valid or
not.

This patch use I/O start address and size instead of a BIO parameter for
common usage.

Bug: 25951511

Change-Id: I72ddd150a7cefb7f4cf33682431e284bd86c4128
Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:47 +05:30
karam.lee
b09123e63d BACKPORT: zram: remove bio parameter from zram_bvec_rw()
(cherry-pick from commit b627cff3d308d3ccb3ec73a89260f5c7872e46a4)

Recently rw_page block device operation has been added.  This patchset
implements rw_page operation for zram block device and does some clean-up.

This patch (of 3):

Remove an unnecessary parameter(bio) from zram_bvec_rw() and
zram_bvec_read().  zram_bvec_read() doesn't use a bio parameter, so remove
it.  zram_bvec_rw() calls a read/write operation not using bio, so a rw
parameter replaces a bio parameter.

Bug: 25951511

Change-Id: I6aea368206cad8032a2deff25d2638df871b7629
Signed-off-by: karam.lee <karam.lee@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:47 +05:30
Joonsoo Kim
0bd808f18f UPSTREAM: zsmalloc: merge size_class to reduce fragmentation
(cherry-pick from commit 9eec4cd53f9865b733dc78cf5f6465871beed014)

zsmalloc has many size_classes to reduce fragmentation and they are in 16
bytes unit, for example, 16, 32, 48, etc., if PAGE_SIZE is 4096.  And,
zsmalloc has constraint that each zspage has 4 pages at maximum.

In this situation, we can see interesting aspect.  Let's think about
size_class for 1488, 1472, ..., 1376.  To prevent external fragmentation,
they uses 4 pages per zspage and so all they can contain 11 objects at
maximum.

16384 (4096 * 4) = 1488 * 11 + remains
16384 (4096 * 4) = 1472 * 11 + remains
16384 (4096 * 4) = ...
16384 (4096 * 4) = 1376 * 11 + remains

It means that they have same characteristics and classification between
them isn't needed.  If we use one size_class for them, we can reduce
fragementation and save some memory since both the 1488 and 1472 sized
classes can only fit 11 objects into 4 pages, and an object that's 1472
bytes can fit into an object that's 1488 bytes, merging these classes to
always use objects that are 1488 bytes will reduce the total number of
size classes.  And reducing the total number of size classes reduces
overall fragmentation, because a wider range of compressed pages can fit
into a single size class, leaving less unused objects in each size class.

For this purpose, this patch implement size_class merging.  If there is
size_class that have same pages_per_zspage and same number of objects per
zspage with previous size_class, we don't create new size_class.  Instead,
we use previous, same characteristic size_class.  With this way, above
example sizes (1488, 1472, ..., 1376) use just one size_class so we can
get much more memory utilization.

Below is result of my simple test.

TEST ENV: EXT4 on zram, mount with discard option WORKLOAD: untar kernel
source code, remove directory in descending order in size.  (drivers arch
fs sound include net Documentation firmware kernel tools)

Each line represents orig_data_size, compr_data_size, mem_used_total,
fragmentation overhead (mem_used - compr_data_size) and overhead ratio
(overhead to compr_data_size), respectively, after untar and remove
operation is executed.

* untar-nomerge.out

orig_size compr_size used_size overhead overhead_ratio
525.88MB 199.16MB 210.23MB  11.08MB 5.56%
288.32MB  97.43MB 105.63MB   8.20MB 8.41%
177.32MB  61.12MB  69.40MB   8.28MB 13.55%
146.47MB  47.32MB  56.10MB   8.78MB 18.55%
124.16MB  38.85MB  48.41MB   9.55MB 24.58%
103.93MB  31.68MB  40.93MB   9.25MB 29.21%
 84.34MB  22.86MB  32.72MB   9.86MB 43.13%
 66.87MB  14.83MB  23.83MB   9.00MB 60.70%
 60.67MB  11.11MB  18.60MB   7.49MB 67.48%
 55.86MB   8.83MB  16.61MB   7.77MB 88.03%
 53.32MB   8.01MB  15.32MB   7.31MB 91.24%

* untar-merge.out

orig_size compr_size used_size overhead overhead_ratio
526.23MB 199.18MB 209.81MB  10.64MB 5.34%
288.68MB  97.45MB 104.08MB   6.63MB 6.80%
177.68MB  61.14MB  66.93MB   5.79MB 9.47%
146.83MB  47.34MB  52.79MB   5.45MB 11.51%
124.52MB  38.87MB  44.30MB   5.43MB 13.96%
104.29MB  31.70MB  36.83MB   5.13MB 16.19%
 84.70MB  22.88MB  27.92MB   5.04MB 22.04%
 67.11MB  14.83MB  19.26MB   4.43MB 29.86%
 60.82MB  11.10MB  14.90MB   3.79MB 34.17%
 55.90MB   8.82MB  12.61MB   3.79MB 42.97%
 53.32MB   8.01MB  11.73MB   3.73MB 46.53%

As you can see above result, merged one has better utilization (overhead
ratio, 5th column) and uses less memory (mem_used_total, 3rd column).

Bug: 25951511

Change-Id: I00825d2b8de666abb7a0d8b47348b89e8af80571
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: <juno.choi@lge.com>
Cc: "seungho1.park" <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:46 +05:30
Weijie Yang
8fdb993dee BACKPORT: zram: avoid kunmap_atomic() of a NULL pointer
(cherry-pick from commit c406515239376fc93a30d5d03192182160cbd3fb)

zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io.  The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.

This patch fixes this issue.

Bug: 25951511

Change-Id: Ic3737ccd35c6da56fe23c40f186eda728058278f
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang.kh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:46 +05:30
Weijie Yang
1ac6be903a UPSTREAM: zram: avoid NULL pointer access in concurrent situation
(cherry-pick from commit 5a99e95b8d1cd47f6feddcdca6c71d22060df8a2)

There is a rare NULL pointer bug in mem_used_total_show() and
mem_used_max_store() in concurrent situation, like this:

zram is not initialized, process A is a mem_used_total reader which runs
periodically, while process B try to init zram.

	process A 				process B
  access meta, get a NULL value
						init zram, done
  init_done() is true
  access meta->mem_pool, get a NULL pointer BUG

This patch fixes this issue.

Bug: 25951511

Change-Id: I972d9d7a84e32f791685d55d90ce18ce02e5183a
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:46 +05:30
Dan Streetman
40ea37de5f UPSTREAM: zsmalloc: simplify init_zspage free obj linking
(cherry-pick from commit 5538c562377580947916b3366898f1eb5f53768e)

Change zsmalloc init_zspage() logic to iterate through each object on each
of its pages, checking the offset to verify the object is on the current
page before linking it into the zspage.

The current zsmalloc init_zspage free object linking code has logic that
relies on there only being one page per zspage when PAGE_SIZE is a
multiple of class->size.  It calculates the number of objects for the
current page, and iterates through all of them plus one, to account for
the assumed partial object at the end of the page.  While this currently
works, the logic can be simplified to just link the object at each
successive offset until the offset is larger than PAGE_SIZE, which does
not rely on PAGE_SIZE being a multiple of class->size.

Bug: 25951511

Change-Id: I89e562a18b083f24f4697b4154d5b238becb36e6
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:27 +05:30
Wang Sheng-Hui
3ab34848e6 UPSTREAM: mm/zsmalloc.c: correct comment for fullness group computation
(cherry-pick from commit 6dd9737e31504f9377a8a19810ea4922e88516c1)

The letter 'f' in "n <= N/f" stands for fullness_threshold_frac, not
1/fullness_threshold_frac.

Bug: 25951511

Change-Id: I3d3f090fab39fca1011999ea12e9aab187504e39
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:27 +05:30
Sergey Senozhatsky
61eb88598b UPSTREAM: zram: use notify_free to account all free notifications
(cherry-pick from commit 015254daf1753003c19c46b90ee85a963260d270)

`notify_free' device attribute accounts the number of slot free
notifications and internally represents the number of zram_free_page()
calls.  Slot free notifications are sent only when device is used as a
swap device, hence `notify_free' is used only for swap devices.  Since
f4659d8e620d08 (zram: support REQ_DISCARD) ZRAM handles yet another one
free notification (also via zram_free_page() call) -- REQ_DISCARD
requests, which are sent by a filesystem, whenever some data blocks are
discarded.  However, there is no way to know the number of notifications
in the latter case.

Use `notify_free' to account the number of pages freed by
zram_bio_discard() and zram_slot_free_notify().  Depending on usage
scenario `notify_free' represents:

 a) the number of pages freed because of slot free notifications, which is
   equal to the number of swap_slot_free_notify() calls, so there is no
   behaviour change

 b) the number of pages freed because of REQ_DISCARD notifications

Bug: 25951511

Change-Id: Ib0f93a89c388de1f23ffecf38fdf4e7218d4c6dc
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:27 +05:30
Minchan Kim
b41b4e48f6 UPSTREAM: zram: report maximum used memory
(cherry-pick from commit 461a8eee6af3b55745be64bea403ed0b743563cf)

Normally, zram user could get maximum memory usage zram consumed via
polling mem_used_total with sysfs in userspace.

But it has a critical problem because user can miss peak memory usage
during update inverval of polling.  For avoiding that, user should poll it
with shorter interval(ie, 0.0000000001s) with mlocking to avoid page fault
delay when memory pressure is heavy.  It would be troublesome.

This patch adds new knob "mem_used_max" so user could see the maximum
memory usage easily via reading the knob and reset it via "echo 0 >
/sys/block/zram0/mem_used_max".

Bug: 25951511

Change-Id: I117b162ce92f1601b2ad2af86ab205c6c9ca6769
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Reviewed-by: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:26 +05:30
Minchan Kim
b693647fcd UPSTREAM: zram: zram memory size limitation
(cherry-pick from commit 9ada9da9573f3460b156b7755c093e30b258eacb)

Since zram has no control feature to limit memory usage, it makes hard to
manage system memrory.

This patch adds new knob "mem_limit" via sysfs to set up the a limit so
that zram could fail allocation once it reaches the limit.

In addition, user could change the limit in runtime so that he could
manage the memory more dynamically.

Initial state is no limit so it doesn't break old behavior.

Bug: 25951511

Change-Id: I306a9582a9273c521d90b607a3ba2b44860a6273
[akpm@linux-foundation.org: fix typo, per Sergey]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:26 +05:30
Minchan Kim
2e9865bcb3 UPSTREAM: zsmalloc: change return value unit of zs_get_total_size_bytes
(cherry-pick from commit 722cdc17232f0f684011407f7cf3c40d39457971)

zs_get_total_size_bytes returns a amount of memory zsmalloc consumed with
*byte unit* but zsmalloc operates *page unit* rather than byte unit so
let's change the API so benefit we could get is that reduce unnecessary
overhead (ie, change page unit with byte unit) in zsmalloc.

Since return type is pages, "zs_get_total_pages" is better than
"zs_get_total_size_bytes".

Bug: 25951511

Change-Id: I2cbd9426483ae31c846923594e2cc3a8028e6cc2
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-18 14:35:26 +05:30