Commit graph

313882 commits

Author SHA1 Message Date
Minchan Kim
213b53a48f zram: promote zram from staging
(cherry pick from commit cd67e10ac6)

Zram has lived in staging for a long, long time and has been
fixed and improved by many contributors, so the code is clean and stable now.
Of course, many products use zram in real practice.

The major TV companies have used zram as swap for two years, and
recently our production team released an Android smartphone that uses zram
as swap, too. Android KitKat has also started to use zram
for low-memory smartphones. There was a report that Google released
ChromeOS with zram, and CyanogenMod has used zram for a long
time. I have heard that some distros use a zram block device for tmpfs.
In addition, I have seen many reports from other people. For example,
Lubuntu has started to use it.

The benefit of zram is very clear. In my experience, one
benefit was removing jitter in a video application under background memory
pressure. That could be an effect of more efficient memory usage through
compression, but the bigger issue is whether the system has swap at all. Recent
mobile platforms use Java, so there are many anonymous pages. But
embedded systems are normally reluctant to use eMMC or an SD card as swap
because of wear-leveling and latency issues. Without
swap, we can't reclaim anonymous pages and could eventually
encounter an OOM kill. :(

Even when we have real storage as swap, there is a problem, too:
slow swap storage sometimes ends up making the system very unresponsive.

Quote from Luigi at Google:
 "Since Chrome OS was mentioned: the main reason why we don't use swap
  to a disk (rotating or SSD) is because it doesn't degrade gracefully
  and leads to a bad interactive experience.  Generally we prefer to
  manage RAM at a higher level, by transparently killing and restarting
  processes.  But we noticed that zram is fast enough to be competitive
  with the latter, and it lets us make more efficient use of the
  available RAM."  He announced this at:
http://www.spinics.net/lists/linux-mm/msg57717.html

Another use case is to use zram as a general block device. Zram is a block
device, so anyone can format it and mount it; some people on
the internet have started to use zram as /var/tmp.
http://forums.gentoo.org/viewtopic-t-838198-start-0.html

Let's promote zram and enhance/maintain it instead of removing it.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 24810447
Change-Id: Ie338290523af86fc4401a1560920de1b71100152
2018-01-01 21:26:45 +03:00
Jiang Liu
2be17ed8cc zram: kill unused zram_get_num_devices()
(cherry pick from commit 0f0e3ba346)

Now there's no caller of zram_get_num_devices(), so kill it.
And change zram_devices to static because it's only used in zram_drv.c.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: I3b72d6512bcbf5ba055b02fce68a833e28134ded
2018-01-01 21:26:44 +03:00
Jiang Liu
e0095c0a87 zram: destroy all devices on error recovery path in zram_init()
(cherry pick from commit 39a9b8ac93)

On the error recovery path of zram_init(), the zram device object that
caused the failure is leaked. So change create_device() to free allocated
resources on its error path.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: Ifb794f34d4e999ebd35b89aa2f3eeca8fd696ce8
2018-01-01 21:26:44 +03:00
Rashika Kheria
dfbe85d845 Staging: zram: Fix memory leak by refcount mismatch
(cherry pick from commit 1b672224d1)

As suggested by Minchan Kim and Jerome Marchand: "The code in reset_store
gets the block device (bdget_disk()) but it does not put it (bdput()) when
it's done using it. The usage count is therefore incremented but never
decremented."

This patch also calls bdput() in all error cases.

Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: I034c6cc59426fee5ee6f069ae9cf9ca5395202a2
2018-01-01 21:26:44 +03:00
Minchan Kim
a4457001c7 zram: fix invalid memory access
(cherry pick from commit 2b86ab9cc2)

[1] tried to fix an invalid memory access on zram->disk, but it didn't
fix it properly because get_disk failed during the module exit path.

Actually, we don't need to reset zram->disk's capacity to zero
in the module exit path, so this patch introduces a new argument,
"reset_capacity", on zram_reset_device and only resets the capacity when
reset_store is called.

[1] 6030ea9b,  zram: avoid invalid memory access in zram_exit()

Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: I87e35bc5af6f82a75a9cdc620a142f86e80f2884
2018-01-01 21:26:44 +03:00
Sergey Senozhatsky
b897f1bf54 staging: zram: protect zram_reset_device() call
(cherry pick from commit 644d478793)

Commit 9b3bb7abcd (zram: remove zram_sysfs file (v2)) accidentally made
zram_reset_device() racy. Protect the zram_reset_device() call with
zram->lock.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: I8d22088bf6a2f292038e4104b71d04eec3d2c60e
2018-01-01 21:26:44 +03:00
Sergey Senozhatsky
9bc40fb297 zram: remove zram_sysfs file (v2)
(cherry pick from commit 9b3bb7abcd)

Move the zram sysfs code into zram_drv and remove the zram_sysfs.c
file. This makes it possible to make static a number of previously
exported zram functions that were used from zram sysfs, e.g. the internal
zram_meta_alloc/free(). We can also drop the zram_drv wrapper
functions that were used from zram sysfs,
e.g. the zram_reset_device()/__zram_reset_device() pair.

v2: as suggested by Greg K-H, move the MODULE description to the
bottom of the file.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: Ic0038af190db194f7891cb27502a9b6d0d9e4042
2018-01-01 21:26:43 +03:00
Jiang Liu
89d6cc3000 zram: avoid access beyond the zram device
(cherry pick from commit 12a7ad3b81)

Function valid_io_request() should verify that the entire request is within
the zram device address range. Otherwise it may cause invalid memory
access when accessing/modifying zram->meta->table[index] because the
'index' is out of range. It may then access nonexistent memory or randomly
modify memory belonging to other subsystems, which is hard to track down.
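
The range check described above can be sketched as follows. This is a minimal illustration in Python, not the kernel code: the function name request_in_range and the 512-byte sector size are assumptions made for this sketch.

```python
SECTOR_SHIFT = 9  # assumption for this sketch: 512-byte sectors


def request_in_range(start_sector, size_bytes, disksize_bytes):
    """Return True only if the whole request lies inside the device.

    Checking the start sector alone is not enough: a request can begin
    inside the device and still run past its end.
    """
    end_sector = start_sector + (size_bytes >> SECTOR_SHIFT)
    bound = disksize_bytes >> SECTOR_SHIFT
    # C code would also have to guard against end_sector wrapping around;
    # Python integers do not overflow, so that check is omitted here.
    return start_sector < bound and end_sector <= bound
```

For a 1 MiB device (2048 sectors), a 4 KiB request starting at sector 2044 begins inside the device but ends beyond it, so the check rejects it.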

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: I648d6f874f49714a044f53ff111f5a22a38d38f8
2018-01-01 21:26:43 +03:00
Jiang Liu
5654a4b1cc zram: use atomic64_xxx() to replace zram_stat64_xxx()
(cherry pick from commit da5cc7d338)

Use atomic64_xxx() to replace the open-coded zram_stat64_xxx().
Some architectures have native support for atomic64 operations,
so we can get rid of the spin_lock() in zram_stat64_xxx().
On the other hand, on platforms that use the generic atomic64
implementation, it may cause an extra save/restore of the interrupt
flag.  So it's a tradeoff.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: I75d273bd6acc4cacbe55c2161144b656da2e421c
2018-01-01 21:26:43 +03:00
Nitin Gupta
641120b18a staging: zram: fix invalid memory references during disk write
(cherry pick from commit 397c60668a)

Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
 - Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
 - Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.

Fixes bug 50081:
https://bugzilla.kernel.org/show_bug.cgi?id=50081

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@vger.kernel.org>
Reported-by: Mihail Kasadjikov <hamer.mk@gmail.com>
Reported-by: Tomas M <tomas@slax.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 24810447
Change-Id: Ia6558c1f89896522679f99472d801b1060dc3628
2018-01-01 21:26:42 +03:00
Wanpeng Li
6ecde3ae59 zram: fix zram_bvec_read duplicate dump failure message and stat accumulation
When zram decompression fails, the code unnecessarily dumps failure messages and
does stat accumulation in function zram_bvec_read(). This work is already
done in function zram_decompress_page(), so the patch skips the redundant work.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:42 +03:00
Joe Perches
597bdce628 staging: Remove unnecessary OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.

Cherry-picked to zram only from:
78110bb staging: Remove unnecessary OOM messages

Change-Id: Ic2f60d58face1ba9331cb45317e1e7db30f90efe
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Artem Borisov <dedsa2002@gmail.com>
2018-01-01 21:26:42 +03:00
Fengguang Wu
9bc08fb8f5 staging: zram: __zram_reset_device() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:42 +03:00
Minchan Kim
02d9724c74 zram: get rid of lockdep warning
Lockdep complains about a recursive deadlock on zram->init_lock.
[1] made it a false positive because we can't issue IO to zram
before setting disksize. Anyway, we should shut lockdep up to
avoid many reports from users.

[1] : zram: force disksize setting before using zram

Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:41 +03:00
Minchan Kim
bd481c089c zram: fix warning of print format
The kbuild bot whinges about a print format mismatch caused by
"zram: force disksize setting before using zram".

This patch fixes it.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:41 +03:00
Minchan Kim
89428c94aa zram: give up lazy initialization of zram metadata
1) Users of zram normally run mkfs.xxx or mkswap before using
   the zram block device (e.g., normally at boot time).
   That ends up allocating zram's metadata before real usage, so
   the benefit of lazy initialization is diminished.

2) Some users want to use zram only when memory pressure is high (i.e., load
   zram dynamically, NOT at boot time). That makes sense because people don't
   want to waste memory until memory pressure is high (i.e., when zram is
   really helpful). In this case, lazy initialization could easily fail
   because we use GFP_NOIO instead of GFP_KERNEL to avoid deadlock.
   So the benefit of lazy initialization is diminished here, too.

3) Metadata overhead is not critical, and Nitin has a plan to reduce it.
   4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
   If an insane user uses up to 20 such big zram devices, it could consume 6%
   of RAM, but the efficiency of zram will cover the waste.

So this patch gives up lazy initialization; instead, we initialize metadata
at disksize setting time.
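
The overhead figures in point 3 can be checked with a short calculation. This is only a sketch of the arithmetic in Python; the constant and function names are made up for illustration, and the 12-byte entry size is the figure quoted in the message.

```python
PAGE_SIZE = 4096   # 4K pages, as in the message
ENTRY_BYTES = 12   # per-page metadata on a 64-bit machine, per the message

GIB = 1 << 30
MIB = 1 << 20


def metadata_bytes(disksize_bytes):
    """Metadata needed for a device of the given size: one entry per page."""
    return (disksize_bytes // PAGE_SIZE) * ENTRY_BYTES


def overhead_percent():
    """Metadata size relative to the device size, in percent."""
    return 100 * ENTRY_BYTES / PAGE_SIZE


print(metadata_bytes(64 * GIB) // MIB)   # 192
print(round(overhead_percent(), 2))      # 0.29
```

Twenty devices at roughly 0.3% each give the 6% quoted above, which assumes each device's disksize is about the size of RAM.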

Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:41 +03:00
Minchan Kim
93c0d12685 zram: force disksize setting before using zram
The zram documentation currently says "set disksize is optional",
but that is partly wrong. When you first use zram after
booting, you must set disksize; otherwise zram can't work because
the zram gendisk's size is 0. But once you have done it, you can use zram
freely after a reset because, paradoxically, reset doesn't reset the size to
zero. So in that case, setting disksize is optional. :(
This is inconsistent for users and not straightforward.

This patch forces disksize to always be set before using zram.
Yes, it changes current behavior, so someone could complain when
upgrading zram. That would be a problem if zram were in mainline,
but it still lives in staging, so behavior can be changed to do the right
thing. Please excuse the change.

Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:41 +03:00
Minchan Kim
a784abfab7 zram: Fix deadlock bug in partial read/write
Currently zram allocates a new page with GFP_KERNEL in the zram I/O path
if the I/O is partial. Unfortunately, it may cause a deadlock with
the reclaim path, as shown below.

write_page from fs
fs_lock
allocation(GFP_KERNEL)
reclaim
pageout
				write_page from fs
				fs_lock <-- deadlock

This patch fixes it by using GFP_NOIO.  In read path, we
reorganize code flow so that kmap_atomic is called after the
GFP_NOIO allocation.

Cc: stable@vger.kernel.org
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
[ penberg@kernel.org: don't use GFP_ATOMIC ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:40 +03:00
Artem Borisov
b94c2f784f staging: zram: make up-to-date with 0d145a5
Change-Id: I01e89c8c5f84506d887da9889794ef7844ce4b1e
2018-01-01 21:26:40 +03:00
Davidlohr Bueso
bfdd8b2831 staging: zram: drop zram_stat_dec/inc functions
It seems like overkill to have functions that add 1 to and subtract
1 from the 32-bit counters. Just do it directly.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:40 +03:00
Artem Borisov
328f00c45f staging: zram: make up-to-date with cad683f
Change-Id: Ife12f5f8105d3fb6cb7b0d11752839ace3165864
2018-01-01 21:26:40 +03:00
Davidlohr Bueso
acca7516db staging: zram: show correct disksize
The ->disksize variable stores values in units of bytes,
so print the correct size in Kb.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:39 +03:00
Davidlohr Bueso
08eb185b80 staging: zram: simplify num_devices paramater
Simplify the handling of num_devices when initializing zram.
Also clean up some of the output messages.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:39 +03:00
Nitin Gupta
5e6873d1c0 staging: zram: fix invalid memory references during disk write
Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
 - Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
 - Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.

Fixes bug 50081:
https://bugzilla.kernel.org/show_bug.cgi?id=50081

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@vger.kernel.org>
Reported-by: Mihail Kasadjikov <hamer.mk@gmail.com>
Reported-by: Tomas M <tomas@slax.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:39 +03:00
Masanari Iida
66fd5eec58 staging: Add angle bracket before and after the URL
Add missing angle bracket before and after the URL.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:39 +03:00
Sergey Senozhatsky
2835ff8ae7 staging: zram: handle mem suffixes in disk size zram_sysfs parameter
Use memparse() to allow mem suffixes in disksize sysfs number.
Examples:
    echo 256K > /sys/block/zram0/disksize
    echo 512M > /sys/block/zram0/disksize
    echo 1G > /sys/block/zram0/disksize
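
How such suffixes map to byte counts can be sketched as below. This is an illustrative Python approximation, not the kernel's memparse(), which is written in C and accepts more input forms; the name parse_size and the limitation to K, M, and G are assumptions made for this sketch.

```python
import re

# Binary shifts for the suffixes used in the examples above.
_SHIFT = {"": 0, "K": 10, "M": 20, "G": 30}


def parse_size(text):
    """Parse a decimal number with an optional K/M/G suffix into bytes."""
    match = re.fullmatch(r"\s*([0-9]+)([KMGkmg]?)\s*", text)
    if match is None:
        raise ValueError("unrecognized size: %r" % text)
    number, suffix = match.groups()
    return int(number) << _SHIFT[suffix.upper()]
```

Under these assumptions, parse_size("256K") gives 262144 and parse_size("1G") gives 1073741824. The trailing newline that echo appends is tolerated by the whitespace match.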

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:39 +03:00
Sergey Senozhatsky
a0df5fed8f staging: zram: factor-out zram_decompress_page() function
zram_bvec_read() shared decompression functionality with the
zram_read_before_write() function. Factor out a common zram_decompress_page()
function, which also simplifies error handling in zram_bvec_read().

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:38 +03:00
Nitin Gupta
5e2f2e854c staging: zram: Fix handling of incompressible pages
Change 130f315a (staging: zram: remove special handle of uncompressed page)
introduced a bug in the handling of incompressible pages which resulted in
memory allocation failure for such pages.

When a page expands on compression, say from 4K to 4K+30, we were trying to
do zsmalloc(pool, 4K+30). However, the maximum size which zsmalloc can
allocate is PAGE_SIZE (for obvious reasons), so such allocation requests
always return failure (0).

For a page that has compressed size larger than the original size (this may
happen with already compressed or random data), there is no point storing
the compressed version as that would take more space and would also require
time for decompression when needed again. So, the fix is to store any page,
whose compressed size exceeds a threshold (max_zpage_size), as-it-is i.e.
without compression.  Memory required for storing this uncompressed page can
then be requested from zsmalloc which supports PAGE_SIZE sized allocations.

Lastly, the fix checks that we do not attempt to "decompress" the page which
we stored in the uncompressed form -- we just memcpy() out such pages.

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: viechweg@gmail.com
Reported-by: paerley@gmail.com
Reported-by: wu.tommy@gmail.com
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:38 +03:00
Minchan Kim
3f53855a38 staging: zram: correct obsolete comment on max_zpage_size
Zram doesn't use xv_malloc any more, so it no longer has
the limitation related to zobj_header.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:37 +03:00
Minchan Kim
ca4909acf4 zsmalloc: move it under mm
(cherry pick from bcf1647d08)

This patch moves zsmalloc under mm directory.

First, this description explains why we have needed a custom
allocator.

Zsmalloc is a new slab-based memory allocator for storing compressed
pages.  It is designed for low fragmentation and a high allocation success
rate on large objects that are still <= PAGE_SIZE.

zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.

zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms.  Instead it allows multiple
single-order pages to be stitched together into a "zspage" which backs
the slab.  This allows for higher allocation success rate under memory
pressure.

Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel
slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE.  With the
kernel slab allocator, if a page compresses to 60% of its original size,
the memory savings gained through compression is lost in fragmentation
because another object of the same size can't be stored in the leftover
space.

This ability to span pages results in zsmalloc allocations not being
directly addressable by the user.  The user is given a
non-dereferenceable handle in response to an allocation request.  That
handle must be mapped, using zs_map_object(), which returns a pointer to
the mapped region that can be used.  The mapping is necessary since the
object data may reside in two different noncontiguous pages.
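
The fragmentation argument above can be made concrete with a small calculation. This Python sketch only counts how many objects fit; it ignores allocator metadata and zsmalloc's real policy for choosing zspage sizes, and the function names and the five-page zspage are hypothetical.

```python
PAGE_SIZE = 4096


def objects_without_spanning(obj_size, pages):
    """Objects that fit when no object may cross a page boundary."""
    return (PAGE_SIZE // obj_size) * pages


def objects_with_spanning(obj_size, pages):
    """Objects that fit when the pages are treated as one contiguous zspage."""
    return (PAGE_SIZE * pages) // obj_size


obj_size = 2458  # roughly 60% of a 4K page
print(objects_without_spanning(obj_size, 5))  # 5
print(objects_with_spanning(obj_size, 5))     # 8
```

With objects at about 60% of a page, five separate pages hold one object each, while the same five pages stitched together hold eight.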

Zsmalloc fulfills the allocation needs of zram perfectly.

[sjenning@linux.vnet.ibm.com: borrow Seth's quote]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 24810447
Change-Id: I7b7923baeb9989e002523c66696e4a98fb357c46
2018-01-01 21:26:37 +03:00
Arnd Bergmann
7f9570c3c2 staging/zsmalloc: don't use pgtable-mapping from modules
Building zsmalloc as a module does not work on ARM because it uses
an interface that is not exported:

ERROR: "flush_tlb_kernel_range" [drivers/staging/zsmalloc/zsmalloc.ko] undefined!

Since this is only used as a performance optimization and only on ARM,
we can avoid the problem simply by not using that optimization when
building zsmalloc as a loadable module.

flush_tlb_kernel_range is often an inline function, but out of the
architectures that use an extern function, only powerpc exports
it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:37 +03:00
Joerg Roedel
ea5c634f86 staging: zsmalloc: Fix link error on ARM
Testing the arm chromebook config against the upstream
kernel produces a linker error for the zsmalloc module from
staging. The symbol flush_tlb_kernel_range is not available
there. Fix this by removing the reimplementation of
unmap_kernel_range in the zsmalloc module and using the
function directly. The unmap_kernel_range function is not
usable by modules, so also disallow building the driver as a
module for now.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:37 +03:00
Seth Jennings
63c0c98cd8 staging: zsmalloc: remove unused pool name
zs_create_pool() currently takes a name argument which is
never used in any useful way.

This patch removes it.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:37 +03:00
Minchan Kim
b6207c5491 staging: zsmalloc: Fix TLB coherency and build problem
Recently, Matt Sealey reported that he failed to build zsmalloc because it
uses local_flush_tlb_kernel_range, which is an architecture-dependent
function that !CONFIG_SMP ARM doesn't implement, so it ends up
with the following build error.

  MODPOST 216 modules
  LZMA    arch/arm/boot/compressed/piggy.lzma
  AS      arch/arm/boot/compressed/lib1funcs.o
ERROR: "v7wbi_flush_kern_tlb_range"
[drivers/staging/zsmalloc/zsmalloc.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....

The reason we used that function, introduced by [1], is that the copy
method was really slow on ARM.

A more severe problem is that ARM can prefetch speculatively on other CPUs,
so other TLBs can still hold an entry behind our back if we flush only the
local CPU. Russell King pointed that out. Thanks!
We don't have many choices except using flush_tlb_kernel_range.

My experiment on a 4-core ARMv7 processor showed no difference with
zsmapbench[2] between local_flush_tlb_kernel_range and flush_tlb_kernel_range,
but page-table based mapping is still much better than copy-based.

* bigger is better.

1. local_flush_tlb_kernel_range: 3918795 mappings
2. flush_tlb_kernel_range : 3989538 mappings
3. copy-based: 635158 mappings

This patch replaces local_flush_tlb_kernel_range with
flush_tlb_kernel_range, which is available on all architectures
because it is already used in the vmalloc allocator, which is
generic, so the build problem should go away and the performance loss
should be negligible.

[1] f553646, zsmalloc: add page table mapping method
[2] https://github.com/spartacus06/zsmapbench

Cc: stable@vger.kernel.org
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Reported-by: Matt Sealey <matt@genesi-usa.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:37 +03:00
Seth Jennings
e469746871 staging: zsmalloc: make CLASS_DELTA relative to PAGE_SIZE
Right now ZS_SIZE_CLASS_DELTA is hardcoded to be 16.  This
creates 254 classes for systems with 4k pages. However, on
PPC64 with 64k pages, it creates 4095 classes which is far
too many.

This patch makes ZS_SIZE_CLASS_DELTA relative to PAGE_SIZE
so that regardless of the page size, there will be the same
number of classes.
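
The effect of the change can be approximated with a little arithmetic. This Python sketch estimates the class count as page size divided by delta and ignores the minimum allocation size, which is why its results differ slightly from the 254 and 4095 quoted above. Expressing the relative delta as page_size >> 8 is an assumption made for this sketch.

```python
def approx_class_count(page_size, delta):
    """Rough number of size classes: one per delta step up to the page size."""
    return page_size // delta


def relative_delta(page_size):
    """Assumed form of a PAGE_SIZE-relative delta (16 for 4K pages)."""
    return page_size >> 8


for page_size in (4096, 65536):
    fixed = approx_class_count(page_size, 16)
    relative = approx_class_count(page_size, relative_delta(page_size))
    print(page_size, fixed, relative)
```

With a fixed delta of 16 the estimate grows from 256 to 4096 classes as the page size goes from 4K to 64K, while the relative delta keeps it at 256 in both cases.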

Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:36 +03:00
Davidlohr Bueso
e007e13797 staging: zsmalloc: comment zs_create_pool function
Just as with zs_malloc() and zs_map_object(), it is worth
formally commenting the zs_create_pool() function.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:36 +03:00
Seth Jennings
0feb18b418 zsmalloc: collapse internal .h into .c
This patch collapses the internal zsmalloc_int.h into
the zsmalloc-main.c file.

This is done in preparation for the promotion to mm/ where
separate internal headers are discouraged.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:36 +03:00
Seth Jennings
d42720859d staging: zsmalloc: add page table mapping method
This patchset provides page mapping via the page table.
On some archs, most notably ARM, this method has been
demonstrated to be faster than copying.

The method selection (copy vs page table)
is controlled by the definition of USE_PGTABLE_MAPPING, which
is/can be defined for any arch that performs better with page
table mapping.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:36 +03:00
Seth Jennings
2a1d04065d staging: zsmalloc: prevent mappping in interrupt context
Because we use per-cpu mapping areas shared among the
pools/users, we can't allow mapping in interrupt context
because it can corrupt another user's mappings.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:36 +03:00
Seth Jennings
c78d440431 staging: zsmalloc: s/firstpage/page in new copy map funcs
firstpage already has a precedent: it means the first page
of a zspage.  In the case of the copy mapping functions,
it is the first of a pair of pages needing to be mapped.

This patch just renames the firstpage argument to "page" to
avoid confusion.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:35 +03:00
Seth Jennings
d0c0d30ecf staging: zsmalloc: add mapping modes
This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Seth Jennings
a1dc75d3a1 staging: zram/zcache: swtich Kconfig dependency from X86 to ZSMALLOC
This patch switches the zcache and zram dependency to ZSMALLOC
rather than X86.  There is no net change since ZSMALLOC
depends on X86; however, this avoids further changes to
these files as zsmalloc dependencies change.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Sam Hansen
e6702408f0 staging: zram: conventions, __aligned() attribute
Use the __aligned() attribute instead of __attribute__((aligned(size))).

Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Sam Hansen
3de1cfcb16 staging: zram: conventions pr_warning -> pr_warn()
Porting zram to use the pr_warn() function instead of the deprecated
pr_warning().

Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Minchan Kim
06dd95071a staging: zram: remove special handle of uncompressed page
xvmalloc can't handle a PAGE_SIZE page, so zram had to
handle it specially, but zsmalloc can, so let's remove the
unnecessary special handling code.

Quote from Nitin
"I think page vs handle distinction was added since xvmalloc could not
handle full page allocation. Now that zsmalloc allows full page
allocation, we can just use it for both cases. This would also allow
removing the ZRAM_UNCOMPRESSED flag. The only downside will be slightly
slower code path for full page allocation but this event is anyways
supposed to be rare, so should be fine."

1. This patch reduces the code considerably.

 drivers/staging/zram/zram_drv.c   |  104 +++++--------------------------------
 drivers/staging/zram/zram_drv.h   |   17 +-----
 drivers/staging/zram/zram_sysfs.c |    6 +--
 3 files changed, 15 insertions(+), 112 deletions(-)

2. Replace pages_expand with bad_compress so it can count
   pages with a bad compression ratio (above 75%).

3. Remove zobj_header, which is for back-reference for defragmentation,
   because it's not used at the moment and zsmalloc can't handle
   sizes bigger than PAGE_SIZE, so zram can't support it any more without
   redesign.

Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Minchan Kim
cde3a704ad staging: zram: fix random data read
fd1a30de introduced a bug: it uses a (struct page *) as zsmalloc's handle
even though it points to an uncompressed page, so zram can access a random
page, return random data, or even crash in get_first_page in zs_map_object.

Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Seth Jennings
7e070212f7 staging: zsmalloc: add details to zs_map_object boiler plate
Add information on the usage limits of zs_map_object()

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:34 +03:00
Seth Jennings
c080fd714f staging: zsmalloc: add single-page object fastpath in unmap
Improve zs_unmap_object() performance by adding a fast path for
objects that don't span pages.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:33 +03:00
Seth Jennings
7e7cf14e21 staging: zsmalloc: remove x86 dependency
This patch replaces the page table assisted object mapping
method, which has x86 dependencies, with an arch-independent
method that does a simple copy into a temporary per-cpu
buffer.

While a copy seems like it would be worse than mapping the pages,
tests demonstrate the copying is always faster and, in the case of
running inside a KVM guest, roughly 4x faster.
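
The copy-based approach can be sketched as follows. This is a simplified Python model, not the zsmalloc implementation: pages are plain bytearrays, the per-cpu buffer is an ordinary local buffer, writing changes back on unmap is not shown, and map_object_by_copy is a hypothetical name.

```python
PAGE_SIZE = 4096


def map_object_by_copy(pages, first_page, offset, size):
    """Copy an object that may span two pages into one contiguous buffer.

    `pages` lists the pages of a zspage in logical order; in memory they
    need not be adjacent, which is why a plain pointer cannot be handed out.
    """
    buf = bytearray(size)
    first_len = min(size, PAGE_SIZE - offset)
    buf[:first_len] = pages[first_page][offset:offset + first_len]
    if first_len < size:
        # The remainder of the object starts at the top of the next page.
        buf[first_len:] = pages[first_page + 1][:size - first_len]
    return buf
```

For example, a 10-byte object at offset 4090 of the first page takes its first 6 bytes from that page and the remaining 4 from the start of the next one.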

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:33 +03:00
Ben Hutchings
2e231deed9 staging: zsmalloc: Finish conversion to a separate module
ZSMALLOC is tristate, but the code has no MODULE_LICENSE and since it
depends on GPL-only symbols it cannot be loaded as a module.  This in
turn breaks zram which now depends on it.  I assume it's meant to be
Dual BSD/GPL like the other z-stuff.

There is also no module_exit, which will make it impossible to unload.
Add the appropriate module_init and module_exit declarations suggested
by comments.

Reported-by: Christian Ohm <chr.ohm@gmx.net>
References: http://bugs.debian.org/677273
Cc: stable@vger.kernel.org # v3.4
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-01 21:26:32 +03:00