Commit graph

166 commits

Author SHA1 Message Date
Jaegeuk Kim
568ca9d039 f2fs: wait on page's writeback in writepages path
As in f2fs_write_cache_pages, let's wait on writeback for node and meta
pages too. Especially for node blocks, we should do this before marking
their fsync and dentry flags.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Chao Yu
c943a5d1aa f2fs: speed up handling holes in fiemap
This patch makes f2fs_map_blocks support returning the next potential
page offset, which skips the hole region in the indirect tree of an inode,
and uses it to speed up fiemap when handling a big hole.

Test method:
xfs_io -f /mnt/f2fs/file  -c "pwrite 1099511627776 4096"
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"

Before:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    3m3.518s
user    0m0.000s
sys     3m3.456s

After:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    0m0.008s
user    0m0.000s
sys     0m0.008s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Chao Yu
99300aead6 f2fs: introduce get_next_page_offset to speed up SEEK_DATA
When seeking data in ->llseek, if we encounter a big hole which covers
several dnode pages, we continue seeking from the index of the first page
of the next dnode page, so at most we can skip searching
(ADDRS_PER_BLOCK - 1) pages.

However, this is still not efficient: if an indirect or double-indirect
pointer is NULL, there are no dnode pages in the subtree that the pointer
would point to, so it's not necessary to search the whole region.

This patch introduces get_next_page_offset to calculate the next page
offset based on the current searching level and the max searching level
returned from get_dnode_of_data. With this, we can skip searching the
entire area whose indirect or double-indirect node block does not exist.
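
The skip calculation described above can be sketched in user-space C. This
is a simplified illustration under assumptions, not the kernel code: the
function name next_page_offset, its plain integer parameters, and the three
layout constants are hypothetical stand-ins chosen for the example.

```c
/* Illustrative layout constants (assumed; real values depend on the
 * on-disk format and inode options). */
#define ADDRS_PER_INODE 923UL   /* data pointers held in the inode  */
#define ADDRS_PER_BLOCK 1018UL  /* data pointers per direct node    */
#define NIDS_PER_BLOCK  1018UL  /* node ids per indirect node       */

/*
 * pgofs:     page offset where the lookup hit a hole
 * cur_level: tree level at which the walk found a NULL pointer
 * max_level: tree level needed to reach the data block for pgofs
 * The caller is assumed to pass a pgofs that really lies in the region
 * addressed through max_level.
 */
static unsigned long next_page_offset(unsigned long pgofs,
				      int cur_level, int max_level)
{
	unsigned long skipped_unit = ADDRS_PER_BLOCK;
	unsigned long base = 0;
	int level;

	if (max_level == 0)
		return pgofs + 1;	/* pointer lives in the inode itself */

	/* Each missing level above the dnode widens the skipped range. */
	for (level = max_level; level > cur_level; level--)
		skipped_unit *= NIDS_PER_BLOCK;

	/* First page offset of the region addressed through max_level. */
	if (max_level >= 3)
		base += 2 * ADDRS_PER_BLOCK * NIDS_PER_BLOCK;
	if (max_level >= 2)
		base += 2 * ADDRS_PER_BLOCK;
	base += ADDRS_PER_INODE;

	/* Round up to the start of the next unit inside that region. */
	return ((pgofs - base) / skipped_unit + 1) * skipped_unit + base;
}
```

With these assumed constants, a NULL indirect pointer (cur_level 1,
max_level 2) lets the search jump over 1018 * 1018 pages in one step
instead of 1018.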

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Chao Yu
e8e6b247e9 f2fs: remove unneeded pointer conversion
There are redundant pointer conversions in the following call stack:
 - at position a, inode is converted to f2fs_inode_info.
 - at position b, f2fs_inode_info is converted to inode again.

 - truncate_blocks(inode,..)
  - fi = F2FS_I(inode)		---a
  - ADDRS_PER_PAGE(node_page, fi)
   - addrs_per_inode(fi)
    - inode = &fi->vfs_inode	---b
    - f2fs_has_inline_xattr(inode)
     - fi = F2FS_I(inode)
     - is_inode_flag_set(fi,..)

In order to avoid the unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to accept a parameter of type inode pointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Chao Yu
ea2c43f60e f2fs: simplify __allocate_data_blocks
This patch uses the existing function f2fs_map_blocks to simplify the
implementation of __allocate_data_blocks.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Chao Yu
d65b55a172 f2fs: simplify f2fs_map_blocks
In f2fs_map_blocks, we use duplicated code to handle the mapping of the
first block and of the following blocks, which is unnecessary. This patch
simplifies f2fs_map_blocks to avoid the copied code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Shuoran Liu
9876399dc3 f2fs: introduce lifetime write IO statistics
This patch introduces lifetime IO write statistics exposed to the sysfs interface.
The write IO amount is obtained from block layer, accumulated in the file system and
stored in the hot node summary of checkpoint.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
[Jaegeuk Kim: add sysfs documentation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Jaegeuk Kim
0007854c9f f2fs: give scheduling point in shrinking path
It needs to give a chance to be rescheduled while shrinking slab entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Hou Pengyang
faf1258aa7 f2fs: improve shrink performance of extent nodes
In the worst case, we need to scan the whole radix tree and every rb-tree to
free the victim extent_nodes when shrinking.

Pengyang initially introduced a victim_list to record the victim extent_nodes,
and freed these extent_nodes by just scanning a list.

Later, Chao Yu enhanced the original patch to improve the memory footprint by
removing the victim list.

The policy of lru list shrinking becomes:
1) lock lru list's lock
2) trylock extent tree's lock
3) remove extent node from lru list
4) unlock lru list's lock
5) do shrink
6) repeat 1) to 5)

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Jaegeuk Kim
64a2858911 f2fs: don't set cached_en if it will be freed
If en has an empty list pointer, it will be freed soon, so we don't need to
set cached_en with it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:29 +08:00
Jaegeuk Kim
2f6563b2ef f2fs: move extent_node list operations being coupled with rbtree operation
This patch moves extent_node list operations to be handled together with
its rbtree operations.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Hou Pengyang
285b5811ff f2fs: reconstruct the code to free an extent_node
There are three steps to free an extent node:
1) list_del_init, 2)__detach_extent_node, 3) kmem_cache_free

In path f2fs_destroy_extent_tree, the order to free a node is 1->2->3,
but in path f2fs_update_extent_tree_range, it is 2->1->3.

This patch makes the order 1->2->3 in all paths.
It makes sense, since in the next patch, we introduce a victim list in the
path shrink_extent_tree, and we can check whether the extent_node is in the
victim list by checking list_empty(). So it is necessary to put 1) first.

Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
bfe1d80660 f2fs: use wq_has_sleeper for cp_wait wait_queue
We need to use wq_has_sleeper including smp_mb to consider cp_wait concurrency.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Fan Li
81e6758a09 f2fs: avoid unnecessary search while finding victim in gc
The variable nsearched in get_victim_by_default() indicates the number of
dirty segments we have already checked. There are 2 problems with the way
it is updated:
1. When p.ofs_unit is greater than 1, the victim we find consists
   of multiple segments, possibly more than 1 dirty segment.
   But nsearched always increases by 1.
2. If segments have been found but not been chosen, nsearched won't
   increase. So even if we have checked all dirty segments, nsearched
   may still be less than p.max_search.
Both problems could cause unnecessary searching after all dirty
segments have already been checked.
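
The accounting problem can be illustrated with a small user-space sketch.
It is only a model under assumptions: the byte-per-segment dirty map and
the helper names dirty_in_unit and units_examined are hypothetical, and
the loop is not the kernel's victim selection. It shows one way to make
nsearched advance by the number of dirty segments in each examined unit,
whether or not that unit is chosen, so the scan ends once every dirty
segment has been accounted for.

```c
#include <stddef.h>

/* Count dirty segments inside the ofs_unit-sized unit containing segno. */
static unsigned int dirty_in_unit(const unsigned char *dirty, size_t nsegs,
				  size_t segno, size_t ofs_unit)
{
	size_t start = segno - (segno % ofs_unit);
	unsigned int count = 0;
	size_t i;

	for (i = start; i < start + ofs_unit && i < nsegs; i++)
		count += dirty[i] ? 1u : 0u;
	return count;
}

/*
 * Walk the dirty map unit by unit and return how many units were
 * examined before nsearched reached max_search.
 */
static unsigned int units_examined(const unsigned char *dirty, size_t nsegs,
				   size_t ofs_unit, unsigned int max_search)
{
	unsigned int nsearched = 0, examined = 0;
	size_t segno = 0;

	while (segno < nsegs && nsearched < max_search) {
		if (!dirty[segno]) {
			segno++;	/* skip clean segments */
			continue;
		}
		examined++;
		/* Account for every dirty segment in this unit at once. */
		nsearched += dirty_in_unit(dirty, nsegs, segno, ofs_unit);
		/* Continue from the first segment of the next unit. */
		segno = segno - (segno % ofs_unit) + ofs_unit;
	}
	return examined;
}
```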

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Yunlei He
5c863d9060 f2fs: delete unnecessary wait for page writeback
There is no need to wait for writeback of an inline file's page, since no
one uses it, so this patch deletes the unnecessary wait.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
fa43b1083f f2fs: use wait_for_stable_page to avoid contention
In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch uses wait_for_stable_page instead of
wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Chao Yu
c8d6e41c9a f2fs: enhance foreground GC
If we configure a section to consist of multiple segments, foreground GC will
do the garbage collection with the following approach:

	for each segment in victim section
		blk_start_plug
		for each valid block in segment
			write out by OPU method
		submit bio cache   <---
		blk_finish_plug   <---

There are two issues:
1) most of the time, 'submit bio cache' will prevent the current bio
buffer from merging with writes of the next segments, so a smaller bio
gets submitted.
2) the block plug only covers IO submission within one segment, which
reduces the opportunity of merging IOs in the plug across multiple segments.

So refactor the code into the structure below to maximize the
opportunity of merging IOs:

	blk_start_plug
	for each segment in victim section
		for each valid block in segment
			write out by OPU method
	submit bio cache
	blk_finish_plug

Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl

Before the patch, a total of 40 bios are submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat for 8 times

After the patch, a total of 35 bios are submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
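
The two bio counts can be reproduced with simple arithmetic. The sketch
below rests on assumptions read off the traces rather than on the kernel
source: 4096-byte blocks, a bio cap of 30 blocks (size = 122880), 8
segments per section (mkfs.f2fs -s 8), and 128 valid blocks per segment
(4 * 122880 + 32768 bytes). The helper names are hypothetical.

```c
/* Bios needed to write `blocks` contiguous blocks when one bio holds at
 * most `max_blocks_per_bio` blocks (ceiling division). */
static unsigned int bios_needed(unsigned int blocks,
				unsigned int max_blocks_per_bio)
{
	return (blocks + max_blocks_per_bio - 1) / max_blocks_per_bio;
}

/* Old flow: the bio cache is submitted after every segment, so a
 * partially filled bio is cut short once per segment. */
static unsigned int bios_flush_per_segment(unsigned int segs,
					   unsigned int blocks_per_seg,
					   unsigned int max_blocks_per_bio)
{
	return segs * bios_needed(blocks_per_seg, max_blocks_per_bio);
}

/* New flow: one submit for the whole section, so only the very last
 * bio can be partially filled. */
static unsigned int bios_flush_per_section(unsigned int segs,
					   unsigned int blocks_per_seg,
					   unsigned int max_blocks_per_bio)
{
	return bios_needed(segs * blocks_per_seg, max_blocks_per_bio);
}
```

Under these assumptions, bios_flush_per_segment(8, 128, 30) gives 40 and
bios_flush_per_section(8, 128, 30) gives 35, matching the traces.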

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
1924b1d13d f2fs: don't need to call set_page_dirty for io error
If end_io gets an error, we don't need to set the page as dirty, since we
already set f2fs_stop_checkpoint which will not flush any data.

This will resolve the following warning.

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
4.4.0+ #9 Tainted: G           O
------------------------------------------------------
xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs]

and this task is already holding:
 (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490
which would create a new lock dependency:
 (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...}

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
e038208489 f2fs: avoid needless sync_inode_page when reading inline_data
In write_begin, if there is an inline_data, f2fs loads it into 0'th data page.
Since it's the read path, we don't need to sync its inode page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
87abb094a6 f2fs: don't need to sync node page at every time
In write_end, we don't need to sync inode page at every time.
Instead, we can expect f2fs_write_inode will update later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
c060dfefbb f2fs: avoid multiple node page writes due to inline_data
The scenario is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

So, this patch tries to flush inline_data when flushing node blocks.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
6130193402 f2fs: do f2fs_balance_fs when block is allocated
We should consider data block allocation to trigger f2fs_balance_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
54d7b9f302 f2fs: fix to overcome inline_data floods
The scenario is:
1. create lots of node blocks
2. sync
3. write lots of inline_data
-> got panic due to no free space

In that case, we should flush node blocks when writing inline_data in #3,
and trigger gc as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:28 +08:00
Jaegeuk Kim
1b914a30c4 f2fs: use writepages->lock for WB_SYNC_ALL
If there are many writepages calls by multiple threads in the background, we
don't need to serialize them to merge all the bios, since it's background.
In such a case, it'd be better to run writepages concurrently.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:27 +08:00
Jaegeuk Kim
bfa33c6b5a f2fs: remove needless condition check
This patch removes needless condition variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:27 +08:00
Chao Yu
58d92819d5 f2fs: correct search area in get_new_segment
get_new_segment starts from the current segment position and tries to find a
free segment among its right neighbors located in the same section.

But previously our search area was set as [current segment, max segment],
which means we have to search more bits in the free_segmap bitmap in some
worse cases. So here we correct the search area to [current segment, last
segment in section] to avoid unnecessary searching.
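
A minimal sketch of the corrected search window, assuming a simple
byte-per-segment in-use map instead of the real free_segmap bitmap; the
function name find_free_in_section and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Look for a free segment to the right of `cur`, but only up to the last
 * segment of the section that contains `cur`.
 * Returns the segment number, or -1 if the rest of the section is in use.
 */
static long find_free_in_section(const unsigned char *in_use, size_t total,
				 size_t cur, size_t segs_per_sec)
{
	/* One past the last segment of cur's section. */
	size_t end = (cur / segs_per_sec + 1) * segs_per_sec;
	size_t i;

	if (end > total)
		end = total;

	for (i = cur + 1; i < end; i++)
		if (!in_use[i])
			return (long)i;
	return -1;
}
```

Bounding the loop at the section end means a free segment in a later
section is never scanned for here; the caller would fall back to picking a
new section instead.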

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:27 +08:00
Chao Yu
619df17d64 f2fs: export dirty_nats_ratio in sysfs
This patch exports a new sysfs entry 'dirty_nats_ratio' to control the
threshold of dirty nat entries. If the current ratio exceeds the configured
threshold, checkpoint will be triggered in f2fs_balance_fs_bg to flush dirty nats.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	Documentation/ABI/testing/sysfs-fs-f2fs
2016-10-29 23:12:27 +08:00
Chao Yu
496bb6b28a f2fs: flush dirty nat entries when exceeding threshold
When testing f2fs with xfstests, generic/251 is stuck for a long time.
The case uses the sequence below to obtain freshly released space in the
device, in order to prepare for the following fstrim test.

1. rm -rf /mnt/dir
2. mkdir /mnt/dir/
3. cp -axT `pwd`/ /mnt/dir/
4. goto 1

During the preparing step, all nat entries will be cached in the nat cache.
Most of them are dirty entries with invalid blkaddr, which means the
nodes related to these entries have been truncated, and they could
be reused after the dirty entries have been checkpointed.

However, no checkpoint was triggered, so nid allocators
(e.g. mkdir, creat) will run into a long journey of iterating all NAT
pages, looking for free nids in alloc_nid->build_free_nids.

Here, in f2fs_balance_fs_bg we give another chance to do a checkpoint
to flush nat entries for reusing them in the free nid cache when the dirty
entry count exceeds 10% of the max count.
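
The threshold test can be sketched as a single comparison. This is an
illustration only: the helper name and parameters are hypothetical, and
treating "exceeds" as a strict greater-than is an assumption, since the
message does not say how the boundary case is handled.

```c
#include <stdbool.h>

/*
 * Return true when the dirty entry count exceeds ratio_percent of
 * max_cnt. Multiplying both sides avoids integer division rounding.
 */
static bool dirty_nats_exceed(unsigned long long dirty_cnt,
			      unsigned long long max_cnt,
			      unsigned int ratio_percent)
{
	return dirty_cnt * 100 > max_cnt * ratio_percent;
}
```

For example, with a maximum of 1000 entries and a ratio of 10, the check
first fires at 101 dirty entries.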

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:27 +08:00
Chao Yu
5d38e01ab0 f2fs: relocate is_merged_page
Operations in is_merged_page are related to the inner bio cache, so move it
to data.c.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/segment.c
2016-10-29 23:12:27 +08:00
Jaegeuk Kim
db748062ad f2fs: should unset atomic flag after successful commit
If there is an error during commit, we should keep the flag in order to
abort it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
7a2dc25c86 f2fs: fix wrong memory condition check
This patch fixes a wrong decision in available_free_memory.
The return value is already set to false, so we should consider only the true
condition below.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/node.c
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
85abb2f17e f2fs: monitor the number of background checkpoint
This patch shows the number of background checkpoints.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
389d2eece6 f2fs: detect idle time depending on user behavior
This patch records the last time that the user requested filesystem operations.
This information is used later to detect whether the system is idle or not.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	Documentation/ABI/testing/sysfs-fs-f2fs
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
b074a0de9c f2fs: introduce time and interval facility
This patch adds time and interval arrays to store some timing variables.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Chao Yu
f1cbf37ac2 f2fs: skip releasing nodes in chindless extent tree
If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Chao Yu
157a5e1f6a f2fs: use atomic type for node count in extent tree
1. rename field in struct extent_tree from count to node_cnt for
   readability.
2. alter to use atomic type for node_cnt.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Chao Yu
4f11a62b68 f2fs: recognize encrypted data in f2fs_fiemap
This patch teaches f2fs_fiemap to recognize encrypted data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
baac2dde6b f2fs: clean up f2fs_balance_fs
This patch adds one parameter to clean up all the callers of f2fs_balance_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/namei.c
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
b5199c6522 f2fs: remove redundant calls
This patch removes redundant calls.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
d5dd3ee188 f2fs: avoid unnecessary f2fs_balance_fs calls
Only when node page is newly dirtied, it needs to check whether we need to do
f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:25 +08:00
Jaegeuk Kim
0788a58411 f2fs: check the page status filled from disk
After reading a page, we need to check whether there is any error.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Chao Yu
c25b54f508 f2fs: introduce __get_node_page to reuse common code
There is duplicated code in get_node_page and get_node_page_ra.
Introduce __get_node_page to include the common parts of these two, and
export get_node_page and get_node_page_ra by reusing __get_node_page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/node.c
2016-10-29 23:12:24 +08:00
Chao Yu
01fbe1c8b3 f2fs: check node id earily when readaheading node page
Add node id check in ra_node_page and get_node_page_ra like get_node_page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Fan Li
3cb1aba93a f2fs: read isize while holding i_mutex in fiemap
make sure the isize we read doesn't change during the process.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
83ef8a9fcd Revert "f2fs: check the node block address of newly allocated nid"
Original issue is fixed by:

  f2fs: cover more area with nat_tree_lock

This reverts commit 24928634f81b1592e83b37dcd89ed45c28f12feb.
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
e694adb9e1 f2fs: cover more area with nat_tree_lock
There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Chao Yu
ed22a357c2 f2fs: introduce max_file_blocks in sbi
Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Chao Yu
1b4b58672a f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink
Add the missing CONFIG_F2FS_FS_XATTR check for encrypted symlink inodes in
order to avoid unneeded registration of ->{get,set,remove}xattr.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
b167c6ac4c f2fs: introduce zombie list for fast shrinking extent trees
This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
2dfff0e691 f2fs: monitor zombie_tree count
This patch adds an entry to show the number of zombie extent_tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
ad7ad27045 f2fs: use IPU for fdatasync
This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
a4aa034e8f f2fs: write pending bios when cp_error is set
When testing ioc_shutdown, put_super can hang while waiting for
pages under writeback, as follows.

INFO: task umount:2723 blocked for more than 120 seconds.
      Tainted: G           O    4.4.0-rc3+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount          D ffff88000859f9d8     0  2723   2110 0x00000000
 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
 ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
Call Trace:
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8182310c>] schedule+0x3c/0x90
 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
 [<ffffffff8111617c>] ? ktime_get+0xac/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
 [<ffffffff81823a25>] bit_wait_io+0x35/0x50
 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
 [<ffffffff812639bc>] evict+0xbc/0x190
 [<ffffffff81263d19>] iput+0x229/0x2c0
 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
 [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
 [<ffffffff812478f7>] kill_block_super+0x27/0x70
 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
 [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60
 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
 [<ffffffff810ac463>] task_work_run+0x73/0xa0
 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
e1496a4648 f2fs: remove f2fs_bug_on in terms of max_depth
There is no report of this bug_on case, but if a malicious attacker changed this
field intentionally, we can just reset it to the MAX value.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:24 +08:00
Jaegeuk Kim
30a9582a1d f2fs: fix f2fs_ioc_abort_volatile_write
There are two rules to handle aborting volatile or atomic writes.

1. drop atomic writes
 - we don't need to keep any stale db data.

2. write journal data
 - we should keep the journal data with fsync for db recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Chao Yu
899b32588e f2fs: fix to skip recovering dot dentries in a readonly fs
If the filesystem is readonly, leave a message for the user instead of
recovering the inline dot inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
3b07fd693d f2fs: load largest extent all the time
Otherwise, we can get mismatched largest extent information.

One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
a15f311013 f2fs: use i_size_read to get i_size
We need to use i_size_read() to get inode->i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
0978d2a646 f2fs: early check broken symlink length in the encrypted case
If link is broken, its len is zero, and we don't need to move forward.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Chao Yu
71568f2f05 f2fs: clean up f2fs_ioc_write_checkpoint
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Yunlei He
e40bd22303 f2fs: add a max block check for get_data_block_bmap
This patch adds a max block check for get_data_block_bmap.

The Trinity test program will send a block number as a parameter into
ioctl_fibmap, which will be used in get_node_path(). When the block
number is larger than the f2fs max blocks, it will trigger a kernel bug.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Fan Li
e465ea8aa2 f2fs: fix bugs and simplify codes of f2fs_fiemap
fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
   fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
   that ends at isize, it will also return the extent beyond isize,
   which is outside the range.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Chao Yu
562f92afb9 f2fs: let user being aware of IO error
Sometimes we stay silent when an IO error occurs in the lower layer device, so
the user will not receive any error return value for some operations, even
though the operation did not actually succeed.

This should be avoided, so this patch reports this kind of error to the user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/data.c
2016-10-29 23:12:23 +08:00
Chao Yu
ce1ff1416d f2fs: add missing f2fs_balance_fs in __recover_dot_dentries
__recover_dot_dentries will try to grab free space in storage, so fix it by
adding the missing f2fs_balance_fs here.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
73efe696d4 f2fs: declare static function
The __f2fs_commit_super is static.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
255a8c61fb f2fs: avoid f2fs_lock_op in f2fs_write_begin
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
15bc27f9ad f2fs: return early when trying to read null nid
If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:23 +08:00
Jaegeuk Kim
195e13c651 f2fs: introduce prepare_write_begin to clean up
This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Chao Yu
2a3eb6b7b7 f2fs: don't convert inline inode when inline_data option is disable
If the inline_data option is disabled, when truncating an inline inode to a
size which does not exceed the maximum inline size, we should not convert the
inline inode to a regular one, to avoid the overhead of synchronizing
conversion.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Chao Yu
8f9d57f61f f2fs: report error of do_checkpoint
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Jaegeuk Kim
0eb36bcc8f f2fs: call f2fs_balance_fs only when node was changed
If the user tries to update or read data, we don't need to call f2fs_balance_fs,
which triggers f2fs_gc and adds unnecessarily long latency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/file.c
2016-10-29 23:12:22 +08:00
Chao Yu
2feba63713 f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocks
Only hold sbi->cp_rwsem over one dnode page's allocation and modification
instead of multiple pages' in f2fs_map_blocks. This reduces the covered region
of cp_rwsem, so we can avoid a potentially long delay for a concurrent
checkpointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Jaegeuk Kim
00760e5ad2 f2fs: record node block allocation in dnode_of_data
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Jaegeuk Kim
dc4c218974 f2fs: avoid unnecessary f2fs_gc for dir operations
The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/namei.c
2016-10-29 23:12:22 +08:00
Jaegeuk Kim
43f4844aeb f2fs: check inline_data flag at converting time
We can check the inode's inline_data flag when calling to convert it.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Jaegeuk Kim
b35262640b f2fs: speed up shrinking extent tree entries
If there are no candidates for shrinking slab entries, we don't need to traverse
any trees at all.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix missing initialization reported by Yunlei He]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Jaegeuk Kim
94deb8045a f2fs: use atomic variable for total_extent_tree
It would be better to use atomic variable for total_extent_tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
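As a rough userspace illustration of the idea in 94deb8045a, the sketch below keeps a shared counter in a C11 atomic so that updates need no separate lock. The kernel has its own atomic_t API rather than <stdatomic.h>, and the struct and function names here are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-filesystem statistics. */
struct ext_stats {
	atomic_int total_ext_tree;	/* number of live extent trees */
};

void ext_tree_created(struct ext_stats *st)
{
	atomic_fetch_add(&st->total_ext_tree, 1);
}

void ext_tree_destroyed(struct ext_stats *st)
{
	/* atomic_fetch_sub returns the value held before the decrement. */
	int old = atomic_fetch_sub(&st->total_ext_tree, 1);

	assert(old > 0);
}

int ext_tree_count(struct ext_stats *st)
{
	return atomic_load(&st->total_ext_tree);
}
```

Each update is a single atomic read-modify-write, so concurrent callers cannot lose an increment or decrement.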
Chao Yu
1eea8428c2 f2fs: add a tracepoint for sync_dirty_inodes
This patch adds a tracepoint for sync_dirty_inodes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Fan Li
a763c471c0 f2fs: optimize the flow of f2fs_map_blocks
Check map->m_len right after it changes to avoid an extra call
to update dnode_of_data.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Chao Yu
4d0a6889ce f2fs: support data flush in background
Previously, when finishing a checkpoint, we persisted all fs meta info,
including the meta inode, node inode, and dentry pages of directory inodes, so
after a sudden power cut f2fs can recover from the last checkpoint with the
full directory structure.

But during checkpoint we didn't flush dirty pages of regular and symlink
inodes, so such dirty data still in memory is lost at the moment of
power-off.

In order to reduce the chance of losing data, this patch gives
f2fs_balance_fs_bg the ability to flush data. It will try to flush
user data before starting a checkpoint, so user data written after the last
checkpoint, which may not have been fsynced, can be saved.

When mounted with the data_flush option, after every period of cp_interval
seconds (configurable in sysfs: /sys/fs/f2fs/device/cp_interval),
user data can be flushed to the device once f2fs_balance_fs_bg is called
in a kworker thread or the gc thread.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Chao Yu
4fcd75606e f2fs: stat dirty regular/symlink inodes
Count dirty regular and symlink inodes for display in debugfs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:22 +08:00
Chao Yu
6c7d110359 f2fs: introduce new option for controlling data flush
Add a new option 'data_flush' to enable data flush functionality.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	Documentation/filesystems/f2fs.txt
2016-10-29 23:12:21 +08:00
Chao Yu
5f73323b85 f2fs: record dirty status of regular/symlink inode
Maintain regular/symlink inodes that have dirty pages in a global dirty list
and record their total dirty page count, the same way directory inodes are
handled.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Chao Yu
2fff879350 f2fs: introduce __f2fs_commit_super
Introduce __f2fs_commit_super to factor out the code duplicated in
f2fs_commit_super.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Jaegeuk Kim
870a77a3f3 f2fs: relocate tracepoint of write_checkpoint
The tracepoint needs to be relocated to produce exact trace logs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Chao Yu
55ead4163c f2fs: don't grab super block buffer header all the time
We already have a valid copy of the super block in memory, so do not grab
the buffer header of the super block every time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Yunlei He
ba9a1899aa f2fs: backup raw_super in sbi
f2fs uses fields of the f2fs_super_block struct directly in a grabbed buffer.

If the buffer happens to be destroyed (e.g. through dd), it may have
unpredictable effects on f2fs.

This patch allocates an additional buffer to store the super block data
rather than using the grabbed block buffer directly.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
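The approach described in ba9a1899aa can be sketched in userspace as follows: copy the superblock out of the shared buffer into private memory, so that later damage to the buffer does not affect the in-memory copy. This is only an illustration under stated assumptions; toy_super and backup_super are hypothetical names, and the kernel code uses its own allocators rather than malloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced superblock layout. */
struct toy_super {
	unsigned int magic;
	unsigned int log_blocksize;
};

/*
 * Return a private heap copy of the superblock found at the start of
 * buf, or NULL on allocation failure. The caller owns the copy and
 * releases it with free().
 */
struct toy_super *backup_super(const void *buf, size_t len)
{
	struct toy_super *copy;

	assert(len >= sizeof(*copy));
	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, buf, sizeof(*copy));
	return copy;
}
```

After the copy is made, overwriting the original buffer leaves the fields read through the copy unchanged.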
Fan Li
c282a79134 f2fs: fix to reset variable correctlly
f2fs_map_blocks will set m_flags and m_len to 0, so we don't need to
reset m_flags ourselves, but we have to reset m_len to the correct value
before using it again.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Chao Yu
7a5dfa9285 f2fs: introduce __remove_dirty_inode
Introduce __remove_dirty_inode to clean up the code in remove_dirty_dir_inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Chao Yu
807a2aa41d f2fs: introduce dirty list node in inode info
Add a new dirty list node member to the inode info for linking the inode into
the global dirty list in the superblock, replacing the old implementation,
which allocated slab cache memory for a per-inode entry.

This avoids memory pressure from slab cache allocation and also makes the
code cleaner.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
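The embedded list node from 807a2aa41d follows the usual intrusive-list pattern. The sketch below is a minimal userspace version, modeled loosely on the kernel's list_head and container_of; toy_inode_info and the helper names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modeled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical inode info with the list node embedded in it. */
struct toy_inode_info {
	unsigned long ino;
	struct list_node dirty_list;
};

void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Link node at the tail of the list; no allocation is needed. */
void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node and leave it pointing at itself. */
void list_del(struct list_node *node)
{
	assert(node->prev && node->next);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

/* Recover the owning inode info from its embedded list node. */
struct toy_inode_info *dirty_entry(struct list_node *node)
{
	return container_of(node, struct toy_inode_info, dirty_list);
}
```

Because the node lives inside the inode info, adding an inode to the dirty list cannot fail for lack of memory, and removal needs no matching free.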
Chao Yu
5cbea47f54 f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry
remove_dirty_dir_inode will be renamed to remove_dirty_inode in a following
patch, as a generic function for removing directory/regular/symlink inodes
from the global dirty list.

Rename the ino management functions here for readability and to avoid a
name conflict.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Chao Yu
e116a07e4e f2fs: do more integrity verification for superblock
Do more sanity checks on the superblock during ->mount.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Fan Li
bc4e9c22df f2fs: fix to update variable correctly when skip a unmapped block
map.m_len should be reduced after skipping a block.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:21 +08:00
Fan Li
db88dd99f0 f2fs: write only the pages in range during defragment
@lend of filemap_write_and_wait_range is supposed to be an "offset
in bytes where the range ends (inclusive)". Subtract 1 to avoid
writing an extra page.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00
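The off-by-one fixed in db88dd99f0 can be checked with a small calculation. The sketch below counts the pages touched by an inclusive byte range; pages_in_range and TOY_PAGE_SHIFT are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Number of pages touched by the inclusive byte range [start, lend]. */
unsigned long long pages_in_range(unsigned long long start,
				  unsigned long long lend)
{
	assert(lend >= start);
	return (lend >> TOY_PAGE_SHIFT) - (start >> TOY_PAGE_SHIFT) + 1;
}
```

For a two-page region starting at offset 0, passing start + len (8192) as the end covers three pages, while start + len - 1 (8191) covers exactly two.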
Chao Yu
732a06e287 f2fs: clean up node page updating flow
If read_node_page returns LOCKED_PAGE, its caller should a) skip the
unneeded 'Update' flag and mapping info verification; b) check the nid value
stored in the footer structure of the node page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

 Conflicts:
	fs/f2fs/node.c
2016-10-29 23:12:20 +08:00
Jaegeuk Kim
ae654a8d46 f2fs: use lock_buffer when changing superblock
When modifying sb contents, we need to lock its buffer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00
Jaegeuk Kim
672830367a f2fs: refactor f2fs_commit_super
Previously, f2fs_commit_super hacked bh->blocknr to write the broken
alternate superblock.
Instead, we should use the correct logic to retrieve its buffer head
and lock it appropriately.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00
Jaegeuk Kim
e8531acc83 f2fs: enhance the bit operation for SSR
This patch enhances the existing bit operation when f2fs allocates SSR
blocks.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00
Chao Yu
652c783647 f2fs: fix to convert inline inode in ->setattr
In commit 3c4541452748 ("f2fs: do not trim preallocated blocks when
truncating after i_size"), in order to follow the rule "truncate(x)
where x > i_size will not trim all blocks past i_size." like other file
systems, we invoked truncate_setsize instead of f2fs_truncate in ->setattr
to avoid unneeded block trimming in that case, but forgot to call
f2fs_convert_inline_inode to keep the inline data conversion rule consistent.

This patch fixes to convert inline data if necessary.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00
Chao Yu
2f87117e0d f2fs: use sbi->blocks_per_seg to avoid unnecessary calculation
Use sbi->blocks_per_seg directly instead of recomputing
1 << sbi->log_blocks_per_seg each time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00
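The change in 2f87117e0d amounts to reading a cached value instead of repeating a shift. A minimal sketch, assuming a reduced superblock info structure (toy_sb_info and the helper names are hypothetical):

```c
#include <assert.h>

/* Hypothetical, reduced superblock info. */
struct toy_sb_info {
	unsigned int log_blocks_per_seg;	/* log2 of blocks per segment */
	unsigned int blocks_per_seg;		/* cached 1 << log_blocks_per_seg */
};

/* Compute the cached value once, when the structure is set up. */
void toy_init_sb_info(struct toy_sb_info *sbi, unsigned int log_blocks)
{
	assert(log_blocks < 32);
	sbi->log_blocks_per_seg = log_blocks;
	sbi->blocks_per_seg = 1U << log_blocks;
}

/* First block offset of segment segno, using the cached value. */
unsigned long long toy_seg_start(const struct toy_sb_info *sbi,
				 unsigned int segno)
{
	return (unsigned long long)segno * sbi->blocks_per_seg;
}
```

Both fields describe the same quantity, so callers can pick whichever form reads best, and the cached field spares them the shift.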
Chao Yu
bca718dfe2 f2fs: kill f2fs_drop_largest_extent
For direct IO, f2fs only allocates a new address for a block that did not
previously exist on disk, so its mapping info cannot already be in the extent
cache, and we do not need to call f2fs_drop_largest_extent to drop the
related cache.

Since f2fs_drop_largest_extent has no more callers, kill it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:20 +08:00