migration_call() will do all the things that update_runtime() does.
So let's remove it.
Furthermore, there is potential risk that the current code will catch
BUG_ON at line 689 of rt.c when do cpu hotplug while there are realtime
threads running because of enabling runtime twice while the rt_runtime
may already changed.
Change-Id: If2d953316d93c6b7e32f94bd49f2c10e64de6ed8
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1365685499-26515-1-git-send-email-zhangwm@marvell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: c5405a495e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[mattw@codeaurora.org: resolved trivial header file context conflict]
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
commit 80e3d87b2c5582db0ab5e39610ce3707d97ba409 upstream.
This patch adds checks that prevens futile attempts to move rt tasks
to a CPU with active tasks of equal or higher priority.
This reduces run queue lock contention and improves the performance of
a well known OLTP benchmark by 0.7%.
Change-Id: I82d8e3a8523a98c060b30945fdff13d7af908220
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Suruchi Kadu <suruchi.a.kadu@intel.com>
Cc: Doug Nelson<doug.nelson@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
commit 119c0d4460 changed
ext4_new_inode() such that the inode bitmap was being modified
outside a transaction, which could lead to corruption, and was
discovered when journal_checksum found a bad checksum in the
journal during log replay.
Nix ran into this when using the journal_async_commit mount
option, which enables journal checksumming. The ensuing
journal replay failures due to the bad checksums led to
filesystem corruption reported as the now infamous
"Apparent serious progressive ext4 data corruption bug"
[ Changed by tytso to only call ext4_journal_get_write_access() only
when we're fairly certain that we're going to allocate the inode. ]
I've tested this by mounting with journal_checksum and
running fsstress then dropping power; I've also tested by
hacking DM to create snapshots w/o first quiescing, which
allows me to test journal replay repeatedly w/o actually
power-cycling the box. Without the patch I hit a journal
checksum error every time. With this fix it survives
many iterations.
Change-Id: Id679b89e870c62f83476f836650ac9068908a84e
Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
commit 484dd30e01 upstream.
This is a longstanding bug, almost unnoticeable when calling
persistent_ram_write() for small buffers.
But when called for large data buffers, the write routine behaves
incorrectly, as the size may never update: instead of clamping
the size to the maximum buffer size, buffer_size_add_clamp() returns
an error (which is never checked by the write routine, btw).
To fix this, we now use buffer_size_add() that actually clamps the
size to the max value.
Also remove buffer_size_add_clamp(), it is no longer needed.
Change-Id: Iedde6ab812153ad47aabed3dfe5060356cdeeb55
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the VM_MAX_READAHEAD value from the default 128KB
to 512KB. This will allow the readahead window to grow to a maximum size
of 512KB, which greatly benefits to sequential read throughput.
Change-Id: Ia0780ea4e2a4ae0b6111485b72fb25376dcb1f96
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
This patch fixes wrong decision for avaliable_free_memory.
The return valus is already set as false, so we should consider true condition
below only.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/node.c
This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
Documentation/ABI/testing/sysfs-fs-f2fs
If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
1. rename field in struct extent_tree from count to node_cnt for
readability.
2. alter to use atomic type for node_cnt.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes to teach f2fs_fiemap to recognize encrypted data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds one parameter to clean up all the callers of f2fs_balance_fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/namei.c
There are duplicated code in between get_node_page and get_node_page_ra,
introduce __get_node_page to includes common parts of these two, and
export get_node_page and get_node_page_ra by reusing __get_node_page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/node.c
Add node id check in ra_node_page and get_node_page_ra like get_node_page.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
make sure the isize we read doesn't change during the process.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce max_file_blocks in sbi to store max block index of file in f2fs,
it could be used to avoid unneeded calculation of max block index in
runtime.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: fix overflow of sbi->max_file_blocks]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add missed CONFIG_F2FS_FS_XATTR for encrypted symlink inode in order
to avoid unneeded registry of ->{get,set,remove}xattr.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch removes refcount, and instead, adds zombie_list to shrink directly
without radix tree traverse.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes missing IPU condition when fdatasync is called.
With this patch, fdatasync is able to avoid additional node writes for recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There is no report on this bug_on case, but if malicious attacker changed this
field intentionally, we can just reset it as a MAX value.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are two rules to handle aborting volatile or atomic writes.
1. drop atomic writes
- we don't need to keep any stale db data.
2. write journal data
- we should keep the journal data with fsync for db recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If filesystem is readonly, leave user message info instead of recovering
inline dot inode.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Otherwise, we can get mismatched largest extent information.
One example is:
1. mount f2fs w/ extent_cache
2. make a small extent
3. umount
4. mount f2fs w/o extent_cache
5. update the largest extent
6. umount
7. mount f2fs w/ extent_cache
8. get the old extent made by #2
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove unused err variable]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a max block check for get_data_block_bmap.
Trinity test program will send a block number as parameter into
ioctl_fibmap, which will be used in get_node_path(), when the block
number large than f2fs max blocks, it will trigger kernel bug.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
[Jaegeuk Kim: fix missing condition, pointed by Chao Yu]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
fix bugs:
1. len could be updated incorrectly when start+len is beyond isize.
2. If there is a hole consisting of more than two blocks, it could
fail to add FIEMAP_EXTENT_LAST flag for the last extent.
3. If there is an extent beyond isize, when we search extents in a range
that ends at isize, it will also return the extent beyond isize,
which is outside the range.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.
This sould be avoided, so this patch reports such kind of error to user.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/data.c
__recover_do_dentries will try to grab free space in storage, so fix to
add missing f2fs_balance_fs here.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in
order to avoid the checkpoint latency in the write syscall.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds prepare_write_begin to clean f2fs_write_begin.
The major role of this function is to convert any inline_data and allocate
or find block address.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If inline_data option is disable, when truncating an inline inode with
size which is not exceed maxinum inline size, we should not convert
inline inode to regular one to avoid the overhead of synchronizing
conversion.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.
So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If user tries to update or read data, we don't need to call f2fs_balance_fs
which triggers f2fs_gc, which increases unnecessary long latency.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/file.c
Only cover sbi->cp_rwsem on one dnode page's allocation and modification
instead of multiple's in f2fs_map_blocks, it can reduce the covered region
of cp_rwsem, then we can avoid potential long time delay for concurrent
checkpointer.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The f2fs_balance_fs doesn't need to cover f2fs_new_inode or f2fs_find_entry
works.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/namei.c
We can check inode's inline_data flag when calling to convert it.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>