Introduce a new file system, Flash-Friendly File System (F2FS), to Linux 3.8.

Highlights:
 - Add initial f2fs source codes
 - Fix an endian conversion bug
 - Fix build failures on random configs
 - Fix the power-off-recovery routine
 - Minor cleanup, coding style, and typos patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxuJcAAoJEEAUqH6CSFDSq80QAI3i7NgUkx4h225MnbJdEKRb
 YX1MfSPmgE0q/15XS2qQu/s9NGJmXLV1IR9EtRSBlCQjwWhbx9Q9URktGkWslFnx
 6mBLy8EvVKDMVdwoUS8ZY6IjfKbmSnoIHTZrGaT9+9d7k8nlOQLaj3qQF4wBuw1+
 +qhJQV642v8qw7JiVVFgxcBSLpAS9cbdOA0vxfWncMwmRLaEO45W5+rob8ZN8WaS
 BUiYIiue8vlB0VDIYfpl/sSPJC/Bn1XsLKZoS2WJl8CKioE1ptLjT3acUBbabUxp
 hNLl8Ae0PylDMFpH8hrBXhleznrVqEMOTos/Z80/UyBny2sCxJFnaQ60TayUo2l2
 hYk5Wbyj8K7IBJEke23Fepild2PnGz22zf2v+tLxxVgPH5j7/l2XHfy9gPvDbd1P
 4ENiJUC3LE49Mi4TvEIFqhbrcJfD9C+v3bxpWGe8CevrpYZaB8tv/6nQXJCC/Ixp
 tMWqLKlHyXGmk5DZpiSFaj0/GbTPT0UGqZVRzzSXQpKqxJU6eTnXDa6aLUEYH8fH
 grOCriaJrd8SgL3l7RokQSQEyRHuNjMm1tlUQWOObE+y0nJjWb9Amwn1yUtJuNzx
 Np4nnlMhxwJ48P3LeeheSCuOUbxJtOzOR8MVXm7deYiGQbYaqB1/+9TbjOZBSX4O
 fpbCXrmqe1pUBukftZsL
 =iMoX
 -----END PGP SIGNATURE-----

Merge tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull new F2FS filesystem from Jaegeuk Kim:
 "Introduce a new file system, Flash-Friendly File System (F2FS), to
  Linux 3.8.

  Highlights:
   - Add initial f2fs source codes
   - Fix an endian conversion bug
   - Fix build failures on random configs
   - Fix the power-off-recovery routine
   - Minor cleanup, coding style, and typos patches"

From the Kconfig help text:

  F2FS is based on Log-structured File System (LFS), which supports
  versatile "flash-friendly" features. The design has been focused on
  addressing the fundamental issues in LFS, which are snowball effect
  of wandering tree and high cleaning overhead.

  Since flash-based storages show different characteristics according to
  the internal geometry or flash memory management schemes aka FTL, F2FS
  and tools support various parameters not only for configuring on-disk
  layout, but also for selecting allocation and cleaning algorithms.

and there's an article by Neil Brown about it on lwn.net:

  http://lwn.net/Articles/518988/

* tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
  f2fs: fix tracking parent inode number
  f2fs: cleanup the f2fs_bio_alloc routine
  f2fs: introduce accessor to retrieve number of dentry slots
  f2fs: remove redundant call to f2fs_put_page in delete entry
  f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
  f2fs: rewrite f2fs_bio_alloc to make it simpler
  f2fs: fix a typo in f2fs documentation
  f2fs: remove unused variable
  f2fs: move error condition for mkdir at proper place
  f2fs: remove unneeded initialization
  f2fs: check read only condition before beginning write out
  f2fs: remove unneeded memset from init_once
  f2fs: show error in case of invalid mount arguments
  f2fs: fix the compiler warning for uninitialized use of variable
  f2fs: resolve build failures
  f2fs: adjust kernel coding style
  f2fs: fix endian conversion bugs reported by sparse
  f2fs: remove unneeded version.h header file from f2fs.h
  f2fs: update the f2fs document
  f2fs: update Kconfig and Makefile
  ...
This commit is contained in:
Linus Torvalds 2012-12-20 13:54:51 -08:00
commit a13eea6bd9
29 changed files with 13488 additions and 0 deletions

View file

@ -50,6 +50,8 @@ ext4.txt
- info, mount options and specifications for the Ext4 filesystem.
files.txt
- info on file management in the Linux kernel.
f2fs.txt
- info and mount options for the F2FS filesystem.
fuse.txt
- info on the Filesystem in User SpacE including mount options.
gfs2.txt

View file

@ -0,0 +1,421 @@
================================================================================
WHAT IS Flash-Friendly File System (F2FS)?
================================================================================
NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
been equipped on a variety systems ranging from mobile to server systems. Since
they are known to have different characteristics from the conventional rotating
disks, a file system, an upper layer to the storage device, should adapt to the
changes from the sketch in the design level.
F2FS is a file system exploiting NAND flash memory-based storage devices, which
is based on Log-structured File System (LFS). The design has been focused on
addressing the fundamental issues in LFS, which are snowball effect of wandering
tree and high cleaning overhead.
Since a NAND flash memory-based storage device shows different characteristic
according to its internal geometry or flash memory management scheme, namely FTL,
F2FS and its tools support various parameters not only for configuring on-disk
layout, but also for selecting allocation and cleaning algorithms.
The file system formatting tool, "mkfs.f2fs", is available from the following
git tree:
>> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
For reporting bugs and sending patches, please use the following mailing list:
>> linux-f2fs-devel@lists.sourceforge.net
================================================================================
BACKGROUND AND DESIGN ISSUES
================================================================================
Log-structured File System (LFS)
--------------------------------
"A log-structured file system writes all modifications to disk sequentially in
a log-like structure, thereby speeding up both file writing and crash recovery.
The log is the only structure on disk; it contains indexing information so that
files can be read back from the log efficiently. In order to maintain large free
areas on disk for fast writing, we divide the log into segments and use a
segment cleaner to compress the live information from heavily fragmented
segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
implementation of a log-structured file system", ACM Trans. Computer Systems
10, 1, 2652.
Wandering Tree Problem
----------------------
In LFS, when a file data is updated and written to the end of log, its direct
pointer block is updated due to the changed location. Then the indirect pointer
block is also updated due to the direct pointer block update. In this manner,
the upper index structures such as inode, inode map, and checkpoint block are
also updated recursively. This problem is called as wandering tree problem [1],
and in order to enhance the performance, it should eliminate or relax the update
propagation as much as possible.
[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
Cleaning Overhead
-----------------
Since LFS is based on out-of-place writes, it produces so many obsolete blocks
scattered across the whole storage. In order to serve new empty log space, it
needs to reclaim these obsolete blocks seamlessly to users. This job is called
as a cleaning process.
The process consists of three operations as follows.
1. A victim segment is selected through referencing segment usage table.
2. It loads parent index structures of all the data in the victim identified by
segment summary blocks.
3. It checks the cross-reference between the data and its parent index structure.
4. It moves valid data selectively.
This cleaning job may cause unexpected long delays, so the most important goal
is to hide the latencies to users. And also definitely, it should reduce the
amount of valid data to be moved, and move them quickly as well.
================================================================================
KEY FEATURES
================================================================================
Flash Awareness
---------------
- Enlarge the random write area for better performance, but provide the high
spatial locality
- Align FS data structures to the operational units in FTL as best efforts
Wandering Tree Problem
----------------------
- Use a term, “node”, that represents inodes as well as various pointer blocks
- Introduce Node Address Table (NAT) containing the locations of all the “node”
blocks; this will cut off the update propagation.
Cleaning Overhead
-----------------
- Support a background cleaning process
- Support greedy and cost-benefit algorithms for victim selection policies
- Support multi-head logs for static/dynamic hot and cold data separation
- Introduce adaptive logging for efficient block allocation
================================================================================
MOUNT OPTIONS
================================================================================
background_gc_off Turn off cleaning operations, namely garbage collection,
triggered in background when I/O subsystem is idle.
disable_roll_forward Disable the roll-forward recovery routine
discard Issue discard/TRIM commands when a segment is cleaned.
no_heap Disable heap-style segment allocation which finds free
segments for data from the beginning of main area, while
for node from the end of main area.
nouser_xattr Disable Extended User Attributes. Note: xattr is enabled
by default if CONFIG_F2FS_FS_XATTR is selected.
noacl Disable POSIX Access Control List. Note: acl is enabled
by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
active_logs=%u Support configuring the number of active logs. In the
current design, f2fs supports only 2, 4, and 6 logs.
Default number is 6.
disable_ext_identify Disable the extension list configured by mkfs, so f2fs
does not aware of cold files such as media files.
================================================================================
DEBUGFS ENTRIES
================================================================================
/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
f2fs. Each file shows the whole f2fs information.
/sys/kernel/debug/f2fs/status includes:
- major file system information managed by f2fs currently
- average SIT information about whole segments
- current memory footprint consumed by f2fs.
================================================================================
USAGE
================================================================================
1. Download userland tools and compile them.
2. Skip, if f2fs was compiled statically inside kernel.
Otherwise, insert the f2fs.ko module.
# insmod f2fs.ko
3. Create a directory trying to mount
# mkdir /mnt/f2fs
4. Format the block device, and then mount as f2fs
# mkfs.f2fs -l label /dev/block_device
# mount -t f2fs /dev/block_device /mnt/f2fs
Format options
--------------
-l [label] : Give a volume label, up to 256 unicode name.
-a [0 or 1] : Split start location of each area for heap-based allocation.
1 is set by default, which performs this.
-o [int] : Set overprovision ratio in percent over volume size.
5 is set by default.
-s [int] : Set the number of segments per section.
1 is set by default.
-z [int] : Set the number of sections per zone.
1 is set by default.
-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
================================================================================
DESIGN
================================================================================
On-disk Layout
--------------
F2FS divides the whole volume into a number of segments, each of which is fixed
to 2MB in size. A section is composed of consecutive segments, and a zone
consists of a set of sections. By default, section and zone sizes are set to one
segment size identically, but users can easily modify the sizes by mkfs.
F2FS splits the entire volume into six areas, and all the areas except superblock
consists of multiple segments as described below.
align with the zone size <-|
|-> align with the segment size
_________________________________________________________________________
| | | Node | Segment | Segment | |
| Superblock | Checkpoint | Address | Info. | Summary | Main |
| (SB) | (CP) | Table (NAT) | Table (SIT) | Area (SSA) | |
|____________|_____2______|______N______|______N______|______N_____|__N___|
. .
. .
. .
._________________________________________.
|_Segment_|_..._|_Segment_|_..._|_Segment_|
. .
._________._________
|_section_|__...__|_
. .
.________.
|__zone__|
- Superblock (SB)
: It is located at the beginning of the partition, and there exist two copies
to avoid file system crash. It contains basic partition information and some
default parameters of f2fs.
- Checkpoint (CP)
: It contains file system information, bitmaps for valid NAT/SIT sets, orphan
inode lists, and summary entries of current active segments.
- Node Address Table (NAT)
: It is composed of a block address table for all the node blocks stored in
Main area.
- Segment Information Table (SIT)
: It contains segment information such as valid block count and bitmap for the
validity of all the blocks.
- Segment Summary Area (SSA)
: It contains summary entries which contains the owner information of all the
data and node blocks stored in Main area.
- Main Area
: It contains file and directory data including their indices.
In order to avoid misalignment between file system and flash-based storage, F2FS
aligns the start block address of CP with the segment size. Also, it aligns the
start block address of Main area with the zone size by reserving some segments
in SSA area.
Reference the following survey for additional technical details.
https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
File System Metadata Structure
------------------------------
F2FS adopts the checkpointing scheme to maintain file system consistency. At
mount time, F2FS first tries to find the last valid checkpoint data by scanning
CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
One of them always indicates the last valid data, which is called as shadow copy
mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
For file system consistency, each CP points to which NAT and SIT copies are
valid, as shown as below.
+--------+----------+---------+
| CP | NAT | SIT |
+--------+----------+---------+
. . . .
. . . .
. . . .
+-------+-------+--------+--------+--------+--------+
| CP #0 | CP #1 | NAT #0 | NAT #1 | SIT #0 | SIT #1 |
+-------+-------+--------+--------+--------+--------+
| ^ ^
| | |
`----------------------------------------'
Index Structure
---------------
The key data structure to manage the data locations is a "node". Similar to
traditional file structures, F2FS has three types of node: inode, direct node,
indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
indices, two direct node pointers, two indirect node pointers, and one double
indirect node pointer as described below. One direct node block contains 1018
data blocks, and one indirect node block contains also 1018 node blocks. Thus,
one inode block (i.e., a file) covers:
4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
Inode block (4KB)
|- data (923)
|- direct node (2)
| `- data (1018)
|- indirect node (2)
| `- direct node (1018)
| `- data (1018)
`- double indirect node (1)
`- indirect node (1018)
`- direct node (1018)
`- data (1018)
Note that, all the node blocks are mapped by NAT which means the location of
each node is translated by the NAT table. In the consideration of the wandering
tree problem, F2FS is able to cut off the propagation of node updates caused by
leaf data writes.
Directory Structure
-------------------
A directory entry occupies 11 bytes, which consists of the following attributes.
- hash hash value of the file name
- ino inode number
- len the length of file name
- type file type such as directory, symlink, etc
A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
used to represent whether each dentry is valid or not. A dentry block occupies
4KB with the following composition.
Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
dentries(11 * 214 bytes) + file name (8 * 214 bytes)
[Bucket]
+--------------------------------+
|dentry block 1 | dentry block 2 |
+--------------------------------+
. .
. .
. [Dentry Block Structure: 4KB] .
+--------+----------+----------+------------+
| bitmap | reserved | dentries | file names |
+--------+----------+----------+------------+
[Dentry Block: 4KB] . .
. .
. .
+------+------+-----+------+
| hash | ino | len | type |
+------+------+-----+------+
[Dentry Structure: 11 bytes]
F2FS implements multi-level hash tables for directory structure. Each level has
a hash table with dedicated number of hash buckets as shown below. Note that
"A(2B)" means a bucket includes 2 data blocks.
----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------
level #0 | A(2B)
|
level #1 | A(2B) - A(2B)
|
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
. | . . . .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
The number of blocks and buckets are determined by,
,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
# of blocks in level #n = |
`- 4, Otherwise
,- 2^n, if n < MAX_DIR_HASH_DEPTH / 2,
# of buckets in level #n = |
`- 2^((MAX_DIR_HASH_DEPTH / 2) - 1), Otherwise
When F2FS finds a file name in a directory, at first a hash value of the file
name is calculated. Then, F2FS scans the hash table in level #0 to find the
dentry consisting of the file name and its inode number. If not found, F2FS
scans the next hash table in level #1. In this way, F2FS scans hash tables in
each levels incrementally from 1 to N. In each levels F2FS needs to scan only
one bucket determined by the following equation, which shows O(log(# of files))
complexity.
bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
In the case of file creation, F2FS finds empty consecutive slots that cover the
file name. F2FS searches the empty slots in the hash tables of whole levels from
1 to N in the same way as the lookup operation.
The following figure shows an example of two cases holding children.
--------------> Dir <--------------
| |
child child
child - child [hole] - child
child - child - child [hole] - [hole] - child
Case 1: Case 2:
Number of children = 6, Number of children = 3,
File size = 7 File size = 7
Default Block Allocation
------------------------
At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
and Hot/Warm/Cold data.
- Hot node contains direct node blocks of directories.
- Warm node contains direct node blocks except hot node blocks.
- Cold node contains indirect node blocks
- Hot data contains dentry blocks
- Warm data contains data blocks except hot and cold data blocks
- Cold data contains multimedia data or migrated data blocks
LFS has two schemes for free space management: threaded log and copy-and-compac-
tion. The copy-and-compaction scheme which is known as cleaning, is well-suited
for devices showing very good sequential write performance, since free segments
are served all the time for writing new data. However, it suffers from cleaning
overhead under high utilization. Contrarily, the threaded log scheme suffers
from random writes, but no cleaning process is needed. F2FS adopts a hybrid
scheme where the copy-and-compaction scheme is adopted by default, but the
policy is dynamically changed to the threaded log scheme according to the file
system status.
In order to align F2FS with underlying flash-based storage, F2FS allocates a
segment in a unit of section. F2FS expects that the section size would be the
same as the unit size of garbage collection in FTL. Furthermore, with respect
to the mapping granularity in FTL, F2FS allocates each section of the active
logs from different zones as much as possible, since FTL can write the data in
the active logs into one allocation unit according to its mapping granularity.
Cleaning process
----------------
F2FS does cleaning both on demand and in the background. On-demand cleaning is
triggered when there are not enough free segments to serve VFS calls. Background
cleaner is operated by a kernel thread, and triggers the cleaning job when the
system is idle.
F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
In the greedy algorithm, F2FS selects a victim segment having the smallest number
of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
according to the segment age and the number of valid blocks in order to address
log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
algorithm for on-demand cleaner, while background cleaner adopts cost-benefit
algorithm.
In order to identify whether the data in the victim segment are valid or not,
F2FS manages a bitmap. Each bit represents the validity of a block, and the
bitmap is composed of a bit stream covering whole blocks in main area.

View file

@ -220,6 +220,7 @@ source "fs/pstore/Kconfig"
source "fs/sysv/Kconfig"
source "fs/ufs/Kconfig"
source "fs/exofs/Kconfig"
source "fs/f2fs/Kconfig"
endif # MISC_FILESYSTEMS

View file

@ -123,6 +123,7 @@ obj-$(CONFIG_DEBUG_FS) += debugfs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_BTRFS_FS) += btrfs/
obj-$(CONFIG_GFS2_FS) += gfs2/
obj-$(CONFIG_F2FS_FS) += f2fs/
obj-y += exofs/ # Multiple modules
obj-$(CONFIG_CEPH_FS) += ceph/
obj-$(CONFIG_PSTORE) += pstore/

53
fs/f2fs/Kconfig Normal file
View file

@ -0,0 +1,53 @@
config F2FS_FS
tristate "F2FS filesystem support (EXPERIMENTAL)"
depends on BLOCK
help
F2FS is based on Log-structured File System (LFS), which supports
versatile "flash-friendly" features. The design has been focused on
addressing the fundamental issues in LFS, which are snowball effect
of wandering tree and high cleaning overhead.
Since flash-based storages show different characteristics according to
the internal geometry or flash memory management schemes aka FTL, F2FS
and tools support various parameters not only for configuring on-disk
layout, but also for selecting allocation and cleaning algorithms.
If unsure, say N.
config F2FS_STAT_FS
bool "F2FS Status Information"
depends on F2FS_FS && DEBUG_FS
default y
help
/sys/kernel/debug/f2fs/ contains information about all the partitions
mounted as f2fs. Each file shows the whole f2fs information.
/sys/kernel/debug/f2fs/status includes:
- major file system information managed by f2fs currently
- average SIT information about whole segments
- current memory footprint consumed by f2fs.
config F2FS_FS_XATTR
bool "F2FS extended attributes"
depends on F2FS_FS
default y
help
Extended attributes are name:value pairs associated with inodes by
the kernel or by users (see the attr(5) manual page, or visit
<http://acl.bestbits.at/> for details).
If unsure, say N.
config F2FS_FS_POSIX_ACL
bool "F2FS Access Control Lists"
depends on F2FS_FS_XATTR
select FS_POSIX_ACL
default y
help
Posix Access Control Lists (ACLs) support permissions for users and
gourps beyond the owner/group/world scheme.
To learn more about Access Control Lists, visit the POSIX ACLs for
Linux website <http://acl.bestbits.at/>.
If you don't know what Access Control Lists are, say N

7
fs/f2fs/Makefile Normal file
View file

@ -0,0 +1,7 @@
obj-$(CONFIG_F2FS_FS) += f2fs.o
f2fs-y := dir.o file.o inode.o namei.o hash.o super.o
f2fs-y += checkpoint.o gc.o data.o node.o segment.o recovery.o
f2fs-$(CONFIG_F2FS_STAT_FS) += debug.o
f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o

414
fs/f2fs/acl.c Normal file
View file

@ -0,0 +1,414 @@
/*
* fs/f2fs/acl.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* Portions of this code from linux/fs/ext2/acl.c
*
* Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/f2fs_fs.h>
#include "f2fs.h"
#include "xattr.h"
#include "acl.h"
#define get_inode_mode(i) ((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? \
(F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
static inline size_t f2fs_acl_size(int count)
{
if (count <= 4) {
return sizeof(struct f2fs_acl_header) +
count * sizeof(struct f2fs_acl_entry_short);
} else {
return sizeof(struct f2fs_acl_header) +
4 * sizeof(struct f2fs_acl_entry_short) +
(count - 4) * sizeof(struct f2fs_acl_entry);
}
}
static inline int f2fs_acl_count(size_t size)
{
ssize_t s;
size -= sizeof(struct f2fs_acl_header);
s = size - 4 * sizeof(struct f2fs_acl_entry_short);
if (s < 0) {
if (size % sizeof(struct f2fs_acl_entry_short))
return -1;
return size / sizeof(struct f2fs_acl_entry_short);
} else {
if (s % sizeof(struct f2fs_acl_entry))
return -1;
return s / sizeof(struct f2fs_acl_entry) + 4;
}
}
static struct posix_acl *f2fs_acl_from_disk(const char *value, size_t size)
{
int i, count;
struct posix_acl *acl;
struct f2fs_acl_header *hdr = (struct f2fs_acl_header *)value;
struct f2fs_acl_entry *entry = (struct f2fs_acl_entry *)(hdr + 1);
const char *end = value + size;
if (hdr->a_version != cpu_to_le32(F2FS_ACL_VERSION))
return ERR_PTR(-EINVAL);
count = f2fs_acl_count(size);
if (count < 0)
return ERR_PTR(-EINVAL);
if (count == 0)
return NULL;
acl = posix_acl_alloc(count, GFP_KERNEL);
if (!acl)
return ERR_PTR(-ENOMEM);
for (i = 0; i < count; i++) {
if ((char *)entry > end)
goto fail;
acl->a_entries[i].e_tag = le16_to_cpu(entry->e_tag);
acl->a_entries[i].e_perm = le16_to_cpu(entry->e_perm);
switch (acl->a_entries[i].e_tag) {
case ACL_USER_OBJ:
case ACL_GROUP_OBJ:
case ACL_MASK:
case ACL_OTHER:
acl->a_entries[i].e_id = ACL_UNDEFINED_ID;
entry = (struct f2fs_acl_entry *)((char *)entry +
sizeof(struct f2fs_acl_entry_short));
break;
case ACL_USER:
acl->a_entries[i].e_uid =
make_kuid(&init_user_ns,
le32_to_cpu(entry->e_id));
entry = (struct f2fs_acl_entry *)((char *)entry +
sizeof(struct f2fs_acl_entry));
break;
case ACL_GROUP:
acl->a_entries[i].e_gid =
make_kgid(&init_user_ns,
le32_to_cpu(entry->e_id));
entry = (struct f2fs_acl_entry *)((char *)entry +
sizeof(struct f2fs_acl_entry));
break;
default:
goto fail;
}
}
if ((char *)entry != end)
goto fail;
return acl;
fail:
posix_acl_release(acl);
return ERR_PTR(-EINVAL);
}
static void *f2fs_acl_to_disk(const struct posix_acl *acl, size_t *size)
{
struct f2fs_acl_header *f2fs_acl;
struct f2fs_acl_entry *entry;
int i;
f2fs_acl = kmalloc(sizeof(struct f2fs_acl_header) + acl->a_count *
sizeof(struct f2fs_acl_entry), GFP_KERNEL);
if (!f2fs_acl)
return ERR_PTR(-ENOMEM);
f2fs_acl->a_version = cpu_to_le32(F2FS_ACL_VERSION);
entry = (struct f2fs_acl_entry *)(f2fs_acl + 1);
for (i = 0; i < acl->a_count; i++) {
entry->e_tag = cpu_to_le16(acl->a_entries[i].e_tag);
entry->e_perm = cpu_to_le16(acl->a_entries[i].e_perm);
switch (acl->a_entries[i].e_tag) {
case ACL_USER:
entry->e_id = cpu_to_le32(
from_kuid(&init_user_ns,
acl->a_entries[i].e_uid));
entry = (struct f2fs_acl_entry *)((char *)entry +
sizeof(struct f2fs_acl_entry));
break;
case ACL_GROUP:
entry->e_id = cpu_to_le32(
from_kgid(&init_user_ns,
acl->a_entries[i].e_gid));
entry = (struct f2fs_acl_entry *)((char *)entry +
sizeof(struct f2fs_acl_entry));
break;
case ACL_USER_OBJ:
case ACL_GROUP_OBJ:
case ACL_MASK:
case ACL_OTHER:
entry = (struct f2fs_acl_entry *)((char *)entry +
sizeof(struct f2fs_acl_entry_short));
break;
default:
goto fail;
}
}
*size = f2fs_acl_size(acl->a_count);
return (void *)f2fs_acl;
fail:
kfree(f2fs_acl);
return ERR_PTR(-EINVAL);
}
struct posix_acl *f2fs_get_acl(struct inode *inode, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
int name_index = F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT;
void *value = NULL;
struct posix_acl *acl;
int retval;
if (!test_opt(sbi, POSIX_ACL))
return NULL;
acl = get_cached_acl(inode, type);
if (acl != ACL_NOT_CACHED)
return acl;
if (type == ACL_TYPE_ACCESS)
name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
retval = f2fs_getxattr(inode, name_index, "", NULL, 0);
if (retval > 0) {
value = kmalloc(retval, GFP_KERNEL);
if (!value)
return ERR_PTR(-ENOMEM);
retval = f2fs_getxattr(inode, name_index, "", value, retval);
}
if (retval < 0) {
if (retval == -ENODATA)
acl = NULL;
else
acl = ERR_PTR(retval);
} else {
acl = f2fs_acl_from_disk(value, retval);
}
kfree(value);
if (!IS_ERR(acl))
set_cached_acl(inode, type, acl);
return acl;
}
static int f2fs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct f2fs_inode_info *fi = F2FS_I(inode);
int name_index;
void *value = NULL;
size_t size = 0;
int error;
if (!test_opt(sbi, POSIX_ACL))
return 0;
if (S_ISLNK(inode->i_mode))
return -EOPNOTSUPP;
switch (type) {
case ACL_TYPE_ACCESS:
name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
if (acl) {
error = posix_acl_equiv_mode(acl, &inode->i_mode);
if (error < 0)
return error;
set_acl_inode(fi, inode->i_mode);
if (error == 0)
acl = NULL;
}
break;
case ACL_TYPE_DEFAULT:
name_index = F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT;
if (!S_ISDIR(inode->i_mode))
return acl ? -EACCES : 0;
break;
default:
return -EINVAL;
}
if (acl) {
value = f2fs_acl_to_disk(acl, &size);
if (IS_ERR(value)) {
cond_clear_inode_flag(fi, FI_ACL_MODE);
return (int)PTR_ERR(value);
}
}
error = f2fs_setxattr(inode, name_index, "", value, size);
kfree(value);
if (!error)
set_cached_acl(inode, type, acl);
cond_clear_inode_flag(fi, FI_ACL_MODE);
return error;
}
int f2fs_init_acl(struct inode *inode, struct inode *dir)
{
struct posix_acl *acl = NULL;
struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
int error = 0;
if (!S_ISLNK(inode->i_mode)) {
if (test_opt(sbi, POSIX_ACL)) {
acl = f2fs_get_acl(dir, ACL_TYPE_DEFAULT);
if (IS_ERR(acl))
return PTR_ERR(acl);
}
if (!acl)
inode->i_mode &= ~current_umask();
}
if (test_opt(sbi, POSIX_ACL) && acl) {
if (S_ISDIR(inode->i_mode)) {
error = f2fs_set_acl(inode, ACL_TYPE_DEFAULT, acl);
if (error)
goto cleanup;
}
error = posix_acl_create(&acl, GFP_KERNEL, &inode->i_mode);
if (error < 0)
return error;
if (error > 0)
error = f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl);
}
cleanup:
posix_acl_release(acl);
return error;
}
int f2fs_acl_chmod(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct posix_acl *acl;
int error;
mode_t mode = get_inode_mode(inode);
if (!test_opt(sbi, POSIX_ACL))
return 0;
if (S_ISLNK(mode))
return -EOPNOTSUPP;
acl = f2fs_get_acl(inode, ACL_TYPE_ACCESS);
if (IS_ERR(acl) || !acl)
return PTR_ERR(acl);
error = posix_acl_chmod(&acl, GFP_KERNEL, mode);
if (error)
return error;
error = f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl);
posix_acl_release(acl);
return error;
}
static size_t f2fs_xattr_list_acl(struct dentry *dentry, char *list,
size_t list_size, const char *name, size_t name_len, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
const char *xname = POSIX_ACL_XATTR_DEFAULT;
size_t size;
if (!test_opt(sbi, POSIX_ACL))
return 0;
if (type == ACL_TYPE_ACCESS)
xname = POSIX_ACL_XATTR_ACCESS;
size = strlen(xname) + 1;
if (list && size <= list_size)
memcpy(list, xname, size);
return size;
}
static int f2fs_xattr_get_acl(struct dentry *dentry, const char *name,
void *buffer, size_t size, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
struct posix_acl *acl;
int error;
if (strcmp(name, "") != 0)
return -EINVAL;
if (!test_opt(sbi, POSIX_ACL))
return -EOPNOTSUPP;
acl = f2fs_get_acl(dentry->d_inode, type);
if (IS_ERR(acl))
return PTR_ERR(acl);
if (!acl)
return -ENODATA;
error = posix_acl_to_xattr(&init_user_ns, acl, buffer, size);
posix_acl_release(acl);
return error;
}
static int f2fs_xattr_set_acl(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
struct inode *inode = dentry->d_inode;
struct posix_acl *acl = NULL;
int error;
if (strcmp(name, "") != 0)
return -EINVAL;
if (!test_opt(sbi, POSIX_ACL))
return -EOPNOTSUPP;
if (!inode_owner_or_capable(inode))
return -EPERM;
if (value) {
acl = posix_acl_from_xattr(&init_user_ns, value, size);
if (IS_ERR(acl))
return PTR_ERR(acl);
if (acl) {
error = posix_acl_valid(acl);
if (error)
goto release_and_out;
}
} else {
acl = NULL;
}
error = f2fs_set_acl(inode, type, acl);
release_and_out:
posix_acl_release(acl);
return error;
}
const struct xattr_handler f2fs_xattr_acl_default_handler = {
.prefix = POSIX_ACL_XATTR_DEFAULT,
.flags = ACL_TYPE_DEFAULT,
.list = f2fs_xattr_list_acl,
.get = f2fs_xattr_get_acl,
.set = f2fs_xattr_set_acl,
};
const struct xattr_handler f2fs_xattr_acl_access_handler = {
.prefix = POSIX_ACL_XATTR_ACCESS,
.flags = ACL_TYPE_ACCESS,
.list = f2fs_xattr_list_acl,
.get = f2fs_xattr_get_acl,
.set = f2fs_xattr_set_acl,
};

57
fs/f2fs/acl.h Normal file
View file

@ -0,0 +1,57 @@
/*
* fs/f2fs/acl.h
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* Portions of this code from linux/fs/ext2/acl.h
*
* Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#ifndef __F2FS_ACL_H__
#define __F2FS_ACL_H__
#include <linux/posix_acl_xattr.h>
#define F2FS_ACL_VERSION 0x0001
struct f2fs_acl_entry {
__le16 e_tag;
__le16 e_perm;
__le32 e_id;
};
struct f2fs_acl_entry_short {
__le16 e_tag;
__le16 e_perm;
};
struct f2fs_acl_header {
__le32 a_version;
};
#ifdef CONFIG_F2FS_FS_POSIX_ACL
extern struct posix_acl *f2fs_get_acl(struct inode *inode, int type);
extern int f2fs_acl_chmod(struct inode *inode);
extern int f2fs_init_acl(struct inode *inode, struct inode *dir);
#else
#define f2fs_check_acl NULL
#define f2fs_get_acl NULL
#define f2fs_set_acl NULL
static inline int f2fs_acl_chmod(struct inode *inode)
{
return 0;
}
static inline int f2fs_init_acl(struct inode *inode, struct inode *dir)
{
return 0;
}
#endif
#endif /* __F2FS_ACL_H__ */

794
fs/f2fs/checkpoint.c Normal file
View file

@ -0,0 +1,794 @@
/*
* fs/f2fs/checkpoint.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/bio.h>
#include <linux/mpage.h>
#include <linux/writeback.h>
#include <linux/blkdev.h>
#include <linux/f2fs_fs.h>
#include <linux/pagevec.h>
#include <linux/swap.h>
#include "f2fs.h"
#include "node.h"
#include "segment.h"
static struct kmem_cache *orphan_entry_slab;
static struct kmem_cache *inode_entry_slab;
/*
* We guarantee no failure on the returned page.
*/
struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
{
struct address_space *mapping = sbi->meta_inode->i_mapping;
struct page *page = NULL;
repeat:
page = grab_cache_page(mapping, index);
if (!page) {
cond_resched();
goto repeat;
}
/* We wait writeback only inside grab_meta_page() */
wait_on_page_writeback(page);
SetPageUptodate(page);
return page;
}
/*
* We guarantee no failure on the returned page.
*/
struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
{
struct address_space *mapping = sbi->meta_inode->i_mapping;
struct page *page;
repeat:
page = grab_cache_page(mapping, index);
if (!page) {
cond_resched();
goto repeat;
}
if (f2fs_readpage(sbi, page, index, READ_SYNC)) {
f2fs_put_page(page, 1);
goto repeat;
}
mark_page_accessed(page);
/* We do not allow returning an errorneous page */
return page;
}
static int f2fs_write_meta_page(struct page *page,
struct writeback_control *wbc)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
int err;
wait_on_page_writeback(page);
err = write_meta_page(sbi, page, wbc);
if (err) {
wbc->pages_skipped++;
set_page_dirty(page);
}
dec_page_count(sbi, F2FS_DIRTY_META);
/* In this case, we should not unlock this page */
if (err != AOP_WRITEPAGE_ACTIVATE)
unlock_page(page);
return err;
}
static int f2fs_write_meta_pages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
struct block_device *bdev = sbi->sb->s_bdev;
long written;
if (wbc->for_kupdate)
return 0;
if (get_pages(sbi, F2FS_DIRTY_META) == 0)
return 0;
/* if mounting is failed, skip writing node pages */
mutex_lock(&sbi->cp_mutex);
written = sync_meta_pages(sbi, META, bio_get_nr_vecs(bdev));
mutex_unlock(&sbi->cp_mutex);
wbc->nr_to_write -= written;
return 0;
}
long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
long nr_to_write)
{
struct address_space *mapping = sbi->meta_inode->i_mapping;
pgoff_t index = 0, end = LONG_MAX;
struct pagevec pvec;
long nwritten = 0;
struct writeback_control wbc = {
.for_reclaim = 0,
};
pagevec_init(&pvec, 0);
while (index <= end) {
int i, nr_pages;
nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
PAGECACHE_TAG_DIRTY,
min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
if (nr_pages == 0)
break;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
lock_page(page);
BUG_ON(page->mapping != mapping);
BUG_ON(!PageDirty(page));
clear_page_dirty_for_io(page);
f2fs_write_meta_page(page, &wbc);
if (nwritten++ >= nr_to_write)
break;
}
pagevec_release(&pvec);
cond_resched();
}
if (nwritten)
f2fs_submit_bio(sbi, type, nr_to_write == LONG_MAX);
return nwritten;
}
static int f2fs_set_meta_page_dirty(struct page *page)
{
struct address_space *mapping = page->mapping;
struct f2fs_sb_info *sbi = F2FS_SB(mapping->host->i_sb);
SetPageUptodate(page);
if (!PageDirty(page)) {
__set_page_dirty_nobuffers(page);
inc_page_count(sbi, F2FS_DIRTY_META);
F2FS_SET_SB_DIRT(sbi);
return 1;
}
return 0;
}
const struct address_space_operations f2fs_meta_aops = {
.writepage = f2fs_write_meta_page,
.writepages = f2fs_write_meta_pages,
.set_page_dirty = f2fs_set_meta_page_dirty,
};
int check_orphan_space(struct f2fs_sb_info *sbi)
{
unsigned int max_orphans;
int err = 0;
/*
* considering 512 blocks in a segment 5 blocks are needed for cp
* and log segment summaries. Remaining blocks are used to keep
* orphan entries with the limitation one reserved segment
* for cp pack we can have max 1020*507 orphan entries
*/
max_orphans = (sbi->blocks_per_seg - 5) * F2FS_ORPHANS_PER_BLOCK;
mutex_lock(&sbi->orphan_inode_mutex);
if (sbi->n_orphans >= max_orphans)
err = -ENOSPC;
mutex_unlock(&sbi->orphan_inode_mutex);
return err;
}
void add_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
{
struct list_head *head, *this;
struct orphan_inode_entry *new = NULL, *orphan = NULL;
mutex_lock(&sbi->orphan_inode_mutex);
head = &sbi->orphan_inode_list;
list_for_each(this, head) {
orphan = list_entry(this, struct orphan_inode_entry, list);
if (orphan->ino == ino)
goto out;
if (orphan->ino > ino)
break;
orphan = NULL;
}
retry:
new = kmem_cache_alloc(orphan_entry_slab, GFP_ATOMIC);
if (!new) {
cond_resched();
goto retry;
}
new->ino = ino;
INIT_LIST_HEAD(&new->list);
/* add new_oentry into list which is sorted by inode number */
if (orphan) {
struct orphan_inode_entry *prev;
/* get previous entry */
prev = list_entry(orphan->list.prev, typeof(*prev), list);
if (&prev->list != head)
/* insert new orphan inode entry */
list_add(&new->list, &prev->list);
else
list_add(&new->list, head);
} else {
list_add_tail(&new->list, head);
}
sbi->n_orphans++;
out:
mutex_unlock(&sbi->orphan_inode_mutex);
}
void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
{
struct list_head *this, *next, *head;
struct orphan_inode_entry *orphan;
mutex_lock(&sbi->orphan_inode_mutex);
head = &sbi->orphan_inode_list;
list_for_each_safe(this, next, head) {
orphan = list_entry(this, struct orphan_inode_entry, list);
if (orphan->ino == ino) {
list_del(&orphan->list);
kmem_cache_free(orphan_entry_slab, orphan);
sbi->n_orphans--;
break;
}
}
mutex_unlock(&sbi->orphan_inode_mutex);
}
static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
{
struct inode *inode = f2fs_iget(sbi->sb, ino);
BUG_ON(IS_ERR(inode));
clear_nlink(inode);
/* truncate all the data during iput */
iput(inode);
}
int recover_orphan_inodes(struct f2fs_sb_info *sbi)
{
block_t start_blk, orphan_blkaddr, i, j;
if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG))
return 0;
sbi->por_doing = 1;
start_blk = __start_cp_addr(sbi) + 1;
orphan_blkaddr = __start_sum_addr(sbi) - 1;
for (i = 0; i < orphan_blkaddr; i++) {
struct page *page = get_meta_page(sbi, start_blk + i);
struct f2fs_orphan_block *orphan_blk;
orphan_blk = (struct f2fs_orphan_block *)page_address(page);
for (j = 0; j < le32_to_cpu(orphan_blk->entry_count); j++) {
nid_t ino = le32_to_cpu(orphan_blk->ino[j]);
recover_orphan_inode(sbi, ino);
}
f2fs_put_page(page, 1);
}
/* clear Orphan Flag */
clear_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG);
sbi->por_doing = 0;
return 0;
}
static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
{
struct list_head *head, *this, *next;
struct f2fs_orphan_block *orphan_blk = NULL;
struct page *page = NULL;
unsigned int nentries = 0;
unsigned short index = 1;
unsigned short orphan_blocks;
orphan_blocks = (unsigned short)((sbi->n_orphans +
(F2FS_ORPHANS_PER_BLOCK - 1)) / F2FS_ORPHANS_PER_BLOCK);
mutex_lock(&sbi->orphan_inode_mutex);
head = &sbi->orphan_inode_list;
/* loop for each orphan inode entry and write them in Jornal block */
list_for_each_safe(this, next, head) {
struct orphan_inode_entry *orphan;
orphan = list_entry(this, struct orphan_inode_entry, list);
if (nentries == F2FS_ORPHANS_PER_BLOCK) {
/*
* an orphan block is full of 1020 entries,
* then we need to flush current orphan blocks
* and bring another one in memory
*/
orphan_blk->blk_addr = cpu_to_le16(index);
orphan_blk->blk_count = cpu_to_le16(orphan_blocks);
orphan_blk->entry_count = cpu_to_le32(nentries);
set_page_dirty(page);
f2fs_put_page(page, 1);
index++;
start_blk++;
nentries = 0;
page = NULL;
}
if (page)
goto page_exist;
page = grab_meta_page(sbi, start_blk);
orphan_blk = (struct f2fs_orphan_block *)page_address(page);
memset(orphan_blk, 0, sizeof(*orphan_blk));
page_exist:
orphan_blk->ino[nentries++] = cpu_to_le32(orphan->ino);
}
if (!page)
goto end;
orphan_blk->blk_addr = cpu_to_le16(index);
orphan_blk->blk_count = cpu_to_le16(orphan_blocks);
orphan_blk->entry_count = cpu_to_le32(nentries);
set_page_dirty(page);
f2fs_put_page(page, 1);
end:
mutex_unlock(&sbi->orphan_inode_mutex);
}
static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
block_t cp_addr, unsigned long long *version)
{
struct page *cp_page_1, *cp_page_2 = NULL;
unsigned long blk_size = sbi->blocksize;
struct f2fs_checkpoint *cp_block;
unsigned long long cur_version = 0, pre_version = 0;
unsigned int crc = 0;
size_t crc_offset;
/* Read the 1st cp block in this CP pack */
cp_page_1 = get_meta_page(sbi, cp_addr);
/* get the version number */
cp_block = (struct f2fs_checkpoint *)page_address(cp_page_1);
crc_offset = le32_to_cpu(cp_block->checksum_offset);
if (crc_offset >= blk_size)
goto invalid_cp1;
crc = *(unsigned int *)((unsigned char *)cp_block + crc_offset);
if (!f2fs_crc_valid(crc, cp_block, crc_offset))
goto invalid_cp1;
pre_version = le64_to_cpu(cp_block->checkpoint_ver);
/* Read the 2nd cp block in this CP pack */
cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
cp_page_2 = get_meta_page(sbi, cp_addr);
cp_block = (struct f2fs_checkpoint *)page_address(cp_page_2);
crc_offset = le32_to_cpu(cp_block->checksum_offset);
if (crc_offset >= blk_size)
goto invalid_cp2;
crc = *(unsigned int *)((unsigned char *)cp_block + crc_offset);
if (!f2fs_crc_valid(crc, cp_block, crc_offset))
goto invalid_cp2;
cur_version = le64_to_cpu(cp_block->checkpoint_ver);
if (cur_version == pre_version) {
*version = cur_version;
f2fs_put_page(cp_page_2, 1);
return cp_page_1;
}
invalid_cp2:
f2fs_put_page(cp_page_2, 1);
invalid_cp1:
f2fs_put_page(cp_page_1, 1);
return NULL;
}
int get_valid_checkpoint(struct f2fs_sb_info *sbi)
{
struct f2fs_checkpoint *cp_block;
struct f2fs_super_block *fsb = sbi->raw_super;
struct page *cp1, *cp2, *cur_page;
unsigned long blk_size = sbi->blocksize;
unsigned long long cp1_version = 0, cp2_version = 0;
unsigned long long cp_start_blk_no;
sbi->ckpt = kzalloc(blk_size, GFP_KERNEL);
if (!sbi->ckpt)
return -ENOMEM;
/*
* Finding out valid cp block involves read both
* sets( cp pack1 and cp pack 2)
*/
cp_start_blk_no = le32_to_cpu(fsb->cp_blkaddr);
cp1 = validate_checkpoint(sbi, cp_start_blk_no, &cp1_version);
/* The second checkpoint pack should start at the next segment */
cp_start_blk_no += 1 << le32_to_cpu(fsb->log_blocks_per_seg);
cp2 = validate_checkpoint(sbi, cp_start_blk_no, &cp2_version);
if (cp1 && cp2) {
if (ver_after(cp2_version, cp1_version))
cur_page = cp2;
else
cur_page = cp1;
} else if (cp1) {
cur_page = cp1;
} else if (cp2) {
cur_page = cp2;
} else {
goto fail_no_cp;
}
cp_block = (struct f2fs_checkpoint *)page_address(cur_page);
memcpy(sbi->ckpt, cp_block, blk_size);
f2fs_put_page(cp1, 1);
f2fs_put_page(cp2, 1);
return 0;
fail_no_cp:
kfree(sbi->ckpt);
return -EINVAL;
}
void set_dirty_dir_page(struct inode *inode, struct page *page)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct list_head *head = &sbi->dir_inode_list;
struct dir_inode_entry *new;
struct list_head *this;
if (!S_ISDIR(inode->i_mode))
return;
retry:
new = kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
if (!new) {
cond_resched();
goto retry;
}
new->inode = inode;
INIT_LIST_HEAD(&new->list);
spin_lock(&sbi->dir_inode_lock);
list_for_each(this, head) {
struct dir_inode_entry *entry;
entry = list_entry(this, struct dir_inode_entry, list);
if (entry->inode == inode) {
kmem_cache_free(inode_entry_slab, new);
goto out;
}
}
list_add_tail(&new->list, head);
sbi->n_dirty_dirs++;
BUG_ON(!S_ISDIR(inode->i_mode));
out:
inc_page_count(sbi, F2FS_DIRTY_DENTS);
inode_inc_dirty_dents(inode);
SetPagePrivate(page);
spin_unlock(&sbi->dir_inode_lock);
}
void remove_dirty_dir_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct list_head *head = &sbi->dir_inode_list;
struct list_head *this;
if (!S_ISDIR(inode->i_mode))
return;
spin_lock(&sbi->dir_inode_lock);
if (atomic_read(&F2FS_I(inode)->dirty_dents))
goto out;
list_for_each(this, head) {
struct dir_inode_entry *entry;
entry = list_entry(this, struct dir_inode_entry, list);
if (entry->inode == inode) {
list_del(&entry->list);
kmem_cache_free(inode_entry_slab, entry);
sbi->n_dirty_dirs--;
break;
}
}
out:
spin_unlock(&sbi->dir_inode_lock);
}
void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
{
struct list_head *head = &sbi->dir_inode_list;
struct dir_inode_entry *entry;
struct inode *inode;
retry:
spin_lock(&sbi->dir_inode_lock);
if (list_empty(head)) {
spin_unlock(&sbi->dir_inode_lock);
return;
}
entry = list_entry(head->next, struct dir_inode_entry, list);
inode = igrab(entry->inode);
spin_unlock(&sbi->dir_inode_lock);
if (inode) {
filemap_flush(inode->i_mapping);
iput(inode);
} else {
/*
* We should submit bio, since it exists several
* wribacking dentry pages in the freeing inode.
*/
f2fs_submit_bio(sbi, DATA, true);
}
goto retry;
}
/*
* Freeze all the FS-operations for checkpoint.
*/
void block_operations(struct f2fs_sb_info *sbi)
{
int t;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
.for_reclaim = 0,
};
/* Stop renaming operation */
mutex_lock_op(sbi, RENAME);
mutex_lock_op(sbi, DENTRY_OPS);
retry_dents:
/* write all the dirty dentry pages */
sync_dirty_dir_inodes(sbi);
mutex_lock_op(sbi, DATA_WRITE);
if (get_pages(sbi, F2FS_DIRTY_DENTS)) {
mutex_unlock_op(sbi, DATA_WRITE);
goto retry_dents;
}
/* block all the operations */
for (t = DATA_NEW; t <= NODE_TRUNC; t++)
mutex_lock_op(sbi, t);
mutex_lock(&sbi->write_inode);
/*
* POR: we should ensure that there is no dirty node pages
* until finishing nat/sit flush.
*/
retry:
sync_node_pages(sbi, 0, &wbc);
mutex_lock_op(sbi, NODE_WRITE);
if (get_pages(sbi, F2FS_DIRTY_NODES)) {
mutex_unlock_op(sbi, NODE_WRITE);
goto retry;
}
mutex_unlock(&sbi->write_inode);
}
static void unblock_operations(struct f2fs_sb_info *sbi)
{
int t;
for (t = NODE_WRITE; t >= RENAME; t--)
mutex_unlock_op(sbi, t);
}
static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
nid_t last_nid = 0;
block_t start_blk;
struct page *cp_page;
unsigned int data_sum_blocks, orphan_blocks;
unsigned int crc32 = 0;
void *kaddr;
int i;
/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META))
sync_meta_pages(sbi, META, LONG_MAX);
next_free_nid(sbi, &last_nid);
/*
* modify checkpoint
* version number is already updated
*/
ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi));
ckpt->valid_block_count = cpu_to_le64(valid_user_blocks(sbi));
ckpt->free_segment_count = cpu_to_le32(free_segments(sbi));
for (i = 0; i < 3; i++) {
ckpt->cur_node_segno[i] =
cpu_to_le32(curseg_segno(sbi, i + CURSEG_HOT_NODE));
ckpt->cur_node_blkoff[i] =
cpu_to_le16(curseg_blkoff(sbi, i + CURSEG_HOT_NODE));
ckpt->alloc_type[i + CURSEG_HOT_NODE] =
curseg_alloc_type(sbi, i + CURSEG_HOT_NODE);
}
for (i = 0; i < 3; i++) {
ckpt->cur_data_segno[i] =
cpu_to_le32(curseg_segno(sbi, i + CURSEG_HOT_DATA));
ckpt->cur_data_blkoff[i] =
cpu_to_le16(curseg_blkoff(sbi, i + CURSEG_HOT_DATA));
ckpt->alloc_type[i + CURSEG_HOT_DATA] =
curseg_alloc_type(sbi, i + CURSEG_HOT_DATA);
}
ckpt->valid_node_count = cpu_to_le32(valid_node_count(sbi));
ckpt->valid_inode_count = cpu_to_le32(valid_inode_count(sbi));
ckpt->next_free_nid = cpu_to_le32(last_nid);
/* 2 cp + n data seg summary + orphan inode blocks */
data_sum_blocks = npages_for_summary_flush(sbi);
if (data_sum_blocks < 3)
set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
else
clear_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
orphan_blocks = (sbi->n_orphans + F2FS_ORPHANS_PER_BLOCK - 1)
/ F2FS_ORPHANS_PER_BLOCK;
ckpt->cp_pack_start_sum = cpu_to_le32(1 + orphan_blocks);
if (is_umount) {
set_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
ckpt->cp_pack_total_block_count = cpu_to_le32(2 +
data_sum_blocks + orphan_blocks + NR_CURSEG_NODE_TYPE);
} else {
clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
ckpt->cp_pack_total_block_count = cpu_to_le32(2 +
data_sum_blocks + orphan_blocks);
}
if (sbi->n_orphans)
set_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
else
clear_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
/* update SIT/NAT bitmap */
get_sit_bitmap(sbi, __bitmap_ptr(sbi, SIT_BITMAP));
get_nat_bitmap(sbi, __bitmap_ptr(sbi, NAT_BITMAP));
crc32 = f2fs_crc32(ckpt, le32_to_cpu(ckpt->checksum_offset));
*(__le32 *)((unsigned char *)ckpt +
le32_to_cpu(ckpt->checksum_offset))
= cpu_to_le32(crc32);
start_blk = __start_cp_addr(sbi);
/* write out checkpoint buffer at block 0 */
cp_page = grab_meta_page(sbi, start_blk++);
kaddr = page_address(cp_page);
memcpy(kaddr, ckpt, (1 << sbi->log_blocksize));
set_page_dirty(cp_page);
f2fs_put_page(cp_page, 1);
if (sbi->n_orphans) {
write_orphan_inodes(sbi, start_blk);
start_blk += orphan_blocks;
}
write_data_summaries(sbi, start_blk);
start_blk += data_sum_blocks;
if (is_umount) {
write_node_summaries(sbi, start_blk);
start_blk += NR_CURSEG_NODE_TYPE;
}
/* writeout checkpoint block */
cp_page = grab_meta_page(sbi, start_blk);
kaddr = page_address(cp_page);
memcpy(kaddr, ckpt, (1 << sbi->log_blocksize));
set_page_dirty(cp_page);
f2fs_put_page(cp_page, 1);
/* wait for previous submitted node/meta pages writeback */
while (get_pages(sbi, F2FS_WRITEBACK))
congestion_wait(BLK_RW_ASYNC, HZ / 50);
filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
/* update user_block_counts */
sbi->last_valid_block_count = sbi->total_valid_block_count;
sbi->alloc_valid_block_count = 0;
/* Here, we only have one bio having CP pack */
if (is_set_ckpt_flags(ckpt, CP_ERROR_FLAG))
sbi->sb->s_flags |= MS_RDONLY;
else
sync_meta_pages(sbi, META_FLUSH, LONG_MAX);
clear_prefree_segments(sbi);
F2FS_RESET_SB_DIRT(sbi);
}
/*
* We guarantee that this checkpoint procedure should not fail.
*/
void write_checkpoint(struct f2fs_sb_info *sbi, bool blocked, bool is_umount)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
unsigned long long ckpt_ver;
if (!blocked) {
mutex_lock(&sbi->cp_mutex);
block_operations(sbi);
}
f2fs_submit_bio(sbi, DATA, true);
f2fs_submit_bio(sbi, NODE, true);
f2fs_submit_bio(sbi, META, true);
/*
* update checkpoint pack index
* Increase the version number so that
* SIT entries and seg summaries are written at correct place
*/
ckpt_ver = le64_to_cpu(ckpt->checkpoint_ver);
ckpt->checkpoint_ver = cpu_to_le64(++ckpt_ver);
/* write cached NAT/SIT entries to NAT/SIT area */
flush_nat_entries(sbi);
flush_sit_entries(sbi);
reset_victim_segmap(sbi);
/* unlock all the fs_lock[] in do_checkpoint() */
do_checkpoint(sbi, is_umount);
unblock_operations(sbi);
mutex_unlock(&sbi->cp_mutex);
}
void init_orphan_info(struct f2fs_sb_info *sbi)
{
mutex_init(&sbi->orphan_inode_mutex);
INIT_LIST_HEAD(&sbi->orphan_inode_list);
sbi->n_orphans = 0;
}
int create_checkpoint_caches(void)
{
orphan_entry_slab = f2fs_kmem_cache_create("f2fs_orphan_entry",
sizeof(struct orphan_inode_entry), NULL);
if (unlikely(!orphan_entry_slab))
return -ENOMEM;
inode_entry_slab = f2fs_kmem_cache_create("f2fs_dirty_dir_entry",
sizeof(struct dir_inode_entry), NULL);
if (unlikely(!inode_entry_slab)) {
kmem_cache_destroy(orphan_entry_slab);
return -ENOMEM;
}
return 0;
}
void destroy_checkpoint_caches(void)
{
kmem_cache_destroy(orphan_entry_slab);
kmem_cache_destroy(inode_entry_slab);
}

702
fs/f2fs/data.c Normal file
View file

@ -0,0 +1,702 @@
/*
* fs/f2fs/data.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/buffer_head.h>
#include <linux/mpage.h>
#include <linux/writeback.h>
#include <linux/backing-dev.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include "f2fs.h"
#include "node.h"
#include "segment.h"
/*
* Lock ordering for the change of data block address:
* ->data_page
* ->node_page
* update block addresses in the node page
*/
static void __set_data_blkaddr(struct dnode_of_data *dn, block_t new_addr)
{
struct f2fs_node *rn;
__le32 *addr_array;
struct page *node_page = dn->node_page;
unsigned int ofs_in_node = dn->ofs_in_node;
wait_on_page_writeback(node_page);
rn = (struct f2fs_node *)page_address(node_page);
/* Get physical address of data block */
addr_array = blkaddr_in_node(rn);
addr_array[ofs_in_node] = cpu_to_le32(new_addr);
set_page_dirty(node_page);
}
int reserve_new_block(struct dnode_of_data *dn)
{
struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
if (is_inode_flag_set(F2FS_I(dn->inode), FI_NO_ALLOC))
return -EPERM;
if (!inc_valid_block_count(sbi, dn->inode, 1))
return -ENOSPC;
__set_data_blkaddr(dn, NEW_ADDR);
dn->data_blkaddr = NEW_ADDR;
sync_inode_page(dn);
return 0;
}
static int check_extent_cache(struct inode *inode, pgoff_t pgofs,
struct buffer_head *bh_result)
{
struct f2fs_inode_info *fi = F2FS_I(inode);
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
pgoff_t start_fofs, end_fofs;
block_t start_blkaddr;
read_lock(&fi->ext.ext_lock);
if (fi->ext.len == 0) {
read_unlock(&fi->ext.ext_lock);
return 0;
}
sbi->total_hit_ext++;
start_fofs = fi->ext.fofs;
end_fofs = fi->ext.fofs + fi->ext.len - 1;
start_blkaddr = fi->ext.blk_addr;
if (pgofs >= start_fofs && pgofs <= end_fofs) {
unsigned int blkbits = inode->i_sb->s_blocksize_bits;
size_t count;
clear_buffer_new(bh_result);
map_bh(bh_result, inode->i_sb,
start_blkaddr + pgofs - start_fofs);
count = end_fofs - pgofs + 1;
if (count < (UINT_MAX >> blkbits))
bh_result->b_size = (count << blkbits);
else
bh_result->b_size = UINT_MAX;
sbi->read_hit_ext++;
read_unlock(&fi->ext.ext_lock);
return 1;
}
read_unlock(&fi->ext.ext_lock);
return 0;
}
void update_extent_cache(block_t blk_addr, struct dnode_of_data *dn)
{
struct f2fs_inode_info *fi = F2FS_I(dn->inode);
pgoff_t fofs, start_fofs, end_fofs;
block_t start_blkaddr, end_blkaddr;
BUG_ON(blk_addr == NEW_ADDR);
fofs = start_bidx_of_node(ofs_of_node(dn->node_page)) + dn->ofs_in_node;
/* Update the page address in the parent node */
__set_data_blkaddr(dn, blk_addr);
write_lock(&fi->ext.ext_lock);
start_fofs = fi->ext.fofs;
end_fofs = fi->ext.fofs + fi->ext.len - 1;
start_blkaddr = fi->ext.blk_addr;
end_blkaddr = fi->ext.blk_addr + fi->ext.len - 1;
/* Drop and initialize the matched extent */
if (fi->ext.len == 1 && fofs == start_fofs)
fi->ext.len = 0;
/* Initial extent */
if (fi->ext.len == 0) {
if (blk_addr != NULL_ADDR) {
fi->ext.fofs = fofs;
fi->ext.blk_addr = blk_addr;
fi->ext.len = 1;
}
goto end_update;
}
/* Frone merge */
if (fofs == start_fofs - 1 && blk_addr == start_blkaddr - 1) {
fi->ext.fofs--;
fi->ext.blk_addr--;
fi->ext.len++;
goto end_update;
}
/* Back merge */
if (fofs == end_fofs + 1 && blk_addr == end_blkaddr + 1) {
fi->ext.len++;
goto end_update;
}
/* Split the existing extent */
if (fi->ext.len > 1 &&
fofs >= start_fofs && fofs <= end_fofs) {
if ((end_fofs - fofs) < (fi->ext.len >> 1)) {
fi->ext.len = fofs - start_fofs;
} else {
fi->ext.fofs = fofs + 1;
fi->ext.blk_addr = start_blkaddr +
fofs - start_fofs + 1;
fi->ext.len -= fofs - start_fofs + 1;
}
goto end_update;
}
write_unlock(&fi->ext.ext_lock);
return;
end_update:
write_unlock(&fi->ext.ext_lock);
sync_inode_page(dn);
return;
}
struct page *find_data_page(struct inode *inode, pgoff_t index)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct address_space *mapping = inode->i_mapping;
struct dnode_of_data dn;
struct page *page;
int err;
page = find_get_page(mapping, index);
if (page && PageUptodate(page))
return page;
f2fs_put_page(page, 0);
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, index, RDONLY_NODE);
if (err)
return ERR_PTR(err);
f2fs_put_dnode(&dn);
if (dn.data_blkaddr == NULL_ADDR)
return ERR_PTR(-ENOENT);
/* By fallocate(), there is no cached page, but with NEW_ADDR */
if (dn.data_blkaddr == NEW_ADDR)
return ERR_PTR(-EINVAL);
page = grab_cache_page(mapping, index);
if (!page)
return ERR_PTR(-ENOMEM);
err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
if (err) {
f2fs_put_page(page, 1);
return ERR_PTR(err);
}
unlock_page(page);
return page;
}
/*
* If it tries to access a hole, return an error.
* Because, the callers, functions in dir.c and GC, should be able to know
* whether this page exists or not.
*/
struct page *get_lock_data_page(struct inode *inode, pgoff_t index)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct address_space *mapping = inode->i_mapping;
struct dnode_of_data dn;
struct page *page;
int err;
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, index, RDONLY_NODE);
if (err)
return ERR_PTR(err);
f2fs_put_dnode(&dn);
if (dn.data_blkaddr == NULL_ADDR)
return ERR_PTR(-ENOENT);
page = grab_cache_page(mapping, index);
if (!page)
return ERR_PTR(-ENOMEM);
if (PageUptodate(page))
return page;
BUG_ON(dn.data_blkaddr == NEW_ADDR);
BUG_ON(dn.data_blkaddr == NULL_ADDR);
err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
if (err) {
f2fs_put_page(page, 1);
return ERR_PTR(err);
}
return page;
}
/*
* Caller ensures that this data page is never allocated.
* A new zero-filled data page is allocated in the page cache.
*/
struct page *get_new_data_page(struct inode *inode, pgoff_t index,
bool new_i_size)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct address_space *mapping = inode->i_mapping;
struct page *page;
struct dnode_of_data dn;
int err;
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, index, 0);
if (err)
return ERR_PTR(err);
if (dn.data_blkaddr == NULL_ADDR) {
if (reserve_new_block(&dn)) {
f2fs_put_dnode(&dn);
return ERR_PTR(-ENOSPC);
}
}
f2fs_put_dnode(&dn);
page = grab_cache_page(mapping, index);
if (!page)
return ERR_PTR(-ENOMEM);
if (PageUptodate(page))
return page;
if (dn.data_blkaddr == NEW_ADDR) {
zero_user_segment(page, 0, PAGE_CACHE_SIZE);
} else {
err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
if (err) {
f2fs_put_page(page, 1);
return ERR_PTR(err);
}
}
SetPageUptodate(page);
if (new_i_size &&
i_size_read(inode) < ((index + 1) << PAGE_CACHE_SHIFT)) {
i_size_write(inode, ((index + 1) << PAGE_CACHE_SHIFT));
mark_inode_dirty_sync(inode);
}
return page;
}
static void read_end_io(struct bio *bio, int err)
{
const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
do {
struct page *page = bvec->bv_page;
if (--bvec >= bio->bi_io_vec)
prefetchw(&bvec->bv_page->flags);
if (uptodate) {
SetPageUptodate(page);
} else {
ClearPageUptodate(page);
SetPageError(page);
}
unlock_page(page);
} while (bvec >= bio->bi_io_vec);
kfree(bio->bi_private);
bio_put(bio);
}
/*
* Fill the locked page with data located in the block address.
* Read operation is synchronous, and caller must unlock the page.
*/
int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
block_t blk_addr, int type)
{
struct block_device *bdev = sbi->sb->s_bdev;
bool sync = (type == READ_SYNC);
struct bio *bio;
/* This page can be already read by other threads */
if (PageUptodate(page)) {
if (!sync)
unlock_page(page);
return 0;
}
down_read(&sbi->bio_sem);
/* Allocate a new bio */
bio = f2fs_bio_alloc(bdev, 1);
/* Initialize the bio */
bio->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
bio->bi_end_io = read_end_io;
if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
kfree(bio->bi_private);
bio_put(bio);
up_read(&sbi->bio_sem);
return -EFAULT;
}
submit_bio(type, bio);
up_read(&sbi->bio_sem);
/* wait for read completion if sync */
if (sync) {
lock_page(page);
if (PageError(page))
return -EIO;
}
return 0;
}
/*
* This function should be used by the data read flow only where it
* does not check the "create" flag that indicates block allocation.
* The reason for this special functionality is to exploit VFS readahead
* mechanism.
*/
static int get_data_block_ro(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
unsigned int blkbits = inode->i_sb->s_blocksize_bits;
unsigned maxblocks = bh_result->b_size >> blkbits;
struct dnode_of_data dn;
pgoff_t pgofs;
int err;
/* Get the page offset from the block offset(iblock) */
pgofs = (pgoff_t)(iblock >> (PAGE_CACHE_SHIFT - blkbits));
if (check_extent_cache(inode, pgofs, bh_result))
return 0;
/* When reading holes, we need its node page */
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, pgofs, RDONLY_NODE);
if (err)
return (err == -ENOENT) ? 0 : err;
/* It does not support data allocation */
BUG_ON(create);
if (dn.data_blkaddr != NEW_ADDR && dn.data_blkaddr != NULL_ADDR) {
int i;
unsigned int end_offset;
end_offset = IS_INODE(dn.node_page) ?
ADDRS_PER_INODE :
ADDRS_PER_BLOCK;
clear_buffer_new(bh_result);
/* Give more consecutive addresses for the read ahead */
for (i = 0; i < end_offset - dn.ofs_in_node; i++)
if (((datablock_addr(dn.node_page,
dn.ofs_in_node + i))
!= (dn.data_blkaddr + i)) || maxblocks == i)
break;
map_bh(bh_result, inode->i_sb, dn.data_blkaddr);
bh_result->b_size = (i << blkbits);
}
f2fs_put_dnode(&dn);
return 0;
}
static int f2fs_read_data_page(struct file *file, struct page *page)
{
return mpage_readpage(page, get_data_block_ro);
}
static int f2fs_read_data_pages(struct file *file,
struct address_space *mapping,
struct list_head *pages, unsigned nr_pages)
{
return mpage_readpages(mapping, pages, nr_pages, get_data_block_ro);
}
int do_write_data_page(struct page *page)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
block_t old_blk_addr, new_blk_addr;
struct dnode_of_data dn;
int err = 0;
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, page->index, RDONLY_NODE);
if (err)
return err;
old_blk_addr = dn.data_blkaddr;
/* This page is already truncated */
if (old_blk_addr == NULL_ADDR)
goto out_writepage;
set_page_writeback(page);
/*
* If current allocation needs SSR,
* it had better in-place writes for updated data.
*/
if (old_blk_addr != NEW_ADDR && !is_cold_data(page) &&
need_inplace_update(inode)) {
rewrite_data_page(F2FS_SB(inode->i_sb), page,
old_blk_addr);
} else {
write_data_page(inode, page, &dn,
old_blk_addr, &new_blk_addr);
update_extent_cache(new_blk_addr, &dn);
F2FS_I(inode)->data_version =
le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver);
}
out_writepage:
f2fs_put_dnode(&dn);
return err;
}
static int f2fs_write_data_page(struct page *page,
struct writeback_control *wbc)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
loff_t i_size = i_size_read(inode);
const pgoff_t end_index = ((unsigned long long) i_size)
>> PAGE_CACHE_SHIFT;
unsigned offset;
int err = 0;
if (page->index < end_index)
goto out;
/*
* If the offset is out-of-range of file size,
* this page does not have to be written to disk.
*/
offset = i_size & (PAGE_CACHE_SIZE - 1);
if ((page->index >= end_index + 1) || !offset) {
if (S_ISDIR(inode->i_mode)) {
dec_page_count(sbi, F2FS_DIRTY_DENTS);
inode_dec_dirty_dents(inode);
}
goto unlock_out;
}
zero_user_segment(page, offset, PAGE_CACHE_SIZE);
out:
if (sbi->por_doing)
goto redirty_out;
if (wbc->for_reclaim && !S_ISDIR(inode->i_mode) && !is_cold_data(page))
goto redirty_out;
mutex_lock_op(sbi, DATA_WRITE);
if (S_ISDIR(inode->i_mode)) {
dec_page_count(sbi, F2FS_DIRTY_DENTS);
inode_dec_dirty_dents(inode);
}
err = do_write_data_page(page);
if (err && err != -ENOENT) {
wbc->pages_skipped++;
set_page_dirty(page);
}
mutex_unlock_op(sbi, DATA_WRITE);
if (wbc->for_reclaim)
f2fs_submit_bio(sbi, DATA, true);
if (err == -ENOENT)
goto unlock_out;
clear_cold_data(page);
unlock_page(page);
if (!wbc->for_reclaim && !S_ISDIR(inode->i_mode))
f2fs_balance_fs(sbi);
return 0;
unlock_out:
unlock_page(page);
return (err == -ENOENT) ? 0 : err;
redirty_out:
wbc->pages_skipped++;
set_page_dirty(page);
return AOP_WRITEPAGE_ACTIVATE;
}
#define MAX_DESIRED_PAGES_WP 4096
static int f2fs_write_data_pages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct inode *inode = mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
int ret;
long excess_nrtw = 0, desired_nrtw;
if (wbc->nr_to_write < MAX_DESIRED_PAGES_WP) {
desired_nrtw = MAX_DESIRED_PAGES_WP;
excess_nrtw = desired_nrtw - wbc->nr_to_write;
wbc->nr_to_write = desired_nrtw;
}
if (!S_ISDIR(inode->i_mode))
mutex_lock(&sbi->writepages);
ret = generic_writepages(mapping, wbc);
if (!S_ISDIR(inode->i_mode))
mutex_unlock(&sbi->writepages);
f2fs_submit_bio(sbi, DATA, (wbc->sync_mode == WB_SYNC_ALL));
remove_dirty_dir_inode(inode);
wbc->nr_to_write -= excess_nrtw;
return ret;
}
static int f2fs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
{
struct inode *inode = mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct page *page;
pgoff_t index = ((unsigned long long) pos) >> PAGE_CACHE_SHIFT;
struct dnode_of_data dn;
int err = 0;
/* for nobh_write_end */
*fsdata = NULL;
f2fs_balance_fs(sbi);
page = grab_cache_page_write_begin(mapping, index, flags);
if (!page)
return -ENOMEM;
*pagep = page;
mutex_lock_op(sbi, DATA_NEW);
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, index, 0);
if (err) {
mutex_unlock_op(sbi, DATA_NEW);
f2fs_put_page(page, 1);
return err;
}
if (dn.data_blkaddr == NULL_ADDR) {
err = reserve_new_block(&dn);
if (err) {
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_NEW);
f2fs_put_page(page, 1);
return err;
}
}
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_NEW);
if ((len == PAGE_CACHE_SIZE) || PageUptodate(page))
return 0;
if ((pos & PAGE_CACHE_MASK) >= i_size_read(inode)) {
unsigned start = pos & (PAGE_CACHE_SIZE - 1);
unsigned end = start + len;
/* Reading beyond i_size is simple: memset to zero */
zero_user_segments(page, 0, start, end, PAGE_CACHE_SIZE);
return 0;
}
if (dn.data_blkaddr == NEW_ADDR) {
zero_user_segment(page, 0, PAGE_CACHE_SIZE);
} else {
err = f2fs_readpage(sbi, page, dn.data_blkaddr, READ_SYNC);
if (err) {
f2fs_put_page(page, 1);
return err;
}
}
SetPageUptodate(page);
clear_cold_data(page);
return 0;
}
static ssize_t f2fs_direct_IO(int rw, struct kiocb *iocb,
const struct iovec *iov, loff_t offset, unsigned long nr_segs)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
if (rw == WRITE)
return 0;
/* Needs synchronization with the cleaner */
return blockdev_direct_IO(rw, iocb, inode, iov, offset, nr_segs,
get_data_block_ro);
}
static void f2fs_invalidate_data_page(struct page *page, unsigned long offset)
{
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
if (S_ISDIR(inode->i_mode) && PageDirty(page)) {
dec_page_count(sbi, F2FS_DIRTY_DENTS);
inode_dec_dirty_dents(inode);
}
ClearPagePrivate(page);
}
static int f2fs_release_data_page(struct page *page, gfp_t wait)
{
ClearPagePrivate(page);
return 0;
}
static int f2fs_set_data_page_dirty(struct page *page)
{
struct address_space *mapping = page->mapping;
struct inode *inode = mapping->host;
SetPageUptodate(page);
if (!PageDirty(page)) {
__set_page_dirty_nobuffers(page);
set_dirty_dir_page(inode, page);
return 1;
}
return 0;
}
const struct address_space_operations f2fs_dblock_aops = {
.readpage = f2fs_read_data_page,
.readpages = f2fs_read_data_pages,
.writepage = f2fs_write_data_page,
.writepages = f2fs_write_data_pages,
.write_begin = f2fs_write_begin,
.write_end = nobh_write_end,
.set_page_dirty = f2fs_set_data_page_dirty,
.invalidatepage = f2fs_invalidate_data_page,
.releasepage = f2fs_release_data_page,
.direct_IO = f2fs_direct_IO,
};

361
fs/f2fs/debug.c Normal file
View file

@ -0,0 +1,361 @@
/*
* f2fs debugging statistics
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
* Copyright (c) 2012 Linux Foundation
* Copyright (c) 2012 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/backing-dev.h>
#include <linux/proc_fs.h>
#include <linux/f2fs_fs.h>
#include <linux/blkdev.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
#include "f2fs.h"
#include "node.h"
#include "segment.h"
#include "gc.h"
static LIST_HEAD(f2fs_stat_list);
static struct dentry *debugfs_root;
static void update_general_status(struct f2fs_sb_info *sbi)
{
struct f2fs_stat_info *si = sbi->stat_info;
int i;
/* valid check of the segment numbers */
si->hit_ext = sbi->read_hit_ext;
si->total_ext = sbi->total_hit_ext;
si->ndirty_node = get_pages(sbi, F2FS_DIRTY_NODES);
si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
si->ndirty_dirs = sbi->n_dirty_dirs;
si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
si->rsvd_segs = reserved_segments(sbi);
si->overp_segs = overprovision_segments(sbi);
si->valid_count = valid_user_blocks(sbi);
si->valid_node_count = valid_node_count(sbi);
si->valid_inode_count = valid_inode_count(sbi);
si->utilization = utilization(sbi);
si->free_segs = free_segments(sbi);
si->free_secs = free_sections(sbi);
si->prefree_count = prefree_segments(sbi);
si->dirty_count = dirty_segments(sbi);
si->node_pages = sbi->node_inode->i_mapping->nrpages;
si->meta_pages = sbi->meta_inode->i_mapping->nrpages;
si->nats = NM_I(sbi)->nat_cnt;
si->sits = SIT_I(sbi)->dirty_sentries;
si->fnids = NM_I(sbi)->fcnt;
si->bg_gc = sbi->bg_gc;
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
si->util_valid = (int)(written_block_count(sbi) >>
sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
si->util_invalid = 50 - si->util_free - si->util_valid;
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) {
struct curseg_info *curseg = CURSEG_I(sbi, i);
si->curseg[i] = curseg->segno;
si->cursec[i] = curseg->segno / sbi->segs_per_sec;
si->curzone[i] = si->cursec[i] / sbi->secs_per_zone;
}
for (i = 0; i < 2; i++) {
si->segment_count[i] = sbi->segment_count[i];
si->block_count[i] = sbi->block_count[i];
}
}
/*
* This function calculates BDF of every segments
*/
static void update_sit_info(struct f2fs_sb_info *sbi)
{
struct f2fs_stat_info *si = sbi->stat_info;
unsigned int blks_per_sec, hblks_per_sec, total_vblocks, bimodal, dist;
struct sit_info *sit_i = SIT_I(sbi);
unsigned int segno, vblocks;
int ndirty = 0;
bimodal = 0;
total_vblocks = 0;
blks_per_sec = sbi->segs_per_sec * (1 << sbi->log_blocks_per_seg);
hblks_per_sec = blks_per_sec / 2;
mutex_lock(&sit_i->sentry_lock);
for (segno = 0; segno < TOTAL_SEGS(sbi); segno += sbi->segs_per_sec) {
vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec);
dist = abs(vblocks - hblks_per_sec);
bimodal += dist * dist;
if (vblocks > 0 && vblocks < blks_per_sec) {
total_vblocks += vblocks;
ndirty++;
}
}
mutex_unlock(&sit_i->sentry_lock);
dist = sbi->total_sections * hblks_per_sec * hblks_per_sec / 100;
si->bimodal = bimodal / dist;
if (si->dirty_count)
si->avg_vblocks = total_vblocks / ndirty;
else
si->avg_vblocks = 0;
}
/*
* This function calculates memory footprint.
*/
static void update_mem_info(struct f2fs_sb_info *sbi)
{
struct f2fs_stat_info *si = sbi->stat_info;
unsigned npages;
if (si->base_mem)
goto get_cache;
si->base_mem = sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize;
si->base_mem += 2 * sizeof(struct f2fs_inode_info);
si->base_mem += sizeof(*sbi->ckpt);
/* build sm */
si->base_mem += sizeof(struct f2fs_sm_info);
/* build sit */
si->base_mem += sizeof(struct sit_info);
si->base_mem += TOTAL_SEGS(sbi) * sizeof(struct seg_entry);
si->base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
si->base_mem += 2 * SIT_VBLOCK_MAP_SIZE * TOTAL_SEGS(sbi);
if (sbi->segs_per_sec > 1)
si->base_mem += sbi->total_sections *
sizeof(struct sec_entry);
si->base_mem += __bitmap_size(sbi, SIT_BITMAP);
/* build free segmap */
si->base_mem += sizeof(struct free_segmap_info);
si->base_mem += f2fs_bitmap_size(TOTAL_SEGS(sbi));
si->base_mem += f2fs_bitmap_size(sbi->total_sections);
/* build curseg */
si->base_mem += sizeof(struct curseg_info) * NR_CURSEG_TYPE;
si->base_mem += PAGE_CACHE_SIZE * NR_CURSEG_TYPE;
/* build dirty segmap */
si->base_mem += sizeof(struct dirty_seglist_info);
si->base_mem += NR_DIRTY_TYPE * f2fs_bitmap_size(TOTAL_SEGS(sbi));
si->base_mem += 2 * f2fs_bitmap_size(TOTAL_SEGS(sbi));
/* buld nm */
si->base_mem += sizeof(struct f2fs_nm_info);
si->base_mem += __bitmap_size(sbi, NAT_BITMAP);
/* build gc */
si->base_mem += sizeof(struct f2fs_gc_kthread);
get_cache:
/* free nids */
si->cache_mem = NM_I(sbi)->fcnt;
si->cache_mem += NM_I(sbi)->nat_cnt;
npages = sbi->node_inode->i_mapping->nrpages;
si->cache_mem += npages << PAGE_CACHE_SHIFT;
npages = sbi->meta_inode->i_mapping->nrpages;
si->cache_mem += npages << PAGE_CACHE_SHIFT;
si->cache_mem += sbi->n_orphans * sizeof(struct orphan_inode_entry);
si->cache_mem += sbi->n_dirty_dirs * sizeof(struct dir_inode_entry);
}
static int stat_show(struct seq_file *s, void *v)
{
struct f2fs_stat_info *si, *next;
int i = 0;
int j;
list_for_each_entry_safe(si, next, &f2fs_stat_list, stat_list) {
mutex_lock(&si->stat_lock);
if (!si->sbi) {
mutex_unlock(&si->stat_lock);
continue;
}
update_general_status(si->sbi);
seq_printf(s, "\n=====[ partition info. #%d ]=====\n", i++);
seq_printf(s, "[SB: 1] [CP: 2] [NAT: %d] [SIT: %d] ",
si->nat_area_segs, si->sit_area_segs);
seq_printf(s, "[SSA: %d] [MAIN: %d",
si->ssa_area_segs, si->main_area_segs);
seq_printf(s, "(OverProv:%d Resv:%d)]\n\n",
si->overp_segs, si->rsvd_segs);
seq_printf(s, "Utilization: %d%% (%d valid blocks)\n",
si->utilization, si->valid_count);
seq_printf(s, " - Node: %u (Inode: %u, ",
si->valid_node_count, si->valid_inode_count);
seq_printf(s, "Other: %u)\n - Data: %u\n",
si->valid_node_count - si->valid_inode_count,
si->valid_count - si->valid_node_count);
seq_printf(s, "\nMain area: %d segs, %d secs %d zones\n",
si->main_area_segs, si->main_area_sections,
si->main_area_zones);
seq_printf(s, " - COLD data: %d, %d, %d\n",
si->curseg[CURSEG_COLD_DATA],
si->cursec[CURSEG_COLD_DATA],
si->curzone[CURSEG_COLD_DATA]);
seq_printf(s, " - WARM data: %d, %d, %d\n",
si->curseg[CURSEG_WARM_DATA],
si->cursec[CURSEG_WARM_DATA],
si->curzone[CURSEG_WARM_DATA]);
seq_printf(s, " - HOT data: %d, %d, %d\n",
si->curseg[CURSEG_HOT_DATA],
si->cursec[CURSEG_HOT_DATA],
si->curzone[CURSEG_HOT_DATA]);
seq_printf(s, " - Dir dnode: %d, %d, %d\n",
si->curseg[CURSEG_HOT_NODE],
si->cursec[CURSEG_HOT_NODE],
si->curzone[CURSEG_HOT_NODE]);
seq_printf(s, " - File dnode: %d, %d, %d\n",
si->curseg[CURSEG_WARM_NODE],
si->cursec[CURSEG_WARM_NODE],
si->curzone[CURSEG_WARM_NODE]);
seq_printf(s, " - Indir nodes: %d, %d, %d\n",
si->curseg[CURSEG_COLD_NODE],
si->cursec[CURSEG_COLD_NODE],
si->curzone[CURSEG_COLD_NODE]);
seq_printf(s, "\n - Valid: %d\n - Dirty: %d\n",
si->main_area_segs - si->dirty_count -
si->prefree_count - si->free_segs,
si->dirty_count);
seq_printf(s, " - Prefree: %d\n - Free: %d (%d)\n\n",
si->prefree_count, si->free_segs, si->free_secs);
seq_printf(s, "GC calls: %d (BG: %d)\n",
si->call_count, si->bg_gc);
seq_printf(s, " - data segments : %d\n", si->data_segs);
seq_printf(s, " - node segments : %d\n", si->node_segs);
seq_printf(s, "Try to move %d blocks\n", si->tot_blks);
seq_printf(s, " - data blocks : %d\n", si->data_blks);
seq_printf(s, " - node blocks : %d\n", si->node_blks);
seq_printf(s, "\nExtent Hit Ratio: %d / %d\n",
si->hit_ext, si->total_ext);
seq_printf(s, "\nBalancing F2FS Async:\n");
seq_printf(s, " - nodes %4d in %4d\n",
si->ndirty_node, si->node_pages);
seq_printf(s, " - dents %4d in dirs:%4d\n",
si->ndirty_dent, si->ndirty_dirs);
seq_printf(s, " - meta %4d in %4d\n",
si->ndirty_meta, si->meta_pages);
seq_printf(s, " - NATs %5d > %lu\n",
si->nats, NM_WOUT_THRESHOLD);
seq_printf(s, " - SITs: %5d\n - free_nids: %5d\n",
si->sits, si->fnids);
seq_printf(s, "\nDistribution of User Blocks:");
seq_printf(s, " [ valid | invalid | free ]\n");
seq_printf(s, " [");
for (j = 0; j < si->util_valid; j++)
seq_printf(s, "-");
seq_printf(s, "|");
for (j = 0; j < si->util_invalid; j++)
seq_printf(s, "-");
seq_printf(s, "|");
for (j = 0; j < si->util_free; j++)
seq_printf(s, "-");
seq_printf(s, "]\n\n");
seq_printf(s, "SSR: %u blocks in %u segments\n",
si->block_count[SSR], si->segment_count[SSR]);
seq_printf(s, "LFS: %u blocks in %u segments\n",
si->block_count[LFS], si->segment_count[LFS]);
/* segment usage info */
update_sit_info(si->sbi);
seq_printf(s, "\nBDF: %u, avg. vblocks: %u\n",
si->bimodal, si->avg_vblocks);
/* memory footprint */
update_mem_info(si->sbi);
seq_printf(s, "\nMemory: %u KB = static: %u + cached: %u\n",
(si->base_mem + si->cache_mem) >> 10,
si->base_mem >> 10, si->cache_mem >> 10);
mutex_unlock(&si->stat_lock);
}
return 0;
}
static int stat_open(struct inode *inode, struct file *file)
{
return single_open(file, stat_show, inode->i_private);
}
static const struct file_operations stat_fops = {
.open = stat_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int init_stats(struct f2fs_sb_info *sbi)
{
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
struct f2fs_stat_info *si;
sbi->stat_info = kzalloc(sizeof(struct f2fs_stat_info), GFP_KERNEL);
if (!sbi->stat_info)
return -ENOMEM;
si = sbi->stat_info;
mutex_init(&si->stat_lock);
list_add_tail(&si->stat_list, &f2fs_stat_list);
si->all_area_segs = le32_to_cpu(raw_super->segment_count);
si->sit_area_segs = le32_to_cpu(raw_super->segment_count_sit);
si->nat_area_segs = le32_to_cpu(raw_super->segment_count_nat);
si->ssa_area_segs = le32_to_cpu(raw_super->segment_count_ssa);
si->main_area_segs = le32_to_cpu(raw_super->segment_count_main);
si->main_area_sections = le32_to_cpu(raw_super->section_count);
si->main_area_zones = si->main_area_sections /
le32_to_cpu(raw_super->secs_per_zone);
si->sbi = sbi;
return 0;
}
int f2fs_build_stats(struct f2fs_sb_info *sbi)
{
int retval;
retval = init_stats(sbi);
if (retval)
return retval;
if (!debugfs_root)
debugfs_root = debugfs_create_dir("f2fs", NULL);
debugfs_create_file("status", S_IRUGO, debugfs_root, NULL, &stat_fops);
return 0;
}
void f2fs_destroy_stats(struct f2fs_sb_info *sbi)
{
struct f2fs_stat_info *si = sbi->stat_info;
list_del(&si->stat_list);
mutex_lock(&si->stat_lock);
si->sbi = NULL;
mutex_unlock(&si->stat_lock);
kfree(sbi->stat_info);
}
void destroy_root_stats(void)
{
debugfs_remove_recursive(debugfs_root);
debugfs_root = NULL;
}

672
fs/f2fs/dir.c Normal file
View file

@ -0,0 +1,672 @@
/*
* fs/f2fs/dir.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include "f2fs.h"
#include "acl.h"
static unsigned long dir_blocks(struct inode *inode)
{
return ((unsigned long long) (i_size_read(inode) + PAGE_CACHE_SIZE - 1))
>> PAGE_CACHE_SHIFT;
}
static unsigned int dir_buckets(unsigned int level)
{
if (level < MAX_DIR_HASH_DEPTH / 2)
return 1 << level;
else
return 1 << ((MAX_DIR_HASH_DEPTH / 2) - 1);
}
static unsigned int bucket_blocks(unsigned int level)
{
if (level < MAX_DIR_HASH_DEPTH / 2)
return 2;
else
return 4;
}
static unsigned char f2fs_filetype_table[F2FS_FT_MAX] = {
[F2FS_FT_UNKNOWN] = DT_UNKNOWN,
[F2FS_FT_REG_FILE] = DT_REG,
[F2FS_FT_DIR] = DT_DIR,
[F2FS_FT_CHRDEV] = DT_CHR,
[F2FS_FT_BLKDEV] = DT_BLK,
[F2FS_FT_FIFO] = DT_FIFO,
[F2FS_FT_SOCK] = DT_SOCK,
[F2FS_FT_SYMLINK] = DT_LNK,
};
#define S_SHIFT 12
static unsigned char f2fs_type_by_mode[S_IFMT >> S_SHIFT] = {
[S_IFREG >> S_SHIFT] = F2FS_FT_REG_FILE,
[S_IFDIR >> S_SHIFT] = F2FS_FT_DIR,
[S_IFCHR >> S_SHIFT] = F2FS_FT_CHRDEV,
[S_IFBLK >> S_SHIFT] = F2FS_FT_BLKDEV,
[S_IFIFO >> S_SHIFT] = F2FS_FT_FIFO,
[S_IFSOCK >> S_SHIFT] = F2FS_FT_SOCK,
[S_IFLNK >> S_SHIFT] = F2FS_FT_SYMLINK,
};
static void set_de_type(struct f2fs_dir_entry *de, struct inode *inode)
{
mode_t mode = inode->i_mode;
de->file_type = f2fs_type_by_mode[(mode & S_IFMT) >> S_SHIFT];
}
static unsigned long dir_block_index(unsigned int level, unsigned int idx)
{
unsigned long i;
unsigned long bidx = 0;
for (i = 0; i < level; i++)
bidx += dir_buckets(i) * bucket_blocks(i);
bidx += idx * bucket_blocks(level);
return bidx;
}
static bool early_match_name(const char *name, int namelen,
f2fs_hash_t namehash, struct f2fs_dir_entry *de)
{
if (le16_to_cpu(de->name_len) != namelen)
return false;
if (de->hash_code != namehash)
return false;
return true;
}
static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
const char *name, int namelen, int *max_slots,
f2fs_hash_t namehash, struct page **res_page)
{
struct f2fs_dir_entry *de;
unsigned long bit_pos, end_pos, next_pos;
struct f2fs_dentry_block *dentry_blk = kmap(dentry_page);
int slots;
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK, 0);
while (bit_pos < NR_DENTRY_IN_BLOCK) {
de = &dentry_blk->dentry[bit_pos];
slots = GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
if (early_match_name(name, namelen, namehash, de)) {
if (!memcmp(dentry_blk->filename[bit_pos],
name, namelen)) {
*res_page = dentry_page;
goto found;
}
}
next_pos = bit_pos + slots;
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK, next_pos);
if (bit_pos >= NR_DENTRY_IN_BLOCK)
end_pos = NR_DENTRY_IN_BLOCK;
else
end_pos = bit_pos;
if (*max_slots < end_pos - next_pos)
*max_slots = end_pos - next_pos;
}
de = NULL;
kunmap(dentry_page);
found:
return de;
}
static struct f2fs_dir_entry *find_in_level(struct inode *dir,
unsigned int level, const char *name, int namelen,
f2fs_hash_t namehash, struct page **res_page)
{
int s = GET_DENTRY_SLOTS(namelen);
unsigned int nbucket, nblock;
unsigned int bidx, end_block;
struct page *dentry_page;
struct f2fs_dir_entry *de = NULL;
bool room = false;
int max_slots = 0;
BUG_ON(level > MAX_DIR_HASH_DEPTH);
nbucket = dir_buckets(level);
nblock = bucket_blocks(level);
bidx = dir_block_index(level, le32_to_cpu(namehash) % nbucket);
end_block = bidx + nblock;
for (; bidx < end_block; bidx++) {
/* no need to allocate new dentry pages to all the indices */
dentry_page = find_data_page(dir, bidx);
if (IS_ERR(dentry_page)) {
room = true;
continue;
}
de = find_in_block(dentry_page, name, namelen,
&max_slots, namehash, res_page);
if (de)
break;
if (max_slots >= s)
room = true;
f2fs_put_page(dentry_page, 0);
}
if (!de && room && F2FS_I(dir)->chash != namehash) {
F2FS_I(dir)->chash = namehash;
F2FS_I(dir)->clevel = level;
}
return de;
}
/*
* Find an entry in the specified directory with the wanted name.
* It returns the page where the entry was found (as a parameter - res_page),
* and the entry itself. Page is returned mapped and unlocked.
* Entry is guaranteed to be valid.
*/
struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
struct qstr *child, struct page **res_page)
{
const char *name = child->name;
int namelen = child->len;
unsigned long npages = dir_blocks(dir);
struct f2fs_dir_entry *de = NULL;
f2fs_hash_t name_hash;
unsigned int max_depth;
unsigned int level;
if (npages == 0)
return NULL;
*res_page = NULL;
name_hash = f2fs_dentry_hash(name, namelen);
max_depth = F2FS_I(dir)->i_current_depth;
for (level = 0; level < max_depth; level++) {
de = find_in_level(dir, level, name,
namelen, name_hash, res_page);
if (de)
break;
}
if (!de && F2FS_I(dir)->chash != name_hash) {
F2FS_I(dir)->chash = name_hash;
F2FS_I(dir)->clevel = level - 1;
}
return de;
}
struct f2fs_dir_entry *f2fs_parent_dir(struct inode *dir, struct page **p)
{
struct page *page = NULL;
struct f2fs_dir_entry *de = NULL;
struct f2fs_dentry_block *dentry_blk = NULL;
page = get_lock_data_page(dir, 0);
if (IS_ERR(page))
return NULL;
dentry_blk = kmap(page);
de = &dentry_blk->dentry[1];
*p = page;
unlock_page(page);
return de;
}
ino_t f2fs_inode_by_name(struct inode *dir, struct qstr *qstr)
{
ino_t res = 0;
struct f2fs_dir_entry *de;
struct page *page;
de = f2fs_find_entry(dir, qstr, &page);
if (de) {
res = le32_to_cpu(de->ino);
kunmap(page);
f2fs_put_page(page, 0);
}
return res;
}
void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
struct page *page, struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
mutex_lock_op(sbi, DENTRY_OPS);
lock_page(page);
wait_on_page_writeback(page);
de->ino = cpu_to_le32(inode->i_ino);
set_de_type(de, inode);
kunmap(page);
set_page_dirty(page);
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
mark_inode_dirty(dir);
/* update parent inode number before releasing dentry page */
F2FS_I(inode)->i_pino = dir->i_ino;
f2fs_put_page(page, 1);
mutex_unlock_op(sbi, DENTRY_OPS);
}
void init_dent_inode(struct dentry *dentry, struct page *ipage)
{
struct f2fs_node *rn;
if (IS_ERR(ipage))
return;
wait_on_page_writeback(ipage);
/* copy dentry info. to this inode page */
rn = (struct f2fs_node *)page_address(ipage);
rn->i.i_namelen = cpu_to_le32(dentry->d_name.len);
memcpy(rn->i.i_name, dentry->d_name.name, dentry->d_name.len);
set_page_dirty(ipage);
}
static int init_inode_metadata(struct inode *inode, struct dentry *dentry)
{
struct inode *dir = dentry->d_parent->d_inode;
if (is_inode_flag_set(F2FS_I(inode), FI_NEW_INODE)) {
int err;
err = new_inode_page(inode, dentry);
if (err)
return err;
if (S_ISDIR(inode->i_mode)) {
err = f2fs_make_empty(inode, dir);
if (err) {
remove_inode_page(inode);
return err;
}
}
err = f2fs_init_acl(inode, dir);
if (err) {
remove_inode_page(inode);
return err;
}
} else {
struct page *ipage;
ipage = get_node_page(F2FS_SB(dir->i_sb), inode->i_ino);
if (IS_ERR(ipage))
return PTR_ERR(ipage);
init_dent_inode(dentry, ipage);
f2fs_put_page(ipage, 1);
}
if (is_inode_flag_set(F2FS_I(inode), FI_INC_LINK)) {
inc_nlink(inode);
f2fs_write_inode(inode, NULL);
}
return 0;
}
static void update_parent_metadata(struct inode *dir, struct inode *inode,
unsigned int current_depth)
{
bool need_dir_update = false;
if (is_inode_flag_set(F2FS_I(inode), FI_NEW_INODE)) {
if (S_ISDIR(inode->i_mode)) {
inc_nlink(dir);
need_dir_update = true;
}
clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
}
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
if (F2FS_I(dir)->i_current_depth != current_depth) {
F2FS_I(dir)->i_current_depth = current_depth;
need_dir_update = true;
}
if (need_dir_update)
f2fs_write_inode(dir, NULL);
else
mark_inode_dirty(dir);
if (is_inode_flag_set(F2FS_I(inode), FI_INC_LINK))
clear_inode_flag(F2FS_I(inode), FI_INC_LINK);
}
static int room_for_filename(struct f2fs_dentry_block *dentry_blk, int slots)
{
int bit_start = 0;
int zero_start, zero_end;
next:
zero_start = find_next_zero_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
bit_start);
if (zero_start >= NR_DENTRY_IN_BLOCK)
return NR_DENTRY_IN_BLOCK;
zero_end = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
zero_start);
if (zero_end - zero_start >= slots)
return zero_start;
bit_start = zero_end + 1;
if (zero_end + 1 >= NR_DENTRY_IN_BLOCK)
return NR_DENTRY_IN_BLOCK;
goto next;
}
int f2fs_add_link(struct dentry *dentry, struct inode *inode)
{
unsigned int bit_pos;
unsigned int level;
unsigned int current_depth;
unsigned long bidx, block;
f2fs_hash_t dentry_hash;
struct f2fs_dir_entry *de;
unsigned int nbucket, nblock;
struct inode *dir = dentry->d_parent->d_inode;
struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
const char *name = dentry->d_name.name;
int namelen = dentry->d_name.len;
struct page *dentry_page = NULL;
struct f2fs_dentry_block *dentry_blk = NULL;
int slots = GET_DENTRY_SLOTS(namelen);
int err = 0;
int i;
dentry_hash = f2fs_dentry_hash(name, dentry->d_name.len);
level = 0;
current_depth = F2FS_I(dir)->i_current_depth;
if (F2FS_I(dir)->chash == dentry_hash) {
level = F2FS_I(dir)->clevel;
F2FS_I(dir)->chash = 0;
}
start:
if (current_depth == MAX_DIR_HASH_DEPTH)
return -ENOSPC;
/* Increase the depth, if required */
if (level == current_depth)
++current_depth;
nbucket = dir_buckets(level);
nblock = bucket_blocks(level);
bidx = dir_block_index(level, (le32_to_cpu(dentry_hash) % nbucket));
for (block = bidx; block <= (bidx + nblock - 1); block++) {
mutex_lock_op(sbi, DENTRY_OPS);
dentry_page = get_new_data_page(dir, block, true);
if (IS_ERR(dentry_page)) {
mutex_unlock_op(sbi, DENTRY_OPS);
return PTR_ERR(dentry_page);
}
dentry_blk = kmap(dentry_page);
bit_pos = room_for_filename(dentry_blk, slots);
if (bit_pos < NR_DENTRY_IN_BLOCK)
goto add_dentry;
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
mutex_unlock_op(sbi, DENTRY_OPS);
}
/* Move to next level to find the empty slot for new dentry */
++level;
goto start;
add_dentry:
err = init_inode_metadata(inode, dentry);
if (err)
goto fail;
wait_on_page_writeback(dentry_page);
de = &dentry_blk->dentry[bit_pos];
de->hash_code = dentry_hash;
de->name_len = cpu_to_le16(namelen);
memcpy(dentry_blk->filename[bit_pos], name, namelen);
de->ino = cpu_to_le32(inode->i_ino);
set_de_type(de, inode);
for (i = 0; i < slots; i++)
test_and_set_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap);
set_page_dirty(dentry_page);
update_parent_metadata(dir, inode, current_depth);
/* update parent inode number before releasing dentry page */
F2FS_I(inode)->i_pino = dir->i_ino;
fail:
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
mutex_unlock_op(sbi, DENTRY_OPS);
return err;
}
/*
* It only removes the dentry from the dentry page,corresponding name
* entry in name page does not need to be touched during deletion.
*/
void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
struct inode *inode)
{
struct f2fs_dentry_block *dentry_blk;
unsigned int bit_pos;
struct address_space *mapping = page->mapping;
struct inode *dir = mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len));
void *kaddr = page_address(page);
int i;
mutex_lock_op(sbi, DENTRY_OPS);
lock_page(page);
wait_on_page_writeback(page);
dentry_blk = (struct f2fs_dentry_block *)kaddr;
bit_pos = dentry - (struct f2fs_dir_entry *)dentry_blk->dentry;
for (i = 0; i < slots; i++)
test_and_clear_bit_le(bit_pos + i, &dentry_blk->dentry_bitmap);
/* Let's check and deallocate this dentry page */
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
0);
kunmap(page); /* kunmap - pair of f2fs_find_entry */
set_page_dirty(page);
dir->i_ctime = dir->i_mtime = CURRENT_TIME;
if (inode && S_ISDIR(inode->i_mode)) {
drop_nlink(dir);
f2fs_write_inode(dir, NULL);
} else {
mark_inode_dirty(dir);
}
if (inode) {
inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
drop_nlink(inode);
if (S_ISDIR(inode->i_mode)) {
drop_nlink(inode);
i_size_write(inode, 0);
}
f2fs_write_inode(inode, NULL);
if (inode->i_nlink == 0)
add_orphan_inode(sbi, inode->i_ino);
}
if (bit_pos == NR_DENTRY_IN_BLOCK) {
truncate_hole(dir, page->index, page->index + 1);
clear_page_dirty_for_io(page);
ClearPageUptodate(page);
dec_page_count(sbi, F2FS_DIRTY_DENTS);
inode_dec_dirty_dents(dir);
}
f2fs_put_page(page, 1);
mutex_unlock_op(sbi, DENTRY_OPS);
}
int f2fs_make_empty(struct inode *inode, struct inode *parent)
{
struct page *dentry_page;
struct f2fs_dentry_block *dentry_blk;
struct f2fs_dir_entry *de;
void *kaddr;
dentry_page = get_new_data_page(inode, 0, true);
if (IS_ERR(dentry_page))
return PTR_ERR(dentry_page);
kaddr = kmap_atomic(dentry_page);
dentry_blk = (struct f2fs_dentry_block *)kaddr;
de = &dentry_blk->dentry[0];
de->name_len = cpu_to_le16(1);
de->hash_code = 0;
de->ino = cpu_to_le32(inode->i_ino);
memcpy(dentry_blk->filename[0], ".", 1);
set_de_type(de, inode);
de = &dentry_blk->dentry[1];
de->hash_code = 0;
de->name_len = cpu_to_le16(2);
de->ino = cpu_to_le32(parent->i_ino);
memcpy(dentry_blk->filename[1], "..", 2);
set_de_type(de, inode);
test_and_set_bit_le(0, &dentry_blk->dentry_bitmap);
test_and_set_bit_le(1, &dentry_blk->dentry_bitmap);
kunmap_atomic(kaddr);
set_page_dirty(dentry_page);
f2fs_put_page(dentry_page, 1);
return 0;
}
bool f2fs_empty_dir(struct inode *dir)
{
unsigned long bidx;
struct page *dentry_page;
unsigned int bit_pos;
struct f2fs_dentry_block *dentry_blk;
unsigned long nblock = dir_blocks(dir);
for (bidx = 0; bidx < nblock; bidx++) {
void *kaddr;
dentry_page = get_lock_data_page(dir, bidx);
if (IS_ERR(dentry_page)) {
if (PTR_ERR(dentry_page) == -ENOENT)
continue;
else
return false;
}
kaddr = kmap_atomic(dentry_page);
dentry_blk = (struct f2fs_dentry_block *)kaddr;
if (bidx == 0)
bit_pos = 2;
else
bit_pos = 0;
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
bit_pos);
kunmap_atomic(kaddr);
f2fs_put_page(dentry_page, 1);
if (bit_pos < NR_DENTRY_IN_BLOCK)
return false;
}
return true;
}
static int f2fs_readdir(struct file *file, void *dirent, filldir_t filldir)
{
unsigned long pos = file->f_pos;
struct inode *inode = file->f_dentry->d_inode;
unsigned long npages = dir_blocks(inode);
unsigned char *types = NULL;
unsigned int bit_pos = 0, start_bit_pos = 0;
int over = 0;
struct f2fs_dentry_block *dentry_blk = NULL;
struct f2fs_dir_entry *de = NULL;
struct page *dentry_page = NULL;
unsigned int n = 0;
unsigned char d_type = DT_UNKNOWN;
int slots;
types = f2fs_filetype_table;
bit_pos = (pos % NR_DENTRY_IN_BLOCK);
n = (pos / NR_DENTRY_IN_BLOCK);
for ( ; n < npages; n++) {
dentry_page = get_lock_data_page(inode, n);
if (IS_ERR(dentry_page))
continue;
start_bit_pos = bit_pos;
dentry_blk = kmap(dentry_page);
while (bit_pos < NR_DENTRY_IN_BLOCK) {
d_type = DT_UNKNOWN;
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
bit_pos);
if (bit_pos >= NR_DENTRY_IN_BLOCK)
break;
de = &dentry_blk->dentry[bit_pos];
if (types && de->file_type < F2FS_FT_MAX)
d_type = types[de->file_type];
over = filldir(dirent,
dentry_blk->filename[bit_pos],
le16_to_cpu(de->name_len),
(n * NR_DENTRY_IN_BLOCK) + bit_pos,
le32_to_cpu(de->ino), d_type);
if (over) {
file->f_pos += bit_pos - start_bit_pos;
goto success;
}
slots = GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
bit_pos += slots;
}
bit_pos = 0;
file->f_pos = (n + 1) * NR_DENTRY_IN_BLOCK;
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
dentry_page = NULL;
}
success:
if (dentry_page && !IS_ERR(dentry_page)) {
kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
}
return 0;
}
const struct file_operations f2fs_dir_operations = {
.llseek = generic_file_llseek,
.read = generic_read_dir,
.readdir = f2fs_readdir,
.fsync = f2fs_sync_file,
.unlocked_ioctl = f2fs_ioctl,
};

1083
fs/f2fs/f2fs.h Normal file

File diff suppressed because it is too large Load diff

636
fs/f2fs/file.c Normal file
View file

@ -0,0 +1,636 @@
/*
* fs/f2fs/file.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/stat.h>
#include <linux/buffer_head.h>
#include <linux/writeback.h>
#include <linux/falloc.h>
#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/mount.h>
#include "f2fs.h"
#include "node.h"
#include "segment.h"
#include "xattr.h"
#include "acl.h"
static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma,
struct vm_fault *vmf)
{
struct page *page = vmf->page;
struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
block_t old_blk_addr;
struct dnode_of_data dn;
int err;
f2fs_balance_fs(sbi);
sb_start_pagefault(inode->i_sb);
mutex_lock_op(sbi, DATA_NEW);
/* block allocation */
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, page->index, 0);
if (err) {
mutex_unlock_op(sbi, DATA_NEW);
goto out;
}
old_blk_addr = dn.data_blkaddr;
if (old_blk_addr == NULL_ADDR) {
err = reserve_new_block(&dn);
if (err) {
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_NEW);
goto out;
}
}
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_NEW);
lock_page(page);
if (page->mapping != inode->i_mapping ||
page_offset(page) >= i_size_read(inode) ||
!PageUptodate(page)) {
unlock_page(page);
err = -EFAULT;
goto out;
}
/*
* check to see if the page is mapped already (no holes)
*/
if (PageMappedToDisk(page))
goto out;
/* fill the page */
wait_on_page_writeback(page);
/* page is wholly or partially inside EOF */
if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode)) {
unsigned offset;
offset = i_size_read(inode) & ~PAGE_CACHE_MASK;
zero_user_segment(page, offset, PAGE_CACHE_SIZE);
}
set_page_dirty(page);
SetPageUptodate(page);
file_update_time(vma->vm_file);
out:
sb_end_pagefault(inode->i_sb);
return block_page_mkwrite_return(err);
}
static const struct vm_operations_struct f2fs_file_vm_ops = {
.fault = filemap_fault,
.page_mkwrite = f2fs_vm_page_mkwrite,
};
static int need_to_sync_dir(struct f2fs_sb_info *sbi, struct inode *inode)
{
struct dentry *dentry;
nid_t pino;
inode = igrab(inode);
dentry = d_find_any_alias(inode);
if (!dentry) {
iput(inode);
return 0;
}
pino = dentry->d_parent->d_inode->i_ino;
dput(dentry);
iput(inode);
return !is_checkpointed_node(sbi, pino);
}
int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
struct inode *inode = file->f_mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
unsigned long long cur_version;
int ret = 0;
bool need_cp = false;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
.for_reclaim = 0,
};
if (inode->i_sb->s_flags & MS_RDONLY)
return 0;
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret)
return ret;
mutex_lock(&inode->i_mutex);
if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
goto out;
mutex_lock(&sbi->cp_mutex);
cur_version = le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver);
mutex_unlock(&sbi->cp_mutex);
if (F2FS_I(inode)->data_version != cur_version &&
!(inode->i_state & I_DIRTY))
goto out;
F2FS_I(inode)->data_version--;
if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
need_cp = true;
if (is_inode_flag_set(F2FS_I(inode), FI_NEED_CP))
need_cp = true;
if (!space_for_roll_forward(sbi))
need_cp = true;
if (need_to_sync_dir(sbi, inode))
need_cp = true;
f2fs_write_inode(inode, NULL);
if (need_cp) {
/* all the dirty node pages should be flushed for POR */
ret = f2fs_sync_fs(inode->i_sb, 1);
clear_inode_flag(F2FS_I(inode), FI_NEED_CP);
} else {
while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0)
f2fs_write_inode(inode, NULL);
filemap_fdatawait_range(sbi->node_inode->i_mapping,
0, LONG_MAX);
}
out:
mutex_unlock(&inode->i_mutex);
return ret;
}
static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
file_accessed(file);
vma->vm_ops = &f2fs_file_vm_ops;
return 0;
}
static int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
{
int nr_free = 0, ofs = dn->ofs_in_node;
struct f2fs_sb_info *sbi = F2FS_SB(dn->inode->i_sb);
struct f2fs_node *raw_node;
__le32 *addr;
raw_node = page_address(dn->node_page);
addr = blkaddr_in_node(raw_node) + ofs;
for ( ; count > 0; count--, addr++, dn->ofs_in_node++) {
block_t blkaddr = le32_to_cpu(*addr);
if (blkaddr == NULL_ADDR)
continue;
update_extent_cache(NULL_ADDR, dn);
invalidate_blocks(sbi, blkaddr);
dec_valid_block_count(sbi, dn->inode, 1);
nr_free++;
}
if (nr_free) {
set_page_dirty(dn->node_page);
sync_inode_page(dn);
}
dn->ofs_in_node = ofs;
return nr_free;
}
void truncate_data_blocks(struct dnode_of_data *dn)
{
truncate_data_blocks_range(dn, ADDRS_PER_BLOCK);
}
static void truncate_partial_data_page(struct inode *inode, u64 from)
{
unsigned offset = from & (PAGE_CACHE_SIZE - 1);
struct page *page;
if (!offset)
return;
page = find_data_page(inode, from >> PAGE_CACHE_SHIFT);
if (IS_ERR(page))
return;
lock_page(page);
wait_on_page_writeback(page);
zero_user(page, offset, PAGE_CACHE_SIZE - offset);
set_page_dirty(page);
f2fs_put_page(page, 1);
}
static int truncate_blocks(struct inode *inode, u64 from)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
unsigned int blocksize = inode->i_sb->s_blocksize;
struct dnode_of_data dn;
pgoff_t free_from;
int count = 0;
int err;
free_from = (pgoff_t)
((from + blocksize - 1) >> (sbi->log_blocksize));
mutex_lock_op(sbi, DATA_TRUNC);
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, free_from, RDONLY_NODE);
if (err) {
if (err == -ENOENT)
goto free_next;
mutex_unlock_op(sbi, DATA_TRUNC);
return err;
}
if (IS_INODE(dn.node_page))
count = ADDRS_PER_INODE;
else
count = ADDRS_PER_BLOCK;
count -= dn.ofs_in_node;
BUG_ON(count < 0);
if (dn.ofs_in_node || IS_INODE(dn.node_page)) {
truncate_data_blocks_range(&dn, count);
free_from += count;
}
f2fs_put_dnode(&dn);
free_next:
err = truncate_inode_blocks(inode, free_from);
mutex_unlock_op(sbi, DATA_TRUNC);
/* lastly zero out the first data page */
truncate_partial_data_page(inode, from);
return err;
}
void f2fs_truncate(struct inode *inode)
{
if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
S_ISLNK(inode->i_mode)))
return;
if (!truncate_blocks(inode, i_size_read(inode))) {
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
mark_inode_dirty(inode);
}
f2fs_balance_fs(F2FS_SB(inode->i_sb));
}
static int f2fs_getattr(struct vfsmount *mnt,
struct dentry *dentry, struct kstat *stat)
{
struct inode *inode = dentry->d_inode;
generic_fillattr(inode, stat);
stat->blocks <<= 3;
return 0;
}
#ifdef CONFIG_F2FS_FS_POSIX_ACL
static void __setattr_copy(struct inode *inode, const struct iattr *attr)
{
struct f2fs_inode_info *fi = F2FS_I(inode);
unsigned int ia_valid = attr->ia_valid;
if (ia_valid & ATTR_UID)
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)
inode->i_gid = attr->ia_gid;
if (ia_valid & ATTR_ATIME)
inode->i_atime = timespec_trunc(attr->ia_atime,
inode->i_sb->s_time_gran);
if (ia_valid & ATTR_MTIME)
inode->i_mtime = timespec_trunc(attr->ia_mtime,
inode->i_sb->s_time_gran);
if (ia_valid & ATTR_CTIME)
inode->i_ctime = timespec_trunc(attr->ia_ctime,
inode->i_sb->s_time_gran);
if (ia_valid & ATTR_MODE) {
umode_t mode = attr->ia_mode;
if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
mode &= ~S_ISGID;
set_acl_inode(fi, mode);
}
}
#else
#define __setattr_copy setattr_copy
#endif
int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = dentry->d_inode;
struct f2fs_inode_info *fi = F2FS_I(inode);
int err;
err = inode_change_ok(inode, attr);
if (err)
return err;
if ((attr->ia_valid & ATTR_SIZE) &&
attr->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attr->ia_size);
f2fs_truncate(inode);
}
__setattr_copy(inode, attr);
if (attr->ia_valid & ATTR_MODE) {
err = f2fs_acl_chmod(inode);
if (err || is_inode_flag_set(fi, FI_ACL_MODE)) {
inode->i_mode = fi->i_acl_mode;
clear_inode_flag(fi, FI_ACL_MODE);
}
}
mark_inode_dirty(inode);
return err;
}
const struct inode_operations f2fs_file_inode_operations = {
.getattr = f2fs_getattr,
.setattr = f2fs_setattr,
.get_acl = f2fs_get_acl,
#ifdef CONFIG_F2FS_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = f2fs_listxattr,
.removexattr = generic_removexattr,
#endif
};
static void fill_zero(struct inode *inode, pgoff_t index,
loff_t start, loff_t len)
{
struct page *page;
if (!len)
return;
page = get_new_data_page(inode, index, false);
if (!IS_ERR(page)) {
wait_on_page_writeback(page);
zero_user(page, start, len);
set_page_dirty(page);
f2fs_put_page(page, 1);
}
}
int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end)
{
pgoff_t index;
int err;
for (index = pg_start; index < pg_end; index++) {
struct dnode_of_data dn;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
mutex_lock_op(sbi, DATA_TRUNC);
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, index, RDONLY_NODE);
if (err) {
mutex_unlock_op(sbi, DATA_TRUNC);
if (err == -ENOENT)
continue;
return err;
}
if (dn.data_blkaddr != NULL_ADDR)
truncate_data_blocks_range(&dn, 1);
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_TRUNC);
}
return 0;
}
static int punch_hole(struct inode *inode, loff_t offset, loff_t len, int mode)
{
pgoff_t pg_start, pg_end;
loff_t off_start, off_end;
int ret = 0;
pg_start = ((unsigned long long) offset) >> PAGE_CACHE_SHIFT;
pg_end = ((unsigned long long) offset + len) >> PAGE_CACHE_SHIFT;
off_start = offset & (PAGE_CACHE_SIZE - 1);
off_end = (offset + len) & (PAGE_CACHE_SIZE - 1);
if (pg_start == pg_end) {
fill_zero(inode, pg_start, off_start,
off_end - off_start);
} else {
if (off_start)
fill_zero(inode, pg_start++, off_start,
PAGE_CACHE_SIZE - off_start);
if (off_end)
fill_zero(inode, pg_end, 0, off_end);
if (pg_start < pg_end) {
struct address_space *mapping = inode->i_mapping;
loff_t blk_start, blk_end;
blk_start = pg_start << PAGE_CACHE_SHIFT;
blk_end = pg_end << PAGE_CACHE_SHIFT;
truncate_inode_pages_range(mapping, blk_start,
blk_end - 1);
ret = truncate_hole(inode, pg_start, pg_end);
}
}
if (!(mode & FALLOC_FL_KEEP_SIZE) &&
i_size_read(inode) <= (offset + len)) {
i_size_write(inode, offset);
mark_inode_dirty(inode);
}
return ret;
}
static int expand_inode_data(struct inode *inode, loff_t offset,
loff_t len, int mode)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
pgoff_t index, pg_start, pg_end;
loff_t new_size = i_size_read(inode);
loff_t off_start, off_end;
int ret = 0;
ret = inode_newsize_ok(inode, (len + offset));
if (ret)
return ret;
pg_start = ((unsigned long long) offset) >> PAGE_CACHE_SHIFT;
pg_end = ((unsigned long long) offset + len) >> PAGE_CACHE_SHIFT;
off_start = offset & (PAGE_CACHE_SIZE - 1);
off_end = (offset + len) & (PAGE_CACHE_SIZE - 1);
for (index = pg_start; index <= pg_end; index++) {
struct dnode_of_data dn;
mutex_lock_op(sbi, DATA_NEW);
set_new_dnode(&dn, inode, NULL, NULL, 0);
ret = get_dnode_of_data(&dn, index, 0);
if (ret) {
mutex_unlock_op(sbi, DATA_NEW);
break;
}
if (dn.data_blkaddr == NULL_ADDR) {
ret = reserve_new_block(&dn);
if (ret) {
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_NEW);
break;
}
}
f2fs_put_dnode(&dn);
mutex_unlock_op(sbi, DATA_NEW);
if (pg_start == pg_end)
new_size = offset + len;
else if (index == pg_start && off_start)
new_size = (index + 1) << PAGE_CACHE_SHIFT;
else if (index == pg_end)
new_size = (index << PAGE_CACHE_SHIFT) + off_end;
else
new_size += PAGE_CACHE_SIZE;
}
if (!(mode & FALLOC_FL_KEEP_SIZE) &&
i_size_read(inode) < new_size) {
i_size_write(inode, new_size);
mark_inode_dirty(inode);
}
return ret;
}
static long f2fs_fallocate(struct file *file, int mode,
loff_t offset, loff_t len)
{
struct inode *inode = file->f_path.dentry->d_inode;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
long ret;
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
return -EOPNOTSUPP;
if (mode & FALLOC_FL_PUNCH_HOLE)
ret = punch_hole(inode, offset, len, mode);
else
ret = expand_inode_data(inode, offset, len, mode);
f2fs_balance_fs(sbi);
return ret;
}
#define F2FS_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL))
#define F2FS_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL)
static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags)
{
if (S_ISDIR(mode))
return flags;
else if (S_ISREG(mode))
return flags & F2FS_REG_FLMASK;
else
return flags & F2FS_OTHER_FLMASK;
}
long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = filp->f_dentry->d_inode;
struct f2fs_inode_info *fi = F2FS_I(inode);
unsigned int flags;
int ret;
switch (cmd) {
case FS_IOC_GETFLAGS:
flags = fi->i_flags & FS_FL_USER_VISIBLE;
return put_user(flags, (int __user *) arg);
case FS_IOC_SETFLAGS:
{
unsigned int oldflags;
ret = mnt_want_write(filp->f_path.mnt);
if (ret)
return ret;
if (!inode_owner_or_capable(inode)) {
ret = -EACCES;
goto out;
}
if (get_user(flags, (int __user *) arg)) {
ret = -EFAULT;
goto out;
}
flags = f2fs_mask_flags(inode->i_mode, flags);
mutex_lock(&inode->i_mutex);
oldflags = fi->i_flags;
if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
mutex_unlock(&inode->i_mutex);
ret = -EPERM;
goto out;
}
}
flags = flags & FS_FL_USER_MODIFIABLE;
flags |= oldflags & ~FS_FL_USER_MODIFIABLE;
fi->i_flags = flags;
mutex_unlock(&inode->i_mutex);
f2fs_set_inode_flags(inode);
inode->i_ctime = CURRENT_TIME;
mark_inode_dirty(inode);
out:
mnt_drop_write(filp->f_path.mnt);
return ret;
}
default:
return -ENOTTY;
}
}
const struct file_operations f2fs_file_operations = {
.llseek = generic_file_llseek,
.read = do_sync_read,
.write = do_sync_write,
.aio_read = generic_file_aio_read,
.aio_write = generic_file_aio_write,
.open = generic_file_open,
.mmap = f2fs_file_mmap,
.fsync = f2fs_sync_file,
.fallocate = f2fs_fallocate,
.unlocked_ioctl = f2fs_ioctl,
.splice_read = generic_file_splice_read,
.splice_write = generic_file_splice_write,
};

742
fs/f2fs/gc.c Normal file
View file

@ -0,0 +1,742 @@
/*
* fs/f2fs/gc.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/backing-dev.h>
#include <linux/proc_fs.h>
#include <linux/init.h>
#include <linux/f2fs_fs.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/freezer.h>
#include <linux/blkdev.h>
#include "f2fs.h"
#include "node.h"
#include "segment.h"
#include "gc.h"
static struct kmem_cache *winode_slab;
static int gc_thread_func(void *data)
{
struct f2fs_sb_info *sbi = data;
wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
long wait_ms;
wait_ms = GC_THREAD_MIN_SLEEP_TIME;
do {
if (try_to_freeze())
continue;
else
wait_event_interruptible_timeout(*wq,
kthread_should_stop(),
msecs_to_jiffies(wait_ms));
if (kthread_should_stop())
break;
f2fs_balance_fs(sbi);
if (!test_opt(sbi, BG_GC))
continue;
/*
* [GC triggering condition]
* 0. GC is not conducted currently.
* 1. There are enough dirty segments.
* 2. IO subsystem is idle by checking the # of writeback pages.
* 3. IO subsystem is idle by checking the # of requests in
* bdev's request list.
*
* Note) We have to avoid triggering GCs too much frequently.
* Because it is possible that some segments can be
* invalidated soon after by user update or deletion.
* So, I'd like to wait some time to collect dirty segments.
*/
if (!mutex_trylock(&sbi->gc_mutex))
continue;
if (!is_idle(sbi)) {
wait_ms = increase_sleep_time(wait_ms);
mutex_unlock(&sbi->gc_mutex);
continue;
}
if (has_enough_invalid_blocks(sbi))
wait_ms = decrease_sleep_time(wait_ms);
else
wait_ms = increase_sleep_time(wait_ms);
sbi->bg_gc++;
if (f2fs_gc(sbi, 1) == GC_NONE)
wait_ms = GC_THREAD_NOGC_SLEEP_TIME;
else if (wait_ms == GC_THREAD_NOGC_SLEEP_TIME)
wait_ms = GC_THREAD_MAX_SLEEP_TIME;
} while (!kthread_should_stop());
return 0;
}
int start_gc_thread(struct f2fs_sb_info *sbi)
{
struct f2fs_gc_kthread *gc_th;
gc_th = kmalloc(sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
if (!gc_th)
return -ENOMEM;
sbi->gc_thread = gc_th;
init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
GC_THREAD_NAME);
if (IS_ERR(gc_th->f2fs_gc_task)) {
kfree(gc_th);
return -ENOMEM;
}
return 0;
}
void stop_gc_thread(struct f2fs_sb_info *sbi)
{
struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
if (!gc_th)
return;
kthread_stop(gc_th->f2fs_gc_task);
kfree(gc_th);
sbi->gc_thread = NULL;
}
static int select_gc_type(int gc_type)
{
return (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
}
static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
int type, struct victim_sel_policy *p)
{
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
if (p->alloc_mode) {
p->gc_mode = GC_GREEDY;
p->dirty_segmap = dirty_i->dirty_segmap[type];
p->ofs_unit = 1;
} else {
p->gc_mode = select_gc_type(gc_type);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
p->ofs_unit = sbi->segs_per_sec;
}
p->offset = sbi->last_victim[p->gc_mode];
}
static unsigned int get_max_cost(struct f2fs_sb_info *sbi,
struct victim_sel_policy *p)
{
if (p->gc_mode == GC_GREEDY)
return (1 << sbi->log_blocks_per_seg) * p->ofs_unit;
else if (p->gc_mode == GC_CB)
return UINT_MAX;
else /* No other gc_mode */
return 0;
}
static unsigned int check_bg_victims(struct f2fs_sb_info *sbi)
{
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
unsigned int segno;
/*
* If the gc_type is FG_GC, we can select victim segments
* selected by background GC before.
* Those segments guarantee they have small valid blocks.
*/
segno = find_next_bit(dirty_i->victim_segmap[BG_GC],
TOTAL_SEGS(sbi), 0);
if (segno < TOTAL_SEGS(sbi)) {
clear_bit(segno, dirty_i->victim_segmap[BG_GC]);
return segno;
}
return NULL_SEGNO;
}
static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
{
struct sit_info *sit_i = SIT_I(sbi);
unsigned int secno = GET_SECNO(sbi, segno);
unsigned int start = secno * sbi->segs_per_sec;
unsigned long long mtime = 0;
unsigned int vblocks;
unsigned char age = 0;
unsigned char u;
unsigned int i;
for (i = 0; i < sbi->segs_per_sec; i++)
mtime += get_seg_entry(sbi, start + i)->mtime;
vblocks = get_valid_blocks(sbi, segno, sbi->segs_per_sec);
mtime = div_u64(mtime, sbi->segs_per_sec);
vblocks = div_u64(vblocks, sbi->segs_per_sec);
u = (vblocks * 100) >> sbi->log_blocks_per_seg;
/* Handle if the system time is changed by user */
if (mtime < sit_i->min_mtime)
sit_i->min_mtime = mtime;
if (mtime > sit_i->max_mtime)
sit_i->max_mtime = mtime;
if (sit_i->max_mtime != sit_i->min_mtime)
age = 100 - div64_u64(100 * (mtime - sit_i->min_mtime),
sit_i->max_mtime - sit_i->min_mtime);
return UINT_MAX - ((100 * (100 - u) * age) / (100 + u));
}
static unsigned int get_gc_cost(struct f2fs_sb_info *sbi, unsigned int segno,
struct victim_sel_policy *p)
{
if (p->alloc_mode == SSR)
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
/* alloc_mode == LFS */
if (p->gc_mode == GC_GREEDY)
return get_valid_blocks(sbi, segno, sbi->segs_per_sec);
else
return get_cb_cost(sbi, segno);
}
/*
* This function is called from two pathes.
* One is garbage collection and the other is SSR segment selection.
* When it is called during GC, it just gets a victim segment
* and it does not remove it from dirty seglist.
* When it is called from SSR segment selection, it finds a segment
* which has minimum valid blocks and removes it from dirty seglist.
*/
static int get_victim_by_default(struct f2fs_sb_info *sbi,
unsigned int *result, int gc_type, int type, char alloc_mode)
{
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
struct victim_sel_policy p;
unsigned int segno;
int nsearched = 0;
p.alloc_mode = alloc_mode;
select_policy(sbi, gc_type, type, &p);
p.min_segno = NULL_SEGNO;
p.min_cost = get_max_cost(sbi, &p);
mutex_lock(&dirty_i->seglist_lock);
if (p.alloc_mode == LFS && gc_type == FG_GC) {
p.min_segno = check_bg_victims(sbi);
if (p.min_segno != NULL_SEGNO)
goto got_it;
}
while (1) {
unsigned long cost;
segno = find_next_bit(p.dirty_segmap,
TOTAL_SEGS(sbi), p.offset);
if (segno >= TOTAL_SEGS(sbi)) {
if (sbi->last_victim[p.gc_mode]) {
sbi->last_victim[p.gc_mode] = 0;
p.offset = 0;
continue;
}
break;
}
p.offset = ((segno / p.ofs_unit) * p.ofs_unit) + p.ofs_unit;
if (test_bit(segno, dirty_i->victim_segmap[FG_GC]))
continue;
if (gc_type == BG_GC &&
test_bit(segno, dirty_i->victim_segmap[BG_GC]))
continue;
if (IS_CURSEC(sbi, GET_SECNO(sbi, segno)))
continue;
cost = get_gc_cost(sbi, segno, &p);
if (p.min_cost > cost) {
p.min_segno = segno;
p.min_cost = cost;
}
if (cost == get_max_cost(sbi, &p))
continue;
if (nsearched++ >= MAX_VICTIM_SEARCH) {
sbi->last_victim[p.gc_mode] = segno;
break;
}
}
got_it:
if (p.min_segno != NULL_SEGNO) {
*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
if (p.alloc_mode == LFS) {
int i;
for (i = 0; i < p.ofs_unit; i++)
set_bit(*result + i,
dirty_i->victim_segmap[gc_type]);
}
}
mutex_unlock(&dirty_i->seglist_lock);
return (p.min_segno == NULL_SEGNO) ? 0 : 1;
}
static const struct victim_selection default_v_ops = {
.get_victim = get_victim_by_default,
};
static struct inode *find_gc_inode(nid_t ino, struct list_head *ilist)
{
struct list_head *this;
struct inode_entry *ie;
list_for_each(this, ilist) {
ie = list_entry(this, struct inode_entry, list);
if (ie->inode->i_ino == ino)
return ie->inode;
}
return NULL;
}
static void add_gc_inode(struct inode *inode, struct list_head *ilist)
{
struct list_head *this;
struct inode_entry *new_ie, *ie;
list_for_each(this, ilist) {
ie = list_entry(this, struct inode_entry, list);
if (ie->inode == inode) {
iput(inode);
return;
}
}
repeat:
new_ie = kmem_cache_alloc(winode_slab, GFP_NOFS);
if (!new_ie) {
cond_resched();
goto repeat;
}
new_ie->inode = inode;
list_add_tail(&new_ie->list, ilist);
}
static void put_gc_inode(struct list_head *ilist)
{
struct inode_entry *ie, *next_ie;
list_for_each_entry_safe(ie, next_ie, ilist, list) {
iput(ie->inode);
list_del(&ie->list);
kmem_cache_free(winode_slab, ie);
}
}
static int check_valid_map(struct f2fs_sb_info *sbi,
unsigned int segno, int offset)
{
struct sit_info *sit_i = SIT_I(sbi);
struct seg_entry *sentry;
int ret;
mutex_lock(&sit_i->sentry_lock);
sentry = get_seg_entry(sbi, segno);
ret = f2fs_test_bit(offset, sentry->cur_valid_map);
mutex_unlock(&sit_i->sentry_lock);
return ret ? GC_OK : GC_NEXT;
}
/*
* This function compares node address got in summary with that in NAT.
* On validity, copy that node with cold status, otherwise (invalid node)
* ignore that.
*/
static int gc_node_segment(struct f2fs_sb_info *sbi,
struct f2fs_summary *sum, unsigned int segno, int gc_type)
{
bool initial = true;
struct f2fs_summary *entry;
int off;
next_step:
entry = sum;
for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
nid_t nid = le32_to_cpu(entry->nid);
struct page *node_page;
int err;
/*
* It makes sure that free segments are able to write
* all the dirty node pages before CP after this CP.
* So let's check the space of dirty node pages.
*/
if (should_do_checkpoint(sbi)) {
mutex_lock(&sbi->cp_mutex);
block_operations(sbi);
return GC_BLOCKED;
}
err = check_valid_map(sbi, segno, off);
if (err == GC_ERROR)
return err;
else if (err == GC_NEXT)
continue;
if (initial) {
ra_node_page(sbi, nid);
continue;
}
node_page = get_node_page(sbi, nid);
if (IS_ERR(node_page))
continue;
/* set page dirty and write it */
if (!PageWriteback(node_page))
set_page_dirty(node_page);
f2fs_put_page(node_page, 1);
stat_inc_node_blk_count(sbi, 1);
}
if (initial) {
initial = false;
goto next_step;
}
if (gc_type == FG_GC) {
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
.for_reclaim = 0,
};
sync_node_pages(sbi, 0, &wbc);
}
return GC_DONE;
}
/*
* Calculate start block index that this node page contains
*/
block_t start_bidx_of_node(unsigned int node_ofs)
{
block_t start_bidx;
unsigned int bidx, indirect_blks;
int dec;
indirect_blks = 2 * NIDS_PER_BLOCK + 4;
start_bidx = 1;
if (node_ofs == 0) {
start_bidx = 0;
} else if (node_ofs <= 2) {
bidx = node_ofs - 1;
} else if (node_ofs <= indirect_blks) {
dec = (node_ofs - 4) / (NIDS_PER_BLOCK + 1);
bidx = node_ofs - 2 - dec;
} else {
dec = (node_ofs - indirect_blks - 3) / (NIDS_PER_BLOCK + 1);
bidx = node_ofs - 5 - dec;
}
if (start_bidx)
start_bidx = bidx * ADDRS_PER_BLOCK + ADDRS_PER_INODE;
return start_bidx;
}
static int check_dnode(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
struct node_info *dni, block_t blkaddr, unsigned int *nofs)
{
struct page *node_page;
nid_t nid;
unsigned int ofs_in_node;
block_t source_blkaddr;
nid = le32_to_cpu(sum->nid);
ofs_in_node = le16_to_cpu(sum->ofs_in_node);
node_page = get_node_page(sbi, nid);
if (IS_ERR(node_page))
return GC_NEXT;
get_node_info(sbi, nid, dni);
if (sum->version != dni->version) {
f2fs_put_page(node_page, 1);
return GC_NEXT;
}
*nofs = ofs_of_node(node_page);
source_blkaddr = datablock_addr(node_page, ofs_in_node);
f2fs_put_page(node_page, 1);
if (source_blkaddr != blkaddr)
return GC_NEXT;
return GC_OK;
}
static void move_data_page(struct inode *inode, struct page *page, int gc_type)
{
if (page->mapping != inode->i_mapping)
goto out;
if (inode != page->mapping->host)
goto out;
if (PageWriteback(page))
goto out;
if (gc_type == BG_GC) {
set_page_dirty(page);
set_cold_data(page);
} else {
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
mutex_lock_op(sbi, DATA_WRITE);
if (clear_page_dirty_for_io(page) &&
S_ISDIR(inode->i_mode)) {
dec_page_count(sbi, F2FS_DIRTY_DENTS);
inode_dec_dirty_dents(inode);
}
set_cold_data(page);
do_write_data_page(page);
mutex_unlock_op(sbi, DATA_WRITE);
clear_cold_data(page);
}
out:
f2fs_put_page(page, 1);
}
/*
* This function tries to get parent node of victim data block, and identifies
* data block validity. If the block is valid, copy that with cold status and
* modify parent node.
* If the parent node is not valid or the data block address is different,
* the victim data block is ignored.
*/
static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
struct list_head *ilist, unsigned int segno, int gc_type)
{
struct super_block *sb = sbi->sb;
struct f2fs_summary *entry;
block_t start_addr;
int err, off;
int phase = 0;
start_addr = START_BLOCK(sbi, segno);
next_step:
entry = sum;
for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
struct page *data_page;
struct inode *inode;
struct node_info dni; /* dnode info for the data */
unsigned int ofs_in_node, nofs;
block_t start_bidx;
/*
* It makes sure that free segments are able to write
* all the dirty node pages before CP after this CP.
* So let's check the space of dirty node pages.
*/
if (should_do_checkpoint(sbi)) {
mutex_lock(&sbi->cp_mutex);
block_operations(sbi);
err = GC_BLOCKED;
goto stop;
}
err = check_valid_map(sbi, segno, off);
if (err == GC_ERROR)
goto stop;
else if (err == GC_NEXT)
continue;
if (phase == 0) {
ra_node_page(sbi, le32_to_cpu(entry->nid));
continue;
}
/* Get an inode by ino with checking validity */
err = check_dnode(sbi, entry, &dni, start_addr + off, &nofs);
if (err == GC_ERROR)
goto stop;
else if (err == GC_NEXT)
continue;
if (phase == 1) {
ra_node_page(sbi, dni.ino);
continue;
}
start_bidx = start_bidx_of_node(nofs);
ofs_in_node = le16_to_cpu(entry->ofs_in_node);
if (phase == 2) {
inode = f2fs_iget_nowait(sb, dni.ino);
if (IS_ERR(inode))
continue;
data_page = find_data_page(inode,
start_bidx + ofs_in_node);
if (IS_ERR(data_page))
goto next_iput;
f2fs_put_page(data_page, 0);
add_gc_inode(inode, ilist);
} else {
inode = find_gc_inode(dni.ino, ilist);
if (inode) {
data_page = get_lock_data_page(inode,
start_bidx + ofs_in_node);
if (IS_ERR(data_page))
continue;
move_data_page(inode, data_page, gc_type);
stat_inc_data_blk_count(sbi, 1);
}
}
continue;
next_iput:
iput(inode);
}
if (++phase < 4)
goto next_step;
err = GC_DONE;
stop:
if (gc_type == FG_GC)
f2fs_submit_bio(sbi, DATA, true);
return err;
}
static int __get_victim(struct f2fs_sb_info *sbi, unsigned int *victim,
int gc_type, int type)
{
struct sit_info *sit_i = SIT_I(sbi);
int ret;
mutex_lock(&sit_i->sentry_lock);
ret = DIRTY_I(sbi)->v_ops->get_victim(sbi, victim, gc_type, type, LFS);
mutex_unlock(&sit_i->sentry_lock);
return ret;
}
static int do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
struct list_head *ilist, int gc_type)
{
struct page *sum_page;
struct f2fs_summary_block *sum;
int ret = GC_DONE;
/* read segment summary of victim */
sum_page = get_sum_page(sbi, segno);
if (IS_ERR(sum_page))
return GC_ERROR;
/*
* CP needs to lock sum_page. In this time, we don't need
* to lock this page, because this summary page is not gone anywhere.
* Also, this page is not gonna be updated before GC is done.
*/
unlock_page(sum_page);
sum = page_address(sum_page);
switch (GET_SUM_TYPE((&sum->footer))) {
case SUM_TYPE_NODE:
ret = gc_node_segment(sbi, sum->entries, segno, gc_type);
break;
case SUM_TYPE_DATA:
ret = gc_data_segment(sbi, sum->entries, ilist, segno, gc_type);
break;
}
stat_inc_seg_count(sbi, GET_SUM_TYPE((&sum->footer)));
stat_inc_call_count(sbi->stat_info);
f2fs_put_page(sum_page, 0);
return ret;
}
int f2fs_gc(struct f2fs_sb_info *sbi, int nGC)
{
unsigned int segno;
int old_free_secs, cur_free_secs;
int gc_status, nfree;
struct list_head ilist;
int gc_type = BG_GC;
INIT_LIST_HEAD(&ilist);
gc_more:
nfree = 0;
gc_status = GC_NONE;
if (has_not_enough_free_secs(sbi))
old_free_secs = reserved_sections(sbi);
else
old_free_secs = free_sections(sbi);
while (sbi->sb->s_flags & MS_ACTIVE) {
int i;
if (has_not_enough_free_secs(sbi))
gc_type = FG_GC;
cur_free_secs = free_sections(sbi) + nfree;
/* We got free space successfully. */
if (nGC < cur_free_secs - old_free_secs)
break;
if (!__get_victim(sbi, &segno, gc_type, NO_CHECK_TYPE))
break;
for (i = 0; i < sbi->segs_per_sec; i++) {
/*
* do_garbage_collect will give us three gc_status:
* GC_ERROR, GC_DONE, and GC_BLOCKED.
* If GC is finished uncleanly, we have to return
* the victim to dirty segment list.
*/
gc_status = do_garbage_collect(sbi, segno + i,
&ilist, gc_type);
if (gc_status != GC_DONE)
goto stop;
nfree++;
}
}
stop:
if (has_not_enough_free_secs(sbi) || gc_status == GC_BLOCKED) {
write_checkpoint(sbi, (gc_status == GC_BLOCKED), false);
if (nfree)
goto gc_more;
}
mutex_unlock(&sbi->gc_mutex);
put_gc_inode(&ilist);
BUG_ON(!list_empty(&ilist));
return gc_status;
}
void build_gc_manager(struct f2fs_sb_info *sbi)
{
DIRTY_I(sbi)->v_ops = &default_v_ops;
}
int create_gc_caches(void)
{
winode_slab = f2fs_kmem_cache_create("f2fs_gc_inodes",
sizeof(struct inode_entry), NULL);
if (!winode_slab)
return -ENOMEM;
return 0;
}
void destroy_gc_caches(void)
{
kmem_cache_destroy(winode_slab);
}

117
fs/f2fs/gc.h Normal file
View file

@ -0,0 +1,117 @@
/*
* fs/f2fs/gc.h
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#define GC_THREAD_NAME "f2fs_gc_task"
#define GC_THREAD_MIN_WB_PAGES 1 /*
* a threshold to determine
* whether IO subsystem is idle
* or not
*/
#define GC_THREAD_MIN_SLEEP_TIME 10000 /* milliseconds */
#define GC_THREAD_MAX_SLEEP_TIME 30000
#define GC_THREAD_NOGC_SLEEP_TIME 10000
#define LIMIT_INVALID_BLOCK 40 /* percentage over total user space */
#define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */
/* Search max. number of dirty segments to select a victim segment */
#define MAX_VICTIM_SEARCH 20
enum {
GC_NONE = 0,
GC_ERROR,
GC_OK,
GC_NEXT,
GC_BLOCKED,
GC_DONE,
};
struct f2fs_gc_kthread {
struct task_struct *f2fs_gc_task;
wait_queue_head_t gc_wait_queue_head;
};
struct inode_entry {
struct list_head list;
struct inode *inode;
};
/*
* inline functions
*/
static inline block_t free_user_blocks(struct f2fs_sb_info *sbi)
{
if (free_segments(sbi) < overprovision_segments(sbi))
return 0;
else
return (free_segments(sbi) - overprovision_segments(sbi))
<< sbi->log_blocks_per_seg;
}
static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info *sbi)
{
return (long)(sbi->user_block_count * LIMIT_INVALID_BLOCK) / 100;
}
static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi)
{
block_t reclaimable_user_blocks = sbi->user_block_count -
written_block_count(sbi);
return (long)(reclaimable_user_blocks * LIMIT_FREE_BLOCK) / 100;
}
static inline long increase_sleep_time(long wait)
{
wait += GC_THREAD_MIN_SLEEP_TIME;
if (wait > GC_THREAD_MAX_SLEEP_TIME)
wait = GC_THREAD_MAX_SLEEP_TIME;
return wait;
}
static inline long decrease_sleep_time(long wait)
{
wait -= GC_THREAD_MIN_SLEEP_TIME;
if (wait <= GC_THREAD_MIN_SLEEP_TIME)
wait = GC_THREAD_MIN_SLEEP_TIME;
return wait;
}
static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
{
block_t invalid_user_blocks = sbi->user_block_count -
written_block_count(sbi);
/*
* Background GC is triggered with the following condition.
* 1. There are a number of invalid blocks.
* 2. There is not enough free space.
*/
if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
free_user_blocks(sbi) < limit_free_user_blocks(sbi))
return true;
return false;
}
static inline int is_idle(struct f2fs_sb_info *sbi)
{
struct block_device *bdev = sbi->sb->s_bdev;
struct request_queue *q = bdev_get_queue(bdev);
struct request_list *rl = &q->root_rl;
return !(rl->count[BLK_RW_SYNC]) && !(rl->count[BLK_RW_ASYNC]);
}
static inline bool should_do_checkpoint(struct f2fs_sb_info *sbi)
{
unsigned int pages_per_sec = sbi->segs_per_sec *
(1 << sbi->log_blocks_per_seg);
int node_secs = ((get_pages(sbi, F2FS_DIRTY_NODES) + pages_per_sec - 1)
>> sbi->log_blocks_per_seg) / sbi->segs_per_sec;
int dent_secs = ((get_pages(sbi, F2FS_DIRTY_DENTS) + pages_per_sec - 1)
>> sbi->log_blocks_per_seg) / sbi->segs_per_sec;
return free_sections(sbi) <= (node_secs + 2 * dent_secs + 2);
}

97
fs/f2fs/hash.c Normal file
View file

@ -0,0 +1,97 @@
/*
* fs/f2fs/hash.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* Portions of this code from linux/fs/ext3/hash.c
*
* Copyright (C) 2002 by Theodore Ts'o
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/types.h>
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/cryptohash.h>
#include <linux/pagemap.h>
#include "f2fs.h"
/*
* Hashing code copied from ext3
*/
#define DELTA 0x9E3779B9
static void TEA_transform(unsigned int buf[4], unsigned int const in[])
{
__u32 sum = 0;
__u32 b0 = buf[0], b1 = buf[1];
__u32 a = in[0], b = in[1], c = in[2], d = in[3];
int n = 16;
do {
sum += DELTA;
b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
} while (--n);
buf[0] += b0;
buf[1] += b1;
}
static void str2hashbuf(const char *msg, int len, unsigned int *buf, int num)
{
unsigned pad, val;
int i;
pad = (__u32)len | ((__u32)len << 8);
pad |= pad << 16;
val = pad;
if (len > num * 4)
len = num * 4;
for (i = 0; i < len; i++) {
if ((i % 4) == 0)
val = pad;
val = msg[i] + (val << 8);
if ((i % 4) == 3) {
*buf++ = val;
val = pad;
num--;
}
}
if (--num >= 0)
*buf++ = val;
while (--num >= 0)
*buf++ = pad;
}
f2fs_hash_t f2fs_dentry_hash(const char *name, int len)
{
__u32 hash, minor_hash;
f2fs_hash_t f2fs_hash;
const char *p;
__u32 in[8], buf[4];
/* Initialize the default seed for the hash checksum functions */
buf[0] = 0x67452301;
buf[1] = 0xefcdab89;
buf[2] = 0x98badcfe;
buf[3] = 0x10325476;
p = name;
while (len > 0) {
str2hashbuf(p, len, in, 4);
TEA_transform(buf, in);
len -= 16;
p += 16;
}
hash = buf[0];
minor_hash = buf[1];
f2fs_hash = cpu_to_le32(hash & ~F2FS_HASH_COL_BIT);
return f2fs_hash;
}

268
fs/f2fs/inode.c Normal file
View file

@ -0,0 +1,268 @@
/*
* fs/f2fs/inode.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/buffer_head.h>
#include <linux/writeback.h>
#include "f2fs.h"
#include "node.h"
struct f2fs_iget_args {
u64 ino;
int on_free;
};
void f2fs_set_inode_flags(struct inode *inode)
{
unsigned int flags = F2FS_I(inode)->i_flags;
inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE |
S_NOATIME | S_DIRSYNC);
if (flags & FS_SYNC_FL)
inode->i_flags |= S_SYNC;
if (flags & FS_APPEND_FL)
inode->i_flags |= S_APPEND;
if (flags & FS_IMMUTABLE_FL)
inode->i_flags |= S_IMMUTABLE;
if (flags & FS_NOATIME_FL)
inode->i_flags |= S_NOATIME;
if (flags & FS_DIRSYNC_FL)
inode->i_flags |= S_DIRSYNC;
}
static int f2fs_iget_test(struct inode *inode, void *data)
{
struct f2fs_iget_args *args = data;
if (inode->i_ino != args->ino)
return 0;
if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
args->on_free = 1;
return 0;
}
return 1;
}
struct inode *f2fs_iget_nowait(struct super_block *sb, unsigned long ino)
{
struct f2fs_iget_args args = {
.ino = ino,
.on_free = 0
};
struct inode *inode = ilookup5(sb, ino, f2fs_iget_test, &args);
if (inode)
return inode;
if (!args.on_free)
return f2fs_iget(sb, ino);
return ERR_PTR(-ENOENT);
}
static int do_read_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct page *node_page;
struct f2fs_node *rn;
struct f2fs_inode *ri;
/* Check if ino is within scope */
check_nid_range(sbi, inode->i_ino);
node_page = get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page))
return PTR_ERR(node_page);
rn = page_address(node_page);
ri = &(rn->i);
inode->i_mode = le16_to_cpu(ri->i_mode);
i_uid_write(inode, le32_to_cpu(ri->i_uid));
i_gid_write(inode, le32_to_cpu(ri->i_gid));
set_nlink(inode, le32_to_cpu(ri->i_links));
inode->i_size = le64_to_cpu(ri->i_size);
inode->i_blocks = le64_to_cpu(ri->i_blocks);
inode->i_atime.tv_sec = le64_to_cpu(ri->i_atime);
inode->i_ctime.tv_sec = le64_to_cpu(ri->i_ctime);
inode->i_mtime.tv_sec = le64_to_cpu(ri->i_mtime);
inode->i_atime.tv_nsec = le32_to_cpu(ri->i_atime_nsec);
inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
inode->i_generation = le32_to_cpu(ri->i_generation);
fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
fi->i_flags = le32_to_cpu(ri->i_flags);
fi->flags = 0;
fi->data_version = le64_to_cpu(F2FS_CKPT(sbi)->checkpoint_ver) - 1;
fi->i_advise = ri->i_advise;
fi->i_pino = le32_to_cpu(ri->i_pino);
get_extent_info(&fi->ext, ri->i_ext);
f2fs_put_page(node_page, 1);
return 0;
}
struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode;
int ret;
inode = iget_locked(sb, ino);
if (!inode)
return ERR_PTR(-ENOMEM);
if (!(inode->i_state & I_NEW))
return inode;
if (ino == F2FS_NODE_INO(sbi) || ino == F2FS_META_INO(sbi))
goto make_now;
ret = do_read_inode(inode);
if (ret)
goto bad_inode;
if (!sbi->por_doing && inode->i_nlink == 0) {
ret = -ENOENT;
goto bad_inode;
}
make_now:
if (ino == F2FS_NODE_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_node_aops;
mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
} else if (ino == F2FS_META_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_meta_aops;
mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = &f2fs_dir_inode_operations;
inode->i_fop = &f2fs_dir_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER_MOVABLE |
__GFP_ZERO);
} else if (S_ISLNK(inode->i_mode)) {
inode->i_op = &f2fs_symlink_inode_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
} else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
inode->i_op = &f2fs_special_inode_operations;
init_special_inode(inode, inode->i_mode, inode->i_rdev);
} else {
ret = -EIO;
goto bad_inode;
}
unlock_new_inode(inode);
return inode;
bad_inode:
iget_failed(inode);
return ERR_PTR(ret);
}
void update_inode(struct inode *inode, struct page *node_page)
{
struct f2fs_node *rn;
struct f2fs_inode *ri;
wait_on_page_writeback(node_page);
rn = page_address(node_page);
ri = &(rn->i);
ri->i_mode = cpu_to_le16(inode->i_mode);
ri->i_advise = F2FS_I(inode)->i_advise;
ri->i_uid = cpu_to_le32(i_uid_read(inode));
ri->i_gid = cpu_to_le32(i_gid_read(inode));
ri->i_links = cpu_to_le32(inode->i_nlink);
ri->i_size = cpu_to_le64(i_size_read(inode));
ri->i_blocks = cpu_to_le64(inode->i_blocks);
set_raw_extent(&F2FS_I(inode)->ext, &ri->i_ext);
ri->i_atime = cpu_to_le64(inode->i_atime.tv_sec);
ri->i_ctime = cpu_to_le64(inode->i_ctime.tv_sec);
ri->i_mtime = cpu_to_le64(inode->i_mtime.tv_sec);
ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec);
ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth);
ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino);
ri->i_generation = cpu_to_le32(inode->i_generation);
set_page_dirty(node_page);
}
int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct page *node_page;
bool need_lock = false;
if (inode->i_ino == F2FS_NODE_INO(sbi) ||
inode->i_ino == F2FS_META_INO(sbi))
return 0;
node_page = get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page))
return PTR_ERR(node_page);
if (!PageDirty(node_page)) {
need_lock = true;
f2fs_put_page(node_page, 1);
mutex_lock(&sbi->write_inode);
node_page = get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page)) {
mutex_unlock(&sbi->write_inode);
return PTR_ERR(node_page);
}
}
update_inode(inode, node_page);
f2fs_put_page(node_page, 1);
if (need_lock)
mutex_unlock(&sbi->write_inode);
return 0;
}
/*
* Called at the last iput() if i_nlink is zero
*/
void f2fs_evict_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
truncate_inode_pages(&inode->i_data, 0);
if (inode->i_ino == F2FS_NODE_INO(sbi) ||
inode->i_ino == F2FS_META_INO(sbi))
goto no_delete;
BUG_ON(atomic_read(&F2FS_I(inode)->dirty_dents));
remove_dirty_dir_inode(inode);
if (inode->i_nlink || is_bad_inode(inode))
goto no_delete;
set_inode_flag(F2FS_I(inode), FI_NO_ALLOC);
i_size_write(inode, 0);
if (F2FS_HAS_BLOCKS(inode))
f2fs_truncate(inode);
remove_inode_page(inode);
no_delete:
clear_inode(inode);
}

503
fs/f2fs/namei.c Normal file
View file

@ -0,0 +1,503 @@
/*
* fs/f2fs/namei.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/pagemap.h>
#include <linux/sched.h>
#include <linux/ctype.h>
#include "f2fs.h"
#include "xattr.h"
#include "acl.h"
static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
{
struct super_block *sb = dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
nid_t ino;
struct inode *inode;
bool nid_free = false;
int err;
inode = new_inode(sb);
if (!inode)
return ERR_PTR(-ENOMEM);
mutex_lock_op(sbi, NODE_NEW);
if (!alloc_nid(sbi, &ino)) {
mutex_unlock_op(sbi, NODE_NEW);
err = -ENOSPC;
goto fail;
}
mutex_unlock_op(sbi, NODE_NEW);
inode->i_uid = current_fsuid();
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
if (S_ISDIR(mode))
mode |= S_ISGID;
} else {
inode->i_gid = current_fsgid();
}
inode->i_ino = ino;
inode->i_mode = mode;
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
inode->i_generation = sbi->s_next_generation++;
err = insert_inode_locked(inode);
if (err) {
err = -EINVAL;
nid_free = true;
goto out;
}
mark_inode_dirty(inode);
return inode;
out:
clear_nlink(inode);
unlock_new_inode(inode);
fail:
iput(inode);
if (nid_free)
alloc_nid_failed(sbi, ino);
return ERR_PTR(err);
}
static int is_multimedia_file(const unsigned char *s, const char *sub)
{
int slen = strlen(s);
int sublen = strlen(sub);
int ret;
if (sublen > slen)
return 1;
ret = memcmp(s + slen - sublen, sub, sublen);
if (ret) { /* compare upper case */
int i;
char upper_sub[8];
for (i = 0; i < sublen && i < sizeof(upper_sub); i++)
upper_sub[i] = toupper(sub[i]);
return memcmp(s + slen - sublen, upper_sub, sublen);
}
return ret;
}
/*
* Set multimedia files as cold files for hot/cold data separation
*/
static inline void set_cold_file(struct f2fs_sb_info *sbi, struct inode *inode,
const unsigned char *name)
{
int i;
__u8 (*extlist)[8] = sbi->raw_super->extension_list;
int count = le32_to_cpu(sbi->raw_super->extension_count);
for (i = 0; i < count; i++) {
if (!is_multimedia_file(name, extlist[i])) {
F2FS_I(inode)->i_advise |= FADVISE_COLD_BIT;
break;
}
}
}
static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
bool excl)
{
struct super_block *sb = dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode;
nid_t ino = 0;
int err;
inode = f2fs_new_inode(dir, mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
if (!test_opt(sbi, DISABLE_EXT_IDENTIFY))
set_cold_file(sbi, inode, dentry->d_name.name);
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
ino = inode->i_ino;
err = f2fs_add_link(dentry, inode);
if (err)
goto out;
alloc_nid_done(sbi, ino);
if (!sbi->por_doing)
d_instantiate(dentry, inode);
unlock_new_inode(inode);
f2fs_balance_fs(sbi);
return 0;
out:
clear_nlink(inode);
unlock_new_inode(inode);
iput(inode);
alloc_nid_failed(sbi, ino);
return err;
}
static int f2fs_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{
struct inode *inode = old_dentry->d_inode;
struct super_block *sb = dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
int err;
inode->i_ctime = CURRENT_TIME;
atomic_inc(&inode->i_count);
set_inode_flag(F2FS_I(inode), FI_INC_LINK);
err = f2fs_add_link(dentry, inode);
if (err)
goto out;
d_instantiate(dentry, inode);
f2fs_balance_fs(sbi);
return 0;
out:
clear_inode_flag(F2FS_I(inode), FI_INC_LINK);
iput(inode);
return err;
}
struct dentry *f2fs_get_parent(struct dentry *child)
{
struct qstr dotdot = QSTR_INIT("..", 2);
unsigned long ino = f2fs_inode_by_name(child->d_inode, &dotdot);
if (!ino)
return ERR_PTR(-ENOENT);
return d_obtain_alias(f2fs_iget(child->d_inode->i_sb, ino));
}
static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
unsigned int flags)
{
struct inode *inode = NULL;
struct f2fs_dir_entry *de;
struct page *page;
if (dentry->d_name.len > F2FS_MAX_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
de = f2fs_find_entry(dir, &dentry->d_name, &page);
if (de) {
nid_t ino = le32_to_cpu(de->ino);
kunmap(page);
f2fs_put_page(page, 0);
inode = f2fs_iget(dir->i_sb, ino);
if (IS_ERR(inode))
return ERR_CAST(inode);
}
return d_splice_alias(inode, dentry);
}
static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
{
struct super_block *sb = dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode = dentry->d_inode;
struct f2fs_dir_entry *de;
struct page *page;
int err = -ENOENT;
de = f2fs_find_entry(dir, &dentry->d_name, &page);
if (!de)
goto fail;
err = check_orphan_space(sbi);
if (err) {
kunmap(page);
f2fs_put_page(page, 0);
goto fail;
}
f2fs_delete_entry(de, page, inode);
/* In order to evict this inode, we set it dirty */
mark_inode_dirty(inode);
f2fs_balance_fs(sbi);
fail:
return err;
}
static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
const char *symname)
{
struct super_block *sb = dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode;
unsigned symlen = strlen(symname) + 1;
int err;
inode = f2fs_new_inode(dir, S_IFLNK | S_IRWXUGO);
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &f2fs_symlink_inode_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
err = f2fs_add_link(dentry, inode);
if (err)
goto out;
err = page_symlink(inode, symname, symlen);
alloc_nid_done(sbi, inode->i_ino);
d_instantiate(dentry, inode);
unlock_new_inode(inode);
f2fs_balance_fs(sbi);
return err;
out:
clear_nlink(inode);
unlock_new_inode(inode);
iput(inode);
alloc_nid_failed(sbi, inode->i_ino);
return err;
}
static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
struct inode *inode;
int err;
inode = f2fs_new_inode(dir, S_IFDIR | mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &f2fs_dir_inode_operations;
inode->i_fop = &f2fs_dir_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
set_inode_flag(F2FS_I(inode), FI_INC_LINK);
err = f2fs_add_link(dentry, inode);
if (err)
goto out_fail;
alloc_nid_done(sbi, inode->i_ino);
d_instantiate(dentry, inode);
unlock_new_inode(inode);
f2fs_balance_fs(sbi);
return 0;
out_fail:
clear_inode_flag(F2FS_I(inode), FI_INC_LINK);
clear_nlink(inode);
unlock_new_inode(inode);
iput(inode);
alloc_nid_failed(sbi, inode->i_ino);
return err;
}
static int f2fs_rmdir(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
if (f2fs_empty_dir(inode))
return f2fs_unlink(dir, dentry);
return -ENOTEMPTY;
}
static int f2fs_mknod(struct inode *dir, struct dentry *dentry,
umode_t mode, dev_t rdev)
{
struct super_block *sb = dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode;
int err = 0;
if (!new_valid_dev(rdev))
return -EINVAL;
inode = f2fs_new_inode(dir, mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
init_special_inode(inode, inode->i_mode, rdev);
inode->i_op = &f2fs_special_inode_operations;
err = f2fs_add_link(dentry, inode);
if (err)
goto out;
alloc_nid_done(sbi, inode->i_ino);
d_instantiate(dentry, inode);
unlock_new_inode(inode);
f2fs_balance_fs(sbi);
return 0;
out:
clear_nlink(inode);
unlock_new_inode(inode);
iput(inode);
alloc_nid_failed(sbi, inode->i_ino);
return err;
}
static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
struct super_block *sb = old_dir->i_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *old_inode = old_dentry->d_inode;
struct inode *new_inode = new_dentry->d_inode;
struct page *old_dir_page;
struct page *old_page;
struct f2fs_dir_entry *old_dir_entry = NULL;
struct f2fs_dir_entry *old_entry;
struct f2fs_dir_entry *new_entry;
int err = -ENOENT;
old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
if (!old_entry)
goto out;
if (S_ISDIR(old_inode->i_mode)) {
err = -EIO;
old_dir_entry = f2fs_parent_dir(old_inode, &old_dir_page);
if (!old_dir_entry)
goto out_old;
}
mutex_lock_op(sbi, RENAME);
if (new_inode) {
struct page *new_page;
err = -ENOTEMPTY;
if (old_dir_entry && !f2fs_empty_dir(new_inode))
goto out_dir;
err = -ENOENT;
new_entry = f2fs_find_entry(new_dir, &new_dentry->d_name,
&new_page);
if (!new_entry)
goto out_dir;
f2fs_set_link(new_dir, new_entry, new_page, old_inode);
new_inode->i_ctime = CURRENT_TIME;
if (old_dir_entry)
drop_nlink(new_inode);
drop_nlink(new_inode);
if (!new_inode->i_nlink)
add_orphan_inode(sbi, new_inode->i_ino);
f2fs_write_inode(new_inode, NULL);
} else {
err = f2fs_add_link(new_dentry, old_inode);
if (err)
goto out_dir;
if (old_dir_entry) {
inc_nlink(new_dir);
f2fs_write_inode(new_dir, NULL);
}
}
old_inode->i_ctime = CURRENT_TIME;
set_inode_flag(F2FS_I(old_inode), FI_NEED_CP);
mark_inode_dirty(old_inode);
f2fs_delete_entry(old_entry, old_page, NULL);
if (old_dir_entry) {
if (old_dir != new_dir) {
f2fs_set_link(old_inode, old_dir_entry,
old_dir_page, new_dir);
} else {
kunmap(old_dir_page);
f2fs_put_page(old_dir_page, 0);
}
drop_nlink(old_dir);
f2fs_write_inode(old_dir, NULL);
}
mutex_unlock_op(sbi, RENAME);
f2fs_balance_fs(sbi);
return 0;
out_dir:
if (old_dir_entry) {
kunmap(old_dir_page);
f2fs_put_page(old_dir_page, 0);
}
mutex_unlock_op(sbi, RENAME);
out_old:
kunmap(old_page);
f2fs_put_page(old_page, 0);
out:
return err;
}
const struct inode_operations f2fs_dir_inode_operations = {
.create = f2fs_create,
.lookup = f2fs_lookup,
.link = f2fs_link,
.unlink = f2fs_unlink,
.symlink = f2fs_symlink,
.mkdir = f2fs_mkdir,
.rmdir = f2fs_rmdir,
.mknod = f2fs_mknod,
.rename = f2fs_rename,
.setattr = f2fs_setattr,
.get_acl = f2fs_get_acl,
#ifdef CONFIG_F2FS_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = f2fs_listxattr,
.removexattr = generic_removexattr,
#endif
};
const struct inode_operations f2fs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = page_follow_link_light,
.put_link = page_put_link,
.setattr = f2fs_setattr,
#ifdef CONFIG_F2FS_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = f2fs_listxattr,
.removexattr = generic_removexattr,
#endif
};
const struct inode_operations f2fs_special_inode_operations = {
.setattr = f2fs_setattr,
.get_acl = f2fs_get_acl,
#ifdef CONFIG_F2FS_FS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = f2fs_listxattr,
.removexattr = generic_removexattr,
#endif
};

1764
fs/f2fs/node.c Normal file

File diff suppressed because it is too large Load diff

353
fs/f2fs/node.h Normal file
View file

@ -0,0 +1,353 @@
/*
* fs/f2fs/node.h
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
/* start node id of a node block dedicated to the given node id */
#define START_NID(nid) ((nid / NAT_ENTRY_PER_BLOCK) * NAT_ENTRY_PER_BLOCK)
/* node block offset on the NAT area dedicated to the given start node id */
#define NAT_BLOCK_OFFSET(start_nid) (start_nid / NAT_ENTRY_PER_BLOCK)
/* # of pages to perform readahead before building free nids */
#define FREE_NID_PAGES 4
/* maximum # of free node ids to produce during build_free_nids */
#define MAX_FREE_NIDS (NAT_ENTRY_PER_BLOCK * FREE_NID_PAGES)
/* maximum readahead size for node during getting data blocks */
#define MAX_RA_NODE 128
/* maximum cached nat entries to manage memory footprint */
#define NM_WOUT_THRESHOLD (64 * NAT_ENTRY_PER_BLOCK)
/* vector size for gang look-up from nat cache that consists of radix tree */
#define NATVEC_SIZE 64
/*
* For node information
*/
struct node_info {
nid_t nid; /* node id */
nid_t ino; /* inode number of the node's owner */
block_t blk_addr; /* block address of the node */
unsigned char version; /* version of the node */
};
struct nat_entry {
struct list_head list; /* for clean or dirty nat list */
bool checkpointed; /* whether it is checkpointed or not */
struct node_info ni; /* in-memory node information */
};
#define nat_get_nid(nat) (nat->ni.nid)
#define nat_set_nid(nat, n) (nat->ni.nid = n)
#define nat_get_blkaddr(nat) (nat->ni.blk_addr)
#define nat_set_blkaddr(nat, b) (nat->ni.blk_addr = b)
#define nat_get_ino(nat) (nat->ni.ino)
#define nat_set_ino(nat, i) (nat->ni.ino = i)
#define nat_get_version(nat) (nat->ni.version)
#define nat_set_version(nat, v) (nat->ni.version = v)
#define __set_nat_cache_dirty(nm_i, ne) \
list_move_tail(&ne->list, &nm_i->dirty_nat_entries);
#define __clear_nat_cache_dirty(nm_i, ne) \
list_move_tail(&ne->list, &nm_i->nat_entries);
#define inc_node_version(version) (++version)
static inline void node_info_from_raw_nat(struct node_info *ni,
struct f2fs_nat_entry *raw_ne)
{
ni->ino = le32_to_cpu(raw_ne->ino);
ni->blk_addr = le32_to_cpu(raw_ne->block_addr);
ni->version = raw_ne->version;
}
/*
* For free nid mangement
*/
enum nid_state {
NID_NEW, /* newly added to free nid list */
NID_ALLOC /* it is allocated */
};
struct free_nid {
struct list_head list; /* for free node id list */
nid_t nid; /* node id */
int state; /* in use or not: NID_NEW or NID_ALLOC */
};
static inline int next_free_nid(struct f2fs_sb_info *sbi, nid_t *nid)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *fnid;
if (nm_i->fcnt <= 0)
return -1;
spin_lock(&nm_i->free_nid_list_lock);
fnid = list_entry(nm_i->free_nid_list.next, struct free_nid, list);
*nid = fnid->nid;
spin_unlock(&nm_i->free_nid_list_lock);
return 0;
}
/*
* inline functions
*/
static inline void get_nat_bitmap(struct f2fs_sb_info *sbi, void *addr)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
memcpy(addr, nm_i->nat_bitmap, nm_i->bitmap_size);
}
static inline pgoff_t current_nat_addr(struct f2fs_sb_info *sbi, nid_t start)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
pgoff_t block_off;
pgoff_t block_addr;
int seg_off;
block_off = NAT_BLOCK_OFFSET(start);
seg_off = block_off >> sbi->log_blocks_per_seg;
block_addr = (pgoff_t)(nm_i->nat_blkaddr +
(seg_off << sbi->log_blocks_per_seg << 1) +
(block_off & ((1 << sbi->log_blocks_per_seg) - 1)));
if (f2fs_test_bit(block_off, nm_i->nat_bitmap))
block_addr += sbi->blocks_per_seg;
return block_addr;
}
static inline pgoff_t next_nat_addr(struct f2fs_sb_info *sbi,
pgoff_t block_addr)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
block_addr -= nm_i->nat_blkaddr;
if ((block_addr >> sbi->log_blocks_per_seg) % 2)
block_addr -= sbi->blocks_per_seg;
else
block_addr += sbi->blocks_per_seg;
return block_addr + nm_i->nat_blkaddr;
}
static inline void set_to_next_nat(struct f2fs_nm_info *nm_i, nid_t start_nid)
{
unsigned int block_off = NAT_BLOCK_OFFSET(start_nid);
if (f2fs_test_bit(block_off, nm_i->nat_bitmap))
f2fs_clear_bit(block_off, nm_i->nat_bitmap);
else
f2fs_set_bit(block_off, nm_i->nat_bitmap);
}
static inline void fill_node_footer(struct page *page, nid_t nid,
nid_t ino, unsigned int ofs, bool reset)
{
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
if (reset)
memset(rn, 0, sizeof(*rn));
rn->footer.nid = cpu_to_le32(nid);
rn->footer.ino = cpu_to_le32(ino);
rn->footer.flag = cpu_to_le32(ofs << OFFSET_BIT_SHIFT);
}
static inline void copy_node_footer(struct page *dst, struct page *src)
{
void *src_addr = page_address(src);
void *dst_addr = page_address(dst);
struct f2fs_node *src_rn = (struct f2fs_node *)src_addr;
struct f2fs_node *dst_rn = (struct f2fs_node *)dst_addr;
memcpy(&dst_rn->footer, &src_rn->footer, sizeof(struct node_footer));
}
static inline void fill_node_footer_blkaddr(struct page *page, block_t blkaddr)
{
struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
rn->footer.cp_ver = ckpt->checkpoint_ver;
rn->footer.next_blkaddr = cpu_to_le32(blkaddr);
}
static inline nid_t ino_of_node(struct page *node_page)
{
void *kaddr = page_address(node_page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
return le32_to_cpu(rn->footer.ino);
}
static inline nid_t nid_of_node(struct page *node_page)
{
void *kaddr = page_address(node_page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
return le32_to_cpu(rn->footer.nid);
}
static inline unsigned int ofs_of_node(struct page *node_page)
{
void *kaddr = page_address(node_page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
unsigned flag = le32_to_cpu(rn->footer.flag);
return flag >> OFFSET_BIT_SHIFT;
}
static inline unsigned long long cpver_of_node(struct page *node_page)
{
void *kaddr = page_address(node_page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
return le64_to_cpu(rn->footer.cp_ver);
}
static inline block_t next_blkaddr_of_node(struct page *node_page)
{
void *kaddr = page_address(node_page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
return le32_to_cpu(rn->footer.next_blkaddr);
}
/*
* f2fs assigns the following node offsets described as (num).
* N = NIDS_PER_BLOCK
*
* Inode block (0)
* |- direct node (1)
* |- direct node (2)
* |- indirect node (3)
* | `- direct node (4 => 4 + N - 1)
* |- indirect node (4 + N)
* | `- direct node (5 + N => 5 + 2N - 1)
* `- double indirect node (5 + 2N)
* `- indirect node (6 + 2N)
* `- direct node (x(N + 1))
*/
static inline bool IS_DNODE(struct page *node_page)
{
unsigned int ofs = ofs_of_node(node_page);
if (ofs == 3 || ofs == 4 + NIDS_PER_BLOCK ||
ofs == 5 + 2 * NIDS_PER_BLOCK)
return false;
if (ofs >= 6 + 2 * NIDS_PER_BLOCK) {
ofs -= 6 + 2 * NIDS_PER_BLOCK;
if ((long int)ofs % (NIDS_PER_BLOCK + 1))
return false;
}
return true;
}
static inline void set_nid(struct page *p, int off, nid_t nid, bool i)
{
struct f2fs_node *rn = (struct f2fs_node *)page_address(p);
wait_on_page_writeback(p);
if (i)
rn->i.i_nid[off - NODE_DIR1_BLOCK] = cpu_to_le32(nid);
else
rn->in.nid[off] = cpu_to_le32(nid);
set_page_dirty(p);
}
static inline nid_t get_nid(struct page *p, int off, bool i)
{
struct f2fs_node *rn = (struct f2fs_node *)page_address(p);
if (i)
return le32_to_cpu(rn->i.i_nid[off - NODE_DIR1_BLOCK]);
return le32_to_cpu(rn->in.nid[off]);
}
/*
* Coldness identification:
* - Mark cold files in f2fs_inode_info
* - Mark cold node blocks in their node footer
* - Mark cold data pages in page cache
*/
static inline int is_cold_file(struct inode *inode)
{
return F2FS_I(inode)->i_advise & FADVISE_COLD_BIT;
}
static inline int is_cold_data(struct page *page)
{
return PageChecked(page);
}
static inline void set_cold_data(struct page *page)
{
SetPageChecked(page);
}
static inline void clear_cold_data(struct page *page)
{
ClearPageChecked(page);
}
static inline int is_cold_node(struct page *page)
{
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
unsigned int flag = le32_to_cpu(rn->footer.flag);
return flag & (0x1 << COLD_BIT_SHIFT);
}
static inline unsigned char is_fsync_dnode(struct page *page)
{
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
unsigned int flag = le32_to_cpu(rn->footer.flag);
return flag & (0x1 << FSYNC_BIT_SHIFT);
}
static inline unsigned char is_dent_dnode(struct page *page)
{
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
unsigned int flag = le32_to_cpu(rn->footer.flag);
return flag & (0x1 << DENT_BIT_SHIFT);
}
static inline void set_cold_node(struct inode *inode, struct page *page)
{
struct f2fs_node *rn = (struct f2fs_node *)page_address(page);
unsigned int flag = le32_to_cpu(rn->footer.flag);
if (S_ISDIR(inode->i_mode))
flag &= ~(0x1 << COLD_BIT_SHIFT);
else
flag |= (0x1 << COLD_BIT_SHIFT);
rn->footer.flag = cpu_to_le32(flag);
}
static inline void set_fsync_mark(struct page *page, int mark)
{
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
unsigned int flag = le32_to_cpu(rn->footer.flag);
if (mark)
flag |= (0x1 << FSYNC_BIT_SHIFT);
else
flag &= ~(0x1 << FSYNC_BIT_SHIFT);
rn->footer.flag = cpu_to_le32(flag);
}
static inline void set_dentry_mark(struct page *page, int mark)
{
void *kaddr = page_address(page);
struct f2fs_node *rn = (struct f2fs_node *)kaddr;
unsigned int flag = le32_to_cpu(rn->footer.flag);
if (mark)
flag |= (0x1 << DENT_BIT_SHIFT);
else
flag &= ~(0x1 << DENT_BIT_SHIFT);
rn->footer.flag = cpu_to_le32(flag);
}

375
fs/f2fs/recovery.c Normal file
View file

@ -0,0 +1,375 @@
/*
* fs/f2fs/recovery.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include "f2fs.h"
#include "node.h"
#include "segment.h"
static struct kmem_cache *fsync_entry_slab;
bool space_for_roll_forward(struct f2fs_sb_info *sbi)
{
if (sbi->last_valid_block_count + sbi->alloc_valid_block_count
> sbi->user_block_count)
return false;
return true;
}
static struct fsync_inode_entry *get_fsync_inode(struct list_head *head,
nid_t ino)
{
struct list_head *this;
struct fsync_inode_entry *entry;
list_for_each(this, head) {
entry = list_entry(this, struct fsync_inode_entry, list);
if (entry->inode->i_ino == ino)
return entry;
}
return NULL;
}
static int recover_dentry(struct page *ipage, struct inode *inode)
{
struct f2fs_node *raw_node = (struct f2fs_node *)kmap(ipage);
struct f2fs_inode *raw_inode = &(raw_node->i);
struct dentry dent, parent;
struct f2fs_dir_entry *de;
struct page *page;
struct inode *dir;
int err = 0;
if (!is_dent_dnode(ipage))
goto out;
dir = f2fs_iget(inode->i_sb, le32_to_cpu(raw_inode->i_pino));
if (IS_ERR(dir)) {
err = -EINVAL;
goto out;
}
parent.d_inode = dir;
dent.d_parent = &parent;
dent.d_name.len = le32_to_cpu(raw_inode->i_namelen);
dent.d_name.name = raw_inode->i_name;
de = f2fs_find_entry(dir, &dent.d_name, &page);
if (de) {
kunmap(page);
f2fs_put_page(page, 0);
} else {
f2fs_add_link(&dent, inode);
}
iput(dir);
out:
kunmap(ipage);
return err;
}
static int recover_inode(struct inode *inode, struct page *node_page)
{
void *kaddr = page_address(node_page);
struct f2fs_node *raw_node = (struct f2fs_node *)kaddr;
struct f2fs_inode *raw_inode = &(raw_node->i);
inode->i_mode = le16_to_cpu(raw_inode->i_mode);
i_size_write(inode, le64_to_cpu(raw_inode->i_size));
inode->i_atime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
inode->i_ctime.tv_sec = le64_to_cpu(raw_inode->i_ctime);
inode->i_mtime.tv_sec = le64_to_cpu(raw_inode->i_mtime);
inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
return recover_dentry(node_page, inode);
}
static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
{
unsigned long long cp_ver = le64_to_cpu(sbi->ckpt->checkpoint_ver);
struct curseg_info *curseg;
struct page *page;
block_t blkaddr;
int err = 0;
/* get node pages in the current segment */
curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
blkaddr = START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff;
/* read node page */
page = alloc_page(GFP_F2FS_ZERO);
if (IS_ERR(page))
return PTR_ERR(page);
lock_page(page);
while (1) {
struct fsync_inode_entry *entry;
if (f2fs_readpage(sbi, page, blkaddr, READ_SYNC))
goto out;
if (cp_ver != cpver_of_node(page))
goto out;
if (!is_fsync_dnode(page))
goto next;
entry = get_fsync_inode(head, ino_of_node(page));
if (entry) {
entry->blkaddr = blkaddr;
if (IS_INODE(page) && is_dent_dnode(page))
set_inode_flag(F2FS_I(entry->inode),
FI_INC_LINK);
} else {
if (IS_INODE(page) && is_dent_dnode(page)) {
if (recover_inode_page(sbi, page)) {
err = -ENOMEM;
goto out;
}
}
/* add this fsync inode to the list */
entry = kmem_cache_alloc(fsync_entry_slab, GFP_NOFS);
if (!entry) {
err = -ENOMEM;
goto out;
}
INIT_LIST_HEAD(&entry->list);
list_add_tail(&entry->list, head);
entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
if (IS_ERR(entry->inode)) {
err = PTR_ERR(entry->inode);
goto out;
}
entry->blkaddr = blkaddr;
}
if (IS_INODE(page)) {
err = recover_inode(entry->inode, page);
if (err)
goto out;
}
next:
/* check next segment */
blkaddr = next_blkaddr_of_node(page);
ClearPageUptodate(page);
}
out:
unlock_page(page);
__free_pages(page, 0);
return err;
}
static void destroy_fsync_dnodes(struct f2fs_sb_info *sbi,
struct list_head *head)
{
struct list_head *this;
struct fsync_inode_entry *entry;
list_for_each(this, head) {
entry = list_entry(this, struct fsync_inode_entry, list);
iput(entry->inode);
list_del(&entry->list);
kmem_cache_free(fsync_entry_slab, entry);
}
}
static void check_index_in_prev_nodes(struct f2fs_sb_info *sbi,
block_t blkaddr)
{
struct seg_entry *sentry;
unsigned int segno = GET_SEGNO(sbi, blkaddr);
unsigned short blkoff = GET_SEGOFF_FROM_SEG0(sbi, blkaddr) &
(sbi->blocks_per_seg - 1);
struct f2fs_summary sum;
nid_t ino;
void *kaddr;
struct inode *inode;
struct page *node_page;
block_t bidx;
int i;
sentry = get_seg_entry(sbi, segno);
if (!f2fs_test_bit(blkoff, sentry->cur_valid_map))
return;
/* Get the previous summary */
for (i = CURSEG_WARM_DATA; i <= CURSEG_COLD_DATA; i++) {
struct curseg_info *curseg = CURSEG_I(sbi, i);
if (curseg->segno == segno) {
sum = curseg->sum_blk->entries[blkoff];
break;
}
}
if (i > CURSEG_COLD_DATA) {
struct page *sum_page = get_sum_page(sbi, segno);
struct f2fs_summary_block *sum_node;
kaddr = page_address(sum_page);
sum_node = (struct f2fs_summary_block *)kaddr;
sum = sum_node->entries[blkoff];
f2fs_put_page(sum_page, 1);
}
/* Get the node page */
node_page = get_node_page(sbi, le32_to_cpu(sum.nid));
bidx = start_bidx_of_node(ofs_of_node(node_page)) +
le16_to_cpu(sum.ofs_in_node);
ino = ino_of_node(node_page);
f2fs_put_page(node_page, 1);
/* Deallocate previous index in the node page */
inode = f2fs_iget_nowait(sbi->sb, ino);
truncate_hole(inode, bidx, bidx + 1);
iput(inode);
}
static void do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
struct page *page, block_t blkaddr)
{
unsigned int start, end;
struct dnode_of_data dn;
struct f2fs_summary sum;
struct node_info ni;
start = start_bidx_of_node(ofs_of_node(page));
if (IS_INODE(page))
end = start + ADDRS_PER_INODE;
else
end = start + ADDRS_PER_BLOCK;
set_new_dnode(&dn, inode, NULL, NULL, 0);
if (get_dnode_of_data(&dn, start, 0))
return;
wait_on_page_writeback(dn.node_page);
get_node_info(sbi, dn.nid, &ni);
BUG_ON(ni.ino != ino_of_node(page));
BUG_ON(ofs_of_node(dn.node_page) != ofs_of_node(page));
for (; start < end; start++) {
block_t src, dest;
src = datablock_addr(dn.node_page, dn.ofs_in_node);
dest = datablock_addr(page, dn.ofs_in_node);
if (src != dest && dest != NEW_ADDR && dest != NULL_ADDR) {
if (src == NULL_ADDR) {
int err = reserve_new_block(&dn);
/* We should not get -ENOSPC */
BUG_ON(err);
}
/* Check the previous node page having this index */
check_index_in_prev_nodes(sbi, dest);
set_summary(&sum, dn.nid, dn.ofs_in_node, ni.version);
/* write dummy data page */
recover_data_page(sbi, NULL, &sum, src, dest);
update_extent_cache(dest, &dn);
}
dn.ofs_in_node++;
}
/* write node page in place */
set_summary(&sum, dn.nid, 0, 0);
if (IS_INODE(dn.node_page))
sync_inode_page(&dn);
copy_node_footer(dn.node_page, page);
fill_node_footer(dn.node_page, dn.nid, ni.ino,
ofs_of_node(page), false);
set_page_dirty(dn.node_page);
recover_node_page(sbi, dn.node_page, &sum, &ni, blkaddr);
f2fs_put_dnode(&dn);
}
static void recover_data(struct f2fs_sb_info *sbi,
struct list_head *head, int type)
{
unsigned long long cp_ver = le64_to_cpu(sbi->ckpt->checkpoint_ver);
struct curseg_info *curseg;
struct page *page;
block_t blkaddr;
/* get node pages in the current segment */
curseg = CURSEG_I(sbi, type);
blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
/* read node page */
page = alloc_page(GFP_NOFS | __GFP_ZERO);
if (IS_ERR(page))
return;
lock_page(page);
while (1) {
struct fsync_inode_entry *entry;
if (f2fs_readpage(sbi, page, blkaddr, READ_SYNC))
goto out;
if (cp_ver != cpver_of_node(page))
goto out;
entry = get_fsync_inode(head, ino_of_node(page));
if (!entry)
goto next;
do_recover_data(sbi, entry->inode, page, blkaddr);
if (entry->blkaddr == blkaddr) {
iput(entry->inode);
list_del(&entry->list);
kmem_cache_free(fsync_entry_slab, entry);
}
next:
/* check next segment */
blkaddr = next_blkaddr_of_node(page);
ClearPageUptodate(page);
}
out:
unlock_page(page);
__free_pages(page, 0);
allocate_new_segments(sbi);
}
void recover_fsync_data(struct f2fs_sb_info *sbi)
{
struct list_head inode_list;
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
sizeof(struct fsync_inode_entry), NULL);
if (unlikely(!fsync_entry_slab))
return;
INIT_LIST_HEAD(&inode_list);
/* step #1: find fsynced inode numbers */
if (find_fsync_dnodes(sbi, &inode_list))
goto out;
if (list_empty(&inode_list))
goto out;
/* step #2: recover data */
sbi->por_doing = 1;
recover_data(sbi, &inode_list, CURSEG_WARM_NODE);
sbi->por_doing = 0;
BUG_ON(!list_empty(&inode_list));
out:
destroy_fsync_dnodes(sbi, &inode_list);
kmem_cache_destroy(fsync_entry_slab);
write_checkpoint(sbi, false, false);
}

1791
fs/f2fs/segment.c Normal file

File diff suppressed because it is too large Load diff

618
fs/f2fs/segment.h Normal file
View file

@ -0,0 +1,618 @@
/*
* fs/f2fs/segment.h
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
/* constant macro */
#define NULL_SEGNO ((unsigned int)(~0))
/* V: Logical segment # in volume, R: Relative segment # in main area */
#define GET_L2R_SEGNO(free_i, segno) (segno - free_i->start_segno)
#define GET_R2L_SEGNO(free_i, segno) (segno + free_i->start_segno)
#define IS_DATASEG(t) \
((t == CURSEG_HOT_DATA) || (t == CURSEG_COLD_DATA) || \
(t == CURSEG_WARM_DATA))
#define IS_NODESEG(t) \
((t == CURSEG_HOT_NODE) || (t == CURSEG_COLD_NODE) || \
(t == CURSEG_WARM_NODE))
#define IS_CURSEG(sbi, segno) \
((segno == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno) || \
(segno == CURSEG_I(sbi, CURSEG_WARM_DATA)->segno) || \
(segno == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno) || \
(segno == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno) || \
(segno == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno) || \
(segno == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno))
#define IS_CURSEC(sbi, secno) \
((secno == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno / \
sbi->segs_per_sec) || \
(secno == CURSEG_I(sbi, CURSEG_WARM_DATA)->segno / \
sbi->segs_per_sec) || \
(secno == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno / \
sbi->segs_per_sec) || \
(secno == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno / \
sbi->segs_per_sec) || \
(secno == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno / \
sbi->segs_per_sec) || \
(secno == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \
sbi->segs_per_sec)) \
#define START_BLOCK(sbi, segno) \
(SM_I(sbi)->seg0_blkaddr + \
(GET_R2L_SEGNO(FREE_I(sbi), segno) << sbi->log_blocks_per_seg))
#define NEXT_FREE_BLKADDR(sbi, curseg) \
(START_BLOCK(sbi, curseg->segno) + curseg->next_blkoff)
#define MAIN_BASE_BLOCK(sbi) (SM_I(sbi)->main_blkaddr)
#define GET_SEGOFF_FROM_SEG0(sbi, blk_addr) \
((blk_addr) - SM_I(sbi)->seg0_blkaddr)
#define GET_SEGNO_FROM_SEG0(sbi, blk_addr) \
(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) >> sbi->log_blocks_per_seg)
#define GET_SEGNO(sbi, blk_addr) \
(((blk_addr == NULL_ADDR) || (blk_addr == NEW_ADDR)) ? \
NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \
GET_SEGNO_FROM_SEG0(sbi, blk_addr)))
#define GET_SECNO(sbi, segno) \
((segno) / sbi->segs_per_sec)
#define GET_ZONENO_FROM_SEGNO(sbi, segno) \
((segno / sbi->segs_per_sec) / sbi->secs_per_zone)
#define GET_SUM_BLOCK(sbi, segno) \
((sbi->sm_info->ssa_blkaddr) + segno)
#define GET_SUM_TYPE(footer) ((footer)->entry_type)
#define SET_SUM_TYPE(footer, type) ((footer)->entry_type = type)
#define SIT_ENTRY_OFFSET(sit_i, segno) \
(segno % sit_i->sents_per_block)
#define SIT_BLOCK_OFFSET(sit_i, segno) \
(segno / SIT_ENTRY_PER_BLOCK)
#define START_SEGNO(sit_i, segno) \
(SIT_BLOCK_OFFSET(sit_i, segno) * SIT_ENTRY_PER_BLOCK)
#define f2fs_bitmap_size(nr) \
(BITS_TO_LONGS(nr) * sizeof(unsigned long))
#define TOTAL_SEGS(sbi) (SM_I(sbi)->main_segments)
#define SECTOR_FROM_BLOCK(sbi, blk_addr) \
(blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
/* during checkpoint, bio_private is used to synchronize the last bio */
struct bio_private {
struct f2fs_sb_info *sbi;
bool is_sync;
void *wait;
};
/*
* indicate a block allocation direction: RIGHT and LEFT.
* RIGHT means allocating new sections towards the end of volume.
* LEFT means the opposite direction.
*/
enum {
ALLOC_RIGHT = 0,
ALLOC_LEFT
};
/*
* In the victim_sel_policy->alloc_mode, there are two block allocation modes.
* LFS writes data sequentially with cleaning operations.
* SSR (Slack Space Recycle) reuses obsolete space without cleaning operations.
*/
enum {
LFS = 0,
SSR
};
/*
* In the victim_sel_policy->gc_mode, there are two gc, aka cleaning, modes.
* GC_CB is based on cost-benefit algorithm.
* GC_GREEDY is based on greedy algorithm.
*/
enum {
GC_CB = 0,
GC_GREEDY
};
/*
* BG_GC means the background cleaning job.
* FG_GC means the on-demand cleaning job.
*/
enum {
BG_GC = 0,
FG_GC
};
/* for a function parameter to select a victim segment */
struct victim_sel_policy {
int alloc_mode; /* LFS or SSR */
int gc_mode; /* GC_CB or GC_GREEDY */
unsigned long *dirty_segmap; /* dirty segment bitmap */
unsigned int offset; /* last scanned bitmap offset */
unsigned int ofs_unit; /* bitmap search unit */
unsigned int min_cost; /* minimum cost */
unsigned int min_segno; /* segment # having min. cost */
};
struct seg_entry {
unsigned short valid_blocks; /* # of valid blocks */
unsigned char *cur_valid_map; /* validity bitmap of blocks */
/*
* # of valid blocks and the validity bitmap stored in the the last
* checkpoint pack. This information is used by the SSR mode.
*/
unsigned short ckpt_valid_blocks;
unsigned char *ckpt_valid_map;
unsigned char type; /* segment type like CURSEG_XXX_TYPE */
unsigned long long mtime; /* modification time of the segment */
};
struct sec_entry {
unsigned int valid_blocks; /* # of valid blocks in a section */
};
struct segment_allocation {
void (*allocate_segment)(struct f2fs_sb_info *, int, bool);
};
struct sit_info {
const struct segment_allocation *s_ops;
block_t sit_base_addr; /* start block address of SIT area */
block_t sit_blocks; /* # of blocks used by SIT area */
block_t written_valid_blocks; /* # of valid blocks in main area */
char *sit_bitmap; /* SIT bitmap pointer */
unsigned int bitmap_size; /* SIT bitmap size */
unsigned long *dirty_sentries_bitmap; /* bitmap for dirty sentries */
unsigned int dirty_sentries; /* # of dirty sentries */
unsigned int sents_per_block; /* # of SIT entries per block */
struct mutex sentry_lock; /* to protect SIT cache */
struct seg_entry *sentries; /* SIT segment-level cache */
struct sec_entry *sec_entries; /* SIT section-level cache */
/* for cost-benefit algorithm in cleaning procedure */
unsigned long long elapsed_time; /* elapsed time after mount */
unsigned long long mounted_time; /* mount time */
unsigned long long min_mtime; /* min. modification time */
unsigned long long max_mtime; /* max. modification time */
};
struct free_segmap_info {
unsigned int start_segno; /* start segment number logically */
unsigned int free_segments; /* # of free segments */
unsigned int free_sections; /* # of free sections */
rwlock_t segmap_lock; /* free segmap lock */
unsigned long *free_segmap; /* free segment bitmap */
unsigned long *free_secmap; /* free section bitmap */
};
/* Notice: The order of dirty type is same with CURSEG_XXX in f2fs.h */
enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};
struct dirty_seglist_info {
const struct victim_selection *v_ops; /* victim selction operation */
unsigned long *dirty_segmap[NR_DIRTY_TYPE];
struct mutex seglist_lock; /* lock for segment bitmaps */
int nr_dirty[NR_DIRTY_TYPE]; /* # of dirty segments */
unsigned long *victim_segmap[2]; /* BG_GC, FG_GC */
};
/* victim selection function for cleaning and SSR */
struct victim_selection {
int (*get_victim)(struct f2fs_sb_info *, unsigned int *,
int, int, char);
};
/* for active log information */
struct curseg_info {
struct mutex curseg_mutex; /* lock for consistency */
struct f2fs_summary_block *sum_blk; /* cached summary block */
unsigned char alloc_type; /* current allocation type */
unsigned int segno; /* current segment number */
unsigned short next_blkoff; /* next block offset to write */
unsigned int zone; /* current zone number */
unsigned int next_segno; /* preallocated segment */
};
/*
* inline functions
*/
static inline struct curseg_info *CURSEG_I(struct f2fs_sb_info *sbi, int type)
{
return (struct curseg_info *)(SM_I(sbi)->curseg_array + type);
}
static inline struct seg_entry *get_seg_entry(struct f2fs_sb_info *sbi,
unsigned int segno)
{
struct sit_info *sit_i = SIT_I(sbi);
return &sit_i->sentries[segno];
}
static inline struct sec_entry *get_sec_entry(struct f2fs_sb_info *sbi,
unsigned int segno)
{
struct sit_info *sit_i = SIT_I(sbi);
return &sit_i->sec_entries[GET_SECNO(sbi, segno)];
}
static inline unsigned int get_valid_blocks(struct f2fs_sb_info *sbi,
unsigned int segno, int section)
{
/*
* In order to get # of valid blocks in a section instantly from many
* segments, f2fs manages two counting structures separately.
*/
if (section > 1)
return get_sec_entry(sbi, segno)->valid_blocks;
else
return get_seg_entry(sbi, segno)->valid_blocks;
}
static inline void seg_info_from_raw_sit(struct seg_entry *se,
struct f2fs_sit_entry *rs)
{
se->valid_blocks = GET_SIT_VBLOCKS(rs);
se->ckpt_valid_blocks = GET_SIT_VBLOCKS(rs);
memcpy(se->cur_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
se->type = GET_SIT_TYPE(rs);
se->mtime = le64_to_cpu(rs->mtime);
}
static inline void seg_info_to_raw_sit(struct seg_entry *se,
struct f2fs_sit_entry *rs)
{
unsigned short raw_vblocks = (se->type << SIT_VBLOCKS_SHIFT) |
se->valid_blocks;
rs->vblocks = cpu_to_le16(raw_vblocks);
memcpy(rs->valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
se->ckpt_valid_blocks = se->valid_blocks;
rs->mtime = cpu_to_le64(se->mtime);
}
static inline unsigned int find_next_inuse(struct free_segmap_info *free_i,
unsigned int max, unsigned int segno)
{
unsigned int ret;
read_lock(&free_i->segmap_lock);
ret = find_next_bit(free_i->free_segmap, max, segno);
read_unlock(&free_i->segmap_lock);
return ret;
}
static inline void __set_free(struct f2fs_sb_info *sbi, unsigned int segno)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int secno = segno / sbi->segs_per_sec;
unsigned int start_segno = secno * sbi->segs_per_sec;
unsigned int next;
write_lock(&free_i->segmap_lock);
clear_bit(segno, free_i->free_segmap);
free_i->free_segments++;
next = find_next_bit(free_i->free_segmap, TOTAL_SEGS(sbi), start_segno);
if (next >= start_segno + sbi->segs_per_sec) {
clear_bit(secno, free_i->free_secmap);
free_i->free_sections++;
}
write_unlock(&free_i->segmap_lock);
}
static inline void __set_inuse(struct f2fs_sb_info *sbi,
unsigned int segno)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int secno = segno / sbi->segs_per_sec;
set_bit(segno, free_i->free_segmap);
free_i->free_segments--;
if (!test_and_set_bit(secno, free_i->free_secmap))
free_i->free_sections--;
}
static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
unsigned int segno)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int secno = segno / sbi->segs_per_sec;
unsigned int start_segno = secno * sbi->segs_per_sec;
unsigned int next;
write_lock(&free_i->segmap_lock);
if (test_and_clear_bit(segno, free_i->free_segmap)) {
free_i->free_segments++;
next = find_next_bit(free_i->free_segmap, TOTAL_SEGS(sbi),
start_segno);
if (next >= start_segno + sbi->segs_per_sec) {
if (test_and_clear_bit(secno, free_i->free_secmap))
free_i->free_sections++;
}
}
write_unlock(&free_i->segmap_lock);
}
static inline void __set_test_and_inuse(struct f2fs_sb_info *sbi,
unsigned int segno)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int secno = segno / sbi->segs_per_sec;
write_lock(&free_i->segmap_lock);
if (!test_and_set_bit(segno, free_i->free_segmap)) {
free_i->free_segments--;
if (!test_and_set_bit(secno, free_i->free_secmap))
free_i->free_sections--;
}
write_unlock(&free_i->segmap_lock);
}
static inline void get_sit_bitmap(struct f2fs_sb_info *sbi,
void *dst_addr)
{
struct sit_info *sit_i = SIT_I(sbi);
memcpy(dst_addr, sit_i->sit_bitmap, sit_i->bitmap_size);
}
static inline block_t written_block_count(struct f2fs_sb_info *sbi)
{
struct sit_info *sit_i = SIT_I(sbi);
block_t vblocks;
mutex_lock(&sit_i->sentry_lock);
vblocks = sit_i->written_valid_blocks;
mutex_unlock(&sit_i->sentry_lock);
return vblocks;
}
static inline unsigned int free_segments(struct f2fs_sb_info *sbi)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int free_segs;
read_lock(&free_i->segmap_lock);
free_segs = free_i->free_segments;
read_unlock(&free_i->segmap_lock);
return free_segs;
}
static inline int reserved_segments(struct f2fs_sb_info *sbi)
{
return SM_I(sbi)->reserved_segments;
}
static inline unsigned int free_sections(struct f2fs_sb_info *sbi)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int free_secs;
read_lock(&free_i->segmap_lock);
free_secs = free_i->free_sections;
read_unlock(&free_i->segmap_lock);
return free_secs;
}
static inline unsigned int prefree_segments(struct f2fs_sb_info *sbi)
{
return DIRTY_I(sbi)->nr_dirty[PRE];
}
static inline unsigned int dirty_segments(struct f2fs_sb_info *sbi)
{
return DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_DATA] +
DIRTY_I(sbi)->nr_dirty[DIRTY_WARM_DATA] +
DIRTY_I(sbi)->nr_dirty[DIRTY_COLD_DATA] +
DIRTY_I(sbi)->nr_dirty[DIRTY_HOT_NODE] +
DIRTY_I(sbi)->nr_dirty[DIRTY_WARM_NODE] +
DIRTY_I(sbi)->nr_dirty[DIRTY_COLD_NODE];
}
static inline int overprovision_segments(struct f2fs_sb_info *sbi)
{
return SM_I(sbi)->ovp_segments;
}
static inline int overprovision_sections(struct f2fs_sb_info *sbi)
{
return ((unsigned int) overprovision_segments(sbi)) / sbi->segs_per_sec;
}
static inline int reserved_sections(struct f2fs_sb_info *sbi)
{
return ((unsigned int) reserved_segments(sbi)) / sbi->segs_per_sec;
}
static inline bool need_SSR(struct f2fs_sb_info *sbi)
{
return (free_sections(sbi) < overprovision_sections(sbi));
}
static inline int get_ssr_segment(struct f2fs_sb_info *sbi, int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
return DIRTY_I(sbi)->v_ops->get_victim(sbi,
&(curseg)->next_segno, BG_GC, type, SSR);
}
static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi)
{
return free_sections(sbi) <= reserved_sections(sbi);
}
static inline int utilization(struct f2fs_sb_info *sbi)
{
return (long int)valid_user_blocks(sbi) * 100 /
(long int)sbi->user_block_count;
}
/*
* Sometimes f2fs may be better to drop out-of-place update policy.
* So, if fs utilization is over MIN_IPU_UTIL, then f2fs tries to write
* data in the original place likewise other traditional file systems.
* But, currently set 100 in percentage, which means it is disabled.
* See below need_inplace_update().
*/
#define MIN_IPU_UTIL 100
static inline bool need_inplace_update(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
if (S_ISDIR(inode->i_mode))
return false;
if (need_SSR(sbi) && utilization(sbi) > MIN_IPU_UTIL)
return true;
return false;
}
static inline unsigned int curseg_segno(struct f2fs_sb_info *sbi,
int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
return curseg->segno;
}
static inline unsigned char curseg_alloc_type(struct f2fs_sb_info *sbi,
int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
return curseg->alloc_type;
}
static inline unsigned short curseg_blkoff(struct f2fs_sb_info *sbi, int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
return curseg->next_blkoff;
}
static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno)
{
unsigned int end_segno = SM_I(sbi)->segment_count - 1;
BUG_ON(segno > end_segno);
}
/*
* This function is used for only debugging.
* NOTE: In future, we have to remove this function.
*/
static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
{
struct f2fs_sm_info *sm_info = SM_I(sbi);
block_t total_blks = sm_info->segment_count << sbi->log_blocks_per_seg;
block_t start_addr = sm_info->seg0_blkaddr;
block_t end_addr = start_addr + total_blks - 1;
BUG_ON(blk_addr < start_addr);
BUG_ON(blk_addr > end_addr);
}
/*
* Summary block is always treated as invalid block
*/
static inline void check_block_count(struct f2fs_sb_info *sbi,
int segno, struct f2fs_sit_entry *raw_sit)
{
struct f2fs_sm_info *sm_info = SM_I(sbi);
unsigned int end_segno = sm_info->segment_count - 1;
int valid_blocks = 0;
int i;
/* check segment usage */
BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg);
/* check boundary of a given segment number */
BUG_ON(segno > end_segno);
/* check bitmap with valid block count */
for (i = 0; i < sbi->blocks_per_seg; i++)
if (f2fs_test_bit(i, raw_sit->valid_map))
valid_blocks++;
BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks);
}
static inline pgoff_t current_sit_addr(struct f2fs_sb_info *sbi,
unsigned int start)
{
struct sit_info *sit_i = SIT_I(sbi);
unsigned int offset = SIT_BLOCK_OFFSET(sit_i, start);
block_t blk_addr = sit_i->sit_base_addr + offset;
check_seg_range(sbi, start);
/* calculate sit block address */
if (f2fs_test_bit(offset, sit_i->sit_bitmap))
blk_addr += sit_i->sit_blocks;
return blk_addr;
}
static inline pgoff_t next_sit_addr(struct f2fs_sb_info *sbi,
pgoff_t block_addr)
{
struct sit_info *sit_i = SIT_I(sbi);
block_addr -= sit_i->sit_base_addr;
if (block_addr < sit_i->sit_blocks)
block_addr += sit_i->sit_blocks;
else
block_addr -= sit_i->sit_blocks;
return block_addr + sit_i->sit_base_addr;
}
static inline void set_to_next_sit(struct sit_info *sit_i, unsigned int start)
{
unsigned int block_off = SIT_BLOCK_OFFSET(sit_i, start);
if (f2fs_test_bit(block_off, sit_i->sit_bitmap))
f2fs_clear_bit(block_off, sit_i->sit_bitmap);
else
f2fs_set_bit(block_off, sit_i->sit_bitmap);
}
static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi)
{
struct sit_info *sit_i = SIT_I(sbi);
return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec -
sit_i->mounted_time;
}
static inline void set_summary(struct f2fs_summary *sum, nid_t nid,
unsigned int ofs_in_node, unsigned char version)
{
sum->nid = cpu_to_le32(nid);
sum->ofs_in_node = cpu_to_le16(ofs_in_node);
sum->version = version;
}
static inline block_t start_sum_block(struct f2fs_sb_info *sbi)
{
return __start_cp_addr(sbi) +
le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_start_sum);
}
static inline block_t sum_blk_addr(struct f2fs_sb_info *sbi, int base, int type)
{
return __start_cp_addr(sbi) +
le32_to_cpu(F2FS_CKPT(sbi)->cp_pack_total_block_count)
- (base + 1) + type;
}

657
fs/f2fs/super.c Normal file
View file

@ -0,0 +1,657 @@
/*
* fs/f2fs/super.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/statfs.h>
#include <linux/proc_fs.h>
#include <linux/buffer_head.h>
#include <linux/backing-dev.h>
#include <linux/kthread.h>
#include <linux/parser.h>
#include <linux/mount.h>
#include <linux/seq_file.h>
#include <linux/random.h>
#include <linux/exportfs.h>
#include <linux/f2fs_fs.h>
#include "f2fs.h"
#include "node.h"
#include "xattr.h"
static struct kmem_cache *f2fs_inode_cachep;
enum {
Opt_gc_background_off,
Opt_disable_roll_forward,
Opt_discard,
Opt_noheap,
Opt_nouser_xattr,
Opt_noacl,
Opt_active_logs,
Opt_disable_ext_identify,
Opt_err,
};
static match_table_t f2fs_tokens = {
{Opt_gc_background_off, "background_gc_off"},
{Opt_disable_roll_forward, "disable_roll_forward"},
{Opt_discard, "discard"},
{Opt_noheap, "no_heap"},
{Opt_nouser_xattr, "nouser_xattr"},
{Opt_noacl, "noacl"},
{Opt_active_logs, "active_logs=%u"},
{Opt_disable_ext_identify, "disable_ext_identify"},
{Opt_err, NULL},
};
static void init_once(void *foo)
{
struct f2fs_inode_info *fi = (struct f2fs_inode_info *) foo;
inode_init_once(&fi->vfs_inode);
}
static struct inode *f2fs_alloc_inode(struct super_block *sb)
{
struct f2fs_inode_info *fi;
fi = kmem_cache_alloc(f2fs_inode_cachep, GFP_NOFS | __GFP_ZERO);
if (!fi)
return NULL;
init_once((void *) fi);
/* Initilize f2fs-specific inode info */
fi->vfs_inode.i_version = 1;
atomic_set(&fi->dirty_dents, 0);
fi->i_current_depth = 1;
fi->i_advise = 0;
rwlock_init(&fi->ext.ext_lock);
set_inode_flag(fi, FI_NEW_INODE);
return &fi->vfs_inode;
}
static void f2fs_i_callback(struct rcu_head *head)
{
struct inode *inode = container_of(head, struct inode, i_rcu);
kmem_cache_free(f2fs_inode_cachep, F2FS_I(inode));
}
static void f2fs_destroy_inode(struct inode *inode)
{
call_rcu(&inode->i_rcu, f2fs_i_callback);
}
static void f2fs_put_super(struct super_block *sb)
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
f2fs_destroy_stats(sbi);
stop_gc_thread(sbi);
write_checkpoint(sbi, false, true);
iput(sbi->node_inode);
iput(sbi->meta_inode);
/* destroy f2fs internal modules */
destroy_node_manager(sbi);
destroy_segment_manager(sbi);
kfree(sbi->ckpt);
sb->s_fs_info = NULL;
brelse(sbi->raw_super_buf);
kfree(sbi);
}
int f2fs_sync_fs(struct super_block *sb, int sync)
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
int ret = 0;
if (!sbi->s_dirty && !get_pages(sbi, F2FS_DIRTY_NODES))
return 0;
if (sync)
write_checkpoint(sbi, false, false);
return ret;
}
static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
block_t total_count, user_block_count, start_count, ovp_count;
total_count = le64_to_cpu(sbi->raw_super->block_count);
user_block_count = sbi->user_block_count;
start_count = le32_to_cpu(sbi->raw_super->segment0_blkaddr);
ovp_count = SM_I(sbi)->ovp_segments << sbi->log_blocks_per_seg;
buf->f_type = F2FS_SUPER_MAGIC;
buf->f_bsize = sbi->blocksize;
buf->f_blocks = total_count - start_count;
buf->f_bfree = buf->f_blocks - valid_user_blocks(sbi) - ovp_count;
buf->f_bavail = user_block_count - valid_user_blocks(sbi);
buf->f_files = valid_inode_count(sbi);
buf->f_ffree = sbi->total_node_count - valid_node_count(sbi);
buf->f_namelen = F2FS_MAX_NAME_LEN;
buf->f_fsid.val[0] = (u32)id;
buf->f_fsid.val[1] = (u32)(id >> 32);
return 0;
}
static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
{
struct f2fs_sb_info *sbi = F2FS_SB(root->d_sb);
if (test_opt(sbi, BG_GC))
seq_puts(seq, ",background_gc_on");
else
seq_puts(seq, ",background_gc_off");
if (test_opt(sbi, DISABLE_ROLL_FORWARD))
seq_puts(seq, ",disable_roll_forward");
if (test_opt(sbi, DISCARD))
seq_puts(seq, ",discard");
if (test_opt(sbi, NOHEAP))
seq_puts(seq, ",no_heap_alloc");
#ifdef CONFIG_F2FS_FS_XATTR
if (test_opt(sbi, XATTR_USER))
seq_puts(seq, ",user_xattr");
else
seq_puts(seq, ",nouser_xattr");
#endif
#ifdef CONFIG_F2FS_FS_POSIX_ACL
if (test_opt(sbi, POSIX_ACL))
seq_puts(seq, ",acl");
else
seq_puts(seq, ",noacl");
#endif
if (test_opt(sbi, DISABLE_EXT_IDENTIFY))
seq_puts(seq, ",disable_ext_indentify");
seq_printf(seq, ",active_logs=%u", sbi->active_logs);
return 0;
}
static struct super_operations f2fs_sops = {
.alloc_inode = f2fs_alloc_inode,
.destroy_inode = f2fs_destroy_inode,
.write_inode = f2fs_write_inode,
.show_options = f2fs_show_options,
.evict_inode = f2fs_evict_inode,
.put_super = f2fs_put_super,
.sync_fs = f2fs_sync_fs,
.statfs = f2fs_statfs,
};
static struct inode *f2fs_nfs_get_inode(struct super_block *sb,
u64 ino, u32 generation)
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode;
if (ino < F2FS_ROOT_INO(sbi))
return ERR_PTR(-ESTALE);
/*
* f2fs_iget isn't quite right if the inode is currently unallocated!
* However f2fs_iget currently does appropriate checks to handle stale
* inodes so everything is OK.
*/
inode = f2fs_iget(sb, ino);
if (IS_ERR(inode))
return ERR_CAST(inode);
if (generation && inode->i_generation != generation) {
/* we didn't find the right inode.. */
iput(inode);
return ERR_PTR(-ESTALE);
}
return inode;
}
static struct dentry *f2fs_fh_to_dentry(struct super_block *sb, struct fid *fid,
int fh_len, int fh_type)
{
return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
f2fs_nfs_get_inode);
}
static struct dentry *f2fs_fh_to_parent(struct super_block *sb, struct fid *fid,
int fh_len, int fh_type)
{
return generic_fh_to_parent(sb, fid, fh_len, fh_type,
f2fs_nfs_get_inode);
}
static const struct export_operations f2fs_export_ops = {
.fh_to_dentry = f2fs_fh_to_dentry,
.fh_to_parent = f2fs_fh_to_parent,
.get_parent = f2fs_get_parent,
};
static int parse_options(struct f2fs_sb_info *sbi, char *options)
{
substring_t args[MAX_OPT_ARGS];
char *p;
int arg = 0;
if (!options)
return 0;
while ((p = strsep(&options, ",")) != NULL) {
int token;
if (!*p)
continue;
/*
* Initialize args struct so we know whether arg was
* found; some options take optional arguments.
*/
args[0].to = args[0].from = NULL;
token = match_token(p, f2fs_tokens, args);
switch (token) {
case Opt_gc_background_off:
clear_opt(sbi, BG_GC);
break;
case Opt_disable_roll_forward:
set_opt(sbi, DISABLE_ROLL_FORWARD);
break;
case Opt_discard:
set_opt(sbi, DISCARD);
break;
case Opt_noheap:
set_opt(sbi, NOHEAP);
break;
#ifdef CONFIG_F2FS_FS_XATTR
case Opt_nouser_xattr:
clear_opt(sbi, XATTR_USER);
break;
#else
case Opt_nouser_xattr:
pr_info("nouser_xattr options not supported\n");
break;
#endif
#ifdef CONFIG_F2FS_FS_POSIX_ACL
case Opt_noacl:
clear_opt(sbi, POSIX_ACL);
break;
#else
case Opt_noacl:
pr_info("noacl options not supported\n");
break;
#endif
case Opt_active_logs:
if (args->from && match_int(args, &arg))
return -EINVAL;
if (arg != 2 && arg != 4 && arg != 6)
return -EINVAL;
sbi->active_logs = arg;
break;
case Opt_disable_ext_identify:
set_opt(sbi, DISABLE_EXT_IDENTIFY);
break;
default:
pr_err("Unrecognized mount option \"%s\" or missing value\n",
p);
return -EINVAL;
}
}
return 0;
}
static loff_t max_file_size(unsigned bits)
{
loff_t result = ADDRS_PER_INODE;
loff_t leaf_count = ADDRS_PER_BLOCK;
/* two direct node blocks */
result += (leaf_count * 2);
/* two indirect node blocks */
leaf_count *= NIDS_PER_BLOCK;
result += (leaf_count * 2);
/* one double indirect node block */
leaf_count *= NIDS_PER_BLOCK;
result += leaf_count;
result <<= bits;
return result;
}
static int sanity_check_raw_super(struct f2fs_super_block *raw_super)
{
unsigned int blocksize;
if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic))
return 1;
/* Currently, support only 4KB block size */
blocksize = 1 << le32_to_cpu(raw_super->log_blocksize);
if (blocksize != PAGE_CACHE_SIZE)
return 1;
if (le32_to_cpu(raw_super->log_sectorsize) !=
F2FS_LOG_SECTOR_SIZE)
return 1;
if (le32_to_cpu(raw_super->log_sectors_per_block) !=
F2FS_LOG_SECTORS_PER_BLOCK)
return 1;
return 0;
}
static int sanity_check_ckpt(struct f2fs_super_block *raw_super,
struct f2fs_checkpoint *ckpt)
{
unsigned int total, fsmeta;
total = le32_to_cpu(raw_super->segment_count);
fsmeta = le32_to_cpu(raw_super->segment_count_ckpt);
fsmeta += le32_to_cpu(raw_super->segment_count_sit);
fsmeta += le32_to_cpu(raw_super->segment_count_nat);
fsmeta += le32_to_cpu(ckpt->rsvd_segment_count);
fsmeta += le32_to_cpu(raw_super->segment_count_ssa);
if (fsmeta >= total)
return 1;
return 0;
}
static void init_sb_info(struct f2fs_sb_info *sbi)
{
struct f2fs_super_block *raw_super = sbi->raw_super;
int i;
sbi->log_sectors_per_block =
le32_to_cpu(raw_super->log_sectors_per_block);
sbi->log_blocksize = le32_to_cpu(raw_super->log_blocksize);
sbi->blocksize = 1 << sbi->log_blocksize;
sbi->log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
sbi->blocks_per_seg = 1 << sbi->log_blocks_per_seg;
sbi->segs_per_sec = le32_to_cpu(raw_super->segs_per_sec);
sbi->secs_per_zone = le32_to_cpu(raw_super->secs_per_zone);
sbi->total_sections = le32_to_cpu(raw_super->section_count);
sbi->total_node_count =
(le32_to_cpu(raw_super->segment_count_nat) / 2)
* sbi->blocks_per_seg * NAT_ENTRY_PER_BLOCK;
sbi->root_ino_num = le32_to_cpu(raw_super->root_ino);
sbi->node_ino_num = le32_to_cpu(raw_super->node_ino);
sbi->meta_ino_num = le32_to_cpu(raw_super->meta_ino);
for (i = 0; i < NR_COUNT_TYPE; i++)
atomic_set(&sbi->nr_pages[i], 0);
}
static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
{
struct f2fs_sb_info *sbi;
struct f2fs_super_block *raw_super;
struct buffer_head *raw_super_buf;
struct inode *root;
long err = -EINVAL;
int i;
/* allocate memory for f2fs-specific super block info */
sbi = kzalloc(sizeof(struct f2fs_sb_info), GFP_KERNEL);
if (!sbi)
return -ENOMEM;
/* set a temporary block size */
if (!sb_set_blocksize(sb, F2FS_BLKSIZE))
goto free_sbi;
/* read f2fs raw super block */
raw_super_buf = sb_bread(sb, 0);
if (!raw_super_buf) {
err = -EIO;
goto free_sbi;
}
raw_super = (struct f2fs_super_block *)
((char *)raw_super_buf->b_data + F2FS_SUPER_OFFSET);
/* init some FS parameters */
sbi->active_logs = NR_CURSEG_TYPE;
set_opt(sbi, BG_GC);
#ifdef CONFIG_F2FS_FS_XATTR
set_opt(sbi, XATTR_USER);
#endif
#ifdef CONFIG_F2FS_FS_POSIX_ACL
set_opt(sbi, POSIX_ACL);
#endif
/* parse mount options */
if (parse_options(sbi, (char *)data))
goto free_sb_buf;
/* sanity checking of raw super */
if (sanity_check_raw_super(raw_super))
goto free_sb_buf;
sb->s_maxbytes = max_file_size(le32_to_cpu(raw_super->log_blocksize));
sb->s_max_links = F2FS_LINK_MAX;
get_random_bytes(&sbi->s_next_generation, sizeof(u32));
sb->s_op = &f2fs_sops;
sb->s_xattr = f2fs_xattr_handlers;
sb->s_export_op = &f2fs_export_ops;
sb->s_magic = F2FS_SUPER_MAGIC;
sb->s_fs_info = sbi;
sb->s_time_gran = 1;
sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
(test_opt(sbi, POSIX_ACL) ? MS_POSIXACL : 0);
memcpy(sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
/* init f2fs-specific super block info */
sbi->sb = sb;
sbi->raw_super = raw_super;
sbi->raw_super_buf = raw_super_buf;
mutex_init(&sbi->gc_mutex);
mutex_init(&sbi->write_inode);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
for (i = 0; i < NR_LOCK_TYPE; i++)
mutex_init(&sbi->fs_lock[i]);
sbi->por_doing = 0;
spin_lock_init(&sbi->stat_lock);
init_rwsem(&sbi->bio_sem);
init_sb_info(sbi);
/* get an inode for meta space */
sbi->meta_inode = f2fs_iget(sb, F2FS_META_INO(sbi));
if (IS_ERR(sbi->meta_inode)) {
err = PTR_ERR(sbi->meta_inode);
goto free_sb_buf;
}
err = get_valid_checkpoint(sbi);
if (err)
goto free_meta_inode;
/* sanity checking of checkpoint */
err = -EINVAL;
if (sanity_check_ckpt(raw_super, sbi->ckpt))
goto free_cp;
sbi->total_valid_node_count =
le32_to_cpu(sbi->ckpt->valid_node_count);
sbi->total_valid_inode_count =
le32_to_cpu(sbi->ckpt->valid_inode_count);
sbi->user_block_count = le64_to_cpu(sbi->ckpt->user_block_count);
sbi->total_valid_block_count =
le64_to_cpu(sbi->ckpt->valid_block_count);
sbi->last_valid_block_count = sbi->total_valid_block_count;
sbi->alloc_valid_block_count = 0;
INIT_LIST_HEAD(&sbi->dir_inode_list);
spin_lock_init(&sbi->dir_inode_lock);
/* init super block */
if (!sb_set_blocksize(sb, sbi->blocksize))
goto free_cp;
init_orphan_info(sbi);
/* setup f2fs internal modules */
err = build_segment_manager(sbi);
if (err)
goto free_sm;
err = build_node_manager(sbi);
if (err)
goto free_nm;
build_gc_manager(sbi);
/* get an inode for node space */
sbi->node_inode = f2fs_iget(sb, F2FS_NODE_INO(sbi));
if (IS_ERR(sbi->node_inode)) {
err = PTR_ERR(sbi->node_inode);
goto free_nm;
}
/* if there are nt orphan nodes free them */
err = -EINVAL;
if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG) &&
recover_orphan_inodes(sbi))
goto free_node_inode;
/* read root inode and dentry */
root = f2fs_iget(sb, F2FS_ROOT_INO(sbi));
if (IS_ERR(root)) {
err = PTR_ERR(root);
goto free_node_inode;
}
if (!S_ISDIR(root->i_mode) || !root->i_blocks || !root->i_size)
goto free_root_inode;
sb->s_root = d_make_root(root); /* allocate root dentry */
if (!sb->s_root) {
err = -ENOMEM;
goto free_root_inode;
}
/* recover fsynced data */
if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_UMOUNT_FLAG) &&
!test_opt(sbi, DISABLE_ROLL_FORWARD))
recover_fsync_data(sbi);
/* After POR, we can run background GC thread */
err = start_gc_thread(sbi);
if (err)
goto fail;
err = f2fs_build_stats(sbi);
if (err)
goto fail;
return 0;
fail:
stop_gc_thread(sbi);
free_root_inode:
dput(sb->s_root);
sb->s_root = NULL;
free_node_inode:
iput(sbi->node_inode);
free_nm:
destroy_node_manager(sbi);
free_sm:
destroy_segment_manager(sbi);
free_cp:
kfree(sbi->ckpt);
free_meta_inode:
make_bad_inode(sbi->meta_inode);
iput(sbi->meta_inode);
free_sb_buf:
brelse(raw_super_buf);
free_sbi:
kfree(sbi);
return err;
}
static struct dentry *f2fs_mount(struct file_system_type *fs_type, int flags,
const char *dev_name, void *data)
{
return mount_bdev(fs_type, flags, dev_name, data, f2fs_fill_super);
}
static struct file_system_type f2fs_fs_type = {
.owner = THIS_MODULE,
.name = "f2fs",
.mount = f2fs_mount,
.kill_sb = kill_block_super,
.fs_flags = FS_REQUIRES_DEV,
};
static int init_inodecache(void)
{
f2fs_inode_cachep = f2fs_kmem_cache_create("f2fs_inode_cache",
sizeof(struct f2fs_inode_info), NULL);
if (f2fs_inode_cachep == NULL)
return -ENOMEM;
return 0;
}
static void destroy_inodecache(void)
{
/*
* Make sure all delayed rcu free inodes are flushed before we
* destroy cache.
*/
rcu_barrier();
kmem_cache_destroy(f2fs_inode_cachep);
}
static int __init init_f2fs_fs(void)
{
int err;
err = init_inodecache();
if (err)
goto fail;
err = create_node_manager_caches();
if (err)
goto fail;
err = create_gc_caches();
if (err)
goto fail;
err = create_checkpoint_caches();
if (err)
goto fail;
return register_filesystem(&f2fs_fs_type);
fail:
return err;
}
static void __exit exit_f2fs_fs(void)
{
destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
destroy_checkpoint_caches();
destroy_gc_caches();
destroy_node_manager_caches();
destroy_inodecache();
}
module_init(init_f2fs_fs)
module_exit(exit_f2fs_fs)
MODULE_AUTHOR("Samsung Electronics's Praesto Team");
MODULE_DESCRIPTION("Flash Friendly File System");
MODULE_LICENSE("GPL");

440
fs/f2fs/xattr.c Normal file
View file

@ -0,0 +1,440 @@
/*
* fs/f2fs/xattr.c
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* Portions of this code from linux/fs/ext2/xattr.c
*
* Copyright (C) 2001-2003 Andreas Gruenbacher <agruen@suse.de>
*
* Fix by Harrison Xing <harrison@mountainviewdata.com>.
* Extended attributes for symlinks and special files added per
* suggestion of Luka Renko <luka.renko@hermes.si>.
* xattr consolidation Copyright (c) 2004 James Morris <jmorris@redhat.com>,
* Red Hat Inc.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/rwsem.h>
#include <linux/f2fs_fs.h>
#include "f2fs.h"
#include "xattr.h"
static size_t f2fs_xattr_generic_list(struct dentry *dentry, char *list,
size_t list_size, const char *name, size_t name_len, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
int total_len, prefix_len = 0;
const char *prefix = NULL;
switch (type) {
case F2FS_XATTR_INDEX_USER:
if (!test_opt(sbi, XATTR_USER))
return -EOPNOTSUPP;
prefix = XATTR_USER_PREFIX;
prefix_len = XATTR_USER_PREFIX_LEN;
break;
case F2FS_XATTR_INDEX_TRUSTED:
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
prefix = XATTR_TRUSTED_PREFIX;
prefix_len = XATTR_TRUSTED_PREFIX_LEN;
break;
default:
return -EINVAL;
}
total_len = prefix_len + name_len + 1;
if (list && total_len <= list_size) {
memcpy(list, prefix, prefix_len);
memcpy(list+prefix_len, name, name_len);
list[prefix_len + name_len] = '\0';
}
return total_len;
}
static int f2fs_xattr_generic_get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
switch (type) {
case F2FS_XATTR_INDEX_USER:
if (!test_opt(sbi, XATTR_USER))
return -EOPNOTSUPP;
break;
case F2FS_XATTR_INDEX_TRUSTED:
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
break;
default:
return -EINVAL;
}
if (strcmp(name, "") == 0)
return -EINVAL;
return f2fs_getxattr(dentry->d_inode, type, name,
buffer, size);
}
static int f2fs_xattr_generic_set(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
switch (type) {
case F2FS_XATTR_INDEX_USER:
if (!test_opt(sbi, XATTR_USER))
return -EOPNOTSUPP;
break;
case F2FS_XATTR_INDEX_TRUSTED:
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
break;
default:
return -EINVAL;
}
if (strcmp(name, "") == 0)
return -EINVAL;
return f2fs_setxattr(dentry->d_inode, type, name, value, size);
}
static size_t f2fs_xattr_advise_list(struct dentry *dentry, char *list,
size_t list_size, const char *name, size_t name_len, int type)
{
const char *xname = F2FS_SYSTEM_ADVISE_PREFIX;
size_t size;
if (type != F2FS_XATTR_INDEX_ADVISE)
return 0;
size = strlen(xname) + 1;
if (list && size <= list_size)
memcpy(list, xname, size);
return size;
}
static int f2fs_xattr_advise_get(struct dentry *dentry, const char *name,
void *buffer, size_t size, int type)
{
struct inode *inode = dentry->d_inode;
if (strcmp(name, "") != 0)
return -EINVAL;
*((char *)buffer) = F2FS_I(inode)->i_advise;
return sizeof(char);
}
static int f2fs_xattr_advise_set(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags, int type)
{
struct inode *inode = dentry->d_inode;
if (strcmp(name, "") != 0)
return -EINVAL;
if (!inode_owner_or_capable(inode))
return -EPERM;
if (value == NULL)
return -EINVAL;
F2FS_I(inode)->i_advise |= *(char *)value;
return 0;
}
const struct xattr_handler f2fs_xattr_user_handler = {
.prefix = XATTR_USER_PREFIX,
.flags = F2FS_XATTR_INDEX_USER,
.list = f2fs_xattr_generic_list,
.get = f2fs_xattr_generic_get,
.set = f2fs_xattr_generic_set,
};
const struct xattr_handler f2fs_xattr_trusted_handler = {
.prefix = XATTR_TRUSTED_PREFIX,
.flags = F2FS_XATTR_INDEX_TRUSTED,
.list = f2fs_xattr_generic_list,
.get = f2fs_xattr_generic_get,
.set = f2fs_xattr_generic_set,
};
const struct xattr_handler f2fs_xattr_advise_handler = {
.prefix = F2FS_SYSTEM_ADVISE_PREFIX,
.flags = F2FS_XATTR_INDEX_ADVISE,
.list = f2fs_xattr_advise_list,
.get = f2fs_xattr_advise_get,
.set = f2fs_xattr_advise_set,
};
static const struct xattr_handler *f2fs_xattr_handler_map[] = {
[F2FS_XATTR_INDEX_USER] = &f2fs_xattr_user_handler,
#ifdef CONFIG_F2FS_FS_POSIX_ACL
[F2FS_XATTR_INDEX_POSIX_ACL_ACCESS] = &f2fs_xattr_acl_access_handler,
[F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT] = &f2fs_xattr_acl_default_handler,
#endif
[F2FS_XATTR_INDEX_TRUSTED] = &f2fs_xattr_trusted_handler,
[F2FS_XATTR_INDEX_ADVISE] = &f2fs_xattr_advise_handler,
};
const struct xattr_handler *f2fs_xattr_handlers[] = {
&f2fs_xattr_user_handler,
#ifdef CONFIG_F2FS_FS_POSIX_ACL
&f2fs_xattr_acl_access_handler,
&f2fs_xattr_acl_default_handler,
#endif
&f2fs_xattr_trusted_handler,
&f2fs_xattr_advise_handler,
NULL,
};
static inline const struct xattr_handler *f2fs_xattr_handler(int name_index)
{
const struct xattr_handler *handler = NULL;
if (name_index > 0 && name_index < ARRAY_SIZE(f2fs_xattr_handler_map))
handler = f2fs_xattr_handler_map[name_index];
return handler;
}
int f2fs_getxattr(struct inode *inode, int name_index, const char *name,
void *buffer, size_t buffer_size)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct f2fs_xattr_entry *entry;
struct page *page;
void *base_addr;
int error = 0, found = 0;
int value_len, name_len;
if (name == NULL)
return -EINVAL;
name_len = strlen(name);
if (!fi->i_xattr_nid)
return -ENODATA;
page = get_node_page(sbi, fi->i_xattr_nid);
base_addr = page_address(page);
list_for_each_xattr(entry, base_addr) {
if (entry->e_name_index != name_index)
continue;
if (entry->e_name_len != name_len)
continue;
if (!memcmp(entry->e_name, name, name_len)) {
found = 1;
break;
}
}
if (!found) {
error = -ENODATA;
goto cleanup;
}
value_len = le16_to_cpu(entry->e_value_size);
if (buffer && value_len > buffer_size) {
error = -ERANGE;
goto cleanup;
}
if (buffer) {
char *pval = entry->e_name + entry->e_name_len;
memcpy(buffer, pval, value_len);
}
error = value_len;
cleanup:
f2fs_put_page(page, 1);
return error;
}
ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)
{
struct inode *inode = dentry->d_inode;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct f2fs_xattr_entry *entry;
struct page *page;
void *base_addr;
int error = 0;
size_t rest = buffer_size;
if (!fi->i_xattr_nid)
return 0;
page = get_node_page(sbi, fi->i_xattr_nid);
base_addr = page_address(page);
list_for_each_xattr(entry, base_addr) {
const struct xattr_handler *handler =
f2fs_xattr_handler(entry->e_name_index);
size_t size;
if (!handler)
continue;
size = handler->list(dentry, buffer, rest, entry->e_name,
entry->e_name_len, handler->flags);
if (buffer && size > rest) {
error = -ERANGE;
goto cleanup;
}
if (buffer)
buffer += size;
rest -= size;
}
error = buffer_size - rest;
cleanup:
f2fs_put_page(page, 1);
return error;
}
int f2fs_setxattr(struct inode *inode, int name_index, const char *name,
const void *value, size_t value_len)
{
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct f2fs_xattr_header *header = NULL;
struct f2fs_xattr_entry *here, *last;
struct page *page;
void *base_addr;
int error, found, free, name_len, newsize;
char *pval;
if (name == NULL)
return -EINVAL;
name_len = strlen(name);
if (value == NULL)
value_len = 0;
if (name_len > 255 || value_len > MAX_VALUE_LEN)
return -ERANGE;
mutex_lock_op(sbi, NODE_NEW);
if (!fi->i_xattr_nid) {
/* Allocate new attribute block */
struct dnode_of_data dn;
if (!alloc_nid(sbi, &fi->i_xattr_nid)) {
mutex_unlock_op(sbi, NODE_NEW);
return -ENOSPC;
}
set_new_dnode(&dn, inode, NULL, NULL, fi->i_xattr_nid);
mark_inode_dirty(inode);
page = new_node_page(&dn, XATTR_NODE_OFFSET);
if (IS_ERR(page)) {
alloc_nid_failed(sbi, fi->i_xattr_nid);
fi->i_xattr_nid = 0;
mutex_unlock_op(sbi, NODE_NEW);
return PTR_ERR(page);
}
alloc_nid_done(sbi, fi->i_xattr_nid);
base_addr = page_address(page);
header = XATTR_HDR(base_addr);
header->h_magic = cpu_to_le32(F2FS_XATTR_MAGIC);
header->h_refcount = cpu_to_le32(1);
} else {
/* The inode already has an extended attribute block. */
page = get_node_page(sbi, fi->i_xattr_nid);
if (IS_ERR(page)) {
mutex_unlock_op(sbi, NODE_NEW);
return PTR_ERR(page);
}
base_addr = page_address(page);
header = XATTR_HDR(base_addr);
}
if (le32_to_cpu(header->h_magic) != F2FS_XATTR_MAGIC) {
error = -EIO;
goto cleanup;
}
/* find entry with wanted name. */
found = 0;
list_for_each_xattr(here, base_addr) {
if (here->e_name_index != name_index)
continue;
if (here->e_name_len != name_len)
continue;
if (!memcmp(here->e_name, name, name_len)) {
found = 1;
break;
}
}
last = here;
while (!IS_XATTR_LAST_ENTRY(last))
last = XATTR_NEXT_ENTRY(last);
newsize = XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) +
name_len + value_len);
/* 1. Check space */
if (value) {
/* If value is NULL, it is remove operation.
* In case of update operation, we caculate free.
*/
free = MIN_OFFSET - ((char *)last - (char *)header);
if (found)
free = free - ENTRY_SIZE(here);
if (free < newsize) {
error = -ENOSPC;
goto cleanup;
}
}
/* 2. Remove old entry */
if (found) {
/* If entry is found, remove old entry.
* If not found, remove operation is not needed.
*/
struct f2fs_xattr_entry *next = XATTR_NEXT_ENTRY(here);
int oldsize = ENTRY_SIZE(here);
memmove(here, next, (char *)last - (char *)next);
last = (struct f2fs_xattr_entry *)((char *)last - oldsize);
memset(last, 0, oldsize);
}
/* 3. Write new entry */
if (value) {
/* Before we come here, old entry is removed.
* We just write new entry. */
memset(last, 0, newsize);
last->e_name_index = name_index;
last->e_name_len = name_len;
memcpy(last->e_name, name, name_len);
pval = last->e_name + name_len;
memcpy(pval, value, value_len);
last->e_value_size = cpu_to_le16(value_len);
}
set_page_dirty(page);
f2fs_put_page(page, 1);
if (is_inode_flag_set(fi, FI_ACL_MODE)) {
inode->i_mode = fi->i_acl_mode;
inode->i_ctime = CURRENT_TIME;
clear_inode_flag(fi, FI_ACL_MODE);
}
f2fs_write_inode(inode, NULL);
mutex_unlock_op(sbi, NODE_NEW);
return 0;
cleanup:
f2fs_put_page(page, 1);
mutex_unlock_op(sbi, NODE_NEW);
return error;
}

145
fs/f2fs/xattr.h Normal file
View file

@ -0,0 +1,145 @@
/*
* fs/f2fs/xattr.h
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* Portions of this code from linux/fs/ext2/xattr.h
*
* On-disk format of extended attributes for the ext2 filesystem.
*
* (C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#ifndef __F2FS_XATTR_H__
#define __F2FS_XATTR_H__
#include <linux/init.h>
#include <linux/xattr.h>
/* Magic value in attribute blocks */
#define F2FS_XATTR_MAGIC 0xF2F52011
/* Maximum number of references to one attribute block */
#define F2FS_XATTR_REFCOUNT_MAX 1024
/* Name indexes */
#define F2FS_SYSTEM_ADVISE_PREFIX "system.advise"
#define F2FS_XATTR_INDEX_USER 1
#define F2FS_XATTR_INDEX_POSIX_ACL_ACCESS 2
#define F2FS_XATTR_INDEX_POSIX_ACL_DEFAULT 3
#define F2FS_XATTR_INDEX_TRUSTED 4
#define F2FS_XATTR_INDEX_LUSTRE 5
#define F2FS_XATTR_INDEX_SECURITY 6
#define F2FS_XATTR_INDEX_ADVISE 7
struct f2fs_xattr_header {
__le32 h_magic; /* magic number for identification */
__le32 h_refcount; /* reference count */
__u32 h_reserved[4]; /* zero right now */
};
struct f2fs_xattr_entry {
__u8 e_name_index;
__u8 e_name_len;
__le16 e_value_size; /* size of attribute value */
char e_name[0]; /* attribute name */
};
#define XATTR_HDR(ptr) ((struct f2fs_xattr_header *)(ptr))
#define XATTR_ENTRY(ptr) ((struct f2fs_xattr_entry *)(ptr))
#define XATTR_FIRST_ENTRY(ptr) (XATTR_ENTRY(XATTR_HDR(ptr)+1))
#define XATTR_ROUND (3)
#define XATTR_ALIGN(size) ((size + XATTR_ROUND) & ~XATTR_ROUND)
#define ENTRY_SIZE(entry) (XATTR_ALIGN(sizeof(struct f2fs_xattr_entry) + \
entry->e_name_len + le16_to_cpu(entry->e_value_size)))
#define XATTR_NEXT_ENTRY(entry) ((struct f2fs_xattr_entry *)((char *)(entry) +\
ENTRY_SIZE(entry)))
#define IS_XATTR_LAST_ENTRY(entry) (*(__u32 *)(entry) == 0)
#define list_for_each_xattr(entry, addr) \
for (entry = XATTR_FIRST_ENTRY(addr);\
!IS_XATTR_LAST_ENTRY(entry);\
entry = XATTR_NEXT_ENTRY(entry))
#define MIN_OFFSET XATTR_ALIGN(PAGE_SIZE - \
sizeof(struct node_footer) - \
sizeof(__u32))
#define MAX_VALUE_LEN (MIN_OFFSET - sizeof(struct f2fs_xattr_header) - \
sizeof(struct f2fs_xattr_entry))
/*
* On-disk structure of f2fs_xattr
* We use only 1 block for xattr.
*
* +--------------------+
* | f2fs_xattr_header |
* | |
* +--------------------+
* | f2fs_xattr_entry |
* | .e_name_index = 1 |
* | .e_name_len = 3 |
* | .e_value_size = 14 |
* | .e_name = "foo" |
* | "value_of_xattr" |<- value_offs = e_name + e_name_len
* +--------------------+
* | f2fs_xattr_entry |
* | .e_name_index = 4 |
* | .e_name = "bar" |
* +--------------------+
* | |
* | Free |
* | |
* +--------------------+<- MIN_OFFSET
* | node_footer |
* | (nid, ino, offset) |
* +--------------------+
*
**/
#ifdef CONFIG_F2FS_FS_XATTR
extern const struct xattr_handler f2fs_xattr_user_handler;
extern const struct xattr_handler f2fs_xattr_trusted_handler;
extern const struct xattr_handler f2fs_xattr_acl_access_handler;
extern const struct xattr_handler f2fs_xattr_acl_default_handler;
extern const struct xattr_handler f2fs_xattr_advise_handler;
extern const struct xattr_handler *f2fs_xattr_handlers[];
extern int f2fs_setxattr(struct inode *inode, int name_index, const char *name,
const void *value, size_t value_len);
extern int f2fs_getxattr(struct inode *inode, int name_index, const char *name,
void *buffer, size_t buffer_size);
extern ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer,
size_t buffer_size);
#else
#define f2fs_xattr_handlers NULL
static inline int f2fs_setxattr(struct inode *inode, int name_index,
const char *name, const void *value, size_t value_len)
{
return -EOPNOTSUPP;
}
static inline int f2fs_getxattr(struct inode *inode, int name_index,
const char *name, void *buffer, size_t buffer_size)
{
return -EOPNOTSUPP;
}
static inline ssize_t f2fs_listxattr(struct dentry *dentry, char *buffer,
size_t buffer_size)
{
return -EOPNOTSUPP;
}
#endif
#endif /* __F2FS_XATTR_H__ */

413
include/linux/f2fs_fs.h Normal file
View file

@ -0,0 +1,413 @@
/**
* include/linux/f2fs_fs.h
*
* Copyright (c) 2012 Samsung Electronics Co., Ltd.
* http://www.samsung.com/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#ifndef _LINUX_F2FS_FS_H
#define _LINUX_F2FS_FS_H
#include <linux/pagemap.h>
#include <linux/types.h>
#define F2FS_SUPER_OFFSET 1024 /* byte-size offset */
#define F2FS_LOG_SECTOR_SIZE 9 /* 9 bits for 512 byte */
#define F2FS_LOG_SECTORS_PER_BLOCK 3 /* 4KB: F2FS_BLKSIZE */
#define F2FS_BLKSIZE 4096 /* support only 4KB block */
#define F2FS_MAX_EXTENSION 64 /* # of extension entries */
#define NULL_ADDR 0x0U
#define NEW_ADDR -1U
#define F2FS_ROOT_INO(sbi) (sbi->root_ino_num)
#define F2FS_NODE_INO(sbi) (sbi->node_ino_num)
#define F2FS_META_INO(sbi) (sbi->meta_ino_num)
/* This flag is used by node and meta inodes, and by recovery */
#define GFP_F2FS_ZERO (GFP_NOFS | __GFP_ZERO)
/*
* For further optimization on multi-head logs, on-disk layout supports maximum
* 16 logs by default. The number, 16, is expected to cover all the cases
* enoughly. The implementaion currently uses no more than 6 logs.
* Half the logs are used for nodes, and the other half are used for data.
*/
#define MAX_ACTIVE_LOGS 16
#define MAX_ACTIVE_NODE_LOGS 8
#define MAX_ACTIVE_DATA_LOGS 8
/*
* For superblock
*/
struct f2fs_super_block {
__le32 magic; /* Magic Number */
__le16 major_ver; /* Major Version */
__le16 minor_ver; /* Minor Version */
__le32 log_sectorsize; /* log2 sector size in bytes */
__le32 log_sectors_per_block; /* log2 # of sectors per block */
__le32 log_blocksize; /* log2 block size in bytes */
__le32 log_blocks_per_seg; /* log2 # of blocks per segment */
__le32 segs_per_sec; /* # of segments per section */
__le32 secs_per_zone; /* # of sections per zone */
__le32 checksum_offset; /* checksum offset inside super block */
__le64 block_count; /* total # of user blocks */
__le32 section_count; /* total # of sections */
__le32 segment_count; /* total # of segments */
__le32 segment_count_ckpt; /* # of segments for checkpoint */
__le32 segment_count_sit; /* # of segments for SIT */
__le32 segment_count_nat; /* # of segments for NAT */
__le32 segment_count_ssa; /* # of segments for SSA */
__le32 segment_count_main; /* # of segments for main area */
__le32 segment0_blkaddr; /* start block address of segment 0 */
__le32 cp_blkaddr; /* start block address of checkpoint */
__le32 sit_blkaddr; /* start block address of SIT */
__le32 nat_blkaddr; /* start block address of NAT */
__le32 ssa_blkaddr; /* start block address of SSA */
__le32 main_blkaddr; /* start block address of main area */
__le32 root_ino; /* root inode number */
__le32 node_ino; /* node inode number */
__le32 meta_ino; /* meta inode number */
__u8 uuid[16]; /* 128-bit uuid for volume */
__le16 volume_name[512]; /* volume name */
__le32 extension_count; /* # of extensions below */
__u8 extension_list[F2FS_MAX_EXTENSION][8]; /* extension array */
} __packed;
/*
* For checkpoint
*/
#define CP_ERROR_FLAG 0x00000008
#define CP_COMPACT_SUM_FLAG 0x00000004
#define CP_ORPHAN_PRESENT_FLAG 0x00000002
#define CP_UMOUNT_FLAG 0x00000001
struct f2fs_checkpoint {
__le64 checkpoint_ver; /* checkpoint block version number */
__le64 user_block_count; /* # of user blocks */
__le64 valid_block_count; /* # of valid blocks in main area */
__le32 rsvd_segment_count; /* # of reserved segments for gc */
__le32 overprov_segment_count; /* # of overprovision segments */
__le32 free_segment_count; /* # of free segments in main area */
/* information of current node segments */
__le32 cur_node_segno[MAX_ACTIVE_NODE_LOGS];
__le16 cur_node_blkoff[MAX_ACTIVE_NODE_LOGS];
/* information of current data segments */
__le32 cur_data_segno[MAX_ACTIVE_DATA_LOGS];
__le16 cur_data_blkoff[MAX_ACTIVE_DATA_LOGS];
__le32 ckpt_flags; /* Flags : umount and journal_present */
__le32 cp_pack_total_block_count; /* total # of one cp pack */
__le32 cp_pack_start_sum; /* start block number of data summary */
__le32 valid_node_count; /* Total number of valid nodes */
__le32 valid_inode_count; /* Total number of valid inodes */
__le32 next_free_nid; /* Next free node number */
__le32 sit_ver_bitmap_bytesize; /* Default value 64 */
__le32 nat_ver_bitmap_bytesize; /* Default value 256 */
__le32 checksum_offset; /* checksum offset inside cp block */
__le64 elapsed_time; /* mounted time */
/* allocation type of current segment */
unsigned char alloc_type[MAX_ACTIVE_LOGS];
/* SIT and NAT version bitmap */
unsigned char sit_nat_version_bitmap[1];
} __packed;
/*
* For orphan inode management
*/
#define F2FS_ORPHANS_PER_BLOCK 1020
struct f2fs_orphan_block {
__le32 ino[F2FS_ORPHANS_PER_BLOCK]; /* inode numbers */
__le32 reserved; /* reserved */
__le16 blk_addr; /* block index in current CP */
__le16 blk_count; /* Number of orphan inode blocks in CP */
__le32 entry_count; /* Total number of orphan nodes in current CP */
__le32 check_sum; /* CRC32 for orphan inode block */
} __packed;
/*
* For NODE structure
*/
struct f2fs_extent {
__le32 fofs; /* start file offset of the extent */
__le32 blk_addr; /* start block address of the extent */
__le32 len; /* lengh of the extent */
} __packed;
#define F2FS_MAX_NAME_LEN 256
#define ADDRS_PER_INODE 923 /* Address Pointers in an Inode */
#define ADDRS_PER_BLOCK 1018 /* Address Pointers in a Direct Block */
#define NIDS_PER_BLOCK 1018 /* Node IDs in an Indirect Block */
struct f2fs_inode {
__le16 i_mode; /* file mode */
__u8 i_advise; /* file hints */
__u8 i_reserved; /* reserved */
__le32 i_uid; /* user ID */
__le32 i_gid; /* group ID */
__le32 i_links; /* links count */
__le64 i_size; /* file size in bytes */
__le64 i_blocks; /* file size in blocks */
__le64 i_atime; /* access time */
__le64 i_ctime; /* change time */
__le64 i_mtime; /* modification time */
__le32 i_atime_nsec; /* access time in nano scale */
__le32 i_ctime_nsec; /* change time in nano scale */
__le32 i_mtime_nsec; /* modification time in nano scale */
__le32 i_generation; /* file version (for NFS) */
__le32 i_current_depth; /* only for directory depth */
__le32 i_xattr_nid; /* nid to save xattr */
__le32 i_flags; /* file attributes */
__le32 i_pino; /* parent inode number */
__le32 i_namelen; /* file name length */
__u8 i_name[F2FS_MAX_NAME_LEN]; /* file name for SPOR */
struct f2fs_extent i_ext; /* caching a largest extent */
__le32 i_addr[ADDRS_PER_INODE]; /* Pointers to data blocks */
__le32 i_nid[5]; /* direct(2), indirect(2),
double_indirect(1) node id */
} __packed;
struct direct_node {
__le32 addr[ADDRS_PER_BLOCK]; /* array of data block address */
} __packed;
struct indirect_node {
__le32 nid[NIDS_PER_BLOCK]; /* array of data block address */
} __packed;
enum {
COLD_BIT_SHIFT = 0,
FSYNC_BIT_SHIFT,
DENT_BIT_SHIFT,
OFFSET_BIT_SHIFT
};
struct node_footer {
__le32 nid; /* node id */
__le32 ino; /* inode nunmber */
__le32 flag; /* include cold/fsync/dentry marks and offset */
__le64 cp_ver; /* checkpoint version */
__le32 next_blkaddr; /* next node page block address */
} __packed;
struct f2fs_node {
/* can be one of three types: inode, direct, and indirect types */
union {
struct f2fs_inode i;
struct direct_node dn;
struct indirect_node in;
};
struct node_footer footer;
} __packed;
/*
* For NAT entries
*/
#define NAT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_nat_entry))
struct f2fs_nat_entry {
__u8 version; /* latest version of cached nat entry */
__le32 ino; /* inode number */
__le32 block_addr; /* block address */
} __packed;
struct f2fs_nat_block {
struct f2fs_nat_entry entries[NAT_ENTRY_PER_BLOCK];
} __packed;
/*
* For SIT entries
*
* Each segment is 2MB in size by default so that a bitmap for validity of
* there-in blocks should occupy 64 bytes, 512 bits.
* Not allow to change this.
*/
#define SIT_VBLOCK_MAP_SIZE 64
#define SIT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_sit_entry))
/*
* Note that f2fs_sit_entry->vblocks has the following bit-field information.
* [15:10] : allocation type such as CURSEG_XXXX_TYPE
* [9:0] : valid block count
*/
#define SIT_VBLOCKS_SHIFT 10
#define SIT_VBLOCKS_MASK ((1 << SIT_VBLOCKS_SHIFT) - 1)
#define GET_SIT_VBLOCKS(raw_sit) \
(le16_to_cpu((raw_sit)->vblocks) & SIT_VBLOCKS_MASK)
#define GET_SIT_TYPE(raw_sit) \
((le16_to_cpu((raw_sit)->vblocks) & ~SIT_VBLOCKS_MASK) \
>> SIT_VBLOCKS_SHIFT)
struct f2fs_sit_entry {
__le16 vblocks; /* reference above */
__u8 valid_map[SIT_VBLOCK_MAP_SIZE]; /* bitmap for valid blocks */
__le64 mtime; /* segment age for cleaning */
} __packed;
struct f2fs_sit_block {
struct f2fs_sit_entry entries[SIT_ENTRY_PER_BLOCK];
} __packed;
/*
* For segment summary
*
* One summary block contains exactly 512 summary entries, which represents
* exactly 2MB segment by default. Not allow to change the basic units.
*
* NOTE: For initializing fields, you must use set_summary
*
* - If data page, nid represents dnode's nid
* - If node page, nid represents the node page's nid.
*
* The ofs_in_node is used by only data page. It represents offset
* from node's page's beginning to get a data block address.
* ex) data_blkaddr = (block_t)(nodepage_start_address + ofs_in_node)
*/
#define ENTRIES_IN_SUM 512
#define SUMMARY_SIZE (7) /* sizeof(struct summary) */
#define SUM_FOOTER_SIZE (5) /* sizeof(struct summary_footer) */
#define SUM_ENTRY_SIZE (SUMMARY_SIZE * ENTRIES_IN_SUM)
/* a summary entry for a 4KB-sized block in a segment */
struct f2fs_summary {
__le32 nid; /* parent node id */
union {
__u8 reserved[3];
struct {
__u8 version; /* node version number */
__le16 ofs_in_node; /* block index in parent node */
} __packed;
};
} __packed;
/* summary block type, node or data, is stored to the summary_footer */
#define SUM_TYPE_NODE (1)
#define SUM_TYPE_DATA (0)
struct summary_footer {
unsigned char entry_type; /* SUM_TYPE_XXX */
__u32 check_sum; /* summary checksum */
} __packed;
#define SUM_JOURNAL_SIZE (F2FS_BLKSIZE - SUM_FOOTER_SIZE -\
SUM_ENTRY_SIZE)
#define NAT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
sizeof(struct nat_journal_entry))
#define NAT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
sizeof(struct nat_journal_entry))
#define SIT_JOURNAL_ENTRIES ((SUM_JOURNAL_SIZE - 2) /\
sizeof(struct sit_journal_entry))
#define SIT_JOURNAL_RESERVED ((SUM_JOURNAL_SIZE - 2) %\
sizeof(struct sit_journal_entry))
/*
* frequently updated NAT/SIT entries can be stored in the spare area in
* summary blocks
*/
enum {
NAT_JOURNAL = 0,
SIT_JOURNAL
};
struct nat_journal_entry {
__le32 nid;
struct f2fs_nat_entry ne;
} __packed;
struct nat_journal {
struct nat_journal_entry entries[NAT_JOURNAL_ENTRIES];
__u8 reserved[NAT_JOURNAL_RESERVED];
} __packed;
struct sit_journal_entry {
__le32 segno;
struct f2fs_sit_entry se;
} __packed;
struct sit_journal {
struct sit_journal_entry entries[SIT_JOURNAL_ENTRIES];
__u8 reserved[SIT_JOURNAL_RESERVED];
} __packed;
/* 4KB-sized summary block structure */
struct f2fs_summary_block {
struct f2fs_summary entries[ENTRIES_IN_SUM];
union {
__le16 n_nats;
__le16 n_sits;
};
/* spare area is used by NAT or SIT journals */
union {
struct nat_journal nat_j;
struct sit_journal sit_j;
};
struct summary_footer footer;
} __packed;
/*
* For directory operations
*/
#define F2FS_DOT_HASH 0
#define F2FS_DDOT_HASH F2FS_DOT_HASH
#define F2FS_MAX_HASH (~((0x3ULL) << 62))
#define F2FS_HASH_COL_BIT ((0x1ULL) << 63)
typedef __le32 f2fs_hash_t;
/* One directory entry slot covers 8bytes-long file name */
#define F2FS_NAME_LEN 8
#define F2FS_NAME_LEN_BITS 3
#define GET_DENTRY_SLOTS(x) ((x + F2FS_NAME_LEN - 1) >> F2FS_NAME_LEN_BITS)
/* the number of dentry in a block */
#define NR_DENTRY_IN_BLOCK 214
/* MAX level for dir lookup */
#define MAX_DIR_HASH_DEPTH 63
#define SIZE_OF_DIR_ENTRY 11 /* by byte */
#define SIZE_OF_DENTRY_BITMAP ((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - 1) / \
BITS_PER_BYTE)
#define SIZE_OF_RESERVED (PAGE_SIZE - ((SIZE_OF_DIR_ENTRY + \
F2FS_NAME_LEN) * \
NR_DENTRY_IN_BLOCK + SIZE_OF_DENTRY_BITMAP))
/* One directory entry slot representing F2FS_NAME_LEN-sized file name */
struct f2fs_dir_entry {
__le32 hash_code; /* hash code of file name */
__le32 ino; /* inode number */
__le16 name_len; /* lengh of file name */
__u8 file_type; /* file type */
} __packed;
/* 4KB-sized directory entry block */
struct f2fs_dentry_block {
/* validity bitmap for directory entries in each block */
__u8 dentry_bitmap[SIZE_OF_DENTRY_BITMAP];
__u8 reserved[SIZE_OF_RESERVED];
struct f2fs_dir_entry dentry[NR_DENTRY_IN_BLOCK];
__u8 filename[NR_DENTRY_IN_BLOCK][F2FS_NAME_LEN];
} __packed;
/* file types used in inode_info->flags */
enum {
F2FS_FT_UNKNOWN,
F2FS_FT_REG_FILE,
F2FS_FT_DIR,
F2FS_FT_CHRDEV,
F2FS_FT_BLKDEV,
F2FS_FT_FIFO,
F2FS_FT_SOCK,
F2FS_FT_SYMLINK,
F2FS_FT_MAX
};
#endif /* _LINUX_F2FS_FS_H */

View file

@ -23,6 +23,7 @@
#define EXT4_SUPER_MAGIC 0xEF53
#define BTRFS_SUPER_MAGIC 0x9123683E
#define NILFS_SUPER_MAGIC 0x3434
#define F2FS_SUPER_MAGIC 0xF2F52010
#define HPFS_SUPER_MAGIC 0xf995e849
#define ISOFS_SUPER_MAGIC 0x9660
#define JFFS2_SUPER_MAGIC 0x72b6