A set of processes may happen to perform interleaved reads, i.e.,requests
whose union would give rise to a sequential read pattern. There are two
typical cases: in the first case, processes read fixed-size chunks of
data at a fixed distance from each other, while in the second case processes
may read variable-size chunks at variable distances. The latter case occurs
for example with QEMU, which splits the I/O generated by the guest into
multiple chunks, and lets these chunks be served by a pool of cooperating
processes, iteratively assigning the next chunk of I/O to the first
available process. CFQ uses actual queue merging for the first type of
rocesses, whereas it uses preemption to get a sequential read pattern out
of the read requests performed by the second type of processes. In the end
it uses two different mechanisms to achieve the same goal: boosting the
throughput with interleaved I/O.
This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
sequential read pattern with both types of processes. The main idea is
checking newly arrived requests against the next request of the active queue
both in case of actual request insert and in case of request merge. By doing
so, both the types of processes can be handled by just merging their queues.
EQM is then simpler and more compact than the pair of mechanisms used in
CFQ.
Finally, EQM also preserves the typical low-latency properties of BFQ, by
properly restoring the weight-raising state of a queue when it gets back to
a non-merged state.
Change-Id: If95ed48806330667f26959006a20ad13abfd44be
Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Add the BFQ-v7r8 I/O scheduler to 3.4.
The general structure is borrowed from CFQ, as much of the code for
handling I/O contexts. Over time, several useful features have been
ported from CFQ as well (details in the changelog in README.BFQ). A
(bfq_)queue is associated to each task doing I/O on a device, and each
time a scheduling decision has to be made a queue is selected and served
until it expires.
- Slices are given in the service domain: tasks are assigned
budgets, measured in number of sectors. Once got the disk, a task
must however consume its assigned budget within a configurable
maximum time (by default, the maximum possible value of the
budgets is automatically computed to comply with this timeout).
This allows the desired latency vs "throughput boosting" tradeoff
to be set.
- Budgets are scheduled according to a variant of WF2Q+, implemented
using an augmented rb-tree to take eligibility into account while
preserving an O(log N) overall complexity.
- A low-latency tunable is provided; if enabled, both interactive
and soft real-time applications are guaranteed a very low latency.
- Latency guarantees are preserved also in the presence of NCQ.
- Also with flash-based devices, a high throughput is achieved
while still preserving latency guarantees.
- BFQ features Early Queue Merge (EQM), a sort of fusion of the
cooperating-queue-merging and the preemption mechanisms present
in CFQ. EQM is in fact a unified mechanism that tries to get a
sequential read pattern, and hence a high throughput, with any
set of processes performing interleaved I/O over a contiguous
sequence of sectors.
- BFQ supports full hierarchical scheduling, exporting a cgroups
interface. Since each node has a full scheduler, each group can
be assigned its own weight.
- If the cgroups interface is not used, only I/O priorities can be
assigned to processes, with ioprio values mapped to weights
with the relation weight = IOPRIO_BE_NR - ioprio.
- ioprio classes are served in strict priority order, i.e., lower
priority queues are not served as long as there are higher
priority queues. Among queues in the same class the bandwidth is
distributed in proportion to the weight of each queue. A very
thin extra bandwidth is however guaranteed to the Idle class, to
prevent it from starving.
Change-Id: I1603a3f5cdbaca85d8494a396be7daf932a754b3
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Update Kconfig.iosched and do the related Makefile changes to include
kernel configuration options for BFQ. Also add the bfqio controller
to the cgroups subsystem.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Change-Id: I58d05e292ee338400601336797aefb82f9638cf9
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
With the optimizations around not clearing the full request at alloc
time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
up to the user allocating the request.
Add a blk_rq_set_block_pc() that sets the command type to
REQ_TYPE_BLOCK_PC, and properly initializes the members associated
with this type of request. Update callers to use this function instead
of manipulating rq->cmd_type directly.
Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
attempt.
Change-Id: Ifc386dfb951c5d6adebf48ff38135dda28e4b1ce
Signed-off-by: Jens Axboe <axboe@fb.com>
CVE-2017-7187
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those. Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.
Change-Id: I3b5b01ea1c13326d873faa17b2edcadf4c77eb94
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
CVE-2016-10088
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
When adding new field to struct bio there is a crash in the removed
code lines. This issue was introduced by commit
80a8f0f87bee18283e9ca0a8966ec97ad9f084e5 "block: row-iosched idling
triggered by readahead pages"
(Partly) reverting this patch till root cause is fixed (on FS level).
Change-Id: Ie82bc806ea52a6370b57aa15455c85b2db10d0da
Signed-off-by: Tanya Brokhman <tlinder@codeaurora.org>
dm-crypt provides bios based device mapper module. dm-crypt
operates on packets with 512 bytes size which is not effiicent
way for HW based crypto blocks. dm-req-crypt is developed to
address this. dm-req-crypt works on requests which carry upto
512KB of data for unmerged requests.
Change-Id: I479230b2f0c1ab2e2a1ddb84947ebaf0629dff8c
Signed-off-by: Dinesh K Garg <dineshg@codeaurora.org>
MMC device driver implements URGENT request execution with priority
(using stop flow), as a result currently running (and prepared) request
may be reinserted back into I/O scheduler. This will break block layer
logic of flushes (flush request should not be inserted into I/O scheduler).
Block layer flush machinery keep q->flush_data_in_flight list updated with
started but not completed flush requests with data (REQ_FUA).
This change will not notify underling block device driver about pending
urgent request during flushes in flight.
Change-Id: I8b654925a3c989250fcb8f4f7c998795fb203923
Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
This reverts commit b410a82118.
The reverted commit was identified as the cause of the FS error
mentioned in the CR bellow.
It's reverted till further annalists of the root cause of FS error.
Change-Id: Ia75216de8012a2491b87f33e8c21f75592d87c80
CRs-fixed: 531257
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
Regular priority queues is marked as "starved" if it skipped a dispatch
due to being empty. When a new request is added to a "starved" queue
it will be marked as urgent.
The removed WARN_ON was warning about an impossible case when a regular
priority (read) queue was marked as starved but wasn't empty. This is
a possible case due to the bellow:
If the device driver fetched a read request that is pending for
transmission and an URGENT request arrives, the fetched read will be
reinserted back to the scheduler. Its possible that the queue it will be
reinserted to was marked as "starved" in the meanwhile due to being empty.
CRs-fixed: 517800
Change-Id: Iaae642ea0ed9c817c41745b0e8ae2217cc684f0c
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
When the scheduler reports to the block layer that there is an urgent
request pending, the device driver may decide to stop the transmission
of the current request in order to handle the urgent one. This is done
in order to reduce the latency of an urgent request. For example:
long WRITE may be stopped to handle an urgent READ.
Change-Id: I3072b8a1423870fed9c04c28d93caaf9557a7b89
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
Calling hrtimer_cancel with interrupts disabled can result in a livelock.
When flushing plug list in the block layer interrupts are disabled and an
hrtimer is used when adding requests from that plug list to the scheduler.
In this code flow, if the hrtimer (which is used for idling) is set, it's
being canceled by calling hrtimer_cancel. hrtimer_cancel will perform
the following in an endless loop:
1. try cancel the timer
2. if fails - rest_cpu
the cancellation can fail if the timer function already started. Since
interrupts are disabled it can never complete.
This patch reduced the number of times the hrtimer lock is taken while
interrupts are disabled by calling hrtimer_try_co_cancel. the later will
try to cancel the timer just once and return with an error code if fails.
CRs-fixed: 499887
Change-Id: I25f79c357426d72ad67c261ce7cb503ae97dc7b9
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
This test adds the ability to test the UFS task management feature
in the driver. It loads the queue with requests in order to allow
the task management to operate in full capacity.
Modify test-iosched infrastructure to support the new tests:
- expose check_test_completion()
Note: we submit 16-bio requests since the current HW is very slow
and we don't want to exceed the timeout duration.
Change-Id: I8ee752cba3c6838d8edc05747fa0288c4b347ef6
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
In blk_post_runtime_resume, an autosuspend request will be initiated for
the device. Since we are holding the queue lock, we can't sleep and thus
we should use the async version to initiate an autosuspend, i.e.
pm_request_suspend instead of pm_runtime_suspend, which might sleep.
Change-Id: I28cbe7e20ffdcb8f7b3329bf3b827c942dd1b45a
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Git-commit: c60855cdb976c632b3bf8922eeab8a0e78edfc04
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
When a request is added:
If device is suspended or is suspending and the request is not a
PM request, resume the device.
When the last request finishes:
Call pm_runtime_mark_last_busy().
When pick a request:
If device is resuming/suspending, then only PM request is allowed
to go.
The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2
Change-Id: If9c8332164afba6aeca984277e3398008ad9c45a
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Git-commit: c8158819d506a8aedeca53c52dfb709a0aabe011
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Add runtime pm helper functions:
void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
- Initialization function for drivers to call.
int blk_pre_runtime_suspend(struct request_queue *q)
- If any requests are in the queue, mark last busy and return -EBUSY.
Otherwise set q->rpm_status to RPM_SUSPENDING and return 0.
void blk_post_runtime_suspend(struct request_queue *q, int err)
- If the suspend succeeded then set q->rpm_status to RPM_SUSPENDED.
Otherwise set it to RPM_ACTIVE and mark last busy.
void blk_pre_runtime_resume(struct request_queue *q)
- Set q->rpm_status to RPM_RESUMING.
void blk_post_runtime_resume(struct request_queue *q, int err)
- If the resume succeeded then set q->rpm_status to RPM_ACTIVE
and call __blk_run_queue, then mark last busy and autosuspend.
Otherwise set q->rpm_status to RPM_SUSPENDED.
The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsi&m=133727953625963&w=2
Change-Id: I486612af79c7887116f70a4fde552c107f7fa360
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Git-commit: 6c9546675864f51506af69eca388e5d922942c56
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Change time measurements in long_sequential_test from jiffies to ktime,
and make the relevant change in test-iosched infrastructure.
In long_sequential_test we measure throughput, and the jiffies resolution
is not sensitive enough for this calculation.
Change-Id: If7c9a03c687f61996609c014e056bcd7132b9012
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
In the current implementation idling is triggered only by request
insertion frequency. This heuristic is not very accurate and may hit
random requests that shouldn't trigger idling. This patch uses the
PG_readahead flag in struct page's flags, which indicates that the page
is part of a readahead window, to start idling upon dispatch of a request
associated with a readahead page.
The above readehead flag is used together with the existing
insertion-frequency trigger. The frequency timer will catch read requests
which are not part of a readahead window, but are still part of a
sequential stream (and therefore dispatched in small time intervals).
Change-Id: Icb7145199c007408de3f267645ccb842e051fd00
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
It is possible for URGENT request to be requeued/reinserted if it was
fetched during the creation of a packed list. This end case is rare and is
not handled at the moment.
This patch changes the messages notifying of the above to debug level
(instead of error) in order to clear the dmesg log.
Change-Id: Ie8bc067e61559a6f702077b95c5dbcc426404232
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
There are cases when blk_peek_request is called not from blk_fetch_request
thus the URGENT request may be started but the flag q->dispatched_urgent is
not updated.
Change-Id: I4fb588823f1b2949160cbd3907f4729767932e12
CRs-fixed: 471736
CRs-fixed: 473036
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
The current starvation tolerance values increase the boot time
since high priority SW requests are delayed by regular priority requests.
In order to overcome this, increase the starvation tolerance values.
Change-Id: I9947fca9927cbd39a1d41d4bd87069df679d3103
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
The block layer implements a mechanism for verifying that the device
driver won't be notified of an URGENT request if there is already an
URGENT request in flight. This is due to the fact that interrupting an
URGENT request isn't efficient.
This patch fixes the above described mechanism in case the URGENT request
was returned back to the block layer from some reason: by requeue or
reinsert.
CRs-fixed: 473376, 473036, 471736
Change-Id: Ie8b8208230a302d4526068531616984825f1050d
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
All ROW (time related) configurable parameters are stored in ms so there
is no need to convert from/to ms when reading/updating them via sysfs.
Change-Id: Ib6a1de54140b5d25696743da944c076dd6fc02ae
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
At the moment all REGULAR and LOW priority requests are starved as long as
there are HIGH priority requests to dispatch.
This patch prevents the above starvation by setting a starvation limit the
REGULAR\LOW priority requests can tolerate.
Change-Id: Ibe24207982c2c55d75c0b0230f67e013d1106017
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
On failure of requests the console is flooded with prints,
which result in a watchdog bark.
This patch limits the number of prints to console thus, avoiding
the bark.
CRs-Fixed: 458071
Change-Id: I614002cc1305ab7b5c6a64d278139291c1baa3c4
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
An urgent request is marked by the scheduler in rq->cmd_flags with the
REQ_URGENT flag. There is no need to add an additional marking by
the block layer.
Change-Id: I05d5e9539d2f6c1bfa80240b0671db197a5d3b3f
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
When ROW scheduler reports to the block layer that there is an urgent
request pending, the device driver may decide to stop the transmission
of the current request in order to handle the urgent one. This is done
in order to reduce the latency of an urgent request. For example:
long WRITE may be stopped to handle an urgent READ.
This patch updates the ROW URGENT notification policy to apply with the
below:
- Don't notify URGENT if there is an un-completed URGENT request in driver
- After notifying that URGENT request is present, the next request
dispatched is the URGENT one.
- At every given moment only 1 request can be marked as URGENT.
Independent of it's location (driver or scheduler)
Other changes to URGENT policy:
- Only READ queues are allowed to notify of an URGENT request pending.
CR fix:
If a pending urgent request (A) gets merged with another request (B)
A is removed from scheduler queue but is not removed from
rd->pending_urgent_rq.
CRs-Fixed: 453712
Change-Id: I321e8cf58e12a05b82edd2a03f52fcce7bc9a900
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
In order to be sure that the packing statistics collected after the test
reflect *only* requests issued by the test (and not real request from
FS) - sleep before each test in order to give an already dispatched
requests time to complete.
Change-Id: If2f40efad1d79084a8ea85afe93cce58e49ff698
CRs-Fixed: 453712
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
This patch updates the ROW URGENT notification policy to apply with the
bellow:
- Don't notify URGENT if there is an un-completed URGENT request in driver
- After notifying that URGENT request is present, the next request
dispatched is the URGENT one.
- At every given moment only 1 request can be marked as URGENT.
Independent of it's location (driver or scheduler)
Other changes to URGENT policy:
- Only queues that are allowed to notify of an URGENT request pending
are READ queues
Change-Id: I17ff73125bc7a760cea1116b466115a2df635a14
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
The __blk_run_queue function is called from several contexts. The fix is
replacing it with blk_run_queue function, this function is guarded with
a lock, thus making it thread safe and prevents the crashing.
Change-Id: I3e12fa9c8b9e161375fffa3570abfa46b223a60b
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
This patch sets the initial values of internal ROW
parameters.
Change-Id: I38132062a7fcbe2e58b9cc757e55caac64d013dc
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
[smuckle@codeaurora.org: ported from msm-3.7]
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
When ROW scheduler reports to the block layer that there is an urgent
request pending, the device driver may decide to stop the transmission
of the current request in order to handle the urgent one. If the current
transmitted request is an urgent request - we don't want it to be
stopped.
Due to the above ROW scheduler won't notify of an urgent request if
there are urgent requests in flight.
Change-Id: I2fa186d911b908ec7611682b378b9cdc48637ac7
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
At the moment idling in ROW is implemented by delayed work that uses
jiffies granularity which is not very accurate. This patch replaces
current idling mechanism implementation with hrtime API, which gives
nanosecond resolution (instead of jiffies).
Change-Id: I86c7b1776d035e1d81571894b300228c8b8f2d92
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
This patch implements "application-hints" which is a way the issuing
application can notify the scheduler on the priority of its request.
This is done by setting the io-priority of the request.
This patch reuses an already existing mechanism of io-priorities developed
for CFQ. Please refer to kernel/Documentation/block/ioprio.txt for
usage example and explanations.
Change-Id: I228ec8e52161b424242bb7bb133418dc8b73925a
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>