Set the timestamp of an expired context to the maximum, which is the
last timestamp that was assigned to a command batch of that context.
This is done so that any events registered by the context are
triggered normally, avoiding sync timeouts.
Change-Id: I0ac4507bcaaa9835dd79ffa8f7066bfff6d3f48d
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Commit a8330f853988137425346ce8050970e6a19b64ae broke the fair scheduling
algorithm in the dispatcher. That commit copied off the current pending
queue into a temporary list and processed it one by one. After all the
inflight slots were filled the remaining contexts were pushed back on
the list. Consider the following situation - max inflight is 15, current
inflight is 14 (room for 1 more command batch). There are two contexts
in the queue:
a -> b
The contexts are copied to a temporary list so now the pending list is
empty. 'a' is processed first and submits one command batch. After
successfully processing the command it is put back on the pending list:
a
inflight is now full, so 'b' doesn't process anything - it gets shoved
back on the pending list:
a -> b
See the problem? In a fair scheduling scenario, 'b' should be first so
it has a chance to be processed the next time there is room in the queue.
Instead 'a' will dominate the time - hilarity ensues.
This is fixed by putting successfully processed contexts onto a requeue
list and then pushing them back on at the end, keeping unprocessed
contexts on the master list, which ensures fair scheduling (there is a
scenario where two processed contexts could swap spots, but that isn't as
big a deal as long as they both get their timeslice).
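The requeue scheme can be sketched in userspace C. This is an illustrative model, not the driver's code: `queue`, `push`, `pop`, and `dispatch_round` are made-up names, and a char array stands in for the context plist.

```c
#include <assert.h>
#include <string.h>

#define MAXQ 8

/* Toy FIFO of single-character context names. */
struct queue { char items[MAXQ]; int count; };

static void push(struct queue *q, char c) { q->items[q->count++] = c; }

static char pop(struct queue *q)
{
	char c = q->items[0];
	memmove(q->items, q->items + 1, --q->count);
	return c;
}

/* One dispatch pass over the pending queue with 'room' free inflight
 * slots: contexts that submit a batch go to a requeue list; contexts
 * that find no room stay (in order) on a master list; the requeue
 * list is appended at the tail so serviced contexts go to the back. */
static void dispatch_round(struct queue *pending, int room)
{
	struct queue requeue = { .count = 0 };
	struct queue master = { .count = 0 };
	int n = pending->count;

	for (int i = 0; i < n; i++) {
		char ctx = pop(pending);
		if (room > 0) {
			room--;              /* submitted one command batch */
			push(&requeue, ctx); /* serviced: goes to the back */
		} else {
			push(&master, ctx);  /* unprocessed: keeps its spot */
		}
	}
	for (int i = 0; i < master.count; i++)
		push(pending, master.items[i]);
	for (int i = 0; i < requeue.count; i++)
		push(pending, requeue.items[i]);
}
```

With pending `a -> b` and room for one batch, 'a' submits and lands behind 'b', so 'b' is first in line for the next free slot, which is exactly the fairness property described above.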
Change-Id: Ic0dedbad658dacc43efc972f7731f345f1ec8a79
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
When a context is detached there is a wait for the last timestamp
used by the context. This is done under the assumption that after
this timestamp there will be no commands in the pipeline that are
submitted on behalf of this context. However, the dispatcher can
still dispatch commands from the detached context because there
is a time gap between the context being detached and the dispatcher
checking whether the context is detached. Hence, guard the last
timestamp by holding the context mutex and checking if the context is
detached or not before updating it.
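The guarded update amounts to a lock/re-check/update sequence. A minimal userspace sketch, with pthreads standing in for the kernel mutex and `ctx_sketch`/`update_last_ts` as made-up names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct ctx_sketch {
	pthread_mutex_t lock;
	bool detached;
	unsigned int last_ts;
};

/* Update the last timestamp only while holding the context lock and
 * only if the context is still attached; returns whether the update
 * took effect. Re-checking under the lock closes the time gap between
 * detach and the dispatcher's check. */
static bool update_last_ts(struct ctx_sketch *c, unsigned int ts)
{
	bool updated = false;

	pthread_mutex_lock(&c->lock);
	if (!c->detached) {
		c->last_ts = ts;
		updated = true;
	}
	pthread_mutex_unlock(&c->lock);
	return updated;
}
```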
Trying to guard internal_timestamp with drawctxt->mutex could lead
to a locking order inversion and possibly deadlock. This variable
is set in adreno_ringbuffer_addcmds(), and used in kgsl_context_detach().
Both of these calls hold device->mutex, so it is safe to access
this variable without an additional mutex.
Change-Id: Idc1a867de94e071a3128a164724d26dd9cb29a0a
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Do not remove a command batch from the dispatcher list if its timestamp
has not retired, even if the context to which the command belongs
has been detached. If the command is in the dispatcher queue it means
it's in the ringbuffer and we should wait for it to complete the normal
way. Also make sure that we use the correct retired timestamp even if
the context was detached because of a userspace context destroy call.
Change-Id: If3a9a562180b924492ed95f208b5e3d469abdfba
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Store the process private pointer in the context. Earlier the process
private pointer was referenced through the dev_priv pointer in the
context, but the dev_priv pointer can be destroyed before the context's
process private structure, so store this pointer locally.
Change-Id: Ic07680b79db55d6306306bd61bda5a1288813914
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
When a context is created hold a reference to the process private
structure which is referenced from within the context. The process
private structure can be destroyed while some threads still have
a reference to the context, which can lead to a situation where the
process private pointer inside the context becomes invalid.
Avoid this by holding a reference to the process private structure
as long as the context is around.
Change-Id: Ia35629e5d027a383ed4c1378316633b4923372f7
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Even though we copy the plist to process the pending contexts the
individual nodes might need to be spinlock protected against
outsiders.
Change-Id: Ic0dedbad509e9b37b5bee4b4828561950ad2f73e
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Instead of playing silly tricks to try to avoid going into an
infinite loop while processing pending contexts do the smart
thing and copy off the entire pending queue into a temporary
list. That leaves the master list free to accept new and
requeued contexts and we can do evil things to our temporary
list.
Change-Id: Ic0dedbad365206031854fc95b5353184cabd40a1
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Implement the KGSL fault tolerance policy for faults in the dispatcher.
Replay (or skip) the inflight command batches as dictated by the policy,
iterating progressively through the various behaviors.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Check the value returned by _kgsl_context_get() and fail if
appropriate. This prevents us from accidentally increasing the
ref count on a destroyed context.
Change-Id: Ic0dedbad891842a73b1b87eb6671f9a39a275dd4
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Fix a race condition which can occur if a thread tries to acquire
a reference to a mem_entry or context while another thread has
already decremented the refcount to 0 and is in the process of
destroying said mem_entry or context.
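The fix hinges on the "get unless zero" refcounting idiom: a lookup must never increment a count that has already reached zero, because that object is mid-destruction. A minimal sketch using C11 atomics (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj { atomic_int refcount; };

/* Take a reference only if the count is still non-zero. A plain
 * increment here would resurrect an object whose destructor is
 * already running on another thread. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* CAS retries with the freshly observed value on failure. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

Callers treat a false return the same as a failed idr lookup: the entry is as good as gone.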
Change-Id: I6be64ca75f9cb12b03e870b9ca83588197c64e5e
Signed-off-by: Shrenuj Bansal <shrenujb@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
There are situations where a submitting thread may wish to create a
syncpoint on an already issued timestamp to pause a context until a
previous command has been retired. Relax the restriction against
submitting a sync point against one's own context and only check to
make sure the user isn't submitting a syncpoint against a future
timestamp (which would be a certain deadlock).
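The future-timestamp check has to tolerate 32-bit wraparound, which the usual signed-difference trick handles. A sketch (illustrative names, not the driver's exact macros):

```c
#include <assert.h>

/* True if timestamp a is ahead of b, allowing for 32-bit wraparound:
 * the signed difference is positive when a is (modularly) newer. */
static int timestamp_after(unsigned int a, unsigned int b)
{
	return (int)(a - b) > 0;
}

/* Allow a syncpoint on any already issued timestamp; reject one on a
 * timestamp the context has not queued yet, since it could never
 * signal and would deadlock the waiter. */
static int syncpoint_allowed(unsigned int sync_ts, unsigned int queued_ts)
{
	return !timestamp_after(sync_ts, queued_ts);
}
```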
Change-Id: Ic0dedbad883fc228da0d94c8416a88504f5d1377
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
If available support two generic GPU trace events:
events/gpu/gpu_sched_switch
events/gpu/gpu_job_enqueue
This will allow generic tools to get a better idea of when
commands are queued and scheduled.
Change-Id: Ic0dedbad5fa6c5fbb34aebdcf82fd10ee92da8d7
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Run soft reset on all targets that define a soft_reset function hook.
If the target defines jump table offsets then we can run through a
faster reset sequence, otherwise default to the slower full reset
path. In any event, both these options are much faster than the
hard reset path that toggles the regulators.
Change-Id: Ic0dedbad4cbfdc083f16b026e133159814481886
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
FT behaviour is changed to be configured from sysfs
instead of debugfs. Sysfs control gives us the advantage
of configuring FT at bootup and also across bootups.
Persistent configuration across bootups helps testing:
FT can be configured once and the same config tested
across reboots.
Change-Id: Ia3378f214371a6a4b9f205f9f884976f58585971
Signed-off-by: Tarun Karra <tkarra@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Soft reset the GPU when the GPU power rail is not turned off.
In a hard reset we reload the full microcode, but in a soft reset,
since the memory power rail is not turned off, we only load the
jump-table part of the microcode. This reduces GPU reset time
from 10ms to 200us.
Change-Id: Ibbc36e97cc95425e13856fd5d847eed742743723
Signed-off-by: Tarun Karra <tkarra@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
It isn't possible to use rcu_read_lock() sections to guard
access to a data structure that is refcounted with a kref.
Rather than creating RCU-aware refcounts for kgsl_mem_entry
as described in Documentation/RCU/rcuref.txt, just use
the mem_lock to guard lookups in the idr.
Change-Id: Ia0733b156fc7a9b446cb8221b9172ce9faf111e7
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Fix the addressing of FSYNR0 and FSYNR1 IOMMU-v1 registers. These
are context registers and were being addressed as global registers.
Change-Id: I2f6c4798a3c82bb4857a334beb99994ac9f4a1e8
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Print the callback function symbol name in GPU event register
and fire trace events. This makes it easier to debug which event
is being registered/fired.
Change-Id: Ic0dedbad4be4e4179c820af6119786c57d12e13f
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
If there isn't enough room at the bottom of the ringbuffer for a
whole command, the remaining space is filled with NOPs, the command
starts again at the top of the ringbuffer, and the write pointer of
the ringbuffer is updated accordingly. The existing implementation
sent out an incomplete NOP command, which could potentially cause a
GPU hang. This fix submits the NOP command along with the next
command instead of submitting them separately, so the GPU reads both
commands in the same fetch.
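The wraparound handling can be sketched as follows. This is a toy model with made-up names and sizes (`RB_SIZE`, `OP_NOP`, `rb_submit`); the point is that the NOP pad and the wrapped command are published with a single write-pointer update:

```c
#include <assert.h>

#define RB_SIZE 16          /* ring size in dwords, toy value */
#define OP_NOP  0x10000000u /* stand-in NOP opcode */

struct ringbuffer {
	unsigned int buf[RB_SIZE];
	unsigned int wptr;   /* published to the GPU once per submit */
};

static void rb_submit(struct ringbuffer *rb, const unsigned int *cmds,
		      unsigned int count)
{
	unsigned int w = rb->wptr;

	if (w + count > RB_SIZE) {
		/* Command won't fit before the end: pad with NOPs and
		 * wrap. The pad is NOT published on its own; it goes
		 * out with the same wptr update as the command. */
		while (w < RB_SIZE)
			rb->buf[w++] = OP_NOP;
		w = 0;
	}
	for (unsigned int i = 0; i < count; i++)
		rb->buf[w++] = cmds[i];

	rb->wptr = w;   /* single write-pointer update for both */
}
```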
Change-Id: Ia3c9933c11d986c6743d8026b809bbcb1eaf54bf
Signed-off-by: Zhong Liu <zhongl@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
It is useful to track context switches since they are expensive
and we would like to have as few as possible.
Change-Id: Ic0dedbad6befc84193b17851a9db4ff87e656cc7
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
The msm-dcvs pwrscale driver is no longer in development. Remove it to
avoid bitrot and simplify future target development.
Change-Id: Ic0dedbad81c2afb4bfbb377ee1a2330e115c0e71
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Setup the protection registers for a3xx towards the end of its
start function instead of doing it in the generic ringbuffer start.
Change-Id: I66df496afa5d1fdf7dea790306f5358c2098674d
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
If the device is still in INIT state there are no open instances
of it running. Resume calls should be no-ops. Do not attempt
to start the device at this point.
Change-Id: I1cb6d60581b0b5b0a2ab1b13418adc4ae3983c5e
Signed-off-by: Lucille Sylvester <lsylvest@codeaurora.org>
Signed-off-by: Ajay Dudani <adudani@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
There are cases where the GPU takes a pagefault but executes fine
afterwards without a GPU stall. The init.qcom.graphics.sh script can
be used to change the FT pagefault policy to not stall IOMMU v1 on a
pagefault and check whether the pagefault is harmless.
Change-Id: If061230b66181bfd94c697ea106e7bf4de352e91
Signed-off-by: Tarun Karra <tkarra@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
It was previously assumed that most GPU memory allocations would be
small enough to allow us to fit the array of page pointers into one
or two pages allocated via kmalloc. Recent reports have proven
those assumptions to be wrong - allocations on the order of 32MB will
end up trying to get 8 pages from kmalloc and 8 contiguous pages
on a busy system are a rare beast indeed.
So use the usual kmalloc/vmalloc trick instead - use kmalloc for the
page array when we can and vmalloc if we can't.
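The fallback pattern looks like this in a userspace sketch. malloc stands in for both allocators and the kmalloc failure is simulated, since "physically contiguous vs. not" only exists in the kernel; all names here are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

struct page_array {
	void **pages;
	int vmalloc_used;   /* must free with the matching allocator */
};

/* Stand-in for kmalloc: pretend contiguous allocations above one
 * page fail, as they often do on a fragmented system. */
static void *try_kmalloc(size_t size)
{
	return size > 4096 ? NULL : malloc(size);
}

/* Try the cheap contiguous allocator first; fall back to the one
 * that tolerates fragmentation (vmalloc() in the kernel). */
static int alloc_page_array(struct page_array *pa, size_t npages)
{
	size_t size = npages * sizeof(void *);

	pa->pages = try_kmalloc(size);
	pa->vmalloc_used = 0;
	if (!pa->pages) {
		pa->pages = malloc(size);   /* vmalloc() in the kernel */
		pa->vmalloc_used = 1;
	}
	return pa->pages ? 0 : -1;
}
```

The flag matters because the two allocators need matching free routines (kfree vs. vfree); the kernel now has kvfree() for exactly this pattern.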
CRs-fixed: 513469
Change-Id: Ic0dedbad0a5b14abe6a8bd73342b3e68faa8c8b7
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
It is possible that a context can be successfully destroyed and
kgsl_release() called by an exiting application before the kernel
threads have released all the users of the active count. Instead
of immediately calling BUG_ON() in kgsl_release() when the active
count is unexpected, wait a second to give the others time to
finish up. If after a second the active count still hasn't gone
where we need it to then we can assume driver error and BUG_ON().
To accomplish this, remodel kgsl_active_count_wait() to take an
active_count value to "wait" for.
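The wait-for-a-value shape can be sketched with pthreads standing in for the kernel's wait queue. Names (`device_sketch`, `active_count_wait`, `worker`) are made up, and the bounded one-second timeout is omitted for brevity:

```c
#include <assert.h>
#include <pthread.h>

struct device_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int active_cnt;
};

/* Block until active_cnt drops to 'count'. The driver bounds this
 * wait and only BUG()s if the count is still wrong afterwards. */
static int active_count_wait(struct device_sketch *dev, int count)
{
	pthread_mutex_lock(&dev->lock);
	while (dev->active_cnt > count)
		pthread_cond_wait(&dev->cond, &dev->lock);
	pthread_mutex_unlock(&dev->lock);
	return 0;
}

static void active_count_put(struct device_sketch *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->active_cnt--;
	pthread_cond_broadcast(&dev->cond);  /* wake any value-waiter */
	pthread_mutex_unlock(&dev->lock);
}

/* Stand-in for the kernel threads still holding active count users. */
static void *worker(void *arg)
{
	struct device_sketch *dev = arg;

	active_count_put(dev);
	active_count_put(dev);
	return NULL;
}
```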
Change-Id: Ic0dedbadcc7d0714ea14f25e2a43715e2e12c041
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
We really don't want new GPU commands or events to be generated
just to manage the iommu while we are idle.
Change-Id: I2e8740bee8c25c93bddc51a90a3370d151aaf558
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Check for idle in the fault detection timer. Don't report a fault
if the GPU is idle.
Change-Id: Ic0dedbad0a912b2d4b3cda9f545a9301290904d6
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
These two functions need to agree on the meaning of "idle",
so that calling adreno_isidle() right after an adreno_idle()
will always return true.
Change-Id: I7cddf73773186c3ec8b56c111affacac3b07fcc7
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Generate names for kgsl-timelines using the following format:
<device>-<thread name>(<tid>)-<proc name>(<pid>)-<context id>
This makes it possible to identify the context of a timeline in the
sync dump, which makes it much easier to identify which context has
a GPU timeout.
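The name generation is a single formatted print. A sketch (the helper name and field values are illustrative; the format string is the one given above):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<device>-<thread name>(<tid>)-<proc name>(<pid>)-<context id>". */
static void timeline_name(char *buf, size_t len,
			  const char *device, const char *thread, int tid,
			  const char *proc, int pid, unsigned int ctx_id)
{
	snprintf(buf, len, "%s-%s(%d)-%s(%d)-%u",
		 device, thread, tid, proc, pid, ctx_id);
}
```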
Change-Id: I0ce0614a53a93fd81094d92c9bef7053e6d416d2
Git-commit: e3348f389f715cb2143a708ec6796d3f65e03821
Git-repo: https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git
Signed-off-by: Fred Fettinger <fred.fettinger@motorola.com>
Signed-off-by: Harsh Vardhan Dwivedi <hdwivedi@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Having a separate allocated struct for the device specific context
makes ownership unclear, which could lead to reference counting
problems or invalid pointers. Also, duplicate members were
starting to appear in adreno_context because there wasn't a safe
way to reach the kgsl_context from some parts of the adreno code.
This can now be done via container_of().
This change alters the lifecycle of the context->id, which is
now freed when the context reference count hits zero rather
than in kgsl_context_detach().
It also changes the context creation and destruction sequence.
The device specific code must allocate a structure containing
a struct kgsl_context and pass a pointer to it to kgsl_init_context()
before doing any device specific initialization. There is also a
separate drawctxt_detach() callback for doing device specific
cleanup. This is separate from freeing memory, which is done
by the drawctxt_destroy() callback.
Change-Id: I7d238476a3bfec98fd8dbc28971cf3187a81dac2
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Since cffdump is not often enabled in Kconfig, it is helpful
to at least catch compiler errors and warnings when it is off.
Change-Id: Ic0dedbad41b0faac6e0a2514e694f8e369c61f7f
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
There were dual functions for reading and writing registers for
adreno devices. Drop one of these duplicate functions to make
the code more uniform.
Change-Id: I703d27d1674a85a6c2d7a9fe6dc49f13005a3410
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Different adreno cores have different offsets for the same register.
These registers are referenced in code areas which are common to
all adreno cores. Hence, they should be referenced with a variable
instead of a constant to make things more generic. This makes
the code more suitable for accommodating future cores.
Change-Id: Ie3d387d7cf767d46eea90e0fecdbba88dad97860
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
8226v2 has a new spin of the 305B GPU with a different chip ID.
Add an entry to the GPU list for the new chip ID. Unfortunately
we can't use the normal ANY_ID trick here because we have so many
different types of chips that use the same core, major and minor
values and differ only in the patchlevel.
Change-Id: Ic0dedbad9bb856a25b71cb825a2960fd0e1d4198
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Since the rptr is written by the GPU, there's no point
in keeping a copy in the ringbuffer struct where it will
likely be out of date. If you need to look at the ringbuffer,
read it into a local variable with adreno_get_rptr().
Change-Id: Ibf1ba0b9c71a93f65a5c85a58328b2202a27af3f
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
In check_if_freed() a call to kgsl_get_memory_usage() was terminated with
a comma instead of a semicolon. Through a quirk in a macro this somehow
managed to compile and function properly but that doesn't make it okay.
CRs-fixed: 497280
Change-Id: Ic0dedbadfdf901bbc68b7f8cefb231d08dd01a95
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Special case performance counters do not have select registers.
Enabling performance counters first checks that the counters we want
are valid and have select registers, then enables the counters.
The special case performance counters were not being enabled
since they fail the initial test. Move these special cases to
before the select register error checking and perform their own sanity
checks before enabling the performance counters.
Change-Id: I716103fb6bfb97ba3e198503531af139fb1725f8
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
The function kgsl_sharedmem_find_region holds the memory spinlock
at the beginning of the function, so we do not need to hold the lock
before calling it.
Change-Id: I20ee32e0ed6aee6ed61cdd4fb7a9cc08a876fc84
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Future chipsets may not define phys_addr_t as 32 bits, so convert
all physical address variables to this type.
Change-Id: I4ac5bd1aabda455456ff867c973a264f68992404
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Make CFF capture a device specific property. This allows the control
of CFF for a particular device without CFF interference from another
device. This will be useful when we have a virtual device and need to
only capture CFF for the virtual device. CFF capture can only be
turned on for one device at a time.
Change-Id: I14c5a4442ad05327de1413d98bf795dbd196119d
Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Allow NAP on all targets. The splitting of the clock enable
and disable calls into enable/prepare and disable/unprepare
allows us to safely make this change for all targets.
Change-Id: I03d909b86aef33631a887d159cf0a807a6d0ae75
Signed-off-by: Lucille Sylvester <lsylvest@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Signed-off-by: Iliyan Malchev <malchev@google.com>
SP performance counter 4 is broken on A33x targets so do not assign
this counter. The counter does not reliably return correct values
which can make results misleading.
Change-Id: I87c36e021c547b630e8dfd89abbdb5c65d4b3c46
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Certain hardware does not support performance counters and does not
set them up. Ensure the uninitialized performance counter variable is
not dereferenced in this case.
Change-Id: I8692090d60ff1e6a0c45b5699b90d9808ef61c5a
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
ion_share_dma_buf() is now used to get the shared buffer.
Use ion_share_dma_buf_fd() to get the ion fd.
Change-Id: I6f0d30782e62e245e89d907ffe22bbd4b7a5d0b0
Signed-off-by: Alex Wong <waiw@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Use the same format for all trace points reporting the same
data, such as context ids (ctx=%u) and timestamps (ts=%u).
Make sure key=value format is used in a few places where it
was missing.
Change-Id: I4ec1c77c853c567c7a6ba69eff5023d8d71cdac4
Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
No need to turn the algorithm off now that there is explicit binning.
It's not a performance win and it confuses whether DCVS is operational.
Change-Id: Ifa1646e8dde0a792c3d22d5e1cf9e139b0363f08
Signed-off-by: Lucille Sylvester <lsylvest@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Make sure we resume the GPU regardless of its current
state. If the device is not in the suspend state, restart the
device and print an error message.
Change-Id: I9e71cbe27d360dc06513c54f3a734aaea5b10d2b
Signed-off-by: Suman Tatiraju <sumant@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
Between the kgsl_open and kgsl_release calls the active pwrlevel could
be set to anything based on the content. However, after a power collapse
always power up the GPU and set the clock to the init level.
Change-Id: Ic2858218afa9c1047864ab8551b3495b7d752952
Signed-off-by: Suman Tatiraju <sumant@codeaurora.org>
Signed-off-by: Carter Cooper <ccooper@codeaurora.org>