msm: kgsl: implement server-side waits

msm: kgsl: Add device init function

	Some device specific parameters need to be set up only once during
	device initialization. Create an init function for this purpose
	rather than re-doing this init every time the device is started.

	Change-Id: I45c7fcda8d61fd2b212044c9167b64f793eedcda
	Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 2nd commit message:

	msm: kgsl: improve active_cnt and ACTIVE state management

	Require any code path which intends to touch the hardware
	to take a reference on active_cnt with kgsl_active_count_get()
	and release it with kgsl_active_count_put() when finished.
	These functions now do the wake / sleep steps that were
	previously handled by kgsl_check_suspended() and
	kgsl_check_idle().

	Additionally, kgsl_pre_hwaccess() will no longer turn on
	the clocks, it just enforces via BUG_ON that the clocks
	are enabled before a register is touched.

	Change-Id: I31b0d067e6d600f0228450dbd73f69caa919ce13
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
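
	A minimal usage sketch of the new pattern (the surrounding function is
	hypothetical and assumes kgsl_active_count_get() returns 0 on success):

	static int example_touch_hardware(struct kgsl_device *device)
	{
		int ret;

		mutex_lock(&device->mutex);
		/* wakes the GPU if it is asleep and holds it active */
		ret = kgsl_active_count_get(device);
		if (ret == 0) {
			/* ... safe to read/write registers here ... */

			/* drop the reference; the GPU may idle again */
			kgsl_active_count_put(device);
		}
		mutex_unlock(&device->mutex);
		return ret;
	}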

	# This is the 3rd commit message:

	msm: kgsl: Sync memory with CFF from places where it was missing

	Before submitting any indirect buffer to the GPU via the ringbuffer,
	the indirect buffer memory should be synced with CFF so that the
	CFF capture will be complete. Add the syncing of memory with CFF
	in places where this was missing.

	Change-Id: I18f506dd1ab7bdfb1a68181016e6f661a36ed5a2
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 4th commit message:

	msm: kgsl: Export some kgsl-core functions to EXPORT_SYMBOLS

	Export some functions in the KGSL core driver so they can
	be seen by the leaf drivers.

	Change-Id: Ic0dedbad5dbe562c2e674f8e885a3525b6feac7b
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 5th commit message:

	msm: kgsl: Send the right IB size to adreno_find_ctxtmem

	adreno_find_ctxtmem expects byte lengths and we were sending it
	dword lengths, which was about as effective as you would expect.

	Change-Id: Ic0dedbad536ed377f6253c3a5e75e5d6cb838acf
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 6th commit message:

	msm: kgsl: Add 8974 default GPR0 & clk gating values

	Add correct clock gating values for A330, A305 and A320.
	Add a generic function to return the correct default clock
	gating values for the respective GPU. Add the default GPR0
	value for A330.

	Change-Id: I039e8e3622cbda04924b0510e410a9dc95bec598
	Signed-off-by: Harsh Vardhan Dwivedi <hdwivedi@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 7th commit message:

	msm: kgsl: Move A3XX VBIF settings decision to a table

	The vbif selection code is turning into a long series of if/else
	clauses. Move the decision to a lookup table that will be easier
	to update and maintain when we have eleventy A3XX GPUs.

	Change-Id: Ic0dedbadd6b16734c91060d7e5fa50dcc9b8774d
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
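
	A sketch of the table-driven approach (table contents and the array
	names such as a305_vbif are illustrative, not the actual register
	lists):

	struct a3xx_vbif_platform {
		int (*devfunc)(struct adreno_device *);	/* GPU match function */
		const unsigned int *vbif;	/* reg, value pairs; 0-terminated */
	};

	static const struct a3xx_vbif_platform a3xx_vbif_platforms[] = {
		{ adreno_is_a305, a305_vbif },
		{ adreno_is_a320, a320_vbif },
		{ adreno_is_a330, a330_vbif },
	};

	static void a3xx_program_vbif(struct adreno_device *adreno_dev)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(a3xx_vbif_platforms); i++) {
			if (a3xx_vbif_platforms[i].devfunc(adreno_dev)) {
				const unsigned int *vbif =
					a3xx_vbif_platforms[i].vbif;

				/* write each reg/value pair for this GPU */
				while (*vbif) {
					adreno_regwrite(&adreno_dev->dev,
						vbif[0], vbif[1]);
					vbif += 2;
				}
				break;
			}
		}
	}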

	# This is the 8th commit message:

	msm: kgsl: Update settings for the A330v2 GPU in 8974v2

	The new GPU spin in 8974v2 has some slightly different settings
	than the 8974v1: add support for identifying a v2 spin, add a new
	table of VBIF register settings and update the clock gating
	registers.

	Change-Id: Ic0dedbad22bd3ed391b02f6327267cf32f17af3d
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 9th commit message:

	msm: kgsl: Fix compilation errors when CFF is turned on

	Fix the compilation errors when the MSM_KGSL_CFF_DUMP option
	is turned on.

	Change-Id: I59b0a7314ba77e2c2fef03338e061cd503e88714
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 10th commit message:

	msm: kgsl: Convert the Adreno GPU cycle counters to run free

	In anticipation of allowing multiple entities to share access to the
	performance counters, make the few performance counters that KGSL
	uses run free.

	Change-Id: Ic0dedbadbefb400b04e4f3552eed395770ddbb7b
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 11th commit message:

	msm: kgsl: Handle a possible ringbuffer allocspace error

	In the GPU specific start functions, account for the possibility
	that the ringbuffer allocation routine might return NULL.

	Change-Id: Ic0dedbadf6199fee78b6a8c8210a1e76961873a0
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 12th commit message:

	msm: kgsl: Add a new API to allow sharing of GPU performance counters

	Adreno uses programmable performance counters, meaning that while there
	are a limited number of physical counters, each counter can be programmed
	to count a vast number of different measurements (we refer to these as
	countables).  This could cause problems if multiple apps want to use
	the performance counters, so this API and infrastructure allows the
	counters to be safely shared.

	The kernel tracks which countable is selected for each of the physical
	counters for each counter group (where groups closely match hardware
	blocks). If the desired countable is already in use, or there is an
	open physical counter, then the process is allowed to use the counter.

	The get ioctl reserves the counter and returns the dword offset of the
	register associated with that physical counter.  The put ioctl
	releases the physical counter.  The query ioctl gets the countables
	used for all of the counters in the block - up to 8 values can be
	returned.  The read ioctl gets the current hardware value in the counter.

	Change-Id: Ic0dedbadae1dedadba60f8a3e685e2ce7d84fb33
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
	Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
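
	A kernel-side sketch using the new helpers declared in adreno.h (the
	group constant KGSL_PERFCOUNTER_GROUP_SP is assumed; error handling is
	minimal):

	static int example_count_alu(struct adreno_device *adreno_dev)
	{
		unsigned int offset;
		int ret;

		/* reserve a physical SP counter for the ALU countable */
		ret = adreno_perfcounter_get(adreno_dev,
				KGSL_PERFCOUNTER_GROUP_SP,
				SP_FS_FULL_ALU_INSTRUCTIONS, &offset,
				PERFCOUNTER_FLAG_KERNEL);
		if (ret)
			return ret;	/* no free counter, countable not shared */

		/* ... read the LO/HI registers at 'offset' as needed ... */

		/* release the physical counter when done */
		return adreno_perfcounter_put(adreno_dev,
				KGSL_PERFCOUNTER_GROUP_SP,
				SP_FS_FULL_ALU_INSTRUCTIONS);
	}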

	# This is the 13th commit message:

	msm: kgsl: Print the nearest active GPU buffers to a faulting address

	Print the two active GPU memory entries that bracket a faulting GPU
	address. This will help diagnose premature frees and buffer overruns.

	Check if the faulting GPU address was freed by the same process.

	Change-Id: Ic0dedbadebf57be9abe925a45611de8e597447ea
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
	Signed-off-by: Vladimir Razgulin <vrazguli@codeaurora.org>

	# This is the 14th commit message:

	msm: kgsl: Remove an unneeded register write for A3XX GPUs

	A3XX doesn't have the MH block and so the register at 0x40 points
	somewhere else. Luckily the write was harmless but remove it anyway.

	Change-Id: Ic0dedbadd1e043cd38bbaec8fcf0c490dcdedc8c
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 15th commit message:

	msm: kgsl: clean up iommu/gpummu protflag handling

	Make kgsl_memdesc_protflags() return the correct type of flags
	for the type of mmu being used. Query the memdesc with this
	function in kgsl_mmu_map(), rather than passing in the
	protflags. This prevents translation at multiple layers of
	the code and makes it easier to enforce that the mapping matches
	the allocation flags.

	Change-Id: I2a2f4a43026ae903dd134be00e646d258a83f79f
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 16th commit message:

	msm: kgsl: remove kgsl_mem_entry.flags

	The two flags fields in kgsl_memdesc should be enough for
	anyone.  Move the only user of kgsl_mem_entry.flags, the
	FROZEN flag for snapshot processing, to kgsl_memdesc.priv.

	Change-Id: Ia12b9a6e6c1f5b5e57fa461b04ecc3d1705f2eaf
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 17th commit message:

	msm: kgsl: map the guard page readonly on the iommu

	The guard page needs to be readable by the GPU, due to
	a prefetch range issue, but it should never be writable.
	Change the page fault message to indicate if nearby
	buffers have a guard page.

	Change-Id: I3955de1409cbf4ccdde92def894945267efa044d
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
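
	The intent, as a sketch against the generic IOMMU API (function and
	variable names are illustrative; the real change lives in the kgsl
	iommu mapping code):

	#include <linux/iommu.h>

	static int example_map_guard_page(struct iommu_domain *domain,
			unsigned long iova, phys_addr_t guard_page)
	{
		/* IOMMU_READ only: the GPU may prefetch from the guard page,
		 * but any write to it will fault. */
		return iommu_map(domain, iova, guard_page, PAGE_SIZE,
				IOMMU_READ);
	}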

	# This is the 18th commit message:

	msm: kgsl: Add support for VBIF and VBIF_PWR performance counters

	These 2 counter groups are also "special cases" that require
	different programming sequences.

	Change-Id: I73e3e76b340e6c5867c0909b3e0edc78aa62b9ee
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 19th commit message:

	msm: kgsl: Only allow two counters for VBIF performance counters

	There are only two VBIF counter groups, so validate that the user
	doesn't pass in > 1 and clean up the if/else clause.

	Change-Id: Ic0dedbad3d5a54e4ceb1a7302762d6bf13b25da1
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 20th commit message:

	msm: kgsl: Avoid an array overrun in the perfcounter API

	Make sure the passed group is less than the size of the list of
	performance counters.

	Change-Id: Ic0dedbadf77edf35db78939d1b55a05830979f85
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 21st commit message:

	msm: kgsl: Don't go to slumber if active_count is non zero

	If active_cnt happens to be set when we go into
	kgsl_early_suspend_driver() then don't go to SLUMBER.  This
	avoids trouble if we come back and try to access the
	hardware while it is off.

	Change-Id: Ic0dedbadb13514a052af6199c8ad1982d7483b3f
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 22nd commit message:

	msm: kgsl: Enable HLSQ registers in snapshot when available

	Reading the HLSQ registers during a GPU hang recovery might cause
	the device to hang depending on the state of the HLSQ block.
	Enable the HLSQ register reads when we know that they will
	succeed.

	Change-Id: I69f498e6f67a15328d1d41cc64c43d6c44c54bad
	Signed-off-by: Carter Cooper <ccooper@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 23rd commit message:

	msm: kgsl: snapshot: Don't keep parsing indirect buffers on failure

	Stop parsing an indirect buffer if an error is encountered (such as
	a missing buffer). This is a pretty good indication that the buffers
	are not reliable and the further the parser goes with an unreliable
	buffer the more likely it is to get confused.

	Change-Id: Ic0dedbadf28ef374c9afe70613048d3c31078ec6
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 24th commit message:

	msm: kgsl: snapshot: Only push the last IB1 and IB2 in the static space

	Some IB1 buffers have hundreds of little IB2 buffers and only one of them
	will actually be interesting enough to push into the static space.  Only
	push the last executed IB1 and IB2 into the static space.

	Change-Id: Ic0dedbad26fb30fb5bf90c37c29061fd962dd746
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 25th commit message:

	msm: kgsl: Save the last active context in snapshot

	Save in the snapshot the last active context that was executing
	when the hang happened.

	Change-Id: I2d32de6873154ec6c200268844fee7f3947b7395
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 26th commit message:

	msm: kgsl: In snapshot track a larger object size if address is same

	If the object being tracked has the same address as a previously
	tracked object then only track a single object with the larger size,
	since the smaller object will be a part of the larger one anyway.

	Change-Id: I0e33bbaf267bc0ec580865b133917b3253f9e504
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 27th commit message:

	msm: kgsl: Track memory address from 2 additional registers

	Add tracking of memory referenced by VS_OBJ_START_REG and FS_OBJ_START_REG
	registers in snapshot. This makes snapshot more complete in terms of
	tracking data that is used by the GPU at the time of hang.

	Change-Id: I7e5f3c94f0d6744cd6f2c6413bf7b7fac4a5a069
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 28th commit message:

	msm: kgsl: Loop till correct index on type0 packets

	When searching for memory addresses in a type0 packet we were looping
	from the start of the type0 packet till its end, but the first DWORD
	is a header so we only need to loop till packet_size - 1. Fix this.

	Change-Id: I278446c6ab380cf8ebb18d5f3ae192d3d7e7db62
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 29th commit message:

	msm: kgsl: Add global timestamp information to snapshot

	Make sure that we always add global timestamp information to the
	snapshot. This is needed during playback to find the whereabouts
	of the last executed IB.

	Change-Id: Ica5b3b2ddff6fd45dbc5a911f42271ad5855a86a
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 30th commit message:

	msm: kgsl: Skip cff dump for certain functions when it is disabled

	Certain functions were generating CFF when CFF was disabled. Make
	sure these functions do not dump CFF when it is disabled.

	Change-Id: Ib5485b03b8a4d12f190f188b80c11ec6f552731d
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 31st commit message:

	msm: kgsl: Fix searching of memory object

	Make sure that at least a size of 1 byte is searched when locating
	the memory entry of a region. If size is 0 then a memory region
	whose last address is equal to the start address of the memory being
	searched will be returned, which is wrong.

	Change-Id: I643185d1fdd17296bd70fea483aa3c365e691bc5
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 32nd commit message:

	msm: kgsl: If adreno start fails then restore state of device

	Restore the state of the device back to what it was at the
	start of the adreno_start function if this function fails to
	execute successfully.

	Change-Id: I5b279e5186b164d3361fba7c8f8d864395b794c8
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 33rd commit message:

	msm: kgsl: Fix early exit condition in ringbuffer drain

	The ringbuffer drain function can be called when the ringbuffer
	start flag is not set. This happens on startup. Hence,
	exiting the function early based on start flag is incorrect.
	Simply execute this function regardless of the start flag.

	Change-Id: Ibf2075847f8bb1a760bc1550309efb3c7aa1ca49
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 34th commit message:

	msm: kgsl: Do not return an error on NULL gpu address

	If a NULL gpu address is passed to snapshot object tracking
	function then do not treat this as an error and return 0. NULL
	objects may be present in an IB so just skip over these objects
	instead of exiting due to an error.

	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Change-Id: Ic253722c58b41f41d03f83c77017e58365da01a7
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 35th commit message:

	msm: kgsl: Don't hold process list global mutex in process private create

	Don't hold process list global mutex for long. Instead make
	use of process specific spin_lock() to serialize access
	to process private structure while creating it. Holding
	process list global mutex could lead to deadlocks as other
	functions depend on it.

	CRs-fixed: 480732
	Change-Id: Id54316770f911d0e23384f54ba5c14a1c9113680
	Signed-off-by: Harsh Vardhan Dwivedi <hdwivedi@codeaurora.org>
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 36th commit message:

	msm: kgsl: Use CPU path to program pagetable when active count is 0

	When active count is 0 then we should use the CPU path to program
	pagetables because the GPU path requires event registration. Events
	can only be queued when active count is valid. Hence, if the active
	count is 0 then use the CPU path.

	Change-Id: I70f5894d20796bdc0f592db7dc2731195c0f7a82
	CRs-fixed: 481887
	Signed-off-by: Shubhraprakash Das <sadas@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 37th commit message:

	iommu: msm: prevent partial mappings on error

	If msm_iommu_map_range() fails mid way through the va
	range with an error, clean up the PTEs that have already
	been created so they are not leaked.

	Change-Id: Ie929343cd6e36cade7b2cc9b4b4408c3453e6b5f
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 38th commit message:

	msm: kgsl: better handling of virtual address fragmentation

	When KGSL_MEMFLAGS_USE_CPU_MAP is enabled, the mmap address
	must try to match the GPU alignment requirements of the buffer,
	as well as include space in the mapping for the guard page.
	This can cause -ENOMEM to be returned from get_unmapped_area()
	when there are a large number of mappings. When this happens,
	fall back to page alignment and retry to avoid failure.

	Change-Id: I2176fe57afc96d8cf1fe1c694836305ddc3c3420
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
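
	A rough sketch of the retry logic (hypothetical helper; the real code
	sits in the kgsl mmap path and also accounts for the guard page):

	#include <linux/mm.h>
	#include <linux/sched.h>

	static unsigned long example_get_mmap_addr(struct file *file,
			unsigned long addr, unsigned long len,
			unsigned long pgoff, unsigned long flags,
			unsigned long align)
	{
		unsigned long ret;

		/* first try: ask for extra room so the result can be aligned
		 * to the GPU requirement */
		ret = current->mm->get_unmapped_area(file, addr, len + align,
				pgoff, flags);
		if (!IS_ERR_VALUE(ret))
			return ALIGN(ret, align);

		/* VA space too fragmented: fall back to page alignment */
		return current->mm->get_unmapped_area(file, addr, len,
				pgoff, flags);
	}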

	# This is the 39th commit message:

	iommu: msm: Don't treat address 0 as an error case

	Currently, the iommu page table code treats a scattergather
	list with physical address 0 as an error. This may not be
	correct in all cases. Physical address 0 is a valid part
	of the system and may be used for valid page allocations.
	Nothing else in the system checks physical address 0
	for errors, so don't treat it as an error here.

	Change-Id: Ie9f0dae9dace4fff3b1c3449bc89c3afdd2e63a0
	CRs-Fixed: 478304
	Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 40th commit message:

	msm: kgsl: prevent race between mmap() and free on timestamp

	When KGSL_MEMFLAGS_USE_CPU_MAP is set, we must check that the
	address from get_unmapped_area() is not used as part of a
	mapping that is present only in the GPU pagetable and not the
	CPU pagetable. These mappings can occur because when a buffer
	is freed on timestamp, the CPU mapping is destroyed immediately
	but the GPU mapping is not destroyed until the GPU timestamp
	has passed.

	Because kgsl_mem_entry_detach_process() removed the rbtree
	entry before removing the iommu mapping, there was a window
	of time where kgsl thought the address was available even
	though it was still present in the iommu pagetable. This
	could cause the address to get assigned to a new buffer,
	which would cause iommu_map_range() to fail since the old
	mapping was still in the pagetable. Prevent this race by
	removing the iommu mapping before removing the rbtree entry
	tracking the address.

	Change-Id: I8f42d6d97833293b55fcbc272d180564862cef8a
	CRs-Fixed: 480222
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 41st commit message:

	msm: kgsl: add guard page support for imported memory

	Imported memory buffers sometimes do not have enough
	padding to prevent page faults due to overzealous
	GPU prefetch. Attach guard pages to their mappings
	to prevent these faults.

	Because we don't create the scatterlist for some
	types of imported memory, such as ion, the guard
	page is no longer included as the last entry in
	the scatterlist. Instead, it is handled by
	size adjustments and a separate iommu_map() call
	in the kgsl_mmu_map() and kgsl_mmu_unmap() paths.

	Change-Id: I3af3c29c3983f8cacdc366a2423f90c8ecdc3059
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 42nd commit message:

	msm: kgsl: fix kgsl_mem_entry refcounting

	Make kgsl_sharedmem_find* return a reference to the
	entry that was found. This makes using an entry
	without the mem_lock held less race prone.

	Change-Id: If6eb6470ecfea1332d3130d877922c70ca037467
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
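
	Usage sketch (kgsl_mem_entry_put() is assumed to be the matching
	release helper that drops the reference):

	static void example_lookup(struct kgsl_process_private *private,
			unsigned int gpuaddr)
	{
		struct kgsl_mem_entry *entry;

		/* returns with a reference held on success */
		entry = kgsl_sharedmem_find(private, gpuaddr);
		if (entry == NULL)
			return;

		/* ... entry stays valid here even without mem_lock ... */

		kgsl_mem_entry_put(entry);
	}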

	# This is the 43rd commit message:

	msm: kgsl: add ftrace for cache operations

	Add the event kgsl_mem_sync_cache. This event is
	emitted only when a cache operation is actually
	performed. Attempts to flush uncached memory,
	which do nothing, do not cause this event.

	Change-Id: Id4a940a6b50e08b54fbef0025c4b8aaa71641462
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 44th commit message:

	msm: kgsl: Add support for bulk cache operations

	Add a new ioctl, IOCTL_KGSL_GPUMEM_SYNC_CACHE_BULK, which can be used
	to sync a number of memory ids at once. This gives the driver an
	opportunity to optimize the cache operations based on the total
	working set of memory that needs to be managed.

	Change-Id: I9693c54cb6f12468b7d9abb0afaef348e631a114
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
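
	A userspace sketch of the new ioctl (struct field names are assumed
	from the uapi header; 'fd' is an open /dev/kgsl-3d0 descriptor):

	#include <sys/ioctl.h>
	#include <linux/msm_kgsl.h>

	static int example_sync_bulk(int fd, unsigned int *ids,
			unsigned int count)
	{
		struct kgsl_gpumem_sync_cache_bulk bulk = {
			.id_list = ids,			/* array of gpumem ids */
			.count = count,
			.op = KGSL_GPUMEM_CACHE_FLUSH,	/* clean and invalidate */
		};

		return ioctl(fd, IOCTL_KGSL_GPUMEM_SYNC_CACHE_BULK, &bulk);
	}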

	# This is the 45th commit message:

	msm: kgsl: flush the entire cache when the bulk batch is large

	On 8064 and 8974, flushing more than 16MB of virtual address
	space is slower than flushing the entire cache. So flush
	the entire cache when the working set is larger than this.
	The threshold for full cache flush can be tuned at runtime via
	the full_cache_threshold sysfs file.

	Change-Id: If525e4c44eb043d0afc3fe42d7ef2c7de0ba2106
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
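
	The decision, sketched (kgsl_driver.full_cache_threshold is assumed to
	hold the 16MB default exposed through the sysfs file):

	static void example_sync_working_set(struct kgsl_mem_entry **entries,
			int count, size_t total_bytes)
	{
		int i;

		if (total_bytes >= kgsl_driver.full_cache_threshold) {
			/* cheaper than walking 16MB+ of lines by address */
			flush_cache_all();
			return;
		}

		/* small working set: flush each buffer individually */
		for (i = 0; i < count; i++)
			kgsl_cache_range_op(&entries[i]->memdesc,
					KGSL_CACHE_OP_FLUSH);
	}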

	# This is the 46th commit message:

	msm: kgsl: Use a read/write lock for the context idr

	Everybody loves RCU but in this case we are dangerously mixing RCU and
	atomic operations.  Add a read/write lock to explicitly protect the idr.
	Also fix a few spots where the idr was used without protection.

	Change-Id: Ic0dedbad517a9f89134cbcf7af29c8bf0f034708
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
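
	The locking pattern, sketched (the context_lock name is assumed;
	taking a reference before the unlock is left as a comment):

	static struct kgsl_context *example_context_find(
			struct kgsl_device *device, int id)
	{
		struct kgsl_context *context;

		read_lock(&device->context_lock);
		context = idr_find(&device->context_idr, id);
		/* take a reference here, before dropping the lock */
		read_unlock(&device->context_lock);

		return context;
	}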

	# This is the 47th commit message:

	msm: kgsl: embed kgsl_context struct in adreno_context struct

	Having a separately allocated struct for the device specific context
	makes ownership unclear, which could lead to reference counting
	problems or invalid pointers. Also, duplicate members were
	starting to appear in adreno_context because there wasn't a safe
	way to reach the kgsl_context from some parts of the adreno code.
	This can now be done via container_of().

	This change alters the lifecycle of the context->id, which is
	now freed when the context reference count hits zero rather
	than in kgsl_context_detach().

	It also changes the context creation and destruction sequence.
	The device specific code must allocate a structure containing
	a struct kgsl_context and pass a pointer to it to kgsl_init_context()
	before doing any device specific initialization. There is also a
	separate drawctxt_detach() callback for doing device specific
	cleanup. This is separate from freeing memory, which is done
	by the drawctxt_destroy() callback.

	Change-Id: I7d238476a3bfec98fd8dbc28971cf3187a81dac2
	Signed-off-by: Jeremy Gebben <jgebben@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
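
	The relationship, sketched (mirrors the ADRENO_CONTEXT() macro in
	adreno.h; members other than the embedded base are omitted):

	struct adreno_context {
		struct kgsl_context base;	/* embedded, shares the refcount */
		/* ... device specific members ... */
	};

	static inline struct adreno_context *to_adreno_context(
			struct kgsl_context *c)
	{
		return container_of(c, struct adreno_context, base);
	}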

	# This is the 48th commit message:

	msm: kgsl: Take a reference count on the active adreno draw context

	Take a reference count on the currently active draw context to keep
	it from going away while we are maintaining a pointer to it in the
	adreno device.

	Change-Id: Ic0dedbade8c09ecacf822e9a3c5fbaf6e017ec0c
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 49th commit message:

	msm: kgsl: Add a command dispatcher to manage the ringbuffer

	Implements a centralized dispatcher for sending user commands
	to the ringbuffer. Incoming commands are queued by context and
	sent to the hardware on a round robin basis, ensuring each context
	gets a small burst of commands at a time.  Each command is tracked
	throughout the pipeline, giving the dispatcher better knowledge
	of how the hardware is being used.  This will be the basis for
	future per-context and cross-context enhancements such as priority
	queuing and server-side synchronization.

	Change-Id: Ic0dedbad49a43e8e6096d1362829c800266c2de3
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 50th commit message:

	msm: kgsl: Only turn on the idle timer when active_cnt is 0

	Only turn on the idle timer when the GPU is expected to be quiet.

	Change-Id: Ic0dedbad57846f1e7bf7820ec3152cd20598b448
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 51st commit message:

	msm: kgsl: Add a ftrace event for active_cnt

	Add a new ftrace event for watching the rise and fall of active_cnt:

	  echo 1 > /sys/kernel/debug/tracing/events/kgsl/kgsl_active_count/enable

	This will give you the current active count and the caller of the function:

	  kgsl_active_count: d_name=kgsl-3d0 active_cnt=8e9 func=kgsl_ioctl

	Change-Id: Ic0dedbadc80019e96ce759d9d4e0ad43bbcfedd2
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 52nd commit message:

	msm: kgsl: Implement KGSL fault tolerance policy in the dispatcher

	Implement the KGSL fault tolerance policy for faults in the dispatcher.
	Replay (or skip) the inflight command batches as dictated by the policy,
	iterating progressively through the various behaviors.

	Change-Id: Ic0dedbade98cc3aa35b26813caf4265c74ccab56
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 53rd commit message:

	msm: kgsl: Don't process events if the timestamp hasn't changed

	Keep track of the global timestamp every time the event code runs.
	If the timestamp hasn't changed then we are caught up and we can
	politely bow out.  This avoids the situation where multiple
	interrupts schedule the same work item multiple times:

	   IRQ
	     -> process events
	   IRQ
	   IRQ
	     -> process events

	The actual retired timestamp in the first work item might be well
	ahead of the delivered interrupts. The event loop will end up
	processing every event that has been retired by the hardware
	at that point. If the work item gets re-queued by a subsequent
	interrupt then we might have already addressed all the pending
	timestamps.

	Change-Id: Ic0dedbad79722654cb17e82b7149e93d3c3f86a0
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
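
	The early-out, sketched (where the last-seen value is stored is
	illustrative; the driver keeps it per device):

	static void example_process_events(unsigned int retired)
	{
		/* illustrative; stored per-device in the driver */
		static unsigned int last_seen;

		if (retired == last_seen)
			return;		/* timestamp unchanged; already caught up */

		last_seen = retired;
		/* ... process every event retired up to 'retired' ... */
	}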

	# This is the 54th commit message:

	msm: kgsl: Make active_cnt an atomic variable

	In kgsl_active_cnt_light() the mutex was needed just to check and
	increment the active_cnt value.  Move active_cnt to an atomic to
	begin the task of freeing ourselves from the grip of the device
	mutex if we can avoid it.

	Change-Id: Ic0dedbad78e086e3aa3559fab8ecebc43539f769
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
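
	The "light" path, sketched (assumes active_cnt is now an atomic_t in
	struct kgsl_device and that -EAGAIN sends the caller to the slow path):

	static int example_active_count_get_light(struct kgsl_device *device)
	{
		/* succeeds only if the GPU is already active (count > 0) */
		if (atomic_inc_not_zero(&device->active_cnt))
			return 0;

		/* count was zero: caller must take the mutex and wake the GPU */
		return -EAGAIN;
	}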

	# This is the 55th commit message:

	msm: kgsl: Add a new command submission API

	Add an new ioctl entry point for submitting commands to the GPU
	called IOCTL_KGSL_SUBMIT_COMMANDS.

	As with IOCTL_KGSL_RINGBUFFER_ISSUEIBCMDS the user passes a list of
	indirect buffers, flags and optionally a user specified timestamp. The
	old way of passing a list of indirect buffers is no longer supported.

	IOCTL_KGSL_SUBMIT_COMMANDS also allows the user to define a
	list of sync points for the command. Sync points are dependencies
	on events that need to be satisfied before the command will be issued
	to the hardware.  Events are designed to be flexible.  To start with
	the only events that are supported are GPU events for a given context/
	timestamp pair.

	Pending events are stored in a list in the command batch. As each event is
	expired it is deleted from the list. The adreno dispatcher won't send the
	command until the list is empty.  Sync points are not supported for Z180.

	CRs-Fixed: 468770
	Change-Id: Ic0dedbad5a5935f486acaeb033ae9a6010f82346
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 56th commit message:

	msm: kgsl: add kgsl_sync_fence_waiter for server side sync

	For server side sync the KGSL kernel module needs to perform
	an asynchronous wait for a fence object prior to issuing
	subsequent commands.

	Change-Id: I1ee614aa3af84afc4813f1e47007f741beb3bc92
	Signed-off-by: Jeff Boody <jboody@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
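
	A sketch against the Android sync framework (the callback body and the
	point where the dispatcher is kicked are illustrative):

	static void example_fence_callback(struct sync_fence *fence,
			struct sync_fence_waiter *waiter)
	{
		/* fence signaled: mark the sync point expired and reschedule
		 * the dispatcher so the command batch can be submitted */
	}

	static int example_wait_fence_async(struct sync_fence *fence,
			struct sync_fence_waiter *waiter)
	{
		sync_fence_waiter_init(waiter, example_fence_callback);

		/* returns 0 if the wait is pending, 1 if already signaled */
		return sync_fence_wait_async(fence, waiter);
	}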

	# This is the 57th commit message:

	msm: kgsl: Add support for KGSL_CMD_SYNCPOINT_TYPE_FENCE

	Allow command batches to wait for external fence sync events.

	Change-Id: Ic0dedbad3a211019e1cd3a3d62ab6a3e4d4eeb05
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 58th commit message:

	msm: kgsl: fix potential double free of the kwaiter

	Change-Id: Ic0dedbad66a0af6eaef52b2ad53c067110bdc6e4
	Signed-off-by: Jeff Boody <jboody@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>

	# This is the 59th commit message:

	msm: kgsl: free an event only after canceling successfully

	Change-Id: Ic0dedbade256443d090dd11df452dc9cdf65530b
	Signed-off-by: Jeff Boody <jboody@codeaurora.org>
	Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Author: Jordan Crouse, 2013-06-24 11:40:20 -06:00 (committed by Iliyan Malchev)
parent ef1874fadf
commit 26ec3b0af3
44 changed files with 6645 additions and 3087 deletions

@ -23,8 +23,10 @@ msm_kgsl_core-$(CONFIG_SYNC) += kgsl_sync.o
msm_adreno-y += \
adreno_ringbuffer.o \
adreno_drawctxt.o \
adreno_dispatch.o \
adreno_postmortem.o \
adreno_snapshot.o \
adreno_trace.o \
adreno_a2xx.o \
adreno_a2xx_trace.o \
adreno_a2xx_snapshot.o \

@ -66,15 +66,103 @@
#define A3XX_RBBM_INT_0_MASK 0x063
#define A3XX_RBBM_INT_0_STATUS 0x064
#define A3XX_RBBM_PERFCTR_CTL 0x80
#define A3XX_RBBM_PERFCTR_LOAD_CMD0 0x81
#define A3XX_RBBM_PERFCTR_LOAD_CMD1 0x82
#define A3XX_RBBM_PERFCTR_LOAD_VALUE_LO 0x84
#define A3XX_RBBM_PERFCTR_LOAD_VALUE_HI 0x85
#define A3XX_RBBM_PERFCOUNTER0_SELECT 0x86
#define A3XX_RBBM_PERFCOUNTER1_SELECT 0x87
#define A3XX_RBBM_GPU_BUSY_MASKED 0x88
#define A3XX_RBBM_PERFCTR_CP_0_LO 0x90
#define A3XX_RBBM_PERFCTR_CP_0_HI 0x91
#define A3XX_RBBM_PERFCTR_RBBM_0_LO 0x92
#define A3XX_RBBM_PERFCTR_RBBM_0_HI 0x93
#define A3XX_RBBM_PERFCTR_RBBM_1_LO 0x94
#define A3XX_RBBM_PERFCTR_RBBM_1_HI 0x95
#define A3XX_RBBM_PERFCTR_PC_0_LO 0x96
#define A3XX_RBBM_PERFCTR_PC_0_HI 0x97
#define A3XX_RBBM_PERFCTR_PC_1_LO 0x98
#define A3XX_RBBM_PERFCTR_PC_1_HI 0x99
#define A3XX_RBBM_PERFCTR_PC_2_LO 0x9A
#define A3XX_RBBM_PERFCTR_PC_2_HI 0x9B
#define A3XX_RBBM_PERFCTR_PC_3_LO 0x9C
#define A3XX_RBBM_PERFCTR_PC_3_HI 0x9D
#define A3XX_RBBM_PERFCTR_VFD_0_LO 0x9E
#define A3XX_RBBM_PERFCTR_VFD_0_HI 0x9F
#define A3XX_RBBM_PERFCTR_VFD_1_LO 0xA0
#define A3XX_RBBM_PERFCTR_VFD_1_HI 0xA1
#define A3XX_RBBM_PERFCTR_HLSQ_0_LO 0xA2
#define A3XX_RBBM_PERFCTR_HLSQ_0_HI 0xA3
#define A3XX_RBBM_PERFCTR_HLSQ_1_LO 0xA4
#define A3XX_RBBM_PERFCTR_HLSQ_1_HI 0xA5
#define A3XX_RBBM_PERFCTR_HLSQ_2_LO 0xA6
#define A3XX_RBBM_PERFCTR_HLSQ_2_HI 0xA7
#define A3XX_RBBM_PERFCTR_HLSQ_3_LO 0xA8
#define A3XX_RBBM_PERFCTR_HLSQ_3_HI 0xA9
#define A3XX_RBBM_PERFCTR_HLSQ_4_LO 0xAA
#define A3XX_RBBM_PERFCTR_HLSQ_4_HI 0xAB
#define A3XX_RBBM_PERFCTR_HLSQ_5_LO 0xAC
#define A3XX_RBBM_PERFCTR_HLSQ_5_HI 0xAD
#define A3XX_RBBM_PERFCTR_VPC_0_LO 0xAE
#define A3XX_RBBM_PERFCTR_VPC_0_HI 0xAF
#define A3XX_RBBM_PERFCTR_VPC_1_LO 0xB0
#define A3XX_RBBM_PERFCTR_VPC_1_HI 0xB1
#define A3XX_RBBM_PERFCTR_TSE_0_LO 0xB2
#define A3XX_RBBM_PERFCTR_TSE_0_HI 0xB3
#define A3XX_RBBM_PERFCTR_TSE_1_LO 0xB4
#define A3XX_RBBM_PERFCTR_TSE_1_HI 0xB5
#define A3XX_RBBM_PERFCTR_RAS_0_LO 0xB6
#define A3XX_RBBM_PERFCTR_RAS_0_HI 0xB7
#define A3XX_RBBM_PERFCTR_RAS_1_LO 0xB8
#define A3XX_RBBM_PERFCTR_RAS_1_HI 0xB9
#define A3XX_RBBM_PERFCTR_UCHE_0_LO 0xBA
#define A3XX_RBBM_PERFCTR_UCHE_0_HI 0xBB
#define A3XX_RBBM_PERFCTR_UCHE_1_LO 0xBC
#define A3XX_RBBM_PERFCTR_UCHE_1_HI 0xBD
#define A3XX_RBBM_PERFCTR_UCHE_2_LO 0xBE
#define A3XX_RBBM_PERFCTR_UCHE_2_HI 0xBF
#define A3XX_RBBM_PERFCTR_UCHE_3_LO 0xC0
#define A3XX_RBBM_PERFCTR_UCHE_3_HI 0xC1
#define A3XX_RBBM_PERFCTR_UCHE_4_LO 0xC2
#define A3XX_RBBM_PERFCTR_UCHE_4_HI 0xC3
#define A3XX_RBBM_PERFCTR_UCHE_5_LO 0xC4
#define A3XX_RBBM_PERFCTR_UCHE_5_HI 0xC5
#define A3XX_RBBM_PERFCTR_TP_0_LO 0xC6
#define A3XX_RBBM_PERFCTR_TP_0_HI 0xC7
#define A3XX_RBBM_PERFCTR_TP_1_LO 0xC8
#define A3XX_RBBM_PERFCTR_TP_1_HI 0xC9
#define A3XX_RBBM_PERFCTR_TP_2_LO 0xCA
#define A3XX_RBBM_PERFCTR_TP_2_HI 0xCB
#define A3XX_RBBM_PERFCTR_TP_3_LO 0xCC
#define A3XX_RBBM_PERFCTR_TP_3_HI 0xCD
#define A3XX_RBBM_PERFCTR_TP_4_LO 0xCE
#define A3XX_RBBM_PERFCTR_TP_4_HI 0xCF
#define A3XX_RBBM_PERFCTR_TP_5_LO 0xD0
#define A3XX_RBBM_PERFCTR_TP_5_HI 0xD1
#define A3XX_RBBM_PERFCTR_SP_0_LO 0xD2
#define A3XX_RBBM_PERFCTR_SP_0_HI 0xD3
#define A3XX_RBBM_PERFCTR_SP_1_LO 0xD4
#define A3XX_RBBM_PERFCTR_SP_1_HI 0xD5
#define A3XX_RBBM_PERFCTR_SP_2_LO 0xD6
#define A3XX_RBBM_PERFCTR_SP_2_HI 0xD7
#define A3XX_RBBM_PERFCTR_SP_3_LO 0xD8
#define A3XX_RBBM_PERFCTR_SP_3_HI 0xD9
#define A3XX_RBBM_PERFCTR_SP_4_LO 0xDA
#define A3XX_RBBM_PERFCTR_SP_4_HI 0xDB
#define A3XX_RBBM_PERFCTR_SP_5_LO 0xDC
#define A3XX_RBBM_PERFCTR_SP_5_HI 0xDD
#define A3XX_RBBM_PERFCTR_SP_6_LO 0xDE
#define A3XX_RBBM_PERFCTR_SP_6_HI 0xDF
#define A3XX_RBBM_PERFCTR_SP_7_LO 0xE0
#define A3XX_RBBM_PERFCTR_SP_7_HI 0xE1
#define A3XX_RBBM_PERFCTR_RB_0_LO 0xE2
#define A3XX_RBBM_PERFCTR_RB_0_HI 0xE3
#define A3XX_RBBM_PERFCTR_RB_1_LO 0xE4
#define A3XX_RBBM_PERFCTR_RB_1_HI 0xE5
#define A3XX_RBBM_RBBM_CTL 0x100
#define A3XX_RBBM_RBBM_CTL 0x100
#define A3XX_RBBM_PERFCTR_PWR_0_LO 0x0EA
#define A3XX_RBBM_PERFCTR_PWR_0_HI 0x0EB
#define A3XX_RBBM_PERFCTR_PWR_1_LO 0x0EC
#define A3XX_RBBM_PERFCTR_PWR_1_HI 0x0ED
#define A3XX_RBBM_DEBUG_BUS_CTL 0x111
@ -90,6 +178,7 @@
#define A3XX_CP_MERCIU_DATA2 0x1D3
#define A3XX_CP_MEQ_ADDR 0x1DA
#define A3XX_CP_MEQ_DATA 0x1DB
#define A3XX_CP_PERFCOUNTER_SELECT 0x445
#define A3XX_CP_HW_FAULT 0x45C
#define A3XX_CP_AHB_FAULT 0x54D
#define A3XX_CP_PROTECT_CTRL 0x45E
@ -138,6 +227,14 @@
#define A3XX_VSC_PIPE_CONFIG_7 0xC1B
#define A3XX_VSC_PIPE_DATA_ADDRESS_7 0xC1C
#define A3XX_VSC_PIPE_DATA_LENGTH_7 0xC1D
#define A3XX_PC_PERFCOUNTER0_SELECT 0xC48
#define A3XX_PC_PERFCOUNTER1_SELECT 0xC49
#define A3XX_PC_PERFCOUNTER2_SELECT 0xC4A
#define A3XX_PC_PERFCOUNTER3_SELECT 0xC4B
#define A3XX_GRAS_PERFCOUNTER0_SELECT 0xC88
#define A3XX_GRAS_PERFCOUNTER1_SELECT 0xC89
#define A3XX_GRAS_PERFCOUNTER2_SELECT 0xC8A
#define A3XX_GRAS_PERFCOUNTER3_SELECT 0xC8B
#define A3XX_GRAS_CL_USER_PLANE_X0 0xCA0
#define A3XX_GRAS_CL_USER_PLANE_Y0 0xCA1
#define A3XX_GRAS_CL_USER_PLANE_Z0 0xCA2
@ -163,14 +260,42 @@
#define A3XX_GRAS_CL_USER_PLANE_Z5 0xCB6
#define A3XX_GRAS_CL_USER_PLANE_W5 0xCB7
#define A3XX_RB_GMEM_BASE_ADDR 0xCC0
#define A3XX_RB_PERFCOUNTER0_SELECT 0xCC6
#define A3XX_RB_PERFCOUNTER1_SELECT 0xCC7
#define A3XX_HLSQ_PERFCOUNTER0_SELECT 0xE00
#define A3XX_HLSQ_PERFCOUNTER1_SELECT 0xE01
#define A3XX_HLSQ_PERFCOUNTER2_SELECT 0xE02
#define A3XX_HLSQ_PERFCOUNTER3_SELECT 0xE03
#define A3XX_HLSQ_PERFCOUNTER4_SELECT 0xE04
#define A3XX_HLSQ_PERFCOUNTER5_SELECT 0xE05
#define A3XX_VFD_PERFCOUNTER0_SELECT 0xE44
#define A3XX_VFD_PERFCOUNTER1_SELECT 0xE45
#define A3XX_VPC_VPC_DEBUG_RAM_SEL 0xE61
#define A3XX_VPC_VPC_DEBUG_RAM_READ 0xE62
#define A3XX_VPC_PERFCOUNTER0_SELECT 0xE64
#define A3XX_VPC_PERFCOUNTER1_SELECT 0xE65
#define A3XX_UCHE_CACHE_MODE_CONTROL_REG 0xE82
#define A3XX_UCHE_PERFCOUNTER0_SELECT 0xE84
#define A3XX_UCHE_PERFCOUNTER1_SELECT 0xE85
#define A3XX_UCHE_PERFCOUNTER2_SELECT 0xE86
#define A3XX_UCHE_PERFCOUNTER3_SELECT 0xE87
#define A3XX_UCHE_PERFCOUNTER4_SELECT 0xE88
#define A3XX_UCHE_PERFCOUNTER5_SELECT 0xE89
#define A3XX_UCHE_CACHE_INVALIDATE0_REG 0xEA0
#define A3XX_SP_PERFCOUNTER0_SELECT 0xEC4
#define A3XX_SP_PERFCOUNTER1_SELECT 0xEC5
#define A3XX_SP_PERFCOUNTER2_SELECT 0xEC6
#define A3XX_SP_PERFCOUNTER3_SELECT 0xEC7
#define A3XX_SP_PERFCOUNTER4_SELECT 0xEC8
#define A3XX_SP_PERFCOUNTER5_SELECT 0xEC9
#define A3XX_SP_PERFCOUNTER6_SELECT 0xECA
#define A3XX_SP_PERFCOUNTER7_SELECT 0xECB
#define A3XX_TP_PERFCOUNTER0_SELECT 0xF04
#define A3XX_TP_PERFCOUNTER1_SELECT 0xF05
#define A3XX_TP_PERFCOUNTER2_SELECT 0xF06
#define A3XX_TP_PERFCOUNTER3_SELECT 0xF07
#define A3XX_TP_PERFCOUNTER4_SELECT 0xF08
#define A3XX_TP_PERFCOUNTER5_SELECT 0xF09
#define A3XX_GRAS_CL_CLIP_CNTL 0x2040
#define A3XX_GRAS_CL_GB_CLIP_ADJ 0x2044
#define A3XX_GRAS_CL_VPORT_XOFFSET 0x2048
@ -232,12 +357,14 @@
#define A3XX_SP_VS_OUT_REG_7 0x22CE
#define A3XX_SP_VS_VPC_DST_REG_0 0x22D0
#define A3XX_SP_VS_OBJ_OFFSET_REG 0x22D4
#define A3XX_SP_VS_OBJ_START_REG 0x22D5
#define A3XX_SP_VS_PVT_MEM_ADDR_REG 0x22D7
#define A3XX_SP_VS_PVT_MEM_SIZE_REG 0x22D8
#define A3XX_SP_VS_LENGTH_REG 0x22DF
#define A3XX_SP_FS_CTRL_REG0 0x22E0
#define A3XX_SP_FS_CTRL_REG1 0x22E1
#define A3XX_SP_FS_OBJ_OFFSET_REG 0x22E2
#define A3XX_SP_FS_OBJ_START_REG 0x22E3
#define A3XX_SP_FS_PVT_MEM_ADDR_REG 0x22E5
#define A3XX_SP_FS_PVT_MEM_SIZE_REG 0x22E6
#define A3XX_SP_FS_FLAT_SHAD_MODE_REG_0 0x22E8
@ -269,10 +396,25 @@
#define A3XX_VBIF_OUT_AXI_AMEMTYPE_CONF0 0x3058
#define A3XX_VBIF_OUT_AXI_AOOO_EN 0x305E
#define A3XX_VBIF_OUT_AXI_AOOO 0x305F
#define A3XX_VBIF_PERF_CNT_EN 0x3070
#define A3XX_VBIF_PERF_CNT_CLR 0x3071
#define A3XX_VBIF_PERF_CNT_SEL 0x3072
#define A3XX_VBIF_PERF_CNT0_LO 0x3073
#define A3XX_VBIF_PERF_CNT0_HI 0x3074
#define A3XX_VBIF_PERF_CNT1_LO 0x3075
#define A3XX_VBIF_PERF_CNT1_HI 0x3076
#define A3XX_VBIF_PERF_PWR_CNT0_LO 0x3077
#define A3XX_VBIF_PERF_PWR_CNT0_HI 0x3078
#define A3XX_VBIF_PERF_PWR_CNT1_LO 0x3079
#define A3XX_VBIF_PERF_PWR_CNT1_HI 0x307a
#define A3XX_VBIF_PERF_PWR_CNT2_LO 0x307b
#define A3XX_VBIF_PERF_PWR_CNT2_HI 0x307c
/* Bit flags for RBBM_CTL */
#define RBBM_RBBM_CTL_RESET_PWR_CTR1 (1 << 1)
#define RBBM_RBBM_CTL_ENABLE_PWR_CTR1 (1 << 17)
#define RBBM_RBBM_CTL_RESET_PWR_CTR0 BIT(0)
#define RBBM_RBBM_CTL_RESET_PWR_CTR1 BIT(1)
#define RBBM_RBBM_CTL_ENABLE_PWR_CTR0 BIT(16)
#define RBBM_RBBM_CTL_ENABLE_PWR_CTR1 BIT(17)
/* Various flags used by the context switch code */
@ -537,7 +679,13 @@
#define RBBM_BLOCK_ID_MARB_3 0x2b
/* RBBM_CLOCK_CTL default value */
#define A3XX_RBBM_CLOCK_CTL_DEFAULT 0xBFFFFFFF
#define A305_RBBM_CLOCK_CTL_DEFAULT 0xAAAAAAAA
#define A320_RBBM_CLOCK_CTL_DEFAULT 0xBFFFFFFF
#define A330_RBBM_CLOCK_CTL_DEFAULT 0xAAAAAAAE
#define A330v2_RBBM_CLOCK_CTL_DEFAULT 0xAAAAAAAA
#define A330_RBBM_GPR0_CTL_DEFAULT 0x0AE2B8AE
#define A330v2_RBBM_GPR0_CTL_DEFAULT 0x0AA2A8AA
/* COUNTABLE FOR SP PERFCOUNTER */
#define SP_FS_FULL_ALU_INSTRUCTIONS 0x0E
@ -545,4 +693,20 @@
#define SP0_ICL1_MISSES 0x1A
#define SP_FS_CFLOW_INSTRUCTIONS 0x0C
/* VBIF PERFCOUNTER ENA/CLR values */
#define VBIF_PERF_CNT_0 BIT(0)
#define VBIF_PERF_CNT_1 BIT(1)
#define VBIF_PERF_PWR_CNT_0 BIT(2)
#define VBIF_PERF_PWR_CNT_1 BIT(3)
#define VBIF_PERF_PWR_CNT_2 BIT(4)
/* VBIF PERFCOUNTER SEL values */
#define VBIF_PERF_CNT_0_SEL 0
#define VBIF_PERF_CNT_0_SEL_MASK 0x7f
#define VBIF_PERF_CNT_1_SEL 8
#define VBIF_PERF_CNT_1_SEL_MASK 0x7f00
/* VBIF countables */
#define VBIF_DDR_TOTAL_CYCLES 110
#endif

File diff suppressed because it is too large

@ -1,4 +1,4 @@
/* Copyright (c) 2008-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2008-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -25,17 +25,20 @@
#define ADRENO_DEVICE(device) \
KGSL_CONTAINER_OF(device, struct adreno_device, dev)
#define ADRENO_CONTEXT(device) \
KGSL_CONTAINER_OF(device, struct adreno_context, base)
#define ADRENO_CHIPID_CORE(_id) (((_id) >> 24) & 0xFF)
#define ADRENO_CHIPID_MAJOR(_id) (((_id) >> 16) & 0xFF)
#define ADRENO_CHIPID_MINOR(_id) (((_id) >> 8) & 0xFF)
#define ADRENO_CHIPID_PATCH(_id) ((_id) & 0xFF)
/* Flags to control command packet settings */
#define KGSL_CMD_FLAGS_NONE 0x00000000
#define KGSL_CMD_FLAGS_PMODE 0x00000001
#define KGSL_CMD_FLAGS_INTERNAL_ISSUE 0x00000002
#define KGSL_CMD_FLAGS_GET_INT 0x00000004
#define KGSL_CMD_FLAGS_EOF 0x00000100
#define KGSL_CMD_FLAGS_NONE 0
#define KGSL_CMD_FLAGS_PMODE BIT(0)
#define KGSL_CMD_FLAGS_INTERNAL_ISSUE BIT(1)
#define KGSL_CMD_FLAGS_GET_INT BIT(2)
#define KGSL_CMD_FLAGS_WFI BIT(3)
/* Command identifiers */
#define KGSL_CONTEXT_TO_MEM_IDENTIFIER 0x2EADBEEF
@ -78,6 +81,47 @@ enum adreno_gpurev {
ADRENO_REV_A330 = 330,
};
/*
* Maximum size of the dispatcher ringbuffer - the actual inflight size will be
* smaller then this but this size will allow for a larger range of inflight
* sizes that can be chosen at runtime
*/
#define ADRENO_DISPATCH_CMDQUEUE_SIZE 128
/**
* struct adreno_dispatcher - container for the adreno GPU dispatcher
* @mutex: Mutex to protect the structure
* @state: Current state of the dispatcher (active or paused)
* @timer: Timer to monitor the progress of the command batches
* @inflight: Number of command batch operations pending in the ringbuffer
* @fault: True if a HW fault was detected
* @pending: Priority list of contexts waiting to submit command batches
* @plist_lock: Spin lock to protect the pending queue
* @cmdqueue: Queue of command batches currently flight
* @head: pointer to the head of of the cmdqueue. This is the oldest pending
* operation
* @tail: pointer to the tail of the cmdqueue. This is the most recently
* submitted operation
* @work: work_struct to put the dispatcher in a work queue
* @kobj: kobject for the dispatcher directory in the device sysfs node
*/
struct adreno_dispatcher {
struct mutex mutex;
unsigned int state;
struct timer_list timer;
struct timer_list fault_timer;
unsigned int inflight;
int fault;
struct plist_head pending;
spinlock_t plist_lock;
struct kgsl_cmdbatch *cmdqueue[ADRENO_DISPATCH_CMDQUEUE_SIZE];
unsigned int head;
unsigned int tail;
struct work_struct work;
struct kobject kobj;
};
struct adreno_gpudev;
struct adreno_device {
@ -105,7 +149,6 @@ struct adreno_device {
unsigned int ib_check_level;
unsigned int fast_hang_detect;
unsigned int ft_policy;
unsigned int ft_user_control;
unsigned int long_ib_detect;
unsigned int long_ib;
unsigned int long_ib_ts;
@ -113,6 +156,46 @@ struct adreno_device {
unsigned int gpulist_index;
struct ocmem_buf *ocmem_hdl;
unsigned int ocmem_base;
unsigned int gpu_cycles;
struct adreno_dispatcher dispatcher;
};
#define PERFCOUNTER_FLAG_NONE 0x0
#define PERFCOUNTER_FLAG_KERNEL 0x1
/* Structs to maintain the list of active performance counters */
/**
* struct adreno_perfcount_register: register state
* @countable: countable the register holds
* @refcount: number of users of the register
* @offset: register hardware offset
*/
struct adreno_perfcount_register {
unsigned int countable;
unsigned int refcount;
unsigned int offset;
unsigned int flags;
};
/**
* struct adreno_perfcount_group: registers for a hardware group
* @regs: available registers for this group
* @reg_count: total registers for this group
*/
struct adreno_perfcount_group {
struct adreno_perfcount_register *regs;
unsigned int reg_count;
};
/**
* adreno_perfcounts: all available perfcounter groups
* @groups: available groups for this device
* @group_count: total groups for this device
*/
struct adreno_perfcounters {
struct adreno_perfcount_group *groups;
unsigned int group_count;
};
struct adreno_gpudev {
@ -126,60 +209,50 @@ struct adreno_gpudev {
/* keeps track of when we need to execute the draw workaround code */
int ctx_switches_since_last_draw;
struct adreno_perfcounters *perfcounters;
/* GPU specific function hooks */
int (*ctxt_create)(struct adreno_device *, struct adreno_context *);
void (*ctxt_save)(struct adreno_device *, struct adreno_context *);
void (*ctxt_restore)(struct adreno_device *, struct adreno_context *);
void (*ctxt_draw_workaround)(struct adreno_device *,
int (*ctxt_save)(struct adreno_device *, struct adreno_context *);
int (*ctxt_restore)(struct adreno_device *, struct adreno_context *);
int (*ctxt_draw_workaround)(struct adreno_device *,
struct adreno_context *);
irqreturn_t (*irq_handler)(struct adreno_device *);
void (*irq_control)(struct adreno_device *, int);
unsigned int (*irq_pending)(struct adreno_device *);
void * (*snapshot)(struct adreno_device *, void *, int *, int);
void (*rb_init)(struct adreno_device *, struct adreno_ringbuffer *);
int (*rb_init)(struct adreno_device *, struct adreno_ringbuffer *);
void (*perfcounter_init)(struct adreno_device *);
void (*start)(struct adreno_device *);
unsigned int (*busy_cycles)(struct adreno_device *);
void (*perfcounter_enable)(struct adreno_device *, unsigned int group,
unsigned int counter, unsigned int countable);
uint64_t (*perfcounter_read)(struct adreno_device *adreno_dev,
unsigned int group, unsigned int counter,
unsigned int offset);
};
/*
* struct adreno_ft_data - Structure that contains all information to
* perform gpu fault tolerance
* @ib1 - IB1 that the GPU was executing when hang happened
* @context_id - Context which caused the hang
* @global_eop - eoptimestamp at time of hang
* @rb_buffer - Buffer that holds the commands from good contexts
* @rb_size - Number of valid dwords in rb_buffer
* @bad_rb_buffer - Buffer that holds commands from the hanging context
* bad_rb_size - Number of valid dwords in bad_rb_buffer
* @good_rb_buffer - Buffer that holds commands from good contexts
* good_rb_size - Number of valid dwords in good_rb_buffer
* @last_valid_ctx_id - The last context from which commands were placed in
* ringbuffer before the GPU hung
* @step - Current fault tolerance step being executed
* @err_code - Fault tolerance error code
* @fault - Indicates whether the hang was caused due to a pagefault
* @start_of_replay_cmds - Offset in ringbuffer from where commands can be
* replayed during fault tolerance
* @replay_for_snapshot - Offset in ringbuffer where IB's can be saved for
* replaying with snapshot
*/
struct adreno_ft_data {
unsigned int ib1;
unsigned int context_id;
unsigned int global_eop;
unsigned int *rb_buffer;
unsigned int rb_size;
unsigned int *bad_rb_buffer;
unsigned int bad_rb_size;
unsigned int *good_rb_buffer;
unsigned int good_rb_size;
unsigned int last_valid_ctx_id;
unsigned int status;
unsigned int ft_policy;
unsigned int err_code;
unsigned int start_of_replay_cmds;
unsigned int replay_for_snapshot;
};
#define FT_DETECT_REGS_COUNT 12
/* Fault Tolerance policy flags */
#define KGSL_FT_OFF BIT(0)
#define KGSL_FT_REPLAY BIT(1)
#define KGSL_FT_SKIPIB BIT(2)
#define KGSL_FT_SKIPFRAME BIT(3)
#define KGSL_FT_DISABLE BIT(4)
#define KGSL_FT_TEMP_DISABLE BIT(5)
#define KGSL_FT_DEFAULT_POLICY (KGSL_FT_REPLAY + KGSL_FT_SKIPIB)
/* This internal bit is used to skip the PM dump on replayed command batches */
#define KGSL_FT_SKIP_PMDUMP BIT(31)
/* Pagefault policy flags */
#define KGSL_FT_PAGEFAULT_INT_ENABLE BIT(0)
#define KGSL_FT_PAGEFAULT_GPUHALT_ENABLE BIT(1)
#define KGSL_FT_PAGEFAULT_LOG_ONE_PER_PAGE BIT(2)
#define KGSL_FT_PAGEFAULT_LOG_ONE_PER_INT BIT(3)
#define KGSL_FT_PAGEFAULT_DEFAULT_POLICY (KGSL_FT_PAGEFAULT_INT_ENABLE + \
KGSL_FT_PAGEFAULT_GPUHALT_ENABLE)
extern struct adreno_gpudev adreno_a2xx_gpudev;
extern struct adreno_gpudev adreno_a3xx_gpudev;
@ -203,7 +276,6 @@ extern const unsigned int a330_registers[];
extern const unsigned int a330_registers_count;
extern unsigned int ft_detect_regs[];
extern const unsigned int ft_detect_regs_count;
int adreno_idle(struct kgsl_device *device);
@ -213,6 +285,8 @@ void adreno_regwrite(struct kgsl_device *device, unsigned int offsetwords,
unsigned int value);
int adreno_dump(struct kgsl_device *device, int manual);
unsigned int adreno_a3xx_rbbm_clock_ctl_default(struct adreno_device
*adreno_dev);
struct kgsl_memdesc *adreno_find_region(struct kgsl_device *device,
unsigned int pt_base,
@ -228,13 +302,30 @@ struct kgsl_memdesc *adreno_find_ctxtmem(struct kgsl_device *device,
void *adreno_snapshot(struct kgsl_device *device, void *snapshot, int *remain,
int hang);
int adreno_dump_and_exec_ft(struct kgsl_device *device);
void adreno_dispatcher_start(struct adreno_device *adreno_dev);
int adreno_dispatcher_init(struct adreno_device *adreno_dev);
void adreno_dispatcher_close(struct adreno_device *adreno_dev);
int adreno_dispatcher_idle(struct adreno_device *adreno_dev,
unsigned int timeout);
void adreno_dispatcher_irq_fault(struct kgsl_device *device);
void adreno_dispatcher_stop(struct adreno_device *adreno_dev);
void adreno_dump_rb(struct kgsl_device *device, const void *buf,
size_t len, int start, int size);
int adreno_context_queue_cmd(struct adreno_device *adreno_dev,
struct adreno_context *drawctxt, struct kgsl_cmdbatch *cmdbatch,
uint32_t *timestamp);
unsigned int adreno_ft_detect(struct kgsl_device *device,
unsigned int *prev_reg_val);
void adreno_dispatcher_schedule(struct kgsl_device *device);
void adreno_dispatcher_pause(struct adreno_device *adreno_dev);
void adreno_dispatcher_queue_context(struct kgsl_device *device,
struct adreno_context *drawctxt);
int adreno_reset(struct kgsl_device *device);
int adreno_perfcounter_get(struct adreno_device *adreno_dev,
unsigned int groupid, unsigned int countable, unsigned int *offset,
unsigned int flags);
int adreno_perfcounter_put(struct adreno_device *adreno_dev,
unsigned int groupid, unsigned int countable);
static inline int adreno_is_a200(struct adreno_device *adreno_dev)
{
@ -297,23 +388,33 @@ static inline int adreno_is_a330(struct adreno_device *adreno_dev)
return (adreno_dev->gpurev == ADRENO_REV_A330);
}
static inline int adreno_is_a330v2(struct adreno_device *adreno_dev)
{
return ((adreno_dev->gpurev == ADRENO_REV_A330) &&
(ADRENO_CHIPID_PATCH(adreno_dev->chip_id) > 0));
}
static inline int adreno_rb_ctxtswitch(unsigned int *cmd)
{
return (cmd[0] == cp_nop_packet(1) &&
cmd[1] == KGSL_CONTEXT_TO_MEM_IDENTIFIER);
}
/**
* adreno_context_timestamp() - Return the last queued timestamp for the context
* @k_ctxt: Pointer to the KGSL context to query
* @rb: Pointer to the ringbuffer structure for the GPU
*
* Return the last queued context for the given context. This is used to verify
* that incoming requests are not using an invalid (unsubmitted) timestamp
*/
static inline int adreno_context_timestamp(struct kgsl_context *k_ctxt,
struct adreno_ringbuffer *rb)
{
struct adreno_context *a_ctxt = NULL;
if (k_ctxt)
a_ctxt = k_ctxt->devctxt;
if (a_ctxt && a_ctxt->flags & CTXT_FLAGS_PER_CONTEXT_TS)
if (k_ctxt) {
struct adreno_context *a_ctxt = ADRENO_CONTEXT(k_ctxt);
return a_ctxt->timestamp;
}
return rb->global_ts;
}

@ -1355,7 +1355,7 @@ static int a2xx_create_gmem_shadow(struct adreno_device *adreno_dev,
tmp_ctx.gmem_base = adreno_dev->gmem_base;
result = kgsl_allocate(&drawctxt->context_gmem_shadow.gmemshadow,
drawctxt->pagetable, drawctxt->context_gmem_shadow.size);
drawctxt->base.pagetable, drawctxt->context_gmem_shadow.size);
if (result)
return result;
@ -1365,7 +1365,7 @@ static int a2xx_create_gmem_shadow(struct adreno_device *adreno_dev,
/* blank out gmem shadow. */
kgsl_sharedmem_set(&drawctxt->context_gmem_shadow.gmemshadow, 0, 0,
drawctxt->context_gmem_shadow.size);
drawctxt->context_gmem_shadow.size);
/* build quad vertex buffer */
build_quad_vtxbuff(drawctxt, &drawctxt->context_gmem_shadow,
@ -1409,13 +1409,13 @@ static int a2xx_drawctxt_create(struct adreno_device *adreno_dev,
*/
ret = kgsl_allocate(&drawctxt->gpustate,
drawctxt->pagetable, _context_size(adreno_dev));
drawctxt->base.pagetable, _context_size(adreno_dev));
if (ret)
return ret;
kgsl_sharedmem_set(&drawctxt->gpustate, 0, 0,
_context_size(adreno_dev));
kgsl_sharedmem_set(&drawctxt->gpustate,
0, 0, _context_size(adreno_dev));
tmp_ctx.cmd = tmp_ctx.start
= (unsigned int *)((char *)drawctxt->gpustate.hostptr + CMD_OFFSET);
@ -1439,8 +1439,8 @@ static int a2xx_drawctxt_create(struct adreno_device *adreno_dev,
kgsl_cache_range_op(&drawctxt->gpustate,
KGSL_CACHE_OP_FLUSH);
kgsl_cffdump_syncmem(NULL, &drawctxt->gpustate,
drawctxt->gpustate.gpuaddr,
kgsl_cffdump_syncmem(NULL,
&drawctxt->gpustate, drawctxt->gpustate.gpuaddr,
drawctxt->gpustate.size, false);
done:
@ -1450,7 +1450,7 @@ done:
return ret;
}
static void a2xx_drawctxt_draw_workaround(struct adreno_device *adreno_dev,
static int a2xx_drawctxt_draw_workaround(struct adreno_device *adreno_dev,
struct adreno_context *context)
{
struct kgsl_device *device = &adreno_dev->dev;
@ -1467,7 +1467,7 @@ static void a2xx_drawctxt_draw_workaround(struct adreno_device *adreno_dev,
ADRENO_NUM_CTX_SWITCH_ALLOWED_BEFORE_DRAW)
adreno_dev->gpudev->ctx_switches_since_last_draw = 0;
else
return;
return 0;
/*
* Issue an empty draw call to avoid possible hangs due to
* repeated idles without intervening draw calls.
@ -1498,138 +1498,201 @@ static void a2xx_drawctxt_draw_workaround(struct adreno_device *adreno_dev,
| adreno_dev->pix_shader_start;
}
adreno_ringbuffer_issuecmds(device, context, KGSL_CMD_FLAGS_PMODE,
&cmd[0], cmds - cmd);
return adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE, &cmd[0], cmds - cmd);
}
static void a2xx_drawctxt_save(struct adreno_device *adreno_dev,
static int a2xx_drawctxt_save(struct adreno_device *adreno_dev,
struct adreno_context *context)
{
struct kgsl_device *device = &adreno_dev->dev;
int ret;
if (context == NULL || (context->flags & CTXT_FLAGS_BEING_DESTROYED))
return;
return 0;
if (context->flags & CTXT_FLAGS_GPU_HANG)
KGSL_CTXT_WARN(device,
"Current active context has caused gpu hang\n");
if (context->state == ADRENO_CONTEXT_STATE_INVALID)
return 0;
if (!(context->flags & CTXT_FLAGS_PREAMBLE)) {
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->reg_save[1],
context->reg_save[2] << 2, true);
/* save registers and constants. */
adreno_ringbuffer_issuecmds(device, context,
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->reg_save, 3);
if (ret)
return ret;
if (context->flags & CTXT_FLAGS_SHADER_SAVE) {
kgsl_cffdump_syncmem(NULL,
&context->gpustate,
context->shader_save[1],
context->shader_save[2] << 2, true);
/* save shader partitioning and instructions. */
adreno_ringbuffer_issuecmds(device, context,
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE,
context->shader_save, 3);
kgsl_cffdump_syncmem(NULL,
&context->gpustate,
context->shader_fixup[1],
context->shader_fixup[2] << 2, true);
/*
* fixup shader partitioning parameter for
* SET_SHADER_BASES.
*/
adreno_ringbuffer_issuecmds(device, context,
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->shader_fixup, 3);
if (ret)
return ret;
context->flags |= CTXT_FLAGS_SHADER_RESTORE;
}
}
if ((context->flags & CTXT_FLAGS_GMEM_SAVE) &&
(context->flags & CTXT_FLAGS_GMEM_SHADOW)) {
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->context_gmem_shadow.gmem_save[1],
context->context_gmem_shadow.gmem_save[2] << 2, true);
/* save gmem.
* (note: changes shader. shader must already be saved.)
*/
adreno_ringbuffer_issuecmds(device, context,
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE,
context->context_gmem_shadow.gmem_save, 3);
if (ret)
return ret;
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->chicken_restore[1],
context->chicken_restore[2] << 2, true);
/* Restore TP0_CHICKEN */
if (!(context->flags & CTXT_FLAGS_PREAMBLE)) {
adreno_ringbuffer_issuecmds(device, context,
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->chicken_restore, 3);
if (ret)
return ret;
}
adreno_dev->gpudev->ctx_switches_since_last_draw = 0;
context->flags |= CTXT_FLAGS_GMEM_RESTORE;
} else if (adreno_is_a2xx(adreno_dev))
a2xx_drawctxt_draw_workaround(adreno_dev, context);
return a2xx_drawctxt_draw_workaround(adreno_dev, context);
return 0;
}
static void a2xx_drawctxt_restore(struct adreno_device *adreno_dev,
static int a2xx_drawctxt_restore(struct adreno_device *adreno_dev,
struct adreno_context *context)
{
struct kgsl_device *device = &adreno_dev->dev;
unsigned int cmds[5];
int ret = 0;
if (context == NULL) {
/* No context - set the default apgetable and thats it */
/* No context - set the default pagetable and thats it */
unsigned int id;
/*
* If there isn't a current context, the kgsl_mmu_setstate
* will use the CPU path so we don't need to give
* it a valid context id.
*/
id = (adreno_dev->drawctxt_active != NULL)
? adreno_dev->drawctxt_active->base.id
: KGSL_CONTEXT_INVALID;
kgsl_mmu_setstate(&device->mmu, device->mmu.defaultpagetable,
adreno_dev->drawctxt_active->id);
return;
id);
return 0;
}
KGSL_CTXT_INFO(device, "context flags %08x\n", context->flags);
cmds[0] = cp_nop_packet(1);
cmds[1] = KGSL_CONTEXT_TO_MEM_IDENTIFIER;
cmds[2] = cp_type3_packet(CP_MEM_WRITE, 2);
cmds[3] = device->memstore.gpuaddr +
KGSL_MEMSTORE_OFFSET(KGSL_MEMSTORE_GLOBAL, current_context);
cmds[4] = context->base.id;
ret = adreno_ringbuffer_issuecmds(device, context, KGSL_CMD_FLAGS_NONE,
cmds, 5);
if (ret)
return ret;
#ifndef CONFIG_MSM_KGSL_CFF_DUMP_NO_CONTEXT_MEM_DUMP
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->gpustate.gpuaddr, LCC_SHADOW_SIZE +
REG_SHADOW_SIZE + CMD_BUFFER_SIZE + TEX_SHADOW_SIZE, false);
#endif
kgsl_mmu_setstate(&device->mmu, context->base.pagetable,
context->base.id);
/* restore gmem.
* (note: changes shader. shader must not already be restored.)
*/
if (context->flags & CTXT_FLAGS_GMEM_RESTORE) {
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->context_gmem_shadow.gmem_restore[1],
context->context_gmem_shadow.gmem_restore[2] << 2,
true);
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE,
context->context_gmem_shadow.gmem_restore, 3);
if (ret)
return ret;
if (!(context->flags & CTXT_FLAGS_PREAMBLE)) {
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->chicken_restore[1],
context->chicken_restore[2] << 2, true);
/* Restore TP0_CHICKEN */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->chicken_restore, 3);
if (ret)
return ret;
}
context->flags &= ~CTXT_FLAGS_GMEM_RESTORE;
}
if (!(context->flags & CTXT_FLAGS_PREAMBLE)) {
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->reg_restore[1],
context->reg_restore[2] << 2, true);
/* restore registers and constants. */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE, context->reg_restore, 3);
if (ret)
return ret;
/* restore shader instructions & partitioning. */
if (context->flags & CTXT_FLAGS_SHADER_RESTORE) {
kgsl_cffdump_syncmem(NULL, &context->gpustate,
context->shader_restore[1],
context->shader_restore[2] << 2, true);
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->shader_restore, 3);
if (ret)
return ret;
}
}
if (adreno_is_a20x(adreno_dev)) {
cmds[0] = cp_type3_packet(CP_SET_BIN_BASE_OFFSET, 1);
cmds[1] = context->bin_base_offset;
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE, cmds, 2);
}
return ret;
}
/*
@ -1696,13 +1759,14 @@ static void a2xx_cp_intrcallback(struct kgsl_device *device)
if (!status) {
if (master_status & MASTER_INT_SIGNAL__CP_INT_STAT) {
/*
* This indicates that we could not read CP_INT_STAT.
* As a precaution schedule the dispatcher to check
* things out. Since we did not ack any interrupts this
* interrupt will be generated again
*/
KGSL_DRV_WARN(device, "Unable to read CP_INT_STATUS\n");
adreno_dispatcher_schedule(device);
} else
KGSL_DRV_WARN(device, "Spurious interrput detected\n");
return;
@ -1727,9 +1791,8 @@ static void a2xx_cp_intrcallback(struct kgsl_device *device)
adreno_regwrite(device, REG_CP_INT_ACK, status);
if (status & (CP_INT_CNTL__IB1_INT_MASK | CP_INT_CNTL__RB_INT_MASK)) {
KGSL_CMD_WARN(rb->device, "ringbuffer ib1/rb interrupt\n");
adreno_dispatcher_schedule(device);
}
}
@ -1828,13 +1891,16 @@ static unsigned int a2xx_irq_pending(struct adreno_device *adreno_dev)
(mh & kgsl_mmu_get_int_mask())) ? 1 : 0;
}
static int a2xx_rb_init(struct adreno_device *adreno_dev,
struct adreno_ringbuffer *rb)
{
unsigned int *cmds, cmds_gpu;
/* ME_INIT */
cmds = adreno_ringbuffer_allocspace(rb, NULL, 19);
if (cmds == NULL)
return -ENOMEM;
cmds_gpu = rb->buffer_desc.gpuaddr + sizeof(uint)*(rb->wptr-19);
GSL_RB_WRITE(cmds, cmds_gpu, cp_type3_packet(CP_ME_INIT, 18));
@ -1887,6 +1953,8 @@ static void a2xx_rb_init(struct adreno_device *adreno_dev,
GSL_RB_WRITE(cmds, cmds_gpu, 0x00000000);
adreno_ringbuffer_submit(rb);
return 0;
}
static unsigned int a2xx_busy_cycles(struct adreno_device *adreno_dev)


@ -445,6 +445,21 @@ static void build_regconstantsave_cmds(struct adreno_device *adreno_dev,
tmp_ctx.cmd = cmd;
}
unsigned int adreno_a3xx_rbbm_clock_ctl_default(struct adreno_device
*adreno_dev)
{
if (adreno_is_a305(adreno_dev))
return A305_RBBM_CLOCK_CTL_DEFAULT;
else if (adreno_is_a320(adreno_dev))
return A320_RBBM_CLOCK_CTL_DEFAULT;
else if (adreno_is_a330v2(adreno_dev))
return A330v2_RBBM_CLOCK_CTL_DEFAULT;
else if (adreno_is_a330(adreno_dev))
return A330_RBBM_CLOCK_CTL_DEFAULT;
BUG_ON(1);
}
/* Copy GMEM contents to system memory shadow. */
static unsigned int *build_gmem2sys_cmds(struct adreno_device *adreno_dev,
struct adreno_context *drawctxt,
@ -454,7 +469,7 @@ static unsigned int *build_gmem2sys_cmds(struct adreno_device *adreno_dev,
unsigned int *start = cmds;
*cmds++ = cp_type0_packet(A3XX_RBBM_CLOCK_CTL, 1);
*cmds++ = adreno_a3xx_rbbm_clock_ctl_default(adreno_dev);
*cmds++ = cp_type3_packet(CP_SET_CONSTANT, 3);
*cmds++ = CP_REG(A3XX_RB_MODE_CONTROL);
@ -1250,7 +1265,7 @@ static unsigned int *build_sys2gmem_cmds(struct adreno_device *adreno_dev,
unsigned int *start = cmds;
*cmds++ = cp_type0_packet(A3XX_RBBM_CLOCK_CTL, 1);
*cmds++ = adreno_a3xx_rbbm_clock_ctl_default(adreno_dev);
*cmds++ = cp_type3_packet(CP_SET_CONSTANT, 5);
*cmds++ = CP_REG(A3XX_HLSQ_CONTROL_0_REG);
@ -2302,7 +2317,7 @@ static int a3xx_create_gmem_shadow(struct adreno_device *adreno_dev,
tmp_ctx.gmem_base = adreno_dev->gmem_base;
result = kgsl_allocate(&drawctxt->context_gmem_shadow.gmemshadow,
drawctxt->base.pagetable, drawctxt->context_gmem_shadow.size);
if (result)
return result;
@ -2336,7 +2351,7 @@ static int a3xx_drawctxt_create(struct adreno_device *adreno_dev,
*/
ret = kgsl_allocate(&drawctxt->gpustate,
drawctxt->base.pagetable, CONTEXT_SIZE);
if (ret)
return ret;
@ -2362,32 +2377,38 @@ done:
return ret;
}
static int a3xx_drawctxt_save(struct adreno_device *adreno_dev,
struct adreno_context *context)
{
struct kgsl_device *device = &adreno_dev->dev;
int ret;
if (context == NULL || (context->flags & CTXT_FLAGS_BEING_DESTROYED))
return 0;
if (context->flags & CTXT_FLAGS_GPU_HANG)
KGSL_CTXT_WARN(device,
"Current active context has caused gpu hang\n");
if (context->state == ADRENO_CONTEXT_STATE_INVALID)
return 0;
if (!(context->flags & CTXT_FLAGS_PREAMBLE)) {
/* Fixup self modifying IBs for save operations */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE, context->save_fixup, 3);
if (ret)
return ret;
/* save registers and constants. */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->regconstant_save, 3);
if (ret)
return ret;
if (context->flags & CTXT_FLAGS_SHADER_SAVE) {
/* Save shader instructions */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE, context->shader_save, 3);
if (ret)
return ret;
context->flags |= CTXT_FLAGS_SHADER_RESTORE;
}
@ -2400,38 +2421,60 @@ static void a3xx_drawctxt_save(struct adreno_device *adreno_dev,
* already be saved.)
*/
kgsl_cffdump_syncmem(context->base.device,
&context->gpustate,
context->context_gmem_shadow.gmem_save[1],
context->context_gmem_shadow.gmem_save[2] << 2, true);
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE,
context->context_gmem_shadow.
gmem_save, 3);
if (ret)
return ret;
context->flags |= CTXT_FLAGS_GMEM_RESTORE;
}
return 0;
}
static int a3xx_drawctxt_restore(struct adreno_device *adreno_dev,
struct adreno_context *context)
{
struct kgsl_device *device = &adreno_dev->dev;
unsigned int cmds[5];
int ret = 0;
if (context == NULL) {
/* No context - set the default pagetable and thats it */
unsigned int id;
/*
* If there isn't a current context, the kgsl_mmu_setstate
* will use the CPU path so we don't need to give
* it a valid context id.
*/
id = (adreno_dev->drawctxt_active != NULL)
? adreno_dev->drawctxt_active->base.id
: KGSL_CONTEXT_INVALID;
kgsl_mmu_setstate(&device->mmu, device->mmu.defaultpagetable,
id);
return 0;
}
KGSL_CTXT_INFO(device, "context flags %08x\n", context->flags);
cmds[0] = cp_nop_packet(1);
cmds[1] = KGSL_CONTEXT_TO_MEM_IDENTIFIER;
cmds[2] = cp_type3_packet(CP_MEM_WRITE, 2);
cmds[3] = device->memstore.gpuaddr +
KGSL_MEMSTORE_OFFSET(KGSL_MEMSTORE_GLOBAL, current_context);
cmds[4] = context->base.id;
ret = adreno_ringbuffer_issuecmds(device, context, KGSL_CMD_FLAGS_NONE,
cmds, 5);
if (ret)
return ret;
kgsl_mmu_setstate(&device->mmu, context->base.pagetable,
context->base.id);
/*
* Restore GMEM. (note: changes shader.
@ -2439,43 +2482,63 @@ static void a3xx_drawctxt_restore(struct adreno_device *adreno_dev,
*/
if (context->flags & CTXT_FLAGS_GMEM_RESTORE) {
kgsl_cffdump_syncmem(NULL,
&context->gpustate,
context->context_gmem_shadow.gmem_restore[1],
context->context_gmem_shadow.gmem_restore[2] << 2,
true);
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_PMODE,
context->context_gmem_shadow.
gmem_restore, 3);
if (ret)
return ret;
context->flags &= ~CTXT_FLAGS_GMEM_RESTORE;
}
if (!(context->flags & CTXT_FLAGS_PREAMBLE)) {
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE, context->reg_restore, 3);
if (ret)
return ret;
/* Fixup self modifying IBs for restore operations */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->restore_fixup, 3);
if (ret)
return ret;
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->constant_restore, 3);
if (ret)
return ret;
if (context->flags & CTXT_FLAGS_SHADER_RESTORE)
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->shader_restore, 3);
if (ret)
return ret;
/* Restore HLSQ_CONTROL_0 register */
ret = adreno_ringbuffer_issuecmds(device, context,
KGSL_CMD_FLAGS_NONE,
context->hlsqcontrol_restore, 3);
}
return ret;
}
static int a3xx_rb_init(struct adreno_device *adreno_dev,
struct adreno_ringbuffer *rb)
{
unsigned int *cmds, cmds_gpu;
cmds = adreno_ringbuffer_allocspace(rb, NULL, 18);
if (cmds == NULL)
return -ENOMEM;
cmds_gpu = rb->buffer_desc.gpuaddr + sizeof(uint) * (rb->wptr - 18);
GSL_RB_WRITE(cmds, cmds_gpu, cp_type3_packet(CP_ME_INIT, 17));
@ -2499,6 +2562,8 @@ static void a3xx_rb_init(struct adreno_device *adreno_dev,
GSL_RB_WRITE(cmds, cmds_gpu, 0x00000000);
adreno_ringbuffer_submit(rb);
return 0;
}
static void a3xx_err_callback(struct adreno_device *adreno_dev, int bit)
@ -2525,6 +2590,9 @@ static void a3xx_err_callback(struct adreno_device *adreno_dev, int bit)
/* Clear the error */
adreno_regwrite(device, A3XX_RBBM_AHB_CMD, (1 << 3));
/* Trigger a fault in the interrupt handler */
adreno_dispatcher_irq_fault(device);
return;
}
case A3XX_INT_RBBM_REG_TIMEOUT:
@ -2566,8 +2634,13 @@ static void a3xx_err_callback(struct adreno_device *adreno_dev, int bit)
case A3XX_INT_UCHE_OOB_ACCESS:
err = "UCHE: Out of bounds access";
break;
default:
return;
}
/* Trigger a fault in the dispatcher - this will effect a restart */
adreno_dispatcher_irq_fault(device);
KGSL_DRV_CRIT(device, "%s\n", err);
kgsl_pwrctrl_irq(device, KGSL_PWRFLAGS_OFF);
}
@ -2576,11 +2649,276 @@ static void a3xx_cp_callback(struct adreno_device *adreno_dev, int irq)
{
struct kgsl_device *device = &adreno_dev->dev;
/* Schedule the event queue */
queue_work(device->work_queue, &device->ts_expired_ws);
adreno_dispatcher_schedule(device);
}
/**
* struct a3xx_perfcounter_register - Define a performance counter register
* @load_bit: the bit to set in RBBM_LOAD_CMD0/RBBM_LOAD_CMD1 to force the RBBM
* to load the reset value into the appropriate counter
* @select: The dword offset of the register to write the selected
* countable into
*/
struct a3xx_perfcounter_register {
unsigned int load_bit;
unsigned int select;
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_cp[] = {
{ 0, A3XX_CP_PERFCOUNTER_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_rbbm[] = {
{ 1, A3XX_RBBM_PERFCOUNTER0_SELECT },
{ 2, A3XX_RBBM_PERFCOUNTER1_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_pc[] = {
{ 3, A3XX_PC_PERFCOUNTER0_SELECT },
{ 4, A3XX_PC_PERFCOUNTER1_SELECT },
{ 5, A3XX_PC_PERFCOUNTER2_SELECT },
{ 6, A3XX_PC_PERFCOUNTER3_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_vfd[] = {
{ 7, A3XX_VFD_PERFCOUNTER0_SELECT },
{ 8, A3XX_VFD_PERFCOUNTER1_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_hlsq[] = {
{ 9, A3XX_HLSQ_PERFCOUNTER0_SELECT },
{ 10, A3XX_HLSQ_PERFCOUNTER1_SELECT },
{ 11, A3XX_HLSQ_PERFCOUNTER2_SELECT },
{ 12, A3XX_HLSQ_PERFCOUNTER3_SELECT },
{ 13, A3XX_HLSQ_PERFCOUNTER4_SELECT },
{ 14, A3XX_HLSQ_PERFCOUNTER5_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_vpc[] = {
{ 15, A3XX_VPC_PERFCOUNTER0_SELECT },
{ 16, A3XX_VPC_PERFCOUNTER1_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_tse[] = {
{ 17, A3XX_GRAS_PERFCOUNTER0_SELECT },
{ 18, A3XX_GRAS_PERFCOUNTER1_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_ras[] = {
{ 19, A3XX_GRAS_PERFCOUNTER2_SELECT },
{ 20, A3XX_GRAS_PERFCOUNTER3_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_uche[] = {
{ 21, A3XX_UCHE_PERFCOUNTER0_SELECT },
{ 22, A3XX_UCHE_PERFCOUNTER1_SELECT },
{ 23, A3XX_UCHE_PERFCOUNTER2_SELECT },
{ 24, A3XX_UCHE_PERFCOUNTER3_SELECT },
{ 25, A3XX_UCHE_PERFCOUNTER4_SELECT },
{ 26, A3XX_UCHE_PERFCOUNTER5_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_tp[] = {
{ 27, A3XX_TP_PERFCOUNTER0_SELECT },
{ 28, A3XX_TP_PERFCOUNTER1_SELECT },
{ 29, A3XX_TP_PERFCOUNTER2_SELECT },
{ 30, A3XX_TP_PERFCOUNTER3_SELECT },
{ 31, A3XX_TP_PERFCOUNTER4_SELECT },
{ 32, A3XX_TP_PERFCOUNTER5_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_sp[] = {
{ 33, A3XX_SP_PERFCOUNTER0_SELECT },
{ 34, A3XX_SP_PERFCOUNTER1_SELECT },
{ 35, A3XX_SP_PERFCOUNTER2_SELECT },
{ 36, A3XX_SP_PERFCOUNTER3_SELECT },
{ 37, A3XX_SP_PERFCOUNTER4_SELECT },
{ 38, A3XX_SP_PERFCOUNTER5_SELECT },
{ 39, A3XX_SP_PERFCOUNTER6_SELECT },
{ 40, A3XX_SP_PERFCOUNTER7_SELECT },
};
static struct a3xx_perfcounter_register a3xx_perfcounter_reg_rb[] = {
{ 41, A3XX_RB_PERFCOUNTER0_SELECT },
{ 42, A3XX_RB_PERFCOUNTER1_SELECT },
};
#define REGCOUNTER_GROUP(_x) { (_x), ARRAY_SIZE((_x)) }
static struct {
struct a3xx_perfcounter_register *regs;
int count;
} a3xx_perfcounter_reglist[] = {
REGCOUNTER_GROUP(a3xx_perfcounter_reg_cp),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_rbbm),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_pc),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_vfd),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_hlsq),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_vpc),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_tse),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_ras),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_uche),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_tp),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_sp),
REGCOUNTER_GROUP(a3xx_perfcounter_reg_rb),
};
static void a3xx_perfcounter_enable_pwr(struct kgsl_device *device,
unsigned int countable)
{
unsigned int in, out;
adreno_regread(device, A3XX_RBBM_RBBM_CTL, &in);
if (countable == 0)
out = in | RBBM_RBBM_CTL_RESET_PWR_CTR0;
else
out = in | RBBM_RBBM_CTL_RESET_PWR_CTR1;
adreno_regwrite(device, A3XX_RBBM_RBBM_CTL, out);
if (countable == 0)
out = in | RBBM_RBBM_CTL_ENABLE_PWR_CTR0;
else
out = in | RBBM_RBBM_CTL_ENABLE_PWR_CTR1;
adreno_regwrite(device, A3XX_RBBM_RBBM_CTL, out);
return;
}
static void a3xx_perfcounter_enable_vbif(struct kgsl_device *device,
unsigned int counter,
unsigned int countable)
{
unsigned int in, out, bit, sel;
if (counter > 1 || countable > 0x7f)
return;
adreno_regread(device, A3XX_VBIF_PERF_CNT_EN, &in);
adreno_regread(device, A3XX_VBIF_PERF_CNT_SEL, &sel);
if (counter == 0) {
bit = VBIF_PERF_CNT_0;
sel = (sel & ~VBIF_PERF_CNT_0_SEL_MASK) | countable;
} else {
bit = VBIF_PERF_CNT_1;
sel = (sel & ~VBIF_PERF_CNT_1_SEL_MASK)
| (countable << VBIF_PERF_CNT_1_SEL);
}
out = in | bit;
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_SEL, sel);
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_CLR, bit);
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_CLR, 0);
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_EN, out);
}
static void a3xx_perfcounter_enable_vbif_pwr(struct kgsl_device *device,
unsigned int countable)
{
unsigned int in, out, bit;
adreno_regread(device, A3XX_VBIF_PERF_CNT_EN, &in);
if (countable == 0)
bit = VBIF_PERF_PWR_CNT_0;
else if (countable == 1)
bit = VBIF_PERF_PWR_CNT_1;
else
bit = VBIF_PERF_PWR_CNT_2;
out = in | bit;
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_CLR, bit);
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_CLR, 0);
adreno_regwrite(device, A3XX_VBIF_PERF_CNT_EN, out);
}
/*
* a3xx_perfcounter_enable - Configure a performance counter for a countable
* @adreno_dev - Adreno device to configure
* @group - Desired performance counter group
* @counter - Desired performance counter in the group
* @countable - Desired countable
*
* Physically set up a counter within a group with the desired countable
*/
static void a3xx_perfcounter_enable(struct adreno_device *adreno_dev,
unsigned int group, unsigned int counter, unsigned int countable)
{
struct kgsl_device *device = &adreno_dev->dev;
unsigned int val = 0;
struct a3xx_perfcounter_register *reg;
if (group >= ARRAY_SIZE(a3xx_perfcounter_reglist))
return;
if (counter >= a3xx_perfcounter_reglist[group].count)
return;
/* Special cases */
if (group == KGSL_PERFCOUNTER_GROUP_PWR)
return a3xx_perfcounter_enable_pwr(device, countable);
else if (group == KGSL_PERFCOUNTER_GROUP_VBIF)
return a3xx_perfcounter_enable_vbif(device, counter, countable);
else if (group == KGSL_PERFCOUNTER_GROUP_VBIF_PWR)
return a3xx_perfcounter_enable_vbif_pwr(device, countable);
reg = &(a3xx_perfcounter_reglist[group].regs[counter]);
/* Select the desired perfcounter */
adreno_regwrite(device, reg->select, countable);
if (reg->load_bit < 32) {
val = 1 << reg->load_bit;
adreno_regwrite(device, A3XX_RBBM_PERFCTR_LOAD_CMD0, val);
} else {
val = 1 << (reg->load_bit - 32);
adreno_regwrite(device, A3XX_RBBM_PERFCTR_LOAD_CMD1, val);
}
}
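/*
 * Illustrative sketch only (not part of this patch): selecting a countable
 * directly through the a3xx backend. In practice counters are reserved with
 * adreno_perfcounter_get() (see a3xx_perfcounter_init() below), which is
 * expected to call this hook for the counter it picks. The group and
 * countable names below are ones used elsewhere in this series; the call
 * itself is hypothetical.
 *
 *	a3xx_perfcounter_enable(adreno_dev, KGSL_PERFCOUNTER_GROUP_SP, 0,
 *			SP_ALU_ACTIVE_CYCLES);
 */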
static uint64_t a3xx_perfcounter_read(struct adreno_device *adreno_dev,
unsigned int group, unsigned int counter,
unsigned int offset)
{
struct kgsl_device *device = &adreno_dev->dev;
struct a3xx_perfcounter_register *reg = NULL;
unsigned int lo = 0, hi = 0;
unsigned int val;
if (group >= ARRAY_SIZE(a3xx_perfcounter_reglist))
return 0;
if (counter >= a3xx_perfcounter_reglist[group].count)
return 0;
reg = &(a3xx_perfcounter_reglist[group].regs[counter]);
/* Freeze the counter */
adreno_regread(device, A3XX_RBBM_PERFCTR_CTL, &val);
val &= ~reg->load_bit;
adreno_regwrite(device, A3XX_RBBM_PERFCTR_CTL, val);
/* Read the values */
adreno_regread(device, offset, &lo);
adreno_regread(device, offset + 1, &hi);
/* Re-Enable the counter */
val |= reg->load_bit;
adreno_regwrite(device, A3XX_RBBM_PERFCTR_CTL, val);
return (((uint64_t) hi) << 32) | lo;
}
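/*
 * Hypothetical usage sketch (not part of this patch): reading back the
 * first SP counter. 'offset' is the _LO register of the counter pair; the
 * function freezes the counter, reads LO and LO + 1, re-enables it and
 * returns the combined 64-bit value.
 *
 *	uint64_t cycles = a3xx_perfcounter_read(adreno_dev,
 *			KGSL_PERFCOUNTER_GROUP_SP, 0,
 *			A3XX_RBBM_PERFCTR_SP_0_LO);
 */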
#define A3XX_IRQ_CALLBACK(_c) { .func = _c }
@ -2684,26 +3022,22 @@ static unsigned int a3xx_irq_pending(struct adreno_device *adreno_dev)
static unsigned int a3xx_busy_cycles(struct adreno_device *adreno_dev)
{
struct kgsl_device *device = &adreno_dev->dev;
unsigned int val;
unsigned int ret = 0;
/* Read the value */
adreno_regread(device, A3XX_RBBM_PERFCTR_PWR_1_LO, &val);
/* Return 0 for the first read */
if (adreno_dev->gpu_cycles != 0) {
if (val < adreno_dev->gpu_cycles)
ret = (0xFFFFFFFF - adreno_dev->gpu_cycles) + val;
else
ret = val - adreno_dev->gpu_cycles;
}
adreno_dev->gpu_cycles = val;
return ret;
}
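/*
 * Minimal sketch (assumption: standalone helper, not part of this patch) of
 * the wraparound-safe delta that a3xx_busy_cycles() computes above. For
 * example, if the previous sample was 0xFFFFFF00 and the 32-bit counter has
 * wrapped to 0x00000040, the reported delta is
 * (0xFFFFFFFF - 0xFFFFFF00) + 0x40 = 0x13F.
 *
 *	static unsigned int busy_cycles_delta(unsigned int prev, unsigned int now)
 *	{
 *		if (prev == 0)
 *			return 0;		// first sample, no baseline yet
 *		if (now < prev)			// counter wrapped past zero
 *			return (0xFFFFFFFF - prev) + now;
 *		return now - prev;
 *	}
 */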
struct a3xx_vbif_data {
@ -2781,17 +3115,83 @@ static struct a3xx_vbif_data a330_vbif[] = {
{0, 0},
};
/*
* Most of the VBIF registers on 8974v2 have the correct values at power on, so
* we won't modify those if we don't need to
*/
static struct a3xx_vbif_data a330v2_vbif[] = {
/* Enable 1k sort */
{ A3XX_VBIF_ABIT_SORT, 0x0001003F },
{ A3XX_VBIF_ABIT_SORT_CONF, 0x000000A4 },
/* Enable WR-REQ */
{ A3XX_VBIF_GATE_OFF_WRREQ_EN, 0x00003F },
{ A3XX_VBIF_DDR_OUT_MAX_BURST, 0x0000303 },
/* Set up VBIF_ROUND_ROBIN_QOS_ARB */
{ A3XX_VBIF_ROUND_ROBIN_QOS_ARB, 0x0003 },
/* Disable VBIF clock gating. This is to enable AXI running
* higher frequency than GPU.
*/
{ A3XX_VBIF_CLKON, 1 },
{0, 0},
};
static struct {
int(*devfunc)(struct adreno_device *);
struct a3xx_vbif_data *vbif;
} a3xx_vbif_platforms[] = {
{ adreno_is_a305, a305_vbif },
{ adreno_is_a320, a320_vbif },
/* A330v2 needs to be ahead of A330 so the right device matches */
{ adreno_is_a330v2, a330v2_vbif },
{ adreno_is_a330, a330_vbif },
};
static void a3xx_perfcounter_init(struct adreno_device *adreno_dev)
{
/*
* Set SP to count SP_ALU_ACTIVE_CYCLES, it includes
* all ALU instruction execution regardless precision or shader ID.
* Set SP to count SP0_ICL1_MISSES, It counts
* USP L1 instruction miss request.
* Set SP to count SP_FS_FULL_ALU_INSTRUCTIONS, it
* counts USP flow control instruction execution.
* we will use this to augment our hang detection
*/
if (adreno_dev->fast_hang_detect) {
adreno_perfcounter_get(adreno_dev, KGSL_PERFCOUNTER_GROUP_SP,
SP_ALU_ACTIVE_CYCLES, &ft_detect_regs[6],
PERFCOUNTER_FLAG_KERNEL);
ft_detect_regs[7] = ft_detect_regs[6] + 1;
adreno_perfcounter_get(adreno_dev, KGSL_PERFCOUNTER_GROUP_SP,
SP0_ICL1_MISSES, &ft_detect_regs[8],
PERFCOUNTER_FLAG_KERNEL);
ft_detect_regs[9] = ft_detect_regs[8] + 1;
adreno_perfcounter_get(adreno_dev, KGSL_PERFCOUNTER_GROUP_SP,
SP_FS_CFLOW_INSTRUCTIONS, &ft_detect_regs[10],
PERFCOUNTER_FLAG_KERNEL);
ft_detect_regs[11] = ft_detect_regs[10] + 1;
}
adreno_perfcounter_get(adreno_dev, KGSL_PERFCOUNTER_GROUP_SP,
SP_FS_FULL_ALU_INSTRUCTIONS, NULL, PERFCOUNTER_FLAG_KERNEL);
/* Reserve and start countable 1 in the PWR perfcounter group */
adreno_perfcounter_get(adreno_dev, KGSL_PERFCOUNTER_GROUP_PWR, 1,
NULL, PERFCOUNTER_FLAG_KERNEL);
}
static void a3xx_start(struct adreno_device *adreno_dev)
{
struct kgsl_device *device = &adreno_dev->dev;
struct a3xx_vbif_data *vbif = NULL;
int i;
for (i = 0; i < ARRAY_SIZE(a3xx_vbif_platforms); i++) {
if (a3xx_vbif_platforms[i].devfunc(adreno_dev)) {
vbif = a3xx_vbif_platforms[i].vbif;
break;
}
}
BUG_ON(vbif == NULL);
@ -2829,7 +3229,14 @@ static void a3xx_start(struct adreno_device *adreno_dev)
/* Enable Clock gating */
adreno_regwrite(device, A3XX_RBBM_CLOCK_CTL,
adreno_a3xx_rbbm_clock_ctl_default(adreno_dev));
if (adreno_is_a330v2(adreno_dev))
adreno_regwrite(device, A3XX_RBBM_GPR0_CTL,
A330v2_RBBM_GPR0_CTL_DEFAULT);
else if (adreno_is_a330(adreno_dev))
adreno_regwrite(device, A3XX_RBBM_GPR0_CTL,
A330_RBBM_GPR0_CTL_DEFAULT);
/* Set the OCMEM base address for A330 */
if (adreno_is_a330(adreno_dev)) {
@ -2840,25 +3247,133 @@ static void a3xx_start(struct adreno_device *adreno_dev)
/* Turn on performance counters */
adreno_regwrite(device, A3XX_RBBM_PERFCTR_CTL, 0x01);
/* Turn on the GPU busy counter and let it run free */
adreno_dev->gpu_cycles = 0;
}
/*
* Define the available perfcounter groups - these get used by
* adreno_perfcounter_get and adreno_perfcounter_put
*/
static struct adreno_perfcount_register a3xx_perfcounters_cp[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_CP_0_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_rbbm[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_RBBM_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_RBBM_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_pc[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_PC_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_PC_1_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_PC_2_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_PC_3_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_vfd[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_VFD_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_VFD_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_hlsq[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_HLSQ_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_HLSQ_1_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_HLSQ_2_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_HLSQ_3_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_HLSQ_4_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_HLSQ_5_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_vpc[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_VPC_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_VPC_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_tse[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TSE_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TSE_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_ras[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_RAS_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_RAS_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_uche[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_UCHE_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_UCHE_1_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_UCHE_2_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_UCHE_3_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_UCHE_4_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_UCHE_5_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_tp[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TP_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TP_1_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TP_2_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TP_3_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TP_4_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_TP_5_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_sp[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_1_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_2_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_3_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_4_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_5_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_6_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_SP_7_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_rb[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_RB_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_RB_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_pwr[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_PWR_0_LO, 0 },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_RBBM_PERFCTR_PWR_1_LO, 0 },
};
static struct adreno_perfcount_register a3xx_perfcounters_vbif[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_VBIF_PERF_CNT0_LO },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_VBIF_PERF_CNT1_LO },
};
static struct adreno_perfcount_register a3xx_perfcounters_vbif_pwr[] = {
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_VBIF_PERF_PWR_CNT0_LO },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_VBIF_PERF_PWR_CNT1_LO },
{ KGSL_PERFCOUNTER_NOT_USED, 0, A3XX_VBIF_PERF_PWR_CNT2_LO },
};
static struct adreno_perfcount_group a3xx_perfcounter_groups[] = {
{ a3xx_perfcounters_cp, ARRAY_SIZE(a3xx_perfcounters_cp) },
{ a3xx_perfcounters_rbbm, ARRAY_SIZE(a3xx_perfcounters_rbbm) },
{ a3xx_perfcounters_pc, ARRAY_SIZE(a3xx_perfcounters_pc) },
{ a3xx_perfcounters_vfd, ARRAY_SIZE(a3xx_perfcounters_vfd) },
{ a3xx_perfcounters_hlsq, ARRAY_SIZE(a3xx_perfcounters_hlsq) },
{ a3xx_perfcounters_vpc, ARRAY_SIZE(a3xx_perfcounters_vpc) },
{ a3xx_perfcounters_tse, ARRAY_SIZE(a3xx_perfcounters_tse) },
{ a3xx_perfcounters_ras, ARRAY_SIZE(a3xx_perfcounters_ras) },
{ a3xx_perfcounters_uche, ARRAY_SIZE(a3xx_perfcounters_uche) },
{ a3xx_perfcounters_tp, ARRAY_SIZE(a3xx_perfcounters_tp) },
{ a3xx_perfcounters_sp, ARRAY_SIZE(a3xx_perfcounters_sp) },
{ a3xx_perfcounters_rb, ARRAY_SIZE(a3xx_perfcounters_rb) },
{ a3xx_perfcounters_pwr, ARRAY_SIZE(a3xx_perfcounters_pwr) },
{ a3xx_perfcounters_vbif, ARRAY_SIZE(a3xx_perfcounters_vbif) },
{ a3xx_perfcounters_vbif_pwr, ARRAY_SIZE(a3xx_perfcounters_vbif_pwr) },
};
static struct adreno_perfcounters a3xx_perfcounters = {
a3xx_perfcounter_groups,
ARRAY_SIZE(a3xx_perfcounter_groups),
};
/* Defined in adreno_a3xx_snapshot.c */
void *a3xx_snapshot(struct adreno_device *adreno_dev, void *snapshot,
int *remain, int hang);
@ -2867,16 +3382,20 @@ struct adreno_gpudev adreno_a3xx_gpudev = {
.reg_rbbm_status = A3XX_RBBM_STATUS,
.reg_cp_pfp_ucode_addr = A3XX_CP_PFP_UCODE_ADDR,
.reg_cp_pfp_ucode_data = A3XX_CP_PFP_UCODE_DATA,
.perfcounters = &a3xx_perfcounters,
.ctxt_create = a3xx_drawctxt_create,
.ctxt_save = a3xx_drawctxt_save,
.ctxt_restore = a3xx_drawctxt_restore,
.ctxt_draw_workaround = NULL,
.rb_init = a3xx_rb_init,
.perfcounter_init = a3xx_perfcounter_init,
.irq_control = a3xx_irq_control,
.irq_handler = a3xx_irq_handler,
.irq_pending = a3xx_irq_pending,
.busy_cycles = a3xx_busy_cycles,
.start = a3xx_start,
.snapshot = a3xx_snapshot,
.perfcounter_enable = a3xx_perfcounter_enable,
.perfcounter_read = a3xx_perfcounter_read,
};


@ -21,6 +21,22 @@
#define SHADER_MEMORY_SIZE 0x4000
/**
* _rbbm_debug_bus_read - Helper function to read data from the RBBM
* debug bus.
* @device - GPU device to read/write registers
* @block_id - Debug bus block to read from
* @index - Index in the debug bus block to read
* @ret - Value of the register read
*/
static void _rbbm_debug_bus_read(struct kgsl_device *device,
unsigned int block_id, unsigned int index, unsigned int *val)
{
unsigned int block = (block_id << 8) | 1 << 16;
adreno_regwrite(device, A3XX_RBBM_DEBUG_BUS_CTL, block | index);
adreno_regread(device, A3XX_RBBM_DEBUG_BUS_DATA_STATUS, val);
}
static int a3xx_snapshot_shader_memory(struct kgsl_device *device,
void *snapshot, int remain, void *priv)
{
@ -243,11 +259,8 @@ static int a3xx_snapshot_debugbus_block(struct kgsl_device *device,
header->id = id;
header->count = DEBUGFS_BLOCK_SIZE;
for (i = 0; i < DEBUGFS_BLOCK_SIZE; i++)
_rbbm_debug_bus_read(device, id, i, &data[i]);
return size;
}
@ -309,18 +322,58 @@ static void _snapshot_hlsq_regs(struct kgsl_snapshot_registers *regs,
struct kgsl_snapshot_registers_list *list,
struct adreno_device *adreno_dev)
{
struct kgsl_device *device = &adreno_dev->dev;
/*
* Trying to read HLSQ registers when the HLSQ block is busy
* will cause the device to hang. The RBBM_DEBUG_BUS has information
* that will tell us if the HLSQ block is busy or not. Read values
* from the debug bus to ensure the HLSQ block is not busy (this
* is hardware dependent). If the HLSQ block is busy do not
* dump the registers, otherwise dump the HLSQ registers.
*/
if (adreno_is_a330(adreno_dev)) {
/*
* stall_ctxt_full status bit: RBBM_BLOCK_ID_HLSQ index 49 [27]
*
* if (!stall_context_full)
* then dump HLSQ registers
*/
unsigned int stall_context_full = 0;
_rbbm_debug_bus_read(device, RBBM_BLOCK_ID_HLSQ, 49,
&stall_context_full);
stall_context_full &= 0x08000000;
if (stall_context_full)
return;
} else {
/*
* tpif status bits: RBBM_BLOCK_ID_HLSQ index 4 [4:0]
* spif status bits: RBBM_BLOCK_ID_HLSQ index 7 [5:0]
*
* if ((tpif == 0, 1, 28) && (spif == 0, 1, 10))
* then dump HLSQ registers
*/
unsigned int next_pif = 0;
/* check tpif */
_rbbm_debug_bus_read(device, RBBM_BLOCK_ID_HLSQ, 4, &next_pif);
next_pif &= 0x1f;
if (next_pif != 0 && next_pif != 1 && next_pif != 28)
return;
/* check spif */
_rbbm_debug_bus_read(device, RBBM_BLOCK_ID_HLSQ, 7, &next_pif);
next_pif &= 0x3f;
if (next_pif != 0 && next_pif != 1 && next_pif != 10)
return;
}
regs[list->count].regs = (unsigned int *) a3xx_hlsq_registers;
regs[list->count].count = a3xx_hlsq_registers_count;
list->count++;
}
static void _snapshot_a330_regs(struct kgsl_snapshot_registers *regs,
@ -414,7 +467,7 @@ void *a3xx_snapshot(struct adreno_device *adreno_dev, void *snapshot,
/* Enable Clock gating */
adreno_regwrite(device, A3XX_RBBM_CLOCK_CTL,
adreno_a3xx_rbbm_clock_ctl_default(adreno_dev));
return snapshot;
}


@ -43,6 +43,17 @@ static int kgsl_cff_dump_enable_get(void *data, u64 *val)
DEFINE_SIMPLE_ATTRIBUTE(kgsl_cff_dump_enable_fops, kgsl_cff_dump_enable_get,
kgsl_cff_dump_enable_set, "%llu\n");
static int _active_count_get(void *data, u64 *val)
{
struct kgsl_device *device = data;
unsigned int i = atomic_read(&device->active_cnt);
*val = (u64) i;
return 0;
}
DEFINE_SIMPLE_ATTRIBUTE(_active_count_fops, _active_count_get, NULL, "%llu\n");
typedef void (*reg_read_init_t)(struct kgsl_device *device);
typedef void (*reg_read_fill_t)(struct kgsl_device *device, int i,
unsigned int *vals, int linec);
@ -64,23 +75,18 @@ void adreno_debugfs_init(struct kgsl_device *device)
adreno_dev->fast_hang_detect = 1;
debugfs_create_u32("fast_hang_detect", 0644, device->d_debugfs,
&adreno_dev->fast_hang_detect);
/*
* FT policy can be set to any of the options below.
* KGSL_FT_OFF -> BIT(0) Set to turn off FT
* KGSL_FT_REPLAY -> BIT(1) Set to enable replay
* KGSL_FT_SKIPIB -> BIT(2) Set to skip IB
* KGSL_FT_SKIPFRAME -> BIT(3) Set to skip frame
* KGSL_FT_DISABLE -> BIT(4) Set to disable FT for faulting context
* by default set FT policy to KGSL_FT_DEFAULT_POLICY
*/
adreno_dev->ft_policy = KGSL_FT_DEFAULT_POLICY;
debugfs_create_u32("ft_policy", 0644, device->d_debugfs,
&adreno_dev->ft_policy);
/* By default enable long IB detection */
adreno_dev->long_ib_detect = 1;
debugfs_create_u32("long_ib_detect", 0644, device->d_debugfs,
@ -96,7 +102,10 @@ void adreno_debugfs_init(struct kgsl_device *device)
* KGSL_FT_PAGEFAULT_LOG_ONE_PER_INT -> BIT(3) Set to log only one
* pagefault per INT.
*/
adreno_dev->ft_pf_policy = KGSL_FT_PAGEFAULT_DEFAULT_POLICY;
debugfs_create_u32("ft_pagefault_policy", 0644, device->d_debugfs,
&adreno_dev->ft_pf_policy);
debugfs_create_file("active_cnt", 0644, device->d_debugfs, device,
&_active_count_fops);
}


@ -13,10 +13,12 @@
#include <linux/slab.h>
#include <linux/msm_kgsl.h>
#include <linux/sched.h>
#include "kgsl.h"
#include "kgsl_sharedmem.h"
#include "adreno.h"
#include "adreno_trace.h"
#define KGSL_INIT_REFTIMESTAMP 0x7FFFFFFF
@ -132,6 +134,245 @@ void build_quad_vtxbuff(struct adreno_context *drawctxt,
*incmd = cmd;
}
static void wait_callback(struct kgsl_device *device, void *priv, u32 id,
u32 timestamp, u32 type)
{
struct adreno_context *drawctxt = priv;
wake_up_interruptible_all(&drawctxt->waiting);
}
#define adreno_wait_event_interruptible_timeout(wq, condition, timeout, io) \
({ \
long __ret = timeout; \
if (io) \
__wait_io_event_interruptible_timeout(wq, condition, __ret); \
else \
__wait_event_interruptible_timeout(wq, condition, __ret); \
__ret; \
})
#define adreno_wait_event_interruptible(wq, condition, io) \
({ \
long __ret; \
if (io) \
__wait_io_event_interruptible(wq, condition, __ret); \
else \
__wait_event_interruptible(wq, condition, __ret); \
__ret; \
})
static int _check_context_timestamp(struct kgsl_device *device,
struct adreno_context *drawctxt, unsigned int timestamp)
{
int ret = 0;
/* Bail if the drawctxt has been invalidated or destroyed */
if (kgsl_context_detached(&drawctxt->base) ||
drawctxt->state != ADRENO_CONTEXT_STATE_ACTIVE)
return 1;
mutex_lock(&device->mutex);
ret = kgsl_check_timestamp(device, &drawctxt->base, timestamp);
mutex_unlock(&device->mutex);
return ret;
}
/**
* adreno_drawctxt_wait() - sleep until a timestamp expires
* @adreno_dev: pointer to the adreno_device struct
* @drawctxt: Pointer to the draw context to sleep for
* @timestamp: Timestamp to wait on
* @timeout: Number of jiffies to wait (0 for infinite)
*
* Register an event to wait for a timestamp on a context and sleep until it
* has passed. Returns < 0 on error, -ETIMEDOUT if the timeout expires or 0
* on success
*/
int adreno_drawctxt_wait(struct adreno_device *adreno_dev,
struct kgsl_context *context,
uint32_t timestamp, unsigned int timeout)
{
static unsigned int io_cnt;
struct kgsl_device *device = &adreno_dev->dev;
struct kgsl_pwrctrl *pwr = &device->pwrctrl;
struct adreno_context *drawctxt = ADRENO_CONTEXT(context);
int ret, io;
if (kgsl_context_detached(context))
return -EINVAL;
if (drawctxt->state == ADRENO_CONTEXT_STATE_INVALID)
return -EDEADLK;
/* Needs to hold the device mutex */
BUG_ON(!mutex_is_locked(&device->mutex));
trace_adreno_drawctxt_wait_start(context->id, timestamp);
ret = kgsl_add_event(device, context->id, timestamp,
wait_callback, drawctxt, NULL);
if (ret)
goto done;
/*
* For proper power accounting sometimes we need to call
* io_wait_interruptible_timeout and sometimes we need to call
* plain old wait_interruptible_timeout. We call the regular
* timeout N times out of 100, where N is a number specified by
* the current power level
*/
io_cnt = (io_cnt + 1) % 100;
io = (io_cnt < pwr->pwrlevels[pwr->active_pwrlevel].io_fraction)
? 0 : 1;
mutex_unlock(&device->mutex);
if (timeout) {
ret = (int) adreno_wait_event_interruptible_timeout(
drawctxt->waiting,
_check_context_timestamp(device, drawctxt, timestamp),
msecs_to_jiffies(timeout), io);
if (ret == 0)
ret = -ETIMEDOUT;
else if (ret > 0)
ret = 0;
} else {
ret = (int) adreno_wait_event_interruptible(drawctxt->waiting,
_check_context_timestamp(device, drawctxt, timestamp),
io);
}
mutex_lock(&device->mutex);
/* -EDEADLK if the context was invalidated while we were waiting */
if (drawctxt->state == ADRENO_CONTEXT_STATE_INVALID)
ret = -EDEADLK;
/* Return -EINVAL if the context was detached while we were waiting */
if (kgsl_context_detached(context))
ret = -EINVAL;
done:
trace_adreno_drawctxt_wait_done(context->id, timestamp, ret);
return ret;
}
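/*
 * Hypothetical caller sketch (not part of this patch): waiting for a
 * context timestamp to retire with a 2 second timeout. The function must
 * be entered with the device mutex held; it drops the mutex while sleeping
 * on drawctxt->waiting and re-acquires it before returning.
 *
 *	mutex_lock(&device->mutex);
 *	ret = adreno_drawctxt_wait(adreno_dev, context, timestamp, 2000);
 *	mutex_unlock(&device->mutex);
 *	if (ret == -ETIMEDOUT)
 *		;	// the timestamp did not retire within 2 seconds
 */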
static void global_wait_callback(struct kgsl_device *device, void *priv, u32 id,
u32 timestamp, u32 type)
{
struct adreno_context *drawctxt = priv;
wake_up_interruptible_all(&drawctxt->waiting);
kgsl_context_put(&drawctxt->base);
}
static int _check_global_timestamp(struct kgsl_device *device,
unsigned int timestamp)
{
int ret;
mutex_lock(&device->mutex);
ret = kgsl_check_timestamp(device, NULL, timestamp);
mutex_unlock(&device->mutex);
return ret;
}
int adreno_drawctxt_wait_global(struct adreno_device *adreno_dev,
struct kgsl_context *context,
uint32_t timestamp, unsigned int timeout)
{
struct kgsl_device *device = &adreno_dev->dev;
struct adreno_context *drawctxt = ADRENO_CONTEXT(context);
int ret;
/* Needs to hold the device mutex */
BUG_ON(!mutex_is_locked(&device->mutex));
_kgsl_context_get(context);
trace_adreno_drawctxt_wait_start(KGSL_MEMSTORE_GLOBAL, timestamp);
ret = kgsl_add_event(device, KGSL_MEMSTORE_GLOBAL, timestamp,
global_wait_callback, drawctxt, NULL);
if (ret) {
kgsl_context_put(context);
goto done;
}
mutex_unlock(&device->mutex);
if (timeout) {
ret = (int) wait_event_interruptible_timeout(drawctxt->waiting,
_check_global_timestamp(device, timestamp),
msecs_to_jiffies(timeout));
if (ret == 0)
ret = -ETIMEDOUT;
else if (ret > 0)
ret = 0;
} else {
ret = (int) wait_event_interruptible(drawctxt->waiting,
_check_global_timestamp(device, timestamp));
}
mutex_lock(&device->mutex);
if (ret)
kgsl_cancel_events_timestamp(device, NULL, timestamp);
done:
trace_adreno_drawctxt_wait_done(KGSL_MEMSTORE_GLOBAL, timestamp, ret);
return ret;
}
/**
* adreno_drawctxt_invalidate() - Invalidate an adreno draw context
* @device: Pointer to the KGSL device structure for the GPU
* @context: Pointer to the KGSL context structure
*
* Invalidate the context and remove all queued commands and cancel any pending
* waiters
*/
void adreno_drawctxt_invalidate(struct kgsl_device *device,
struct kgsl_context *context)
{
struct adreno_context *drawctxt = ADRENO_CONTEXT(context);
drawctxt->state = ADRENO_CONTEXT_STATE_INVALID;
/* Clear the pending queue */
mutex_lock(&drawctxt->mutex);
while (drawctxt->cmdqueue_head != drawctxt->cmdqueue_tail) {
struct kgsl_cmdbatch *cmdbatch =
drawctxt->cmdqueue[drawctxt->cmdqueue_head];
drawctxt->cmdqueue_head = (drawctxt->cmdqueue_head + 1) %
ADRENO_CONTEXT_CMDQUEUE_SIZE;
mutex_unlock(&drawctxt->mutex);
mutex_lock(&device->mutex);
kgsl_cancel_events_timestamp(device, context,
cmdbatch->timestamp);
mutex_unlock(&device->mutex);
kgsl_cmdbatch_destroy(cmdbatch);
mutex_lock(&drawctxt->mutex);
}
mutex_unlock(&drawctxt->mutex);
/* Give the bad news to everybody waiting around */
wake_up_interruptible_all(&drawctxt->waiting);
wake_up_interruptible_all(&drawctxt->wq);
}
/**
* adreno_drawctxt_create - create a new adreno draw context
* @device - KGSL device to create the context on
@ -142,48 +383,60 @@ void build_quad_vtxbuff(struct adreno_context *drawctxt,
* Create a new draw context for the 3D core. Return 0 on success,
* or error code on failure.
*/
struct kgsl_context *
adreno_drawctxt_create(struct kgsl_device_private *dev_priv,
uint32_t *flags)
{
struct adreno_context *drawctxt;
struct kgsl_device *device = dev_priv->device;
struct adreno_device *adreno_dev = ADRENO_DEVICE(device);
int ret;
drawctxt = kzalloc(sizeof(struct adreno_context), GFP_KERNEL);
if (drawctxt == NULL)
return ERR_PTR(-ENOMEM);
ret = kgsl_context_init(dev_priv, &drawctxt->base);
if (ret != 0) {
kfree(drawctxt);
return ERR_PTR(ret);
}
drawctxt->bin_base_offset = 0;
drawctxt->timestamp = 0;
*flags &= (KGSL_CONTEXT_PREAMBLE |
KGSL_CONTEXT_NO_GMEM_ALLOC |
KGSL_CONTEXT_PER_CONTEXT_TS |
KGSL_CONTEXT_USER_GENERATED_TS |
KGSL_CONTEXT_NO_FAULT_TOLERANCE |
KGSL_CONTEXT_TYPE_MASK);
/* Always enable per-context timestamps */
*flags |= KGSL_CONTEXT_PER_CONTEXT_TS;
drawctxt->flags |= CTXT_FLAGS_PER_CONTEXT_TS;
if (*flags & KGSL_CONTEXT_PREAMBLE)
drawctxt->flags |= CTXT_FLAGS_PREAMBLE;
if (*flags & KGSL_CONTEXT_NO_GMEM_ALLOC)
drawctxt->flags |= CTXT_FLAGS_NOGMEMALLOC;
if (*flags & KGSL_CONTEXT_USER_GENERATED_TS)
drawctxt->flags |= CTXT_FLAGS_USER_GENERATED_TS;
mutex_init(&drawctxt->mutex);
init_waitqueue_head(&drawctxt->wq);
init_waitqueue_head(&drawctxt->waiting);
/*
* Set up the plist node for the dispatcher. For now all contexts have
* the same priority, but later the priority will be set at create time
* by the user
*/
plist_node_init(&drawctxt->pending, ADRENO_CONTEXT_DEFAULT_PRIORITY);
if (*flags & KGSL_CONTEXT_NO_FAULT_TOLERANCE)
drawctxt->flags |= CTXT_FLAGS_NO_FAULT_TOLERANCE;
@ -196,43 +449,52 @@ int adreno_drawctxt_create(struct kgsl_device *device,
goto err;
kgsl_sharedmem_writel(&device->memstore,
KGSL_MEMSTORE_OFFSET(drawctxt->base.id, soptimestamp),
0);
kgsl_sharedmem_writel(&device->memstore,
KGSL_MEMSTORE_OFFSET(drawctxt->base.id, eoptimestamp),
0);
return &drawctxt->base;
err:
kgsl_context_put(&drawctxt->base);
return ERR_PTR(ret);
}
/**
* adreno_drawctxt_sched() - Schedule a previously blocked context
* @device: pointer to a KGSL device
* @drawctxt: drawctxt to reschedule
*
* This function is called by the core when it knows that a previously blocked
* context has been unblocked. The default adreno response is to reschedule the
* context on the dispatcher
*/
void adreno_drawctxt_sched(struct kgsl_device *device,
struct kgsl_context *context)
{
adreno_dispatcher_queue_context(device, ADRENO_CONTEXT(context));
}
/**
* adreno_drawctxt_detach(): detach a context from the GPU
* @context: Generic KGSL context container for the context
*
*/
int adreno_drawctxt_detach(struct kgsl_context *context)
{
struct kgsl_device *device;
struct adreno_device *adreno_dev;
struct adreno_context *drawctxt;
int ret;
if (context == NULL)
return 0;
device = context->device;
adreno_dev = ADRENO_DEVICE(device);
drawctxt = ADRENO_CONTEXT(context);
/* deactivate context */
if (adreno_dev->drawctxt_active == drawctxt) {
/* no need to save GMEM or shader, the context is
@ -248,18 +510,48 @@ void adreno_drawctxt_destroy(struct kgsl_device *device,
adreno_drawctxt_switch(adreno_dev, NULL, 0);
}
mutex_lock(&drawctxt->mutex);
while (drawctxt->cmdqueue_head != drawctxt->cmdqueue_tail) {
struct kgsl_cmdbatch *cmdbatch =
drawctxt->cmdqueue[drawctxt->cmdqueue_head];
drawctxt->cmdqueue_head = (drawctxt->cmdqueue_head + 1) %
ADRENO_CONTEXT_CMDQUEUE_SIZE;
mutex_unlock(&drawctxt->mutex);
/*
* Don't hold the drawctxt mutex while the cmdbatch is being
* destroyed because the cmdbatch destroy takes the device
* mutex and the world falls in on itself
*/
kgsl_cmdbatch_destroy(cmdbatch);
mutex_lock(&drawctxt->mutex);
}
mutex_unlock(&drawctxt->mutex);
/* Wait for the last global timestamp to pass before continuing */
ret = adreno_drawctxt_wait_global(adreno_dev, context,
drawctxt->internal_timestamp, 10 * 1000);
kgsl_sharedmem_free(&drawctxt->gpustate);
kgsl_sharedmem_free(&drawctxt->context_gmem_shadow.gmemshadow);
return ret;
}
void adreno_drawctxt_destroy(struct kgsl_context *context)
{
struct adreno_context *drawctxt;
if (context == NULL)
return;
drawctxt = ADRENO_CONTEXT(context);
kfree(drawctxt);
context->devctxt = NULL;
}
/**
@ -275,10 +567,12 @@ void adreno_drawctxt_set_bin_base_offset(struct kgsl_device *device,
struct kgsl_context *context,
unsigned int offset)
{
struct adreno_context *drawctxt;
if (context == NULL)
return;
drawctxt = ADRENO_CONTEXT(context);
drawctxt->bin_base_offset = offset;
}
/**
@ -290,11 +584,12 @@ void adreno_drawctxt_set_bin_base_offset(struct kgsl_device *device,
* Switch the current draw context
*/
int adreno_drawctxt_switch(struct adreno_device *adreno_dev,
struct adreno_context *drawctxt,
unsigned int flags)
{
struct kgsl_device *device = &adreno_dev->dev;
int ret = 0;
if (drawctxt) {
if (flags & KGSL_CONTEXT_SAVE_GMEM)
@ -310,18 +605,44 @@ void adreno_drawctxt_switch(struct adreno_device *adreno_dev,
if (adreno_dev->drawctxt_active == drawctxt) {
if (adreno_dev->gpudev->ctxt_draw_workaround &&
adreno_is_a225(adreno_dev))
ret = adreno_dev->gpudev->ctxt_draw_workaround(
adreno_dev, drawctxt);
return ret;
}
KGSL_CTXT_INFO(device, "from %p to %p flags %d\n",
adreno_dev->drawctxt_active, drawctxt, flags);
KGSL_CTXT_INFO(device, "from %d to %d flags %d\n",
adreno_dev->drawctxt_active ?
adreno_dev->drawctxt_active->base.id : 0,
drawctxt ? drawctxt->base.id : 0, flags);
/* Save the old context */
ret = adreno_dev->gpudev->ctxt_save(adreno_dev,
adreno_dev->drawctxt_active);
if (ret) {
KGSL_DRV_ERR(device,
"Error in GPU context %d save: %d\n",
adreno_dev->drawctxt_active->base.id, ret);
return ret;
}
/* Put the old instance of the active drawctxt */
if (adreno_dev->drawctxt_active)
kgsl_context_put(&adreno_dev->drawctxt_active->base);
/* Get a refcount to the new instance */
if (drawctxt)
_kgsl_context_get(&drawctxt->base);
/* Set the new context */
ret = adreno_dev->gpudev->ctxt_restore(adreno_dev, drawctxt);
if (ret) {
KGSL_DRV_ERR(device,
"Error in GPU context %d restore: %d\n",
drawctxt->base.id, ret);
return ret;
}
adreno_dev->drawctxt_active = drawctxt;
return 0;
}


@ -13,8 +13,6 @@
#ifndef __ADRENO_DRAWCTXT_H
#define __ADRENO_DRAWCTXT_H
#include "adreno_pm4types.h"
#include "a2xx_reg.h"
@ -56,6 +54,8 @@
#define CTXT_FLAGS_SKIP_EOF BIT(15)
/* Context no fault tolerance */
#define CTXT_FLAGS_NO_FAULT_TOLERANCE BIT(16)
/* Force the preamble for the next submission */
#define CTXT_FLAGS_FORCE_PREAMBLE BIT(17)
/* Symbolic table for the adreno draw context type */
#define ADRENO_DRAWCTXT_TYPES \
@ -65,6 +65,13 @@
{ KGSL_CONTEXT_TYPE_C2D, "C2D" }, \
{ KGSL_CONTEXT_TYPE_RS, "RS" }
#define ADRENO_CONTEXT_CMDQUEUE_SIZE 128
#define ADRENO_CONTEXT_DEFAULT_PRIORITY 1
#define ADRENO_CONTEXT_STATE_ACTIVE 0
#define ADRENO_CONTEXT_STATE_INVALID 1
struct kgsl_device;
struct adreno_device;
struct kgsl_device_private;
@ -95,21 +102,58 @@ struct gmem_shadow_t {
struct kgsl_memdesc quad_vertices_restore;
};
/**
* struct adreno_context - Adreno GPU draw context
* @id: Unique integer ID of the context
* @timestamp: Last issued context-specific timestamp
* @internal_timestamp: Global timestamp of the last issued command
* @state: Current state of the context
* @flags: Bitfield controlling behavior of the context
* @type: Context type (GL, CL, RS)
* @mutex: Mutex to protect the cmdqueue
* @pagetable: Pointer to the GPU pagetable for the context
* @gpustate: Pointer to the GPU scratch memory for context save/restore
* @reg_restore: Command buffer for restoring context registers
* @shader_save: Command buffer for saving shaders
* @shader_restore: Command buffer to restore shaders
* @context_gmem_shadow: GMEM shadow structure for save/restore
* @reg_save: A2XX command buffer to save context registers
* @shader_fixup: A2XX command buffer to "fix" shaders on restore
* @chicken_restore: A2XX command buffer to "fix" register restore
* @bin_base_offset: Saved value of the A2XX BIN_BASE_OFFSET register
* @regconstant_save: A3XX command buffer to save some registers
* @constant_retore: A3XX command buffer to restore some registers
* @hslqcontrol_restore: A3XX command buffer to restore HSLSQ registers
* @save_fixup: A3XX command buffer to "fix" register save
* @restore_fixup: A3XX cmmand buffer to restore register save fixes
* @shader_load_commands: A3XX GPU memory descriptor for shader load IB
* @shader_save_commands: A3XX GPU memory descriptor for shader save IB
* @constantr_save_commands: A3XX GPU memory descriptor for constant save IB
* @constant_load_commands: A3XX GPU memory descriptor for constant load IB
* @cond_execs: A3XX GPU memory descriptor for conditional exec IB
* @hlsq_restore_commands: A3XX GPU memory descriptor for HLSQ restore IB
* @cmdqueue: Queue of command batches waiting to be dispatched for this context
* @cmdqueue_head: Head of the cmdqueue queue
* @cmdqueue_tail: Tail of the cmdqueue queue
* @pending: Priority list node for the dispatcher list of pending contexts
* @wq: Workqueue structure for contexts to sleep pending room in the queue
* @waiting: Workqueue structure for contexts waiting for a timestamp or event
* @queued: Number of commands queued in the cmdqueue
*/
struct adreno_context {
struct kgsl_context base;
unsigned int ib_gpu_time_used;
unsigned int timestamp;
unsigned int internal_timestamp;
int state;
uint32_t flags;
unsigned int type;
struct mutex mutex;
struct kgsl_memdesc gpustate;
unsigned int reg_restore[3];
unsigned int shader_save[3];
unsigned int shader_restore[3];
/* Information of the GMEM shadow that is created in context create */
struct gmem_shadow_t context_gmem_shadow;
/* A2XX specific items */
@ -130,23 +174,44 @@ struct adreno_context {
struct kgsl_memdesc constant_load_commands[3];
struct kgsl_memdesc cond_execs[4];
struct kgsl_memdesc hlsqcontrol_restore_commands[1];
/* Dispatcher */
struct kgsl_cmdbatch *cmdqueue[ADRENO_CONTEXT_CMDQUEUE_SIZE];
int cmdqueue_head;
int cmdqueue_tail;
struct plist_node pending;
wait_queue_head_t wq;
wait_queue_head_t waiting;
int queued;
};
struct kgsl_context *adreno_drawctxt_create(struct kgsl_device_private *,
uint32_t *flags);
int adreno_drawctxt_detach(struct kgsl_context *context);
void adreno_drawctxt_switch(struct adreno_device *adreno_dev,
void adreno_drawctxt_destroy(struct kgsl_context *context);
void adreno_drawctxt_sched(struct kgsl_device *device,
struct kgsl_context *context);
int adreno_drawctxt_switch(struct adreno_device *adreno_dev,
struct adreno_context *drawctxt,
unsigned int flags);
void adreno_drawctxt_set_bin_base_offset(struct kgsl_device *device,
struct kgsl_context *context,
unsigned int offset);
int adreno_drawctxt_wait(struct adreno_device *adreno_dev,
struct kgsl_context *context,
uint32_t timestamp, unsigned int timeout);
void adreno_drawctxt_invalidate(struct kgsl_device *device,
struct kgsl_context *context);
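adreno_drawctxt_wait() is the per-context blocking wait that backs the server-side waits added by this series. A minimal hypothetical caller is sketched below; the timeout units (assumed to be milliseconds) and the chosen value are assumptions, not something this header guarantees.

static int example_wait_for_ts(struct adreno_device *adreno_dev,
		struct kgsl_context *context, unsigned int timestamp)
{
	/* Block until 'timestamp' retires on 'context', or give up after ~2s */
	return adreno_drawctxt_wait(adreno_dev, context, timestamp, 2000);
}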
/* GPU context switch helper functions */
void build_quad_vtxbuff(struct adreno_context *drawctxt,


@ -1,4 +1,4 @@
/* Copyright (c) 2010-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2010-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -21,6 +21,7 @@
#include "adreno_ringbuffer.h"
#include "kgsl_cffdump.h"
#include "kgsl_pwrctrl.h"
#include "adreno_trace.h"
#include "a2xx_reg.h"
#include "a3xx_reg.h"
@ -724,6 +725,9 @@ int adreno_dump(struct kgsl_device *device, int manual)
kgsl_regread(device, REG_CP_IB2_BASE, &cp_ib2_base);
kgsl_regread(device, REG_CP_IB2_BUFSZ, &cp_ib2_bufsz);
trace_adreno_gpu_fault(rbbm_status, cp_rb_rptr, cp_rb_wptr,
cp_ib1_base, cp_ib1_bufsz, cp_ib2_base, cp_ib2_bufsz);
/* If postmortem dump is not enabled, dump minimal set and return */
if (!device->pm_dump_enable) {
@ -903,5 +907,9 @@ int adreno_dump(struct kgsl_device *device, int manual)
error_vfree:
vfree(rb_copy);
end:
/* Restart the dispatcher after a manually triggered dump */
if (manual)
adreno_dispatcher_start(adreno_dev);
return result;
}


@ -18,7 +18,6 @@
#include "kgsl.h"
#include "kgsl_sharedmem.h"
#include "kgsl_cffdump.h"
#include "kgsl_trace.h"
#include "adreno.h"
#include "adreno_pm4types.h"
@ -65,9 +64,6 @@ adreno_ringbuffer_waitspace(struct adreno_ringbuffer *rb,
unsigned long wait_time;
unsigned long wait_timeout = msecs_to_jiffies(ADRENO_IDLE_TIMEOUT);
unsigned long wait_time_part;
unsigned int prev_reg_val[ft_detect_regs_count];
memset(prev_reg_val, 0, sizeof(prev_reg_val));
/* if wptr ahead, fill the remaining with NOPs */
if (wptr_ahead) {
@ -105,43 +101,13 @@ adreno_ringbuffer_waitspace(struct adreno_ringbuffer *rb,
if (freecmds == 0 || freecmds > numcmds)
break;
/* Dont wait for timeout, detect hang faster.
*/
if (time_after(jiffies, wait_time_part)) {
wait_time_part = jiffies +
msecs_to_jiffies(KGSL_TIMEOUT_PART);
if ((adreno_ft_detect(rb->device,
prev_reg_val))){
KGSL_DRV_ERR(rb->device,
"Hang detected while waiting for freespace in"
"ringbuffer rptr: 0x%x, wptr: 0x%x\n",
rb->rptr, rb->wptr);
goto err;
}
}
if (time_after(jiffies, wait_time)) {
KGSL_DRV_ERR(rb->device,
"Timed out while waiting for freespace in ringbuffer "
"rptr: 0x%x, wptr: 0x%x\n", rb->rptr, rb->wptr);
goto err;
return -ETIMEDOUT;
}
continue;
err:
if (!adreno_dump_and_exec_ft(rb->device)) {
if (context && context->flags & CTXT_FLAGS_GPU_HANG) {
KGSL_CTXT_WARN(rb->device,
"Context %p caused a gpu hang. Will not accept commands for context %d\n",
context, context->id);
return -EDEADLK;
}
wait_time = jiffies + wait_timeout;
} else {
/* GPU is hung and fault tolerance failed */
BUG();
}
}
return 0;
}
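With the hang detection removed, the waitspace loop above reduces to a plain jiffies deadline that fails with -ETIMEDOUT. A generalized, hypothetical form of that pattern:

/* Poll a condition until it becomes true or a millisecond deadline expires */
static int example_wait_with_deadline(bool (*cond)(void *data), void *data,
		unsigned int timeout_ms)
{
	unsigned long wait_time = jiffies + msecs_to_jiffies(timeout_ms);

	while (!cond(data)) {
		if (time_after(jiffies, wait_time))
			return -ETIMEDOUT;
		cpu_relax();
	}

	return 0;
}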
@ -179,7 +145,8 @@ unsigned int *adreno_ringbuffer_allocspace(struct adreno_ringbuffer *rb,
if (!ret) {
ptr = (unsigned int *)rb->buffer_desc.hostptr + rb->wptr;
rb->wptr += numcmds;
}
} else
ptr = ERR_PTR(ret);
return ptr;
}
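adreno_ringbuffer_allocspace() now reports failure with the kernel's ERR_PTR() convention from <linux/err.h>; callers decode the result with IS_ERR()/PTR_ERR(), as the addcmds hunk below does. A standalone illustration of the idiom:

#include <linux/err.h>
#include <linux/printk.h>

static void example_err_ptr_usage(void)
{
	/* Encode an errno value inside a pointer, then decode it again */
	void *p = ERR_PTR(-ENOSPC);

	if (IS_ERR(p))
		pr_err("allocation failed: %ld\n", PTR_ERR(p));
}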
@ -320,10 +287,9 @@ int adreno_ringbuffer_load_pfp_ucode(struct kgsl_device *device)
return 0;
}
int adreno_ringbuffer_start(struct adreno_ringbuffer *rb, unsigned int init_ram)
int adreno_ringbuffer_start(struct adreno_ringbuffer *rb)
{
int status;
/*cp_rb_cntl_u cp_rb_cntl; */
union reg_cp_rb_cntl cp_rb_cntl;
unsigned int rb_cntl;
struct kgsl_device *device = rb->device;
@ -332,9 +298,6 @@ int adreno_ringbuffer_start(struct adreno_ringbuffer *rb, unsigned int init_ram)
if (rb->flags & KGSL_FLAGS_STARTED)
return 0;
if (init_ram)
rb->global_ts = 0;
kgsl_sharedmem_set(&rb->memptrs_desc, 0, 0,
sizeof(struct kgsl_rbmemptrs));
@ -444,7 +407,9 @@ int adreno_ringbuffer_start(struct adreno_ringbuffer *rb, unsigned int init_ram)
adreno_regwrite(device, REG_CP_ME_CNTL, 0);
/* ME init is GPU specific, so jump into the sub-function */
adreno_dev->gpudev->rb_init(adreno_dev, rb);
status = adreno_dev->gpudev->rb_init(adreno_dev, rb);
if (status)
return status;
/* idle device to validate ME INIT */
status = adreno_idle(device);
@ -482,6 +447,7 @@ int adreno_ringbuffer_init(struct kgsl_device *device)
*/
rb->sizedwords = KGSL_RB_SIZE >> 2;
rb->buffer_desc.flags = KGSL_MEMFLAGS_GPUREADONLY;
/* allocate memory for ringbuffer */
status = kgsl_allocate_contiguous(&rb->buffer_desc,
(rb->sizedwords << 2));
@ -505,6 +471,8 @@ int adreno_ringbuffer_init(struct kgsl_device *device)
/* overlay structure on memptrs memory */
rb->memptrs = (struct kgsl_rbmemptrs *) rb->memptrs_desc.hostptr;
rb->global_ts = 0;
return 0;
}
@ -526,9 +494,9 @@ void adreno_ringbuffer_close(struct adreno_ringbuffer *rb)
static int
adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
struct adreno_context *context,
struct adreno_context *drawctxt,
unsigned int flags, unsigned int *cmds,
int sizedwords)
int sizedwords, uint32_t timestamp)
{
struct adreno_device *adreno_dev = ADRENO_DEVICE(rb->device);
unsigned int *ringcmds;
@ -537,19 +505,20 @@ adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
unsigned int rcmd_gpu;
unsigned int context_id;
unsigned int gpuaddr = rb->device->memstore.gpuaddr;
unsigned int timestamp;
/*
* if the context was not created with per context timestamp
* support, we must use the global timestamp since issueibcmds
* will be returning that one, or if an internal issue then
* use global timestamp.
*/
if ((context && (context->flags & CTXT_FLAGS_PER_CONTEXT_TS)) &&
!(flags & KGSL_CMD_FLAGS_INTERNAL_ISSUE))
context_id = context->id;
else
/* The global timestamp always needs to be incremented */
rb->global_ts++;
/* If this is a internal IB, use the global timestamp for it */
if (!drawctxt || (flags & KGSL_CMD_FLAGS_INTERNAL_ISSUE)) {
timestamp = rb->global_ts;
context_id = KGSL_MEMSTORE_GLOBAL;
} else {
context_id = drawctxt->base.id;
}
if (drawctxt)
drawctxt->internal_timestamp = rb->global_ts;
/* reserve space to temporarily turn off protected mode
* error checking if needed
@ -560,13 +529,8 @@ adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
/* internal ib command identifier for the ringbuffer */
total_sizedwords += (flags & KGSL_CMD_FLAGS_INTERNAL_ISSUE) ? 2 : 0;
/* Add CP_COND_EXEC commands to generate CP_INTERRUPT */
total_sizedwords += context ? 13 : 0;
if ((context) && (context->flags & CTXT_FLAGS_PER_CONTEXT_TS) &&
(flags & (KGSL_CMD_FLAGS_INTERNAL_ISSUE |
KGSL_CMD_FLAGS_GET_INT)))
total_sizedwords += 2;
/* Add two dwords for the CP_INTERRUPT */
total_sizedwords += drawctxt ? 2 : 0;
if (adreno_is_a3xx(adreno_dev))
total_sizedwords += 7;
@ -574,16 +538,25 @@ adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
if (adreno_is_a2xx(adreno_dev))
total_sizedwords += 2; /* CP_WAIT_FOR_IDLE */
total_sizedwords += 2; /* scratchpad ts for recovery */
total_sizedwords += 3; /* sop timestamp */
total_sizedwords += 4; /* eop timestamp */
if (KGSL_MEMSTORE_GLOBAL != context_id)
if (drawctxt) {
total_sizedwords += 3; /* global timestamp without cache
* flush for non-zero context */
}
ringcmds = adreno_ringbuffer_allocspace(rb, context, total_sizedwords);
if (!ringcmds)
if (adreno_is_a20x(adreno_dev))
total_sizedwords += 2; /* CACHE_FLUSH */
if (flags & KGSL_CMD_FLAGS_WFI)
total_sizedwords += 2; /* WFI */
ringcmds = adreno_ringbuffer_allocspace(rb, drawctxt, total_sizedwords);
if (IS_ERR(ringcmds))
return PTR_ERR(ringcmds);
if (ringcmds == NULL)
return -ENOSPC;
rcmd_gpu = rb->buffer_desc.gpuaddr
@ -597,18 +570,6 @@ adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
GSL_RB_WRITE(ringcmds, rcmd_gpu, KGSL_CMD_INTERNAL_IDENTIFIER);
}
/* always increment the global timestamp. once. */
rb->global_ts++;
if (KGSL_MEMSTORE_GLOBAL != context_id)
timestamp = context->timestamp;
else
timestamp = rb->global_ts;
/* scratchpad ts for recovery */
GSL_RB_WRITE(ringcmds, rcmd_gpu, cp_type0_packet(REG_CP_TIMESTAMP, 1));
GSL_RB_WRITE(ringcmds, rcmd_gpu, rb->global_ts);
/* start-of-pipeline timestamp */
GSL_RB_WRITE(ringcmds, rcmd_gpu, cp_type3_packet(CP_MEM_WRITE, 2));
GSL_RB_WRITE(ringcmds, rcmd_gpu, (gpuaddr +
@ -669,63 +630,21 @@ adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
KGSL_MEMSTORE_OFFSET(context_id, eoptimestamp)));
GSL_RB_WRITE(ringcmds, rcmd_gpu, timestamp);
if (KGSL_MEMSTORE_GLOBAL != context_id) {
if (drawctxt) {
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_MEM_WRITE, 2));
GSL_RB_WRITE(ringcmds, rcmd_gpu, (gpuaddr +
KGSL_MEMSTORE_OFFSET(KGSL_MEMSTORE_GLOBAL,
eoptimestamp)));
KGSL_MEMSTORE_OFFSET(KGSL_MEMSTORE_GLOBAL,
eoptimestamp)));
GSL_RB_WRITE(ringcmds, rcmd_gpu, rb->global_ts);
}
if (context) {
/* Conditional execution based on memory values */
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_COND_EXEC, 4));
GSL_RB_WRITE(ringcmds, rcmd_gpu, (gpuaddr +
KGSL_MEMSTORE_OFFSET(
context_id, ts_cmp_enable)) >> 2);
GSL_RB_WRITE(ringcmds, rcmd_gpu, (gpuaddr +
KGSL_MEMSTORE_OFFSET(
context_id, ref_wait_ts)) >> 2);
GSL_RB_WRITE(ringcmds, rcmd_gpu, timestamp);
/* # of conditional command DWORDs */
GSL_RB_WRITE(ringcmds, rcmd_gpu, 8);
/* Clear the ts_cmp_enable for the context */
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_MEM_WRITE, 2));
GSL_RB_WRITE(ringcmds, rcmd_gpu, gpuaddr +
KGSL_MEMSTORE_OFFSET(
context_id, ts_cmp_enable));
GSL_RB_WRITE(ringcmds, rcmd_gpu, 0x0);
/* Clear the ts_cmp_enable for the global timestamp */
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_MEM_WRITE, 2));
GSL_RB_WRITE(ringcmds, rcmd_gpu, gpuaddr +
KGSL_MEMSTORE_OFFSET(
KGSL_MEMSTORE_GLOBAL, ts_cmp_enable));
GSL_RB_WRITE(ringcmds, rcmd_gpu, 0x0);
/* Trigger the interrupt */
if (drawctxt || (flags & KGSL_CMD_FLAGS_INTERNAL_ISSUE)) {
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_INTERRUPT, 1));
GSL_RB_WRITE(ringcmds, rcmd_gpu, CP_INT_CNTL__RB_INT_MASK);
}
/*
* If per context timestamps are enabled and any of the kgsl
* internal commands want INT to be generated trigger the INT
*/
if ((context) && (context->flags & CTXT_FLAGS_PER_CONTEXT_TS) &&
(flags & (KGSL_CMD_FLAGS_INTERNAL_ISSUE |
KGSL_CMD_FLAGS_GET_INT))) {
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_INTERRUPT, 1));
GSL_RB_WRITE(ringcmds, rcmd_gpu,
CP_INT_CNTL__RB_INT_MASK);
}
if (adreno_is_a3xx(adreno_dev)) {
/* Dummy set-constant to trigger context rollover */
GSL_RB_WRITE(ringcmds, rcmd_gpu,
@ -735,9 +654,10 @@ adreno_ringbuffer_addcmds(struct adreno_ringbuffer *rb,
GSL_RB_WRITE(ringcmds, rcmd_gpu, 0);
}
if (flags & KGSL_CMD_FLAGS_EOF) {
GSL_RB_WRITE(ringcmds, rcmd_gpu, cp_nop_packet(1));
GSL_RB_WRITE(ringcmds, rcmd_gpu, KGSL_END_OF_FRAME_IDENTIFIER);
if (flags & KGSL_CMD_FLAGS_WFI) {
GSL_RB_WRITE(ringcmds, rcmd_gpu,
cp_type3_packet(CP_WAIT_FOR_IDLE, 1));
GSL_RB_WRITE(ringcmds, rcmd_gpu, 0x00000000);
}
adreno_ringbuffer_submit(rb);
@ -755,14 +675,10 @@ adreno_ringbuffer_issuecmds(struct kgsl_device *device,
struct adreno_device *adreno_dev = ADRENO_DEVICE(device);
struct adreno_ringbuffer *rb = &adreno_dev->ringbuffer;
if (device->state & KGSL_STATE_HUNG)
return kgsl_readtimestamp(device, KGSL_MEMSTORE_GLOBAL,
KGSL_TIMESTAMP_RETIRED);
flags |= KGSL_CMD_FLAGS_INTERNAL_ISSUE;
return adreno_ringbuffer_addcmds(rb, drawctxt, flags, cmds,
sizedwords);
sizedwords, 0);
}
static bool _parse_ibs(struct kgsl_device_private *dev_priv, uint gpuaddr,
@ -957,50 +873,108 @@ done:
return ret;
}
int
adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
struct kgsl_context *context,
struct kgsl_ibdesc *ibdesc,
unsigned int numibs,
uint32_t *timestamp,
unsigned int flags)
/**
* _ringbuffer_verify_ib() - parse an IB and verify that it is correct
* @dev_priv: Pointer to the process struct
* @ibdesc: Pointer to the IB descriptor
*
* This function only gets called if debugging is enabled - it walks the IB and
* does an additional level of parsing and verification beyond what the KGSL
* core does
*/
static inline bool _ringbuffer_verify_ib(struct kgsl_device_private *dev_priv,
struct kgsl_ibdesc *ibdesc)
{
struct kgsl_device *device = dev_priv->device;
struct adreno_device *adreno_dev = ADRENO_DEVICE(device);
unsigned int *link = 0;
/* Check that the size of the IBs is under the allowable limit */
if (ibdesc->sizedwords == 0 || ibdesc->sizedwords > 0xFFFFF) {
KGSL_DRV_ERR(device, "Invalid IB size 0x%X\n",
ibdesc->sizedwords);
return false;
}
if (unlikely(adreno_dev->ib_check_level >= 1) &&
!_parse_ibs(dev_priv, ibdesc->gpuaddr, ibdesc->sizedwords)) {
KGSL_DRV_ERR(device, "Could not verify the IBs\n");
return false;
}
return true;
}
int
adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
struct kgsl_context *context,
struct kgsl_cmdbatch *cmdbatch,
uint32_t *timestamp)
{
struct kgsl_device *device = dev_priv->device;
struct adreno_device *adreno_dev = ADRENO_DEVICE(device);
struct adreno_context *drawctxt = ADRENO_CONTEXT(context);
int i, ret;
if (drawctxt->state == ADRENO_CONTEXT_STATE_INVALID)
return -EDEADLK;
/* Verify the IBs before they get queued */
for (i = 0; i < cmdbatch->ibcount; i++) {
if (!_ringbuffer_verify_ib(dev_priv, &cmdbatch->ibdesc[i]))
return -EINVAL;
}
/* Queue the command in the ringbuffer */
ret = adreno_context_queue_cmd(adreno_dev, drawctxt, cmdbatch,
timestamp);
if (ret)
KGSL_DRV_ERR(device, "adreno_context_queue_cmd returned %d\n",
ret);
return ret;
}
/* adreno_ringbuffer_submitcmd - submit userspace IBs to the GPU */
int adreno_ringbuffer_submitcmd(struct adreno_device *adreno_dev,
struct kgsl_cmdbatch *cmdbatch)
{
struct kgsl_device *device = &adreno_dev->dev;
struct kgsl_ibdesc *ibdesc;
unsigned int numibs;
unsigned int *link;
unsigned int *cmds;
unsigned int i;
struct adreno_context *drawctxt = NULL;
struct kgsl_context *context;
struct adreno_context *drawctxt;
unsigned int start_index = 0;
int ret = 0;
int ret;
if (device->state & KGSL_STATE_HUNG) {
ret = -EBUSY;
goto done;
}
context = cmdbatch->context;
drawctxt = ADRENO_CONTEXT(context);
if (!(adreno_dev->ringbuffer.flags & KGSL_FLAGS_STARTED) ||
context == NULL || ibdesc == 0 || numibs == 0) {
ret = -EINVAL;
goto done;
}
drawctxt = context->devctxt;
ibdesc = cmdbatch->ibdesc;
numibs = cmdbatch->ibcount;
if (drawctxt->flags & CTXT_FLAGS_GPU_HANG) {
KGSL_CTXT_ERR(device, "proc %s failed fault tolerance"
" will not accept commands for context %d\n",
drawctxt->pid_name, drawctxt->id);
ret = -EDEADLK;
goto done;
}
/* When preamble is enabled, the preamble buffer with state restoration
commands is stored in the first node of the IB chain. We can skip that
if a context switch hasn't occurred */
if (drawctxt->flags & CTXT_FLAGS_SKIP_EOF) {
KGSL_CTXT_ERR(device,
"proc %s triggered fault tolerance"
" skipping commands for context till EOF %d\n",
drawctxt->pid_name, drawctxt->id);
if (flags & KGSL_CMD_FLAGS_EOF)
drawctxt->flags &= ~CTXT_FLAGS_SKIP_EOF;
if ((drawctxt->flags & CTXT_FLAGS_PREAMBLE) &&
!(cmdbatch->priv & CMDBATCH_FLAG_FORCE_PREAMBLE) &&
(adreno_dev->drawctxt_active == drawctxt))
start_index = 1;
/*
* In skip mode don't issue the draw IBs but keep all the other
* accoutrements of a submission (including the interrupt) to keep
* the accounting sane. Set start_index and numibs to 0 to just
* generate the start and end markers and skip everything else
*/
if (cmdbatch->priv & CMDBATCH_FLAG_SKIP) {
start_index = 0;
numibs = 0;
}
@ -1011,14 +985,6 @@ adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
goto done;
}
/*When preamble is enabled, the preamble buffer with state restoration
commands are stored in the first node of the IB chain. We can skip that
if a context switch hasn't occured */
if (drawctxt->flags & CTXT_FLAGS_PREAMBLE &&
adreno_dev->drawctxt_active == drawctxt)
start_index = 1;
if (!start_index) {
*cmds++ = cp_nop_packet(1);
*cmds++ = KGSL_START_OF_IB_IDENTIFIER;
@ -1030,19 +996,17 @@ adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
*cmds++ = ibdesc[0].sizedwords;
}
for (i = start_index; i < numibs; i++) {
if (unlikely(adreno_dev->ib_check_level >= 1 &&
!_parse_ibs(dev_priv, ibdesc[i].gpuaddr,
ibdesc[i].sizedwords))) {
ret = -EINVAL;
goto done;
}
if (ibdesc[i].sizedwords == 0) {
ret = -EINVAL;
goto done;
}
/*
* Skip 0 sized IBs - these are presumed to have been removed
* from consideration by the FT policy
*/
if (ibdesc[i].sizedwords == 0)
*cmds++ = cp_nop_packet(2);
else
*cmds++ = CP_HDR_INDIRECT_BUFFER_PFD;
*cmds++ = CP_HDR_INDIRECT_BUFFER_PFD;
*cmds++ = ibdesc[i].gpuaddr;
*cmds++ = ibdesc[i].sizedwords;
}
@ -1050,36 +1014,27 @@ adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
*cmds++ = cp_nop_packet(1);
*cmds++ = KGSL_END_OF_IB_IDENTIFIER;
kgsl_setstate(&device->mmu, context->id,
ret = kgsl_setstate(&device->mmu, context->id,
kgsl_mmu_pt_get_flags(device->mmu.hwpagetable,
device->id));
adreno_drawctxt_switch(adreno_dev, drawctxt, flags);
if (drawctxt->flags & CTXT_FLAGS_USER_GENERATED_TS) {
if (timestamp_cmp(drawctxt->timestamp, *timestamp) >= 0) {
KGSL_DRV_ERR(device,
"Invalid user generated ts <%d:0x%x>, "
"less than last issued ts <%d:0x%x>\n",
drawctxt->id, *timestamp, drawctxt->id,
drawctxt->timestamp);
return -ERANGE;
}
drawctxt->timestamp = *timestamp;
} else
drawctxt->timestamp++;
ret = adreno_ringbuffer_addcmds(&adreno_dev->ringbuffer,
drawctxt,
(flags & KGSL_CMD_FLAGS_EOF),
&link[0], (cmds - link));
if (ret)
goto done;
if (drawctxt->flags & CTXT_FLAGS_PER_CONTEXT_TS)
*timestamp = drawctxt->timestamp;
else
*timestamp = adreno_dev->ringbuffer.global_ts;
ret = adreno_drawctxt_switch(adreno_dev, drawctxt, cmdbatch->flags);
/*
* In the unlikely event of an error in the drawctxt switch,
* treat it like a hang
*/
if (ret)
goto done;
ret = adreno_ringbuffer_addcmds(&adreno_dev->ringbuffer,
drawctxt,
cmdbatch->flags,
&link[0], (cmds - link),
cmdbatch->timestamp);
#ifdef CONFIG_MSM_KGSL_CFF_DUMP
/*
@ -1090,209 +1045,11 @@ adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
adreno_idle(device);
#endif
/*
* If context hung and recovered then return error so that the
* application may handle it
*/
if (drawctxt->flags & CTXT_FLAGS_GPU_HANG_FT) {
drawctxt->flags &= ~CTXT_FLAGS_GPU_HANG_FT;
ret = -EPROTO;
}
done:
trace_kgsl_issueibcmds(device, context->id, ibdesc, numibs,
*timestamp, flags, ret, drawctxt->type);
kgsl_trace_issueibcmds(device, context->id, cmdbatch,
cmdbatch->timestamp, cmdbatch->flags, ret,
drawctxt->type);
kfree(link);
return ret;
}
static void _turn_preamble_on_for_ib_seq(struct adreno_ringbuffer *rb,
unsigned int rb_rptr)
{
unsigned int temp_rb_rptr = rb_rptr;
unsigned int size = rb->buffer_desc.size;
unsigned int val[2];
int i = 0;
bool check = false;
bool cmd_start = false;
/* Go till the start of the ib sequence and turn on preamble */
while (temp_rb_rptr / sizeof(unsigned int) != rb->wptr) {
kgsl_sharedmem_readl(&rb->buffer_desc, &val[i], temp_rb_rptr);
if (check && KGSL_START_OF_IB_IDENTIFIER == val[i]) {
/* decrement i */
i = (i + 1) % 2;
if (val[i] == cp_nop_packet(4)) {
temp_rb_rptr = adreno_ringbuffer_dec_wrapped(
temp_rb_rptr, size);
kgsl_sharedmem_writel(&rb->buffer_desc,
temp_rb_rptr, cp_nop_packet(1));
}
KGSL_FT_INFO(rb->device,
"Turned preamble on at offset 0x%x\n",
temp_rb_rptr / 4);
break;
}
/* If you reach beginning of next command sequence then exit
* First command encountered is the current one so don't break
* on that. */
if (KGSL_CMD_IDENTIFIER == val[i]) {
if (cmd_start)
break;
cmd_start = true;
}
i = (i + 1) % 2;
if (1 == i)
check = true;
temp_rb_rptr = adreno_ringbuffer_inc_wrapped(temp_rb_rptr,
size);
}
}
void adreno_ringbuffer_extract(struct adreno_ringbuffer *rb,
struct adreno_ft_data *ft_data)
{
struct kgsl_device *device = rb->device;
unsigned int rb_rptr = ft_data->start_of_replay_cmds;
unsigned int good_rb_idx = 0, bad_rb_idx = 0, temp_rb_idx = 0;
unsigned int last_good_cmd_end_idx = 0, last_bad_cmd_end_idx = 0;
unsigned int cmd_start_idx = 0;
unsigned int val1 = 0;
int copy_rb_contents = 0;
unsigned int temp_rb_rptr;
struct kgsl_context *k_ctxt;
struct adreno_context *a_ctxt;
unsigned int size = rb->buffer_desc.size;
unsigned int *temp_rb_buffer = ft_data->rb_buffer;
int *rb_size = &ft_data->rb_size;
unsigned int *bad_rb_buffer = ft_data->bad_rb_buffer;
int *bad_rb_size = &ft_data->bad_rb_size;
unsigned int *good_rb_buffer = ft_data->good_rb_buffer;
int *good_rb_size = &ft_data->good_rb_size;
/*
* If the start index from where commands need to be copied is invalid
* then no need to save off any commands
*/
if (0xFFFFFFFF == ft_data->start_of_replay_cmds)
return;
k_ctxt = kgsl_context_get(device, ft_data->context_id);
if (k_ctxt) {
a_ctxt = k_ctxt->devctxt;
if (a_ctxt->flags & CTXT_FLAGS_PREAMBLE)
_turn_preamble_on_for_ib_seq(rb, rb_rptr);
kgsl_context_put(k_ctxt);
}
k_ctxt = NULL;
/* Walk the rb from the context switch. Omit any commands
* for an invalid context. */
while ((rb_rptr / sizeof(unsigned int)) != rb->wptr) {
kgsl_sharedmem_readl(&rb->buffer_desc, &val1, rb_rptr);
if (KGSL_CMD_IDENTIFIER == val1) {
/* Start is the NOP dword that comes before
* KGSL_CMD_IDENTIFIER */
cmd_start_idx = temp_rb_idx - 1;
if ((copy_rb_contents) && (good_rb_idx))
last_good_cmd_end_idx = good_rb_idx - 1;
if ((!copy_rb_contents) && (bad_rb_idx))
last_bad_cmd_end_idx = bad_rb_idx - 1;
}
/* check for context switch indicator */
if (val1 == KGSL_CONTEXT_TO_MEM_IDENTIFIER) {
unsigned int temp_idx, val2;
/* increment by 3 to get to the context_id */
temp_rb_rptr = rb_rptr + (3 * sizeof(unsigned int)) %
size;
kgsl_sharedmem_readl(&rb->buffer_desc, &val2,
temp_rb_rptr);
/* if context switches to a context that did not cause
* hang then start saving the rb contents as those
* commands can be executed */
k_ctxt = kgsl_context_get(rb->device, val2);
if (k_ctxt) {
a_ctxt = k_ctxt->devctxt;
/* If we are changing to a good context and were not
* copying commands then copy over commands to the good
* context */
if (!copy_rb_contents && ((k_ctxt &&
!(a_ctxt->flags & CTXT_FLAGS_GPU_HANG)) ||
!k_ctxt)) {
for (temp_idx = cmd_start_idx;
temp_idx < temp_rb_idx;
temp_idx++)
good_rb_buffer[good_rb_idx++] =
temp_rb_buffer[temp_idx];
ft_data->last_valid_ctx_id = val2;
copy_rb_contents = 1;
/* remove the good commands from bad buffer */
bad_rb_idx = last_bad_cmd_end_idx;
} else if (copy_rb_contents && k_ctxt &&
(a_ctxt->flags & CTXT_FLAGS_GPU_HANG)) {
/* If we are changing back to a bad context
* from good ctxt and were not copying commands
* to bad ctxt then copy over commands to
* the bad context */
for (temp_idx = cmd_start_idx;
temp_idx < temp_rb_idx;
temp_idx++)
bad_rb_buffer[bad_rb_idx++] =
temp_rb_buffer[temp_idx];
/* If we are changing to bad context then
* remove the dwords we copied for this
* sequence from the good buffer */
good_rb_idx = last_good_cmd_end_idx;
copy_rb_contents = 0;
}
}
kgsl_context_put(k_ctxt);
}
if (copy_rb_contents)
good_rb_buffer[good_rb_idx++] = val1;
else
bad_rb_buffer[bad_rb_idx++] = val1;
/* Copy both good and bad commands to temp buffer */
temp_rb_buffer[temp_rb_idx++] = val1;
rb_rptr = adreno_ringbuffer_inc_wrapped(rb_rptr, size);
}
*good_rb_size = good_rb_idx;
*bad_rb_size = bad_rb_idx;
*rb_size = temp_rb_idx;
}
void
adreno_ringbuffer_restore(struct adreno_ringbuffer *rb, unsigned int *rb_buff,
int num_rb_contents)
{
int i;
unsigned int *ringcmds;
unsigned int rcmd_gpu;
if (!num_rb_contents)
return;
if (num_rb_contents > (rb->buffer_desc.size - rb->wptr)) {
adreno_regwrite(rb->device, REG_CP_RB_RPTR, 0);
rb->rptr = 0;
BUG_ON(num_rb_contents > rb->buffer_desc.size);
}
ringcmds = (unsigned int *)rb->buffer_desc.hostptr + rb->wptr;
rcmd_gpu = rb->buffer_desc.gpuaddr + sizeof(unsigned int) * rb->wptr;
for (i = 0; i < num_rb_contents; i++)
GSL_RB_WRITE(ringcmds, rcmd_gpu, rb_buff[i]);
rb->wptr += num_rb_contents;
adreno_ringbuffer_submit(rb);
}


@ -27,7 +27,6 @@
struct kgsl_device;
struct kgsl_device_private;
struct adreno_ft_data;
#define GSL_RB_MEMPTRS_SCRATCH_COUNT 8
struct kgsl_rbmemptrs {
@ -90,15 +89,15 @@ struct adreno_ringbuffer {
int adreno_ringbuffer_issueibcmds(struct kgsl_device_private *dev_priv,
struct kgsl_context *context,
struct kgsl_ibdesc *ibdesc,
unsigned int numibs,
uint32_t *timestamp,
unsigned int flags);
struct kgsl_cmdbatch *cmdbatch,
uint32_t *timestamp);
int adreno_ringbuffer_submitcmd(struct adreno_device *adreno_dev,
struct kgsl_cmdbatch *cmdbatch);
int adreno_ringbuffer_init(struct kgsl_device *device);
int adreno_ringbuffer_start(struct adreno_ringbuffer *rb,
unsigned int init_ram);
int adreno_ringbuffer_start(struct adreno_ringbuffer *rb);
void adreno_ringbuffer_stop(struct adreno_ringbuffer *rb);
@ -114,13 +113,6 @@ void adreno_ringbuffer_submit(struct adreno_ringbuffer *rb);
void kgsl_cp_intrcallback(struct kgsl_device *device);
void adreno_ringbuffer_extract(struct adreno_ringbuffer *rb,
struct adreno_ft_data *ft_data);
void
adreno_ringbuffer_restore(struct adreno_ringbuffer *rb, unsigned int *rb_buff,
int num_rb_contents);
unsigned int *adreno_ringbuffer_allocspace(struct adreno_ringbuffer *rb,
struct adreno_context *context,
unsigned int numcmds);


@ -1,4 +1,4 @@
/* Copyright (c) 2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2012-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -161,6 +161,12 @@ static unsigned int vfd_control_0;
static unsigned int sp_vs_pvt_mem_addr;
static unsigned int sp_fs_pvt_mem_addr;
/*
* Cached values of SP_VS_OBJ_START_REG and SP_FS_OBJ_START_REG.
*/
static unsigned int sp_vs_obj_start_reg;
static unsigned int sp_fs_obj_start_reg;
/*
* Each load state block has two possible types. Each type has a different
* number of dwords per unit. Use this handy lookup table to make sure
@ -373,6 +379,26 @@ static int ib_parse_draw_indx(struct kgsl_device *device, unsigned int *pkt,
sp_fs_pvt_mem_addr = 0;
}
if (sp_vs_obj_start_reg) {
ret = kgsl_snapshot_get_object(device, ptbase,
sp_vs_obj_start_reg & 0xFFFFFFE0, 0,
SNAPSHOT_GPU_OBJECT_GENERIC);
if (ret < 0)
return -EINVAL;
snapshot_frozen_objsize += ret;
sp_vs_obj_start_reg = 0;
}
if (sp_fs_obj_start_reg) {
ret = kgsl_snapshot_get_object(device, ptbase,
sp_fs_obj_start_reg & 0xFFFFFFE0, 0,
SNAPSHOT_GPU_OBJECT_GENERIC);
if (ret < 0)
return -EINVAL;
snapshot_frozen_objsize += ret;
sp_fs_obj_start_reg = 0;
}
/* Finally: VBOs */
/* The number of active VBOs is stored in VFD_CONTROL_0[31:27] */
@ -444,7 +470,7 @@ static void ib_parse_type0(struct kgsl_device *device, unsigned int *ptr,
int offset = type0_pkt_offset(*ptr);
int i;
for (i = 0; i < size; i++, offset++) {
for (i = 0; i < size - 1; i++, offset++) {
/* Visibility stream buffer */
@ -505,11 +531,20 @@ static void ib_parse_type0(struct kgsl_device *device, unsigned int *ptr,
case A3XX_SP_FS_PVT_MEM_ADDR_REG:
sp_fs_pvt_mem_addr = ptr[i + 1];
break;
case A3XX_SP_VS_OBJ_START_REG:
sp_vs_obj_start_reg = ptr[i + 1];
break;
case A3XX_SP_FS_OBJ_START_REG:
sp_fs_obj_start_reg = ptr[i + 1];
break;
}
}
}
}
static inline int parse_ib(struct kgsl_device *device, unsigned int ptbase,
unsigned int gpuaddr, unsigned int dwords);
/* Add an IB as a GPU object, but first, parse it to find more goodies within */
static int ib_add_gpu_object(struct kgsl_device *device, unsigned int ptbase,
@ -549,32 +584,12 @@ static int ib_add_gpu_object(struct kgsl_device *device, unsigned int ptbase,
if (adreno_cmd_is_ib(src[i])) {
unsigned int gpuaddr = src[i + 1];
unsigned int size = src[i + 2];
unsigned int ibbase;
/* Address of the last processed IB2 */
kgsl_regread(device, REG_CP_IB2_BASE, &ibbase);
ret = parse_ib(device, ptbase, gpuaddr, size);
/*
* If this is the last IB2 that was executed,
* then push it to make sure it goes into the
* static space
*/
if (ibbase == gpuaddr)
push_object(device,
SNAPSHOT_OBJ_TYPE_IB, ptbase,
gpuaddr, size);
else {
ret = ib_add_gpu_object(device,
ptbase, gpuaddr, size);
/*
* If adding the IB failed then stop
* parsing
*/
if (ret < 0)
goto done;
}
/* If adding the IB failed then stop parsing */
if (ret < 0)
goto done;
} else {
ret = ib_parse_type3(device, &src[i], ptbase);
/*
@ -604,6 +619,36 @@ done:
return ret;
}
/*
* We want to store the last executed IB1 and IB2 in the static region to ensure
* that we get at least some information out of the snapshot even if we can't
* access the dynamic data from the sysfs file. Push all other IBs on the
* dynamic list
*/
static inline int parse_ib(struct kgsl_device *device, unsigned int ptbase,
unsigned int gpuaddr, unsigned int dwords)
{
unsigned int ib1base, ib2base;
int ret = 0;
/*
* Check the IB address - if it is either the last executed IB1 or the
* last executed IB2 then push it into the static blob otherwise put
* it in the dynamic list
*/
kgsl_regread(device, REG_CP_IB1_BASE, &ib1base);
kgsl_regread(device, REG_CP_IB2_BASE, &ib2base);
if (gpuaddr == ib1base || gpuaddr == ib2base)
push_object(device, SNAPSHOT_OBJ_TYPE_IB, ptbase,
gpuaddr, dwords);
else
ret = ib_add_gpu_object(device, ptbase, gpuaddr, dwords);
return ret;
}
/* Snapshot the ringbuffer memory */
static int snapshot_rb(struct kgsl_device *device, void *snapshot,
int remain, void *priv)
@ -740,13 +785,13 @@ static int snapshot_rb(struct kgsl_device *device, void *snapshot,
struct kgsl_memdesc *memdesc =
adreno_find_ctxtmem(device, ptbase, ibaddr,
ibsize);
ibsize << 2);
/* IOMMU uses a NOP IB placed in setsate memory */
if (NULL == memdesc)
if (kgsl_gpuaddr_in_memdesc(
&device->mmu.setstate_memory,
ibaddr, ibsize))
ibaddr, ibsize << 2))
memdesc = &device->mmu.setstate_memory;
/*
* The IB from CP_IB1_BASE and the IBs for legacy
@ -754,12 +799,11 @@ static int snapshot_rb(struct kgsl_device *device, void *snapshot,
* others get marked as GPU objects
*/
if (ibaddr == ibbase || memdesc != NULL)
if (memdesc != NULL)
push_object(device, SNAPSHOT_OBJ_TYPE_IB,
ptbase, ibaddr, ibsize);
else
ib_add_gpu_object(device, ptbase, ibaddr,
ibsize);
parse_ib(device, ptbase, ibaddr, ibsize);
}
index = index + 1;
@ -804,15 +848,14 @@ static int snapshot_ib(struct kgsl_device *device, void *snapshot,
continue;
if (adreno_cmd_is_ib(*src))
push_object(device, SNAPSHOT_OBJ_TYPE_IB,
obj->ptbase, src[1], src[2]);
else {
ret = parse_ib(device, obj->ptbase, src[1],
src[2]);
else
ret = ib_parse_type3(device, src, obj->ptbase);
/* Stop parsing if the type3 decode fails */
if (ret < 0)
break;
}
/* Stop parsing if the type3 decode fails */
if (ret < 0)
break;
}
}


@ -0,0 +1,18 @@
/* Copyright (c) 2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
* only version 2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#include "adreno.h"
/* Instantiate tracepoints */
#define CREATE_TRACE_POINTS
#include "adreno_trace.h"


@ -0,0 +1,169 @@
/* Copyright (c) 2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
* only version 2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#if !defined(_ADRENO_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _ADRENO_TRACE_H
#undef TRACE_SYSTEM
#define TRACE_SYSTEM kgsl
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#define TRACE_INCLUDE_FILE adreno_trace
#include <linux/tracepoint.h>
TRACE_EVENT(adreno_cmdbatch_queued,
TP_PROTO(struct kgsl_cmdbatch *cmdbatch, unsigned int queued),
TP_ARGS(cmdbatch, queued),
TP_STRUCT__entry(
__field(unsigned int, id)
__field(unsigned int, timestamp)
__field(unsigned int, queued)
),
TP_fast_assign(
__entry->id = cmdbatch->context->id;
__entry->timestamp = cmdbatch->timestamp;
__entry->queued = queued;
),
TP_printk(
"ctx=%u ts=%u queued=%u",
__entry->id, __entry->timestamp, __entry->queued
)
);
DECLARE_EVENT_CLASS(adreno_cmdbatch_template,
TP_PROTO(struct kgsl_cmdbatch *cmdbatch, int inflight),
TP_ARGS(cmdbatch, inflight),
TP_STRUCT__entry(
__field(unsigned int, id)
__field(unsigned int, timestamp)
__field(unsigned int, inflight)
),
TP_fast_assign(
__entry->id = cmdbatch->context->id;
__entry->timestamp = cmdbatch->timestamp;
__entry->inflight = inflight;
),
TP_printk(
"ctx=%u ts=%u inflight=%u",
__entry->id, __entry->timestamp,
__entry->inflight
)
);
DEFINE_EVENT(adreno_cmdbatch_template, adreno_cmdbatch_retired,
TP_PROTO(struct kgsl_cmdbatch *cmdbatch, int inflight),
TP_ARGS(cmdbatch, inflight)
);
DEFINE_EVENT(adreno_cmdbatch_template, adreno_cmdbatch_submitted,
TP_PROTO(struct kgsl_cmdbatch *cmdbatch, int inflight),
TP_ARGS(cmdbatch, inflight)
);
DECLARE_EVENT_CLASS(adreno_drawctxt_template,
TP_PROTO(struct adreno_context *drawctxt),
TP_ARGS(drawctxt),
TP_STRUCT__entry(
__field(unsigned int, id)
),
TP_fast_assign(
__entry->id = drawctxt->base.id;
),
TP_printk("ctx=%u", __entry->id)
);
DEFINE_EVENT(adreno_drawctxt_template, adreno_context_sleep,
TP_PROTO(struct adreno_context *drawctxt),
TP_ARGS(drawctxt)
);
DEFINE_EVENT(adreno_drawctxt_template, adreno_context_wake,
TP_PROTO(struct adreno_context *drawctxt),
TP_ARGS(drawctxt)
);
DEFINE_EVENT(adreno_drawctxt_template, dispatch_queue_context,
TP_PROTO(struct adreno_context *drawctxt),
TP_ARGS(drawctxt)
);
TRACE_EVENT(adreno_drawctxt_wait_start,
TP_PROTO(unsigned int id, unsigned int ts),
TP_ARGS(id, ts),
TP_STRUCT__entry(
__field(unsigned int, id)
__field(unsigned int, ts)
),
TP_fast_assign(
__entry->id = id;
__entry->ts = ts;
),
TP_printk(
"ctx=%u ts=%u",
__entry->id, __entry->ts
)
);
TRACE_EVENT(adreno_drawctxt_wait_done,
TP_PROTO(unsigned int id, unsigned int ts, int status),
TP_ARGS(id, ts, status),
TP_STRUCT__entry(
__field(unsigned int, id)
__field(unsigned int, ts)
__field(int, status)
),
TP_fast_assign(
__entry->id = id;
__entry->ts = ts;
__entry->status = status;
),
TP_printk(
"ctx=%u ts=%u status=%d",
__entry->id, __entry->ts, __entry->status
)
);
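The wait_start/wait_done tracepoints are presumably emitted around the blocking context wait; a hypothetical call site (not part of this diff) would look like:

	trace_adreno_drawctxt_wait_start(context->id, timestamp);
	ret = adreno_drawctxt_wait(adreno_dev, context, timestamp, timeout);
	trace_adreno_drawctxt_wait_done(context->id, timestamp, ret);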
TRACE_EVENT(adreno_gpu_fault,
TP_PROTO(unsigned int status, unsigned int rptr, unsigned int wptr,
unsigned int ib1base, unsigned int ib1size,
unsigned int ib2base, unsigned int ib2size),
TP_ARGS(status, rptr, wptr, ib1base, ib1size, ib2base, ib2size),
TP_STRUCT__entry(
__field(unsigned int, status)
__field(unsigned int, rptr)
__field(unsigned int, wptr)
__field(unsigned int, ib1base)
__field(unsigned int, ib1size)
__field(unsigned int, ib2base)
__field(unsigned int, ib2size)
),
TP_fast_assign(
__entry->status = status;
__entry->rptr = rptr;
__entry->wptr = wptr;
__entry->ib1base = ib1base;
__entry->ib1size = ib1size;
__entry->ib2base = ib2base;
__entry->ib2size = ib2size;
),
TP_printk("status=%X RB=%X/%X IB1=%X/%X IB2=%X/%X",
__entry->status, __entry->wptr, __entry->rptr,
__entry->ib1base, __entry->ib1size, __entry->ib2base,
__entry->ib2size)
);
#endif /* _ADRENO_TRACE_H */
/* This part must be outside protection */
#include <trace/define_trace.h>

File diff suppressed because it is too large


@ -130,12 +130,14 @@ struct kgsl_driver {
unsigned int mapped_max;
unsigned int histogram[16];
} stats;
unsigned int full_cache_threshold;
};
extern struct kgsl_driver kgsl_driver;
struct kgsl_pagetable;
struct kgsl_memdesc;
struct kgsl_cmdbatch;
struct kgsl_memdesc_ops {
int (*vmflags)(struct kgsl_memdesc *);
@ -149,6 +151,8 @@ struct kgsl_memdesc_ops {
#define KGSL_MEMDESC_GUARD_PAGE BIT(0)
/* Set if the memdesc is mapped into all pagetables */
#define KGSL_MEMDESC_GLOBAL BIT(1)
/* The memdesc is frozen during a snapshot */
#define KGSL_MEMDESC_FROZEN BIT(2)
/* shared memory allocation */
struct kgsl_memdesc {
@ -175,15 +179,10 @@ struct kgsl_memdesc {
#define KGSL_MEM_ENTRY_ION 4
#define KGSL_MEM_ENTRY_MAX 5
/* List of flags */
#define KGSL_MEM_ENTRY_FROZEN (1 << 0)
struct kgsl_mem_entry {
struct kref refcount;
struct kgsl_memdesc memdesc;
int memtype;
int flags;
void *priv_data;
struct rb_node node;
unsigned int id;
@ -229,6 +228,14 @@ int kgsl_resume_driver(struct platform_device *pdev);
void kgsl_early_suspend_driver(struct early_suspend *h);
void kgsl_late_resume_driver(struct early_suspend *h);
void kgsl_trace_regwrite(struct kgsl_device *device, unsigned int offset,
unsigned int value);
void kgsl_trace_issueibcmds(struct kgsl_device *device, int id,
struct kgsl_cmdbatch *cmdbatch,
unsigned int timestamp, unsigned int flags,
int result, unsigned int type);
#ifdef CONFIG_MSM_KGSL_DRM
extern int kgsl_drm_init(struct platform_device *dev);
extern void kgsl_drm_exit(void);
@ -246,6 +253,10 @@ static inline void kgsl_drm_exit(void)
static inline int kgsl_gpuaddr_in_memdesc(const struct kgsl_memdesc *memdesc,
unsigned int gpuaddr, unsigned int size)
{
/* set a minimum size to search for */
if (!size)
size = 1;
/* don't overflow */
if ((gpuaddr + size) < gpuaddr)
return 0;


@ -28,6 +28,7 @@
#include "kgsl_log.h"
#include "kgsl_sharedmem.h"
#include "adreno_pm4types.h"
#include "adreno.h"
static struct rchan *chan;
static struct dentry *dir;
@ -334,7 +335,7 @@ void kgsl_cffdump_init()
return;
}
kgsl_cff_dump_enable = 1;
kgsl_cff_dump_enable = 0;
spin_lock_init(&cffdump_lock);
@ -356,60 +357,71 @@ void kgsl_cffdump_destroy()
debugfs_remove(dir);
}
void kgsl_cffdump_open(enum kgsl_deviceid device_id)
void kgsl_cffdump_open(struct kgsl_device *device)
{
kgsl_cffdump_memory_base(device_id, KGSL_PAGETABLE_BASE,
kgsl_mmu_get_ptsize(), SZ_256K);
struct adreno_device *adreno_dev = ADRENO_DEVICE(device);
if (!kgsl_cff_dump_enable)
return;
if (KGSL_MMU_TYPE_IOMMU == kgsl_mmu_get_mmutype()) {
kgsl_cffdump_memory_base(device->id,
kgsl_mmu_get_base_addr(&device->mmu),
kgsl_mmu_get_ptsize(&device->mmu) +
KGSL_IOMMU_GLOBAL_MEM_SIZE, adreno_dev->gmem_size);
} else {
kgsl_cffdump_memory_base(device->id,
kgsl_mmu_get_base_addr(&device->mmu),
kgsl_mmu_get_ptsize(&device->mmu),
adreno_dev->gmem_size);
}
}
void kgsl_cffdump_memory_base(enum kgsl_deviceid device_id, unsigned int base,
unsigned int range, unsigned gmemsize)
{
if (!kgsl_cff_dump_enable)
return;
cffdump_printline(device_id, CFF_OP_MEMORY_BASE, base,
range, gmemsize, 0, 0);
}
void kgsl_cffdump_hang(enum kgsl_deviceid device_id)
{
if (!kgsl_cff_dump_enable)
return;
cffdump_printline(device_id, CFF_OP_HANG, 0, 0, 0, 0, 0);
}
void kgsl_cffdump_close(enum kgsl_deviceid device_id)
{
if (!kgsl_cff_dump_enable)
return;
cffdump_printline(device_id, CFF_OP_EOF, 0, 0, 0, 0, 0);
}
void kgsl_cffdump_user_event(unsigned int cff_opcode, unsigned int op1,
unsigned int op2, unsigned int op3,
unsigned int op4, unsigned int op5)
{
if (!kgsl_cff_dump_enable)
return;
cffdump_printline(-1, cff_opcode, op1, op2, op3, op4, op5);
}
void kgsl_cffdump_syncmem(struct kgsl_device_private *dev_priv,
const struct kgsl_memdesc *memdesc, uint gpuaddr, uint sizebytes,
bool clean_cache)
void kgsl_cffdump_syncmem(struct kgsl_device *device,
struct kgsl_memdesc *memdesc, uint gpuaddr,
uint sizebytes, bool clean_cache)
{
const void *src;
if (!kgsl_cff_dump_enable)
return;
BUG_ON(memdesc == NULL);
total_syncmem += sizebytes;
if (memdesc == NULL) {
struct kgsl_mem_entry *entry;
spin_lock(&dev_priv->process_priv->mem_lock);
entry = kgsl_sharedmem_find_region(dev_priv->process_priv,
gpuaddr, sizebytes);
spin_unlock(&dev_priv->process_priv->mem_lock);
if (entry == NULL) {
KGSL_CORE_ERR("did not find mapping "
"for gpuaddr: 0x%08x\n", gpuaddr);
return;
}
memdesc = &entry->memdesc;
}
src = (uint *)kgsl_gpuaddr_to_vaddr(memdesc, gpuaddr);
if (memdesc->hostptr == NULL) {
KGSL_CORE_ERR("no kernel mapping for "
@ -522,7 +534,7 @@ static int subbuf_start_handler(struct rchan_buf *buf,
}
static struct dentry *create_buf_file_handler(const char *filename,
struct dentry *parent, int mode, struct rchan_buf *buf,
struct dentry *parent, unsigned short mode, struct rchan_buf *buf,
int *is_global)
{
return debugfs_create_file(filename, mode, parent, buf,


@ -22,10 +22,10 @@
void kgsl_cffdump_init(void);
void kgsl_cffdump_destroy(void);
void kgsl_cffdump_open(enum kgsl_deviceid device_id);
void kgsl_cffdump_close(enum kgsl_deviceid device_id);
void kgsl_cffdump_syncmem(struct kgsl_device_private *dev_priv,
const struct kgsl_memdesc *memdesc, uint physaddr, uint sizebytes,
void kgsl_cffdump_open(struct kgsl_device *device);
void kgsl_cffdump_close(struct kgsl_device *device);
void kgsl_cffdump_syncmem(struct kgsl_device *,
struct kgsl_memdesc *memdesc, uint physaddr, uint sizebytes,
bool clean_cache);
void kgsl_cffdump_setmem(uint addr, uint value, uint sizebytes);
void kgsl_cffdump_regwrite(enum kgsl_deviceid device_id, uint addr,
@ -49,7 +49,7 @@ void kgsl_cffdump_hang(enum kgsl_deviceid device_id);
#define kgsl_cffdump_init() (void)0
#define kgsl_cffdump_destroy() (void)0
#define kgsl_cffdump_open(device_id) (void)0
#define kgsl_cffdump_open(device) (void)0
#define kgsl_cffdump_close(device_id) (void)0
#define kgsl_cffdump_syncmem(dev_priv, memdesc, addr, sizebytes, clean_cache) \
(void) 0


@ -123,7 +123,6 @@ KGSL_DEBUGFS_LOG(cmd_log);
KGSL_DEBUGFS_LOG(ctxt_log);
KGSL_DEBUGFS_LOG(mem_log);
KGSL_DEBUGFS_LOG(pwr_log);
KGSL_DEBUGFS_LOG(ft_log);
static int memfree_hist_print(struct seq_file *s, void *unused)
{
@ -185,7 +184,6 @@ void kgsl_device_debugfs_init(struct kgsl_device *device)
device->drv_log = KGSL_LOG_LEVEL_DEFAULT;
device->mem_log = KGSL_LOG_LEVEL_DEFAULT;
device->pwr_log = KGSL_LOG_LEVEL_DEFAULT;
device->ft_log = KGSL_LOG_LEVEL_DEFAULT;
debugfs_create_file("log_level_cmd", 0644, device->d_debugfs, device,
&cmd_log_fops);


@ -1,4 +1,4 @@
/* Copyright (c) 2002,2007-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2002,2007-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -13,9 +13,11 @@
#ifndef __KGSL_DEVICE_H
#define __KGSL_DEVICE_H
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/pm_qos.h>
#include <linux/earlysuspend.h>
#include <linux/sched.h>
#include "kgsl.h"
#include "kgsl_mmu.h"
@ -62,12 +64,21 @@
#define KGSL_EVENT_TIMESTAMP_RETIRED 0
#define KGSL_EVENT_CANCELLED 1
/*
* "list" of event types for ftrace symbolic magic
*/
#define KGSL_EVENT_TYPES \
{ KGSL_EVENT_TIMESTAMP_RETIRED, "retired" }, \
{ KGSL_EVENT_CANCELLED, "cancelled" }
struct kgsl_device;
struct platform_device;
struct kgsl_device_private;
struct kgsl_context;
struct kgsl_power_stats;
struct kgsl_event;
struct kgsl_cmdbatch;
struct kgsl_functable {
/* Mandatory functions - these functions must be implemented
@ -79,9 +90,10 @@ struct kgsl_functable {
void (*regwrite) (struct kgsl_device *device,
unsigned int offsetwords, unsigned int value);
int (*idle) (struct kgsl_device *device);
unsigned int (*isidle) (struct kgsl_device *device);
bool (*isidle) (struct kgsl_device *device);
int (*suspend_context) (struct kgsl_device *device);
int (*start) (struct kgsl_device *device, unsigned int init_ram);
int (*init) (struct kgsl_device *device);
int (*start) (struct kgsl_device *device);
int (*stop) (struct kgsl_device *device);
int (*getproperty) (struct kgsl_device *device,
enum kgsl_property_type type, void *value,
@ -92,9 +104,8 @@ struct kgsl_functable {
unsigned int (*readtimestamp) (struct kgsl_device *device,
struct kgsl_context *context, enum kgsl_timestamp_type type);
int (*issueibcmds) (struct kgsl_device_private *dev_priv,
struct kgsl_context *context, struct kgsl_ibdesc *ibdesc,
unsigned int sizedwords, uint32_t *timestamp,
unsigned int flags);
struct kgsl_context *context, struct kgsl_cmdbatch *cmdbatch,
uint32_t *timestamps);
int (*setup_pt)(struct kgsl_device *device,
struct kgsl_pagetable *pagetable);
void (*cleanup_pt)(struct kgsl_device *device,
@ -106,16 +117,16 @@ struct kgsl_functable {
void * (*snapshot)(struct kgsl_device *device, void *snapshot,
int *remain, int hang);
irqreturn_t (*irq_handler)(struct kgsl_device *device);
int (*drain)(struct kgsl_device *device);
/* Optional functions - these functions are not mandatory. The
driver will check that the function pointer is not NULL before
calling the hook */
void (*setstate) (struct kgsl_device *device, unsigned int context_id,
int (*setstate) (struct kgsl_device *device, unsigned int context_id,
uint32_t flags);
int (*drawctxt_create) (struct kgsl_device *device,
struct kgsl_pagetable *pagetable, struct kgsl_context *context,
uint32_t *flags);
void (*drawctxt_destroy) (struct kgsl_device *device,
struct kgsl_context *context);
struct kgsl_context *(*drawctxt_create) (struct kgsl_device_private *,
uint32_t *flags);
int (*drawctxt_detach) (struct kgsl_context *context);
void (*drawctxt_destroy) (struct kgsl_context *context);
long (*ioctl) (struct kgsl_device_private *dev_priv,
unsigned int cmd, void *data);
int (*setproperty) (struct kgsl_device *device,
@ -124,6 +135,8 @@ struct kgsl_functable {
int (*postmortem_dump) (struct kgsl_device *device, int manual);
int (*next_event)(struct kgsl_device *device,
struct kgsl_event *event);
void (*drawctxt_sched)(struct kgsl_device *device,
struct kgsl_context *context);
};
/* MH register values */
@ -147,6 +160,46 @@ struct kgsl_event {
unsigned int created;
};
/**
* struct kgsl_cmdbatch - KGSl command descriptor
* @device: KGSL GPU device that the command was created for
* @context: KGSL context that created the command
* @timestamp: Timestamp assigned to the command
* @flags: Command batch flags set at submission time
* @priv: Internal flags
* @fault_policy: Internal policy describing how to handle this command in case
* of a fault
* @ibcount: Number of IBs in the command list
* @ibdesc: Pointer to the list of IBs
* @expires: Point in time when the cmdbatch is considered to be hung
* @invalid: non-zero if the dispatcher determines the command and the owning
* context should be invalidated
* @refcount: kref structure to maintain the reference count
* @synclist: List of context/timestamp tuples to wait for before issuing
*
* This structure defines an atomic batch of command buffers issued from
* userspace.
*/
struct kgsl_cmdbatch {
struct kgsl_device *device;
struct kgsl_context *context;
spinlock_t lock;
uint32_t timestamp;
uint32_t flags;
uint32_t priv;
uint32_t fault_policy;
uint32_t ibcount;
struct kgsl_ibdesc *ibdesc;
unsigned long expires;
int invalid;
struct kref refcount;
struct list_head synclist;
};
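A command batch carries a kref, a spinlock and a sync list, so any constructor has to initialize all three before the object is shared. The real allocation path lives in the kgsl core (its diff is suppressed above); the sketch below is purely illustrative and the function name is made up.

static struct kgsl_cmdbatch *example_cmdbatch_create(struct kgsl_device *device,
		struct kgsl_context *context, unsigned int flags)
{
	struct kgsl_cmdbatch *cmdbatch = kzalloc(sizeof(*cmdbatch), GFP_KERNEL);

	if (cmdbatch == NULL)
		return ERR_PTR(-ENOMEM);

	kref_init(&cmdbatch->refcount);
	spin_lock_init(&cmdbatch->lock);
	INIT_LIST_HEAD(&cmdbatch->synclist);

	cmdbatch->device = device;
	cmdbatch->context = context;
	cmdbatch->flags = flags;

	return cmdbatch;
}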
/* Internal cmdbatch flags */
#define CMDBATCH_FLAG_SKIP BIT(0)
#define CMDBATCH_FLAG_FORCE_PREAMBLE BIT(1)
struct kgsl_device {
struct device *dev;
@ -174,16 +227,16 @@ struct kgsl_device {
uint32_t state;
uint32_t requested_state;
unsigned int active_cnt;
atomic_t active_cnt;
struct completion suspend_gate;
wait_queue_head_t wait_queue;
struct workqueue_struct *work_queue;
struct device *parentdev;
struct completion ft_gate;
struct dentry *d_debugfs;
struct idr context_idr;
struct early_suspend display_off;
rwlock_t context_lock;
void *snapshot; /* Pointer to the snapshot memory region */
int snapshot_maxsize; /* Max size of the snapshot region */
@ -206,7 +259,6 @@ struct kgsl_device {
int drv_log;
int mem_log;
int pwr_log;
int ft_log;
int pm_dump_enable;
struct kgsl_pwrscale pwrscale;
struct kobject pwrscale_kobj;
@ -214,6 +266,7 @@ struct kgsl_device {
struct work_struct ts_expired_ws;
struct list_head events;
struct list_head events_pending_list;
unsigned int events_last_timestamp;
s64 on_time;
/* Postmortem Control switches */
@ -229,7 +282,6 @@ void kgsl_check_fences(struct work_struct *work);
#define KGSL_DEVICE_COMMON_INIT(_dev) \
.hwaccess_gate = COMPLETION_INITIALIZER((_dev).hwaccess_gate),\
.suspend_gate = COMPLETION_INITIALIZER((_dev).suspend_gate),\
.ft_gate = COMPLETION_INITIALIZER((_dev).ft_gate),\
.idle_check_ws = __WORK_INITIALIZER((_dev).idle_check_ws,\
kgsl_idle_check),\
.ts_expired_ws = __WORK_INITIALIZER((_dev).ts_expired_ws,\
@ -244,37 +296,56 @@ void kgsl_check_fences(struct work_struct *work);
.ver_minor = DRIVER_VERSION_MINOR
/* bits for struct kgsl_context.priv */
/* the context has been destroyed by userspace and is no longer using the gpu */
#define KGSL_CONTEXT_DETACHED 0
/* the context has caused a pagefault */
#define KGSL_CONTEXT_PAGEFAULT 1
/**
* struct kgsl_context - Master structure for a KGSL context object
* @refcount - kref object for reference counting the context
* @id - integer identifier for the context
* @dev_priv - pointer to the owning device instance
* @devctxt - pointer to the device specific context information
* @reset_status - status indication whether a gpu reset occured and whether
* @refcount: kref object for reference counting the context
* @id: integer identifier for the context
* @priv: in-kernel context flags, use KGSL_CONTEXT_* values
* @dev_priv: pointer to the owning device instance
* @reset_status: status indication whether a gpu reset occurred and whether
* this context was responsible for causing it
* @wait_on_invalid_ts - flag indicating if this context has tried to wait on a
* @wait_on_invalid_ts: flag indicating if this context has tried to wait on a
* bad timestamp
* @timeline - sync timeline used to create fences that can be signaled when a
* @timeline: sync timeline used to create fences that can be signaled when a
* sync_pt timestamp expires
* @events - list head of pending events for this context
* @events_list - list node for the list of all contexts that have pending events
* @events: list head of pending events for this context
* @events_list: list node for the list of all contexts that have pending events
* @pid: process that owns this context.
* @pagefault: flag set if this context caused a pagefault.
* @pagefault_ts: global timestamp of the pagefault, if KGSL_CONTEXT_PAGEFAULT
* is set.
*/
struct kgsl_context {
struct kref refcount;
uint32_t id;
struct kgsl_device_private *dev_priv;
void *devctxt;
pid_t pid;
unsigned long priv;
struct kgsl_device *device;
struct kgsl_pagetable *pagetable;
unsigned int reset_status;
bool wait_on_invalid_ts;
struct sync_timeline *timeline;
struct list_head events;
struct list_head events_list;
unsigned int pagefault_ts;
};
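The priv field is a plain bitfield operated on with the standard atomic bit helpers. As an illustration (the surrounding fault-handler code and the global_ts local are assumed, not shown here):

	if (!test_and_set_bit(KGSL_CONTEXT_PAGEFAULT, &context->priv))
		/* First fault on this context: remember when it happened */
		context->pagefault_ts = global_ts;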
struct kgsl_process_private {
unsigned int refcnt;
pid_t pid;
spinlock_t mem_lock;
/* General refcount for process private struct obj */
struct kref refcount;
/* Mutex to synchronize access to each process_private struct obj */
struct mutex process_private_mutex;
struct rb_root mem_rb;
struct idr mem_idr;
struct kgsl_pagetable *pagetable;
@ -303,6 +374,9 @@ struct kgsl_device *kgsl_get_device(int dev_idx);
int kgsl_add_event(struct kgsl_device *device, u32 id, u32 ts,
kgsl_event_func func, void *priv, void *owner);
void kgsl_cancel_event(struct kgsl_device *device, struct kgsl_context *context,
unsigned int timestamp, kgsl_event_func func, void *priv);
static inline void kgsl_process_add_stats(struct kgsl_process_private *priv,
unsigned int type, size_t size)
{
@ -390,8 +464,6 @@ static inline int kgsl_create_device_workqueue(struct kgsl_device *device)
return 0;
}
int kgsl_check_timestamp(struct kgsl_device *device,
struct kgsl_context *context, unsigned int timestamp);
@ -416,10 +488,15 @@ kgsl_device_get_drvdata(struct kgsl_device *dev)
void kgsl_context_destroy(struct kref *kref);
int kgsl_context_init(struct kgsl_device_private *, struct kgsl_context
*context);
/**
* kgsl_context_put - Release context reference count
* @context
* kgsl_context_put() - Release context reference count
* @context: Pointer to the KGSL context to be released
*
* Reduce the reference count on a KGSL context and destroy it if it is no
* longer needed
*/
static inline void
kgsl_context_put(struct kgsl_context *context)
@ -427,10 +504,26 @@ kgsl_context_put(struct kgsl_context *context)
if (context)
kref_put(&context->refcount, kgsl_context_destroy);
}
/**
* kgsl_context_get - get a pointer to a KGSL context
* @devicex - Pointer to the KGSL device that owns the context
* @id - Context ID to return
* kgsl_context_detached() - check if a context is detached
* @context: the context
*
* Check if a context has been destroyed by userspace and is only waiting
* for reference counts to go away. This check is used to weed out
* contexts that shouldn't use the gpu, so NULL is considered detached.
*/
static inline bool kgsl_context_detached(struct kgsl_context *context)
{
return (context == NULL || test_bit(KGSL_CONTEXT_DETACHED,
&context->priv));
}
/**
* kgsl_context_get() - get a pointer to a KGSL context
* @device: Pointer to the KGSL device that owns the context
* @id: Context ID
*
* Find the context associated with the given ID number, increase the reference
* count on it and return it. The caller must make sure that this call is
@ -438,26 +531,45 @@ kgsl_context_put(struct kgsl_context *context)
* doesn't validate the ownership of the context with the calling process - use
* kgsl_context_get_owner for that
*/
static inline struct kgsl_context *kgsl_context_get(struct kgsl_device *device,
uint32_t id)
{
struct kgsl_context *context = NULL;
rcu_read_lock();
read_lock(&device->context_lock);
context = idr_find(&device->context_idr, id);
if (context)
/* Don't return a context that has been detached */
if (kgsl_context_detached(context))
context = NULL;
else
kref_get(&context->refcount);
rcu_read_unlock();
read_unlock(&device->context_lock);
return context;
}
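A typical lookup follows the usual get/use/put pattern; this is only a usage sketch, with 'id' standing in for a real context ID:

	struct kgsl_context *context = kgsl_context_get(device, id);

	if (context) {
		/* ... operate on the context while holding the reference ... */
		kgsl_context_put(context);
	}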
/**
* kgsl_context_get_owner - get a pointer to a KGSL context
* @dev_priv - Pointer to the owner of the requesting process
* @id - Context ID to return
* _kgsl_context_get() - lightweight function to just increment the ref count
* @context: Pointer to the KGSL context
*
* Get a reference to the specified KGSL context structure. This is a
* lightweight way to just increase the refcount on a known context rather than
* walking through kgsl_context_get and searching the iterator
*/
static inline void _kgsl_context_get(struct kgsl_context *context)
{
if (context)
kref_get(&context->refcount);
}
/**
* kgsl_context_get_owner() - get a pointer to a KGSL context in a specific
* process
* @dev_priv: Pointer to the process struct
* @id: Context ID to return
*
* Find the context associated with the given ID number, increase the reference
* count on it and return it. The caller must make sure that this call is
@ -472,8 +584,8 @@ static inline struct kgsl_context *kgsl_context_get_owner(
context = kgsl_context_get(dev_priv->device, id);
/* Verify that the context belongs to the dev_priv instance */
if (context && context->dev_priv != dev_priv) {
/* Verify that the context belongs to the calling process. */
if (context != NULL && context->pid != dev_priv->process_priv->pid) {
kgsl_context_put(context);
return NULL;
}
@ -482,24 +594,12 @@ static inline struct kgsl_context *kgsl_context_get_owner(
}
/**
* kgsl_active_count_put - Decrease the device active count
* @device: Pointer to a KGSL device
* kgsl_context_cancel_events() - Cancel all events for a context
* @device: Pointer to the KGSL device structure for the GPU
* @context: Pointer to the KGSL context
*
* Decrease the active count for the KGSL device and trigger the suspend_gate
* completion if it hits zero
* Signal all pending events on the context with KGSL_EVENT_CANCELLED
*/
static inline void
kgsl_active_count_put(struct kgsl_device *device)
{
if (device->active_cnt == 1)
INIT_COMPLETION(device->suspend_gate);
device->active_cnt--;
if (device->active_cnt == 0)
complete(&device->suspend_gate);
}
static inline void kgsl_context_cancel_events(struct kgsl_device *device,
struct kgsl_context *context)
{
@ -507,9 +607,9 @@ static inline void kgsl_context_cancel_events(struct kgsl_device *device,
}
/**
* kgsl_context_cancel_events_timestamp - cancel events for a given timestamp
* kgsl_context_cancel_events_timestamp() - cancel events for a given timestamp
* @device: Pointer to the KGSL device that owns the context
* @cotnext: Pointer to the context that owns the event or NULL for global
* @context: Pointer to the context that owns the event or NULL for global
* @timestamp: Timestamp to cancel events for
*
* Cancel events pending for a specific timestamp
@ -519,4 +619,30 @@ static inline void kgsl_cancel_events_timestamp(struct kgsl_device *device,
{
kgsl_signal_event(device, context, timestamp, KGSL_EVENT_CANCELLED);
}
void kgsl_cmdbatch_destroy(struct kgsl_cmdbatch *cmdbatch);
void kgsl_cmdbatch_destroy_object(struct kref *kref);
/**
* kgsl_cmdbatch_put() - Decrement the refcount for a command batch object
* @cmdbatch: Pointer to the command batch object
*/
static inline void kgsl_cmdbatch_put(struct kgsl_cmdbatch *cmdbatch)
{
kref_put(&cmdbatch->refcount, kgsl_cmdbatch_destroy_object);
}
/**
* kgsl_cmdbatch_sync_pending() - return true if the cmdbatch is waiting
* @cmdbatch: Pointer to the command batch object to check
*
* Return non-zero if the specified command batch is still waiting for sync
* point dependencies to be satisfied
*/
static inline int kgsl_cmdbatch_sync_pending(struct kgsl_cmdbatch *cmdbatch)
{
return list_empty(&cmdbatch->synclist) ? 0 : 1;
}
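
For illustration only (not part of the patch), the pattern these helpers
assume on the submission side, with a hypothetical caller:

/* Sketch: defer submission while sync points are pending, then drop the ref */
static void example_try_submit(struct kgsl_cmdbatch *cmdbatch)
{
	if (kgsl_cmdbatch_sync_pending(cmdbatch))
		return;	/* dependencies still outstanding, try again later */

	/* ... issue the command batch to the hardware ... */

	kgsl_cmdbatch_put(cmdbatch);	/* release the submitter's reference */
}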
#endif /* __KGSL_DEVICE_H */


@ -90,12 +90,12 @@ static struct kgsl_event *_find_event(struct kgsl_device *device,
}
/**
* _signal_event - send a signal to a specific event in the list
* @device - KGSL device
* @head - Pointer to the event list to process
* @timestamp - timestamp of the event to signal
* @cur - timestamp value to send to the callback
* @type - Signal ID to send to the callback
* _signal_event() - send a signal to a specific event in the list
* @device: Pointer to the KGSL device struct
* @head: Pointer to the event list to process
* @timestamp: timestamp of the event to signal
* @cur: timestamp value to send to the callback
* @type: Signal ID to send to the callback
*
* Send the specified signal to the events in the list with the specified
* timestamp. The timestamp 'cur' is sent to the callback so it knows
@ -114,12 +114,12 @@ static void _signal_event(struct kgsl_device *device,
}
/**
* _signal_events - send a signal to all the events in a list
* @device - KGSL device
* @head - Pointer to the event list to process
* @timestamp - Timestamp to pass to the events (this should be the current
* _signal_events() - send a signal to all the events in a list
* @device: Pointer to the KGSL device struct
* @head: Pointer to the event list to process
* @timestamp: Timestamp to pass to the events (this should be the current
* timestamp when the signal is sent)
* @type - Signal ID to send to the callback
* @type: Signal ID to send to the callback
*
* Send the specified signal to all the events in the list and destroy them
*/
@ -134,6 +134,16 @@ static void _signal_events(struct kgsl_device *device,
}
/**
* kgsl_signal_event() - send a signal to a specific event in the context
* @device: Pointer to the KGSL device struct
* @context: Pointer to the KGSL context
* @timestamp: Timestamp of the event to signal
* @type: Signal ID to send to the callback
*
* Send the specified signal to all the events in the context with the given
* timestamp
*/
void kgsl_signal_event(struct kgsl_device *device,
struct kgsl_context *context, unsigned int timestamp,
unsigned int type)
@ -151,6 +161,14 @@ void kgsl_signal_event(struct kgsl_device *device,
}
EXPORT_SYMBOL(kgsl_signal_event);
/**
* kgsl_signal_events() - send a signal to all events in the context
* @device: Pointer to the KGSL device struct
* @context: Pointer to the KGSL context
* @type: Signal ID to send to the callback function
*
* Send the specified signal to all the events in the context
*/
void kgsl_signal_events(struct kgsl_device *device,
struct kgsl_context *context, unsigned int type)
{
@ -192,6 +210,7 @@ EXPORT_SYMBOL(kgsl_signal_events);
int kgsl_add_event(struct kgsl_device *device, u32 id, u32 ts,
kgsl_event_func func, void *priv, void *owner)
{
int ret;
struct kgsl_event *event;
unsigned int cur_ts;
struct kgsl_context *context = NULL;
@ -229,6 +248,17 @@ int kgsl_add_event(struct kgsl_device *device, u32 id, u32 ts,
return -ENOMEM;
}
/*
* Increase the active count on the device to avoid going into power
* saving modes while events are pending
*/
ret = kgsl_active_count_get_light(device);
if (ret < 0) {
kgsl_context_put(context);
kfree(event);
return ret;
}
event->context = context;
event->timestamp = ts;
event->priv = priv;
@ -255,23 +285,17 @@ int kgsl_add_event(struct kgsl_device *device, u32 id, u32 ts,
} else
_add_event_to_list(&device->events, event);
/*
* Increase the active count on the device to avoid going into power
* saving modes while events are pending
*/
device->active_cnt++;
queue_work(device->work_queue, &device->ts_expired_ws);
return 0;
}
EXPORT_SYMBOL(kgsl_add_event);
/**
* kgsl_cancel_events - Cancel all generic events for a process
* @device - KGSL device for the events to cancel
* @owner - driver instance that owns the events to cancel
* kgsl_cancel_events() - Cancel all global events owned by a process
* @device: Pointer to the KGSL device struct
* @owner: driver instance that owns the events to cancel
*
* Cancel all global events that match the owner pointer
*/
void kgsl_cancel_events(struct kgsl_device *device, void *owner)
{
@ -291,6 +315,19 @@ void kgsl_cancel_events(struct kgsl_device *device, void *owner)
}
EXPORT_SYMBOL(kgsl_cancel_events);
/**
* kgsl_cancel_event() - send a cancel signal to a specific event
* @device: Pointer to the KGSL device struct
* @context: Pointer to the KGSL context
* @timestamp: Timestamp of the event to cancel
* @func: Callback function of the event - this is used to match the actual
* event
* @priv: Private data for the callback function - this is used to match to the
* actual event
*
 * Send a cancel signal to a specific event that matches all the parameters
*/
void kgsl_cancel_event(struct kgsl_device *device, struct kgsl_context *context,
unsigned int timestamp, kgsl_event_func func,
void *priv)
@ -363,10 +400,19 @@ void kgsl_process_events(struct work_struct *work)
struct kgsl_context *context, *tmp;
uint32_t timestamp;
/*
* Bail unless the global timestamp has advanced. We can safely do this
* outside of the mutex for speed
*/
timestamp = kgsl_readtimestamp(device, NULL, KGSL_TIMESTAMP_RETIRED);
if (timestamp == device->events_last_timestamp)
return;
mutex_lock(&device->mutex);
/* Process expired global events */
timestamp = kgsl_readtimestamp(device, NULL, KGSL_TIMESTAMP_RETIRED);
device->events_last_timestamp = timestamp;
_retire_events(device, &device->events, timestamp);
_mark_next_event(device, &device->events);
@ -374,6 +420,11 @@ void kgsl_process_events(struct work_struct *work)
list_for_each_entry_safe(context, tmp, &device->events_pending_list,
events_list) {
/*
* Increment the refcount to make sure that the list_del_init
* is called with a valid context's list
*/
_kgsl_context_get(context);
/*
* If kgsl_timestamp_expired_context returns 0 then it no longer
* has any pending events and can be removed from the list
@ -381,6 +432,7 @@ void kgsl_process_events(struct work_struct *work)
if (kgsl_process_context_events(device, context) == 0)
list_del_init(&context->events_list);
kgsl_context_put(context);
}
mutex_unlock(&device->mutex);


@ -465,12 +465,12 @@ err_free_gpummu:
return NULL;
}
static void kgsl_gpummu_default_setstate(struct kgsl_mmu *mmu,
static int kgsl_gpummu_default_setstate(struct kgsl_mmu *mmu,
uint32_t flags)
{
struct kgsl_gpummu_pt *gpummu_pt;
if (!kgsl_mmu_enabled())
return;
return 0;
if (flags & KGSL_MMUFLAGS_PTUPDATE) {
kgsl_idle(mmu->device);
@ -483,12 +483,16 @@ static void kgsl_gpummu_default_setstate(struct kgsl_mmu *mmu,
/* Invalidate all and tc */
kgsl_regwrite(mmu->device, MH_MMU_INVALIDATE, 0x00000003);
}
return 0;
}
static void kgsl_gpummu_setstate(struct kgsl_mmu *mmu,
static int kgsl_gpummu_setstate(struct kgsl_mmu *mmu,
struct kgsl_pagetable *pagetable,
unsigned int context_id)
{
int ret = 0;
if (mmu->flags & KGSL_FLAGS_STARTED) {
/* page table not current, then setup mmu to use new
* specified page table
@ -501,10 +505,13 @@ static void kgsl_gpummu_setstate(struct kgsl_mmu *mmu,
kgsl_mmu_pt_get_flags(pagetable, mmu->device->id);
/* call device specific set page table */
kgsl_setstate(mmu, context_id, KGSL_MMUFLAGS_TLBFLUSH |
ret = kgsl_setstate(mmu, context_id,
KGSL_MMUFLAGS_TLBFLUSH |
KGSL_MMUFLAGS_PTUPDATE);
}
}
return ret;
}
static int kgsl_gpummu_init(struct kgsl_mmu *mmu)
@ -541,6 +548,7 @@ static int kgsl_gpummu_start(struct kgsl_mmu *mmu)
struct kgsl_device *device = mmu->device;
struct kgsl_gpummu_pt *gpummu_pt;
int ret;
if (mmu->flags & KGSL_FLAGS_STARTED)
return 0;
@ -552,9 +560,6 @@ static int kgsl_gpummu_start(struct kgsl_mmu *mmu)
/* setup MMU and sub-client behavior */
kgsl_regwrite(device, MH_MMU_CONFIG, mmu->config);
/* idle device */
kgsl_idle(device);
/* enable axi interrupts */
kgsl_regwrite(device, MH_INTERRUPT_MASK,
GSL_MMU_INT_MASK | MH_INTERRUPT_MASK__MMU_PAGE_FAULT);
@ -585,10 +590,12 @@ static int kgsl_gpummu_start(struct kgsl_mmu *mmu)
kgsl_regwrite(mmu->device, MH_MMU_VA_RANGE,
(KGSL_PAGETABLE_BASE |
(CONFIG_MSM_KGSL_PAGE_TABLE_SIZE >> 16)));
kgsl_setstate(mmu, KGSL_MEMSTORE_GLOBAL, KGSL_MMUFLAGS_TLBFLUSH);
mmu->flags |= KGSL_FLAGS_STARTED;
return 0;
ret = kgsl_setstate(mmu, KGSL_MEMSTORE_GLOBAL, KGSL_MMUFLAGS_TLBFLUSH);
if (!ret)
mmu->flags |= KGSL_FLAGS_STARTED;
return ret;
}
static int
@ -598,7 +605,7 @@ kgsl_gpummu_unmap(void *mmu_specific_pt,
{
unsigned int numpages;
unsigned int pte, ptefirst, ptelast, superpte;
unsigned int range = kgsl_sg_size(memdesc->sg, memdesc->sglen);
unsigned int range = memdesc->size;
struct kgsl_gpummu_pt *gpummu_pt = mmu_specific_pt;
/* All GPU addresses as assigned are page aligned, but some


@ -32,6 +32,7 @@
#include "adreno.h"
#include "kgsl_trace.h"
#include "z180.h"
#include "kgsl_cffdump.h"
static struct kgsl_iommu_register_list kgsl_iommuv1_reg[KGSL_IOMMU_REG_MAX] = {
@ -62,6 +63,13 @@ static struct kgsl_iommu_register_list kgsl_iommuv2_reg[KGSL_IOMMU_REG_MAX] = {
struct remote_iommu_petersons_spinlock kgsl_iommu_sync_lock_vars;
/*
* One page allocation for a guard region to protect against over-zealous
* GPU pre-fetch
*/
static struct page *kgsl_guard_page;
static int get_iommu_unit(struct device *dev, struct kgsl_mmu **mmu_out,
struct kgsl_iommu_unit **iommu_unit_out)
{
@ -109,6 +117,170 @@ static struct kgsl_iommu_device *get_iommu_device(struct kgsl_iommu_unit *unit,
return NULL;
}
/* These functions help find the nearest allocated memory entries on either side
 * of a faulting address. If we know the nearby allocations we can make a
 * better determination of what we think should have been located in the
 * faulting region
*/
/*
* A local structure to make it easy to store the interesting bits for the
* memory entries on either side of the faulting address
*/
struct _mem_entry {
unsigned int gpuaddr;
unsigned int size;
unsigned int flags;
unsigned int priv;
pid_t pid;
};
/*
 * Find the closest allocated memory block with a smaller GPU address than the
 * given address
*/
static void _prev_entry(struct kgsl_process_private *priv,
unsigned int faultaddr, struct _mem_entry *ret)
{
struct rb_node *node;
struct kgsl_mem_entry *entry;
for (node = rb_first(&priv->mem_rb); node; ) {
entry = rb_entry(node, struct kgsl_mem_entry, node);
if (entry->memdesc.gpuaddr > faultaddr)
break;
/*
* If this is closer to the faulting address, then copy
* the entry
*/
if (entry->memdesc.gpuaddr > ret->gpuaddr) {
ret->gpuaddr = entry->memdesc.gpuaddr;
ret->size = entry->memdesc.size;
ret->flags = entry->memdesc.flags;
ret->priv = entry->memdesc.priv;
ret->pid = priv->pid;
}
node = rb_next(&entry->node);
}
}
/*
 * Find the closest allocated memory block with a greater starting GPU address
 * than the given address
*/
static void _next_entry(struct kgsl_process_private *priv,
unsigned int faultaddr, struct _mem_entry *ret)
{
struct rb_node *node;
struct kgsl_mem_entry *entry;
for (node = rb_last(&priv->mem_rb); node; ) {
entry = rb_entry(node, struct kgsl_mem_entry, node);
if (entry->memdesc.gpuaddr < faultaddr)
break;
/*
* If this is closer to the faulting address, then copy
* the entry
*/
if (entry->memdesc.gpuaddr < ret->gpuaddr) {
ret->gpuaddr = entry->memdesc.gpuaddr;
ret->size = entry->memdesc.size;
ret->flags = entry->memdesc.flags;
ret->priv = entry->memdesc.priv;
ret->pid = priv->pid;
}
node = rb_prev(&entry->node);
}
}
static void _find_mem_entries(struct kgsl_mmu *mmu, unsigned int faultaddr,
unsigned int ptbase, struct _mem_entry *preventry,
struct _mem_entry *nextentry)
{
struct kgsl_process_private *private;
int id = kgsl_mmu_get_ptname_from_ptbase(mmu, ptbase);
memset(preventry, 0, sizeof(*preventry));
memset(nextentry, 0, sizeof(*nextentry));
/* Set the maximum possible size as an initial value */
nextentry->gpuaddr = 0xFFFFFFFF;
mutex_lock(&kgsl_driver.process_mutex);
list_for_each_entry(private, &kgsl_driver.process_list, list) {
if (private->pagetable->name != id)
continue;
spin_lock(&private->mem_lock);
_prev_entry(private, faultaddr, preventry);
_next_entry(private, faultaddr, nextentry);
spin_unlock(&private->mem_lock);
}
mutex_unlock(&kgsl_driver.process_mutex);
}
static void _print_entry(struct kgsl_device *device, struct _mem_entry *entry)
{
char name[32];
memset(name, 0, sizeof(name));
kgsl_get_memory_usage(name, sizeof(name) - 1, entry->flags);
KGSL_LOG_DUMP(device,
"[%8.8X - %8.8X] %s (pid = %d) (%s)\n",
entry->gpuaddr,
entry->gpuaddr + entry->size,
entry->priv & KGSL_MEMDESC_GUARD_PAGE ? "(+guard)" : "",
entry->pid, name);
}
static void _check_if_freed(struct kgsl_iommu_device *iommu_dev,
unsigned long addr, unsigned int pid)
{
void *base = kgsl_driver.memfree_hist.base_hist_rb;
struct kgsl_memfree_hist_elem *wptr;
struct kgsl_memfree_hist_elem *p;
mutex_lock(&kgsl_driver.memfree_hist_mutex);
wptr = kgsl_driver.memfree_hist.wptr;
p = wptr;
for (;;) {
if (p->size && p->pid == pid)
if (addr >= p->gpuaddr &&
addr < (p->gpuaddr + p->size)) {
KGSL_LOG_DUMP(iommu_dev->kgsldev,
"---- premature free ----\n");
KGSL_LOG_DUMP(iommu_dev->kgsldev,
"[%8.8X-%8.8X] was already freed by pid %d\n",
p->gpuaddr,
p->gpuaddr + p->size,
p->pid);
}
p++;
if ((void *)p >= base + kgsl_driver.memfree_hist.size)
p = (struct kgsl_memfree_hist_elem *) base;
if (p == kgsl_driver.memfree_hist.wptr)
break;
}
mutex_unlock(&kgsl_driver.memfree_hist_mutex);
}
static int kgsl_iommu_fault_handler(struct iommu_domain *domain,
struct device *dev, unsigned long addr, int flags)
{
@ -124,6 +296,7 @@ static int kgsl_iommu_fault_handler(struct iommu_domain *domain,
unsigned int pid;
unsigned int fsynr0, fsynr1;
int write;
struct _mem_entry prev, next;
ret = get_iommu_unit(dev, &mmu, &iommu_unit);
if (ret)
@ -168,6 +341,24 @@ static int kgsl_iommu_fault_handler(struct iommu_domain *domain,
write ? "write" : "read");
}
_check_if_freed(iommu_dev, addr, pid);
KGSL_LOG_DUMP(iommu_dev->kgsldev, "---- nearby memory ----\n");
_find_mem_entries(mmu, addr, ptbase, &prev, &next);
if (prev.gpuaddr)
_print_entry(iommu_dev->kgsldev, &prev);
else
KGSL_LOG_DUMP(iommu_dev->kgsldev, "*EMPTY*\n");
KGSL_LOG_DUMP(iommu_dev->kgsldev, " <- fault @ %8.8lX\n", addr);
if (next.gpuaddr != 0xFFFFFFFF)
_print_entry(iommu_dev->kgsldev, &next);
else
KGSL_LOG_DUMP(iommu_dev->kgsldev, "*EMPTY*\n");
mmu->fault = 1;
iommu_dev->fault = 1;
@ -648,13 +839,10 @@ static int kgsl_iommu_init_sync_lock(struct kgsl_mmu *mmu)
return status;
/* Map Lock variables to GPU pagetable */
iommu->sync_lock_desc.priv |= KGSL_MEMDESC_GLOBAL;
pagetable = mmu->priv_bank_table ? mmu->priv_bank_table :
mmu->defaultpagetable;
status = kgsl_mmu_map(pagetable, &iommu->sync_lock_desc,
GSL_PT_PAGE_RV | GSL_PT_PAGE_WV);
status = kgsl_mmu_map_global(pagetable, &iommu->sync_lock_desc);
if (status) {
kgsl_mmu_unmap(pagetable, &iommu->sync_lock_desc);
@ -914,10 +1102,12 @@ static int kgsl_iommu_get_pt_lsb(struct kgsl_mmu *mmu,
return 0;
}
static void kgsl_iommu_setstate(struct kgsl_mmu *mmu,
static int kgsl_iommu_setstate(struct kgsl_mmu *mmu,
struct kgsl_pagetable *pagetable,
unsigned int context_id)
{
int ret = 0;
if (mmu->flags & KGSL_FLAGS_STARTED) {
/* page table not current, then setup mmu to use new
* specified page table
@ -928,10 +1118,12 @@ static void kgsl_iommu_setstate(struct kgsl_mmu *mmu,
flags |= kgsl_mmu_pt_get_flags(mmu->hwpagetable,
mmu->device->id) |
KGSL_MMUFLAGS_TLBFLUSH;
kgsl_setstate(mmu, context_id,
ret = kgsl_setstate(mmu, context_id,
KGSL_MMUFLAGS_PTUPDATE | flags);
}
}
return ret;
}
/*
@ -959,23 +1151,18 @@ static int kgsl_iommu_setup_regs(struct kgsl_mmu *mmu,
return 0;
for (i = 0; i < iommu->unit_count; i++) {
iommu->iommu_units[i].reg_map.priv |= KGSL_MEMDESC_GLOBAL;
status = kgsl_mmu_map(pt,
&(iommu->iommu_units[i].reg_map),
GSL_PT_PAGE_RV | GSL_PT_PAGE_WV);
if (status) {
iommu->iommu_units[i].reg_map.priv &=
~KGSL_MEMDESC_GLOBAL;
status = kgsl_mmu_map_global(pt,
&(iommu->iommu_units[i].reg_map));
if (status)
goto err;
}
}
return 0;
err:
for (i--; i >= 0; i--) {
for (i--; i >= 0; i--)
kgsl_mmu_unmap(pt,
&(iommu->iommu_units[i].reg_map));
iommu->iommu_units[i].reg_map.priv &= ~KGSL_MEMDESC_GLOBAL;
}
return status;
}
@ -1049,6 +1236,15 @@ static int kgsl_iommu_init(struct kgsl_mmu *mmu)
iommu_ops.mmu_cleanup_pt = kgsl_iommu_cleanup_regs;
}
if (kgsl_guard_page == NULL) {
kgsl_guard_page = alloc_page(GFP_KERNEL | __GFP_ZERO |
__GFP_HIGHMEM);
if (kgsl_guard_page == NULL) {
status = -ENOMEM;
goto done;
}
}
dev_info(mmu->device->dev, "|%s| MMU type set for device is IOMMU\n",
__func__);
done:
@ -1242,8 +1438,6 @@ static int kgsl_iommu_start(struct kgsl_mmu *mmu)
kgsl_regwrite(mmu->device, MH_MMU_MPU_END,
mh->mpu_base + mh->mpu_range);
} else {
kgsl_regwrite(mmu->device, MH_MMU_CONFIG, 0x00000000);
}
mmu->hwpagetable = mmu->defaultpagetable;
@ -1282,6 +1476,10 @@ static int kgsl_iommu_start(struct kgsl_mmu *mmu)
kgsl_iommu_lock_rb_in_tlb(mmu);
msm_iommu_unlock();
/* For complete CFF */
kgsl_cffdump_setmem(mmu->setstate_memory.gpuaddr +
KGSL_IOMMU_SETSTATE_NOP_OFFSET,
cp_nop_packet(1), sizeof(unsigned int));
kgsl_iommu_disable_clk_on_ts(mmu, 0, false);
mmu->flags |= KGSL_FLAGS_STARTED;
@ -1300,7 +1498,7 @@ kgsl_iommu_unmap(void *mmu_specific_pt,
unsigned int *tlb_flags)
{
int ret;
unsigned int range = kgsl_sg_size(memdesc->sg, memdesc->sglen);
unsigned int range = memdesc->size;
struct kgsl_iommu_pt *iommu_pt = mmu_specific_pt;
/* All GPU addresses as assigned are page aligned, but some
@ -1312,6 +1510,9 @@ kgsl_iommu_unmap(void *mmu_specific_pt,
if (range == 0 || gpuaddr == 0)
return 0;
if (kgsl_memdesc_has_guard_page(memdesc))
range += PAGE_SIZE;
ret = iommu_unmap_range(iommu_pt->domain, gpuaddr, range);
if (ret)
KGSL_CORE_ERR("iommu_unmap_range(%p, %x, %d) failed "
@ -1336,26 +1537,35 @@ kgsl_iommu_map(void *mmu_specific_pt,
int ret;
unsigned int iommu_virt_addr;
struct kgsl_iommu_pt *iommu_pt = mmu_specific_pt;
int size = kgsl_sg_size(memdesc->sg, memdesc->sglen);
unsigned int iommu_flags = IOMMU_READ;
int size = memdesc->size;
BUG_ON(NULL == iommu_pt);
if (protflags & GSL_PT_PAGE_WV)
iommu_flags |= IOMMU_WRITE;
iommu_virt_addr = memdesc->gpuaddr;
ret = iommu_map_range(iommu_pt->domain, iommu_virt_addr, memdesc->sg,
size, iommu_flags);
size, protflags);
if (ret) {
KGSL_CORE_ERR("iommu_map_range(%p, %x, %p, %d, %d) "
"failed with err: %d\n", iommu_pt->domain,
iommu_virt_addr, memdesc->sg, size,
iommu_flags, ret);
KGSL_CORE_ERR("iommu_map_range(%p, %x, %p, %d, %x) err: %d\n",
iommu_pt->domain, iommu_virt_addr, memdesc->sg, size,
protflags, ret);
return ret;
}
if (kgsl_memdesc_has_guard_page(memdesc)) {
ret = iommu_map(iommu_pt->domain, iommu_virt_addr + size,
page_to_phys(kgsl_guard_page), PAGE_SIZE,
protflags & ~IOMMU_WRITE);
if (ret) {
KGSL_CORE_ERR("iommu_map(%p, %x, %x, %x) err: %d\n",
iommu_pt->domain, iommu_virt_addr + size,
page_to_phys(kgsl_guard_page),
protflags & ~IOMMU_WRITE,
ret);
/* cleanup the partial mapping */
iommu_unmap_range(iommu_pt->domain, iommu_virt_addr,
size);
}
}
return ret;
}
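
A sketch (not in the patch) of the size accounting the map and unmap paths now
share: the guard page extends the VA footprint by one page, so both sides must
compute the same range.

static inline unsigned int example_mapped_range(struct kgsl_memdesc *memdesc)
{
	unsigned int range = memdesc->size;

	/* the read-only guard page is appended after the real allocation */
	if (kgsl_memdesc_has_guard_page(memdesc))
		range += PAGE_SIZE;

	return range;
}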
@ -1424,6 +1634,11 @@ static int kgsl_iommu_close(struct kgsl_mmu *mmu)
kfree(iommu);
if (kgsl_guard_page != NULL) {
__free_page(kgsl_guard_page);
kgsl_guard_page = NULL;
}
return 0;
}
@ -1459,19 +1674,22 @@ kgsl_iommu_get_current_ptbase(struct kgsl_mmu *mmu)
* cpu
* Return - void
*/
static void kgsl_iommu_default_setstate(struct kgsl_mmu *mmu,
static int kgsl_iommu_default_setstate(struct kgsl_mmu *mmu,
uint32_t flags)
{
struct kgsl_iommu *iommu = mmu->priv;
int temp;
int i;
int ret = 0;
unsigned int pt_base = kgsl_iommu_get_pt_base_addr(mmu,
mmu->hwpagetable);
unsigned int pt_val;
if (kgsl_iommu_enable_clk(mmu, KGSL_IOMMU_CONTEXT_USER)) {
ret = kgsl_iommu_enable_clk(mmu, KGSL_IOMMU_CONTEXT_USER);
if (ret) {
KGSL_DRV_ERR(mmu->device, "Failed to enable iommu clocks\n");
return;
return ret;
}
/* Mask off the lsb of the pt base address since lsb will not change */
pt_base &= (iommu->iommu_reg_list[KGSL_IOMMU_CTX_TTBR0].reg_mask <<
@ -1514,6 +1732,7 @@ static void kgsl_iommu_default_setstate(struct kgsl_mmu *mmu,
/* Disable smmu clock */
kgsl_iommu_disable_clk_on_ts(mmu, 0, false);
return ret;
}
/*
@ -1555,6 +1774,7 @@ struct kgsl_mmu_ops iommu_ops = {
.mmu_pagefault = NULL,
.mmu_get_current_ptbase = kgsl_iommu_get_current_ptbase,
.mmu_enable_clk = kgsl_iommu_enable_clk,
.mmu_disable_clk = kgsl_iommu_disable_clk,
.mmu_disable_clk_on_ts = kgsl_iommu_disable_clk_on_ts,
.mmu_get_pt_lsb = kgsl_iommu_get_pt_lsb,
.mmu_get_reg_gpuaddr = kgsl_iommu_get_reg_gpuaddr,


@ -103,15 +103,6 @@ KGSL_LOG_ERR(_dev->dev, _dev->pwr_log, fmt, ##args)
#define KGSL_PWR_CRIT(_dev, fmt, args...) \
KGSL_LOG_CRIT(_dev->dev, _dev->pwr_log, fmt, ##args)
#define KGSL_FT_INFO(_dev, fmt, args...) \
KGSL_LOG_INFO(_dev->dev, _dev->ft_log, fmt, ##args)
#define KGSL_FT_WARN(_dev, fmt, args...) \
KGSL_LOG_WARN(_dev->dev, _dev->ft_log, fmt, ##args)
#define KGSL_FT_ERR(_dev, fmt, args...) \
KGSL_LOG_ERR(_dev->dev, _dev->ft_log, fmt, ##args)
#define KGSL_FT_CRIT(_dev, fmt, args...) \
KGSL_LOG_CRIT(_dev->dev, _dev->ft_log, fmt, ##args)
/* Core error messages - these are for core KGSL functions that have
no device associated with them (such as memory) */


@ -1,4 +1,4 @@
/* Copyright (c) 2002,2007-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2002,2007-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -560,7 +560,7 @@ void kgsl_mmu_putpagetable(struct kgsl_pagetable *pagetable)
}
EXPORT_SYMBOL(kgsl_mmu_putpagetable);
void kgsl_setstate(struct kgsl_mmu *mmu, unsigned int context_id,
int kgsl_setstate(struct kgsl_mmu *mmu, unsigned int context_id,
uint32_t flags)
{
struct kgsl_device *device = mmu->device;
@ -568,14 +568,16 @@ void kgsl_setstate(struct kgsl_mmu *mmu, unsigned int context_id,
if (!(flags & (KGSL_MMUFLAGS_TLBFLUSH | KGSL_MMUFLAGS_PTUPDATE))
&& !adreno_is_a2xx(adreno_dev))
return;
return 0;
if (KGSL_MMU_TYPE_NONE == kgsl_mmu_type)
return;
return 0;
else if (device->ftbl->setstate)
device->ftbl->setstate(device, context_id, flags);
return device->ftbl->setstate(device, context_id, flags);
else if (mmu->mmu_ops->mmu_device_setstate)
mmu->mmu_ops->mmu_device_setstate(mmu, flags);
return mmu->mmu_ops->mmu_device_setstate(mmu, flags);
return 0;
}
EXPORT_SYMBOL(kgsl_setstate);
@ -584,7 +586,6 @@ void kgsl_mh_start(struct kgsl_device *device)
struct kgsl_mh *mh = &device->mh;
/* force mmu off to for now*/
kgsl_regwrite(device, MH_MMU_CONFIG, 0);
kgsl_idle(device);
/* define physical memory range accessible by the core */
kgsl_regwrite(device, MH_MMU_MPU_BASE, mh->mpu_base);
@ -605,16 +606,17 @@ void kgsl_mh_start(struct kgsl_device *device)
* kgsl_pwrctrl_irq() is called
*/
}
EXPORT_SYMBOL(kgsl_mh_start);
int
kgsl_mmu_map(struct kgsl_pagetable *pagetable,
struct kgsl_memdesc *memdesc,
unsigned int protflags)
struct kgsl_memdesc *memdesc)
{
int ret;
struct gen_pool *pool = NULL;
int size;
int page_align = ilog2(PAGE_SIZE);
unsigned int protflags = kgsl_memdesc_protflags(memdesc);
if (kgsl_mmu_type == KGSL_MMU_TYPE_NONE) {
if (memdesc->sglen == 1) {
@ -634,7 +636,10 @@ kgsl_mmu_map(struct kgsl_pagetable *pagetable,
}
}
size = kgsl_sg_size(memdesc->sg, memdesc->sglen);
/* Add space for the guard page when allocating the mmu VA. */
size = memdesc->size;
if (kgsl_memdesc_has_guard_page(memdesc))
size += PAGE_SIZE;
pool = pagetable->pool;
@ -732,7 +737,10 @@ kgsl_mmu_unmap(struct kgsl_pagetable *pagetable,
return 0;
}
size = kgsl_sg_size(memdesc->sg, memdesc->sglen);
/* Add space for the guard page when freeing the mmu VA. */
size = memdesc->size;
if (kgsl_memdesc_has_guard_page(memdesc))
size += PAGE_SIZE;
start_addr = memdesc->gpuaddr;
end_addr = (memdesc->gpuaddr + size);
@ -777,7 +785,7 @@ kgsl_mmu_unmap(struct kgsl_pagetable *pagetable,
EXPORT_SYMBOL(kgsl_mmu_unmap);
int kgsl_mmu_map_global(struct kgsl_pagetable *pagetable,
struct kgsl_memdesc *memdesc, unsigned int protflags)
struct kgsl_memdesc *memdesc)
{
int result = -EINVAL;
unsigned int gpuaddr = 0;
@ -789,11 +797,10 @@ int kgsl_mmu_map_global(struct kgsl_pagetable *pagetable,
/* Not all global mappings are needed for all MMU types */
if (!memdesc->size)
return 0;
gpuaddr = memdesc->gpuaddr;
memdesc->priv |= KGSL_MEMDESC_GLOBAL;
result = kgsl_mmu_map(pagetable, memdesc, protflags);
result = kgsl_mmu_map(pagetable, memdesc);
if (result)
goto error;


@ -125,10 +125,10 @@ struct kgsl_mmu_ops {
int (*mmu_close) (struct kgsl_mmu *mmu);
int (*mmu_start) (struct kgsl_mmu *mmu);
void (*mmu_stop) (struct kgsl_mmu *mmu);
void (*mmu_setstate) (struct kgsl_mmu *mmu,
int (*mmu_setstate) (struct kgsl_mmu *mmu,
struct kgsl_pagetable *pagetable,
unsigned int context_id);
void (*mmu_device_setstate) (struct kgsl_mmu *mmu,
int (*mmu_device_setstate) (struct kgsl_mmu *mmu,
uint32_t flags);
void (*mmu_pagefault) (struct kgsl_mmu *mmu);
unsigned int (*mmu_get_current_ptbase)
@ -137,6 +137,8 @@ struct kgsl_mmu_ops {
(struct kgsl_mmu *mmu, uint32_t ts, bool ts_valid);
int (*mmu_enable_clk)
(struct kgsl_mmu *mmu, int ctx_id);
void (*mmu_disable_clk)
(struct kgsl_mmu *mmu);
int (*mmu_get_pt_lsb)(struct kgsl_mmu *mmu,
unsigned int unit_id,
enum kgsl_iommu_context_id ctx_id);
@ -204,14 +206,13 @@ int kgsl_mmu_init(struct kgsl_device *device);
int kgsl_mmu_start(struct kgsl_device *device);
int kgsl_mmu_close(struct kgsl_device *device);
int kgsl_mmu_map(struct kgsl_pagetable *pagetable,
struct kgsl_memdesc *memdesc,
unsigned int protflags);
struct kgsl_memdesc *memdesc);
int kgsl_mmu_map_global(struct kgsl_pagetable *pagetable,
struct kgsl_memdesc *memdesc, unsigned int protflags);
struct kgsl_memdesc *memdesc);
int kgsl_mmu_unmap(struct kgsl_pagetable *pagetable,
struct kgsl_memdesc *memdesc);
unsigned int kgsl_virtaddr_to_physaddr(void *virtaddr);
void kgsl_setstate(struct kgsl_mmu *mmu, unsigned int context_id,
int kgsl_setstate(struct kgsl_mmu *mmu, unsigned int context_id,
uint32_t flags);
int kgsl_mmu_get_ptname_from_ptbase(struct kgsl_mmu *mmu,
unsigned int pt_base);
@ -240,19 +241,23 @@ static inline unsigned int kgsl_mmu_get_current_ptbase(struct kgsl_mmu *mmu)
return 0;
}
static inline void kgsl_mmu_setstate(struct kgsl_mmu *mmu,
static inline int kgsl_mmu_setstate(struct kgsl_mmu *mmu,
struct kgsl_pagetable *pagetable,
unsigned int context_id)
{
if (mmu->mmu_ops && mmu->mmu_ops->mmu_setstate)
mmu->mmu_ops->mmu_setstate(mmu, pagetable, context_id);
return mmu->mmu_ops->mmu_setstate(mmu, pagetable, context_id);
return 0;
}
static inline void kgsl_mmu_device_setstate(struct kgsl_mmu *mmu,
static inline int kgsl_mmu_device_setstate(struct kgsl_mmu *mmu,
uint32_t flags)
{
if (mmu->mmu_ops && mmu->mmu_ops->mmu_device_setstate)
mmu->mmu_ops->mmu_device_setstate(mmu, flags);
return mmu->mmu_ops->mmu_device_setstate(mmu, flags);
return 0;
}
static inline void kgsl_mmu_stop(struct kgsl_mmu *mmu)
@ -299,6 +304,12 @@ static inline int kgsl_mmu_enable_clk(struct kgsl_mmu *mmu,
return 0;
}
static inline void kgsl_mmu_disable_clk(struct kgsl_mmu *mmu)
{
if (mmu->mmu_ops && mmu->mmu_ops->mmu_disable_clk)
mmu->mmu_ops->mmu_disable_clk(mmu);
}
static inline void kgsl_mmu_disable_clk_on_ts(struct kgsl_mmu *mmu,
unsigned int ts, bool ts_valid)
{


@ -1012,6 +1012,14 @@ void kgsl_pwrctrl_close(struct kgsl_device *device)
pwr->power_flags = 0;
}
/**
* kgsl_idle_check() - Work function for GPU interrupts and idle timeouts.
* @device: The device
*
* This function is called for work that is queued by the interrupt
* handler or the idle timer. It attempts to transition to a clocks
* off state if the active_cnt is 0 and the hardware is idle.
*/
void kgsl_idle_check(struct work_struct *work)
{
struct kgsl_device *device = container_of(work, struct kgsl_device,
@ -1021,15 +1029,24 @@ void kgsl_idle_check(struct work_struct *work)
return;
mutex_lock(&device->mutex);
if (device->state & (KGSL_STATE_ACTIVE | KGSL_STATE_NAP)) {
kgsl_pwrscale_idle(device);
kgsl_pwrscale_idle(device);
if (device->state == KGSL_STATE_ACTIVE
|| device->state == KGSL_STATE_NAP) {
/* If we failed to sleep then reset the timer and try again */
if (kgsl_pwrctrl_sleep(device) != 0) {
kgsl_pwrctrl_request_state(device, KGSL_STATE_NONE);
mod_timer(&device->idle_timer,
jiffies +
device->pwrctrl.interval_timeout);
/* If the GPU has been too busy to sleep, make sure *
* that is acurately reflected in the % busy numbers. */
/*
* If the GPU has been too busy to sleep, make sure
 * that is accurately reflected in the % busy numbers.
*/
device->pwrctrl.clk_stats.no_nap_cnt++;
if (device->pwrctrl.clk_stats.no_nap_cnt >
UPDATE_BUSY) {
@ -1037,13 +1054,11 @@ void kgsl_idle_check(struct work_struct *work)
device->pwrctrl.clk_stats.no_nap_cnt = 0;
}
}
} else if (device->state & (KGSL_STATE_HUNG |
KGSL_STATE_DUMP_AND_FT)) {
kgsl_pwrctrl_request_state(device, KGSL_STATE_NONE);
}
mutex_unlock(&device->mutex);
}
EXPORT_SYMBOL(kgsl_idle_check);
void kgsl_timer(unsigned long data)
{
@ -1061,54 +1076,26 @@ void kgsl_timer(unsigned long data)
}
}
/**
* kgsl_pre_hwaccess - Enforce preconditions for touching registers
* @device: The device
*
* This function ensures that the correct lock is held and that the GPU
* clock is on immediately before a register is read or written. Note
* that this function does not check active_cnt because the registers
* must be accessed during device start and stop, when the active_cnt
* may legitimately be 0.
*/
void kgsl_pre_hwaccess(struct kgsl_device *device)
{
/* In order to touch a register you must hold the device mutex...*/
BUG_ON(!mutex_is_locked(&device->mutex));
switch (device->state) {
case KGSL_STATE_ACTIVE:
return;
case KGSL_STATE_NAP:
case KGSL_STATE_SLEEP:
case KGSL_STATE_SLUMBER:
kgsl_pwrctrl_wake(device);
break;
case KGSL_STATE_SUSPEND:
kgsl_check_suspended(device);
break;
case KGSL_STATE_INIT:
case KGSL_STATE_HUNG:
case KGSL_STATE_DUMP_AND_FT:
if (test_bit(KGSL_PWRFLAGS_CLK_ON,
&device->pwrctrl.power_flags))
break;
else
KGSL_PWR_ERR(device,
"hw access while clocks off from state %d\n",
device->state);
break;
default:
KGSL_PWR_ERR(device, "hw access while in unknown state %d\n",
device->state);
break;
}
/* and have the clock on! */
BUG_ON(!test_bit(KGSL_PWRFLAGS_CLK_ON, &device->pwrctrl.power_flags));
}
EXPORT_SYMBOL(kgsl_pre_hwaccess);
void kgsl_check_suspended(struct kgsl_device *device)
{
if (device->requested_state == KGSL_STATE_SUSPEND ||
device->state == KGSL_STATE_SUSPEND) {
mutex_unlock(&device->mutex);
wait_for_completion(&device->hwaccess_gate);
mutex_lock(&device->mutex);
} else if (device->state == KGSL_STATE_DUMP_AND_FT) {
mutex_unlock(&device->mutex);
wait_for_completion(&device->ft_gate);
mutex_lock(&device->mutex);
} else if (device->state == KGSL_STATE_SLUMBER)
kgsl_pwrctrl_wake(device);
}
static int
_nap(struct kgsl_device *device)
{
@ -1187,6 +1174,8 @@ _slumber(struct kgsl_device *device)
case KGSL_STATE_NAP:
case KGSL_STATE_SLEEP:
del_timer_sync(&device->idle_timer);
/* make sure power is on to stop the device */
kgsl_pwrctrl_enable(device);
device->ftbl->suspend_context(device);
device->ftbl->stop(device);
_sleep_accounting(device);
@ -1236,9 +1225,9 @@ EXPORT_SYMBOL(kgsl_pwrctrl_sleep);
/******************************************************************/
/* Caller must hold the device mutex. */
void kgsl_pwrctrl_wake(struct kgsl_device *device)
int kgsl_pwrctrl_wake(struct kgsl_device *device)
{
int status;
int status = 0;
unsigned int context_id;
unsigned int state = device->state;
unsigned int ts_processed = 0xdeaddead;
@ -1247,7 +1236,7 @@ void kgsl_pwrctrl_wake(struct kgsl_device *device)
kgsl_pwrctrl_request_state(device, KGSL_STATE_ACTIVE);
switch (device->state) {
case KGSL_STATE_SLUMBER:
status = device->ftbl->start(device, 0);
status = device->ftbl->start(device);
if (status) {
kgsl_pwrctrl_request_state(device, KGSL_STATE_NONE);
KGSL_DRV_ERR(device, "start failed %d\n", status);
@ -1276,9 +1265,6 @@ void kgsl_pwrctrl_wake(struct kgsl_device *device)
/* Enable state before turning on irq */
kgsl_pwrctrl_set_state(device, KGSL_STATE_ACTIVE);
kgsl_pwrctrl_irq(device, KGSL_PWRFLAGS_ON);
/* Re-enable HW access */
mod_timer(&device->idle_timer,
jiffies + device->pwrctrl.interval_timeout);
pm_qos_update_request(&device->pm_qos_req_dma,
GPU_SWFI_LATENCY);
case KGSL_STATE_ACTIVE:
@ -1288,8 +1274,10 @@ void kgsl_pwrctrl_wake(struct kgsl_device *device)
KGSL_PWR_WARN(device, "unhandled state %s\n",
kgsl_pwrstate_to_str(device->state));
kgsl_pwrctrl_request_state(device, KGSL_STATE_NONE);
status = -EINVAL;
break;
}
return status;
}
EXPORT_SYMBOL(kgsl_pwrctrl_wake);
@ -1342,10 +1330,6 @@ const char *kgsl_pwrstate_to_str(unsigned int state)
return "SLEEP";
case KGSL_STATE_SUSPEND:
return "SUSPEND";
case KGSL_STATE_HUNG:
return "HUNG";
case KGSL_STATE_DUMP_AND_FT:
return "DNR";
case KGSL_STATE_SLUMBER:
return "SLUMBER";
default:
@ -1355,3 +1339,118 @@ const char *kgsl_pwrstate_to_str(unsigned int state)
}
EXPORT_SYMBOL(kgsl_pwrstate_to_str);
/**
* kgsl_active_count_get() - Increase the device active count
* @device: Pointer to a KGSL device
*
* Increase the active count for the KGSL device and turn on
* clocks if this is the first reference. Code paths that need
* to touch the hardware or wait for the hardware to complete
* an operation must hold an active count reference until they
* are finished. An error code will be returned if waking the
 * device fails. The device mutex must be held while calling
 * this function.
*/
int kgsl_active_count_get(struct kgsl_device *device)
{
int ret = 0;
BUG_ON(!mutex_is_locked(&device->mutex));
if (atomic_read(&device->active_cnt) == 0) {
if (device->requested_state == KGSL_STATE_SUSPEND ||
device->state == KGSL_STATE_SUSPEND) {
mutex_unlock(&device->mutex);
wait_for_completion(&device->hwaccess_gate);
mutex_lock(&device->mutex);
}
/* Stop the idle timer */
del_timer_sync(&device->idle_timer);
ret = kgsl_pwrctrl_wake(device);
}
if (ret == 0)
atomic_inc(&device->active_cnt);
trace_kgsl_active_count(device,
(unsigned long) __builtin_return_address(0));
return ret;
}
EXPORT_SYMBOL(kgsl_active_count_get);
/**
* kgsl_active_count_get_light() - Increase the device active count
* @device: Pointer to a KGSL device
*
* Increase the active count for the KGSL device WITHOUT
* turning on the clocks based on the assumption that the clocks are already
* on from a previous active_count_get(). Currently this is only used for
* creating kgsl_events.
*/
int kgsl_active_count_get_light(struct kgsl_device *device)
{
if (atomic_inc_not_zero(&device->active_cnt) == 0) {
dev_WARN_ONCE(device->dev, 1, "active count is 0!\n");
return -EINVAL;
}
trace_kgsl_active_count(device,
(unsigned long) __builtin_return_address(0));
return 0;
}
EXPORT_SYMBOL(kgsl_active_count_get_light);
/**
* kgsl_active_count_put() - Decrease the device active count
* @device: Pointer to a KGSL device
*
* Decrease the active count for the KGSL device and turn off
* clocks if there are no remaining references. This function will
* transition the device to NAP if there are no other pending state
* changes. It also completes the suspend gate. The device mutex must
* be held while calling this function.
*/
void kgsl_active_count_put(struct kgsl_device *device)
{
BUG_ON(!mutex_is_locked(&device->mutex));
BUG_ON(atomic_read(&device->active_cnt) == 0);
kgsl_pwrscale_idle(device);
if (atomic_dec_and_test(&device->active_cnt)) {
INIT_COMPLETION(device->suspend_gate);
if (device->pwrctrl.nap_allowed == true) {
/* Request nap */
kgsl_pwrctrl_request_state(device, KGSL_STATE_NAP);
kgsl_pwrctrl_sleep(device);
}
mod_timer(&device->idle_timer,
jiffies + device->pwrctrl.interval_timeout);
complete(&device->suspend_gate);
}
trace_kgsl_active_count(device,
(unsigned long) __builtin_return_address(0));
}
EXPORT_SYMBOL(kgsl_active_count_put);
/**
* kgsl_active_count_wait() - Wait for activity to finish.
* @device: Pointer to a KGSL device
*
* Block until all active_cnt users put() their reference.
*/
void kgsl_active_count_wait(struct kgsl_device *device)
{
BUG_ON(!mutex_is_locked(&device->mutex));
if (atomic_read(&device->active_cnt) != 0) {
mutex_unlock(&device->mutex);
wait_for_completion(&device->suspend_gate);
mutex_lock(&device->mutex);
}
}
EXPORT_SYMBOL(kgsl_active_count_wait);
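
For illustration (not part of the patch), the usage pattern the new
active-count API expects from code that touches the hardware; the caller is
hypothetical and the device mutex is assumed to be held:

static int example_touch_hardware(struct kgsl_device *device)
{
	int ret = kgsl_active_count_get(device);

	if (ret)
		return ret;	/* waking the device failed */

	/* ... registers may be accessed here, the clocks are on ... */

	kgsl_active_count_put(device);
	return 0;
}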


@ -93,9 +93,8 @@ void kgsl_pwrctrl_close(struct kgsl_device *device);
void kgsl_timer(unsigned long data);
void kgsl_idle_check(struct work_struct *work);
void kgsl_pre_hwaccess(struct kgsl_device *device);
void kgsl_check_suspended(struct kgsl_device *device);
int kgsl_pwrctrl_sleep(struct kgsl_device *device);
void kgsl_pwrctrl_wake(struct kgsl_device *device);
int kgsl_pwrctrl_wake(struct kgsl_device *device);
void kgsl_pwrctrl_pwrlevel_change(struct kgsl_device *device,
unsigned int level);
int kgsl_pwrctrl_init_sysfs(struct kgsl_device *device);
@ -109,4 +108,10 @@ static inline unsigned long kgsl_get_clkrate(struct clk *clk)
void kgsl_pwrctrl_set_state(struct kgsl_device *device, unsigned int state);
void kgsl_pwrctrl_request_state(struct kgsl_device *device, unsigned int state);
int kgsl_active_count_get(struct kgsl_device *device);
int kgsl_active_count_get_light(struct kgsl_device *device);
void kgsl_active_count_put(struct kgsl_device *device);
void kgsl_active_count_wait(struct kgsl_device *device);
#endif /* __KGSL_PWRCTRL_H */


@ -1,4 +1,4 @@
/* Copyright (c) 2010-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2010-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -241,6 +241,7 @@ void kgsl_pwrscale_busy(struct kgsl_device *device)
device->pwrscale.policy->busy(device,
&device->pwrscale);
}
EXPORT_SYMBOL(kgsl_pwrscale_busy);
void kgsl_pwrscale_idle(struct kgsl_device *device)
{


@ -65,14 +65,6 @@ struct mem_entry_stats {
mem_entry_max_show), \
}
/*
* One page allocation for a guard region to protect against over-zealous
* GPU pre-fetch
*/
static struct page *kgsl_guard_page;
/**
* Given a kobj, find the process structure attached to it
*/
@ -244,6 +236,29 @@ static int kgsl_drv_histogram_show(struct device *dev,
return len;
}
static int kgsl_drv_full_cache_threshold_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int ret;
unsigned int thresh;
ret = sscanf(buf, "%d", &thresh);
if (ret != 1)
return count;
kgsl_driver.full_cache_threshold = thresh;
return count;
}
static int kgsl_drv_full_cache_threshold_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
return snprintf(buf, PAGE_SIZE, "%d\n",
kgsl_driver.full_cache_threshold);
}
DEVICE_ATTR(vmalloc, 0444, kgsl_drv_memstat_show, NULL);
DEVICE_ATTR(vmalloc_max, 0444, kgsl_drv_memstat_show, NULL);
DEVICE_ATTR(page_alloc, 0444, kgsl_drv_memstat_show, NULL);
@ -253,6 +268,9 @@ DEVICE_ATTR(coherent_max, 0444, kgsl_drv_memstat_show, NULL);
DEVICE_ATTR(mapped, 0444, kgsl_drv_memstat_show, NULL);
DEVICE_ATTR(mapped_max, 0444, kgsl_drv_memstat_show, NULL);
DEVICE_ATTR(histogram, 0444, kgsl_drv_histogram_show, NULL);
DEVICE_ATTR(full_cache_threshold, 0644,
kgsl_drv_full_cache_threshold_show,
kgsl_drv_full_cache_threshold_store);
static const struct device_attribute *drv_attr_list[] = {
&dev_attr_vmalloc,
@ -264,6 +282,7 @@ static const struct device_attribute *drv_attr_list[] = {
&dev_attr_mapped,
&dev_attr_mapped_max,
&dev_attr_histogram,
&dev_attr_full_cache_threshold,
NULL
};
@ -366,10 +385,6 @@ static void kgsl_page_alloc_free(struct kgsl_memdesc *memdesc)
struct scatterlist *sg;
int sglen = memdesc->sglen;
/* Don't free the guard page if it was used */
if (memdesc->priv & KGSL_MEMDESC_GUARD_PAGE)
sglen--;
kgsl_driver.stats.page_alloc -= memdesc->size;
if (memdesc->hostptr) {
@ -407,10 +422,6 @@ static int kgsl_page_alloc_map_kernel(struct kgsl_memdesc *memdesc)
int sglen = memdesc->sglen;
int i, count = 0;
/* Don't map the guard page if it exists */
if (memdesc->priv & KGSL_MEMDESC_GUARD_PAGE)
sglen--;
/* create a list of pages to call vmap */
pages = vmalloc(npages * sizeof(struct page *));
if (!pages) {
@ -568,14 +579,6 @@ _kgsl_sharedmem_page_alloc(struct kgsl_memdesc *memdesc,
sglen_alloc = PAGE_ALIGN(size) >> PAGE_SHIFT;
/*
* Add guard page to the end of the allocation when the
* IOMMU is in use.
*/
if (kgsl_mmu_get_mmutype() == KGSL_MMU_TYPE_IOMMU)
sglen_alloc++;
memdesc->size = size;
memdesc->pagetable = pagetable;
memdesc->ops = &kgsl_page_alloc_ops;
@ -648,26 +651,6 @@ _kgsl_sharedmem_page_alloc(struct kgsl_memdesc *memdesc,
len -= page_size;
}
/* Add the guard page to the end of the sglist */
if (kgsl_mmu_get_mmutype() == KGSL_MMU_TYPE_IOMMU) {
/*
* It doesn't matter if we use GFP_ZERO here, this never
* gets mapped, and we only allocate it once in the life
* of the system
*/
if (kgsl_guard_page == NULL)
kgsl_guard_page = alloc_page(GFP_KERNEL | __GFP_ZERO |
__GFP_HIGHMEM);
if (kgsl_guard_page != NULL) {
sg_set_page(&memdesc->sg[sglen++], kgsl_guard_page,
PAGE_SIZE, 0);
memdesc->priv |= KGSL_MEMDESC_GUARD_PAGE;
}
}
memdesc->sglen = sglen;
/*


@ -1,4 +1,4 @@
/* Copyright (c) 2002,2007-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2002,2007-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -19,6 +19,7 @@
#include "kgsl_mmu.h"
#include <linux/slab.h>
#include <linux/kmemleak.h>
#include <linux/iommu.h>
#include "kgsl_log.h"
@ -200,15 +201,24 @@ kgsl_memdesc_has_guard_page(const struct kgsl_memdesc *memdesc)
/*
* kgsl_memdesc_protflags - get mmu protection flags
* @memdesc - the memdesc
* Returns a mask of GSL_PT_PAGE* values based on the
* memdesc flags.
* Returns a mask of GSL_PT_PAGE* or IOMMU* values based
* on the memdesc flags.
*/
static inline unsigned int
kgsl_memdesc_protflags(const struct kgsl_memdesc *memdesc)
{
unsigned int protflags = GSL_PT_PAGE_RV;
if (!(memdesc->flags & KGSL_MEMFLAGS_GPUREADONLY))
protflags |= GSL_PT_PAGE_WV;
unsigned int protflags = 0;
enum kgsl_mmutype mmutype = kgsl_mmu_get_mmutype();
if (mmutype == KGSL_MMU_TYPE_GPU) {
protflags = GSL_PT_PAGE_RV;
if (!(memdesc->flags & KGSL_MEMFLAGS_GPUREADONLY))
protflags |= GSL_PT_PAGE_WV;
} else if (mmutype == KGSL_MMU_TYPE_IOMMU) {
protflags = IOMMU_READ;
if (!(memdesc->flags & KGSL_MEMFLAGS_GPUREADONLY))
protflags |= IOMMU_WRITE;
}
return protflags;
}
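
A hedged sketch (not in the patch) of the new calling convention: since the
protection flags are now derived from the memdesc, a caller asks for a
read-only GPU mapping by setting the flag before mapping rather than passing
protflags to kgsl_mmu_map(). The helper below is illustrative only.

static int example_map_readonly(struct kgsl_pagetable *pagetable,
				struct kgsl_memdesc *memdesc)
{
	memdesc->flags |= KGSL_MEMFLAGS_GPUREADONLY;
	return kgsl_mmu_map(pagetable, memdesc);
}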
@ -253,8 +263,7 @@ kgsl_allocate(struct kgsl_memdesc *memdesc,
ret = kgsl_sharedmem_page_alloc(memdesc, pagetable, size);
if (ret)
return ret;
ret = kgsl_mmu_map(pagetable, memdesc,
kgsl_memdesc_protflags(memdesc));
ret = kgsl_mmu_map(pagetable, memdesc);
if (ret)
kgsl_sharedmem_free(memdesc);
return ret;
@ -291,15 +300,4 @@ kgsl_allocate_contiguous(struct kgsl_memdesc *memdesc, size_t size)
return ret;
}
static inline int kgsl_sg_size(struct scatterlist *sg, int sglen)
{
int i, size = 0;
struct scatterlist *s;
for_each_sg(sg, s, sglen, i) {
size += s->length;
}
return size;
}
#endif /* __KGSL_SHAREDMEM_H */


@ -1,4 +1,4 @@
/* Copyright (c) 2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2012-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -106,7 +106,12 @@ static int snapshot_context_info(int id, void *ptr, void *data)
{
struct kgsl_snapshot_linux_context *header = _ctxtptr;
struct kgsl_context *context = ptr;
struct kgsl_device *device = context->dev_priv->device;
struct kgsl_device *device;
if (context)
device = context->device;
else
device = (struct kgsl_device *)data;
header->id = id;
@ -139,9 +144,12 @@ static int snapshot_os(struct kgsl_device *device,
/* Figure out how many active contexts there are - these will
* be appended on the end of the structure */
rcu_read_lock();
read_lock(&device->context_lock);
idr_for_each(&device->context_idr, snapshot_context_count, &ctxtcount);
rcu_read_unlock();
read_unlock(&device->context_lock);
/* Increment ctxcount for the global memstore */
ctxtcount++;
size += ctxtcount * sizeof(struct kgsl_snapshot_linux_context);
@ -171,8 +179,9 @@ static int snapshot_os(struct kgsl_device *device,
header->grpclk = kgsl_get_clkrate(pwr->grp_clks[0]);
header->busclk = kgsl_get_clkrate(pwr->ebi1_clk);
/* Future proof for per-context timestamps */
header->current_context = -1;
/* Save the last active context */
kgsl_sharedmem_readl(&device->memstore, &header->current_context,
KGSL_MEMSTORE_OFFSET(KGSL_MEMSTORE_GLOBAL, current_context));
/* Get the current PT base */
header->ptbase = kgsl_mmu_get_current_ptbase(&device->mmu);
@ -187,11 +196,17 @@ static int snapshot_os(struct kgsl_device *device,
header->ctxtcount = ctxtcount;
/* append information for each context */
_ctxtptr = snapshot + sizeof(*header);
rcu_read_lock();
/* append information for the global context */
snapshot_context_info(KGSL_MEMSTORE_GLOBAL, NULL, device);
/* append information for each context */
read_lock(&device->context_lock);
idr_for_each(&device->context_idr, snapshot_context_info, NULL);
rcu_read_unlock();
read_unlock(&device->context_lock);
/* Return the size of the data segment */
return size;
}
@ -286,7 +301,7 @@ static void kgsl_snapshot_put_object(struct kgsl_device *device,
{
list_del(&obj->node);
obj->entry->flags &= ~KGSL_MEM_ENTRY_FROZEN;
obj->entry->memdesc.priv &= ~KGSL_MEMDESC_FROZEN;
kgsl_mem_entry_put(obj->entry);
kfree(obj);
@ -317,6 +332,7 @@ int kgsl_snapshot_have_object(struct kgsl_device *device, unsigned int ptbase,
return 0;
}
EXPORT_SYMBOL(kgsl_snapshot_have_object);
/* kgsl_snapshot_get_object - Mark a GPU buffer to be frozen
* @device - the device that is being snapshotted
@ -336,6 +352,10 @@ int kgsl_snapshot_get_object(struct kgsl_device *device, unsigned int ptbase,
struct kgsl_mem_entry *entry;
struct kgsl_snapshot_object *obj;
int offset;
int ret = -EINVAL;
if (!gpuaddr)
return 0;
entry = kgsl_get_mem_entry(device, ptbase, gpuaddr, size);
@ -349,7 +369,7 @@ int kgsl_snapshot_get_object(struct kgsl_device *device, unsigned int ptbase,
if (entry->memtype != KGSL_MEM_ENTRY_KERNEL) {
KGSL_DRV_ERR(device,
"Only internal GPU buffers can be frozen\n");
return -EINVAL;
goto err_put;
}
/*
@ -372,36 +392,33 @@ int kgsl_snapshot_get_object(struct kgsl_device *device, unsigned int ptbase,
if (size + offset > entry->memdesc.size) {
KGSL_DRV_ERR(device, "Invalid size for GPU buffer %8.8X\n",
gpuaddr);
return -EINVAL;
goto err_put;
}
/* If the buffer is already on the list, skip it */
list_for_each_entry(obj, &device->snapshot_obj_list, node) {
if (obj->gpuaddr == gpuaddr && obj->ptbase == ptbase) {
/* If the size is different, use the new size */
if (obj->size != size)
/* If the size is different, use the bigger size */
if (obj->size < size)
obj->size = size;
return 0;
ret = 0;
goto err_put;
}
}
if (kgsl_memdesc_map(&entry->memdesc) == NULL) {
KGSL_DRV_ERR(device, "Unable to map GPU buffer %X\n",
gpuaddr);
return -EINVAL;
goto err_put;
}
obj = kzalloc(sizeof(*obj), GFP_KERNEL);
if (obj == NULL) {
KGSL_DRV_ERR(device, "Unable to allocate memory\n");
return -EINVAL;
goto err_put;
}
/* Ref count the mem entry */
kgsl_mem_entry_get(entry);
obj->type = type;
obj->entry = entry;
obj->gpuaddr = gpuaddr;
@ -419,12 +436,15 @@ int kgsl_snapshot_get_object(struct kgsl_device *device, unsigned int ptbase,
* 0 so it doesn't get counted twice
*/
if (entry->flags & KGSL_MEM_ENTRY_FROZEN)
return 0;
ret = (entry->memdesc.priv & KGSL_MEMDESC_FROZEN) ? 0
: entry->memdesc.size;
entry->flags |= KGSL_MEM_ENTRY_FROZEN;
entry->memdesc.priv |= KGSL_MEMDESC_FROZEN;
return entry->memdesc.size;
return ret;
err_put:
kgsl_mem_entry_put(entry);
return ret;
}
EXPORT_SYMBOL(kgsl_snapshot_get_object);


@ -11,6 +11,7 @@
*
*/
#include <linux/err.h>
#include <linux/file.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
@ -225,3 +226,65 @@ void kgsl_sync_timeline_destroy(struct kgsl_context *context)
{
sync_timeline_destroy(context->timeline);
}
static void kgsl_sync_callback(struct sync_fence *fence,
struct sync_fence_waiter *waiter)
{
struct kgsl_sync_fence_waiter *kwaiter =
(struct kgsl_sync_fence_waiter *) waiter;
kwaiter->func(kwaiter->priv);
sync_fence_put(kwaiter->fence);
kfree(kwaiter);
}
struct kgsl_sync_fence_waiter *kgsl_sync_fence_async_wait(int fd,
void (*func)(void *priv), void *priv)
{
struct kgsl_sync_fence_waiter *kwaiter;
struct sync_fence *fence;
int status;
fence = sync_fence_fdget(fd);
if (fence == NULL)
return ERR_PTR(-EINVAL);
/* create the waiter */
kwaiter = kzalloc(sizeof(*kwaiter), GFP_KERNEL);
if (kwaiter == NULL) {
sync_fence_put(fence);
return ERR_PTR(-ENOMEM);
}
kwaiter->fence = fence;
kwaiter->priv = priv;
kwaiter->func = func;
sync_fence_waiter_init((struct sync_fence_waiter *) kwaiter,
kgsl_sync_callback);
/* if status then error or signaled */
status = sync_fence_wait_async(fence,
(struct sync_fence_waiter *) kwaiter);
if (status) {
kfree(kwaiter);
sync_fence_put(fence);
if (status < 0)
kwaiter = ERR_PTR(status);
else
kwaiter = NULL;
}
return kwaiter;
}
int kgsl_sync_fence_async_cancel(struct kgsl_sync_fence_waiter *kwaiter)
{
if (kwaiter == NULL)
return 0;
if (sync_fence_cancel_async(kwaiter->fence,
(struct sync_fence_waiter *) kwaiter) == 0) {
sync_fence_put(kwaiter->fence);
kfree(kwaiter);
return 1;
}
return 0;
}
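
An illustrative sketch (not part of the patch) of how a caller is expected to
use the new asynchronous fence waiter; the callback and its private data are
hypothetical:

static void example_fence_signalled(void *priv)
{
	/* the dependency is satisfied; requeue whatever was waiting */
}

static int example_wait_on_fence(int fd, void *priv)
{
	struct kgsl_sync_fence_waiter *waiter;

	waiter = kgsl_sync_fence_async_wait(fd, example_fence_signalled, priv);
	if (IS_ERR(waiter))
		return PTR_ERR(waiter);	/* bad fd or allocation failure */
	if (waiter == NULL)
		return 0;	/* fence already signaled, no callback coming */

	/*
	 * Later, to stop waiting: returns 1 if the waiter was cancelled
	 * before the callback ran; the waiter is freed in either case.
	 */
	kgsl_sync_fence_async_cancel(waiter);
	return 0;
}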


@ -1,4 +1,4 @@
/* Copyright (c) 2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2012-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -26,6 +26,13 @@ struct kgsl_sync_pt {
unsigned int timestamp;
};
struct kgsl_sync_fence_waiter {
struct sync_fence_waiter waiter;
struct sync_fence *fence;
void (*func)(void *priv);
void *priv;
};
#if defined(CONFIG_SYNC)
struct sync_pt *kgsl_sync_pt_create(struct sync_timeline *timeline,
unsigned int timestamp);
@ -37,6 +44,9 @@ int kgsl_sync_timeline_create(struct kgsl_context *context);
void kgsl_sync_timeline_signal(struct sync_timeline *timeline,
unsigned int timestamp);
void kgsl_sync_timeline_destroy(struct kgsl_context *context);
struct kgsl_sync_fence_waiter *kgsl_sync_fence_async_wait(int fd,
void (*func)(void *priv), void *priv);
int kgsl_sync_fence_async_cancel(struct kgsl_sync_fence_waiter *waiter);
#else
static inline struct sync_pt
*kgsl_sync_pt_create(struct sync_timeline *timeline, unsigned int timestamp)
@ -70,6 +80,20 @@ kgsl_sync_timeline_signal(struct sync_timeline *timeline,
static inline void kgsl_sync_timeline_destroy(struct kgsl_context *context)
{
}
static inline struct
kgsl_sync_fence_waiter *kgsl_sync_fence_async_wait(int fd,
void (*func)(void *priv), void *priv)
{
return NULL;
}
static inline int
kgsl_sync_fence_async_cancel(struct kgsl_sync_fence_waiter *waiter)
{
return 1;
}
#endif
#endif /* __KGSL_SYNC_H */


@ -37,14 +37,13 @@ TRACE_EVENT(kgsl_issueibcmds,
TP_PROTO(struct kgsl_device *device,
int drawctxt_id,
struct kgsl_ibdesc *ibdesc,
int numibs,
struct kgsl_cmdbatch *cmdbatch,
int timestamp,
int flags,
int result,
unsigned int type),
TP_ARGS(device, drawctxt_id, ibdesc, numibs, timestamp, flags,
TP_ARGS(device, drawctxt_id, cmdbatch, timestamp, flags,
result, type),
TP_STRUCT__entry(
@ -61,8 +60,8 @@ TRACE_EVENT(kgsl_issueibcmds,
TP_fast_assign(
__assign_str(device_name, device->name);
__entry->drawctxt_id = drawctxt_id;
__entry->ibdesc_addr = ibdesc[0].gpuaddr;
__entry->numibs = numibs;
__entry->ibdesc_addr = cmdbatch->ibdesc[0].gpuaddr;
__entry->numibs = cmdbatch->ibcount;
__entry->timestamp = timestamp;
__entry->flags = flags;
__entry->result = result;
@ -479,6 +478,67 @@ TRACE_EVENT(kgsl_mem_free,
)
);
TRACE_EVENT(kgsl_mem_sync_cache,
TP_PROTO(struct kgsl_mem_entry *mem_entry, unsigned int op),
TP_ARGS(mem_entry, op),
TP_STRUCT__entry(
__field(unsigned int, gpuaddr)
__field(unsigned int, size)
__array(char, usage, 16)
__field(unsigned int, tgid)
__field(unsigned int, id)
__field(unsigned int, op)
),
TP_fast_assign(
__entry->gpuaddr = mem_entry->memdesc.gpuaddr;
__entry->size = mem_entry->memdesc.size;
__entry->tgid = mem_entry->priv->pid;
__entry->id = mem_entry->id;
kgsl_get_memory_usage(__entry->usage, sizeof(__entry->usage),
mem_entry->memdesc.flags);
__entry->op = op;
),
TP_printk(
"gpuaddr=0x%08x size=%d tgid=%d usage=%s id=%d op=%c%c",
__entry->gpuaddr, __entry->size, __entry->tgid, __entry->usage,
__entry->id,
(__entry->op & KGSL_GPUMEM_CACHE_CLEAN) ? 'c' : '.',
(__entry->op & KGSL_GPUMEM_CACHE_INV) ? 'i' : '.'
)
);
TRACE_EVENT(kgsl_mem_sync_full_cache,
TP_PROTO(unsigned int num_bufs, unsigned int bulk_size,
unsigned int op),
TP_ARGS(num_bufs, bulk_size, op),
TP_STRUCT__entry(
__field(unsigned int, num_bufs)
__field(unsigned int, bulk_size)
__field(unsigned int, op)
),
TP_fast_assign(
__entry->num_bufs = num_bufs;
__entry->bulk_size = bulk_size;
__entry->op = op;
),
TP_printk(
"num_bufs=%d bulk_size=%d op=%c%c",
__entry->num_bufs, __entry->bulk_size,
(__entry->op & KGSL_GPUMEM_CACHE_CLEAN) ? 'c' : '.',
(__entry->op & KGSL_GPUMEM_CACHE_INV) ? 'i' : '.'
)
);
DECLARE_EVENT_CLASS(kgsl_mem_timestamp_template,
TP_PROTO(struct kgsl_device *device, struct kgsl_mem_entry *mem_entry,
@ -591,6 +651,28 @@ TRACE_EVENT(kgsl_context_detach,
)
);
TRACE_EVENT(kgsl_context_destroy,
TP_PROTO(struct kgsl_device *device, struct kgsl_context *context),
TP_ARGS(device, context),
TP_STRUCT__entry(
__string(device_name, device->name)
__field(unsigned int, id)
),
TP_fast_assign(
__assign_str(device_name, device->name);
__entry->id = context->id;
),
TP_printk(
"d_name=%s ctx=%u",
__get_str(device_name), __entry->id
)
);
TRACE_EVENT(kgsl_mmu_pagefault,
TP_PROTO(struct kgsl_device *device, unsigned int page,
@ -681,6 +763,30 @@ TRACE_EVENT(kgsl_fire_event,
__entry->id, __entry->ts, __entry->type, __entry->age)
);
TRACE_EVENT(kgsl_active_count,
TP_PROTO(struct kgsl_device *device, unsigned long ip),
TP_ARGS(device, ip),
TP_STRUCT__entry(
__string(device_name, device->name)
__field(unsigned int, count)
__field(unsigned long, ip)
),
TP_fast_assign(
__assign_str(device_name, device->name);
__entry->count = atomic_read(&device->active_cnt);
__entry->ip = ip;
),
TP_printk(
"d_name=%s active_cnt=%x func=%pf",
__get_str(device_name), __entry->count, (void *) __entry->ip
)
);
#endif /* _KGSL_TRACE_H */
/* This part must be outside protection */


@ -17,7 +17,6 @@
#include "kgsl.h"
#include "kgsl_cffdump.h"
#include "kgsl_sharedmem.h"
#include "kgsl_trace.h"
#include "z180.h"
#include "z180_reg.h"
@ -94,7 +93,8 @@ enum z180_cmdwindow_type {
#define Z180_CMDWINDOW_TARGET_SHIFT 0
#define Z180_CMDWINDOW_ADDR_SHIFT 8
static int z180_start(struct kgsl_device *device, unsigned int init_ram);
static int z180_init(struct kgsl_device *device);
static int z180_start(struct kgsl_device *device);
static int z180_stop(struct kgsl_device *device);
static int z180_wait(struct kgsl_device *device,
struct kgsl_context *context,
@ -245,20 +245,17 @@ static int z180_setup_pt(struct kgsl_device *device,
int result = 0;
struct z180_device *z180_dev = Z180_DEVICE(device);
result = kgsl_mmu_map_global(pagetable, &device->mmu.setstate_memory,
GSL_PT_PAGE_RV | GSL_PT_PAGE_WV);
result = kgsl_mmu_map_global(pagetable, &device->mmu.setstate_memory);
if (result)
goto error;
result = kgsl_mmu_map_global(pagetable, &device->memstore,
GSL_PT_PAGE_RV | GSL_PT_PAGE_WV);
result = kgsl_mmu_map_global(pagetable, &device->memstore);
if (result)
goto error_unmap_dummy;
result = kgsl_mmu_map_global(pagetable,
&z180_dev->ringbuffer.cmdbufdesc,
GSL_PT_PAGE_RV);
&z180_dev->ringbuffer.cmdbufdesc);
if (result)
goto error_unmap_memstore;
/*
@ -323,16 +320,11 @@ static void addcmd(struct z180_ringbuffer *rb, unsigned int timestamp,
*p++ = ADDR_VGV3_LAST << 24;
}
static void z180_cmdstream_start(struct kgsl_device *device, int init_ram)
static void z180_cmdstream_start(struct kgsl_device *device)
{
struct z180_device *z180_dev = Z180_DEVICE(device);
unsigned int cmd = VGV3_NEXTCMD_JUMP << VGV3_NEXTCMD_NEXTCMD_FSHIFT;
if (init_ram) {
z180_dev->timestamp = 0;
z180_dev->current_timestamp = 0;
}
addmarker(&z180_dev->ringbuffer, 0);
z180_cmdwindow_write(device, ADDR_VGV3_MODE, 4);
@ -362,7 +354,13 @@ static int room_in_rb(struct z180_device *device)
return ts_diff < Z180_PACKET_COUNT;
}
static int z180_idle(struct kgsl_device *device)
/**
* z180_idle() - Idle the 2D device
* @device: Pointer to the KGSL device struct for the Z180
*
* Wait until the Z180 submission queue is idle
*/
int z180_idle(struct kgsl_device *device)
{
int status = 0;
struct z180_device *z180_dev = Z180_DEVICE(device);
@ -382,10 +380,8 @@ static int z180_idle(struct kgsl_device *device)
int
z180_cmdstream_issueibcmds(struct kgsl_device_private *dev_priv,
struct kgsl_context *context,
struct kgsl_ibdesc *ibdesc,
unsigned int numibs,
uint32_t *timestamp,
unsigned int ctrl)
struct kgsl_cmdbatch *cmdbatch,
uint32_t *timestamp)
{
long result = 0;
unsigned int ofs = PACKETSIZE_STATESTREAM * sizeof(unsigned int);
@ -398,6 +394,20 @@ z180_cmdstream_issueibcmds(struct kgsl_device_private *dev_priv,
struct kgsl_pagetable *pagetable = dev_priv->process_priv->pagetable;
struct z180_device *z180_dev = Z180_DEVICE(device);
unsigned int sizedwords;
unsigned int numibs;
struct kgsl_ibdesc *ibdesc;
mutex_lock(&device->mutex);
kgsl_active_count_get(device);
if (cmdbatch == NULL) {
result = -EINVAL;
goto error;
}
ibdesc = cmdbatch->ibdesc;
numibs = cmdbatch->ibcount;
if (device->state & KGSL_STATE_HUNG) {
result = -EINVAL;
@ -439,7 +449,7 @@ z180_cmdstream_issueibcmds(struct kgsl_device_private *dev_priv,
context->id, cmd, sizedwords);
/* context switch */
if ((context->id != (int)z180_dev->ringbuffer.prevctx) ||
(ctrl & KGSL_CONTEXT_CTX_SWITCH)) {
(cmdbatch->flags & KGSL_CONTEXT_CTX_SWITCH)) {
KGSL_CMD_INFO(device, "context switch %d -> %d\n",
context->id, z180_dev->ringbuffer.prevctx);
kgsl_mmu_setstate(&device->mmu, pagetable,
@ -447,10 +457,13 @@ z180_cmdstream_issueibcmds(struct kgsl_device_private *dev_priv,
cnt = PACKETSIZE_STATESTREAM;
ofs = 0;
}
kgsl_setstate(&device->mmu,
result = kgsl_setstate(&device->mmu,
KGSL_MEMSTORE_GLOBAL,
kgsl_mmu_pt_get_flags(device->mmu.hwpagetable,
device->id));
if (result < 0)
goto error;
result = wait_event_interruptible_timeout(device->wait_queue,
room_in_rb(z180_dev),
@ -491,9 +504,12 @@ z180_cmdstream_issueibcmds(struct kgsl_device_private *dev_priv,
z180_cmdwindow_write(device, ADDR_VGV3_CONTROL, cmd);
z180_cmdwindow_write(device, ADDR_VGV3_CONTROL, 0);
error:
kgsl_trace_issueibcmds(device, context->id, cmdbatch,
*timestamp, cmdbatch->flags, result, 0);
trace_kgsl_issueibcmds(device, context->id, ibdesc, numibs,
*timestamp, ctrl, result, 0);
kgsl_active_count_put(device);
mutex_unlock(&device->mutex);
return (int)result;
}
@ -503,6 +519,7 @@ static int z180_ringbuffer_init(struct kgsl_device *device)
struct z180_device *z180_dev = Z180_DEVICE(device);
memset(&z180_dev->ringbuffer, 0, sizeof(struct z180_ringbuffer));
z180_dev->ringbuffer.prevctx = Z180_INVALID_CONTEXT;
z180_dev->ringbuffer.cmdbufdesc.flags = KGSL_MEMFLAGS_GPUREADONLY;
return kgsl_allocate_contiguous(&z180_dev->ringbuffer.cmdbufdesc,
Z180_RB_SIZE);
}
@ -559,7 +576,17 @@ static int __devexit z180_remove(struct platform_device *pdev)
return 0;
}
static int z180_start(struct kgsl_device *device, unsigned int init_ram)
static int z180_init(struct kgsl_device *device)
{
struct z180_device *z180_dev = Z180_DEVICE(device);
z180_dev->timestamp = 0;
z180_dev->current_timestamp = 0;
return 0;
}
static int z180_start(struct kgsl_device *device)
{
int status = 0;
@ -576,7 +603,7 @@ static int z180_start(struct kgsl_device *device, unsigned int init_ram)
if (status)
goto error_clk_off;
z180_cmdstream_start(device, init_ram);
z180_cmdstream_start(device);
mod_timer(&device->idle_timer, jiffies + FIRST_TIMEOUT);
kgsl_pwrctrl_irq(device, KGSL_PWRFLAGS_ON);
@ -661,7 +688,7 @@ static int z180_getproperty(struct kgsl_device *device,
return status;
}
static unsigned int z180_isidle(struct kgsl_device *device)
static bool z180_isidle(struct kgsl_device *device)
{
struct z180_device *z180_dev = Z180_DEVICE(device);
@ -822,9 +849,9 @@ static int z180_waittimestamp(struct kgsl_device *device,
{
int status = -EINVAL;
/* Don't wait forever, set a max (10 sec) value for now */
/* Don't wait forever, set a max of Z180_IDLE_TIMEOUT */
if (msecs == -1)
msecs = 10 * MSEC_PER_SEC;
msecs = Z180_IDLE_TIMEOUT;
mutex_unlock(&device->mutex);
status = z180_wait(device, context, timestamp, msecs);
@ -858,11 +885,30 @@ static int z180_wait(struct kgsl_device *device,
return status;
}
static void
z180_drawctxt_destroy(struct kgsl_device *device,
struct kgsl_context *context)
struct kgsl_context *
z180_drawctxt_create(struct kgsl_device_private *dev_priv,
uint32_t *flags)
{
struct z180_device *z180_dev = Z180_DEVICE(device);
int ret;
struct kgsl_context *context = kzalloc(sizeof(*context), GFP_KERNEL);
if (context == NULL)
return ERR_PTR(-ENOMEM);
ret = kgsl_context_init(dev_priv, context);
if (ret != 0) {
kfree(context);
return ERR_PTR(ret);
}
return context;
}
static int
z180_drawctxt_detach(struct kgsl_context *context)
{
struct kgsl_device *device;
struct z180_device *z180_dev;
device = context->device;
z180_dev = Z180_DEVICE(device);
z180_idle(device);
@ -872,6 +918,14 @@ z180_drawctxt_destroy(struct kgsl_device *device,
kgsl_setstate(&device->mmu, KGSL_MEMSTORE_GLOBAL,
KGSL_MMUFLAGS_PTUPDATE);
}
return 0;
}
static void
z180_drawctxt_destroy(struct kgsl_context *context)
{
kfree(context);
}
static void z180_power_stats(struct kgsl_device *device,
@ -926,6 +980,7 @@ static const struct kgsl_functable z180_functable = {
.idle = z180_idle,
.isidle = z180_isidle,
.suspend_context = z180_suspend_context,
.init = z180_init,
.start = z180_start,
.stop = z180_stop,
.getproperty = z180_getproperty,
@ -938,8 +993,10 @@ static const struct kgsl_functable z180_functable = {
.irqctrl = z180_irqctrl,
.gpuid = z180_gpuid,
.irq_handler = z180_irq_handler,
.drain = z180_idle, /* drain == idle for the z180 */
/* Optional functions */
.drawctxt_create = NULL,
.drawctxt_create = z180_drawctxt_create,
.drawctxt_detach = z180_drawctxt_detach,
.drawctxt_destroy = z180_drawctxt_destroy,
.ioctl = NULL,
.postmortem_dump = z180_dump,

View file

@ -1,4 +1,4 @@
/* Copyright (c) 2008-2012, The Linux Foundation. All rights reserved.
/* Copyright (c) 2008-2013, The Linux Foundation. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
@ -29,7 +29,7 @@
#define Z180_DEFAULT_PWRSCALE_POLICY NULL
/* Wait a maximum of 20 seconds when trying to idle the core */
#define Z180_IDLE_TIMEOUT (10 * 1000)
#define Z180_IDLE_TIMEOUT (20 * 1000)
struct z180_ringbuffer {
unsigned int prevctx;
@ -45,5 +45,6 @@ struct z180_device {
};
int z180_dump(struct kgsl_device *, int);
int z180_idle(struct kgsl_device *);
#endif /* __Z180_H */

View file

@ -58,6 +58,8 @@ static void z180_dump_regs(struct kgsl_device *device)
unsigned int i;
unsigned int reg_val;
z180_idle(device);
KGSL_LOG_DUMP(device, "Z180 Register Dump\n");
for (i = 0; i < ARRAY_SIZE(regs_to_dump); i++) {
kgsl_regread(device,
@ -168,6 +170,7 @@ static void z180_dump_ib(struct kgsl_device *device)
KGSL_LOG_DUMP(device,
"Could not map IB to kernel memory, Ringbuffer Slot: %d\n",
rb_slot_num);
kgsl_mem_entry_put(entry);
continue;
}
@ -190,6 +193,7 @@ static void z180_dump_ib(struct kgsl_device *device)
linebuf);
}
KGSL_LOG_DUMP(device, "IB Dump Finished\n");
kgsl_mem_entry_put(entry);
}
}
}

View file

@ -1,5 +1,4 @@
/* Copyright (c) 2010-2012, The Linux Foundation. All rights reserved.
*
/* Copyright (c) 2010-2012, The Linux Foundation. All rights reserved. *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 and
* only version 2 as published by the Free Software Foundation.
@ -50,6 +49,9 @@ __asm__ __volatile__ ( \
#define MSM_IOMMU_ATTR_CACHED_WT 0x3
static int msm_iommu_unmap_range(struct iommu_domain *domain, unsigned int va,
unsigned int len);
static inline void clean_pte(unsigned long *start, unsigned long *end,
int redirect)
{
@ -907,6 +909,7 @@ static int msm_iommu_map_range(struct iommu_domain *domain, unsigned int va,
int prot)
{
unsigned int pa;
unsigned int start_va = va;
unsigned int offset = 0;
unsigned long *fl_table;
unsigned long *fl_pte;
@ -978,12 +981,6 @@ static int msm_iommu_map_range(struct iommu_domain *domain, unsigned int va,
chunk_offset = 0;
sg = sg_next(sg);
pa = get_phys_addr(sg);
if (pa == 0) {
pr_debug("No dma address for sg %p\n",
sg);
ret = -EINVAL;
goto fail;
}
}
continue;
}
@ -1037,12 +1034,6 @@ static int msm_iommu_map_range(struct iommu_domain *domain, unsigned int va,
chunk_offset = 0;
sg = sg_next(sg);
pa = get_phys_addr(sg);
if (pa == 0) {
pr_debug("No dma address for sg %p\n",
sg);
ret = -EINVAL;
goto fail;
}
}
}
@ -1055,6 +1046,8 @@ static int msm_iommu_map_range(struct iommu_domain *domain, unsigned int va,
__flush_iotlb(domain);
fail:
mutex_unlock(&msm_iommu_lock);
if (ret && offset > 0)
msm_iommu_unmap_range(domain, start_va, offset);
return ret;
}

View file

@ -351,11 +351,6 @@ int msm_iommu_pagetable_map_range(struct iommu_pt *pt, unsigned int va,
sl_offset = SL_OFFSET(va);
chunk_pa = get_phys_addr(sg);
if (chunk_pa == 0) {
pr_debug("No dma address for sg %p\n", sg);
ret = -EINVAL;
goto fail;
}
while (offset < len) {
/* Set up a 2nd level page table if one doesn't exist */
@ -399,12 +394,6 @@ int msm_iommu_pagetable_map_range(struct iommu_pt *pt, unsigned int va,
chunk_offset = 0;
sg = sg_next(sg);
chunk_pa = get_phys_addr(sg);
if (chunk_pa == 0) {
pr_debug("No dma address for sg %p\n",
sg);
ret = -EINVAL;
goto fail;
}
}
}

View file

@ -12,25 +12,27 @@
#define KGSL_VERSION_MINOR 14
/*context flags */
#define KGSL_CONTEXT_SAVE_GMEM 0x00000001
#define KGSL_CONTEXT_NO_GMEM_ALLOC 0x00000002
#define KGSL_CONTEXT_SUBMIT_IB_LIST 0x00000004
#define KGSL_CONTEXT_CTX_SWITCH 0x00000008
#define KGSL_CONTEXT_PREAMBLE 0x00000010
#define KGSL_CONTEXT_TRASH_STATE 0x00000020
#define KGSL_CONTEXT_PER_CONTEXT_TS 0x00000040
#define KGSL_CONTEXT_USER_GENERATED_TS 0x00000080
#define KGSL_CONTEXT_END_OF_FRAME 0x00000100
#define KGSL_CONTEXT_NO_FAULT_TOLERANCE 0x00000200
/* bits [12:15] are reserved for future use */
#define KGSL_CONTEXT_TYPE_MASK 0x01F00000
#define KGSL_CONTEXT_TYPE_SHIFT 20
#define KGSL_CONTEXT_SAVE_GMEM 0x00000001
#define KGSL_CONTEXT_NO_GMEM_ALLOC 0x00000002
#define KGSL_CONTEXT_SUBMIT_IB_LIST 0x00000004
#define KGSL_CONTEXT_CTX_SWITCH 0x00000008
#define KGSL_CONTEXT_PREAMBLE 0x00000010
#define KGSL_CONTEXT_TRASH_STATE 0x00000020
#define KGSL_CONTEXT_PER_CONTEXT_TS 0x00000040
#define KGSL_CONTEXT_USER_GENERATED_TS 0x00000080
#define KGSL_CONTEXT_END_OF_FRAME 0x00000100
#define KGSL_CONTEXT_TYPE_ANY 0
#define KGSL_CONTEXT_TYPE_GL 1
#define KGSL_CONTEXT_TYPE_CL 2
#define KGSL_CONTEXT_TYPE_C2D 3
#define KGSL_CONTEXT_TYPE_RS 4
#define KGSL_CONTEXT_NO_FAULT_TOLERANCE 0x00000200
#define KGSL_CONTEXT_SYNC 0x00000400
/* bits [12:15] are reserved for future use */
#define KGSL_CONTEXT_TYPE_MASK 0x01F00000
#define KGSL_CONTEXT_TYPE_SHIFT 20
#define KGSL_CONTEXT_TYPE_ANY 0
#define KGSL_CONTEXT_TYPE_GL 1
#define KGSL_CONTEXT_TYPE_CL 2
#define KGSL_CONTEXT_TYPE_C2D 3
#define KGSL_CONTEXT_TYPE_RS 4
#define KGSL_CONTEXT_INVALID 0xffffffff
@ -194,31 +196,6 @@ enum kgsl_property_type {
KGSL_PROP_VERSION = 0x00000008,
KGSL_PROP_GPU_RESET_STAT = 0x00000009,
KGSL_PROP_PWRCTRL = 0x0000000E,
KGSL_PROP_FAULT_TOLERANCE = 0x00000011,
};
/* Fault Tolerance policy flags */
#define KGSL_FT_DISABLE 0x00000001
#define KGSL_FT_REPLAY 0x00000002
#define KGSL_FT_SKIPIB 0x00000004
#define KGSL_FT_SKIPFRAME 0x00000008
#define KGSL_FT_DEFAULT_POLICY (KGSL_FT_REPLAY + KGSL_FT_SKIPIB)
/* Pagefault policy flags */
#define KGSL_FT_PAGEFAULT_INT_ENABLE 0x00000001
#define KGSL_FT_PAGEFAULT_GPUHALT_ENABLE 0x00000002
#define KGSL_FT_PAGEFAULT_LOG_ONE_PER_PAGE 0x00000004
#define KGSL_FT_PAGEFAULT_LOG_ONE_PER_INT 0x00000008
#define KGSL_FT_PAGEFAULT_DEFAULT_POLICY (KGSL_FT_PAGEFAULT_INT_ENABLE + \
KGSL_FT_PAGEFAULT_LOG_ONE_PER_PAGE)
/* Fault tolerance config */
struct kgsl_ft_config {
unsigned int ft_policy; /* Fault Tolerance policy flags */
unsigned int ft_pf_policy; /* Pagefault policy flags */
unsigned int ft_pm_dump; /* KGSL enable postmortem dump */
unsigned int ft_detect_ms;
unsigned int ft_dos_timeout_ms;
};
struct kgsl_shadowprop {
@ -234,6 +211,26 @@ struct kgsl_version {
unsigned int dev_minor;
};
/* Performance counter groups */
#define KGSL_PERFCOUNTER_GROUP_CP 0x0
#define KGSL_PERFCOUNTER_GROUP_RBBM 0x1
#define KGSL_PERFCOUNTER_GROUP_PC 0x2
#define KGSL_PERFCOUNTER_GROUP_VFD 0x3
#define KGSL_PERFCOUNTER_GROUP_HLSQ 0x4
#define KGSL_PERFCOUNTER_GROUP_VPC 0x5
#define KGSL_PERFCOUNTER_GROUP_TSE 0x6
#define KGSL_PERFCOUNTER_GROUP_RAS 0x7
#define KGSL_PERFCOUNTER_GROUP_UCHE 0x8
#define KGSL_PERFCOUNTER_GROUP_TP 0x9
#define KGSL_PERFCOUNTER_GROUP_SP 0xA
#define KGSL_PERFCOUNTER_GROUP_RB 0xB
#define KGSL_PERFCOUNTER_GROUP_PWR 0xC
#define KGSL_PERFCOUNTER_GROUP_VBIF 0xD
#define KGSL_PERFCOUNTER_GROUP_VBIF_PWR 0xE
#define KGSL_PERFCOUNTER_NOT_USED 0xFFFFFFFF
/* structure holds list of ibs */
struct kgsl_ibdesc {
unsigned int gpuaddr;
@ -287,7 +284,7 @@ struct kgsl_device_waittimestamp_ctxtid {
#define IOCTL_KGSL_DEVICE_WAITTIMESTAMP_CTXTID \
_IOW(KGSL_IOC_TYPE, 0x7, struct kgsl_device_waittimestamp_ctxtid)
/* issue indirect commands to the GPU.
/* DEPRECATED: issue indirect commands to the GPU.
* drawctxt_id must have been created with IOCTL_KGSL_DRAWCTXT_CREATE
* ibaddr and sizedwords must specify a subset of a buffer created
* with IOCTL_KGSL_SHAREDMEM_FROM_PMEM
@ -295,6 +292,9 @@ struct kgsl_device_waittimestamp_ctxtid {
* timestamp is a returned counter value which can be passed to
* other ioctls to determine when the commands have been executed by
* the GPU.
*
* This function is deprecated - consider using IOCTL_KGSL_SUBMIT_COMMANDS
* instead
*/
struct kgsl_ringbuffer_issueibcmds {
unsigned int drawctxt_id;
@ -684,6 +684,202 @@ struct kgsl_gpumem_sync_cache {
#define IOCTL_KGSL_GPUMEM_SYNC_CACHE \
_IOW(KGSL_IOC_TYPE, 0x37, struct kgsl_gpumem_sync_cache)
/**
* struct kgsl_perfcounter_get - argument to IOCTL_KGSL_PERFCOUNTER_GET
* @groupid: Performance counter group ID
* @countable: Countable to select within the group
* @offset: Return offset of the reserved counter
*
* Get an available performance counter from a specified groupid. The offset
* of the performance counter will be returned after successfully assigning
* the countable to the counter for the specified group. An error will be
* returned and an offset of 0 if the groupid is invalid or there are no
* more counters left. After successfully getting a perfcounter, the user
* must call kgsl_perfcounter_put(groupid, countable) when finished with
* the perfcounter to clear up perfcounter resources.
*
*/
struct kgsl_perfcounter_get {
unsigned int groupid;
unsigned int countable;
unsigned int offset;
/* private: reserved for future use */
unsigned int __pad[2]; /* For future binary compatibility */
};
#define IOCTL_KGSL_PERFCOUNTER_GET \
_IOWR(KGSL_IOC_TYPE, 0x38, struct kgsl_perfcounter_get)
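A minimal user-space sketch of reserving a counter through this ioctl; the helper name, the choice of KGSL_PERFCOUNTER_GROUP_SP and the already-open kgsl fd are illustrative assumptions, while the struct, the group define and the ioctl number come from this header:

#include <sys/ioctl.h>
#include <linux/msm_kgsl.h>	/* this header, path per the build */

int reserve_sp_counter(int fd, unsigned int countable, unsigned int *offset)
{
	struct kgsl_perfcounter_get get = {
		.groupid = KGSL_PERFCOUNTER_GROUP_SP,
		.countable = countable,
	};

	/* on success the kernel returns the register offset of the counter */
	if (ioctl(fd, IOCTL_KGSL_PERFCOUNTER_GET, &get))
		return -1;

	*offset = get.offset;
	return 0;
}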
/**
* struct kgsl_perfcounter_put - argument to IOCTL_KGSL_PERFCOUNTER_PUT
* @groupid: Performance counter group ID
* @countable: Countable to release within the group
*
* Put an allocated performance counter to allow others to have access to the
* resource that was previously taken. This is only to be called after
* successfully getting a performance counter from kgsl_perfcounter_get().
*
*/
struct kgsl_perfcounter_put {
unsigned int groupid;
unsigned int countable;
/* private: reserved for future use */
unsigned int __pad[2]; /* For future binary compatibility */
};
#define IOCTL_KGSL_PERFCOUNTER_PUT \
_IOW(KGSL_IOC_TYPE, 0x39, struct kgsl_perfcounter_put)
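And the matching release once the counter is no longer needed, under the same assumptions:

void release_sp_counter(int fd, unsigned int countable)
{
	struct kgsl_perfcounter_put put = {
		.groupid = KGSL_PERFCOUNTER_GROUP_SP,
		.countable = countable,
	};

	/* frees the counter reserved by IOCTL_KGSL_PERFCOUNTER_GET above */
	ioctl(fd, IOCTL_KGSL_PERFCOUNTER_PUT, &put);
}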
/**
* struct kgsl_perfcounter_query - argument to IOCTL_KGSL_PERFCOUNTER_QUERY
* @groupid: Performance counter group ID
* @countables: Return array of the currently active countables
* @count: Number of entries in the countables array
* @max_counters: Return total number of counters for the group ID
*
* Query the available performance counters given a groupid. The array
* countables is used to return the countables currently assigned to the
* group's counters. The count is passed in so the kernel will write at
* most count entries for the group id. The total number of available
* counters for the group ID is returned in max_counters.
* If the array or count passed in are invalid, only the maximum number
* of counters will be returned and no data will be written to countables.
* If the groupid is invalid an error code will be returned.
*
*/
struct kgsl_perfcounter_query {
unsigned int groupid;
/* Array to return the current countable for up to count counters */
unsigned int *countables;
unsigned int count;
unsigned int max_counters;
/* private: reserved for future use */
unsigned int __pad[2]; /* For future binary compatibility */
};
#define IOCTL_KGSL_PERFCOUNTER_QUERY \
_IOWR(KGSL_IOC_TYPE, 0x3A, struct kgsl_perfcounter_query)
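A sketch of the two-step pattern the comment describes: call once with no buffer to learn max_counters, then again with an array of that size (helper name illustrative; includes as in the earlier sketch plus <stdlib.h>):

int query_group(int fd, unsigned int groupid)
{
	struct kgsl_perfcounter_query q = { .groupid = groupid };
	unsigned int *buf;

	/* no array supplied: only max_counters is filled in */
	if (ioctl(fd, IOCTL_KGSL_PERFCOUNTER_QUERY, &q))
		return -1;

	buf = calloc(q.max_counters, sizeof(*buf));
	if (buf == NULL)
		return -1;

	q.countables = buf;
	q.count = q.max_counters;

	/* second call writes the active countables into buf */
	if (ioctl(fd, IOCTL_KGSL_PERFCOUNTER_QUERY, &q) == 0) {
		/* ... inspect buf[0..q.max_counters-1] here ... */
	}

	free(buf);
	return 0;
}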
/**
* struct kgsl_perfcounter_read - argument to IOCTL_KGSL_PERFCOUNTER_READ
* @reads: Array of kgsl_perfcounter_read_group entries (groupid/countable
* pairs with a return value for each)
* @count: Number of entries in the reads array
*
* Read the current value of a performance counter for each groupid and
* countable pair in the reads array.
*
*/
struct kgsl_perfcounter_read_group {
unsigned int groupid;
unsigned int countable;
unsigned long long value;
};
struct kgsl_perfcounter_read {
struct kgsl_perfcounter_read_group *reads;
unsigned int count;
/* private: reserved for future use */
unsigned int __pad[2]; /* For future binary compatibility */
};
#define IOCTL_KGSL_PERFCOUNTER_READ \
_IOWR(KGSL_IOC_TYPE, 0x3B, struct kgsl_perfcounter_read)
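A sketch of reading one counter value back; groupid and countable are whatever was reserved earlier and the helper name is illustrative:

unsigned long long read_counter(int fd, unsigned int groupid,
				unsigned int countable)
{
	struct kgsl_perfcounter_read_group grp = {
		.groupid = groupid,
		.countable = countable,
	};
	struct kgsl_perfcounter_read req = {
		.reads = &grp,
		.count = 1,
	};

	/* the kernel fills grp.value with the current counter reading */
	if (ioctl(fd, IOCTL_KGSL_PERFCOUNTER_READ, &req))
		return 0;

	return grp.value;
}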
/*
* struct kgsl_gpumem_sync_cache_bulk - argument to
* IOCTL_KGSL_GPUMEM_SYNC_CACHE_BULK
* @id_list: list of GPU buffer ids of the buffers to sync
* @count: number of GPU buffer ids in id_list
* @op: a mask of KGSL_GPUMEM_CACHE_* values
*
* Sync the cache for memory headed to and from the GPU. Certain
* optimizations can be made on the cache operation based on the total
* size of the working set of memory to be managed.
*/
struct kgsl_gpumem_sync_cache_bulk {
unsigned int *id_list;
unsigned int count;
unsigned int op;
/* private: reserved for future use */
unsigned int __pad[2]; /* For future binary compatibility */
};
#define IOCTL_KGSL_GPUMEM_SYNC_CACHE_BULK \
_IOWR(KGSL_IOC_TYPE, 0x3C, struct kgsl_gpumem_sync_cache_bulk)
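For example, cleaning a set of buffers by id in one call; the helper is illustrative and assumes the KGSL_GPUMEM_CACHE_CLEAN flag defined alongside the per-buffer sync ioctl in this header:

int clean_buffers(int fd, unsigned int *ids, unsigned int nids)
{
	struct kgsl_gpumem_sync_cache_bulk bulk = {
		.id_list = ids,
		.count = nids,
		.op = KGSL_GPUMEM_CACHE_CLEAN,
	};

	/* the driver may turn a large working set into a full-cache
	 * operation, as reported by the kgsl_mem_sync_full_cache
	 * trace event earlier in this patch */
	return ioctl(fd, IOCTL_KGSL_GPUMEM_SYNC_CACHE_BULK, &bulk);
}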
/*
* struct kgsl_cmd_syncpoint_timestamp
* @context_id: ID of a KGSL context
* @timestamp: GPU timestamp
*
* This structure defines a syncpoint comprising a context/timestamp pair. A
* list of these may be passed by IOCTL_KGSL_SUBMIT_COMMANDS to define
* dependencies that must be met before the command can be submitted to the
* hardware.
*/
struct kgsl_cmd_syncpoint_timestamp {
unsigned int context_id;
unsigned int timestamp;
};
#define KGSL_CMD_SYNCPOINT_TYPE_TIMESTAMP 0
struct kgsl_cmd_syncpoint_fence {
int fd;
};
#define KGSL_CMD_SYNCPOINT_TYPE_FENCE 1
/**
* struct kgsl_cmd_syncpoint - Define a sync point for a command batch
* @type: type of sync point defined here
* @priv: Pointer to the type specific buffer
* @size: Size of the type specific buffer
*
* This structure contains pointers defining a specific command sync point.
* The pointer and size should point to a type appropriate structure.
*/
struct kgsl_cmd_syncpoint {
int type;
void __user *priv;
unsigned int size;
};
/**
* struct kgsl_submit_commands - Argument to IOCTL_KGSL_SUBMIT_COMMANDS
* @context_id: KGSL context ID that owns the commands
* @flags: KGSL_CONTEXT_* flags for the command batch
* @cmdlist: User pointer to a list of kgsl_ibdesc structures
* @numcmds: Number of commands listed in cmdlist
* @synclist: User pointer to a list of kgsl_cmd_syncpoint structures
* @numsyncs: Number of sync points listed in synclist
* @timestamp: On entry a user defined timestamp, on exit the timestamp
* assigned to the command batch
*
* This structure specifies a command to send to the GPU hardware. This is
* similar to kgsl_issueibcmds except that it doesn't support the legacy way to
* submit IB lists and it adds sync points to block the IB until the
* dependencies are satisfied. This entry point is the new and preferred way
* to submit commands to the GPU.
*/
struct kgsl_submit_commands {
unsigned int context_id;
unsigned int flags;
struct kgsl_ibdesc __user *cmdlist;
unsigned int numcmds;
struct kgsl_cmd_syncpoint __user *synclist;
unsigned int numsyncs;
unsigned int timestamp;
/* private: reserved for future use */
unsigned int __pad[4];
};
#define IOCTL_KGSL_SUBMIT_COMMANDS \
_IOWR(KGSL_IOC_TYPE, 0x3D, struct kgsl_submit_commands)
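Putting the pieces together, a sketch of one submission that waits on a timestamp from another context before a single IB runs; the context ids and the prepared kgsl_ibdesc are placeholders:

int submit_with_wait(int fd, unsigned int ctxt_id, struct kgsl_ibdesc *ib,
		     unsigned int wait_ctxt, unsigned int wait_ts,
		     unsigned int *out_ts)
{
	struct kgsl_cmd_syncpoint_timestamp ts = {
		.context_id = wait_ctxt,
		.timestamp = wait_ts,
	};
	struct kgsl_cmd_syncpoint sync = {
		.type = KGSL_CMD_SYNCPOINT_TYPE_TIMESTAMP,
		.priv = &ts,
		.size = sizeof(ts),
	};
	struct kgsl_submit_commands cmds = {
		.context_id = ctxt_id,
		.cmdlist = ib,
		.numcmds = 1,
		.synclist = &sync,
		.numsyncs = 1,
	};

	if (ioctl(fd, IOCTL_KGSL_SUBMIT_COMMANDS, &cmds))
		return -1;

	/* on exit the kernel has filled in the assigned timestamp */
	*out_ts = cmds.timestamp;
	return 0;
}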
#ifdef __KERNEL__
#ifdef CONFIG_MSM_KGSL_DRM
int kgsl_gem_obj_addr(int drm_fd, int handle, unsigned long *start,