iommu/exynos: apply workaround of caching fault page table entries

This patch contains 2 workaround for the System MMU v3.x.

System MMU v3.2 and v3.3 has FLPD cache that caches first level page
table entries to reduce page table walking latency. However, the
FLPD cache is filled with a first level page table entry even though
it is not accessed by a master H/W because System MMU v3.3
speculatively prefetches page table entries that may be accessed
in the near future by the master H/W.
The prefetched FLPD cache entries are not invalidated by iommu_unmap()
because iommu_unmap() only unmaps and invalidates the page table
entries that is mapped.

Because exynos-iommu driver discards a second level page table when
it needs to be replaced with another second level page table or
a first level page table entry with 1MB mapping, It is required to
invalidate FLPD cache that may contain the first level page table
entry that points to the second level page table.

Another workaround of System MMU v3.3 is initializing the first level
page table entries with the second level page table which is filled
with all zeros. This prevents System MMU prefetches 'fault' first
level page table entry which may lead page fault on access to 16MiB
wide.

System MMU 3.x fetches consecutive page table entries by a page
table walking to maximize bus utilization and to minimize TLB miss
panelty.
Unfortunately, functional problem is raised with the fetching behavior
because it fetches 'fault' page table entries that specifies no
translation information and that a valid translation information will
be written to in the near future. The logic in the System MMU generates
page fault with the cached fault entries that is no longer coherent
with the page table which is updated.

There is another workaround that must be implemented by I/O virtual
memory manager: any two consecutive I/O virtual memory area must have
a hole between the two that is larger than or equal to 128KiB.
Also, next I/O virtual memory area must be started from the next
128KiB boundary.

0            128K           256K               384K             512K
|-------------|---------------|-----------------|----------------|
|area1---------------->|.........hole...........|<--- area2 -----

The constraint is depicted above.
The size is selected by the calculation followed:
 - System MMU can fetch consecutive 64 page table entries at once
   64 * 4KiB = 256KiB. This is the size between 128K ~ 384K of the
   above picture. This style of fetching is 'block fetch'. It fetches
   the page table entries predefined consecutive page table entries
   including the entry that is the reason of the page table walking.
 - System MMU can prefetch upto consecutive 32 page table entries.
   This is the size between 256K ~ 384K.

Signed-off-by: Cho KyongHo <pullip.cho@samsung.com>
1 file changed