In processor architectures that support paging, there are usually one or more TLBs or pagewalk caches to cache address translations. On x86, these translation caches are not coherent with memory accesses that modify the page tables. Add in prefetching, out-of-order speculative execution, and a desire to minimize the software overhead, and you end up with very tricky rules regarding when and how the various paging caches need to be invalidated.
Sections 4.10.4.2 and 4.10.4.3 of Intel’s manual detail the recommended invalidations and the specific cases where invalidations may be omitted.
Right now, I’m interested in just one particular case:
Invalidation needed after modifying a valid page table entry
The case relevant here involves changing a page table mapping from one valid mapping to another, then using it without invalidation. Because a processor can prefetch anything at any time, it is impossible to guarantee that the old mapping is not cached in the TLB without invalidating after the page table update.
Section 7.3.1 of AMD’s manual makes an explicit mention of this case:
An example of this type of a situation is a page-table update followed by accesses to the physical pages referenced by the updated page tables. The following sequence of events shows what can happen when software changes the translation of virtual-page A from physical-page M to physical-page N:
1. Software invalidates the TLB entry. The tables that translate virtual-page A to physical-page M are now held only in main memory. They are not cached by the TLB.
2. Software changes the page-table entry for virtual-page A in main memory to point to physical-page N rather than physical-page M.
3. Software accesses data in virtual-page A.
During Step 3, software expects the processor to access the data from physical-page N. However, it is possible for the processor to prefetch the data from physical-page M before the page table for virtual-page A is updated in Step 2. This is because the physical-memory references for the page tables are different than the physical-memory references for the data. Because the physical-memory references are different, the processor does not recognize them as requiring coherency checking and believes it is safe to prefetch the data from virtual-page A, which is translated into a read from physical page M. Similar behavior can occur when instructions are prefetched from beyond the page table update instruction.
To prevent this problem, software must use an INVLPG or MOV CR3 instruction immediately after the page-table update to ensure that subsequent instruction fetches and data accesses use the correct virtual-page-to-physical-page translation. It is not necessary to perform a TLB invalidation operation preceding the table update.
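The ordering problem in the quoted passage can be illustrated with a toy software model. This is only a sketch, not real paging code: `toy_mmu`, `toy_prefetch`, `toy_invlpg`, and the other names are hypothetical, and the "prefetch" is a function call placed by hand at the worst possible moment, where real hardware could do it at any time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PAGES 4

/* Toy model (hypothetical): the page table lives in "memory" and the
 * TLB is a cache of it that is NOT kept coherent with page table writes. */
typedef struct {
    uint32_t page_table[NUM_PAGES]; /* virtual page -> physical page */
    uint32_t tlb[NUM_PAGES];        /* cached translations */
    bool     tlb_valid[NUM_PAGES];
} toy_mmu;

/* Translate through the TLB, filling it from the page table on a miss. */
static uint32_t toy_translate(toy_mmu *m, uint32_t vpage) {
    assert(vpage < NUM_PAGES);
    if (!m->tlb_valid[vpage]) {
        m->tlb[vpage] = m->page_table[vpage];
        m->tlb_valid[vpage] = true;
    }
    return m->tlb[vpage];
}

/* Stand-in for a speculative access: it may fill the TLB at any time. */
static void toy_prefetch(toy_mmu *m, uint32_t vpage) {
    (void)toy_translate(m, vpage);
}

/* Stand-in for INVLPG: drop the cached translation for one page. */
static void toy_invlpg(toy_mmu *m, uint32_t vpage) {
    assert(vpage < NUM_PAGES);
    m->tlb_valid[vpage] = false;
}

/* A page table write changes memory only; the TLB is left untouched. */
static void toy_set_pte(toy_mmu *m, uint32_t vpage, uint32_t ppage) {
    assert(vpage < NUM_PAGES);
    m->page_table[vpage] = ppage;
}

/* Order from the AMD example: invalidate, update, access.
 * A prefetch between steps 1 and 2 re-caches the old mapping. */
static uint32_t remap_invalidate_first(toy_mmu *m, uint32_t vpage,
                                       uint32_t new_ppage) {
    toy_invlpg(m, vpage);
    toy_prefetch(m, vpage);
    toy_set_pte(m, vpage, new_ppage);
    return toy_translate(m, vpage);   /* still the old physical page */
}

/* Recommended order: update, invalidate, access.
 * Any translation cached before the invalidation is discarded. */
static uint32_t remap_invalidate_after(toy_mmu *m, uint32_t vpage,
                                       uint32_t new_ppage) {
    toy_prefetch(m, vpage);
    toy_set_pte(m, vpage, new_ppage);
    toy_invlpg(m, vpage);
    return toy_translate(m, vpage);   /* the new physical page */
}
```

In this model, remapping a page from physical page 7 to 9 with `remap_invalidate_first` still translates to 7, while `remap_invalidate_after` translates to 9. An invalidation before the update cannot help, because nothing stops the old mapping from being cached again before the update lands.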
Well, Windows 9x does this…
There is code in Windows 95 through Me that performs the incorrect sequence of operations mentioned in the AMD manual. Here is an example from VMM.VXD from Windows 98:
This code changes the mapping in a page table entry without changing its metadata bits [11:0], then immediately uses the new mapping as the source (ds:[esi]) of the string copy (rep movs) instruction. Because there is no invalidation (or serializing instruction) between the page table update and the use of the new mapping, the string copy can use the old mapping on processors that do not have coherent pagewalks. The wrong data is then copied, which results in random crashes. This raises the question of whether real processors have coherent pagewalks even though the instruction set specification does not require coherence.
The above code is found in VMM.VXD, which is one of the VXDs packed into the VMM32.VXD archive.
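For reference, the kind of page table update described earlier (replace the mapping, keep metadata bits [11:0]) can be sketched in C as follows. This is not the VMM.VXD code: `pte_retarget` and the mask names are hypothetical, and the sketch assumes 32-bit non-PAE page table entries, where bits [31:12] hold the physical frame address.

```c
#include <stdint.h>

/* Assumed layout: 32-bit non-PAE page table entry. */
#define PTE_FLAGS_MASK 0x00000FFFu  /* bits [11:0]: present, writable, ... */
#define PTE_FRAME_MASK 0xFFFFF000u  /* bits [31:12]: physical frame address */

/* Build a PTE that points at new_phys_addr but keeps the old entry's
 * metadata bits. Low bits of new_phys_addr are ignored. */
static uint32_t pte_retarget(uint32_t old_pte, uint32_t new_phys_addr) {
    return (new_phys_addr & PTE_FRAME_MASK) | (old_pte & PTE_FLAGS_MASK);
}
```

For example, retargeting the entry `0x00123067` at physical address `0x00456000` gives `0x00456067`. Writing that value into the page table is only half of the job: without an INVLPG (or a CR3 reload) before the next access through that virtual page, the access may still use the old frame.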