Windows 9x TLB Invalidation Bug

In processor architectures that support paging, there are usually one or more TLBs or pagewalk caches to cache address translations. On x86, these translation caches are not coherent with memory accesses that modify the page tables. Add in prefetching, out-of-order speculative execution, and a desire to minimize the software overhead, and you end up with very tricky rules regarding when and how the various paging caches need to be invalidated.

Intel’s manual details the recommended invalidations and the specific cases where invalidations may be omitted.

Right now, I’m interested in just one particular case:

Invalidation needed after modifying a valid page table entry

The case relevant here involves changing a page table mapping from one valid mapping to another, then using it without invalidation. Because a processor can prefetch anything at any time, it is impossible to guarantee that the old mapping is not cached in the TLB without invalidating after the page table update.

Section 7.3.1 of AMD’s manual makes an explicit mention of this case:

An example of this type of a situation is a page-table update followed by accesses to the physical pages referenced by the updated page tables. The following sequence of events shows what can happen when software changes the translation of virtual-page A from physical-page M to physical-page N:

1. Software invalidates the TLB entry. The tables that translate virtual-page A to physical-page M are now held only in main memory. They are not cached by the TLB.

2. Software changes the page-table entry for virtual-page A in main memory to point to physical-page N rather than physical-page M.

3. Software accesses data in virtual-page A.

During Step 3, software expects the processor to access the data from physical-page N. However, it is possible for the processor to prefetch the data from physical-page M before the page table for virtual-page A is updated in Step 2. This is because the physical-memory references for the page tables are different than the physical-memory references for the data. Because the physical-memory references are different, the processor does not recognize them as requiring coherency checking and believes it is safe to prefetch the data from virtual-page A, which is translated into a read from physical page M. Similar behavior can occur when instructions are prefetched from beyond the page table update instruction.

To prevent this problem, software must use an INVLPG or MOV CR3 instruction immediately after the page-table update to ensure that subsequent instruction fetches and data accesses use the correct virtual-page-to-physical-page translation. It is not necessary to perform a TLB invalidation operation preceding the table update.
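The sequence AMD describes can be modeled with a toy simulation. The sketch below is my own illustration, not real paging code: `toy_mmu`, `toy_set_pte`, `toy_invlpg`, and `toy_translate` are hypothetical names, the "page table" is a flat array, and a speculative prefetch is modeled as nothing more than an extra call to `toy_translate`. The only property it tries to capture is that a store to the page table does not update a translation that is already cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PAGES 16u /* toy address space: 16 virtual pages (assumption) */

/* Toy model: a flat "page table" in memory plus a TLB that caches it. */
typedef struct {
    uint32_t page_table[NUM_PAGES]; /* virtual page -> physical page */
    uint32_t tlb_ppage[NUM_PAGES];  /* cached translations */
    bool     tlb_valid[NUM_PAGES];  /* is the cached translation valid? */
} toy_mmu;

/* Start with an empty page table and an empty TLB. */
void toy_mmu_init(toy_mmu *m)
{
    for (uint32_t i = 0; i < NUM_PAGES; i++) {
        m->page_table[i] = 0;
        m->tlb_ppage[i] = 0;
        m->tlb_valid[i] = false;
    }
}

/* Software store to the page table. Deliberately does NOT touch the TLB:
 * this is the non-coherence the article is about. */
void toy_set_pte(toy_mmu *m, uint32_t vpage, uint32_t ppage)
{
    assert(vpage < NUM_PAGES);
    m->page_table[vpage] = ppage;
}

/* Stand-in for INVLPG: drop the cached translation for one page. */
void toy_invlpg(toy_mmu *m, uint32_t vpage)
{
    assert(vpage < NUM_PAGES);
    m->tlb_valid[vpage] = false;
}

/* Translate a virtual page. On a TLB miss, walk the page table and cache
 * the result. A speculative prefetch is modeled as just another call. */
uint32_t toy_translate(toy_mmu *m, uint32_t vpage)
{
    assert(vpage < NUM_PAGES);
    if (!m->tlb_valid[vpage]) {
        m->tlb_ppage[vpage] = m->page_table[vpage];
        m->tlb_valid[vpage] = true;
    }
    return m->tlb_ppage[vpage];
}
```

In this model, calling `toy_invlpg`, then `toy_translate` (the prefetch), then `toy_set_pte`, then `toy_translate` again returns the old physical page. Moving the `toy_invlpg` call to after `toy_set_pte` returns the new one, which is the ordering the AMD manual asks for.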

Well, Windows 9x does this…

There is code in Windows 95 through Me that performs the incorrect sequence of operations mentioned in the AMD manual. Here is an example from VMM.VXD from Windows 98:

[Figure: Excerpt of code from VMM.VXD showing remapping of a valid page table entry, then immediately using it without an invalidation.]

This code changes a page table entry’s mapping without changing its metadata bits [11:0], then immediately uses the new mapping as the source (ds:[esi]) of a string copy (rep movs) instruction. Because there is no invalidation (nor any serializing instruction) between the page table update and the use of the new mapping, the string copy can still use the old mapping on processors that do not have coherent pagewalks. That causes the wrong data to be copied and results in random crashes. This raises the question of whether real processors have coherent pagewalks even though the instruction set specification does not require coherence.
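To make the page table update concrete, here is a minimal sketch of that kind of edit on a 32-bit non-PAE page table entry, where bits [31:12] hold the physical page frame and bits [11:0] hold the flags. This is not the VMM.VXD code: `pte_retarget` and the two mask names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

#define PTE_FLAGS_MASK 0x00000FFFu /* bits [11:0]: present, writable, etc. */
#define PTE_FRAME_MASK 0xFFFFF000u /* bits [31:12]: physical page frame */

/* Hypothetical helper: point an existing PTE at a different physical page
 * while keeping its flag bits [11:0] unchanged. Any low bits in
 * new_phys_addr are discarded. */
uint32_t pte_retarget(uint32_t old_pte, uint32_t new_phys_addr)
{
    return (new_phys_addr & PTE_FRAME_MASK) | (old_pte & PTE_FLAGS_MASK);
}

/* In kernel code, the value returned above would be stored into the page
 * table and then followed by an INVLPG for the affected virtual address
 * (or a MOV to CR3) before the mapping is used. That step is privileged,
 * so it is only described here rather than executed. */
```

Because the present bit is set both before and after this kind of update, the old translation may already be cached, which is why the invalidation afterwards matters.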

The above code is found in VMM.VXD, which is one of the VXDs packed into the VMM32.VXD archive.

2 comments to Windows 9x TLB Invalidation Bug

    • Henry

      Very interesting… I wonder why I didn’t notice it in FreeBSD 9.3 though. Is pmap_copy_pages() used rarely enough that it might never have executed when booting up to the text login prompt? It seems like pmap_copy_page() is used more often?

      I looked at the code before the change (r268327 for i386). I see the remapping bug in pmap_copy_pages (if the loop runs for two or more iterations), but I think pmap_copy_page was actually bug-free. The key difference is that the page mappings are not present (they’re 0) at the moment invlpg occurs, and invalid mappings can’t be prefetched after invlpg.

      (AMD manual Section 5.5.2: “If a table entry is updated to remove a permission violation, …an invalidation is not required”; Intel manual: “If a paging-structure entry is modified to change the P flag from 0 to 1, no invalidation is necessary.”)

      Also, I think it might run slightly faster if the two invlpgs were bunched together (after both remappings are done, but before using them). There’s more opportunity for ILP if the computation for the second PTE weren’t caught in between two pipeline flushes. I haven’t benchmarked this though.
