Microbenchmarking Return Address Branch Prediction

Modern processors use branch predictors to predict a program’s control flow in order to execute further ahead in the instruction stream. Function return instructions use a specialized branch predictor known as a Return Address Stack (RAS), Return Stack Buffer (RSB), return stack, or various other names. This article presents a series of increasingly complex microbenchmarks to measure the behaviour of the RAS found in several Intel and AMD processor microarchitectures. . . . → Read More: Microbenchmarking Return Address Branch Prediction
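
As a rough illustration of why call depth matters to a return predictor, here is a minimal software model of a fixed-depth return stack. It is a sketch, not a description of any real processor: the class name, the 16-entry depth used in the usage note, and the "overwrite the oldest entry on overflow" policy are all assumptions made for the example, and real hardware may handle overflow and underflow differently.

```python
class ToyReturnStack:
    """Hypothetical fixed-depth return address predictor (illustrative only)."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []

    def on_call(self, return_addr):
        # Assumed overflow policy: the oldest entry is lost when the stack is full.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_addr)

    def predict_return(self):
        # An empty stack gives no prediction, counted here as a misprediction.
        return self.entries.pop() if self.entries else None


def count_mispredicts(ras_depth, call_depth):
    """Nest call_depth calls, unwind them all, and count wrong return predictions."""
    ras = ToyReturnStack(ras_depth)
    actual_returns = []
    for level in range(call_depth):
        addr = 0x1000 + 4 * level  # made-up return addresses
        actual_returns.append(addr)
        ras.on_call(addr)

    mispredicts = 0
    while actual_returns:
        if ras.predict_return() != actual_returns.pop():
            mispredicts += 1
    return mispredicts
```

In this model, `count_mispredicts(16, 16)` gives 0 and `count_mispredicts(16, 20)` gives 4: once the nesting depth exceeds the predictor's capacity, the outermost returns lose their predictions. A microbenchmark can look for the same kind of knee in measured run time as call depth grows.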

AMD Polaris (RX 550) on Mageia Linux 6

I recently bought an AMD RX 550 graphics card to drive a 4K display for use with Mageia Linux 6. Not surprisingly, it did not work out of the box. The Mageia 6 kernel (4.9.43 as of Aug 2017) needs to be updated to keep the amdgpu module from crashing, and needs further kernel updates . . . → Read More: AMD Polaris (RX 550) on Mageia Linux 6

Cyclone/Stratix V Carry-Lookahead…bug?

New in the Cyclone V and Stratix V families, there seems to be carry lookahead at the LAB level (10 ALMs or 20 bits of addition), which can be seen by looking at the timing path through a big adder. But if there are two unrelated adders in a circuit, the carry lookahead is disabled if the length difference is near a multiple of 10, causing 70% higher delay for long adders. Bug? . . . → Read More: Cyclone/Stratix V Carry-Lookahead…bug?

BIND patch for Mandriva 2010.2

Yay, another security vulnerability in old software, this time in many versions of BIND. Here’s an RPM for Mandriva 2010.2 that upgrades BIND to 9.9.7 P2 (from 9.7.6). . . . → Read More: BIND patch for Mandriva 2010.2

TLB and Pagewalk Coherence in x86 Processors

Processors that support paging use TLBs to cache translations. On x86, translation caches are not coherent and require software to explicitly invalidate a TLB entry after updating a page table entry. Similarly, pagewalks are not guaranteed to be coherent, so modifying a page table entry must be followed by an invalidation even if the page table entry is not cached in the TLB.

Real processor implementations do not provide TLB coherence, but it turns out many (but not all) processors actually do provide pagewalk coherence. Most provide pagewalk coherence by detecting when a page table entry update conflicts with a pagewalk’s memory accesses, but some provide coherence by disallowing speculative pagewalks, at some performance cost. I show a microbenchmark that can test for TLB and pagewalk coherence and whether speculative pagewalks are used.
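
To make the TLB half of this concrete, here is a small software model of a non-coherent translation cache. It is only a sketch under simplifying assumptions: the `ToyMMU` class and its method names are invented for illustration, the page table is a flat dictionary, and pagewalk coherence and speculative pagewalks are not modelled at all.

```python
class ToyMMU:
    """Hypothetical model of a page table plus a non-coherent TLB."""

    def __init__(self):
        self.page_table = {}  # virtual page -> physical frame, held in "memory"
        self.tlb = {}         # cached translations, never updated by stores

    def write_pte(self, vpage, frame):
        # Software updates the page table entry; the TLB does not notice.
        self.page_table[vpage] = frame

    def invalidate(self, vpage):
        # Plays the role of an explicit invalidation such as x86 INVLPG.
        self.tlb.pop(vpage, None)

    def translate(self, vpage):
        # On a TLB miss, walk the page table and cache the result.
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table[vpage]
        return self.tlb[vpage]


def stale_translation_demo():
    """Return the translations seen before and after an explicit invalidation."""
    mmu = ToyMMU()
    mmu.write_pte(5, 100)
    first = mmu.translate(5)      # fills the TLB
    mmu.write_pte(5, 200)         # remap without invalidating
    stale = mmu.translate(5)      # still served from the TLB
    mmu.invalidate(5)
    fresh = mmu.translate(5)      # pagewalk picks up the new entry
    return first, stale, fresh
```

Here `stale_translation_demo()` returns `(100, 100, 200)`: the remapped page keeps translating to the old frame until software invalidates the entry, which is the behaviour a coherence microbenchmark tries to provoke on real hardware.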

. . . → Read More: TLB and Pagewalk Coherence in x86 Processors

Windows 9x TLB Invalidation Bug

In processor architectures that support paging, there are usually one or more TLBs or pagewalk caches to cache address translations. On x86, these translation caches are not coherent with memory accesses that modify the page tables, and need invalidating after a page table entry is modified.

The Windows 9x kernel contains code that modifies a page table entry, then immediately uses it without an invalidation. This causes crashes if the processor strictly follows the instruction set specification and does not provide pagewalk coherence.

. . . → Read More: Windows 9x TLB Invalidation Bug

ASRock H81M-ITX Overclocking

The ASRock H81M-ITX does support adjusting multiplier ratios for K series Haswell processors. Oddly, this works in BIOS version 1.90, but not version 2.00. My boards came with version 2.00, and I had to downgrade to 1.90.

The four-phase VRM does make a small amount of noise and gets rather hot. There are no heatsinks . . . → Read More: ASRock H81M-ITX Overclocking

Bash bug patch for older Mandriva distros

Bash has bugs. Unfortunately, the bash parser was exposed through environment function importing, which had the potential for remotely exploiting the parser bugs. There’s been a series of patches for these issues. I’ve compiled packages of bash 4.3 for older versions of Mandriva Linux. . . . → Read More: Bash bug patch for older Mandriva distros

RTL8192CU and Linux 3.13.10

Hardware: TP-Link TL-WN823N v1.1

Status: As of kernel 3.13.10, the in-tree rtl8192cu driver (in drivers/net/wireless/rtlwifi) is still broken. It will work and connect, but will silently disconnect after some time (and light traffic?). There is also some packet loss (around 1%). The driver provided by Realtek (8192cu) works much better. . . . → Read More: RTL8192CU and Linux 3.13.10

Store-to-Load Forwarding and Memory Disambiguation in x86 Processors

In pipelined processors, instructions are executed speculatively and are not permitted to modify system state until instruction commit. For stores to memory, speculative stores write into a store queue at execution time and only write into cache after the store instructions have committed. Out of order memory execution requires hardware that learns dependencies between stores and loads, and also the ability to forward stored values from the store queue to loads that depend on them. I describe two variations of a microbenchmark that can measure some aspects of store-to-load forwarding and the memory execution hardware. These showed that AMD’s Bulldozer and Piledriver processors likely do not use a dynamic memory dependence predictor. They were also used to generate interesting 2D charts that can reveal some details about how the memory execution hardware might be designed. . . . → Read More: Store-to-Load Forwarding and Memory Disambiguation in x86 Processors
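
The store queue behaviour described above can be sketched with a small software model. This is an illustration under heavy simplifications, not how any particular processor is built: the `ToyStoreQueue` class and its method names are hypothetical, every access is a same-sized value at an exact address (no partial overlaps), and there is no memory dependence prediction.

```python
class ToyStoreQueue:
    """Hypothetical store queue with store-to-load forwarding (illustrative only)."""

    def __init__(self):
        self.cache = {}   # committed memory state
        self.queue = []   # speculative stores in program order: (addr, value)

    def store(self, addr, value):
        # Executed stores wait in the queue; the cache is not touched yet.
        self.queue.append((addr, value))

    def load(self, addr):
        # Forward from the youngest older store to the same address, if any.
        for queued_addr, value in reversed(self.queue):
            if queued_addr == addr:
                return value, "forwarded"
        return self.cache.get(addr, 0), "cache"

    def commit_oldest(self):
        # Only a committed store is allowed to modify the cache.
        addr, value = self.queue.pop(0)
        self.cache[addr] = value

    def squash(self):
        # A misspeculation discards stores that never committed.
        self.queue.clear()
```

For example, after `store(0x40, 1)` and `store(0x40, 2)`, a `load(0x40)` returns `(2, "forwarded")` while the cache is still empty. After two calls to `commit_oldest()` the same load returns `(2, "cache")`.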