A Cardboard Haswell Box

This post is about a cardboard computer I built in 2015. It served as half of my simulation cluster during the latter half of my Ph.D. work. This is a continuation of a series of cardboard computer cases I’ve built (2014/2012, 2010). Compared to the previous boxes, this one packs even more systems (8 quad-core Haswells) into a box, while still sharing one power supply.

The Microarchitecture Behind Meltdown

Since the recent (Jan. 2018) disclosure of the Meltdown vulnerability, there has been a lot of interest, speculation, and hysteria, but not a particularly good understanding of the processor microarchitecture feature responsible for it. Understanding the root cause of the vulnerability explains why only some microarchitectures are affected, and makes it possible to reliably test for the existence (or, even harder, the non-existence) of the vulnerability on various processors, instead of relying solely on vendor self-reporting (or worse, speculation…).

This article first defines the microarchitectural mechanism that allows Meltdown to work, then develops a microbenchmark to specifically test for this behaviour on multiple microarchitectures.

PET is hygroscopic: Water diffuses out of a Sprite bottle

Polyethylene terephthalate (PET) is a plastic commonly used to make beverage bottles. PET is hygroscopic. Therefore, if you have a bottle of Sprite, you would expect the water to be absorbed by the plastic, diffuse through the bottle, then evaporate outside the bottle.

And that’s exactly what happens.

Microbenchmarking Return Address Branch Prediction

Modern processors use branch predictors to predict a program’s control flow in order to execute further ahead in the instruction stream. Function return instructions use a specialized branch predictor called a Return Address Stack (RAS), also known as a Return Stack Buffer (RSB), a return stack, or various other names. This article presents a series of increasingly complex microbenchmarks to measure the behaviour of the RAS found in several Intel and AMD processor microarchitectures.
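To make the idea concrete, here is a toy software model of a return address stack. It is a minimal sketch assuming a simple circular buffer that pushes on every call and pops on every return; the names `ReturnAddressStack` and `count_mispredictions` and the 16-entry size are illustrative assumptions, not a description of any specific Intel or AMD design. When recursion goes deeper than the stack, new entries overwrite the oldest ones, so the outermost returns get stale predictions.

```python
class ReturnAddressStack:
    """Toy circular return address stack (RAS); not any specific CPU's design."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size
        self.top = 0  # index of the next free slot

    def call(self, return_addr):
        # Push the return address; when full, the oldest entry is overwritten.
        self.entries[self.top] = return_addr
        self.top = (self.top + 1) % self.size

    def ret(self):
        # Pop the predicted return address (may be stale after an overflow).
        self.top = (self.top - 1) % self.size
        return self.entries[self.top]


def count_mispredictions(depth, ras_size):
    """Make `depth` nested calls, then return all the way out."""
    ras = ReturnAddressStack(ras_size)
    actual = []  # the true return addresses, as a software stack
    for i in range(depth):
        ras.call(i)  # the call depth stands in for a return address
        actual.append(i)
    misses = 0
    while actual:
        if ras.ret() != actual.pop():
            misses += 1
    return misses


print(count_mispredictions(20, 16))
```

With a 16-entry model, 20 nested calls produce 4 mispredicted returns: the four deepest calls overwrite the four oldest entries. Observing the call depth at which mispredictions start is one way a microbenchmark can estimate the capacity of a real RAS.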

AMD Polaris (RX 550) on Mageia Linux 6

I recently bought an AMD RX 550 graphics card to drive a 4K display for use with Mageia Linux 6. Not surprisingly, it did not work out of the box. The Mageia 6 kernel (4.9.43 as of Aug 2017) needs to be updated to prevent the amdgpu module from crashing, and needs further kernel updates …

Cyclone/Stratix V Carry-Lookahead…bug?

The Cyclone V and Stratix V families seem to introduce carry lookahead at the LAB level (10 ALMs, or 20 bits of addition), which can be seen by looking at the timing path through a big adder. But if a circuit contains two unrelated adders, the carry lookahead is disabled when the difference in their lengths is near a multiple of 10, causing 70% higher delay for long adders. Bug?
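For readers unfamiliar with carry lookahead, the sketch below shows the generic block generate/propagate formulation, using 20-bit blocks to mirror one LAB. It is a textbook model written for illustration and assumes nothing about how Altera actually implements lookahead in silicon; all function names are hypothetical. A block's carry-out is G | (P & cin), where G says the block generates a carry by itself and P says it would pass an incoming carry straight through, so the carry can skip a whole block instead of rippling through every bit.

```python
# Textbook block carry-lookahead model, for illustration only
# (not Altera's actual LAB circuit).

def ripple_carry_out(a, b, cin, width):
    """Carry-out of a width-bit add, computed one bit at a time."""
    carry = cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | (carry & (ai ^ bi))
    return carry


def block_generate_propagate(a, b, width):
    """Group generate G and propagate P for one width-bit block.

    Computed with a loop here; hardware would typically use a parallel tree.
    """
    g, p = 0, 1
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = (ai & bi) | ((ai ^ bi) & g)  # bit i generates, or passes on a lower generate
        p &= ai ^ bi                     # block propagates only if every bit propagates
    return g, p


def lookahead_carry_out(a, b, cin, width):
    """Block carry-out from G and P; cin does not ripple through each bit."""
    g, p = block_generate_propagate(a, b, width)
    return g | (p & cin)


def chained_carry_out(a, b, cin, width, block=20):
    """Carry-out of a long adder, passing the carry block by block."""
    carry = cin
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        carry = lookahead_carry_out((a >> lo) & mask, (b >> lo) & mask, carry, w)
    return carry
```

For 40-bit inputs `a` and `b`, `chained_carry_out(a, b, 0, 40)` agrees with `(a + b) >> 40`; the two formulations differ only in how many logic levels the carry has to pass through.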

BIND patch for Mandriva 2010.2

Yay, another security vulnerability in old software, this time in many versions of BIND. Here’s an RPM for Mandriva 2010.2 that upgrades BIND to 9.9.7-P2 (from 9.7.6).

TLB and Pagewalk Coherence in x86 Processors

Processors that support paging use TLBs to cache translations. On x86, translation caches are not coherent and require software to explicitly invalidate a TLB entry after updating a page table entry. Similarly, pagewalks are not guaranteed to be coherent, so modifying a page table entry must be followed by an invalidation even if the page table entry is not cached in the TLB.

Real processor implementations do not provide TLB coherence, but it turns out many (but not all) processors actually do provide pagewalk coherence. Most provide pagewalk coherence by detecting when a page table entry update conflicts with a pagewalk’s memory accesses, but some provide coherence by disallowing speculative pagewalks, at some performance cost. I show a microbenchmark that can test for TLB and pagewalk coherence and whether speculative pagewalks are used.
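The difference between TLB coherence and pagewalk coherence can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions: the `ToyMMU` class and its methods are hypothetical, a speculative pagewalk is modeled as filling the TLB before the access is known to be needed, and a pagewalk-coherent processor is modeled as discarding that speculative result when a conflicting page table write is detected. Disallowing speculative pagewalks altogether would give the same observable result in this model.

```python
class ToyMMU:
    """Toy model of translation caching; not any real processor's design."""

    def __init__(self, page_table, pagewalk_coherent):
        self.page_table = dict(page_table)  # virtual page -> physical page
        self.tlb = {}                       # cached translations (never coherent)
        self.speculative = set()            # TLB entries filled by speculative walks
        self.pagewalk_coherent = pagewalk_coherent

    def translate(self, vpage):
        """Architectural access: use the TLB, walking the page table on a miss."""
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table[vpage]
        self.speculative.discard(vpage)     # the entry has now been used for real
        return self.tlb[vpage]

    def speculative_walk(self, vpage):
        """Assumption: a speculative walk may fill the TLB ahead of use."""
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table[vpage]
            self.speculative.add(vpage)

    def write_pte(self, vpage, ppage):
        """Ordinary store to the page table; committed TLB entries are untouched."""
        self.page_table[vpage] = ppage
        if self.pagewalk_coherent and vpage in self.speculative:
            # Conflict with a speculative walk detected: discard its result.
            del self.tlb[vpage]
            self.speculative.discard(vpage)

    def invlpg(self, vpage):
        """Explicit invalidation, which x86 software is required to perform."""
        self.tlb.pop(vpage, None)
        self.speculative.discard(vpage)


for coherent in (False, True):
    mmu = ToyMMU({0: 100, 1: 101}, pagewalk_coherent=coherent)
    mmu.translate(0)          # page 0 is now cached in the TLB
    mmu.write_pte(0, 200)
    stale = mmu.translate(0)  # still 100 in both modes: the TLB is not coherent
    mmu.speculative_walk(1)   # page 1 walked speculatively, never used
    mmu.write_pte(1, 300)
    fresh = mmu.translate(1)  # 300 only if pagewalks are coherent
    print(coherent, stale, fresh)
```

In both modes, a translation that has already been used stays stale until `invlpg` (named after the x86 instruction) removes it. The modes differ only when a speculative walk read the entry before it was modified: the non-coherent model then returns the old translation, which is the case where software that skips the invalidation can break.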

Windows 9x TLB Invalidation Bug

In processor architectures that support paging, there are usually one or more TLBs or pagewalk caches to cache address translations. On x86, these translation caches are not coherent with memory accesses that modify the page tables, so they must be invalidated after a page table entry is modified.

The Windows 9x kernel contains code that modifies a page table entry, then immediately uses it without an invalidation. This causes crashes if the processor strictly follows the instruction set specification and does not provide pagewalk coherence.

ASRock H81M-ITX Overclocking

The ASRock H81M-ITX does support adjusting multiplier ratios for K-series Haswell processors. Oddly, this works in BIOS version 1.90, but not version 2.00. My boards came with version 2.00, and I had to downgrade to 1.90.

The four-phase VRM does make a small amount of noise and gets rather hot. There are no heatsinks …