AMD Bulldozer/Piledriver Modules and Hyper-Threading

Ever since Intel’s Hyper-Threading and AMD’s Bulldozer modules, there has been much debate over what qualifies as a real CPU “core”. Unfortunately, I don’t think “core” is easy to define, so marketing tends to name things for its own benefit. In the end, it’s the performance that matters, not the name. Two-way Hyper-Threading gives around a 23% improvement over one thread, while two-way multithreading in a “module” gives 54%. This is still quite far from the >90% that replicating the entire CPU core would achieve . . .
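
To put those percentages side by side, here is a minimal sketch that converts each two-thread speedup into per-thread throughput. The 1.23 and 1.54 figures are the ones quoted above; 1.90 is only a stand-in for the ">90%" lower bound, and the function and dictionary names are made up for illustration.

```python
# Relative throughput of two threads versus one thread running alone (1.0).
# 1.23 and 1.54 come from the excerpt; 1.90 is an assumed stand-in for the
# ">90%" lower bound quoted for a fully replicated core.
speedup = {
    "Hyper-Threading (2 threads, 1 core)": 1.23,
    "Bulldozer module (2 threads)": 1.54,
    "Two full cores (lower bound)": 1.90,
}

def per_thread_throughput(total_speedup, threads=2):
    """Throughput each thread sees, relative to one thread running alone."""
    return total_speedup / threads

for name, total in speedup.items():
    share = per_thread_throughput(total)
    print(f"{name}: total {total:.2f}x, per thread {share:.3f}x")
```

Seen this way, each thread in a module runs at roughly three quarters of its solo speed, while each Hyper-Threaded thread runs at a little over 60%.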

More Cardboard Boxes

After having built small cardboard cases for single computers, I tried building single cases for multiple machines. The idea is to share the power supply between four systems to reduce cost and increase packing density. Here are some pictures of two such systems I built . . .

Measuring Reorder Buffer Capacity

On conventional out-of-order processors, instructions are not necessarily executed in “program order”, although the processor must give the same results as though execution occurred in program order. The instruction window holds a small set of instructions that are allowed to execute out of order; instructions are then committed in program order as they leave the window. This article describes a microbenchmark that can measure the size of the instruction window, demonstrated on several x86 microarchitectures, then extends the microbenchmark to measure the speculative register file size. . . .

Intel Ivy Bridge Cache Replacement Policy

Caches are used to store a subset of a larger memory space in a smaller, faster memory, with the hope that future memory accesses will find their data in the cache. Traditionally, caches have used (approximations of) the least-recently-used (LRU) replacement policy, but LRU performs poorly on cyclic access patterns whose working set is larger than the cache. Intel Ivy Bridge’s L3 cache uses an improved adaptive replacement policy, and is no longer purely pseudo-LRU . . .
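
The LRU weakness mentioned above is easy to reproduce in a toy model. The sketch below simulates a fully associative cache with true LRU replacement, which is an assumption made for simplicity and not a model of Ivy Bridge's actual L3 policy. The function names `lru_hit_rate` and `cyclic_pattern` are invented for this example.

```python
from collections import OrderedDict

def lru_hit_rate(cache_lines, accesses):
    """Fraction of accesses that hit in a fully associative, true-LRU cache."""
    cache = OrderedDict()  # keys are line addresses, oldest first
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits / len(accesses)

def cyclic_pattern(working_set, passes):
    """Addresses 0..working_set-1, walked in order, repeated `passes` times."""
    return [addr for _ in range(passes) for addr in range(working_set)]

# Working set fits in the cache: only the first pass misses.
fits = lru_hit_rate(8, cyclic_pattern(8, 100))

# Working set is one line larger than the cache: LRU always evicts the line
# that will be needed soonest, so every access misses.
thrashes = lru_hit_rate(8, cyclic_pattern(9, 100))

print(f"fits: {fits:.2f}, thrashes: {thrashes:.2f}")
```

Growing the working set by a single line drops the hit rate from 99% to zero, which is the behaviour an adaptive policy tries to avoid.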

A Comparison of Intel’s 32nm and 22nm Core i5 CPUs: Power, Voltage, Temperature, and Frequency

Each new generation of CMOS manufacturing processes brings about a new set of trade-offs. Intel’s recent tradition of manufacturing the same processor microarchitecture across two processes provides an opportunity to measure some of the voltage-delay-power scaling trends. The 22nm Ivy Bridge significantly improves on static (leakage) power over 32nm Sandy Bridge, but only shows small reductions in dynamic power. Ivy Bridge also requires a larger voltage increase for the same frequency increase. In addition, the thermal resistance of Ivy Bridge is higher than that of Sandy Bridge, likely due to the change from solder to polymer thermal interface material. . . .

JMicron JMB363 Add-on Card AHCI mode

The JMicron JMB363 is a 2-port SATA + 1-port PATA controller chip often found embedded in motherboards and in low-cost add-on cards. The chip supports operating in IDE, AHCI, and RAID controller modes. Motherboard BIOSes allow choosing the operating mode, but add-on cards are stuck in RAID mode. I attempt to solve this problem by hacking the JMB363 option ROM to put the card into AHCI mode . . .

Compiling a Contrived Chunk of Code

While crafting some C code to stress integer ALU bandwidth, I decided to compile the code with various compilers to see what would come out. The code is a hand-unrolled loop with 5 independent chains of dependent ALU operations (add, and) designed to provide many independent ALU instructions for the integer core to execute. Even for this simple repetitive code, the best compiler (Intel C Compiler) produces code that runs 26% faster than the worst (llvm-clang). . . .

OS X Process Scheduling

Earlier, I wrote about the SMT-awareness of the CFS and BFS schedulers on Linux. Here, I do a similar test on the Mac OS X process scheduler.

System
Core i7-3770K 3.5 GHz, 4 cores, 8 threads (2-way SMT)
Mageia Linux, kernel 3.4.4, CFS scheduler
Mac OS X 10.7 (Update: Also 10.8)
Workload: Independent ALU instructions . . .

Replacing VIA HD Audio Codec Chip

Gigabyte’s new UEFI BIOS is particularly well-suited for building Hackintoshes. However, many of Gigabyte’s recent motherboards, including all of the MicroATX Z77 and H77 boards, use the VIA VT2021 HD Audio codec chip, which is not well-supported. Since I’m building a Hackintosh with a GA-Z77M-D3H, which has the VIA VT2021 chip, I decided to work around the audio issues by swapping the VT2021 for a Realtek ALC885 chip. . . .

Intel HD4000 QE/CI Acceleration

Graphics acceleration (Core Image, Quartz Extreme) for Intel HD Graphics 4000 (on Ivy Bridge processors) works in Mac OS X! Setting the AAPL,ig-platform-id device property is required to get the drivers to load. . . .