Each new generation of CMOS manufacturing processes brings about a new set of trade-offs. Intel’s recent tradition of manufacturing the same processor microarchitecture across two processes provides an opportunity to look at some of the voltage-delay-power scaling trends. Intel’s new Ivy Bridge processor is manufactured on a 22nm tri-gate CMOS process, which is a significant change from the planar transistors used in previous processes. Intel’s previous-generation Sandy Bridge processor, made on their 32nm planar CMOS process, uses a similar architecture and can be used as a point of comparison.
In a complete system, a processor’s power consumption, voltage, temperature, and operating frequency can be observed, while the latter three can be controlled. Using these observations and controls, we can measure static and dynamic power as a function of temperature, frequency, and voltage, create shmoo plots (voltage vs. operating frequency), and compare overall thermal resistance.
There have been some rumblings that Ivy Bridge does not overclock as well as Sandy Bridge. On the other hand, Intel claims the 22nm process improves performance over 32nm. Another difference between the two processors is the switch from using solder thermal interface material (STIM) to polymer (PTIM), resulting in increased thermal resistance and higher junction temperatures on Ivy Bridge for the same power. A comparison of the measurements across Sandy Bridge and Ivy Bridge can quantify some of these observations.
- Core i5-2500K (32nm Sandy Bridge) and i5-3570K (22nm Ivy Bridge)
- Biostar TZ77MXE motherboard
The TZ77MXE motherboard allows adjustment of processor frequency (multiplier) and voltage, although it does not allow manual adjustment of voltage below 1.0V, or negative voltage offsets.
Power consumption is measured using multimeters on the 12V power connector to measure current and voltage. On modern ATX motherboards, CPU and GPU power regulators are powered by the 12V power connector, which gives a convenient place to measure the current and voltage consumed by the processor package excluding the rest of the system. Power is measured before the processor’s voltage converter. DC-DC converters are typically efficient, so I make no attempt to compensate for conversion losses. Voltage is measured at the connector after the ammeter’s voltage drop to reduce the skew caused by the ammeter’s resistance. Switching converters are generally tolerant of varying input voltages (10.9V to 12.2V observed) with minimal impact on efficiency.
The processor’s operating voltage is measured using the on-board IT8728F chip. Temperature is measured using coretemp (CPU on-die temperature sensors), reporting the temperature of the hottest core (usually core 2, cores numbered 0-3).
To control processor operating frequency, I changed only the multiplier while leaving BCLK at 100 MHz. Core voltage is controlled by setting a fixed voltage in the BIOS. I rely on the measured voltage rather than the voltage setting because the actual voltage can vary based on the load line (a mechanism that lowers supply voltage under high load to reduce the peak voltage swing) or “load line calibration” (a mechanism to defeat the load line). Processor temperature is controlled by lowering the cooling fan speed to raise the temperature.
Power consumption depends strongly on the choice of workload, presumably because workloads differ in switching activity. Power and temperature measurements are made with all four cores of the processor active, running the Prime95 torture test. Prime95 is able to sanity-check its own calculations, so it is also used to check for processor stability when generating a shmoo plot.
Power and Temperature
I measure power consumption vs. temperature first, since its results can be used to compensate for varying temperature in later measurements. For both processors, I measure total power at 1.26V at both 1.6 GHz and 2.4 GHz. Total power can be broken down into two components: Static power that does not vary with switching frequency, and dynamic power that varies with switching frequency. Assuming dynamic power scales linearly with frequency, measuring at two frequencies allows extrapolating power consumption down to 0 Hz to separate out dynamic power and static power.
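The extrapolation to 0 Hz can be sketched in a few lines of Python. This is a minimal illustration, assuming total power is exactly linear in frequency at a fixed voltage and temperature. The function name `split_power` and the sample readings are hypothetical and are not measurements from this article.

```python
def split_power(p_low, p_high, f_low=1.6, f_high=2.4):
    """Split total power (W) measured at two frequencies (GHz) into
    static and dynamic parts, assuming P_total(f) = P_static + k * f."""
    k = (p_high - p_low) / (f_high - f_low)  # dynamic power per GHz
    p_static = p_low - k * f_low             # extrapolated value at 0 Hz
    return p_static, k * f_low, k * f_high


# Hypothetical readings for illustration only (not measured values).
p_static, p_dyn_low, p_dyn_high = split_power(p_low=40.0, p_high=50.0)
print(f"static {p_static:.1f} W, dynamic {p_dyn_low:.1f} W and {p_dyn_high:.1f} W")
```

With the 1.6 and 2.4 GHz pair, the intercept works out to the 1.6 GHz reading minus twice the difference between the two readings, or equivalently the 2.4 GHz reading minus three times that difference.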
Figures 1a and 1b show power vs. temperature for Sandy Bridge (SNB) and Ivy Bridge (IVB), respectively. Total power is plotted, as well as the extrapolated static power. Figure 1b plots both Ivy Bridge’s and Sandy Bridge’s static power for comparison. Because the 1.6 GHz and 2.4 GHz curves are parallel, dynamic power does not appear to depend on temperature. The extrapolated static power curve includes data points from both curves: the 1.6 GHz points are translated downwards by twice the difference in power between the two curves, and the 2.4 GHz points by three times that difference. The extrapolated static power data points fit an exponential function very well, which agrees with the theory that leakage power typically grows exponentially with temperature. Ivy Bridge shows a significant improvement in static (leakage) power. One of the claimed benefits of multi-gate transistors is better channel control resulting in a better subthreshold slope and lower subthreshold leakage, and this measurement agrees.
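One way to fit an exponential to static power is a straight-line least-squares fit to the logarithm of power. The sketch below is an assumption about how such a fit could be done, not necessarily the method used for the figures. The name `fit_exponential` is hypothetical, and the data points are synthetic, generated from a known curve only to check that the fit recovers it.

```python
import math


def fit_exponential(temps_c, static_w):
    """Fit P = a * exp(b * T) by least squares on ln(P) versus T."""
    n = len(temps_c)
    logs = [math.log(p) for p in static_w]
    t_mean = sum(temps_c) / n
    l_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(temps_c, logs))
    sxx = sum((t - t_mean) ** 2 for t in temps_c)
    b = sxy / sxx                      # growth rate per degree C
    a = math.exp(l_mean - b * t_mean)  # extrapolated power at 0 degrees C
    return a, b


# Synthetic points from a made-up curve (a = 5 W, b = 0.02 per degree C).
temps = [50, 60, 70, 80, 90]
powers = [5.0 * math.exp(0.02 * t) for t in temps]
a, b = fit_exponential(temps, powers)
print(f"a = {a:.3f} W, b = {b:.4f} per degree C")
```

Fitting in log space weights relative rather than absolute errors, which may or may not be what a spreadsheet trendline does.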
Dynamic Power vs. Frequency
Another classic textbook result is that dynamic power scales linearly with frequency. Figures 2a and 2b show measurements of total power and dynamic power for Sandy Bridge and Ivy Bridge, at 1.26V. Total power consumption is measured, while dynamic power is calculated by subtracting out the temperature-dependent static power found in the previous section.
The dynamic power curve fits a linear trendline very well. The intercept of the dynamic power trendline is expected to be zero (no dynamic power when there is no switching activity). A non-zero intercept indicates some amount of experimental error, around half a watt in these plots. The red curves of total power bend slightly upwards because die temperature rises with frequency, and static power (but not dynamic power) increases with temperature.
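The subtraction and trendline steps can be sketched as follows. This is an illustration under stated assumptions: static power is modeled as a * exp(b * T), the coefficients `A` and `B` and all sample values are made up, and the helper names are hypothetical. The intercept of the fitted line is the quantity read here as experimental error.

```python
import math


def dynamic_power(total_w, temp_c, a, b):
    """Subtract the modeled static power a * exp(b * T) from total power."""
    return total_w - a * math.exp(b * temp_c)


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Synthetic samples (frequency GHz, temperature C, total power W) built from
# a made-up model with 12 W/GHz of dynamic power. Not data from this article.
A, B = 5.0, 0.02
samples = [(f, t, 12.0 * f + A * math.exp(B * t))
           for f, t in [(1.6, 55), (2.4, 62), (3.2, 70), (4.0, 79)]]
freqs = [f for f, _, _ in samples]
dyn = [dynamic_power(p, t, A, B) for _, t, p in samples]
slope, intercept = fit_line(freqs, dyn)
print(f"slope {slope:.2f} W/GHz, intercept {intercept:.2f} W")
```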
Figure 2b includes the dynamic power curves for both processors for comparison. At 1.26V (an arbitrary voltage somewhat higher than the typical operating point), dynamic power for Ivy Bridge is only slightly lower (~6%). The main objective of this graph was to show that dynamic power increases linearly with frequency. The next section shows how dynamic power scales with processor supply voltage.
Power vs. Supply Voltage
The textbook formula says that dynamic power should be proportional to the square of the supply voltage. This section tests that relationship by varying the processor supply voltage while keeping frequency and temperature constant. As before, dynamic and static power are separated by measuring power consumption at 1.6 and 2.4 GHz. I keep temperature constant at 90°C because it is easy to raise the operating temperature by slowing down the cooling fan, but very difficult to lower it. The resulting measurements will show how dynamic power scales with supply voltage and how static power scales with supply voltage at a fixed 90°C temperature.
Figures 3a and 3b show the results of these measurements for Sandy Bridge and Ivy Bridge, respectively.
The top two curves in each figure are direct measurements of total processor power at 1.6 and 2.4 GHz. Since total power includes both static and dynamic power, we need to break total power into static and dynamic components before curve fitting. Because temperature is kept constant, each pair of data points at a given voltage has the same static power, so static power can be computed as before, by extrapolating from the difference between total power at 1.6 and 2.4 GHz, independently for each voltage. This gives the green static power curve. Dynamic power is then computed by subtracting static power from the total power.
For Sandy Bridge (Fig. 3a), the dynamic power fits a power curve well, and comes surprisingly close to the expected quadratic relation, P_dynamic ∝ V^2. Static power also fits a power curve (although I’m not aware of theory that requires it), where static power increases roughly as the cube of the voltage.
On Ivy Bridge (Fig. 3b), the curve fits are somewhat unexpected. Static power grows much slower than on Sandy Bridge (roughly P_static ∝ V^1.85 instead of V^3), but dynamic power grows slightly more quickly with voltage (P_dynamic ∝ V^2.3 compared to V^2). A comparison of just the 2.4 GHz dynamic power and static power is plotted in Fig. 3c. Dynamic power on Ivy Bridge is lower for all practical voltages (the curve fit suggests Ivy Bridge dynamic power will exceed Sandy Bridge above 1.9V).
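A power-curve fit of this kind can be done as a straight-line fit in log-log space, and two fitted curves with different exponents cross at a single voltage. The sketch below is an assumption about how such fits could be computed. The names `fit_power_law` and `crossover_voltage` are hypothetical, and the coefficients are made up, so the printed crossover is not the 1.9V figure quoted above.

```python
import math


def fit_power_law(volts, watts):
    """Fit P = c * V**n by least squares on ln(P) versus ln(V)."""
    xs = [math.log(v) for v in volts]
    ys = [math.log(p) for p in watts]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    n = sxy / sxx                      # fitted exponent
    c = math.exp(y_mean - n * x_mean)  # fitted power at 1 V
    return c, n


def crossover_voltage(c1, n1, c2, n2):
    """Voltage where c1 * V**n1 equals c2 * V**n2 (requires n1 != n2)."""
    return (c1 / c2) ** (1.0 / (n2 - n1))


# Synthetic points from a made-up quadratic curve, P = 20 * V**2.
volts = [1.0, 1.1, 1.2, 1.3]
c, n = fit_power_law(volts, [20.0 * v ** 2 for v in volts])
print(f"c = {c:.2f} W, n = {n:.2f}")

# Made-up coefficients for a second curve with a steeper exponent.
print(f"crossover at {crossover_voltage(c, n, 16.0, 2.3):.2f} V")
```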
I speculate that these differences (slower static power increase, but slightly higher dynamic power increase with voltage) are properties of tri-gate processes, but I don’t know enough about the differences between planar and tri-gate to know whether these observations match with theory.
Voltage-Frequency Shmoo Plot
The primary knob for increasing the frequency of a processor is increasing its operating voltage. A shmoo plot characterizes the voltage-frequency relationship by testing a processor at various voltages and frequencies and recording which points function correctly (“pass”) and which do not (“fail”). The boundary between the pass and fail points indicates the lowest voltage at a given frequency (or, alternatively, the highest frequency at a given voltage) at which the processor can still operate, which correlates with how easily one can overclock the processor.
Unlike the rest of the measurements, the shmoo plots are made while only using one processor core with three cores idle. Prime95 was run on the slowest of the four cores, and a particular voltage and frequency is considered “pass” if Prime95 runs for around 10 minutes without error. The shmoo plots are slightly optimistic: A real-world usage scenario with four active cores instead of one usually requires higher voltage and causes higher temperatures, further reducing achievable frequency. Although running just one active core reduces the effect of temperature (by reducing the temperature change), I do not measure or compensate for the impact of temperature on maximum frequency.
Figures 4a and 4b show the shmoo plots for Sandy Bridge and Ivy Bridge, respectively. Additionally, a line was drawn that connects the lowest voltage that passes at each frequency, which approximates the boundary between the “pass” and “fail” points. Figure 4c shows a comparison of Sandy Bridge and Ivy Bridge. The two boundary lines from Figures 4a and 4b are plotted in Figure 4c. It is interesting that the slope of the Ivy Bridge curve (blue) is higher than for Sandy Bridge. Although Ivy Bridge is significantly faster than Sandy Bridge at low voltages, increasing the operating frequency requires a larger voltage increase on Ivy Bridge, such that the two chips require the same voltage (1.32V) to run at 4.5 GHz. This would suggest that overclocking Ivy Bridge beyond this point is somewhat more difficult, even though Ivy Bridge is faster/lower voltage at the lower non-overclocked frequency (below 3.8-3.9 GHz).
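The boundary line described above (the lowest passing voltage at each frequency) can be extracted from a grid of pass/fail results with a short function. This is a sketch only: the name `shmoo_boundary`, the dictionary layout, and the grid values are assumptions for illustration and are not the measured shmoo data.

```python
def shmoo_boundary(results):
    """Return {frequency: lowest passing voltage} from shmoo results.

    `results` maps (frequency_ghz, voltage) to True for pass, False for fail.
    Frequencies with no passing point are omitted.
    """
    boundary = {}
    for (freq, volt), passed in results.items():
        if passed and (freq not in boundary or volt < boundary[freq]):
            boundary[freq] = volt
    return dict(sorted(boundary.items()))


# Made-up grid for illustration only.
results = {
    (3.4, 1.00): False, (3.4, 1.05): True, (3.4, 1.10): True,
    (3.8, 1.05): False, (3.8, 1.10): False, (3.8, 1.15): True,
    (4.2, 1.15): False, (4.2, 1.20): False,
}
print(shmoo_boundary(results))
```

Taking the minimum passing voltage assumes the pass region is contiguous above the boundary. A grid with an isolated pass below a fail would need a stricter rule.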
One might recall Intel’s initial presentations on their 22nm process, which included charts of performance and/or voltage improvements over their 32nm process. One such graph is reproduced in the left half of Figure 5. Intel’s chart is interesting: The performance and voltage gains claimed are indeed impressive, but the gain decreases at higher voltages (37% faster at 0.7V, 18% faster at 1.0V), and the typical operating point for the desktop processors is beyond the right edge of the chart (even before overclocking). Is there something unpleasant about the higher (typical!) voltages that Intel didn’t want to mention?
Subject to a few important caveats, Intel’s chart of voltage vs. gate delay is equivalent to a shmoo plot. One caveat is that Intel’s chart shows low-level transistor delays, while a shmoo plot shows the delay of a more complex circuit. In addition, the delay of a complex circuit consists of both transistor delay and interconnect delay, so it is expected that performance gains seen at the transistor level will be smaller when applied to a whole processor because interconnect delays are expected to become worse with each process shrink.
Given the above caveats, I have attempted to transform the shmoo plot (by plotting delay instead of its reciprocal, frequency) and overlay that onto Intel’s chart in Figure 5. Notice that the voltage range I was able to test is actually entirely off the right edge of Intel’s chart. My shmoo plot seems to match up reasonably well with Intel’s plot. Although performance improvements at low voltage are high, the improvement shrinks to around 5 percent at typical operating voltages, and the performance improvement even turns into a performance loss at the higher voltages seen when overclocking.
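The frequency-to-delay transformation can be sketched as follows, taking delay as the reciprocal of the maximum passing frequency and comparing two chips at the same voltage. The linear interpolation between boundary points, the function names, and the boundary values are all assumptions for illustration. They are not the data or the exact procedure behind Figure 5.

```python
def freq_at_voltage(boundary, volt):
    """Linearly interpolate the maximum passing frequency at `volt`.

    `boundary` is a list of (voltage, frequency_ghz) points sorted by voltage.
    """
    for (v0, f0), (v1, f1) in zip(boundary, boundary[1:]):
        if v0 <= volt <= v1:
            return f0 + (f1 - f0) * (volt - v0) / (v1 - v0)
    raise ValueError("voltage outside the measured range")


def relative_delay(boundary_new, boundary_old, volt):
    """Delay of the new chip relative to the old one at the same voltage.

    Delay is 1 / frequency, so a value below 1.0 means the new chip is faster.
    """
    return freq_at_voltage(boundary_old, volt) / freq_at_voltage(boundary_new, volt)


# Made-up boundaries: the new chip is faster at low voltage and equal at 1.2 V.
old = [(1.0, 3.0), (1.2, 4.0)]
new = [(1.0, 3.3), (1.2, 4.0)]
for v in (1.0, 1.1, 1.2):
    print(v, round(relative_delay(new, old, v), 3))
```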
The ability to cool a processor is determined by its thermal resistance. Power is dissipated at the bottom side of the chip, but most of the heat is removed through the top side. That heat must pass through the silicon die, heat spreader, and heatsink, then out to air, with some form of thermal interface material at the interface between each of those. The overall thermal resistance can be found by measuring the power dissipation and the total temperature difference between the on-die temperature sensors and ambient air.
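As a small illustration, overall thermal resistance is the temperature difference divided by the power dissipated. The function name and the sample readings below are hypothetical and are not values measured for either processor.

```python
def thermal_resistance(t_die_c, t_ambient_c, power_w):
    """Overall die-to-ambient thermal resistance in degrees C per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (t_die_c - t_ambient_c) / power_w


# Hypothetical readings: 70 C on-die, 25 C ambient, 60 W dissipated.
print(f"{thermal_resistance(70.0, 25.0, 60.0):.2f} C/W")
```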
There are two main reasons why Sandy Bridge and Ivy Bridge may have different thermal resistance. First, as chips are scaled smaller, power dissipation does not scale as much, leading to higher power density. Ivy Bridge’s die size (160 mm²) is 26% smaller than Sandy Bridge’s (216 mm²), reducing the contact surface area between the die and heat spreader. Second, Intel has switched from using solder between the die and heat spreader (solder thermal interface material, STIM, ~87 W/(m·K)) to a polymer material (PTIM, 3-4 W/(m·K)), presumably because Ivy Bridge’s reduced power dissipation is now comfortably within the range suitable for using PTIM (See Figure 16).
Thermal resistance is measured with all four cores active (fewer active cores results in a hot spot). The stock thermal paste, heatsink, and cooling fan are used on both processors. The cooling fan is kept at its maximum constant speed (around 2050 RPM), and power dissipation is varied by changing the CPU supply voltage.
Figure 6 shows a measurement of the thermal resistance on both processors. On both processors, thermal resistance improves somewhat at higher power. The thermal resistance of Ivy Bridge is around 0.15 °C/W worse than Sandy Bridge. Although it’s not possible to break down the contribution of the two reasons, it seems likely that most of the increase in thermal resistance is due to the change in TIM. An increase of 0.15 °C/W roughly corresponds to the bulk thermal resistance of a ~90 μm layer of PTIM over the die area of 160 mm².
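The layer-thickness estimate follows from the bulk conduction relation R = t / (k × A). The sketch below simply rearranges that relation for t, assuming a uniform layer over the whole die, ignoring contact resistance, and attributing the entire 0.15 °C/W increase to the TIM. The function name is hypothetical.

```python
def tim_thickness_um(delta_r_c_per_w, k_w_per_mk, area_mm2):
    """Thickness in micrometers of a uniform layer whose bulk thermal
    resistance is delta_r, from R = t / (k * A)."""
    area_m2 = area_mm2 * 1e-6                              # mm^2 to m^2
    return delta_r_c_per_w * k_w_per_mk * area_m2 * 1e6    # m to micrometers


# PTIM conductivity range quoted above, in W/(m*K), over a 160 mm^2 die.
for k in (3.0, 4.0):
    print(f"k = {k} W/(m*K): {tim_thickness_um(0.15, k, 160.0):.0f} um")
```

For the quoted 3-4 W/(m·K) range this gives roughly 72 to 96 μm, which brackets the ~90 μm estimate.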
The above measurements attempt to characterize some of the changes when moving from Intel’s 32nm planar to 22nm Tri-Gate process. The 22nm Ivy Bridge significantly improves on static (leakage) power over 32nm Sandy Bridge, but only shows small reductions in dynamic power. Ivy Bridge also requires higher voltage increases for the same frequency increase, leading to more difficult overclocking but power savings at lower (standard) speeds.
In addition to the CMOS process changes, the thermal resistance of Ivy Bridge increased over Sandy Bridge, likely due to the change from solder to polymer thermal interface material between the die and heat spreader.