I recently got a used 1U server:
- Tyan GT62D-B7106
- S7106GMR-CGN motherboard (Dual LGA 3647)
I noticed that the CPUs were running abnormally slowly: with all cores active they did not even reach base clock frequency, even though the temperature wasn’t particularly high. Why?
There seems to be a limit of around 180 W for both CPUs combined (or 90 W per CPU if both are active), and this limit is independent of the TDP rating of the CPU. So if I have two 125 W TDP CPUs, they will only use around 90 W each and not reach base clocks. This gap is even bigger with 165 W and 205 W CPUs, which will also run at ~90 W. But if I have only one 165 W CPU installed, it will happily run at 165 W and at the expected frequencies, which suggests that CPU power delivery (which isn’t shared between sockets) isn’t the limit.
So where is this power limit coming from?
It turns out the limit is due to the 500 W power supply that comes with the Tyan GT62D-B7106 system. The power supply communicates with the motherboard via PMBus, which (somehow) leads the system to limit its power consumption, and the CPUs end up with a rather conservative combined limit of 180 W. But this is a system power limit, not a per-CPU power limit.
This power limit can be bypassed by disabling PMBus in the BIOS settings. (Disabling PMBus means I also lose voltage/current/power/temperature/fan monitoring of the power supply.)
But of course, limiting the CPU power to fit the power supply does have a purpose. With PMBus turned off, there is no longer a reasonable limit on CPU power usage, so it’s very easy for the CPUs to consume enough power to reset the machine by overloading it. For example, with two Xeon 8173M CPUs with a PL2 of 363 W each, I can’t successfully boot into the OS without the system resetting itself.
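To see why the machine resets, it helps to put the numbers side by side. The following is only a back-of-the-envelope sketch in Python: it uses the 500 W rating and the 363 W PL2 from above, and it ignores everything else in the chassis (fans, drives, memory, conversion losses), so the real headroom is smaller than what it prints. The function name is made up for illustration.

```python
# Rough power budget check using the figures quoted in this post.
# Assumption: only CPU package power is counted; fans, drives, RAM
# and PSU losses are ignored, so real headroom is lower.
PSU_WATTS = 500   # rated output of the stock power supply
PL2_WATTS = 363   # per-CPU PL2 of the Xeon 8173M
NUM_CPUS = 2


def cpu_burst_draw(limit_watts, num_cpus):
    """Combined CPU power if every CPU runs at its limit at the same time."""
    return limit_watts * num_cpus


burst = cpu_burst_draw(PL2_WATTS, NUM_CPUS)
headroom = PSU_WATTS - burst
print(f"CPU burst draw: {burst} W, PSU headroom: {headroom} W")
```

Two CPUs at PL2 would ask for 726 W, which is 226 W more than the power supply is rated for, before counting anything else in the system.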
Now that I know where the power limit is coming from, there are two ways to deal with this:
- Replace the power supply with a higher-wattage one
- Limit the CPU power consumption
On this system, I can manually set the PL1 and PL2 CPU power limits to limit power consumption. For a system with two Xeon 8173M CPUs (28-core, 165 W TDP), the maximum setting that does not cause spontaneous reboots is 145 W per CPU (for both PL1 and PL2). I’m giving up a little bit of sustained performance (145 W PL1 vs. 165 W) and some more burst speed (145 W PL2 vs. 363 W), but this is the best I can do without replacing the power supply. This is still running the CPUs slower than they are capable of, but 145 W is a lot better than 90 W.
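I set the limits in the BIOS, but if the machine runs Linux with the `intel_rapl` powercap driver loaded (an assumption, I haven't described my OS setup here), the limits currently in effect can probably be read back from sysfs as a sanity check. Below is a minimal sketch of that idea. The function name `read_rapl_limits` is my own; the `constraint_N_name` and `constraint_N_power_limit_uw` files are part of the kernel's powercap interface, where `long_term` usually corresponds to PL1 and `short_term` to PL2.

```python
from pathlib import Path


def read_rapl_limits(root="/sys/class/powercap"):
    """Return {zone_name: {constraint_name: watts}} for top-level RAPL zones.

    Assumes the Linux powercap layout (intel-rapl:0, intel-rapl:1, ...).
    Returns an empty dict if the directory or driver is not present.
    """
    limits = {}
    for zone in sorted(Path(root).glob("intel-rapl:[0-9]*")):
        # Skip sub-zones such as intel-rapl:0:0 (e.g. dram).
        if zone.name.count(":") != 1:
            continue
        try:
            zone_name = (zone / "name").read_text().strip()
            constraints = {}
            for name_file in sorted(zone.glob("constraint_*_name")):
                idx = name_file.name.split("_")[1]
                limit_file = zone / f"constraint_{idx}_power_limit_uw"
                microwatts = int(limit_file.read_text())
                constraints[name_file.read_text().strip()] = microwatts / 1_000_000
        except (OSError, ValueError):
            # Unreadable or malformed entry: skip this zone.
            continue
        limits[zone_name] = constraints
    return limits


if __name__ == "__main__":
    for zone_name, constraints in read_rapl_limits().items():
        print(zone_name, constraints)
```

On a dual-socket system I would expect one `package-N` entry per CPU, each showing the configured long-term and short-term limits in watts.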