FPGAs have long had dedicated carry chains to speed up ripple-carry adder circuits. They often also have some form of mostly user-invisible carry lookahead to reduce delay. New in the Cyclone V and Stratix V families, there seems to be carry lookahead at the LAB level (10 ALMs, or 20 bits of addition). This is visible in the timing path through a big adder in TimeQuest: on Cyclone V and Stratix V, the path shows hops that skip over 20 bits at a time, rather than one hop per bit.
Here’s the delay (as reported by the timing analyzer) of a circuit consisting of registers before and after an adder that adds one (increment) and computes the carry-out (logically equivalent to a wide AND gate). There are big speed improvements on long adders due to the new carry lookahead, but not much gain for small adders. Below 16 bits, the Cyclone V adders are actually slower than Cyclone IV despite being two process generations ahead.
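For reference, here is a minimal sketch of what such a single-adder test circuit might look like. It is modeled on the two-adder `twoadd` module in this post; the module name `oneadd`, the port name `din`, and the default width are assumptions for illustration, not the exact code used for the measurements.

```verilog
// Hypothetical single-adder test circuit: register -> increment -> register.
// Names and default width are assumptions, modeled on the twoadd module.
module oneadd #(
    parameter W = 120
) (
    input              clk,
    input      [W-1:0] din,
    output reg         out
);
    reg  [W-1:0] r;             // input register
    wire [W:0]   sum = r + 1;   // incrementer; bit W is the carry-out

    always @(posedge clk) begin
        r   <= din;
        // The carry-out is 1 only when every bit of r is 1,
        // so this is logically the same as the wide AND (&r).
        out <= sum[W];
    end
endmodule
```

Only the carry-out is registered, so the register-to-register path runs through the full length of the carry chain.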
But what if we have a circuit with two adders? The two adders have their own input and output registers and aren’t connected except for the clock (see code below). I expect that the longest-path delay should be the delay through the longer adder, and the presence of the shorter adder shouldn’t change anything.
module twoadd #(
    parameter W1 = 120,
    parameter W2 = 20
) (
    input               clk,
    input      [W1-1:0] in1,
    input      [W2-1:0] in2,
    output reg          out1,
    output reg          out2
);
    reg  [W1-1:0] r1;
    reg  [W2-1:0] r2;
    wire [W1:0]   sum1 = r1 + 1;
    wire [W2:0]   sum2 = r2 + 1;

    always @(posedge clk) begin
        r1   <= in1;
        r2   <= in2;
        out1 <= sum1[W1];
        out2 <= sum2[W2];
    end
endmodule
This example has the two adders sharing the same clock domain, but results are the same even if they are on separate clock domains.
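The separate-clock variant is not shown in this post. A sketch of how it could be written, assuming the only change is to give each adder its own clock input (the module name `twoadd_2clk` and the ports `clk1` and `clk2` are made up for illustration):

```verilog
// Hypothetical two-clock variant of twoadd: each adder has its own clock,
// so the two halves share no signals at all.
module twoadd_2clk #(
    parameter W1 = 120,
    parameter W2 = 20
) (
    input               clk1,
    input               clk2,
    input      [W1-1:0] in1,
    input      [W2-1:0] in2,
    output reg          out1,
    output reg          out2
);
    reg  [W1-1:0] r1;
    reg  [W2-1:0] r2;
    wire [W1:0]   sum1 = r1 + 1;
    wire [W2:0]   sum2 = r2 + 1;

    // Long adder, clocked by clk1
    always @(posedge clk1) begin
        r1   <= in1;
        out1 <= sum1[W1];
    end

    // Short adder, clocked by clk2
    always @(posedge clk2) begin
        r2   <= in2;
        out2 <= sum2[W2];
    end
endmodule
```

With this structure there is no timing path between the two adders, so any interaction between them would have to come from the tools rather than from the logic.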
Here is the longest-path delay of two independent adders on a Stratix IV, with the lengths of each adder swept independently (yes, that's 2673 Quartus compiles). The longer adder has length (W1) 80 through 160, and the shorter adder has length (W2) 8 through 40. The delay increases with the size of the longer adder, and isn't affected by the length of the shorter adder, as expected.
On Stratix V, the same sweep looks quite different, and unexpected. Diagonal lines of higher delay mean that the presence of an unrelated short adder on the chip affects the delay of the long adder, but only when the difference in their lengths is approximately a multiple of 10. This doesn't look right, because the adders should not interfere, particularly when they are not even placed near each other on the chip. The effect isn't small: the delay of long adders increases by 70-80% as a result.
The timing analysis report says that the abnormally high delays for points along the diagonal line are caused by the new LAB-level carry lookahead not being used for those combinations of adder lengths. I can't imagine a reason why the carry lookahead for one adder would have to be disabled if another adder of just the right length exists somewhere else on the chip. This is so strange that I suspect it's a (software, Quartus) bug.
The same thing happens on Cyclone V, probably because of common code due to the similar architectures.
The same thing happens on Arria V.
This issue does not occur on Arria 10, however.
- Quartus 15.0 Build 145, Linux
- Fastest speed grade of each FPGA:
  - Stratix IV: EP4SGX70HF35C2
  - Cyclone V: 5CEFA7F23C6
  - Stratix V: 5SGSMD3E1H29C1
- Delays as reported by timing analysis, not on silicon.