Cyclone/Stratix V Carry-Lookahead…bug?

FPGAs have long had dedicated carry chains to speed up ripple-carry adders, and they often also have some form of mostly user-invisible carry lookahead to reduce delay further. New in the Cyclone V and Stratix V families, there seems to be carry lookahead at the LAB level (10 ALMs, or 20 bits of addition). It shows up in TimeQuest: timing paths through long adders have hops that skip over 20 bits at a time, rather than one hop per bit.
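To get a feel for why this matters mostly for long adders, here is a toy hop count. It rests on two simplifying assumptions that do not come from any datasheet: plain ripple carry costs one hop per bit, and LAB-level lookahead costs one hop per 20 bits. The name `carry_hops` is made up for illustration, and the model ignores actual hop delays, where in a LAB a chain starts, and routing between LABs.

```python
import math

LAB_BITS = 20  # 10 ALMs per LAB, i.e. 20 bits of addition (from the text above)

def carry_hops(width, bits_per_hop=1):
    """Number of carry hops needed to cross `width` bits (toy model)."""
    return math.ceil(width / bits_per_hop)

# Compare one hop per bit against one hop per LAB for a few adder lengths.
for width in (8, 20, 120, 160):
    print(width, carry_hops(width), carry_hops(width, LAB_BITS))
```

Under this model a short adder fits in a single LAB-level hop either way, so any gain has to come from the long chains.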

Here’s the delay (as reported by the timing analyzer) of a circuit consisting of an adder with registers before and after it. The adder adds one (an increment) and computes the carry-out, which is logically equivalent to a wide AND gate. The new carry lookahead gives big speed improvements on long adders, but not much gain for small ones. Below 16 bits, the Cyclone V adders are actually slower than Cyclone IV, despite being two process generations ahead.
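The equivalence between the carry-out of an incrementer and a wide AND of its input bits can be checked exhaustively for small widths. A minimal sketch, with function names invented for illustration:

```python
def increment_carry_out(x, width):
    """Carry-out bit of x + 1 for a `width`-bit unsigned x."""
    return ((x + 1) >> width) & 1

def wide_and(x, width):
    """AND of all `width` bits of x (1 only when every bit is set)."""
    return int(x == (1 << width) - 1)

def equivalent(width):
    """True if the two functions agree for every `width`-bit input."""
    return all(increment_carry_out(x, width) == wide_and(x, width)
               for x in range(1 << width))

print(all(equivalent(w) for w in range(1, 13)))
```

The carry only propagates out of the top bit when every lower bit is 1, which is exactly the AND condition.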

Figure: Adder delay vs. length for Cyclone IV and V, and Stratix IV and V, showing big improvements for long adders.

Two adders

But what if we have a circuit with two adders? Each adder has its own input and output registers, and the two aren’t connected except for the clock (see code below). I would expect the longest-path delay to be the delay through the longer adder, and the presence of the shorter adder shouldn’t change anything.

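// Two independent incrementers that share nothing but the clock.
module twoadd
#(
	parameter W1=120,	// width of the longer adder
	parameter W2=20	// width of the shorter adder
)
(
	input clk,
	input [W1-1:0] in1,
	input [W2-1:0] in2,
	output reg out1,
	output reg out2
);
	reg [W1-1:0] r1;
	reg [W2-1:0] r2;
	// One extra result bit so that the carry-out is kept.
	wire[W1:0] sum1 = r1+1;
	wire[W2:0] sum2 = r2+1;
	always @(posedge clk) begin
		// Input registers
		r1 <= in1;
		r2 <= in2;
		// Output registers capture only the carry-out of each adder.
		out1 <= sum1[W1];
		out2 <= sum2[W2];
	end
endmodule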

This example has the two adders sharing the same clock domain, but results are the same even if they are on separate clock domains.

Stratix IV

Figure: Colour map plot of longest-path delay for a circuit with two independent adders of different lengths on a Stratix IV

Here is the longest-path delay of two independent adders on a Stratix IV, with the lengths of each adder swept independently (yes, that's 2673 Quartus compiles). The longer adder has length (W1) 80 through 160, and the shorter adder has length (W2) 8 through 40. The delay increases with the size of the longer adder, and isn't affected by the length of the shorter adder, as expected.
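The compile count follows from the sweep ranges above if both lengths are stepped by 1, which is an assumption, but one consistent with the stated total. A sketch of enumerating the sweep points; the names are hypothetical and no Quartus invocation is shown:

```python
from itertools import product

W1_RANGE = range(80, 161)  # longer adder: 80 through 160 inclusive
W2_RANGE = range(8, 41)    # shorter adder: 8 through 40 inclusive

def sweep_points():
    """All (W1, W2) pairs in the sweep, one per compile."""
    return list(product(W1_RANGE, W2_RANGE))

points = sweep_points()
print(len(points))  # 81 * 33 = 2673
```

Each pair would then be passed to the design as the W1 and W2 parameters of one compile.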

Stratix V

Figure: Colour map plot of longest-path delay for a circuit with two independent adders of different lengths on a Stratix V

This looks quite different, and not what I expected. The diagonal lines of higher delay mean that the presence of an unrelated short adder on the chip affects the delay of the long adder, but only when the difference in their lengths is approximately a multiple of 10. This doesn't look right, because the adders should not interfere with each other, particularly when they are not even placed near each other on the chip. The effect isn't small: the delay of the long adder increases by 70-80% as a result.

The timing analysis report says that the abnormally high delays for points along the diagonal line are caused by the new LAB-level carry lookahead not being used for those combinations of adder lengths. I can't imagine a reason why the carry lookahead for one adder would have to be disabled if another adder of just the right length exists somewhere else on the chip. This is so strange that I suspect it's a (software, Quartus) bug.
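When comparing the diagonal pattern against the per-compile data, it helps to have a number for how close a length difference is to a multiple of 10. The helper below is only a sketch with a made-up name; it makes no claim about which distances actually trigger the slowdown, since the text above only says "approximately".

```python
def distance_to_multiple_of_10(w1, w2):
    """Distance from (w1 - w2) to the nearest multiple of 10, in the range 0..5."""
    d = (w1 - w2) % 10
    return min(d, 10 - d)

# A few example length pairs (W1, W2).
for w1, w2 in ((120, 20), (123, 20), (127, 20), (85, 40)):
    print(w1, w2, distance_to_multiple_of_10(w1, w2))
```

Plotting measured delay against this distance would show how tightly the slow points cluster around the diagonals.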

Cyclone V

Figure: Colour map plot of longest-path delay for a circuit with two independent adders of different lengths on a Cyclone V

The same thing happens on a Cyclone V, probably because the two families have similar architectures and share common code.

Also...

The same thing happens on Arria V, but the issue does not occur on Arria 10.

Test Setup

  • Quartus 15.0 Build 145, Linux
  • Fastest speed grade of each FPGA:
    • Stratix IV: EP4SGX70HF35C2
    • Cyclone V: 5CEFA7F23C6
    • Stratix V: 5SGSMD3E1H29C1
  • Delays are as reported by timing analysis, not measured on silicon.

4 comments to Cyclone/Stratix V Carry-Lookahead…bug?

  • Kokomojoe

    Cool! What compelled you to measure two adders?

    • Henry

      Hi!

      I can’t remember exactly, but it was when collecting data for a paper, where I wanted to know the speed of using the hard carry chains to implement various functions (e.g., AND). It’s something similar to Figure 7: http://www.stuffedcow.net/files/fpgacustom-fpga2011.pdf

      I was lazy and put three independent adders per compile to reduce the number of compiles I had to do, and got really strange results with Cyclone/Stratix V. Two adders was the simplest case where I could still trigger this weird behaviour.

      Also, I don’t think we’ve met, but I remember Jonathan Rose has mentioned you a few times. 🙂

  • Eric

    Very weird. Does it still occur with newer Quartus releases? (I know, I know, but it has to be asked.)
    That really looks like a STA bug, and not H/W.

    • Henry

      Fine, make me test it 😛
      The same (buggy) behaviour persists in Quartus 16.1 and 17.0, for Cyclone V and Stratix V (I didn’t re-test Arria V).
