VLSI SoC Design: The Timing Optimization Problem

November 22, 2015

The Timing Optimization Problem

Puzzle:

Tomorrow is the scheduled tape-out of your SoC. The target clock frequency for this SoC is 100 MHz (a clock period of 10 ns). There is exactly one setup-violating path left, and you need to fix its timing with ECOs. Area is not a constraint.

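Here's the circuit (figure not available; from the comments below: FF1, then a 5 ns combinational block, then a single-bit wire, then a 7 ns combinational block, then FF2):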

Points to note:

My tape-out is tomorrow, so I don't have the liberty of asking the RTL design team to change the architecture of the design.
I have already used the highest drive-strength cells and the lowest-Vt flavor cells available in my standard cell library.
There's no redundant logic in the path; it has been optimized well.
I cannot add delay to the clock path of FF2, because doing so would cause a hold violation on the scan chain connecting flops FF1 and FF2.
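To make the size of the violation concrete, here is a minimal Python sketch of the setup check. It assumes the 5 ns and 7 ns block delays mentioned in the comments, zero clock skew, and it ignores clock-to-Q delay and setup time; the function and constant names are illustrative only.

```python
# Setup slack for the violating FF1 -> FF2 path.
# Assumptions: block delays of 5 ns and 7 ns (taken from the comments),
# zero clock skew, clock-to-Q delay and setup time ignored.

CLOCK_PERIOD_NS = 10.0          # 100 MHz target
BLOCK_DELAYS_NS = [5.0, 7.0]    # the two combinational blocks in series


def setup_slack(period_ns, path_delays_ns):
    """Return required time minus arrival time; negative means a violation."""
    arrival_ns = sum(path_delays_ns)
    return period_ns - arrival_ns


slack = setup_slack(CLOCK_PERIOD_NS, BLOCK_DELAYS_NS)
print(f"setup slack = {slack:+.1f} ns")   # prints "setup slack = -2.0 ns"
```

Under these assumptions the path misses timing by 2 ns, so any fix has to recover at least that much.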

Please suggest ways to fix this timing violation. A rough sketch would be really helpful. I shall post my solution in a couple of days.

Mike posted the correct answer, and I'll just add a figure explaining the solution:
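Mike's idea can also be sketched in code. The following is a rough Python model, not the author's figure: `block_5ns` and `block_7ns` are hypothetical stand-ins for the real logic, and `MUX_DELAY_NS` is an assumed value. It checks that the precompute-and-select structure produces the same result as the original series path, and compares the two critical-path delays. The delay estimate conservatively assumes the 7 ns copies might have other inputs launched on the same clock edge; if their only input is the tied-off bit, their outputs are constants and the path is shorter still.

```python
# Precompute-and-select, in the spirit of a carry-select adder.
# block_5ns and block_7ns are hypothetical placeholders for the real logic;
# MUX_DELAY_NS is an assumed figure, not a value from the post.

BLOCK_A_NS = 5.0
BLOCK_B_NS = 7.0
MUX_DELAY_NS = 0.2


def block_5ns(ff1_q: int) -> int:
    """Placeholder for the first block: produces the single select bit."""
    return ff1_q ^ 1


def block_7ns(bit: int) -> int:
    """Placeholder for the second block: any function of one input bit."""
    return bit ^ 1


def original_path(ff1_q: int) -> int:
    """The two blocks in series, as in the violating path."""
    return block_7ns(block_5ns(ff1_q))


def eco_path(ff1_q: int) -> int:
    """Two copies of the 7 ns block with tied inputs, selected by a mux."""
    result_if_0 = block_7ns(0)          # copy 1, input tied to 0
    result_if_1 = block_7ns(1)          # copy 2, input tied to 1
    select = block_5ns(ff1_q)           # runs in parallel with both copies
    return result_if_1 if select else result_if_0   # 2-to-1 mux


serial_delay = BLOCK_A_NS + BLOCK_B_NS                        # blocks back to back
parallel_delay = max(BLOCK_A_NS, BLOCK_B_NS) + MUX_DELAY_NS   # blocks side by side

for q in (0, 1):
    print(q, original_path(q), eco_path(q))
print(f"serial: {serial_delay} ns, parallel: {parallel_delay} ns")
```

The extra cost is one duplicated 7 ns block and one mux, which the puzzle allows because area is not a constraint.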

28 comments:

Kumar, November 22, 2015 11:39 AM
1. Look at the slack margin at the D pin of FF1. If there is 2 ns of positive slack, you can make the clock arrive earlier at FF1.

2. Collect all the nets in the FF1-FF2 path and route them on higher metal layers with an NDR (this will reduce the net delay).

3. Is FF2 an LVT cell? If not, swap it to the LVT flavour (the setup time of FF2 will be much smaller for LVT cells).

4. Look at the slack at the D pin of FF3. If it's positive, delay the clock of FF2. (You said this ends up in a hold violation; add a delay buffer in the scan path.)
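Points 1 and 4 both rely on shifting clock arrival times to borrow slack from a neighbouring stage. Here is a small Python sketch of point 1, assuming a 12 ns FF1-to-FF2 data delay (5 ns + 7 ns), ignoring clock-to-Q delay and setup time, and using a hypothetical 8 ns path into the D pin of FF1 so that it starts with the 2 ns of positive slack Kumar mentions.

```python
# Effect of clock skew on setup slack.
# Assumptions: 12 ns FF1 -> FF2 data delay, clock-to-Q and setup time
# ignored, and a hypothetical 8 ns path feeding FF1's D pin.

PERIOD_NS = 10.0


def setup_slack(data_delay_ns, launch_clk_ns=0.0, capture_clk_ns=0.0):
    """Slack = (period + capture clock arrival) - (launch clock arrival + data delay)."""
    required = PERIOD_NS + capture_clk_ns
    arrival = launch_clk_ns + data_delay_ns
    return required - arrival


# Before the ECO: no skew anywhere.
before_ff1_ff2 = setup_slack(12.0)                       # -2.0 ns, violating
before_into_ff1 = setup_slack(8.0)                       # +2.0 ns, spare margin

# Point 1: clock FF1 2 ns early. FF1 launches earlier, but it also captures earlier.
after_ff1_ff2 = setup_slack(12.0, launch_clk_ns=-2.0)    # 0.0 ns, just met
after_into_ff1 = setup_slack(8.0, capture_clk_ns=-2.0)   # 0.0 ns, margin used up

print(before_ff1_ff2, before_into_ff1, after_ff1_ff2, after_into_ff1)
```

The slack is moved rather than created: the 2 ns gained on FF1-to-FF2 comes out of the stage feeding FF1, which is why the suggestion is conditional on that stage having margin.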
prudhvi, November 22, 2015 1:30 PM
Since there are two combinational blocks between FF1 and FF2, with delays of 5 ns and 7 ns, we can add a flop between them. Then we have 5 ns of combinational logic between one pair of flops and 7 ns between the other, which solves the setup problem.
Mike, November 24, 2015 9:53 PM
There is only a one-bit wire between the 5 ns and 7 ns blocks. Thus, there are only two possible input vectors to the 7 ns block.
Thus, as (or even before!) the 5 ns block starts to process its inputs, two 7 ns blocks start working in parallel with it: one with 1 as input, one with 0 as input. By the time the 7 ns blocks are done, the correct block's answer gets muxed into FF2, chosen by the answer from the 5 ns block.
Area was, after all, not a constraint, so the extra logic needed (one 7 ns block and one 2-to-1 mux) is not a concern.
Neither is timing: assuming the 7 ns blocks have no additional inputs not shown here, they can compute their outputs at any time (each has a constant output!).
Unknown, January 04, 2016 2:26 AM
Hi Naman,
I still don't get the answer. A setup violation means data arrives too late at FF2, so we have to speed it up by removing extra circuit delay. With Mike's answer (I'm not opposing it), the total delay is still 5 ns + 7 ns, so it seems to have no effect on the circuit.
Could you please explain? It would be very helpful for me.
Thanks,
Vijay
PM, February 28, 2016 8:30 PM
How about replacing FF2 with a neg type latch?
Unknown, March 14, 2016 1:58 PM
Very nice puzzle. Your puzzles are very interesting to work on. Please keep up the great work!
VJ, August 21, 2016 10:12 AM
Good one. Keep them coming.
all4chip, November 14, 2016 11:20 PM
Good puzzle, we can learn a new method from it. Thanks.
Unknown, January 21, 2017 8:04 PM
Thanks Naman for the question and Mike for the answer. This question has led me to look at techniques for fixing setup violations from a different angle.
Anonymous, February 14, 2017 11:41 PM
Hi Naman,

The carry-select approach is brilliant. But since there is only one input to the combinational circuit, can't we just use a mux and a 1-bit LUT, hardcoding the outputs?
Mitu Raj, August 30, 2019 4:41 AM
Assume Mike's solution brings the path delay down to 9 ns or so, satisfying setup. But it tightens the hold requirement by a margin of 3 ns. OK, if that path still satisfies hold, then why don't we simply put this 3 ns of delay in the clock route instead of adding the mux and all? The 4th point in your question therefore contradicts the solution. If hold has no significant margin, Mike's solution fails as well.
Unknown, September 18, 2019 12:19 PM
By setting a multicycle path between FF1 and FF2?
Anonymous, August 19, 2020 10:42 PM
Has this solution ever been tried in a real scenario? Just curious.