A buffer is nothing but two inverters connected in series. Does it make any difference whether CTS (Clock Tree Synthesis) is done using buffers or inverters? What exactly are the pros and cons, and what factors do backend design engineers take into account while deciding how to build their clock trees? I'll answer these questions in this post.
An inverter based clock tree:
To keep things simple and pertinent to the discussion, let's assume that we are using only a single kind of inverter (say, of drive strength X) to build our clock tree, and that all the inverters are placed equidistant from each other. The scenario is shown in Fig. 1. The advantage of an inverter based clock tree is that the high pulse width and the low pulse width are symmetrical: each clock edge alternates between a rising and a falling transition at successive stages, so any rise/fall delay mismatch is shared almost equally by both edges. For the clock signal, this is a critical requirement, especially for SoCs which have a high interaction between positive and negative edge triggered flip-flops.
Figure 1: Inverter Based Clock Tree giving equal rise and fall times
A buffer based clock tree:
While theoretically one can create a buffer using two identical inverters connected in series, that is generally not how buffers are designed in standard cell libraries. To save area, the first inverter is typically of a lower drive strength and is placed very close to the second inverter. The second inverter, however, is of a higher drive strength.
Figure 2: Buffer Based Clock Tree. Each buffer is formed by connecting two inverters in series
One must also notice that the delay of the first inverter is dominated by the input capacitance of the second inverter: the wire between these two inverters is very short, so its capacitance can be neglected. For the second inverter, however, the load comprises the wire capacitance as well as the input capacitance of the next buffer. This introduces an asymmetry in the rise and fall delays, and hence in the high and low pulse widths of the clock signal.
Figure 3: Difference in high and low pulse widths
For applications which have a very stringent requirement on the clock high and low pulse widths, one might prefer to use an inverter based clock tree over the buffer based clock tree.
Can we do something to make the buffer based clock tree work? The answer is yes! Let's take a look:
If we balance the load seen by the first inverter and the load seen by the second inverter, we might be able to achieve equal rise and fall times, and hence equal high and low pulse widths for the clock signal.
In this approximation, we have modeled the wire as a T-model, and each inverter using an RC model with its "on" resistance and diffusion capacitance.
Figure 4: RC delay model for inverters and wire
To have equal high and low pulse widths, the RC delay seen by the first inverter must be equal to the RC delay seen by the second inverter.
$$R_{chn,1}\,(C_{D,1} + C_{G,2}) = R_{chp,2}\,(C_{D,2} + C_{wire} + C_{G,1}) + \frac{R_{wire}}{2}\,(C_{wire} + C_{G,1}) + \frac{R_{wire}}{2}\,C_{G,1}$$
If this equation is satisfied, one can say with a fair degree of confidence that the high and low pulse widths will be approximately equal. The resistance and capacitance of the wire are functions of its length, and the wire length that balances the two delays can be conveyed by the standard cell library designer to the backend designers.
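The balance condition above can be explored numerically. The following is only a minimal sketch: every component value, the per-micron wire parameters, and the helper names (`first_stage_delay`, `second_stage_delay`, `balanced_length`) are illustrative assumptions, not data from any real library. It reads C_G,1 on the right-hand side as the input capacitance of the next buffer's first inverter and searches for the wire length at which the two RC delays match.

```python
# All numbers are made-up illustrative values, NOT taken from a real cell library.
R_CHN1 = 4000.0     # ohm, NMOS on-resistance of inverter 1 (low drive)
C_D1 = 1e-15        # F, diffusion capacitance at inverter 1 output
C_G2 = 4e-15        # F, gate capacitance of inverter 2 (high drive)
R_CHP2 = 1000.0     # ohm, PMOS on-resistance of inverter 2
C_D2 = 4e-15        # F, diffusion capacitance at inverter 2 output
C_G1 = 1e-15        # F, gate capacitance of the next buffer's first inverter
R_PER_UM = 1.0      # ohm per micron of wire (assumed)
C_PER_UM = 0.2e-15  # F per micron of wire (assumed)


def first_stage_delay():
    """Left-hand side: inverter 1 driving the gate of inverter 2."""
    return R_CHN1 * (C_D1 + C_G2)


def second_stage_delay(length_um):
    """Right-hand side: inverter 2 driving a T-model wire into the next buffer."""
    r_wire = R_PER_UM * length_um
    c_wire = C_PER_UM * length_um
    return (R_CHP2 * (C_D2 + c_wire + C_G1)
            + (r_wire / 2) * (c_wire + C_G1)
            + (r_wire / 2) * C_G1)


def balanced_length(lo=0.0, hi=1000.0, iterations=60):
    """Bisect for the wire length (in microns) where both sides are equal."""
    target = first_stage_delay()
    if not second_stage_delay(lo) <= target <= second_stage_delay(hi):
        raise ValueError("no balancing length inside the search range")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # second_stage_delay grows monotonically with wire length
        if second_stage_delay(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


length_um = balanced_length()
print(f"internal-node delay  : {first_stage_delay() * 1e12:.2f} ps")
print(f"balancing wire length: {length_um:.1f} um")
print(f"output-node delay    : {second_stage_delay(length_um) * 1e12:.2f} ps")
```

With these assumed values the weak first inverter is slow enough that some wire length balances it. If the first inverter is too strong relative to the second, no positive length satisfies the equation and the sketch raises an error instead.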
While most standard cell library vendors provide a symmetrical buffer, there can still be a difference of a few picoseconds between the buffer's rise and fall delays, which creates a difference between the high and low pulse widths. This mismatch accumulates stage by stage, so the variation in the duty cycle increases for deeper clock trees!
A simple way to mitigate the problem is to insert an inverter at the midpoint of the buffer-based clock tree. The edge that travelled through the first half as a rising edge then travels through the second half as a falling edge (and vice versa), so the mismatch accumulated in the two halves cancels and the high and low pulse widths of the clock reaching the sink pins of the flip-flops come out nearly the same. The major challenge, however, lies in finding this midpoint.
Figure 5: Inserting an inverter to maintain high and low pulse widths
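To see why a mid-tree inverter helps, here is a rough sketch that follows the two edges of one clock pulse through a chain of cells. The function `pulse_widths`, the 1000 ps period, the 20-cell depth, and the 52 ps / 50 ps rise and fall delays are all hypothetical choices for illustration. It also assumes every cell, including the inserted inverter, has the same output-rise and output-fall delay.

```python
def pulse_widths(cells, period, d_rise, d_fall):
    """Return (high_width, low_width) of the clock at the end of a chain.

    cells  : list of "buf" or "inv", in order from clock source to sink
    d_rise : cell delay when its OUTPUT rises (same for every cell, assumed)
    d_fall : cell delay when its OUTPUT falls (same for every cell, assumed)
    """
    # Follow both edges of an ideal 50% duty-cycle input pulse.
    t_a, a_rising = 0.0, True           # rising edge launched at t = 0
    t_b, b_rising = period / 2, False   # falling edge launched at t = T/2
    for cell in cells:
        if cell == "inv":
            # An inverter swaps the polarity of both edges.
            a_rising, b_rising = not a_rising, not b_rising
        t_a += d_rise if a_rising else d_fall
        t_b += d_rise if b_rising else d_fall
    width_ab = t_b - t_a                # phase between edge a and edge b
    if a_rising:
        return width_ab, period - width_ab   # that phase is the high phase
    return period - width_ab, width_ab       # that phase is the low phase


PERIOD = 1000.0              # ps, assumed clock period
D_RISE, D_FALL = 52.0, 50.0  # ps, assumed 2 ps rise/fall mismatch per cell

buffers_only = pulse_widths(["buf"] * 20, PERIOD, D_RISE, D_FALL)
with_mid_inv = pulse_widths(["buf"] * 10 + ["inv"] + ["buf"] * 10,
                            PERIOD, D_RISE, D_FALL)

print("20 buffers              (high, low):", buffers_only)  # (460.0, 540.0)
print("10 buf + inv + 10 buf   (high, low):", with_mid_inv)  # (498.0, 502.0)
```

In this toy model the buffer-only chain loses 2 ps of high time per stage, while the chain with a midpoint inverter is left with only the inverter's own 2 ps mismatch. Note that the clock arrives inverted at the sinks in the second case, which the rest of the design has to account for.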
Thanks for simplifying the underlying reasons in choosing buffer-based or inverter-based clock tree. :)
Thanks Babul! :)
Hi Naman,
Still, I have seen many designs with a buffer based clock tree mixed with inverters in order to fix min pulse width or min period violations.
Since a buffer has better drive strength than an inverter, which helps recover area as well as power, designs use a buffer based clock tree mixed with a few inverters (the inverters are basically used to fix min pulse width).
Please comment here if you disagree with anything.
Hi Naman,
I am confused by Figure 3. You are trying to observe the falling edge inside the buffer! I don't think that's relevant at all. We only need to observe the input and output pins of the buffer to see the transition times.
The inequality of rise and fall times results from the unequal resistances of the PMOS and NMOS of the buffer or inverter. That is, while charging the load (rising edge) the current goes through the PMOS, and while discharging the load (falling edge) the current goes through the NMOS, and hence we see different rise/fall times. This can be mitigated by using different channel widths for the PMOS and NMOS.
Please correct me if I am wrong.
Hi Sagar,
While I agree with the contention that unequal resistances of the PMOS and NMOS (arising from the difference in mobility of holes and electrons) would give rise to unequal rise and fall times, in this post I assumed that the widths of the two devices have been scaled in the ratio of the mobilities. Note that the rise and fall times are not just contingent upon the resistance of the devices, but also on the load that the PMOS sees (while charging) or the NMOS sees (while discharging).
You've brought up an interesting point that I need not observe inside the buffer! That is absolutely correct! However, the point that we need to observe only the input and output pins of the buffer is partially correct. If, in my design, all the flops work at the positive edge of the clock (or all of them at the negative edge), I indeed need to observe only the input and output pins of the buffer. However, the whole discussion of asymmetric duty cycle arises when we have interaction between positive and negative edge triggered flops. Even in this case I need not observe the transition at the internal pin of the buffer, but I'll need an extra inverter feeding the negative edge triggered flop. And there would lie the difference in fall and rise times.
I'm sure you'd have more to comment. And honestly, your question did put me in a spot of bother. And I'm not quite sure whether this answer will pacify you. I'd encourage you to reply back either way! :)
Thanks,
Naman
Hi Naman,
Thanks for a quick reply!
I am not sure why a negative edge triggered flop will need an extra inverter. That flop will just trigger on the falling edge of the clock (which comes out of the buffer output pin, and not from inside of the buffer), that's it!
A buffer and inverter both can have almost equal rise/fall times, it doesn't matter if it's inverter or buffer.
I am not clear enough on the buffer vs inverter tree tradeoffs, but the depiction of asymmetrical clock pulse using figure 3 is not correct for sure. You have to consider a falling edge also to the input of the buffer. What you have shown is a whole clock pulse generated out of the rising edge alone!
Thanks,
Sagar
Hey, can somebody help me calculate how much distance a buf/inv can drive?
Can we get these values from the library itself?
Hi guys,
Just to add a few points:
1. You don't need an extra inverter to drive a negative edge flop.
2. If you are assuming equidistant inverters, consider equidistant buffers as well.
3. Rise/fall slew would definitely be better in an inverter than in a buffer.
Hi Naman,
Can you explain a bit more how inserting an inverter results in equal rise and fall transitions and leads to a 50% duty cycle?
Hi Naman,
Thanks for the amazing blogs! At device level, is an inverter a better noise filter than a buffer? I was trying to analyze this using NMh and NMl values, but am a little confused. Can you help?
So, industry wide clock trees are built using buffers. What could be the reason? Also, can you comment which Vt buffers/inverters are preferred in clock tree and why?
We normally use LVT type buf/inv while building clock trees, because LVT cells have less process variation.