March 17, 2013

Puzzles: Half Adder using Multiplexer

Let's say you have to realize a half adder. But all you've got are 2:1 Multiplexers. How would you design a half adder using ONLY 2:1 Multiplexers?

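If you want to check your answer against one possible construction, here is a quick behavioral sketch in Python (purely illustrative; the `mux` helper just models an ideal 2:1 multiplexer). The trick is that an inverter and an AND gate can themselves be built from a 2:1 mux:

```python
# A 2:1 multiplexer: returns d0 when sel is 0, d1 when sel is 1.
def mux(sel, d0, d1):
    return d1 if sel else d0

def half_adder(a, b):
    not_b = mux(b, 1, 0)   # inverter from a mux (constant inputs 1, 0)
    s = mux(a, b, not_b)   # XOR from a mux: A=0 -> B, A=1 -> NOT B
    c = mux(a, 0, b)       # AND from a mux: A=0 -> 0, A=1 -> B
    return s, c

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
```

Three muxes in total: one for the inverter, one for the sum and one for the carry.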

Low Power FSMs

Low Power design is the need of the hour! The post: Need for Low-Power Design Methodology gives an insight into the intent and need for modern designs to be power aware. The subsequent posts on Clock Gating and Power Gating under the tab Low Power Methodology discuss some ways in which the SoC can be designed for low power. In this post, we will consider one such low power design of an FSM, which can be generalized to design any low power sequential circuit.

Consider the following generalized design of a traditional and a low power FSM:

Let's talk about the basic building block that we have used here. The OR gate acts as a clock gate to the flop. The flop that we have used is a toggle flop. When enable = 1, the output of the OR gate is stuck at logic 1 and the flop is gated. When enable = 0, the flop receives the clock and toggles its state. So, whenever we need to change the state of the flop, we can give a clock pulse.
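Here is a tiny behavioral model of this building block (Python, purely illustrative, not RTL). Per cycle, enable = 1 blocks the clock edge and freezes the state, while enable = 0 lets the edge through and toggles the flop:

```python
def toggle_flop_with_or_gate(enable_per_cycle, q0=0):
    """Simulate a toggle flop clocked through an OR-style clock gate.
    enable_per_cycle[i] is the gating enable during cycle i:
    enable = 1 blocks the clock edge; enable = 0 lets the flop toggle."""
    q = q0
    trace = []
    for en in enable_per_cycle:
        if en == 0:      # clk OR 0 = clk -> the edge reaches the flop
            q ^= 1       # toggle flop: D = Q'
        trace.append(q)  # en == 1: clk OR 1 = constant 1 -> state frozen
    return trace

print(toggle_flop_with_or_gate([0, 1, 1, 0]))  # [1, 1, 1, 0]
```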

Enough said! Let's now talk about a real example of a basic synchronous counter. And how we can design a low power synchronous counter using the above method.

In any binary counter:

  • The lowest order bit toggles after every clock cycle.
  • Any higher order bit toggles only when all the lower order bits are at logic 1.
Keeping this in mind, we can now build the low power counter!!
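Putting the two observations together, here is a behavioral sketch of the low power counter (Python, not RTL): each bit is a toggle flop, and its gated clock fires only when all lower order bits are at logic 1 (the LSB's clock is always enabled):

```python
def low_power_counter(width, cycles):
    """Simulate a binary counter built from clock-gated toggle flops."""
    bits = [0] * width                  # bits[0] is the LSB
    history = []
    for _ in range(cycles):
        # Per-bit clock enable: fire only when all lower bits are 1.
        fire = [all(bits[j] for j in range(i)) for i in range(width)]
        for i in range(width):
            if fire[i]:                 # gated clock reaches this flop
                bits[i] ^= 1            # toggle flop: D = Q'
        history.append(sum(b << i for i, b in enumerate(bits)))
    return history

print(low_power_counter(3, 8))  # [1, 2, 3, 4, 5, 6, 7, 0]
```

Note that the higher order flops simply receive no clock edge in most cycles, which is exactly where the dynamic power saving comes from.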

March 14, 2013

VLSI-SoC: Now on Facebook!!


Hey guys! Check out our new Facebook page and stay connected to all the latest posts and puzzles! Just "like" the page and find all the latest updates right on your wall!

You can find our Facebook page at the below link:

VLSI-SoC Facebook Link

Please help us reach out to your friends and colleagues. Thanks in advance for your support!!






March 12, 2013

Multi-Cycle Paths: Perspective & Intent

Multi-Cycle Paths, as the name suggests, are those paths which need not be timed in one clock cycle. It is easier said than done! Before we discuss further, let's talk about what multi-cycle paths do not mean!!

Myth 1: All those paths which the STA team is unable to meet at the highest clock frequency, are potential multi-cycle paths.
Reality: Multi-cycle paths are driven by the architectural and design requirements. STA folks merely implement, or more appropriately, model the intent in their timing tools! A path can be a multi-cycle path even if the STA team is able to meet timing at the highest clock frequency.

Myth 2: It is by some magic that the design teams conjure up how many cycles it would be appropriate for a path to take! <Apologies for the hyperbole! :)>. And the STA team follows the same in their constraints.
Reality: MCPs are driven by the intent. And the implementation is governed by that intent, which includes but is not limited to the number of run modes a particular SoC should support.

Consider the following scenario:

Normal Mode, Low Power Mode and Ultra Low Power Mode can be considered the different run modes of the SoC. The customer can choose which run mode is better at any given time. Example: when performance is not critical, or your device can go to 'hibernate' mode, you (or the software) can allow the non-critical parts of the SoC to go into a Low Power Mode, and hence save power!

Consider the specifications:
  • Normal Mode: Most Critical IP & Not-So Critical IP would work at f MHz. Least Critical IP would work at f/2 MHz. Interaction between any two IPs would be at the slower frequency.
  • Low Power Mode: Most Critical IP would work at f MHz. Not-So Critical IP & Least Critical IP would work at f/2 MHz. Interaction between any two IPs would be at the slower frequency.
  • Ultra Low Power Mode: Most Critical IP would work at f MHz. Not-So Critical IP would work at f/2 MHz. And Least Critical IP would work at f/k MHz (k = 3-8). Interaction between any two IPs would be at the slower frequency.
Consider the Low Power Mode. Any interaction within the Not-So Critical IP would be at the slower frequency. However, any path between the Most Critical IP and the Not-So Critical IP would be a multicycle path of 2 in the low power mode. In this case, the clock at the Not-So Critical IP is gated selectively on every alternate clock cycle to implement the MCP. Hence data launched from the Most Critical IP now effectively gets two clock cycles (of the faster clock) to reach the Not-So Critical IP. The following figure explains the intent:


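The arithmetic behind this intent can be sketched quickly (Python, purely illustrative; the period T and the number of edges are made-up values, not tool output). With the capture clock gated on alternate edges, data launched on an enabled edge gets two fast-clock periods before the next capture edge:

```python
T = 1.0  # fast clock period (arbitrary units; illustrative value)

# Launch edges occur on every fast-clock edge; the capture clock at the
# Not-So Critical IP is gated on alternate edges, so only even edges remain.
launch_edges  = [n * T for n in range(8)]
capture_edges = [n * T for n in range(8) if n % 2 == 0]

def setup_window(launch_t):
    """Time available to data launched at launch_t before the next capture edge."""
    next_capture = min(e for e in capture_edges if e > launch_t)
    return next_capture - launch_t

print(setup_window(0.0))  # 2.0 -> effectively a multicycle path of 2
```

For launches aligned with the enabled capture edges (the case the MCP intent cares about), the available window is 2T instead of T.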
This much for the intent! However, as we mentioned, the Least Critical IP, depending on the mode, would work at f/k MHz (k = 3-8), so one might need an MCP of 2, 3, 4... and so on. This calls for a configurable implementation of multicycle paths. We shall cover it sometime later. Till then, you can mull over the intent part. You can also mail me in case you think of any such implementation at my<dot>personal<dot>log<at>gmail<dot>com. Adios!

State Retention Power Gating

The post titled Power Gating demonstrated the implementation of a Power Gating Cell and how it helps in minimizing the leakage power consumption of an SoC. Make sure you go through it once more. The basic rationale is to cut the direct path from the battery (VDD) to ground (GND). Though efficient in saving the leakage power, the implementation discussed suffers from one major drawback! It does not retain the state! That means, once power to the SoC is restored, the output of the power gated cell goes to 'X'. You can't really be sure whether it is a logic 1 or a logic 0. Do we care? Yes! Because if this X propagates into the design, the entire device can go into a metastable state! In order to prevent such a disastrous situation, the system software can simply reset the SoC. That would boot it up from scratch and make sure that all the devices are initialized.

This means, every time I decide to power gate a portion of my SoC, I'll have to reset that power gated portion once power returns. This imposes a serious limitation on the application of the Power Gate discussed in the last post. How about designing one power gate which retains the state? But convince yourself that in order to do so, you'd need to spend some leakage power, though small. Let's call this structure: State Retention Pseudo Power Gate. The term "pseudo" signifies that it would consume a little leakage power, contrary to the previous structure which doesn't. But at the same time, you no longer need to reset the power gated portion of the SoC, because the standard cells retain their previous data!! Enough said! Let's discuss the implementation.

The above circuit has two parts. 
  • The one inside the red oval is same as the normal power gating structure. 
  • The one inside the green box (on the right) is the additional circuitry required to enable this device to retain its state.
Operation: Let's say before going into the SLEEP mode, the device had the output as logic 1. After entering the SLEEP mode (power off), the sleep transistors come into action and cut the power and ground rails of the device and hence save the leakage power. But the logic on the right (in green rectangle) is still ON! The output of the inverter would now become OUTPUT', i.e., logic 0. This would in turn enable the PMOS transistor Q1 and output would be restored back to logic 1.
The same is true when the output is logic 0 before power gating. In that case the NMOS transistor Q0 would come into action to help the output node retain its data.
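As a sanity check of the operation just described, here is a toy truth-table model (Python, purely illustrative; Q1 and Q0 refer to the keeper transistors in the figure). The always-on inverter drives OUTPUT', which switches on exactly one of the two keeper transistors:

```python
def retained_output(output_before_sleep):
    """Model the retention feedback loop during SLEEP mode."""
    inv = 0 if output_before_sleep else 1  # always-on inverter: OUTPUT'
    if inv == 0:
        return 1   # PMOS Q1 on (gate low) -> node held at logic 1
    else:
        return 0   # NMOS Q0 on (gate high) -> node held at logic 0

print(retained_output(1))  # 1 -> logic 1 survives the sleep cycle
print(retained_output(0))  # 0 -> logic 0 survives the sleep cycle
```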

Note that all this while, when the device is in sleep mode, the output node would continue to leak. By adding the additional circuitry, as demonstrated, we are basically creating a feedback loop, which helps in retaining the state. The hit, of course, is the leakage power of 4 transistors. However, the standard cell logic (in the red oval) is usually bulky. Even a simple 2-input NAND gate has 4 transistors itself, and gates with more inputs have more! The same technique can be applied to any sequential device like a flip flop, a latch or even a clock gating integrated cell.

March 09, 2013

Post Your Queries

Hello friends. It's been 9 months since we started writing, and we have crossed 12K page views since then. I would like to take this opportunity to thank each and every one of you for your precious comments. It really does motivate me to keep writing more! Also, it gives me immense pleasure to know that there are people out there who are reaping benefits from it! I know we have a long way to go, and to do so, I would be grateful to hear your feedback and suggestions.

Please note that I have made some changes to the blog settings. 
  • Unlike before, one is now required to log in using a Google account to comment on the blog. I hope you would appreciate the necessity behind it.
  • I have updated another page: 'Post Your Query' to enable readers to post their own doubts, and if you wish me to cover any specific topic, you can also mail me at my<dot>personal<dot>log<at>gmail<dot>com
  • This is only on a trial basis. Let's see how it all shapes up!

OCV vs PVT

In the post PVTs and How They Impact Timing, we talked about the confluence of the Process, Voltage and Temperature factors and their impact on timing. I would urge the readers to go through that post in order to grasp the difference between two key terminologies used in the VLSI industry-

  • OCV: On Chip Variation;
  • PVT: Process, Voltage and Temperature
PVTs are inter-chip variations which depend largely on external factors like the ambient temperature, the supply voltage, and the process of that particular chip at the time of manufacturing. Like PVTs, OCVs are also variations in process, voltage and temperature. But, hey, where's the difference? OCVs are intra-chip variations! To elucidate more about OCVs, let's talk in terms of chips!

  • Variation in Process: There are millions of devices (standard cells), and probably billions of transistors, packed on the same chip. You cannot expect every single transistor to have the same process or channel length! If we say that the chip manufactured exhibits, let's say, worst process, it means that the channel length tends to deviate towards the higher side. This variation may be more for some transistors and less for others. It can be a ponderous task to quantify this variation between the transistors of the device, and it is often modeled as a percentage deviation from the nominal.
  • Variation in Voltage: All the standard cells need voltage supply for their operation. And voltage is usually 'tapped' from the voltage rail via interconnects which have a finite resistance. 


In two parts of the chip, it is fairly probable for the interconnect length to be different, resulting in a finite difference in the resistance values and hence the voltage that the standard cells actually receive. As evident above, the voltage received by the standard cells on the right would be less as compared to those on the left.

This variation would be small, probably of the order of a few millivolts, but it can be significant, and is again modeled as OCV.
  • Variation in Temperature: Some parts of the chip can be more densely packed or might exhibit more active switching as compared to the other parts. In these regions, there is a high probability of the formation of localized 'HOT SPOTS', which result in increased temperature in some localized areas of the chip. Again, this difference might be of the order of a few degrees centigrade, but can be significant.
All the above mentioned variations are examples of On-Chip Variations. And usually, these variations are modeled as a fixed percentage of delays. For example, a 4% OCV derate would mean that the delays of cells in the data path are inflated by 4% while doing setup analysis and decreased by 4% while doing hold analysis. The same methodology is applied to the clock paths. However, it would be different for the launch and capture clock paths. That also gives rise to an interesting topic of Common Path Pessimism Removal, which we shall take up shortly.
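As a quick illustration of the derate arithmetic (the 4% figure comes from the example above; the delay value is made up):

```python
DERATE = 0.04  # flat 4% OCV derate, as in the example above

def derated_delay(nominal, check):
    """Apply a flat OCV derate to a data path delay."""
    if check == "setup":
        return nominal * (1 + DERATE)  # inflate: pessimistic late path
    elif check == "hold":
        return nominal * (1 - DERATE)  # deflate: pessimistic early path
    raise ValueError(check)

print(derated_delay(100.0, "setup"))  # 104.0
print(derated_delay(100.0, "hold"))   # 96.0
```

Real tools apply separate derates per path type and per launch/capture clock branch, which is where Common Path Pessimism Removal comes in.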

Clock Skew: Implication on Timing

Clock Skew is an important parameter that greatly influences the timing checks, and you would often find the backend design engineers keeping a close eye on the clock skew numbers.

Clock Skew: The difference in arrival times of the clock signal at any two flops which are interacting with one another is referred to as clock skew. Having said that, please note that skew only makes sense for two flops which are interacting with one another, i.e. they make a launch-capture pair. 
If the clock at the capture flop takes more time to reach than the clock at the launch flop, we refer to it as Positive Clock Skew. And when the clock at the capture flop takes less time to reach than the clock at the launch flop, we refer to it as Negative Clock Skew.
The figure below describes positive & negative clock skew. Assume the delays of clock tree buffers to be the same.
How does clock skew impact the timing checks, in particular, setup and hold? Consider the above example where FF1 is the launching flop and FF2 is the capturing flop. If the clock skew between FF1 and FF2 was zero, the setup and hold checks would be as follows:

  • Positive Skew: Now imagine the case where clock skew is positive. Here, the clock at FF2 takes more time to reach as compared to the time taken by the clock to reach FF1. Recall that the setup check means that the data launched should reach the capture flop at least setup time before the next clock edge. As evident in the figure below, the data launched from FF1 gets extra time, equal to the skew, to reach FF2. Hence setup is relaxed! However, the hold check means that the data launched should reach the capture flop at least hold time after the clock edge. Hence, hold is made more critical in case of positive skew. Read the definitions again and again till you grasp it!!

  • Negative Skew: Here, the clock at FF1 takes more time to reach as compared to the time taken by the clock to reach FF2. As evident in the figure below, the data launched from FF1 gets less time, by an amount equal to the skew, to reach FF2. Hence setup is more critical! However, hold is relaxed!
    Some Key Points to Note:
  • Setup is the next cycle check, and positive skew relaxes the setup check and negative skew further tightens it.
  • Hold is the same cycle check, and negative skew relaxes the hold check and positive skew further tightens it.
  • Very rarely would one come across a path that is both setup as well as hold critical. Setup becomes critical when the data path delay is large or you have a large negative skew; and hold becomes critical when either the data path delay is minimal or you have a large positive skew. These conditions are mutually exclusive, and very rarely do they manifest themselves simultaneously. It is often the case when the uncommon clock path is significant. We shall discuss it in detail later.
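The key points above can be checked numerically. The sketch below (Python, with made-up numbers for the clock period, flop parameters and data path delay) computes setup and hold slack for a launch-capture pair under positive and negative skew:

```python
T = 10.0                                # clock period (illustrative)
T_CQ, T_SETUP, T_HOLD = 1.0, 0.5, 0.3   # clock-to-Q, setup, hold (illustrative)

def slacks(t_comb, skew):
    """skew > 0 means the capture clock arrives later than the launch clock."""
    setup_slack = (T + skew) - (T_CQ + t_comb + T_SETUP)  # next-cycle check
    hold_slack  = (T_CQ + t_comb) - (T_HOLD + skew)       # same-cycle check
    return round(setup_slack, 6), round(hold_slack, 6)

print(slacks(t_comb=8.0, skew=+1.0))  # (1.5, 7.7): positive skew relaxes setup
print(slacks(t_comb=8.0, skew=-1.0))  # (-0.5, 9.7): negative skew relaxes hold
```

With the same data path, positive skew turns a failing setup path into a passing one while eating into hold margin, and negative skew does the opposite, which is exactly the trade-off the bullets describe.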