June 02, 2013

Faulty Clock Gating: How "Not" to Gate the Clock

You will come across a plethora of technical literature on clock gating and its associated techniques. This is no surprise, because clock gating is the most commonly employed design technique for saving dynamic power. However, many implementations are faulty: they do gate the clock, but they result in an overall increase in dynamic power consumption. We will discuss one such common technique, which negates all the power-saving benefits of clock gating. You are advised to use your discretion before using it.

The basic rationale behind clock gating:
  • Even when the output of a flip-flop is not toggling, the flip-flop continues to dissipate dynamic power as long as it is fed by a clock signal, owing to the transitions (and hence charging/discharging of nodes) in its internal circuitry.
  • When the input of the flip-flop is not toggling, or will not toggle, one can gate the clock to that flip-flop for that duration and save dynamic power.
One logical implementation for the above problem statement (and this is indeed the implementation employed in many technical papers and patents) is depicted below:


Let's take a look at the above implementation. The XOR of the D input and the Q output of the flip-flop is used as the enable signal for the clock gate (CGIC). The reasoning is: when the output of the flop is the same as its input, which is detected by XOR'ing the two, one can gate the clock to the flip-flop.
Example: Let's say initially Q = 1. Now D = 1, which means that the output of the flop is destined to stay at "1" for the next cycle as well. XOR'ing these two signals gives Q XOR D = 0, and EN = 0 gates the clock to the flip-flop. So, would that save power? Well, one would expect so. Let's take a look at why it actually results in increased power dissipation.
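
The intended behaviour of this scheme can be sketched in a few lines of Python. This is only a cycle-level illustration, not the circuit from any particular paper: the function name xor_gated_flop and the input stream are made up for the example, and the model ignores all timing, so it shows only what the scheme is supposed to achieve functionally.

```python
def xor_gated_flop(d_values, q_init=0):
    """Cycle-level model of a flop whose clock enable is D XOR Q.

    Hypothetical helper for illustration only; it ignores setup/hold
    timing and any glitching on D.
    Returns (q_trace, clock_pulses): Q after each cycle, and the number
    of clock edges that actually reached the flop.
    """
    q = q_init
    q_trace = []
    clock_pulses = 0
    for d in d_values:
        en = d ^ q              # EN = D XOR Q: 1 only when Q must change
        if en:
            q = d               # clock edge passes the gate, flop captures D
            clock_pulses += 1
        # en == 0: the clock is gated and Q simply holds its value
        q_trace.append(q)
    return q_trace, clock_pulses


d_stream = [1, 1, 1, 0, 0, 1, 1, 1]
q_trace, pulses = xor_gated_flop(d_stream, q_init=1)
print(q_trace)
print(pulses, "of", len(d_stream), "clock edges reached the flop")
```

In this idealised view Q still follows D while only a fraction of the clock edges reach the flop, which is why the scheme looks attractive on paper.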

The circuit shown above is a trap! The actual circuit would be something like the one shown below:

  • As evident from the above figure, the XOR gate would continue to toggle through the clock period and would become stable only a "setup time" before the next clock edge. During this entire duration, it would continue dissipating dynamic power. You might argue here that the power dissipated must be less than the power dissipated by an idle flop receiving a clock. Well, that might be true for some technologies, but the XOR gate is the bulkiest of the primitive gates, and I would say that this power, if not less, would at least be comparable to that of an idle flop receiving a clock signal.
  • Secondly, the circuit above uses a CGIC. Note that a CGIC comprises one latch and an AND gate, while a flop comprises two latches. The internal circuitry of the CGIC would continue to charge/discharge and hence dissipate power.
The sum of these two power dissipations would overshadow the benefits one was expecting in the first place, which is why this is a common design trap. Beware of it.
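
A rough way to see the argument is to count XOR toggles within one clock period and compare a per-cycle energy estimate for the gated and ungated cases. The sketch below is a toy model under stated assumptions: the function names, the sampled D waveform and the three energy numbers are hypothetical placeholders in arbitrary units, not measurements from any library or technology, so substitute characterised values for your own cells.

```python
def count_toggles(samples):
    """Number of value changes in a sequence of 0/1 samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def idle_cycle_energy(d_samples, q, e_xor_toggle, e_cgic_clock, e_flop_clock):
    """Per-cycle energy estimate for a flop that ends the cycle idle (D == Q).

    d_samples    : D sampled at several instants within one clock period
    q            : current flop output
    e_xor_toggle : energy per XOR output toggle        (assumed value)
    e_cgic_clock : CGIC internal clocking energy/cycle (assumed value)
    e_flop_clock : idle flop internal clocking energy  (assumed value)
    The toggling of D itself is common to both cases, so it is left out.
    """
    xor_samples = [d ^ q for d in d_samples]
    xor_toggles = count_toggles(xor_samples)
    gated = xor_toggles * e_xor_toggle + e_cgic_clock   # XOR + CGIC
    ungated = e_flop_clock                              # flop clocked as usual
    return xor_toggles, gated, ungated


# Hypothetical waveform: D glitches during the cycle, then settles back to Q.
d_samples = [1, 0, 1, 0, 0, 1, 1]
xor_toggles, gated, ungated = idle_cycle_energy(
    d_samples, q=1,
    e_xor_toggle=3, e_cgic_clock=6, e_flop_clock=10,  # arbitrary units
)
print("XOR toggles in the cycle:", xor_toggles)
print("gated estimate:", gated, "ungated estimate:", ungated)
```

With these placeholder numbers the gated estimate comes out higher than the ungated one. Whether that holds for a real design depends entirely on how much D glitches and on the actual cell energies.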





6 comments:

  1. Why do we need a latch in the CGIC? Can't we use just an XOR gate and an AND gate to gate the clock?

    Replies
    1. Hi Praveen,

      Yes, we can use an AND gate, or for that matter an OR, NAND or NOR gate, to gate the clock. But unlike a CGIC, where setup is a full-cycle check and hold is a zero-cycle check, the timing checks are a bit tricky when using these primitive gates.

      AND has a half cycle clock gating hold check; and a full cycle clock gating setup check.
      OR has a half cycle clock gating setup check; and a zero cycle clock gating hold check.

      This places a constraint on what polarity flop can be used in order to generate the enable. You might like to review the post which discusses this concept: http://vlsi-soc.blogspot.in/2013/02/clock-gating-check.html
      http://vlsi-soc.blogspot.in/2012/08/puzzle-identify-issue-with-circuit.html

      Please feel free to revert back in case of any doubt.

      Thanks!

    2. If you have a cone of logic coming out of the Q then the power saving is more than the flop itself - it is the cone of logic driven by that flop. This is why that technique can save power - the cone of logic may avoid switching and hence those savings outweigh the cost of the XOR/CKEN

    3. Simply put, an AND or XOR gate can cause glitches in the clock network.
      To prevent this, the control (enable) signal has to be held.
      Either a flip-flop or a latch can hold it, but a latch is the more suitable choice.

    4. Regarding this comment below:

      >> If you have a cone of logic coming out of the Q then the power saving is more than the flop itself - it is the cone of logic driven by that flop. This is why that technique can save power

      I think what you mean is that if the clock is not gated, Q will keep on toggling. However, we are talking about the case of clock gating when D and Q are the SAME. So, to start with, Q will not toggle in that case and it will truly be an idle flop. The choice is whether we gate the clock or not. If we gate the clock, the combinational toggling of D and of the XOR into the ICG will consume more power than when we don't gate the clock. In the ungated case, we will have the toggling of D plus the dynamic power from the flip-flop itself. The author, I think, is suggesting that the latter case is better in most cases.

  2. In summary, using a CGIC for just one flop is obviously not advisable. In real designs, a CGIC is used to gate a big logic cone that sometimes just sits idle.
