Only a circuit-activity-dependent analysis at the architectural level indicates where power optimization opportunities exist.

By Lars Kruse, ChipVision Design Systems Inc.

Power consumption and energy requirements today are the primary concerns of hardware designers who implement wireless and mobile applications. Although EDA tools are emerging that support a low-power design flow from RTL input down to GDSII output, many design decisions influencing power consumption and energy requirements are already made at the system and architectural abstraction levels prior to writing the RTL code.

Figure 1. Power vs. Energy trade-off.

The power consumption of a circuit can be classified as either dynamic or static. Dynamic power consumption is a function of switched capacitance and supply voltages. Static power consumption comprises the circuit-activity-independent leakage power consumption. Leakage power depends on process technology parameters such as threshold voltages, as well as on supply voltages, circuit state, and temperature.
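As a rough illustration of the two components, the sketch below uses the widely taught first-order CMOS model: dynamic power scales with switching activity, switched capacitance, the square of the supply voltage, and clock frequency, while static power is leakage current times supply voltage. The function names and all numeric values are assumptions chosen for illustration, not data from a real process.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order dynamic power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def static_power(leakage_current_a, vdd):
    """Static power in watts: leakage current times supply voltage."""
    return leakage_current_a * vdd


# Hypothetical block: 10% activity, 1 nF switched capacitance, 1.0 V, 100 MHz,
# and 2 mA of total leakage current.
p_dyn = dynamic_power(0.1, 1e-9, 1.0, 100e6)
p_leak = static_power(2e-3, 1.0)
print(f"dynamic: {p_dyn * 1e3:.1f} mW, static: {p_leak * 1e3:.1f} mW")
```

In this simple model, halving the supply voltage cuts dynamic power to a quarter, which is why voltage is usually the strongest lever.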

As shown in Figure 1, different design strategies address the three different design objectives for a system: minimizing average power consumption, lowering maximum peak power consumption, and reducing energy requirements. The baseline Figure 1a shows the power needed for a 2 ns task. By completing the task in a much shorter period of time (Figure 1b), engineers can reduce the energy requirement of a given system task, but the average power consumption increases. On the other hand, by spending more time on computation, they can lower the average power consumption during execution of a task, while the total required energy increases due to an increase in leakage currents (Figure 1c).
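The trade-off can be reproduced with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the task needs a fixed amount of switching energy and that leakage power is constant for as long as the task runs. The energy and leakage values are hypothetical, and the model ignores any change in supply voltage.

```python
def task_profile(dynamic_energy_j, leakage_power_w, duration_s):
    """Return (total energy in J, average power in W) for one task run."""
    total_energy = dynamic_energy_j + leakage_power_w * duration_s
    return total_energy, total_energy / duration_s


DYNAMIC_ENERGY = 4e-12  # joules of switching work per task (hypothetical)
LEAKAGE_POWER = 1e-3    # watts of leakage while the task runs (hypothetical)

for label, duration in [("baseline", 2e-9), ("faster", 1e-9), ("slower", 4e-9)]:
    energy, power = task_profile(DYNAMIC_ENERGY, LEAKAGE_POWER, duration)
    print(f"{label}: {energy * 1e12:.1f} pJ, {power * 1e3:.1f} mW average")
```

Under these assumptions the faster run needs less energy but more average power, and the slower run needs less average power but more energy, because leakage accumulates over the longer execution time.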

The following sections describe methodologies and techniques for minimizing power consumption and energy requirements for the three design objectives.

Low-power RTL Synthesis Flow
Today’s physical synthesis tools offer various features to support low-power design flows. The most effective techniques to minimize design power consumption at the RT level are support for multiple supply voltages, automatic threshold voltage selection, clock gating, power gating and power-aware resource sharing. Automatic threshold voltage selection reduces the static power consumption of a design. It is supported by technology libraries that offer standard cells designed at several threshold voltages for each logic function. Cells with two or three different threshold voltages are commonly available. Depending on timing requirements, the synthesis tool selects fast but leaky cells, or slower cells that are less power-hungry.
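To make threshold voltage selection concrete, here is a minimal greedy sketch for a single timing path. It is not how any particular synthesis tool works. The three cell variants, their delays and leakage values, and the function names are all hypothetical.

```python
# Hypothetical cell variants for one logic function: (delay in ps, leakage in nW)
VARIANTS = {"lvt": (40, 50.0), "svt": (55, 12.0), "hvt": (75, 3.0)}
ORDER = ["lvt", "svt", "hvt"]  # fastest and leakiest first


def path_delay(choice):
    return sum(VARIANTS[v][0] for v in choice)


def path_leakage(choice):
    return sum(VARIANTS[v][1] for v in choice)


def select_variants(num_cells, max_delay_ps):
    """Start from all-fast cells and move each cell to the next slower,
    less leaky variant for as long as the path still meets timing."""
    choice = ["lvt"] * num_cells
    changed = True
    while changed:
        changed = False
        for i in range(num_cells):
            step = ORDER.index(choice[i]) + 1
            if step == len(ORDER):
                continue  # already the slowest variant
            previous = choice[i]
            choice[i] = ORDER[step]
            if path_delay(choice) <= max_delay_ps:
                changed = True
            else:
                choice[i] = previous  # swap would violate timing, undo it
    return choice


cells = select_variants(4, 200)
print(cells, path_delay(cells), "ps", path_leakage(cells), "nW")
```

The more slack a path has, the more of its cells can move to slower, less leaky variants. A path with no slack keeps its fast, leaky cells.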

The idea behind clock gating is to turn off the clock signals for parts of the design during idle times. Entire blocks or individual registers can be clock gated. Simple, combinational clock gating techniques disable clock signals feeding into flip-flops when the output values don’t change. Sequential clock gating is more complex but offers higher power savings. It disables the clock on a path of flip-flops by delaying the clock gating signal along the path. Clock gating mainly reduces dynamic power consumption, provided that idle times of the circuit can be detected. Leakage power is reduced by clock gating only when cells such as the enable multiplexers around flip-flops are removed from the design.
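The effect of simple combinational clock gating can be estimated by counting how many clock edges actually have to reach a register. The following sketch is a behavioral model only, with a made-up data stream. It treats a cycle as gateable whenever the incoming value equals the stored one.

```python
def clocked_cycles(data_stream, gated):
    """Count the clock edges delivered to a register bank.

    With gating, the clock is suppressed in cycles where the incoming
    value equals the stored value, so the outputs would not change.
    """
    stored = None
    edges = 0
    for value in data_stream:
        if not gated or value != stored:
            edges += 1
        stored = value
    return edges


stream = [3, 3, 3, 7, 7, 2, 2, 2, 2, 5]  # hypothetical register inputs
print("ungated edges:", clocked_cycles(stream, gated=False))
print("gated edges:  ", clocked_cycles(stream, gated=True))
```

For this stream the ungated register is clocked in all ten cycles, while the gated one is clocked only in the four cycles where its contents change.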

Power Gating Methodology
Power gating exploits the same idea as clock gating. Cells that do not perform a required computation are turned off using so-called sleep transistors. But instead of disabling just the clock signal, sleep transistors also disconnect cells from their power supply. Therefore, power gating reduces both dynamic and static power consumption.

Power gating can be implemented in two different ways: fine or coarse grain. Fine-grain power gating requires that each cell come with its own sleep transistor. The advantage of this technique is good timing control for powering up and down parts of the circuit. However, disadvantages include increased area overhead, limited leakage control, and the necessity for a standard cell library with sleep transistor implementation. Coarse-grain power gating methodology is implemented using special sleep transistor cells. One sleep transistor cell is used to turn on and off a set of standard cells. The coarse-grained approach requires less area than fine-grain power gating due to the lower number of sleep transistors and less routing of enable signals for power gating. Fewer sleep transistors result in better leakage control. The disadvantages are reduced noise margins due to voltage drops across the sleep transistors, and more difficulty guaranteeing correct timing behavior for power up/down phases. For example, it might take several clock cycles to power up a larger block of logic cells.

Figure 2. Maximizing idle times.

Power-Aware Resource Sharing
Another synthesis technique not as widely applied as the three techniques mentioned above is power-aware resource sharing. Resource sharing is the synthesis step that assigns a set of operations whose execution times do not overlap to the same hardware unit. Resource sharing can be made power-aware by considering the data streams processed by the operations. Sharing a resource among a set of operations interleaves the data streams in time. The toggle rate of the resulting data stream at the resource inputs and, therefore, the power consumption of the resource, depends on the way sharing is performed. Less sharing usually means lower dynamic power consumption. However, the area penalty for additional resources also means higher leakage power. Power-aware resource sharing tries to share a resource among operations in such a way that the resulting input data stream has as few bit toggles as possible.
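The idea can be illustrated by counting bit toggles directly. The sketch below assumes four operations that must share two hardware units and compares the three possible pairings by the Hamming distance between consecutive words at each unit's input. The operation names and data streams are invented for the example.

```python
def toggles(sequence):
    """Sum of Hamming distances between consecutive input words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:]))


def interleave(x, y):
    """Input stream seen by a unit shared by two operations."""
    out = []
    for a, b in zip(x, y):
        out += [a, b]
    return out


# Hypothetical 8-bit operand streams, one word per operation per iteration
STREAMS = {
    "A": [0x00, 0x01, 0x03],
    "B": [0xFF, 0xFE, 0xFC],
    "C": [0x01, 0x03, 0x02],
    "D": [0xFE, 0xFD, 0xFF],
}

PAIRINGS = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]


def pairing_cost(pairing):
    """Total input toggles over both shared units."""
    return sum(toggles(interleave(STREAMS[p], STREAMS[q])) for p, q in pairing)


best = min(PAIRINGS, key=pairing_cost)
for pairing in PAIRINGS:
    print(pairing, pairing_cost(pairing))
print("lowest toggle count:", best)
```

Pairing operations whose data values resemble each other keeps the shared inputs quiet, while pairing a stream of small values with a stream of near-all-ones values flips almost every bit on every cycle.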

Other synthesis techniques usually offer only a few percentage points of power reduction. These techniques include: operand isolation (gating data inputs of functional units depending on whether output values are used), pin swapping (assigning nets with high toggle rates to inputs with low input capacitance), minimizing wire length of high frequency nets during placement, and power-aware technology mapping (hiding high-frequency nodes inside standard cells).

Low Power Design at the Architectural Level
Minimizing supply voltages or using variable voltages, and applying clock gating as well as power gating, are very efficient ways to reduce the power and energy requirements of a design. These techniques are best applied, however, when they are already considered at the architectural level, before writing the RTL code.

Lowering the supply voltage helps lower dynamic and static power consumption. However, voltage reduction comes with either lowered clock frequency or increased gate count to compensate for increased cell delay. A higher gate count increases the leakage current, which can negate the power savings due to lower voltages. A lower clock frequency can be tolerated without losing throughput by pipelining the design or introducing more hardware parallelism, again at the cost of higher leakage.

Implementing variable voltage and frequency scaling requires setting up a sophisticated power management scheme at the architectural level, in which designers define a set of fixed supply voltage and frequency pairs, and select a voltage/frequency pair depending on the design’s workload. Accurate power analysis tools at the architectural level are required to evaluate these complex design trade-offs between voltages, frequencies, gate count, and design architecture.

It is not only difficult to design a variable voltage power management scheme; checking the correctness of such a design can also become a nightmare. Physical synthesis needs to consider all power modes simultaneously, and timing analysis has to be run in multiple corners for all power modes to verify that synthesis produced a correct gate-level design description.
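The selection of a voltage/frequency pair from a fixed set, as described above, can be sketched in a few lines. The operating points and the function name below are hypothetical. A real power management scheme would also have to handle transition times and mode sequencing.

```python
# Hypothetical operating points: (supply voltage in V, clock frequency in MHz)
OPERATING_POINTS = [(0.8, 100), (1.0, 200), (1.2, 400)]


def select_operating_point(required_mhz):
    """Pick the lowest-voltage pair whose frequency covers the workload."""
    for vdd, freq in sorted(OPERATING_POINTS):
        if freq >= required_mhz:
            return vdd, freq
    raise ValueError("workload exceeds the fastest operating point")


for workload in (80, 150, 390):
    vdd, freq = select_operating_point(workload)
    print(f"{workload} MHz needed -> {vdd} V at {freq} MHz")
```

Choosing the lowest sufficient voltage matters because, to first order, dynamic power falls with the square of the supply voltage.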

Clock and power gating exploit idle times of the circuit. If circuit timing permits, coarse-grain power gating of design blocks is the preferred design technique to minimize power during idle times, as it eliminates both dynamic and static power dissipation. Clock gating, preferably at the block level, should be performed during idle times when powering up and powering down the circuit is not possible or practical (in terms of power savings), due to the shortness of the idle time interval. For example, Figure 2 shows a battery-powered design where battery lifetime is maximized and energy requirements minimized, under a given maximum average power constraint. If possible, architectural design choices should be made in such a way that the length of idle times is maximized, so that the circuit block can be powered down for as long as possible.
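The choice between the two techniques for a given idle interval amounts to a break-even test. The sketch below is a simplified model with hypothetical parameters: it assumes a fixed wake-up latency in clock cycles, a fixed energy cost for one power-down/power-up transition, and a constant leakage energy per idle cycle.

```python
def idle_strategy(idle_cycles, wake_cycles, leakage_energy_per_cycle,
                  transition_energy):
    """Choose how to handle an idle interval.

    Power gating pays off only if the block can be woken up in time and
    the leakage energy saved exceeds the energy spent switching the
    block off and on again. Otherwise the block is clock gated.
    """
    gated_cycles = idle_cycles - wake_cycles
    saved = gated_cycles * leakage_energy_per_cycle
    if gated_cycles > 0 and saved > transition_energy:
        return "power gating"
    return "clock gating"


# Hypothetical block: 20 cycles to wake up, 1 pJ leakage per idle cycle,
# 50 pJ per power-down/power-up transition.
for idle in (10, 60, 1000):
    print(idle, "idle cycles ->", idle_strategy(idle, 20, 1.0, 50.0))
```

Longer idle intervals push more cases past the break-even point, which is why architectures that concentrate idle time into fewer, longer intervals benefit most from power gating.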

Architectural-level synthesis, which takes as input a behavioral description of the design and outputs RTL for further physical synthesis, can help increase clock gating and power-aware resource sharing opportunities within a design block. It can also automatically identify clock gate enable signals. A trade-off must be made between maximizing idle times of registers and sharing registers among a set of variables that store operation results. On one hand, the synthesis tool can maximize the time between computing a value and using it for the first time to create a clock gating opportunity. On the other hand, in this situation the register cannot be shared for a longer period of time. Power-aware resource sharing is supported by considering data streams of a typical application. Operations whose input data streams exhibit correlations such that interleaving the data streams reduces the number of bit toggles should be scheduled into separate clock cycles. These operations can then share a resource to reduce dynamic power consumption.

Most of the architectural-level power optimization techniques require an accurate power analysis that considers the circuit activity for a typical application or application set. Only a circuit-activity-dependent analysis at the architectural level shows the designer where power optimization opportunities exist and how they should be exploited.

About the Author
Dr.-Ing. Lars Kruse is chief architect at ChipVision, where he drives development of electronic design automation solutions that optimize for low power. He is a technologist, patent holder, and author of many published articles. ChipVision Design Systems Inc. is located at 2880 Zanker Road, San Jose, CA, 408-449-4550;