Register Transfer Level (RTL) power optimization tools eliminate the need for error-prone manual methods.
Mitch Dale, Calypto Design Systems

The wireless broadband communication landscape continues to change with increasing data rates, multi-band devices and more complex protocols. Even with these challenges, advances in wireless technology have the potential to create new opportunities and redefine existing markets.

Figure 1. Clock-Gated Datapath
Critical to the success of wireless markets is the implementation of higher-performance, lower-power System-on-Chip (SoC) communication devices. As more signal processing and functionality move into the digital domain, there is a greater emphasis on design power, timing and area. Design techniques for optimizing timing and area are well understood. However, many hardware designers are not familiar with low-power design, and developing power-constrained SoC devices requires new design techniques.

Clock gating is a common Register Transfer Level (RTL) power optimization. Today, RTL synthesis tools identify and automate simple, combinational clock gating. Greater power savings can be achieved through sequential clock gating optimizations. Until recently, sequential clock gating required manual identification and implementation by expert hardware designers. With the availability of RTL power optimization tools, designers have access to advanced, automated low-power design techniques, eliminating the need for the often difficult and error-prone manual methods.

This article describes sequential analysis and its application to clock gating. An example of sequential clock gating is given as well as a case study of reducing power in a digital signal correlation block using an automated RTL power optimization tool.

The Power of Sequential Clock Gating
Total power has static and dynamic components, which are functions of voltage, load capacitance, switching frequency and static current, summed across all nodes in a circuit. Power optimization aims to reduce one or more of these variables. Understanding the cost/benefit of different power optimization techniques is difficult because of the complex interdependency among timing, area and power.
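As a rough illustration of how these variables combine, the sketch below uses the standard first-order CMOS model, in which dynamic power is the product of switching activity, load capacitance, supply voltage squared and clock frequency, and static power is supply voltage times leakage current. The function name, parameter names and sample values are illustrative assumptions and are not taken from the design discussed later in this article.

```python
def total_power(activity, c_load, vdd, freq, i_leak):
    """First-order CMOS power estimate, in watts, for a single node."""
    dynamic = activity * c_load * vdd ** 2 * freq  # switching (dynamic) power
    static = vdd * i_leak                          # leakage (static) power
    return dynamic + static

# Hypothetical node: 10 fF load, 1.0 V supply, 500 MHz clock, 1 nA leakage.
# "activity" is the fraction of cycles in which the node actually switches.
ungated = total_power(activity=1.0, c_load=10e-15, vdd=1.0, freq=500e6, i_leak=1e-9)
gated = total_power(activity=0.25, c_load=10e-15, vdd=1.0, freq=500e6, i_leak=1e-9)

print(f"ungated: {ungated * 1e6:.3f} uW, gated: {gated * 1e6:.3f} uW")
```

In this simplified model, clock gating lowers only the activity term, so the static component is unchanged. A full-chip estimate would sum the same expression over every node.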

A successful low-power design strategy ensures a cumulative reduction in total power without compromising timing and area requirements. Automated power optimization tools have the advantage of being able to evaluate numerous transformations against multiple constraints simultaneously.

Power in digital designs can be reduced through a variety of techniques, including lowering voltages, running at reduced frequencies, using cells with multiple threshold voltages (multi-Vth) and gating clocks. Of these, clock gating is the most common RTL optimization for reducing dynamic power.

Most RTL designers understand how to write code so that RTL synthesis tools recognize and insert combinational clock gating cells. Even after combinational optimization, there remain additional power-saving opportunities from sequential clock gating.

Sequential clock gating takes advantage of existing inefficiencies in the RTL such as unused computations, data-dependent functions and don’t-care cycles. There are many forms of sequential clock gating transformations. These include conditions such as data being written into a register that will not be used in subsequent clock cycles. Sequential analysis recognizes these unnecessary writes and eliminates them with clock gates.

Figure 2. Timing diagram of clock-gated design

Sequential clock gating saves dynamic power by reducing power dissipation in the clock tree and associated registers. Additionally, switching activity in downstream combinational logic and registers is eliminated, further reducing power dissipation.

The keys to sequential clock gating are understanding the sequential nature of the design and identifying the correct enable conditions.

Sequential Analysis of RTL Designs
Sequential analysis is the process of observing functional behavior over time. Applied to RTL, sequential analysis computes the temporal relationships between design states across multiple clock cycles. These relationships can be exploited to reduce power.

Sequential clock gating uses sequential analysis to identify enable conditions that span multiple cycles. These conditions can become complex, making them difficult to identify and implement. As a consequence, manual coding of sequential clock gating requires experienced engineers with considerable design knowledge.

To demonstrate sequential clock gating, the upper diagram in Figure 1 shows a non-optimized and clock-gated datapath. In the example, data flows through two computational stages before being latched into the output register dout. The output of dout is held based on the signal vld_2. The clock gate on dout is a simple combinational substitution for the feedback loop. In the lower diagram in Figure 1, sequential clock gating on d_1 and d_2 requires sequential analysis to propagate the data hold condition backwards, disabling the unused computations in previous cycles.

In the waveform for the clock-gated datapath in Figure 2, the yellow check marks show the cycles during which the clock to register dout is gated. Similarly, the red check marks show the additional switching eliminated by sequential clock gating on d_1 and d_2.
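The behavior described for Figures 1 and 2 can be sketched with a small cycle-based model. The register names d_1, d_2, dout and the signal vld_2 come from the text. Everything else is an assumption made for illustration: the valid bit is assumed to be pipelined alongside the data (vld_in, then vld_1, then vld_2), and the two computational stages use placeholder arithmetic. This is a minimal sketch of the idea, not the circuit in the figure.

```python
import random

def simulate(stimulus, sequential_gating):
    """Cycle-based model of a two-stage datapath feeding register dout.

    stimulus is a list of (din, vld_in) pairs, one per clock cycle.
    Returns the dout trace and the number of clocked writes per register.
    """
    d_1 = d_2 = dout = 0
    vld_1 = vld_2 = False
    writes = {"d_1": 0, "d_2": 0, "dout": 0}
    trace = []
    for din, vld_in in stimulus:
        # Enables are evaluated from the values held before the clock edge.
        en_d1 = vld_in if sequential_gating else True  # hold condition moved back two stages
        en_d2 = vld_1 if sequential_gating else True   # hold condition moved back one stage
        en_dout = vld_2                                # combinational gate replacing the hold loop

        next_d1 = din if en_d1 else d_1
        next_d2 = ((d_1 * 3 + 1) & 0xFF) if en_d2 else d_2  # stage 1 (placeholder logic)
        next_dout = (d_2 ^ 0x5A) if en_dout else dout       # stage 2 (placeholder logic)

        for name, enabled in (("d_1", en_d1), ("d_2", en_d2), ("dout", en_dout)):
            writes[name] += int(enabled)

        # Clock edge: all registers update together.
        d_1, d_2, dout = next_d1, next_d2, next_dout
        vld_1, vld_2 = vld_in, vld_1
        trace.append(dout)
    return trace, writes

rng = random.Random(0)
stimulus = [(rng.randrange(256), rng.random() < 0.3) for _ in range(1000)]

base_trace, base_writes = simulate(stimulus, sequential_gating=False)
gated_trace, gated_writes = simulate(stimulus, sequential_gating=True)

print("outputs identical:", base_trace == gated_trace)
print("writes without sequential gating:", base_writes)
print("writes with sequential gating:   ", gated_writes)
```

Both runs produce the same dout sequence, while the sequentially gated run clocks d_1 and d_2 only in cycles whose results eventually reach dout. The write counts are a crude proxy for register clock activity and do not model clock tree or combinational power.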

Case Study
Baseband signal processing in wireless devices is computationally intensive and typically operates at a high frequency, causing considerable dynamic power dissipation.

One particularly power-hungry signal processing design is the correlator function used in pattern recognition algorithms. The correlator measures the similarity of two signals. In this case, it was used to find features in an unknown signal by comparing it to a known one at different times. The original design consumed 964µW and already had 44% of the registers clock gated.

To reduce power in the correlator, the design team added sequential clock gating to the RTL code using PowerPro CG from Calypto Design Systems, an automated RTL power optimization tool that uses sequential analysis technology to identify clock-gating optimizations. Its cost-driven optimizations take into account area, timing and static power while evaluating sequential transformations.

The design team ran the correlator block through the software, which identified many sequential clock gating opportunities.

Figure 3. Sequential Analysis of Correlator RTL code
Figure 3 shows one of the sequential clock gating transformations found in the correlator. In this case, the output of register Q is used only when the output of register B is zero. The value of register B comes from register A in the previous cycle.

Once the temporal relationship between register A and register Q is understood, it becomes clear that register Q can be clock gated based on whether register B will be zero, a condition that is known one cycle earlier from register A. Identifying this sequential relationship and recognizing the opportunity for clock gating requires sequential analysis; power-aware RTL synthesis tools won’t find these clock gating opportunities.
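The relationship among registers A, B and Q can be sketched in a few lines. Because Figure 3 is not reproduced here, the model below is one plausible reading of the text rather than the actual correlator RTL: B takes the value of A on each clock edge, Q is consumed only when B is zero, and the gated version enables Q's clock only when A is zero. The function name, stimulus and value ranges are hypothetical.

```python
import random

def run_correlator_slice(stimulus, gate_q):
    """Model registers A, B and Q; Q's output is consumed only when B == 0.

    stimulus is a list of (a_in, q_in) pairs, one per clock cycle.
    Returns the consumed values (None when Q is unused) and Q's write count.
    """
    a = b = q = 0
    q_writes = 0
    used = []
    for a_in, q_in in stimulus:
        # B will equal the current A after the edge, so A predicts whether
        # the value about to be written into Q will ever be consumed.
        en_q = (a == 0) if gate_q else True
        next_q = q_in if en_q else q
        q_writes += int(en_q)

        # Clock edge: B takes the previous value of A.
        a, b, q = a_in, a, next_q
        used.append(q if b == 0 else None)
    return used, q_writes

rng = random.Random(1)
stimulus = [(rng.randrange(4), rng.randrange(256)) for _ in range(1000)]

ref_used, ref_writes = run_correlator_slice(stimulus, gate_q=False)
opt_used, opt_writes = run_correlator_slice(stimulus, gate_q=True)

print("consumed values identical:", ref_used == opt_used)
print("Q writes:", ref_writes, "->", opt_writes)
```

The consumed values match in every cycle, yet Q is clocked far less often. The enable depends on a register one stage upstream rather than on Q's own inputs, which is why this opportunity is found by analysis across clock cycles and not within a single cycle.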

The tool generated new, functionally equivalent RTL code with register A driving the enable logic of register Q. Overall, more than 50 sequential transformations were implemented to produce a low-power version of the correlator RTL. This code was run through RTL synthesis to measure power, timing and area. Results showed a 24% power reduction, with the number of clock-gated registers increasing by 60%. The synthesis results also showed that total area was unchanged and that timing slack went from 1402 ps in the original design to 1378 ps in the new design.
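To put these figures in absolute terms, the short calculation below applies the reported percentages to the reported starting values. It assumes that the 24% reduction is measured against the 964µW figure quoted earlier. The resulting numbers are derived here for illustration and are not additional measurements.

```python
original_power_uw = 964.0   # power of the original correlator, from the text
reduction = 0.24            # reported power reduction

optimized_power_uw = original_power_uw * (1 - reduction)
saved_power_uw = original_power_uw - optimized_power_uw

slack_before_ps, slack_after_ps = 1402, 1378  # reported timing slack
slack_change_ps = slack_before_ps - slack_after_ps
slack_change_pct = 100 * slack_change_ps / slack_before_ps

print(f"estimated optimized power: {optimized_power_uw:.1f} uW")
print(f"estimated power saved: {saved_power_uw:.1f} uW")
print(f"slack change: {slack_change_ps} ps ({slack_change_pct:.1f}%)")
```

Under that assumption, the optimized block would draw roughly 733µW, and the 24 ps change amounts to less than 2% of the original slack.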

Advances in broadband wireless technology are driving low-power SoC design. The increasing digital content in these devices necessitates low-power design methods. Sequential analysis of RTL identifies powerful sequential clock gating optimizations that reduce dynamic power without changing functionality or significantly impacting timing. PowerPro CG automates the sequential clock gating process; in the correlator case study, it reduced power with no change in area and only a small change in timing slack.

About the Author
Mitch Dale is director of product marketing for Calypto Design Systems, 2933 Bunker Hill Lane, Santa Clara, CA; 408-850-2300; Dale can be reached at