Paul Master, Vice President, Technology, QuickSilver Technology, Inc., Santa Clara, CA

Until now, wireless designers have relied on traditional RISC and DSP architectures as the foundation for early generations of wireless handsets. However, demands for greater functionality in 2.5G, 3G, and 4G, plus the requirements for higher performance and lower power consumption, are exposing major limitations in these conventional microprocessor (μP), ASIC, and DSP chip technologies. Newer IC design approaches such as adaptive computing machine (ACM) technology are emerging to give wireless system engineers that higher performance with substantially lower power consumption. Moreover, the ACM gives designers greater latitude to add much more functionality than existing μP, ASIC, and DSP-based designs allow.

Conventional Chip Technology Inefficiencies
The inefficiencies of conventional chip technologies are best explained via a benchmark (Fig. 1). Running an algorithm on a μP or DSP incurs a large number of clock cycles as a result of instruction fetches and operand reads and writes. Most of the cycles a μP or DSP executes are overhead operations that merely set up the actual work desired. In this example, the benchmark adds together 27 floating point numbers for a 27-input adder function.

Figure 1
Work/Power Tradeoff

In the μP example, it takes 1,013 clock cycles to perform the addition of 27 floating point numbers. It does so by going through innumerable data executions. First, the μP issues an address to memory and then it fetches an instruction. At this point, 40 or so different optimization techniques have been applied to improve the performance of the instruction execution base. Depending on the manufacturer, the μP goes through 4 to 25 pipeline stages in the instruction process.

If it is a very long instruction word (VLIW) or superscalar RISC processor, it will perform speculative execution, register scoreboarding, a variety of different branch prediction techniques, out-of-order execution of the instructions, and a variety of other optimizations. All these steps occur to issue an address to memory, fetch the first data element, and input it into a register. These steps are thus performed 27 times and require 1,013 clock cycles.

A DSP implementation with dual multiply-accumulate units (MACs) performs the same benchmark in 107 clock cycles. Each MAC operation completes in a single cycle, and two multiply-accumulates can be performed simultaneously. The DSP is a modified Harvard architecture with two independent data streams and an independent instruction stream. Compared to the μP, the system engineer can take advantage of this coarse-grain parallelism. The result is 107 clock cycles, roughly an order of magnitude faster than the μP.

A third design approach is adaptive computing. With ACM technology, the system engineer can implement in silicon the exact hardware required at any given point in time. Hence, an algorithm that implements the 27-input adder can be downloaded into the adaptive computing silicon, and the computation is completed in seven clock cycles, an order of magnitude improvement over a DSP. By taking this approach, the adaptive computing IC can run a given algorithm in its most efficient form.
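One way to see why dedicated hardware needs so few cycles is to compare a one-at-a-time accumulation with a pairwise adder tree. The sketch below is illustrative only: the function names and the "one tree level per step" model are assumptions, and the article does not say how the seven ACM cycles are actually spent. It also checks the cycle ratios quoted above.

```python
def sequential_add_steps(values):
    """Sum values one at a time; returns (total, number of dependent add steps)."""
    total = values[0]
    steps = 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_add_levels(values):
    """Sum values pairwise, level by level, as a hardware adder tree might.

    Returns (total, number of levels). Additions within one level are
    independent, so parallel hardware could perform them at the same time.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


inputs = [float(i) for i in range(1, 28)]    # 27 sample inputs (hypothetical data)
seq_total, seq_steps = sequential_add_steps(inputs)
tree_total, tree_levels = tree_add_levels(inputs)

# Cycle counts quoted in the article for the 27-input adder benchmark.
CYCLES = {"uP": 1013, "DSP": 107, "ACM": 7}
up_vs_dsp = round(CYCLES["uP"] / CYCLES["DSP"], 1)
dsp_vs_acm = round(CYCLES["DSP"] / CYCLES["ACM"], 1)

print(seq_total, seq_steps)       # 378.0 after 26 dependent additions
print(tree_total, tree_levels)    # 378.0 after 5 tree levels
print(up_vs_dsp, dsp_vs_acm)      # 9.5 15.3
```

The tree needs only five dependent levels for 27 inputs, against 26 dependent additions in the sequential form, which is the kind of parallelism a purpose-built datapath can exploit.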

Understanding Adaptive Computing
ACM is a new IC technology targeted at consumer, portable, mobile, and wireless communications. An ACM chip adapts continuously and on-the-fly so that within a fleeting moment, the chip's silicon changes its architecture hundreds of times, thus emulating a much larger device over time.

Its adaptive circuitry allows software algorithms to build and then embed themselves into the most efficient hardware possible for their application. A good analogy is that the ACM continuously constructs the best ASIC for any given application at any given moment in time. Because the ACM's computational approach eliminates much of the code overhead, such as memory fetches and ALU/MAC set-up procedures, computationally intensive algorithms can run in hardware at hardware speeds, rather than as software running on top of hardware. This constant conversion of algorithms into hardware means faster and more efficient operation than conventional μP or DSP technology.

The adaptive nature of the ACM gives wireless designers higher performance, lower power consumption, smaller size, lower cost, and greater design latitude than rigid μP, DSP, and ASIC designs. It enables what was once a handset to become a single mobile communicator that performs a wide variety of tasks with media-rich applications, including voice, data, image, and video. The ACM silicon becomes a function of its input, rapidly adapting to create a specific hardware engine for each task.

Conversely, conventional IC technologies like ASICs and μPs/DSPs only provide designers with fixed architectures capable of performing only those functions originally designed into the handset.

Adaptive computing circuitry also enables multi-mode handset operation that allows for constantly changing standards or ever improving software algorithms. This means that with every change, the ACM can adapt on the fly, eliminating the need for new silicon to be created.

Another major issue wireless designers face is that a host of algorithms continues to become more computationally intensive and power hungry. Two cases in point are the computationally intensive discrete cosine transform (DCT) and the Qualcomm Code Excited Linear Predictive (QCELP) speech compression algorithm.

DCT Application
Current second generation (2G) PDAs and the new 2.9G and 3G mobile wireless handsets call for MPEG-4 streaming video, the latest generation video compression/decompression standard primarily targeted at devices with medium speed data communication links. About 20 to 30 percent of the MPEG-4 computations are in the DCT function; 20 to 30 percent in Huffman encoding; and another 20 to 30 percent are in motion estimation. Streaming video like this demands higher levels of processing performance. A dual streaming video MPEG-4 data stream (one encode channel and one decode channel) at full color, with 160 × 160 pixel resolution at 15 frames per second, requires about 3.7 billion operations per second.
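To put the 3.7 billion operations per second in perspective, the figures quoted above can be reduced to a per-pixel budget. This is a back-of-the-envelope sketch: the even split between the encode and decode channels is an assumption (the article gives only the combined total), and the variable names are illustrative.

```python
WIDTH, HEIGHT = 160, 160         # pixel resolution quoted in the text
FRAMES_PER_SECOND = 15           # frame rate quoted in the text
TOTAL_OPS_PER_SECOND = 3.7e9     # encode plus decode, quoted in the text
CHANNELS = 2                     # one encode channel and one decode channel

pixels_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND    # per channel

# Assumption: the load is split evenly across the two channels.
ops_per_pixel = TOTAL_OPS_PER_SECOND / (pixels_per_second * CHANNELS)

# Share of the total attributed to the DCT (20 to 30 percent in the text).
dct_low = 0.20 * TOTAL_OPS_PER_SECOND
dct_high = 0.30 * TOTAL_OPS_PER_SECOND

print(pixels_per_second)                  # 384000
print(round(ops_per_pixel))               # 4818
print(dct_low / 1e9, dct_high / 1e9)      # roughly 0.74 and 1.11 billion ops/s
```

Under these assumptions, each pixel of each frame carries a budget of several thousand operations, and the DCT alone accounts for roughly 0.74 to 1.11 billion operations per second.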

Figure 2
MPEG-4 DCT and Motion Estimation Computational Power Distribution

Fig. 2 shows the specific computations each of the DCT and motion estimation algorithms require. DCT computations are 30 percent each for addition, subtraction, and multiply-accumulate (MAC) functions. Motion estimation involves about 40 to 50 percent additions and another 40 to 50 percent in absolute difference in accumulation computations. A major DSP-based MPEG-4 issue wireless designers will encounter is that DSPs are not well tuned to perform absolute difference in accumulation computations.
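The motion estimation workload described above can be illustrated with a small block-matching sketch. It reads the text's "absolute difference in accumulation" as the sum-of-absolute-differences (SAD) operation commonly used in block matching, which is an interpretation rather than something the article states. The function names, the exhaustive search, and the sample data are all hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)       # absolute difference, then accumulate
    return total


def best_match(current, reference, block_size):
    """Exhaustively search `reference` for the offset with the lowest SAD.

    Returns (score, x, y) for the best candidate block position.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for y in range(height - block_size + 1):
        for x in range(width - block_size + 1):
            candidate = [row[x:x + block_size]
                         for row in reference[y:y + block_size]]
            score = sad(current, candidate)
            if best is None or score < best[0]:
                best = (score, x, y)
    return best


reference = [
    [10, 10, 10, 10],
    [10, 50, 60, 10],
    [10, 70, 80, 10],
    [10, 10, 10, 10],
]
current = [[50, 60], [70, 80]]

print(best_match(current, reference, 2))   # (0, 1, 1)
```

Every candidate position costs one subtraction, one absolute value, and one accumulation per pixel, which is why this inner loop dominates motion estimation and why hardware without a fused absolute-difference-accumulate operation spends several instructions on each pixel.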

It is also vital for wireless designers to know that conventional μP and DSP performance shortcomings can be problematic for achieving the full power of algorithms. At times, the designer or developer must change an algorithm to fit a particular RISC, μP or DSP architecture. In these cases, the developer must fine tune the code so that the algorithm runs as efficiently as possible on that particular μP or DSP architecture.

QCELP Application
Fig. 3 shows a QCELP engineering analysis comparing power consumption among a DSP-only design, an ASIC, and an ACM. Eight inner code loops, or algorithms, consume most of the power in the QCELP algorithm. They are codebook search, pitch search, line spectral pairs (LSP) computation, recursive convolution, and four different filters.

Figure 3
QCELP Example

A QCELP algorithm running on a DSP core with embedded memories consumes about 84 milliwatts (mW) of power in a 0.18-micron CMOS process technology and occupies four square millimeters of silicon. If the eight most power-consuming inner code loops are implemented directly in an ASIC (Fig. 3c), they consume only 3 mW but add 23 square millimeters of rigid silicon. Thus, an ASIC implementation saves significant power, but is much larger than the DSP-based solution and totally inflexible.

Moving those eight most power-hungry QCELP inner code loops into ASIC logic cores brings the total to only 19 mW: 3 mW for the ASIC cores and 16 mW for the DSP core. But this design approach requires 23 square millimeters of silicon. In this instance, the ASIC cores run the eight inner code loops and the DSP core runs the remaining QCELP code.

The ASIC silicon area is considerably larger than the DSP version, but at a major power savings. However, the ASIC approach for wireless and cellular phone designs is becoming more and more impractical due to its rigid architecture and the need to accommodate ever-changing algorithms, standards, and the growing demand for greater functionality in a mobile device.

The wireless system designer can significantly cut power consumption by adding a minimal 5 square millimeter ACM to the DSP engine. Here, the eight inner code loops are removed from the DSP operations and ported into the less power-hungry ACM engine, which consumes only 3 mW. By taking this route, the designer transfers 68 mW of power out of the DSP operation, which earlier consumed 84 mW, leaving 16 mW on the DSP. The DSP/ACM-based QCELP vocoder design now consumes a total of only 19 mW.
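The power budget in this example is simple arithmetic, reproduced below using only the figures quoted in the text. The variable names are illustrative, and the sketch treats the 23 and 5 square millimeter figures as silicon added to the DSP core, which is how the text describes them.

```python
DSP_ONLY_MW = 84       # QCELP running entirely on the DSP core
MOVED_OUT_MW = 68      # power of the eight inner loops while on the DSP
ACCELERATOR_MW = 3     # the same loops on ASIC cores or on the ACM

dsp_remaining_mw = DSP_ONLY_MW - MOVED_OUT_MW        # code left on the DSP
combined_mw = dsp_remaining_mw + ACCELERATOR_MW      # DSP plus accelerator
saving_percent = 100 * (DSP_ONLY_MW - combined_mw) / DSP_ONLY_MW

# Silicon added to the DSP core by each accelerator option.
ADDED_AREA_MM2 = {"ASIC cores": 23, "ACM": 5}
area_ratio = ADDED_AREA_MM2["ASIC cores"] / ADDED_AREA_MM2["ACM"]

print(dsp_remaining_mw, combined_mw)     # 16 19
print(round(saving_percent, 1))          # 77.4
print(area_ratio)                        # 4.6
```

On these numbers, both accelerated designs cut total power by about 77 percent relative to the DSP-only design, but the ACM does so with less than a quarter of the added silicon.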

The algorithm-programmable ACM exhibits the power characteristics of an ASIC, yet is comparable in size to the DSP. Thus, the ACM presents the best of all worlds. The ACM instantiates into hardware the particular algorithms required at any moment in time, as in this design. Data comes into the QCELP speech codec every 20 ms, so each inner code loop has to be run 50 times a second. Essentially, the ACM is spatially and temporally segmenting a small piece of the silicon to make it appear like an ASIC solution. Thus, 400 times a second (eight loops at 50 runs each), the ACM is bringing into existence the exact hardware required to run these algorithms.
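The 400-per-second figure follows directly from the frame period and the number of inner loops. The short sketch below reproduces it. The average time slot per loop is an added illustration that assumes the loops run one after another and share each frame equally, which the article does not state (it describes the silicon as segmented both spatially and temporally).

```python
FRAME_PERIOD_MS = 20     # new speech data arrives every 20 ms
INNER_LOOPS = 8          # inner code loops moved to the ACM

frames_per_second = 1000 // FRAME_PERIOD_MS                   # runs per loop
instantiations_per_second = INNER_LOOPS * frames_per_second   # all loops

# Assumption: loops run back to back and split each frame evenly.
avg_slot_ms = FRAME_PERIOD_MS / INNER_LOOPS

print(frames_per_second)             # 50
print(instantiations_per_second)     # 400
print(avg_slot_ms)                   # 2.5
```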

In both the DCT and QCELP application examples discussed above, as well as for other algorithms, the ACM architecture is significantly more efficient than a DSP-based implementation. Consider the DCT application again. The necessary logic for each DCT function is implemented on the adaptive silicon at any given point in time. Therefore, the need for a DSP, ASIC, μP, or microcontroller is eliminated. Functions not implemented into an ACM's gates do not use power. Efficiency is improved because the ACM design has a set number of gates and all of them are used during an adaptive transaction. No gates are wasted. Consequently, an ACM design is more direct for each function than a DSP-based design.

Moving Toward Innovation and a New Business Paradigm
Adaptive computing is an enabling and disruptive technology that will re-direct the wireless/cellular industry toward innovation and an improved business paradigm. It gives all parties involved in the business — handset OEMs, service providers, and consumers — a virtually unlimited number of choices.

The key is the adaptability of the ACM technology and the fact that by its nature, software becomes hardware as a result of downloads on demand via the Internet.

ACM-based mobile and wireless devices will open the door to greater profit opportunities for the OEM and the service providers. Plus, the consumer reaps the benefits of having considerably more features, functionality, and options as a result of a mobile device's built-in flexibility and efficient performance.

Service providers are currently strapped to a limited price-per-minute business model and from all appearances, the future doesn't look promising. U.S. service providers buy about 90 percent of all cell phones made today. However, current handset technology, based on inflexible ASIC/DSP chips, is at the crux of keeping service providers at a business stalemate.

In short, ASIC/DSP technology limits cellular phones to single, dual, or triple-mode protocols and provides little flexibility for designing in a variety of new features and options. Service providers have little on which to compete because they're buying the same basic phones from OEMs who all use the same conventional ASIC/DSP technology.

On the other hand, the ACM technology holds great promise for dramatically changing this archaic and unprofitable price-per-minute business model. In effect, it becomes a new ballgame for service providers with new opportunities for business expansion, creating new fee structures for a host of new services and features made possible by an ACM-based mobile device's adaptability.

For example, providers can offer differentiation and value through seamless network roaming. Travelers can get new hardware file downloads from their service providers, allowing them to communicate via their ACM-based handsets to and from anywhere in the world. Entertainment such as Internet music, as well as business information, can be piped into the mobile device, which the OEM equips with headset ports. The list of new services that wireless providers can create is limited only by their business ingenuity.

The same holds true for the wireless OEM and its staff of design engineers who now have adaptive computing circuitry as the basis for initiating next generation designs. No longer limited by rigid μP and DSP architectures, they now have a newer, more efficient ACM design methodology that sparks greater engineering creativity.

There are countless examples of the variety of choices adaptive computing can yield. The system designer must keep in mind that unlike a conventional cell phone used only for calling, an adaptive computing-based wireless/mobile device can play many roles based on the user's need of the moment. To begin with, the OEM system designer can present the consumer with multi-mode calling to and from anywhere in the world. Plus, the traveler won't need to worry about battery power because the ACM is 10 to 100 times more computationally powerful than current ASIC or DSP engines, while operating at 50 percent power savings.

Other features the OEM system engineer can design into a mobile device include a host of functions available via the Internet, such as CD-quality music, books on tape, and a variety of audio and video broadcast information; MPEG-4 personal video conferencing and video telephony; encrypted messages; speech recognition; and verbal e-mail. The list goes on, with new functions and features limited only by the imagination of the wireless handset OEM.