3G & Beyond: Harnessing the Power of Multicore Processors for 3G, WiMAX & LTE
Multicore processors are now an increasingly common and effective tool to address the challenges of power and performance in cellular basestations.
Although wireless’ killer application is still voice, data is rapidly becoming a close second for 3G, a trend that will continue as operators deploy 4G technologies such as mobile WiMAX (IEEE 802.16e) and Long Term Evolution (LTE). Between e-mail, Web browsing, music downloads and machine-to-machine (M2M) applications, all of that data traffic means additional work for each base transceiver station (BTS) or Node B, particularly those in urban areas.
The workload creates new challenges for chip designers as they develop Systems on Chip (SoCs) for BTS modem applications. One key consideration is the baseband processor platform, where multicore processors are an increasingly common, highly effective tool for balancing power and performance.
For 3G and 4G BTS applications, the ideal solution features multicore Digital Signal Processors (DSPs) with on-chip accelerators, which eliminate the need for an FPGA or a microprocessor. A multicore platform that supports multiple applications and scales across different form factors directly benefits BTS vendors by lowering R&D costs and reducing both development time and time to market, especially when used in a software-programmable platform.
The ideal solution can also include high-performance interfaces such as Gigabit Ethernet for network connectivity, on-chip Open Base Station Architecture Initiative (OBSAI) and Common Public Radio Interface (CPRI) antenna interfaces to support direct connectivity over the backplane to an RF transceiver card or to a remote radio head (RRH), and RapidIO for inter-DSP connectivity. Figure 1 illustrates some of these choices.
Power issues are a major reason why so many cellular infrastructure vendors and their suppliers are migrating to multicore designs. As BTS workloads increase, simply increasing a DSP’s megahertz is no longer a viable solution because of the power required and the heat produced: a higher clock rate generally also calls for a higher supply voltage, so power grows faster than frequency. The preferred alternative for higher performance DSPs is the use of a multicore design. For example, if the system requires 3 GHz worth of performance from a DSP, the most attractive option is three cores, each running at 1 GHz, in a single DSP package. That design meets both power and performance targets.
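The power argument can be made concrete with the standard first-order model for CMOS dynamic power, P = C x V^2 x f. The sketch below is illustrative only: the effective capacitance and both supply voltages are assumed values chosen for the example, not TCI6488 or TI data, and static (leakage) power is ignored.

```python
def dynamic_power_watts(c_eff_farads, vdd_volts, freq_hz, n_cores=1):
    """First-order CMOS dynamic power: P = n * C * V^2 * f."""
    return n_cores * c_eff_farads * vdd_volts ** 2 * freq_hz


# All numbers below are assumptions for illustration, not device data.
C_EFF = 1e-9         # hypothetical effective switched capacitance per core (F)
VDD_AT_1GHZ = 1.0    # hypothetical supply voltage for a 1 GHz core (V)
VDD_AT_3GHZ = 1.4    # hypothetical higher supply needed to reach 3 GHz (V)

# Same aggregate clock budget (3 GHz), two ways of delivering it.
single_fast_core = dynamic_power_watts(C_EFF, VDD_AT_3GHZ, 3e9)
three_slow_cores = dynamic_power_watts(C_EFF, VDD_AT_1GHZ, 1e9, n_cores=3)
power_ratio = single_fast_core / three_slow_cores

print(f"1 core  @ 3 GHz: {single_fast_core:.2f} W")
print(f"3 cores @ 1 GHz: {three_slow_cores:.2f} W")
print(f"ratio: {power_ratio:.2f}x")
```

Under these assumed voltages, both options supply the same aggregate cycles, but the single fast core draws roughly twice the dynamic power because of the voltage-squared term.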
Another power-reduction technique for high-performance DSPs is TI’s SmartReflex™ technology, which decreases both static and dynamic power consumption while maintaining the specified device performance. SmartReflex considers factors such as device-specific silicon characteristics based on the manufacturing process, as well as thermal parameters. This effectively reduces power within the DSP while maintaining performance targets, currently 1 GHz for the TCI6488, one of the DSPs that includes SmartReflex.
As DSPs become more powerful, they’re able to take on tasks that once required adjunct components, such as general-purpose processors, RISC processors and FPGAs. The latest multicore DSPs, such as TI’s TCI6488, have enough horsepower to shoulder all of the tasks on a baseband card. That directly benefits the vendor’s bottom line and competitive position by eliminating unnecessary components and thus reducing Bill of Materials (BoM) costs. Eliminating power-hungry FPGAs also helps systems designers meet efficiency requirements.
For example, in a multicore processor such as the TCI6488, systems designers can have one DSP core handle the MAC processing that previously had required a separate RISC processor, while the remaining DSP cores manage PHY processing and other functions. The TCI6488 also streamlines the design process by supporting both MAC and PHY layer processing on the same platform. Depending on the vendor’s strategy and in-house capabilities, it could use the TI-provided functional libraries and then tweak them to create its own unique solution, or it could work with one of TI’s third-party partners to source a complete, turnkey solution.
One such option is a complete mobile WiMAX Wave 2 PHY and MAC solution. Whatever their choice, systems designers now have the mix of flexibility, low development costs and fast time to market necessary to compete in crowded markets such as mobile WiMAX, which has more than 300 vendors jockeying for customers.
The latest DSPs also can support multiple air interfaces, giving vendors the flexibility to use the same platform and knowledge to target multiple markets – thereby reducing both development costs and time to market. For example, the TCI6488 currently supports LTE, WCDMA/HSPA/HSPA+, TD-SCDMA, WiMAX and GSM/EDGE. Figure 2 illustrates some of the current configuration options.
These technology options also show how a baseband platform, such as the TCI6488, can reduce a wireless operator’s capex by providing the highest number of carriers per channel card and enabling them to support new features and standards on the same baseband hardware.
The TCI6488 also illustrates how multicore DSPs can provide vendors with the flexibility to scale a single product design to serve a variety of applications. For example, a systems designer could link together multiple TCI6488s to scale a platform up or down for picocell and macrocell applications. The systems designer also could choose to have one board handling transmit functions and another handling receive functions, or a single board handling both transmit and receive for a certain number of users. Figure 3 illustrates some of these customization options.
Prioritization and Balance
Today’s SoCs typically are multicore DSPs, with independent IP blocks that must interoperate and synchronize to achieve a single, complete modem function. This architecture requires a way to prioritize tasks and then map them into a multicore environment.
The easiest option is to divide the users amongst the DSP cores so that each core maintains its own queues. But there are two drawbacks. First, some functions, such as filtering and demodulation, may be shared among all users. Second, some functions may be required to share coprocessors or peripherals, so they’re not completely independent. As a result, the interaction between the sets of priority queues can get complicated, making it difficult to ensure real-time performance. The coprocessors and peripherals also become more complex because they have to support access by multiple cores, so they have to decide which core’s task gets priority. All of this adds complexity to hardware and software drivers, and makes testing the final system more challenging and time-consuming.
To avoid those drawbacks, the TCI6488 has taken a different approach: assigning a functional task to a single core so that each core is in charge of a unique group of functions. Each coprocessor, which generally accelerates a specific type of function, is associated with a single core. That approach greatly simplifies the order of the tasks performed on that coprocessor. In many cases, peripherals also will communicate with a single core, reducing the testing required to verify that tasks won’t be starved of data.
Because DSPs are used for a variety of functions, the TCI6488 SoC is designed to be highly symmetric where needed. For example, all cores on the TCI6488 have access to the Receiver Accelerator Co-Processor (RAC). This design makes it possible to run the same functions on all cores while still giving any core access to all coprocessor and peripheral resources when needed. However, it’s recommended that systems designers dedicate one core to interacting with the RAC on the TCI6488 DSP, thereby simplifying the device’s operation.
When the resource load is balanced across multiple cores, a single core may still reach its maximum capacity before the others do, depending on the code used for each task. The solution is repartitioning, which requires a complete change in software architecture, a step that system designers prefer to avoid when the DSP has already completed testing. Thanks to advances in software-defined-radio (SDR) methodology and tooling, repartitioning the software can be less onerous.
For some DSPs, such as the TCI6488, code-cycle estimates, spreadsheets and transaction-level models were used to develop the recommended software partition for the WCDMA SoC. This partition was implemented in the TCI6488 DSP so that it provides a near-optimal solution while still allowing the simplicity of having only one DSP core controlling the RAC, one core controlling the Turbo Co-Processor (TCP) and Viterbi Co-Processor (VCP), and another core performing Tx chip-rate acceleration and communication with the antenna array interface for output.
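A functional partition of this kind can be pictured as a small ownership table in which every coprocessor or peripheral belongs to exactly one core. The following is a minimal sketch, not TI’s software: the core numbering, the task names and the helper functions are all hypothetical, and only the grouping (RAC on one core, TCP and VCP on a second, Tx chip rate and antenna output on a third) comes from the description above.

```python
# Hypothetical partition table. Core numbering and task names are
# assumptions for illustration, not TI's published mapping or API.
PARTITION = {
    "core0": {"tasks": ["rx_chip_rate"], "owns": ["RAC"]},
    "core1": {"tasks": ["symbol_rate_decode"], "owns": ["TCP", "VCP"]},
    "core2": {"tasks": ["tx_chip_rate", "antenna_output"], "owns": ["ANTENNA_IF"]},
}


def coprocessor_owners(partition):
    """Return {resource: core}; raise if any resource has two owners."""
    owners = {}
    for core, spec in partition.items():
        for resource in spec["owns"]:
            if resource in owners:
                raise ValueError(
                    f"{resource} is claimed by {owners[resource]} and {core}"
                )
            owners[resource] = core
    return owners


def core_for_task(partition, task):
    """Look up the single core responsible for a functional task."""
    for core, spec in partition.items():
        if task in spec["tasks"]:
            return core
    raise KeyError(task)


OWNERS = coprocessor_owners(PARTITION)
print(OWNERS)
print(core_for_task(PARTITION, "tx_chip_rate"))
```

Because each resource has a single owner, there is no arbitration between cores to design or test; a second claim on the same coprocessor is rejected when the table is built.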
For other standards, such as OFDM-based standards that don’t use the RAC, it’s easier to develop a symmetric software architecture. But even in those cases, it will be simpler to divide the problem so that the FFT/IFFT and some of the modulation and demodulation are performed by one core and the results communicated to another core for symbol-rate processing. This approach simplifies communication between the antenna interface (or Serial RapidIO, if that is used for antenna data) and the DSP cores that are processing the front end. It also simplifies the back-end symbol-rate processing and its communication with the Ethernet or Serial RapidIO peripheral.
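The front-end/back-end split described here is essentially a two-stage pipeline. The sketch below models it under loudly stated assumptions: Python threads stand in for DSP cores, queues stand in for inter-core messaging, a naive DFT stands in for an optimized FFT, and BPSK is chosen only to keep the symbol-rate stage short. None of the function names correspond to a TI library.

```python
import cmath
import queue
import threading


def dft(samples):
    """Naive O(N^2) DFT, standing in for a core's optimized FFT."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(bins):
    """Inverse DFT, used here only to build example input."""
    n = len(bins)
    return [sum(bins[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]


def front_end_core(antenna_q, symbol_q):
    """Stage 1: time-domain blocks in, subcarrier values out."""
    while True:
        block = antenna_q.get()
        if block is None:           # shutdown marker, forwarded downstream
            symbol_q.put(None)
            return
        symbol_q.put(dft(block))


def symbol_rate_core(symbol_q, decoded):
    """Stage 2: hard-decision BPSK demapping (+1 -> 0, -1 -> 1)."""
    while True:
        bins = symbol_q.get()
        if bins is None:
            return
        decoded.append([1 if b.real < 0 else 0 for b in bins])


def run_pipeline(blocks):
    """Feed blocks through both stages and return the decoded bits."""
    antenna_q, symbol_q, decoded = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=front_end_core, args=(antenna_q, symbol_q)),
        threading.Thread(target=symbol_rate_core, args=(symbol_q, decoded)),
    ]
    for worker in workers:
        worker.start()
    for block in blocks:
        antenna_q.put(block)
    antenna_q.put(None)
    for worker in workers:
        worker.join()
    return decoded
```

A quick round trip: map bits to BPSK subcarriers with `idft([1 - 2 * b for b in bits])`, pass the resulting block to `run_pipeline`, and the same bits come back. Each stage talks only to its neighbor, which is the simplification the paragraph above is pointing at.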
In fact, OFDMA modulation is performed jointly for all users, so it cannot be completely separated onto different DSP cores. As a result, the simplicity of the software architecture, along with the nature of many modem algorithms, is among the major reasons why system designers should partition tasks so that the software is not symmetric across the DSP cores.
Balancing Resources Across Multiple SoCs
Another issue is whether each SoC should have a different task, such as one SoC performing nothing but symbol-rate decode and another focusing on chip-rate modulation. The catch is that any on-chip coprocessors will not be used efficiently.
For example, a TCI6488 device performing only symbol-rate processing needs a much more powerful, and therefore more power- and area-hungry, Turbo and Viterbi decoder. But this decoder is of no use to another SoC that is doing only chip-rate correlation and therefore requires a much more powerful receive accelerator. So unless there is a different SoC for each board function, every coprocessor on the SoC would have to be sized for the worst case of its function. Building a different SoC for each group of functions is a financial waste.
Dedicating SoCs to a particular subset of functions also does not make for a scalable system. If each SoC performs the same complete set of functions, one can increase the channel density on a board simply by adding more SoCs. The TCI6488 is designed to allow this to happen with minimal extra hardware. Both the antenna interface and Serial RapidIO can be daisy-chained, and the Ethernet and RapidIO can be attached to a switch.
But if different SoCs provide different functionality, the number of users has to roughly double before adding SoCs scales efficiently. If 15% more users are required, the only way to increase the capability of the SoC doing symbol-rate processing by 15% is to add another SoC, which would be only 15% utilized. The same is true for the other SoCs, producing a very inefficiently scaled solution.
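The scaling penalty is easy to check with a little arithmetic. The sketch below rests on assumptions made purely for illustration: two board functions, a dedicated SoC that serves 100 users of its one function, and an identical all-function SoC that splits that capacity evenly (50 users). The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def dedicated_socs(users, users_per_soc, n_functions):
    """One SoC type per function; each function scales independently."""
    return n_functions * ceil_div(users, users_per_soc)


def identical_socs(users, users_per_soc, n_functions):
    """Every SoC runs all functions, so it serves 1/n_functions as many users."""
    return ceil_div(users * n_functions, users_per_soc)


def average_utilization(users, users_per_soc, n_functions, soc_count):
    """Fraction of total SoC capacity actually used."""
    return users * n_functions / (users_per_soc * soc_count)


# Assumed numbers: 100 users per dedicated SoC, 2 functions, 15% growth.
USERS, CAPACITY, FUNCTIONS = 115, 100, 2

n_dedicated = dedicated_socs(USERS, CAPACITY, FUNCTIONS)
n_identical = identical_socs(USERS, CAPACITY, FUNCTIONS)
print(n_dedicated, average_utilization(USERS, CAPACITY, FUNCTIONS, n_dedicated))
print(n_identical, average_utilization(USERS, CAPACITY, FUNCTIONS, n_identical))
```

With these assumed numbers, growing from 100 to 115 users takes the dedicated design from two SoCs to four, with each added SoC 15% used, while the identical-SoC design needs only three.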
For a system design using multicore, coprocessor-accelerated SoCs, the system architecture that is the most scalable at the board level and leads to the simplest, most easily tested software is one where each DSP core in the SoC performs a unique subset of tasks, but each SoC in the system performs the same set of tasks as the other SoCs. The TCI6488 was designed for this scenario in WCDMA/HSPA networks, with an emphasis on flexibility to also efficiently support other modem standards in the same manner.
The end result is that utilizing a multicore DSP in a 3G or 4G BTS provides the mix of performance and power efficiency that is required for success. But not all multicore DSPs are created alike, so for systems designers an equally important choice is a DSP that is backed by an extensive functional library and other tools to ensure lower development costs and fast time to market.
Manish Patel is a product manager for the Communications Infrastructure (CI) Business Unit in DSP Systems. Manish has a 17-year career focus in the wireless industry with marketing experience in mobile terminals/PDAs, WiMAX network equipment and customer premise equipment (CPE) devices, cordless phones, WLAN equipment and baseband DSP silicon products.