Glossary of Acronyms

API — Application Program(ming) Interface
ASIC — Application-Specific Integrated Circuit
CPU — Central Processing Unit
DSP — Digital Signal Processor
FPGA — Field-Programmable Gate Array
GUI — Graphical User Interface
I/O — Input/Output
MAC — Media Access Layer
MCM — Multichip Module
MCU — Microcontroller Unit
OS — Operating System
PDA — Personal Digital Assistant
RISC — Reduced Instruction Set Computer
RTOS — Real-Time Operating System
VoIP — Voice over Internet Protocol
Wi-Fi — Short for Wireless Fidelity, a name applied by the Wi-Fi Alliance and usually taken to mean any type of 802.11 network

Real-time convergent solutions package signal processing and control together.

Today’s consumers demand constant improvement in their media experience. They want high-quality video and audio in devices such as cell phones and PDAs, and in surveillance, security and remote monitoring equipment. All of these applications must deliver small form factors and long battery life, and must be affordable. Better still, as standards change, these applications should be able to accept functionality and standards upgrades as they become available.

Such application demands put increased pressure on the underlying processors that drive these devices. To process video in real time, the processors need to run at extremely high clock rates, a requirement that works against the power efficiency needed for long battery life. Also, the silicon needs to be small and inexpensive to preserve board real estate and keep end products price competitive. And, to meet rapidly changing multimedia standards, the processors need to be software reprogrammable.

This article will discuss the issues that previous solutions have had in meeting these needs and present modern convergent solutions that better address current application design needs.

Yesteryear Tactics

Conventional portable media device solutions are based on a handful of design approaches, many of which have significant drawbacks:

  • FPGAs, while eminently flexible, are typically large, expensive and power hungry. Thus, although ideal for prototyping designs, they aren’t well suited for portable multimedia applications.
  • ASICs are ideally sized for the media tasks at hand, but are costly to design and are not readily adaptable to new or changing compression standards.
  • DSPs are an excellent choice for the high-performance number crunching demanded by video-intensive applications. However, they are too focused on math routines to "run the whole show," so they rely on a slower, more "managerial" microcontroller (MCU) for system control functions such as the user interface and the operating system. A DSP + MCU solution on separate chips (or even under the same hood) adds system cost and complexity because the two processors have innately different architectures.
  • MCMs combine DSP and MCU functionality in a single package, reducing cost, space and power. However, the designer must settle for a fixed partition, nominally a 50/50 split, between control and DSP functions. Once the DSP is maxed out, the MCU cannot take on additional computational burden.
  • DSP in MAC takes care of straightforward signal processing applications, but it is not suited for intensive number crunching. This solution lacks the ground-up architectural design necessary for advanced applications.

Today’s Solution — The Convergent Processor


A better solution in many wireless multimedia applications is to gain the performance benefits of a dual RISC + DSP combination, but to combine the features of both devices into a single integrated core. This eliminates much of the cost and many complexity issues associated with a conventional dual processor design.

The convergent processor integrates MCU and DSP functionality in a unified architecture that allows flexible partitioning between control and signal processing needs. If the application demands, the convergent processor can act as a 100% MCU (with code density on par with industry standards), a 100% DSP (with clock rates at the leading edge of DSP technology), or some combination in between.

Available processors such as ADI’s Blackfin (see Figure 1), StarCore’s SC1000/2000 and Freescale’s MSC81xx exemplify convergent processors whose CPUs exhibit the major characteristics of both DSP and RISC processors. Other examples combining major elements of both RISC and DSP include the ARM11, the Freescale DSP56800E, Renesas SH3-DSP and Infineon TriCore designs. These devices have been developed to meet the needs of highly demanding integrated systems such as mobile devices, which typically combine an operator interface/GUI, audio/image processing, wireless connectivity and potentially streaming data inputs.

Apart from the obvious mobile consumer electronics applications, such designs are also appearing in security camera systems, in factory floor control systems using Wi-Fi distribution technology, and in a plethora of telecom applications such as VoIP, as their functionality is integrated into devices ever closer to the network edge. Until now, however, the specific software needs of these next-generation devices have not been properly addressed.

New Silicon, Old Software

Although processing architectures have rapidly evolved, the RTOSs that support applications built on these architectures have not changed much in the past 20 years. The convergent processor poses highly specific real-time problems because it represents the very collision of control (RISC/MCU) and DSP functions on a single convergent device. The design team’s most obvious and immediate problem is which (single) RTOS to choose for such a single core processor.

A software developer using an RTOS with a DSP heritage will find it difficult, if not impossible, to implement the complexity of synchronization and resource management required in what is, at least partially, an MCU environment. A developer using an RTOS with an MCU/RISC heritage introduces the inefficiencies of the MCU-focused RTOS into the processing path of the critical DSP functions. With current multimedia loads, such inefficiencies would be too costly for the application and would certainly not maximize the efficiency of the target silicon.

Given that a basic function of any RTOS should be to provide the real-time tools that maximize the efficiency of the target application both in terms of system design (time to implement) and processing performance/resource utilization, such a compromise is not acceptable. The RTOS should allow the designer maximum flexibility to balance the control and media streaming capabilities that best match a specific design application.



Figure 1. Block Diagram of a single-core BF533 embedded media processor.

Single-stack Functionality

To gain maximum efficiency on a signal processing stream, developers choose a single-stack RTOS, which is a specialized executive for DSP and dataflow processing. In the past, most developers simply rolled their own single-stack RTOS capable of fast, timely responses to demands for dataflow operations such as streaming or signal processing. A good single-stack RTOS must be robust and efficient, and must have a small footprint (between 2 and 10 kB).

More recently, application developers have preferred the additional functionality and time to market that comes with buying a single-stack RTOS. Some RTOS vendors have developed single-stack executives.

For example, the RTXC single-stack executive (RTXC/ss) is built around a cooperative scheduler with three types of code entities: threads (lightweight, specialized tasks), interrupt service routines and kernel services. Thread data, interrupt contexts and other variables are all stored on a common stack. The scheduler grants processor control to threads in response to requests made from other threads or interrupt service routines via kernel services.

An RTXC/ss thread is coded as a C function, but has no context on entry and saves none when it returns processor control to the scheduler. This lack of context makes the transition from the scheduler to the thread quite fast, a decided advantage when responding to a demanding operational deadline. During its execution cycle, a thread cannot explicitly wait for a system event. The lack of context and the inability to block are the two primary attributes that distinguish a thread from a task.

Because it has no context at entry, a thread must perform any required data initialization upon entry. When its operations are complete, the thread returns to the scheduler without a return value and without leaving any of its operational data on the stack.

A thread exists within a user-defined priority level. Multiple priority levels may be defined, allowing a thread at a higher priority level to preempt an executing thread running at a lower priority level. A thread with the same priority level as the executing thread cannot preempt it, but must wait until the current thread completes its execution cycle before the scheduler grants it processor control.
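The thread rules above can be illustrated with a toy model. The sketch below is not RTXC/ss code: the class `SingleStackScheduler`, its `request` method and the thread names are hypothetical, and the convention that a lower number means a higher priority level is an assumption. Threads are plain functions that run to completion, and a preempting thread is simply called on top of the interrupted one, which is roughly how everything can share one stack.

```python
import heapq
from itertools import count


class SingleStackScheduler:
    """Toy single-stack executive (hypothetical names, not a vendor API)."""

    def __init__(self):
        self._ready = []       # heap of (priority, seq, thread_fn); lower number = higher priority
        self._seq = count()    # tie-breaker keeps FIFO order inside one priority level
        self._current = None   # priority level of the running thread, None when idle
        self.trace = []        # record of what ran, in order

    def request(self, priority, thread_fn):
        """Kernel service: ask the scheduler to run thread_fn at this level."""
        heapq.heappush(self._ready, (priority, next(self._seq), thread_fn))
        self._dispatch()

    def _dispatch(self):
        # Run every ready thread that outranks the one currently executing.
        while self._ready and (self._current is None
                               or self._ready[0][0] < self._current):
            priority, _, thread_fn = heapq.heappop(self._ready)
            previous, self._current = self._current, priority
            thread_fn(self)    # nested call: the preempting thread borrows the same stack
            self._current = previous


def high(sched):
    sched.trace.append("high")


def peer(sched):
    sched.trace.append("peer")


def low(sched):
    sched.trace.append("low start")
    sched.request(2, peer)   # same level: must wait until low completes
    sched.request(1, high)   # higher level: preempts low immediately
    sched.trace.append("low end")


sched = SingleStackScheduler()
sched.request(2, low)
print(sched.trace)  # ['low start', 'high', 'low end', 'peer']
```

Note that no thread in this model can block: each one either finishes or is briefly interrupted by a higher-priority thread that itself runs to completion, so no per-thread context has to be kept between runs.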

A fundamental characteristic that reveals the heritage of an RTOS is how it manages the stack. DSP RTOSs generally manage a single stack, whereas a RISC RTOS will generally provide support for multiple stacks (one for the OS and one per task, supervisor and user stacks). This is because a single-stack implementation is the most efficient, and it is relatively safe in a DSP system where the number of tasks/threads is small and the synchronization issues relatively simple.


Figure 2. The typical multistack RTOS is ideal for systems that require fast interrupt response time and rapid, deterministic switching between tasks.

The Multistack Plethora

In contrast to the single stack RTOS, the control RTOS uses multiple stacks to provide a degree of segregation and control between what could be a large number of tasks. It is less efficient because stacks have to be swapped at context switch time and perhaps when an interrupt occurs. The stacks, however, are somewhat isolated within one specific task and are kept separate from the RTOS, which must continue even when some tasks fail.

Because this RTOS involves more design complexity, application developers have tended to purchase the multistack RTOS instead of rolling their own. Because of the larger market, many RTOS vendors — Wind River, Enea, Green Hills, QNX, Quadros and many others — offer multistack, event-driven RTOSs.

The multistack kernel (see Figure 2) is a traditional yet flexible multitasking kernel architecture intended for use in applications such as communications, automotive, process control and instrumentation systems.

It is ideal for systems that require fast interrupt response time and rapid, deterministic switching between tasks. Each task has its own stack, allowing it to wait for synchronization with system events. In addition to the task stacks, it employs a system stack for use in processing kernel services and during interrupt service routines.

Each task has a priority, and the default task scheduling policy is preemptive according to priority. The multistack scheduler assigns control of the processor to the highest priority task that is ready to run. If a task of lower priority is in control of the processor when a higher priority task becomes ready to run, the scheduler preempts the lower priority task and grants processor control to the one of higher priority. In addition to preemptive scheduling, the kernel also supports round robin and tick-sliced scheduling for tasks with the same priority. Some multistack RTOSs allow independent variables other than time, which lets tick-slicing become a general solution to limiting the duration of a task’s execution.
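As a rough illustration of this policy, the sketch below models priority preemption with round robin inside a priority level. It is a simulation under stated assumptions, not any vendor's kernel: `MultistackScheduler`, `make_ready`, `tick` and the task names are hypothetical, a lower number is assumed to mean a higher priority, and each Python generator stands in loosely for a task with its own stack, since a generator keeps its own frame and can be suspended and resumed.

```python
from collections import deque


def task(name, ticks, trace):
    """A task that needs `ticks` time slices; each yield ends one slice."""
    for _ in range(ticks):
        trace.append(name)
        yield


class MultistackScheduler:
    """Toy preemptive scheduler with round robin per priority level."""

    def __init__(self):
        self._ready = {}   # priority -> deque of suspended tasks (generators)

    def make_ready(self, priority, gen):
        self._ready.setdefault(priority, deque()).append(gen)

    def tick(self):
        """Give one slice to the highest-priority ready task.

        Returns False when nothing is ready to run.
        """
        if not self._ready:
            return False
        top = min(self._ready)        # lower number = higher priority (assumption)
        queue = self._ready[top]
        gen = queue.popleft()
        try:
            next(gen)                 # resume the task where it left off
            queue.append(gen)         # round robin: back of its own level
        except StopIteration:
            pass                      # task finished; drop it
        if not queue:
            del self._ready[top]
        return True


trace = []
sched = MultistackScheduler()
sched.make_ready(2, task("A", 2, trace))
sched.make_ready(2, task("B", 2, trace))
sched.tick()                              # A gets the first slice
sched.make_ready(1, task("H", 2, trace))  # a higher-priority task becomes ready
while sched.tick():
    pass
print(trace)  # ['A', 'H', 'H', 'B', 'A', 'B']
```

In the trace, H takes over as soon as it becomes ready and runs until it finishes, after which A and B resume alternating slices from where they were suspended.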

There are three additional code entities in many multistack environments besides tasks: kernel service APIs, kernel services and interrupt service routines. Tasks and interrupt service routines perform the operations required by the application and invoke kernel services through their associated API functions to affect the behavior of the system.

The multistack kernel supports classes of kernel objects, along with kernel services that operate on those objects. Objects exist for task synchronization, data passing, event and counter management, alarms, memory management and exclusive access to entities. Using knowledge of the system design, the developer can use a utility to select the object classes and their properties, which scales the size and features of the kernel to the configuration best suited to the application. The size of a multistack kernel varies with the applications it primarily targets, but embedded multistack RTOSs typically range between 4.5 and 20 kilobytes, depending on the processor and the efficiency of the compiler.

The Development Dilemma

Traditionally, MCU and DSP software development groups worked in separate design teams, with each group free to select an RTOS, development tools and enabling software stacks from their vendor of choice. But now with convergent processing, the development groups are pushed literally onto the same processor.

To better understand the problem, consider the RTOS requirements of each processing model in more detail. DSP has a data flow nature in which a process executes an algorithm on a block of data without stopping, producing another block of data that it passes on to another stage in the sequence. DSP processing often involves high-frequency I/O with strict sampling and processing requirements, making it important for the RTOS to respond to interrupts with minimum latency.

Ideally, the RTOS saves and restores a minimum amount of context between execution cycles of DSP processes, and offers low overhead in both the process scheduler and kernel services. Real-time or control processes, by contrast, can stop and wait for synchronizing events to occur. To support that requirement, control applications usually employ a multitasking RTOS in which the scheduler determines which task gets control of the CPU. Whenever there is a change of processes (a context switch), the RTOS must save and restore the processes’ contexts (registers, etc.), actions that can involve moving a large number of bytes and consume a lot of processor cycles. However, such actions make it possible for the processes to stop and start according to the dynamics of the system, which, though ideal for control processing, is not desirable for digital signal processing.
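To see why the amount of saved context matters at DSP data rates, a back-of-the-envelope calculation helps. The function below simply multiplies switch rate by cost per switch and divides by the clock rate. Every number in it is hypothetical, chosen only to make the arithmetic easy to follow, and none is a measurement of any real processor or RTOS.

```python
def context_switch_load(switches_per_second, cycles_per_switch, clock_hz):
    """Fraction of all CPU cycles spent saving and restoring context."""
    return switches_per_second * cycles_per_switch / clock_hz


# Hypothetical figures: a 400 MHz core handling 8,000 scheduling events per second.
CLOCK_HZ = 400_000_000
EVENTS_PER_SECOND = 8_000

# Assumed cost of a full task context save/restore versus a minimal thread entry.
full_context = context_switch_load(EVENTS_PER_SECOND, 500, CLOCK_HZ)
minimal_context = context_switch_load(EVENTS_PER_SECOND, 50, CLOCK_HZ)

print(f"full context:    {full_context:.1%} of the CPU")
print(f"minimal context: {minimal_context:.1%} of the CPU")
```

Under these assumed inputs the overhead scales linearly with both the event rate and the cycles per switch, so a tenfold reduction in saved context gives a tenfold reduction in scheduling overhead.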

For these new convergent processing architectures, it is important for application developers to concurrently manage high data rate, block-oriented processes and event-driven control processes, and to do so as efficiently as they would have on either a dedicated MCU or DSP. In this model, the ideal RTOS would behave like a low-overhead, lightweight DSP executive sometimes, and at other times, as a complex multitasking RTOS.

For convergent processing to be effective, the RTOS must be designed to operate in a hybrid or dual mode. A dual-mode RTOS combines a traditional task-based kernel architecture for real-time control processing with the specialized single-stack executive for DSP and dataflow operations using a common API. This unified RTOS solution enables both types of application code to run fully optimized on a single processor. All of the kernel functions and services of both the single stack and multistack are fully available to the developer in a dual-mode RTOS and may be scaled to fit the application requirements using a configuration tool. Figure 3 offers an example of a dual-mode RTOS implementation.


Figure 3. The RTXC/dm combines lightweight DSP-centric threads with control-focused tasks.

Managing Threads and Tasks Using Prioritization

So that both processing models can coexist successfully in a single-kernel architecture, the dual-mode RTOS uses three distinct priority zones (see Figure 3). Interrupt servicing occupies the highest priority (Zone 1), which takes precedence over all operations in the other zones. Zone 2, the middle priority, is reserved for single-stack data-streaming operations: all thread operations, plus the kernel operations of the multistack kernel. Operations in Zone 3 are generally classified as system initialization or task operations and include calls to the API library for kernel services. Zone 3 operations occur when there are no Zone 1 or Zone 2 operations needing attention.

In this model, dataflow applications that run threads always have a higher priority than control plane tasks. This means that data flow operations, such as voice or video processing if organized with threads, preempt any task and run to completion before returning to the preempted task. Task-based operations can gain control of the processor in between operations in Zone 2. In the RTXC/dm kernel, where threads and tasks coexist, tasks can initiate threads. Because of the priority of Zone 2 over Zone 3, task-initiated threads cause immediate preemption.
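The zone ordering can be sketched as a small dispatcher. This is a simplified model, not the RTXC/dm implementation: `DualModeKernel`, `post`, `run` and the work item names are hypothetical. Interrupt handlers and threads are modeled as plain functions that run to completion, while a task is a generator that can be suspended. One simplification to note is that the model only preempts a task at a `yield`, whereas a real kernel would preempt inside the kernel service call itself.

```python
from collections import deque

ZONE_ISR, ZONE_THREAD, ZONE_TASK = 1, 2, 3   # lower zone number runs first


class DualModeKernel:
    """Toy three-zone dispatcher (hypothetical names, not a vendor API)."""

    def __init__(self):
        self.pending = {ZONE_ISR: deque(), ZONE_THREAD: deque(), ZONE_TASK: deque()}
        self.trace = []

    def post(self, zone, work):
        """Queue a work item in the given zone."""
        self.pending[zone].append(work)

    def run(self):
        while any(self.pending.values()):
            # Always serve the highest-priority zone that has work waiting.
            zone = min(z for z, q in self.pending.items() if q)
            work = self.pending[zone].popleft()
            if zone == ZONE_TASK:
                try:
                    next(work)                                # run the task to its next yield
                    self.pending[ZONE_TASK].appendleft(work)  # resume it later
                except StopIteration:
                    pass                                      # task finished
            else:
                work(self)                                    # ISRs and threads run to completion


def video_thread(kernel):
    kernel.trace.append("thread: video block")


def dma_isr(kernel):
    kernel.trace.append("isr: dma done")
    kernel.post(ZONE_THREAD, video_thread)   # ISR hands data to a thread


def control_task(kernel):
    kernel.trace.append("task: step 1")
    kernel.post(ZONE_THREAD, video_thread)   # a task initiates a thread
    yield                                    # preemption point in this model
    kernel.trace.append("task: step 2")


kernel = DualModeKernel()
kernel.post(ZONE_TASK, control_task(kernel))
kernel.post(ZONE_ISR, dma_isr)
kernel.run()
print(kernel.trace)
# ['isr: dma done', 'thread: video block', 'task: step 1',
#  'thread: video block', 'task: step 2']
```

The trace shows the interrupt handler running first, the thread it requested running before any task code, and the task-initiated thread completing before the task reaches its second step.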


Silicon manufacturers continue to advance the state of the art with new convergent processing architectures that integrate microcontroller and signal processing functionality. These new architectures demand a new generation of real-time operating system that enables applications to take full advantage of the higher level processing capability. The dual-mode RTOS combines a low-latency cooperative scheduler and a prioritized, preemptive, event-driven scheduler into a single, integrated real-time operating system. Developers can now confidently choose a powerful convergent processor for their next project knowing that they can maximize the processing efficiency of both control and dataflow with an RTOS that adapts to the needs of the application.

The needs of the development process are also met. The dual-mode RTOS provides all the core features of both the multistack and single-stack kernels. This gives a level of comfort to both MCU-style application developers and their DSP embedded software engineering counterparts, as each group can develop code in an environment that is familiar, comfortable and, as far as they are concerned, dedicated to their individual needs, yet always within the same API. The system designer has only to determine the relative prioritization of the various lightweight DSP threads and the control-style tasks.

About the Author

Tom Barrett is president of Quadros Systems Inc. He has been working in the development of commercial RTOS technology since 1966. He was the designer and developer of the RTXC RTOS that first appeared in 1978. He has been in the computer industry since 1962, spending much of his time in the embedded systems segment. Having been a programmer, systems analyst, systems designer and business owner, he has brought an experienced and knowledgeable perspective to many articles about RTOS and real-time systems issues. He has a B.A. in Mathematics from the University of Texas.