Virtual System Prototypes (VSPs) offer a means to develop and debug embedded software before the silicon is available.

By Jeff Roane, VaST Systems Technology Corp.

As the semiconductor industry continues to consolidate off-the-shelf hardware components to build wireless consumer devices, such as 3G phones, these devices are increasingly differentiated by software content. The iPhone® is a great example of that trend. At a time when the overall market is in decline, the iPhone uses an off-the-shelf ARM-based application processor and is differentiated by an intuitive, software-based human-machine interface and new application features.

As the focus shifts from hardware to embedded software, new methodologies must be applied to ensure that the required functionality, implemented in an ever-increasing number of lines of code, is complete when the product ships. Virtual System Prototypes (VSPs) offer a means to develop and debug embedded software before the silicon is available.
So What is a VSP?
In short, a VSP is a C/C++/SystemC-based simulation model of the wireless device. It models all of the components, such as processors, graphics accelerators, peripherals, buses and memories, as well as the testbench that provides stimuli to the wireless system. Parts of the system that are not needed to develop software are typically omitted or abstracted; for example, the RF front-end can be left out and IQ data in the baseband domain used instead.

The key is that such a VSP must be both fast and accurate. Speed ensures that full software loads, e.g., complete protocol stacks and operating systems, can be executed in a reasonable time. Accuracy is mandatory to verify whether the software fulfills real-time constraints, e.g., processing frames at the rates defined in a wireless standard.
Achieving the Required Speed
Figure 1. Abstract view of a mobile terminal.

Binary translation techniques are used to achieve the necessary speed of such a virtual simulation. The concept is to emulate the instructions of the embedded processor on the host machine, typically an x86 architecture. The goal is to use as few host instructions as possible to emulate the embedded CPU and so achieve high simulation speeds. Because the simulation speed depends on many factors, such as the performance of the host machine, the complexity of the system and the level of modeling detail, it may vary widely, from slower than real time to faster than real time. Typically, 1 MIPS is accepted by the software developer community as a minimum speed, and software running mainly in cache can easily break the 50 MIPS barrier using binary translation techniques.
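
The binary translation idea described above can be illustrated with a deliberately small sketch. It is not how any commercial VSP is implemented: the four-instruction guest ISA, the `PROGRAM` listing and the function names are all hypothetical, and Python source stands in for host machine code. What it shows is the essential trick: each guest basic block is translated once into a host function and cached, so a hot loop pays the translation cost a single time.

```python
"""Toy dynamic binary translator: guest basic blocks become cached host functions."""

# Hypothetical guest program (invented ISA): sum the integers 10..1 into r1.
PROGRAM = [
    ("movi", 0, 10),     # 0: r0 = 10 (loop counter)
    ("movi", 1, 0),      # 1: r1 = 0  (accumulator)
    ("add", 1, 1, 0),    # 2: r1 = r1 + r0
    ("addi", 0, 0, -1),  # 3: r0 = r0 - 1
    ("bnez", 0, 2),      # 4: if r0 != 0 goto 2
    ("halt",),           # 5: stop
]

HALT = -1  # pseudo program counter that ends the simulation


def translate_block(program, start_pc):
    """Translate the basic block at start_pc into a host (Python) function.

    The generated function updates the register list in place and returns
    the next guest pc, or HALT. Also returns the block length in guest
    instructions.
    """
    lines = ["def block(r):"]
    pc = start_pc
    while True:
        op, *args = program[pc]
        if op == "movi":
            lines.append(f"    r[{args[0]}] = {args[1]}")
        elif op == "add":
            lines.append(f"    r[{args[0]}] = r[{args[1]}] + r[{args[2]}]")
        elif op == "addi":
            lines.append(f"    r[{args[0]}] = r[{args[1]}] + {args[2]}")
        elif op == "bnez":
            # A branch ends the basic block.
            lines.append(f"    return {args[1]} if r[{args[0]}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append(f"    return {HALT}")
            break
        else:
            raise ValueError(f"unknown guest opcode: {op}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)  # "code generation" for the host
    return namespace["block"], pc - start_pc + 1


def run(program, num_regs=4):
    """Execute the guest program, translating each block at most once."""
    regs = [0] * num_regs
    cache = {}          # guest pc -> (host function, block length)
    translations = 0    # how many blocks had to be translated
    executed = 0        # guest instructions executed
    pc = 0
    while pc != HALT:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            translations += 1
        block, length = cache[pc]
        pc = block(regs)
        executed += length
    return regs, translations, executed
```

Calling `run(PROGRAM)` executes the loop body many times but translates it only once; the ratio of executed guest instructions to translations is what makes the approach fast for code that runs mainly out of already translated blocks.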

Now that the complete wireless system, with all its components and stimuli, is in place and runs almost as fast as the real hardware, the software can be loaded, executed and, even more importantly, debugged using the same third-party tools that would be attached to the silicon.

Of course, the same compiler and debugger tool chain used in the traditional flow is used to develop the embedded software, and the virtual system prototype loads exactly the same target (binary) image. This means that once the real silicon is taped out, the software image runs immediately without further porting effort, because the simulation is cycle-accurate and thus behaves like the silicon.
Meeting Real-time Constraints
In debug mode, third-party debuggers such as Lauterbach's Trace32, Green Hills' Multi5 and ARM's RealView are launched and attached to the simulator through a debugging interface. Such interfaces are also within the scope of standardization bodies; the Multi-Core Debug (MCD) interface is one that is being standardized under the umbrella of SPRINT.

Figure 2. VSP setup for 3G modem protocol stack development.
Debug in End Packages Now Possible
By Stephen Lau, Product Manager for Emulation Technology, Texas Instruments

When and how you debug an embedded System on Chip (SoC) wireless device is changing, driven by an expansion of when debugging can be done. Previously, debugging was only done in a product development environment where the SoC is exposed on a target PCB. With strong time-to-market pressure shortening product development schedules, there is an increased need to debug SoCs while they are in the final product. Debugging SoCs in the final product package also allows for debug and optimization of existing deployed products.

The proposed Mobile Industry Processor Interface (MIPI) Narrow Interface for Debug and Test (NIDnT) port and the IEEE 1149.7 standard make debug in end product packages possible. MIPI NIDnT aims to bring debug, trace and test capabilities to the final product by reusing existing external interfaces. For example, the debug interface could be shared with an external memory port such as the micro SD card interface.

IEEE 1149.7 is a new debug standard which preserves industry investment while providing additional features. IEEE 1149.7 is on track for ratification in the first quarter of 2009. It allows debug connections with fewer pins than the IEEE 1149.1 (JTAG) standard, yet can provide additional functionality. IEEE 1149.7 also provides for the transport of information through a background data channel, thereby increasing the utility of the two pins used. IEEE 1149.7 has features which are beneficial to embedded SoCs in wireless devices. Along with reduced pin-count requirements, IEEE 1149.7 allows for operation in new connection topologies, making stacked die and multi-chip modules easier to build. It also improves on JTAG by specifying power scenarios for the debug logic, which helps the SoC further reduce power consumption.

The look and feel of this setup is the same as when the debugger is attached to the real hardware and, of course, all advanced features are supported. These include instruction tracing, stepping back in time, function profiling, operating-system-aware debugging and monitoring of events correlated to function calls, e.g., the number of cache hits and misses encountered during a function call. These debugging features are used to optimize the runtime of the embedded software so that it meets real-time constraints. Cycle accuracy and the modeling of all use-case-dependent details are key for VSPs that are to be used for pre-silicon software development.

An example is a layered protocol stack that needs to process 3G frames in time. Imagine that the underlying virtual processor model (VPM) does not implement cache behavior. Cache misses are what contribute most to an increase in software execution time, because a miss implies a cache line fill, which is usually done by a burst transfer over the instruction bus. It takes significant time to request the bus, to be granted the bus once arbitration has taken place and finally to fetch the bytes over the bus. These additional cycles can dominate the software execution time and may violate the real-time constraints when the software is not yet executed mainly out of cache. A fully arbitrating, cycle-accurate bus is therefore a key element of the overall timing as well: the bus may be in use and locked by another master or by a second processor in a multi-core system, and a purely functional model does not handle arbitration at all.
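
The effect of cache misses on a frame deadline can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration: the function names and every number (instruction count, miss count, arbitration and burst latencies, clock rate and the 10 ms frame period) are assumptions chosen for the example, not measurements from a real device.

```python
def execution_cycles(instructions, base_cpi, misses, arbitration_cycles, burst_cycles):
    """Cycles for one frame: ideal execution plus the penalty of every cache miss.

    Each miss is assumed to cost one bus arbitration plus one burst line fill.
    """
    miss_penalty = arbitration_cycles + burst_cycles
    return instructions * base_cpi + misses * miss_penalty


def meets_deadline(cycles, clock_hz, deadline_us):
    """True if the cycles fit into the deadline (integer math, no rounding)."""
    return cycles * 1_000_000 <= deadline_us * clock_hz


# Hypothetical workload: 1,000,000 instructions per frame on a 200 MHz core,
# with an assumed frame period of 10 ms (10,000 microseconds).
CLOCK_HZ = 200_000_000
DEADLINE_US = 10_000

# A purely functional model sees no misses at all.
functional = execution_cycles(1_000_000, 1, 0, 8, 16)

# A cycle-accurate model sees 50,000 misses, each costing 8 + 16 cycles.
accurate = execution_cycles(1_000_000, 1, 50_000, 8, 16)

functional_ok = meets_deadline(functional, CLOCK_HZ, DEADLINE_US)
accurate_ok = meets_deadline(accurate, CLOCK_HZ, DEADLINE_US)
```

With these assumed numbers the functional model reports 1,000,000 cycles and a comfortable margin, while the model that accounts for misses needs 2,200,000 cycles against a budget of 2,000,000 and misses the deadline. This is the kind of discrepancy that only shows up when cache and bus behavior are modeled.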

Pure functional VSPs, and especially processor models, compensate for the lack of modeling detail by switching to a more accurate but slower model when needed, for example, to obtain the important timing information caused by a cache miss. These simulations are sometimes called hybrid simulations, because multiple models of a single processor are involved and all of them have to be maintained.
Virtual Prototypes: the Debugging Panacea
By David Kleidermacher, Chief Technology Officer, Green Hills Software, Inc.

In recent years, simulators designed for embedded software development have reached a new capability level due to advances in virtualization technology and the speeds of host PCs. Similar technologies used to implement virtualization in the data center are now used to make these "virtual prototypes" faster. This, in turn, enables software developers to simulate more complex applications, in some cases, the same complete system that will run on the physical hardware target. A number of companies, including Green Hills and Synopsys, now supply these high speed virtual prototyping environments to software developers, enabling them to develop and integrate software before hardware is available or when it is in short supply, saving time to market.

Hardware model standardization is currently an area of considerable activity, and a key industry goal is to enable virtual prototypes to interoperate with models of varying implementation (RTL, SystemC, "native" C, etc.) from varying vendors. The combination of technology advancement and ubiquitous standards translates to a near-future world of powerful virtual prototyping platforms that will become the de facto choice for software development.

Virtual prototypes also present the ideal debugging environment. The prototype is omniscient with respect to system operation, and the host platform can be used to store a history of execution that enables developers to go back at their leisure to replay complex system behavior and easily locate the source of bugs and inefficiencies. These software-centric debugging capabilities are spilling over to the hardware developer. Some virtual prototype debuggers can handle the SystemC and RTL source code itself. The debugger provides hardware-language awareness (e.g. SystemC's ports, signals and modules), and it can debug multiple hardware contexts in parallel. In essence, the software developer is benefiting from the hardware developer's appreciation of modeling while the hardware developer is benefiting from the appreciation of state-of-the-art source code debugging tools.
Optimizing the Software Architecture
With an accurate and fast solution in place, the software architecture can be optimized early in the design cycle. In addition, the software developer can even suggest hardware architecture changes that would improve software execution times and lower the power consumption of the mobile device.

The latter is an important design constraint and is correlated to the activities in the processors, on the buses and in the peripherals. The VSP can be used to detect power trends early in the design phase. This is done simply by weighting simulation events with coefficients. For example, a cache miss event can be assigned a higher power coefficient than normal cache-resident execution, because the fetch over the bus increases power consumption by triggering additional gates on the silicon. With a fast and accurate VSP solution, all other contributing events can be detected and weighted as well, e.g., idle states of processors, normal execution, accesses to external buses, and clocked-down or switched-off domains.
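
The event-weighting approach to power trends amounts to a weighted sum over simulation event counts. The following sketch shows the arithmetic only; the event names, the coefficients (in arbitrary relative units) and the two event profiles are invented for illustration and would in practice come from the VSP's event counters and from characterization of the actual silicon.

```python
# Hypothetical power coefficients in arbitrary relative units per event.
POWER_COEFFICIENTS = {
    "normal_execution": 10,     # instruction executed out of cache
    "cache_miss": 60,           # line fill over the bus toggles far more gates
    "external_bus_access": 40,  # access to an external bus
    "idle": 2,                  # processor waiting in an idle state
    "powered_down": 0,          # domain switched off
}


def relative_energy(event_counts, coefficients=POWER_COEFFICIENTS):
    """Weight each counted simulation event with its power coefficient."""
    return sum(coefficients[event] * count for event, count in event_counts.items())


# Two hypothetical software variants running the same use case.
variant_a = {"normal_execution": 1000, "cache_miss": 100, "idle": 500}
variant_b = {"normal_execution": 1000, "cache_miss": 20, "idle": 580}  # better locality

energy_a = relative_energy(variant_a)
energy_b = relative_energy(variant_b)
```

The absolute values mean nothing on their own; the point of such an estimate is the trend. Here the variant with fewer cache misses scores 12,360 against 17,000, which suggests that improving code locality is worth pursuing before silicon is available.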
VSP Benefits
Additional benefits of a VSP include better visibility and debuggability. When the simulation is paused with a debugger to inspect system registers, states and signal levels, the whole simulation is paused, which means that all other components are stopped as well. A second processor does not continue to execute code and possibly write data into shared memory locations that are being debugged, and FIFOs do not fill up in the meantime, either of which would hinder narrowing down an issue.
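
Why a paused VSP freezes every component follows from how such a simulation advances: a single kernel steps all models, so stopping the kernel stops everything. The sketch below illustrates this with a hypothetical two-core system; the `Core` and `Simulation` classes are invented for the example and are far simpler than a real simulation kernel.

```python
class Core:
    """Hypothetical bus master that increments one shared-memory word per cycle."""

    def __init__(self, shared, slot):
        self.shared = shared
        self.slot = slot

    def tick(self):
        self.shared[self.slot] += 1


class Simulation:
    """Minimal lockstep kernel: every component advances only when the kernel does."""

    def __init__(self, components):
        self.components = components
        self.cycle = 0
        self.paused = False

    def run(self, max_cycles, breakpoint_cycle=None):
        """Advance up to max_cycles, stopping at the breakpoint or while paused."""
        for _ in range(max_cycles):
            if self.paused:
                break
            if self.cycle == breakpoint_cycle:
                self.paused = True  # the debugger takes control here
                break
            for component in self.components:
                component.tick()
            self.cycle += 1

    def resume(self):
        self.paused = False


shared_memory = [0, 0]
sim = Simulation([Core(shared_memory, 0), Core(shared_memory, 1)])

sim.run(10, breakpoint_cycle=4)   # stops when cycle 4 is reached
snapshot_at_break = list(shared_memory)

sim.run(10)                       # still paused: no core advances
snapshot_while_paused = list(shared_memory)

sim.resume()
sim.run(3)                        # both cores continue together
```

While the kernel is paused, the second core cannot touch the shared memory being inspected, which is exactly the property that is hard to guarantee when halting one processor on real multi-core hardware.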

With a C/C++/SystemC-based VSP, all objects of the simulation can be accessed and traced at any time. It is also possible to use high-level debugging techniques, for example, to find the 3G frame that causes the largest CPU load for the data rate chosen in High-Speed Downlink Packet Access (HSDPA) mode.

This is easily possible because a VSP offers non-intrusive analysis capabilities. For the example above, statistics counters can be set up that measure the load and are reset when the periodic interrupt service routine is called at the start of each frame. These processor load analyzers are set up so that the instructions of the operating system idle loop are excluded from the load calculation. As another example, there are no limitations on which further events are listed in the function profiling table. It is already common to include cache statistics, bus activities, etc. in the function profiling table. With a virtual solution, this view can be extended with the hit and miss rates of a customized L2 cache that is part of the SoC.
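
A per-frame load analyzer of the kind described above can be sketched in a few lines. The class below is a hypothetical illustration: in a real VSP the simulator would invoke such callbacks from its own instrumentation hooks, whereas here `on_instruction` and `on_frame_interrupt` are invented names that are called by hand, and the idle-loop address range is an assumption.

```python
class FrameLoadAnalyzer:
    """Measures CPU load per frame, excluding the OS idle loop from the busy time."""

    def __init__(self, idle_start, idle_end):
        # Assumed address range [idle_start, idle_end) of the idle loop.
        self.idle_start = idle_start
        self.idle_end = idle_end
        self.busy_cycles = 0
        self.total_cycles = 0
        self.frame_loads = []

    def on_instruction(self, pc, cycles=1):
        """Called for every executed instruction; observes without altering timing."""
        self.total_cycles += cycles
        if not (self.idle_start <= pc < self.idle_end):
            self.busy_cycles += cycles

    def on_frame_interrupt(self):
        """Called when the periodic frame ISR starts: record load, reset counters."""
        if self.total_cycles:
            self.frame_loads.append(self.busy_cycles / self.total_cycles)
        self.busy_cycles = 0
        self.total_cycles = 0

    def worst_frame(self):
        """Return (frame index, load) of the frame with the highest CPU load."""
        index = max(range(len(self.frame_loads)), key=self.frame_loads.__getitem__)
        return index, self.frame_loads[index]


def replay(analyzer, frames, idle_pc=0x100, work_pc=0x2000):
    """Feed a synthetic trace: each frame is (busy instructions, idle instructions)."""
    for busy, idle in frames:
        analyzer.on_frame_interrupt()
        for _ in range(busy):
            analyzer.on_instruction(work_pc)
        for _ in range(idle):
            analyzer.on_instruction(idle_pc)
    analyzer.on_frame_interrupt()  # close the last frame


analyzer = FrameLoadAnalyzer(idle_start=0x100, idle_end=0x110)
replay(analyzer, [(25, 75), (75, 25), (50, 50)])
```

Because the counters live in the simulator rather than in the target software, the measurement does not change the software's timing, and `analyzer.worst_frame()` identifies the frame to examine more closely.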
Using Host Resources
Running the complete system as a virtual simulation on a PC also offers the capability to use host resources for debugging. The target software can use print statements to output messages, and file I/O and pipes to communicate with other applications, directly from within the embedded target software. These features are easy to use; the drawback is that they are intrusive, have an impact on the software execution time and need to be removed from the final production code.

Other host resources could be used as well. For example, the host USB port within the VSP can be used to emulate the connection of the wireless device to serial devices.

The above features have been used in production by a semiconductor vendor to develop and debug a 3G protocol stack shipped with the silicon. The VSP was hooked up via Ethernet to a 3G tester running on a separate PC. With this setup it was possible to initiate a call and debug issues.

Figure 2 illustrates this VSP setup for 3G modem protocol stack development. The software-based 3G tester had an interface that provided the application processor directly with the 3G frames, allowing the baseband processing to be abstracted away in order to further speed up the simulation.

Jeff Roane is vice president of marketing for VaST Systems Technology Corporation, 1250 Oakmead Pkwy., Sunnyvale, CA, 408-328-3300,