The output of video encoders originally designed for high-bandwidth wired connections can present challenges for a wireless network.

By Joe Hanson, Stretch Inc.

Advances in digital technology and cost reductions of digital video equipment are paving the way for the adoption of video in a host of applications from surveillance to video telephony. In many cases, however, the digital transmission must traverse networks with constrained bandwidths such as wireless links. In these cases, to achieve high quality of service the installer must first understand video transmission requirements and design a system to accommodate them. Such accommodations might include the selection of an appropriate video compression scheme and provision of buffering within the system in order to absorb burst data requirements. This article will detail the characteristics of different video CODECs and explore how they can be applied within the specific requirements of a surveillance system.

Figure 1. GOP Structure of I and P Frames.

Video Compression and Surveillance Systems

Video consists of a series of still images displayed one after another, rapidly enough that the human eye perceives a single moving image. The fundamental challenge of streaming digital video over a wireless network, or any bandwidth-limited network, is the huge volume of data represented in the stream. A single frame of standard-definition video, 720 × 480 pixels, contains over 345,000 pixels, and the stream carries 30 frames per second (fps). Each pixel requires red, green, and blue values to describe its color, so, assuming 8 bits per color value, serializing the uncompressed data for transport over an Ethernet network requires over 200 megabits per second of bandwidth. Video from the newer generations of high-resolution sensors can easily exceed the capacity of Gigabit Ethernet. Clearly, some form of compression needs to be applied to the video before transport to make it more manageable. This is the role of a video coder/decoder (CODEC).
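The arithmetic behind these figures can be checked with a minimal sketch. It assumes 8 bits per color value (24 bits per pixel), which the article does not state explicitly, and uses a 1920 × 1080 sensor purely as a hypothetical example of a higher-resolution source.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel=24):
    """Return the uncompressed bit rate in bits per second.

    bits_per_pixel=24 assumes 8 bits each for red, green, and blue
    (an assumption; the article gives no bit depth).
    """
    return width * height * bits_per_pixel * fps


# Standard definition, as described in the text.
sd_pixels = 720 * 480
sd_rate = raw_bit_rate(720, 480, 30)
print(f"SD pixels per frame: {sd_pixels:,}")
print(f"SD raw bit rate:     {sd_rate / 1e6:.1f} Mb/s")

# Hypothetical higher-resolution sensor (1920 x 1080 is an illustrative choice).
hd_rate = raw_bit_rate(1920, 1080, 30)
print(f"HD raw bit rate:     {hd_rate / 1e6:.1f} Mb/s")
print("Exceeds Gigabit Ethernet:", hd_rate > 1_000_000_000)
```

Under these assumptions the standard-definition stream comes to roughly 249 Mb/s, consistent with the "over 200 megabits per second" figure, and the hypothetical 1080-line stream exceeds 1 Gb/s before any protocol overhead is counted.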

A number of video compression standards are deployed today and new standards are in development. It is important to understand that a compression standard defines how to decode a compressed stream, not how to perform the encoding. This means that two implementations of the same standard will not necessarily achieve the same compression ratio, provide the same image quality, or constrain the bit rate within the same limits. Not all CODECs are created equal. Further, the requirements placed on CODECs for surveillance applications differ from those of the familiar world of broadcast television and consumer camcorders. In broadcast encoders and in streaming media applications, the receiver can begin by buffering multiple seconds of data to mask any network traffic delays. In surveillance applications, security personnel require real-time detection and notification of events such as perimeter intrusion, and might need to direct a pan/tilt camera to follow an intruder. Any significant delay, buffering, or loss of network traffic can render the system useless.

Intra-Frame Compression Schemes

In its most basic form, video compression treats each frame as a still image and compresses it much as a digital still camera would. These compressed frames are called intra-frames or I-frames. The advantage of these types of CODECs is that their algorithms are relatively easy to implement and their bit rate fluctuates only narrowly from frame to frame. Their disadvantage is that they exhibit low compression ratios. For standard-definition video, the encoder delivers acceptable video quality at a bit rate on the order of 30 Mb/s. This technique is at the heart of M-JPEG and MPEG-2 (I-frame only) compression. For surveillance, M-JPEG is used in low frame-rate or low-resolution applications that can tolerate the low compression ratio.

Inter-Frame Compression Schemes

Inter-frame compression schemes exploit the fact that within a video stream very few changes occur between adjacent images. After the first key frame of video (an I-frame) has been compressed, each subsequent frame is compared to its predecessor and only the changes are encoded. Frames encoded with reference to previous frames using difference information are called Predicted frames or P-frames. The use of P-frames results in a dramatic improvement in compression efficiency at the expense of a more complex encoding algorithm.

I and P frames are sent in a structured video stream known as the Group of Pictures, or GOP. Figure 1 shows the relationship of I and P frames in a GOP structure.

For inter-frame encoders, bit rates for acceptable quality video can drop to 1.5 to 4 Mb/s from the 30 Mb/s required for M-JPEG. For inter-frame compression standards, the output bit rate is normally expressed as an average over time, not as the instantaneous bit rate. An I-frame might require five times as many bits to encode as a P-frame. At 30 fps, a GOP structure of one I-frame every 0.5 second contains 15 frames (one I-frame and 14 P-frames), so the I-frame uses about 26% of the available bit stream (5 of 19 P-frame equivalents) and produces a burst in network bandwidth demand twice per second. Figure 2 shows the burst nature of the bandwidth usage.
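The 26% figure can be reproduced with a short sketch. It assumes a 30 fps stream, a fixed GOP, and P-frames of equal size, with the I-frame costing five times as much as a P-frame; the function name and its parameters are illustrative and not taken from any particular encoder.

```python
def gop_stats(fps, i_frame_interval_s, i_to_p_ratio):
    """Return (frames per GOP, I-frame share of bits, peak-to-average ratio).

    Sizes are measured in "P-frame equivalents": each P-frame costs 1 unit
    and the I-frame costs i_to_p_ratio units (a simplifying assumption).
    """
    frames_per_gop = round(fps * i_frame_interval_s)
    p_frames = frames_per_gop - 1
    total_units = i_to_p_ratio + p_frames
    i_share = i_to_p_ratio / total_units
    avg_units_per_frame = total_units / frames_per_gop
    peak_to_average = i_to_p_ratio / avg_units_per_frame
    return frames_per_gop, i_share, peak_to_average


frames, share, peak = gop_stats(fps=30, i_frame_interval_s=0.5, i_to_p_ratio=5)
print(f"Frames per GOP:        {frames}")
print(f"I-frame share of bits: {share:.1%}")
print(f"Peak-to-average ratio: {peak:.2f}")
```

With these inputs the I-frame accounts for 5 of 19 units, or about 26% of the bits, and the frame slot carrying the I-frame needs nearly four times the average per-frame bandwidth. That gap between instantaneous and average demand is what the network must absorb.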

In severely bandwidth-limited systems, this burst in data might result in packet loss and corruption of the decoded I-frame (the frame that would be used as the basis for the subsequent decode of the entire GOP structure). Clearly, when designing a transmission system, the frequency of I-frames must be carefully considered. In some cases, the behavior of the encoder might also aggravate the situation. When performing motion search, the encoder will analyze the degree of motion in the scene (the degree of similarity with previous images) and might decide that another I-frame is needed. This behavior of creating variable GOP structures can result in highly variable bandwidth requirements and, when used over bandwidth-constrained networks, non-deterministic video quality at the decoder.

Figure 2. Instantaneous vs. Average Bandwidth of Compressed Stream.

Solving these types of issues requires a holistic approach. Adding buffering in the network can reduce the instantaneous bandwidth requirement, but at the cost of added system latency. Changing CODEC parameters to fix the I-frame frequency, or using a higher-efficiency CODEC, can also help, but at the expense of video quality or processing requirements. Depending upon the market targeted by the CODEC designer, these controls might not even be exposed to the system integrator.

Sizing the bandwidth requirements for a wireless network of security cameras requires consideration of the image resolution, the frame rate, the video quality, and the video CODEC. For an M-JPEG encoder at any given resolution, the number of bits per frame will be relatively stable for a given video quality level, so the required capacity can be determined from the bits per frame. For inter-predictive encoders, the answer is not so simple. The designer needs to consider the GOP structure (i.e., the frequency of I-frames) to determine the probability of concurrent I-frames from multiple cameras saturating the bandwidth. As the number of installed cameras increases, the probability of two devices simultaneously requiring bandwidth for the transmission of an I-frame goes up. This increases the potential for delay or loss of video data, resulting in non-deterministic video quality. For video transmission, this is clearly unacceptable. Fortunately, advances in video CODEC technology could ease the problem considerably by exhibiting less variability in bandwidth requirements between inter- and intra-predicted frames and by degrading video quality more gracefully in the presence of packet loss or corruption.
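One way to get a feel for how quickly the risk of concurrent I-frames grows is a simple probability sketch. It rests on assumptions the article does not make: every camera uses the same fixed GOP length, the cameras are unsynchronized so each I-frame falls in a uniformly random and independent frame slot within the GOP, and an I-frame occupies exactly one slot. Under those assumptions the calculation reduces to the classic birthday problem.

```python
def prob_i_frame_overlap(num_cameras, frames_per_gop):
    """Probability that at least two cameras send an I-frame in the same slot.

    Assumes identical fixed GOP lengths and independent, uniformly random
    I-frame phases (an idealized model, not a measured result).
    """
    if num_cameras > frames_per_gop:
        return 1.0  # more cameras than slots: an overlap is certain
    p_no_overlap = 1.0
    for k in range(num_cameras):
        p_no_overlap *= (frames_per_gop - k) / frames_per_gop
    return 1.0 - p_no_overlap


# 15 frames per GOP corresponds to one I-frame every 0.5 s at 30 fps.
for cameras in (2, 4, 8, 12):
    p = prob_i_frame_overlap(cameras, frames_per_gop=15)
    print(f"{cameras:2d} cameras: overlap probability {p:.3f}")
```

Even in this idealized model the overlap probability climbs steeply with camera count, which suggests why a link sized for the average bit rate alone may prove inadequate. Encoders that insert extra I-frames on scene changes would make real behavior less predictable than this sketch implies.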

The Future of Video CODECs for Surveillance

The next generation of scalable video CODECs offers a set of characteristics that promise to ease the burden on surveillance system designers and operators. The Scalable Extension to H.264, known as H.264/SVC, gives the CODEC an output that is scalable temporally, spatially, and in terms of image quality. H.264/SVC makes more efficient use of network bandwidth and its output is more easily predicted. The layered approach taken by H.264/SVC also makes it more resilient to packet loss and results in a higher quality video stream.

As designers plan for future generations of video surveillance systems, they face a difficult set of challenges. Advances made in video compression technologies will continue to find a home in surveillance. These new generations must also retain backward compatibility with existing installations. Flexible software-defined architectures, such as those offered by Stretch, will be key to rolling out these new technologies. The current generation of intelligent CODECs has the ability to adapt its bandwidth requirements as a function of both the video stream and the network characteristics. Predictable bandwidth requirements result in more stable systems and better overall video quality. The performance of software-configurable architectures will allow future-generation CODECs to shape their bandwidth consumption to provide error-resilient streams that scale both spatially and temporally. The result of these advances will be a quantum leap in the feature sets and the performance available to system integrators. By using built-in intelligence, cameras will be able to automatically adapt their performance to optimize bandwidth requirements, quality, and system stability, easing the burden currently placed on system integrators.

About the Author

Joe Hanson is director of technical marketing for Stretch, Inc., 1322 Orleans Dr., Sunnyvale, CA 94089; (408) 543-2700.