The USC Andrew and Erna Viterbi School of Engineering
USC Signal and Image Processing Institute
USC Ming Hsieh Department of Electrical and Computer Engineering
University of Southern California

Technical Report USC-SIPI-354

“System Level Fault Tolerance for Motion Estimation”

by Hyukjune Chung and Antonio Ortega

June 2002

Widespread deployment of multimedia applications is continuing to create a need for low cost chip designs that can be incorporated into all sorts of consumer devices, such as portable communicators or home appliances. This surge in the use of digital multimedia information has been enabled to a large extent by the emergence of standard algorithms for compression of speech, audio, images and video.

In this proposal we will focus on digital video systems, as they are the most challenging in terms of both computation and memory requirements, but many of the results of our work will be easily applicable to other types of media (including emerging ones such as stereo and multiview video), as well as to future compression standards.

A common characteristic of all compression standards is that they rely on lossy compression, that is, the decoded video is not an exact copy of the original. A key to the design of a successful video compression system is therefore to apply compression techniques in such a way that the error introduced (the distortion in the output video) at a given rate (the number of bits used to represent the sequence) is perceived to be minimal by the end user. Knowledge of which types of compression distortion are more clearly perceived by the end user obviously plays an important role in the successful design of these compression algorithms, and will also be an important component of our research work.
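
As a point of reference, the distortion discussed here is commonly quantified with an objective measure such as mean squared error or peak signal-to-noise ratio (PSNR), even though the ultimate criterion is perceptual quality. The small sketch below is our own illustration of that standard computation and is not part of the report.

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original and a decoded frame;
    higher values mean less (objective) distortion."""
    mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```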

We propose that, since multimedia compression systems by definition do not provide an exact representation of the data, circuits can be allowed to produce incorrect computation results and yet still yield a fully usable decoded sequence. However, the error tolerance will depend on the particular algorithmic building block under consideration. Thus, our main task and deliverable in this proposal will be to produce a complete characterization of the error resilience of a standard image/video coding system, such as one based on the JPEG/MPEG-2 standards or the soon-to-be-finalized JPEG 2000 standard for image compression.

We again emphasize that while we will consider JPEG, MPEG-2 and JPEG 2000 as examples, their basic building blocks form the basis for essentially all practical image/video coding algorithms. Moreover, similar approaches are used in popular audio and speech coding schemes. Thus our conclusions should carry over to those systems as well.

We also note the similarity of the problem we consider to that of analyzing the robustness of coded video to bit errors, e.g., when transmitting over noisy links. However, the characteristics of the errors introduced by faulty circuits are very different from those of standard random bit errors, and therefore novel analysis techniques will have to be developed.

We propose to take a standard MPEG-2 encoder as our starting point and consider one by one each of the building blocks in such a system, such as motion estimation/compensation, the Discrete Cosine Transform (DCT), quantization, and entropy coding, along with an analysis of the behavior of the various memory buffers involved.

Each of these components has completely different error robustness properties, and our goal is to achieve a complete and detailed characterization of each of them and of their interactions. Unlike the current paradigm (where circuits either function according to spec or not), we seek to determine the degree to which they do work, and in particular to do so by examining the final perceptual image/video quality. In addition to our desired error resilience taxonomy, we will also address such issues as error masking (using programmable features to reduce the effect of errors) and error detection (defining specific image tests to determine the presence of certain sources of error).

To make things more concrete, consider motion estimation and inverse DCT computation. A motion estimation algorithm determines the best match for a block (typically 16 by 16 pixels) in the current video frame by searching a predefined region in a previously transmitted frame. This search is performed by computing a matching metric for all candidate blocks in the search region and then choosing the block with the lowest metric value. Assume that the circuitry that computes the metric is flawed. In that case, the matching metric can no longer be expected to be exact. However, if the error is confined to only one or two pixels per block, the result, i.e., the best match vector, may in fact turn out to be exactly the same as if the matching metric had been computed correctly. Moreover, even if the errors are more serious and affect several pixels, the resulting coded stream will still be correct and decodable. The loss in this case will not be catastrophic. Rather, we expect that as the number of erroneous pixels increases the quality of the encoded sequence will suffer, but will do so in a gradual way. This is exactly the kind of behavior that makes it possible to dramatically increase the yield, with the possibility of selecting those chips with the worst behavior for less critical applications.
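
To illustrate the search just described, the following small sketch (our own illustration, not taken from the report) performs exhaustive block matching using the sum of absolute differences (SAD) as the matching metric and optionally corrupts a couple of per-pixel terms to emulate a flawed metric circuit. The 16x16 block size follows the text; the SAD metric, the ±8 pixel search window, and the fault model are illustrative assumptions.

```python
import numpy as np

BLOCK = 16  # block size from the text; the other parameters below are assumptions

def sad(block_a, block_b, faulty_pixels=0):
    """Sum of absolute differences; optionally corrupt a few per-pixel terms
    to emulate a hypothetical fault in the metric circuitry."""
    diff = np.abs(block_a.astype(int) - block_b.astype(int))
    if faulty_pixels:
        flat = diff.ravel().copy()
        idx = np.random.choice(flat.size, faulty_pixels, replace=False)
        flat[idx] = np.random.randint(0, 256, faulty_pixels)  # erroneous terms
        diff = flat.reshape(diff.shape)
    return int(diff.sum())

def best_match(cur, ref, bx, by, search=8, faulty_pixels=0):
    """Exhaustive search over the window; returns the motion vector (dx, dy)
    with the lowest (possibly corrupted) matching metric."""
    cur_block = cur[by:by + BLOCK, bx:bx + BLOCK]
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref.shape[0] - BLOCK and 0 <= x <= ref.shape[1] - BLOCK:
                cost = sad(cur_block, ref[y:y + BLOCK, x:x + BLOCK], faulty_pixels)
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv

# Usage: compare the exact and "faulty" searches on synthetic frames.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (2, 3), axis=(0, 1))  # shift ref down 2 and right 3, so the
                                         # matching ref block is at (dx, dy) = (-3, -2)
print(best_match(cur, ref, 16, 16))                   # exact metric
print(best_match(cur, ref, 16, 16, faulty_pixels=2))  # 2 corrupted terms per candidate
```

With only a couple of corrupted terms per candidate block, the lowest-cost match is typically unchanged, which is precisely the graceful degradation described above.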

As a second example, the IDCT computation itself can be fairly error resilient. For example, a frequency coefficient may be read incorrectly, so that the resulting reconstructed block of pixels is not an exact match to the one generated by the encoder. Here again, depending on the frequency and the type of fault, the resulting erroneous block may be completely indistinguishable to the end user. However, the IDCT also points to another characteristic of video coders that will have a definite impact on their fault tolerance, namely, the fact that predictive coding (using motion compensation) is used. In a predictive coder the IDCT is used to store a copy of a frame that is then used at both encoder and decoder to generate a prediction. If the encoder and decoder are not in sync (for example, because the decoder suffers from a fault in its IDCT operation), then the error will tend to accumulate and may soon become visible, although it will disappear once the frame is fully refreshed (a complete intra-coded frame is transmitted). The success of our approach therefore requires a complete characterization of both the impact of a fault in the current frame and its effect on successive frames.
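
The drift effect described above can be illustrated at a purely behavioral level. In the sketch below (our illustration, not from the report), a hypothetical per-frame reconstruction error stands in for a faulty decoder-side IDCT; the error accumulates across predicted frames and is flushed by each intra refresh. The fault magnitude and the refresh period are illustrative assumptions, and quantization is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
num_frames = 45
intra_period = 15        # assumed intra refresh interval
fault_std = 1.0          # assumed per-pixel error from the faulty IDCT

enc_ref = np.zeros((16, 16))   # encoder's reconstructed reference frame
dec_ref = np.zeros((16, 16))   # decoder's reconstructed reference frame

for n in range(num_frames):
    original = rng.normal(128.0, 20.0, (16, 16))
    fault = rng.normal(0.0, fault_std, (16, 16))
    if n % intra_period == 0:
        # Intra frame: both sides rebuild the frame from the bitstream, so the
        # decoder resynchronizes (up to the fault introduced in this one frame).
        enc_ref = original.copy()
        dec_ref = original + fault
    else:
        # Predicted frame: the encoder codes the residual against its own
        # reference; the decoder adds it to its drifted reference plus the new
        # IDCT fault, so the mismatch accumulates between refreshes.
        residual = original - enc_ref
        enc_ref = enc_ref + residual
        dec_ref = dec_ref + residual + fault
    drift = np.sqrt(np.mean((enc_ref - dec_ref) ** 2))
    print(f"frame {n:2d}  encoder/decoder mismatch (RMSE) = {drift:5.2f}")
```

The printed mismatch grows steadily over the predicted frames and drops back at each intra frame, mirroring the accumulate-then-refresh behavior described in the text.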

To download the report in PDF format click here: USC-SIPI-354.pdf (6.5Mb)