The USC Andrew and Erna Viterbi School of Engineering USC Signal and Image Processing Institute USC Ming Hsieh Department of Electrical and Computer Engineering University of Southern California

Technical Report USC-SIPI-438

“Compression of Signal on Graphs with the Application to Image and Video Coding”

by Yung-Hsuan Chao

December 2017

In this Ph.D. dissertation, we discuss several graph-based algorithms for transform coding in image and video compression applications. Graphs are generic data structures that are useful in representing signals in various applications. Different from the classic transforms such as Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), graphs can represent signals on irregular and high dimensional domains, e.g. social networks, sensor networks. For regular signals such as images and videos, graphs can adapt to local characteristics such as edges and therefore provide more flexibility than conventional transforms. A frequency interpretation for signal on graphs can be derived using the Graph Fourier Transform (GFT). By properly adjusting the graph structure, e.g. connectivity and weights, based on signal characteristics, the GFT can provide compact representations even for signals with discontinuities. However, the GFT has high implementation complexity, making it less applicable in signals of large size, e.g. video sequences. In our work, we develop a transform coding scheme based on a low complexity lifting transform on graph. More specifically, we focus on two important problems in the design of a lifting transform, namely, the design of bipartition and the bipartite graph approximation. The two parts are optimized in terms of energy compaction for Gaussian Markov Random Field (GMRF), which has been widely utilized in modeling the statistics of image data. As application, we consider two types of multimedia signals, including both regular and irregularly distributed signals. Among the first type of signal, we consider the compression of intra-predicted video residuals, which is regular with pixels residing on the 2D grid. However, these signals contain significant edge structures, which cannot be efficiently represented with existing transform coding standards. With the proposed graph lifting transform based on local edges, we demonstrate significant gains as compared to the state of the art DCT based coding, with comparable performance to that achieved by the high complexity GFT. We also discuss different types of edge models for video residuals and propose a new model for ramp edges, which shows promising results in GFT, as compared to the conventional step edge model. As a second type of signal, we propose a coding scheme for non-demosaicked light field images. Similar to the traditional digital camera, a light field camera captures color information using a photo sensor embedded with a color filter array (CFA). On the captured image, each pixel contains one single color component (out of R,G, and B) which are distributed based on Bayer pattern. However, through the conversion to an array of sub-aperture images, which is a representation commonly used for light field processing and display, the distribution of Bayer pattern no longer holds and pixels of each color component are distributed irregularly in space. In order to compress such data, a conventional scheme using DCT requires demosaicking during conversion, which highly increases the amount of data for coding. With a graph based approach, the original signal can be efficiently encoded without any pre-processing step, avoiding the redundancies introduced by demosaicking. We also discuss an intra-prediction algorithm and optimal graph construction for irregularly spaced pixels. The results using the proposed scheme with graph based lifting transform show huge gains in compression as compared to DCT based coding in high bit rates, which are critical for archival scenario and instant camera storage.

To download the report in PDF format click here: USC-SIPI-438.pdf (6.5Mb)