The USC Andrew and Erna Viterbi School of Engineering, USC Signal and Image Processing Institute, USC Ming Hsieh Department of Electrical Engineering, University of Southern California

Technical Report USC-SIPI-429


by Sudeng Hu

May 2016

Objective quality assessment for compressed images and videos is critical to the image and video compression systems that are essential to delivery and storage. Although the Mean Squared Error (MSE) is computationally simple, it may not accurately reflect the perceptual quality of compressed signals, which is also strongly affected by characteristics of the Human Visual System (HVS) such as the masking effect. In this thesis, first, video quality metrics are developed based on machine learning approaches. Because the relationship among the large number of factors involved is complicated, machine learning is used to build a model over various features, including distortion features and video content features. Second, an image quality metric (IQM) and a video quality metric (VQM) are proposed based on perceptually weighted distortion in terms of the MSE. To capture the characteristics of the HVS for images, a spatial randomness map is proposed to measure the masking effect, and a preprocessing scheme is proposed to simulate the processing that occurs in the initial part of the HVS. For the VQM, a dynamic linear system is employed to model the video signal and to capture the temporal randomness of videos. Visual attention is included in the proposed VQM as well, since only a limited portion of the details is perceived with high sensitivity while the other parts are significantly blurred in the HVS. The performance of the proposed IQM and VQM is validated on various image and video databases with various compression distortions. The experimental results show that the proposed IQM and VQM outperform other benchmark quality metrics. In addition to quality assessment, video compression is also important in video delivery and storage systems, especially as new kinds of video content, such as screen content and 3-D videos, have emerged in industry. These video formats have very different characteristics from traditional videos.
In this thesis, first, we propose a coding method that can efficiently code content with sharp edges. Such a method is highly valuable for screen content coding and for depth map coding of 3-D video. Second, a rate-distortion (RD) optimized bit allocation scheme is proposed for 3-D videos. In 3-D videos, there are multiple views, and each view contains two types of video, i.e., a texture map and a depth map. The proposed bit allocation method can properly allocate bits among different views as well as between the two types of maps. The experimental results verify that the proposed bit allocation outperforms the benchmark algorithms in terms of RD efficiency.
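
As a rough illustration of what "perceptually weighted distortion in terms of the MSE" can mean, the sketch below computes a per-pixel weighted MSE and compares it with the plain MSE. It is a minimal example under stated assumptions, not the metric proposed in the report: the function names `weighted_mse` and `psnr`, the toy pixel values, and the weight values are all hypothetical, and the weights merely stand in for a masking map such as the spatial randomness map mentioned above.

```python
import math


def weighted_mse(ref, dist, weights):
    """Weighted mean squared error over flat lists of pixel values.

    Each squared error is scaled by a per-pixel weight, and the sum is
    normalized by the total weight. With uniform weights this reduces
    to the ordinary MSE.
    """
    if not (len(ref) == len(dist) == len(weights)):
        raise ValueError("ref, dist and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * (r - d) ** 2 for r, d, w in zip(ref, dist, weights))
    return weighted_sum / total_weight


def psnr(mse, peak=255.0):
    """Convert an MSE value to PSNR in dB for a given peak signal value."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 4-pixel "image": errors occur at the first and last pixels.
ref = [100, 100, 100, 100]
dist = [104, 100, 100, 96]

# Uniform weights give the ordinary MSE.
uniform = [1.0, 1.0, 1.0, 1.0]

# Hypothetical masking weights: the first pixel is assumed to lie in a
# highly random (textured) region where distortion is less visible,
# so its error is down-weighted.
masked = [0.25, 1.0, 1.0, 1.0]

plain = weighted_mse(ref, dist, uniform)
perceptual = weighted_mse(ref, dist, masked)

print(f"plain MSE:    {plain:.4f}  (PSNR {psnr(plain):.2f} dB)")
print(f"weighted MSE: {perceptual:.4f}  (PSNR {psnr(perceptual):.2f} dB)")
```

In this toy case the weighted score is lower than the plain MSE because one of the two errors falls in the region assumed to mask distortion. How the weights are actually derived from image or video content is the substance of the report and is not reproduced here.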

To download the report in PDF format click here: USC-SIPI-429.pdf (9.2Mb)