
Digital Representation of Video Information

From the earliest systems for storing and transmitting digital visual information, a number of constraints and assumptions have been identified. First, the scene is restricted to a finite two-dimensional image size, sampled at a specified frame rate. This introduces the concepts of temporal and spatial resolution. The temporal resolution is chosen to be high enough to reproduce scene movement smoothly for the human visual system, usually in the range of 24-30 frames per second (fps). The spatial resolution determines the physical size and total number of the picture elements, or pels, that compose a still frame.

Second, there is a finite colour depth: a fixed number of bits is allocated to each pel, according to the desired display quality. Technically speaking, this determines the precision with which the sampled light intensities of a visual scene are quantised. Colours are represented in various ways, using sets of component signals which can be combined to synthesise any colour. One of the most frequently used representations, inspired by the way the human visual system senses light frequencies, is based on the red, green and blue (RGB) primary colours. 8-bit unsigned numbers are generally considered sufficient for the representation of each of the primary colours, as the average human viewer can hardly discriminate finer gradations.
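The three parameters introduced so far (spatial resolution, frame rate and colour depth) together fix the data rate of uncompressed video. The following is a minimal sketch of that arithmetic; the function name `raw_bit_rate` and the example format (352x288 pels, 25 fps, three 8-bit components per pel) are assumptions chosen for illustration and do not come from the text.

```python
def raw_bit_rate(width, height, fps, bits_per_pel):
    """Return the uncompressed bit rate in bits per second."""
    pels_per_frame = width * height           # spatial resolution
    bits_per_frame = pels_per_frame * bits_per_pel  # colour depth
    return bits_per_frame * fps               # temporal resolution


# Assumed example format: 352x288 pels, 25 fps, 3 components x 8 bits.
rate = raw_bit_rate(352, 288, 25, 3 * 8)
print(f"{rate / 1e6:.1f} Mbit/s")
```

Under these assumptions the raw rate is roughly 60.8 Mbit/s, which gives a sense of why the redundancy of the source has to be reduced before storage or transmission.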

Other colour coordinate systems exist, usually obtained by a linear transform of the RGB system. One of the most practical separates the image brightness information, or luminance, from the colour information, or chrominance. This is accomplished by taking a weighted sum of the RGB signals as the luminance signal, Y, and using colour differences for the chrominance components, C_b and C_r, as shown by the transform pair in (2.1). This colour space is specified in the CCIR 601 recommendation for digital video, which yields an effective range of 16-235 for Y and 16-240 for the chroma signals C_b and C_r.

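$\begin{gather} \begin{bmatrix}Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix}0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix} \cdot \begin{bmatrix}R \\ G \\ B \end{bmatrix} + \begin{bmatrix}16 \\ 128 \\ 128 \end{bmatrix} \tag{2.1} \\ \begin{bmatrix}R \\ G \\ B \end{bmatrix} = \begin{bmatrix}1.164 & 0 & 1.596 \\ 1.164 & -0.392 & -0.813 \\ 1.164 & 2.017 & 0 \end{bmatrix} \cdot \begin{bmatrix}Y-16 \\ C_b-128 \\ C_r-128 \end{bmatrix} \notag \end{gather}$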
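A transform pair of this kind can be sketched in a few lines. The coefficients below are the commonly quoted CCIR 601 values for 8-bit RGB input in the range 0-255, rounded to three decimals; they are an assumption here and should be checked against the recommendation itself. The function names and the `clamp` helper are likewise hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB (0-255) to Y (16-235) and C_b, C_r (16-240)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return (clamp(round(y), 16, 235),
            clamp(round(cb), 16, 240),
            clamp(round(cr), 16, 240))


def ycbcr_to_rgb(y, cb, cr):
    """Approximate inverse: remove the offsets, then apply the inverse matrix."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = 1.164 * (y - 16) + 2.017 * (cb - 128)
    return tuple(clamp(round(c), 0, 255) for c in (r, g, b))


print(rgb_to_ycbcr(255, 255, 255))  # white: maximum luminance, neutral chroma
print(ycbcr_to_rgb(*rgb_to_ycbcr(255, 0, 0)))  # round trip of pure red
```

Because the coefficients are rounded and the results are quantised to integers, a round trip is only approximate; in this sketch a channel may come back off by one level.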

The partitioning into luminance and chrominance has the advantage that the human eye is more sensitive to the former and can tolerate a lower spatial resolution of the latter without noticeable effect. Therefore, for digital video coding, each of the chrominance signals is usually subsampled with a ratio of 1:2 (the 4:2:2 format) or 1:4 (the 4:2:0 format) to the luminance sampling rate, as shown in Figure 2.1.

**Figure 2.1:** Sampling patterns for (a) 4:2:0 and (b) 4:2:2 formats
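The effect of chrominance subsampling on a single chroma plane can be illustrated as follows. This is a sketch under stated assumptions: the plane is a list of rows with even dimensions, and each retained sample is formed by averaging its neighbours, which is only one possible choice; the actual sample positions and any prefiltering are defined by the format in use. The function names are hypothetical.

```python
def subsample_422(plane):
    """Halve the horizontal chroma resolution (1:2 of the luminance rate)."""
    return [
        [(row[c] + row[c + 1] + 1) // 2 for c in range(0, len(row) - 1, 2)]
        for row in plane
    ]


def subsample_420(plane):
    """Halve the chroma resolution in both directions (1:4 of the luminance rate)."""
    out = []
    for r in range(0, len(plane) - 1, 2):
        out_row = []
        for c in range(0, len(plane[r]) - 1, 2):
            total = (plane[r][c] + plane[r][c + 1]
                     + plane[r + 1][c] + plane[r + 1][c + 1])
            out_row.append((total + 2) // 4)  # rounded mean of the 2x2 block
        out.append(out_row)
    return out


# Assumed 4x4 C_b plane for illustration.
cb = [
    [100, 102, 110, 112],
    [104, 106, 114, 116],
    [120, 120, 130, 130],
    [120, 120, 130, 130],
]
print(subsample_422(cb))  # 4 rows of 2 samples
print(subsample_420(cb))  # 2 rows of 2 samples
```

With 8-bit samples, a full-resolution frame needs 24 bits per pel on average, whereas 4:2:2 needs 16 and 4:2:0 needs 12, since the two chroma planes shrink to one half or one quarter of the luminance plane.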


Isaac Kokkinidis
1998-08-27
