Multiview Video Compression With 1D Transforms


02 Nov 2017


Video compression is a method used to reduce the size of video data while keeping as much of the
original quality as possible. Video compression techniques exploit the redundancies in the video
signal: a local region in the current frame is predicted from previously coded parts of the current
frame or from previously coded frames. The difference between the predicted and original values is
called the prediction residual. In many cases the prediction is not accurate enough, so the
prediction residual is also coded. The prediction residual itself has some spatial redundancy, and
to exploit it the residual is transformed. The main focus of this thesis is the transformation of
prediction residuals.

Typically, the 2-D Discrete Cosine Transform (DCT) is used both in image compression and in the
compression of prediction residuals. The 2-D DCT is very effective at compressing images. However,
images and prediction residuals have different spatial characteristics, and more efficient
transforms can be used for the compression of prediction residuals.

One of the redundancies in the video signal is temporal redundancy, which results from the
similarities between adjacent frames. To remove this redundancy, the current frame is predicted
from previously encoded frames. This method is called motion compensated prediction. The spatial
characteristics of the motion compensation residual have been investigated previously. Based on
those analysis results, 1-D directional transforms have been designed for compressing the motion
compensation residual, and it has been demonstrated that using them increases overall compression
efficiency.

1.1 Motivations for Thesis

1.2 Overview of Thesis


CHAPTER 2

PREVIOUS RESEARCH

This chapter provides a brief background on video compression and discusses previous research
related to this thesis. Section 2.1 presents the basics of video compression and the H.264/MPEG-4
AVC video compression standard with its MVC extension. Section 2.2 presents analysis results for
the motion compensation (MC) residual and introduces the 1D transforms used for compressing it.

2.1 Video Compression

2.1.1 Overview Of Video Compression

In today’s world, there is a growing need for transmission and storage of video. Video is
compressed before transmission and storage because raw video contains an excessive amount of data.
For example, 24 fps progressive raw video at 1280x720 resolution with 24-bit color depth requires a
transmission speed of 1.59 Gbps, and 5 minutes of this video requires 59.72 GB of storage [10].
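The arithmetic behind such raw-video figures can be sketched as follows. The function names are hypothetical. The quoted 1.59 Gbps and 59.72 GB are reproduced when 24 bits are assumed for each of three color components (72 bits per pixel); this is an assumption about how the figures were derived, and with 24 bits per pixel in total the same formula gives about a third of those values.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel):
    """Bits per second needed to send uncompressed video."""
    return width * height * fps * bits_per_pixel


def storage_bytes(bitrate_bps, seconds):
    """Bytes needed to store `seconds` of video at the given bitrate."""
    return bitrate_bps * seconds / 8


# Assumption: 24 bits for each of three color components (72 bits per pixel).
rate_72 = raw_bitrate_bps(1280, 720, 24, 72)
# Alternative assumption: 24 bits per pixel in total (8 bits per component).
rate_24 = raw_bitrate_bps(1280, 720, 24, 24)

for rate in (rate_72, rate_24):
    gbps = rate / 1e9
    gigabytes = storage_bytes(rate, 5 * 60) / 1e9
    print(f"{gbps:.2f} Gbps, {gigabytes:.2f} GB for 5 minutes")
```

Under the first assumption the script reports 1.59 Gbps and 59.72 GB; under the second it reports 0.53 Gbps and 19.91 GB.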

Video compression can be defined as the representation of video data using fewer bits. Video
compression techniques are based on exploiting redundancies in the video, such as spatial, temporal
and inter-view redundancies.

The correlation between neighboring pixel values within the same frame is the source of spatial redundancy.

This type of redundancy can be reduced using discrete cosine transform or wavelet transforms.

These transforms are also used in image compression to exploit spatial redundancy. Intra prediction

is another way of reducing spatial redundancy and this method is used in recent video compression

standards. In this approach, each block is predicted from previously coded neighboring blocks within

the same frame and the prediction residual is transformed.

Temporal redundancy arises from the similarities between adjacent frames, since in typical
sequences adjacent frames differ only slightly. The correlation between two consecutive frames can
be seen in Figure 2.1. To reduce this type of redundancy, most video coders predict a local region
of the current frame from previously encoded frames; this technique is called motion compensation.
It generally assumes translational motion. Motion compensation produces effective results,
especially in stationary and slowly moving smooth regions. In non-smooth regions such as edges and
textures, however, it can produce large prediction errors.


(a) View 0 - Frame 15 of the exit video sequence (b) View 0 - Frame 14 of the exit video sequence

Figure 2.1: Frames which show the correlation between two adjacent frames.

(a) View 0-Frame 15 of the exit video sequence (b) View 1-Frame 15 of the exit video sequence

Figure 2.2: Frames which show the correlation between neighboring views.

In multiview video coding, a scene is recorded from different perspectives with closely placed
cameras. The images captured by these cameras at the same time are highly correlated. Figure 2.2
shows the correlation between two views. This type of redundancy is called inter-view redundancy,
and it is similar to temporal redundancy. The correlation between views is reduced using disparity
compensated prediction, which is similar to motion compensated prediction. In disparity compensated
prediction, the current frame is predicted from a previously coded frame in a neighboring view,
whereas in motion compensated prediction, the current frame is predicted from a previously coded
frame in the same view. Like motion compensated prediction, disparity compensated prediction works
well in stationary and smoothly moving regions and can produce significant prediction errors in
non-smooth regions.

In this section inter-view, inter-picture and intra-picture redundancies and predictions are explained,

but this thesis is mainly focused on inter-view and inter-frame predictions. In video compression,

besides prediction information, prediction errors also need to be coded. Especially in the regions

where inter-view and inter-picture prediction produce significant amount of errors, coding of residuals

becomes important.

2.1.2 Overview of H264/MPEG-4 AVC

In the implementation step of this thesis, the H.264/AVC codec is used. H.264/AVC is the newest
video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture
Experts Group [9]. This part briefly describes the steps of video compression with H.264/AVC.


In general, a video compression system consists of an encoder and a decoder. The H.264 syntax does
not specify the encoder steps, but in practice the encoding process mirrors the decoding process.
Figure 2.3 shows the encoding and decoding steps [8].

Figure 2.3: The H264/MPEG-4 AVC coding and decoding process.[2]

Before the prediction step, a frame is split into macroblocks of 16x16 pixels. In the prediction
process, each macroblock is predicted from previously coded parts of the current frame (intra
prediction) or from previously coded frames (inter prediction). The difference between the original
macroblock and its prediction is the residual. Prediction does not have to be performed on the full
macroblock: intra prediction can be made on 16x16 or 4x4 blocks, and inter prediction can be made
on 16x16, 16x8, 8x16 or 8x8 blocks, where 8x8 blocks can be further divided down to 4x4 blocks.
Frames coded using only intra prediction are called I frames, while frames coded using both intra
and inter prediction are called P and B frames. The difference between them is that a B frame can
be predicted from two frames, whereas a P frame is predicted from only one.

After the prediction step, residuals are transformed using an 8x8 or 4x4 2D DCT, and the resulting
transform coefficients are quantized. As the last step of the coding process, the quantized
coefficients are encoded using variable length coding or arithmetic coding. The quantized
coefficients are not the only data that must be encoded; all other data that the decoder needs to
reconstruct the block, such as the reference and the block size used for the prediction, are also
encoded.

At the decoder side, the H.264 stream sent by the encoder is decoded. The quantized transform
coefficients are rescaled, and the inverse transform of the rescaled coefficients yields the
residual macroblock. Using intra or inter prediction, the decoder predicts the macroblock and adds
the residual to reconstruct the decoded macroblock [2].

2.1.3 Extending H264/ MPEG-4 AVC For Multiview

Due to the increasing interest in multiview video, the ITU-T Video Coding Experts Group and the
ISO/IEC Moving Picture Experts Group have published multiview video coding (MVC) as an extension
of H.264/MPEG-4 AVC for the compression of multiview video. Because MVC is an extension of
H.264/MPEG-4 AVC, the encoder and decoder processes are not repeated in this section; only the
basic differences between MVC and H.264/MPEG-4 AVC are explained. In this thesis, the term MVC
refers to this extended version of H.264/MPEG-4 AVC.

Figure 2.4: Sample prediction structure for MVC.[3]

The main aim of MVC is to increase multiview compression efficiency by exploiting the redundancy
between views. For this purpose, it enables inter-view prediction. Figure 2.4 shows a sample
prediction structure. The MVC design includes one “base view” that is coded independently of the
other views and can be reconstructed by decoders without multiview support. Accordingly, inter-view
prediction is not used when coding the base view; its frames are encoded using only intra-frame and
inter-frame prediction. The left view in Figure 2.4 is an example of a base view. Inter-view
prediction can be used for coding the other views. In the prediction step, the best reference frame
is selected from a candidate list composed of inter-frame and inter-view references. Consider the
right view in Figure 2.4. The P frame in the right view is predicted from the I frame in the left
view using inter-view prediction. The first B frame in the right view is predicted from the first B
frame in the left view and from the P frame in the right view. In MVC, each block in a B frame of
the right view can be predicted either from a previous frame in the right view or from the adjacent
frame in the left view.

2.2 Coding of MC Residual

2.2.1 Motion Compensation

Motion compensation reduces the bitrate by exploiting the correlation along the temporal dimension
of the video signal. The technique is based on predicting a local region of the current frame from
previously encoded frames: the encoder estimates the motion between previously encoded frames and
the current frame. Although there are many motion compensation algorithms, this section discusses
only the block matching method because it is the one used in H.264/MPEG-4 AVC.

In the block matching algorithm, the frame is divided into blocks, and for each block the best
matching block is searched for in previously encoded frames. The estimated displacement between
the current block and its best match is usually referred to as the motion vector (MV).
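The block matching idea can be illustrated with a minimal full-search sketch. It is not the search used by any particular H.264 encoder: the sum of absolute differences (SAD) as matching cost, the exhaustive search window, and all function names are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Full search over displacements in [-search, search]; returns (dx, dy, cost)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or bx + dx + n > w or by + dy < 0 or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames: the current frame is the reference shifted right by 2 pixels.
ref = [[(3 * x + 7 * y + x * y) % 31 for x in range(16)] for y in range(16)]
cur = [[ref[y][max(x - 2, 0)] for x in range(16)] for y in range(16)]

mv = best_motion_vector(cur, ref, bx=4, by=4, n=4, search=3)
print(mv)
```

For this toy input the search returns (-2, 0, 0): the best match lies two pixels to the left in the reference frame and the residual of the block is zero.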

Since all pixels in a block are predicted using the same motion vector, the block size is critical
in the block matching method. In H.264/AVC, 16x16, 16x8, 8x16 and 8x8 blocks are available, and
8x8 blocks can be further divided into 8x4, 4x8 or 4x4 blocks.

The block matching algorithm can make predictions with integer-pixel or fractional-pixel accuracy.
Searching for the best matching block at the original resolution gives integer-pixel accuracy;
using interpolation while searching gives fractional-pixel accuracy [12]. H.264/MPEG-4 AVC uses
quarter-pixel accurate motion compensation [9]. Typically, fractional-pixel motion compensation
gives better results than integer-pixel motion compensation because the motion between two video
frames is generally not an integer number of pixels. The difference between the two methods can be
seen in Figure 2.5.

Motion compensation can also use more than one previously encoded frame. In this approach, the
predicted block is the weighted average of two blocks, each predicted from its own best matching
block. This method is called multihypothesis motion compensation [13, 14]. In H.264/MPEG-4 AVC,
multihypothesis motion compensation is available for B frames.
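The weighted averaging of two predictions can be sketched in a few lines. The function name, the default equal weights and the sample values are assumptions chosen for illustration; they are not taken from the standard.

```python
def multihypothesis_predict(block_a, block_b, weight_a=0.5):
    """Weighted average of two predicted blocks, each given as a list of rows."""
    weight_b = 1.0 - weight_a
    return [
        [weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(block_a, block_b)
    ]


# Two hypothetical 2x2 predictions obtained from two different reference frames.
pred_a = [[10, 12], [14, 16]]
pred_b = [[14, 12], [10, 20]]

combined = multihypothesis_predict(pred_a, pred_b)
print(combined)
```

With equal weights each predicted pixel is simply the mean of the two hypotheses.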

2.2.2 Empirical Analysis of the MC Residual

In many cases, motion compensated prediction is not accurate enough, and the motion compensated
prediction residual is also coded. To code the MC residual efficiently, its characteristics have to
be investigated. This section presents an empirical analysis of the MC residual based on the study
in [1]. The MC residual has different characteristics in smooth regions, in texture regions, and at
edges or object boundaries. The remainder of this section examines each of these in turn, using the
sample original frame, reference frame, predicted frame and prediction residual frame shown in
Figure 2.6. The original and reference frames are frames 11 and 10 of the Stefan sequence at CIF
(352x288) resolution [11].

Generally, motion estimation algorithms are successful in predicting smooth regions such as smooth
backgrounds. The MC residual in these regions is smaller than in texture regions, at edges and at
object boundaries. This is because, even if the motion between the current and reference frames is
not translational, the high correlation between the pixels of the block to be predicted enables
successful motion estimation. Typically, residuals of smooth regions are not coded. The court floor
in Figure 2.6 is a good example of a smooth region: the prediction residual there is almost zero.

In Figure 2.6d, details of the texture regions of the original frame are easily seen, which means
that prediction is not very successful in these regions. On the other hand, unlike in the original
frame, only the high-contrast components of the texture regions are visible in the residual frame;
the mean value of the texture regions is predicted well.

Like texture regions, edges and object boundaries cannot be predicted well using motion compensated
prediction. Since the motion between video frames is generally not translational while motion
compensated prediction assumes translational motion, high-magnitude MC residuals are present at
edges and object boundaries. These residuals have a 1D characteristic. For example, the outlines of
people and letters turn into 1D structures in the residual.

In summary, the characteristics of images and MC residuals are different. In an image, most local
regions have 2D structures, whereas the MC residual shows three different characteristics. In many
local regions the pixel values of the residual are almost zero. Residuals originating from object
boundaries and edges form a significant portion of the non-zero prediction residual, and they have
1D characteristics. Residuals of detailed (texture) regions are similar to the image except for the
mean.


(a) Integer-pixel accuracy motion compensation residual.

(b) Quarter-pixel accuracy motion compensation residual.

Figure 2.5: Integer and fractional-pixel motion compensation residuals with 8x8 blocks.


(a) Current frame (b) Reference Frame

(c) Reconstructed Frame (d) Motion Compensation Residual

Figure 2.6: Frame 11 of Stefan sequence, its reference frame which is frame 10, reconstructed frame

obtained by 8x8 motion compensated prediction of frame 11 from 10 and prediction residual.


2.2.3 Auto-covariance Analysis of MC Residual

In the previous section, the characteristics of the MC residual were analyzed empirically. It was
noted that, while local regions of an image have 2D anisotropic structures, the local
characteristics of the MC residual are different. This section reviews the statistical
characteristics of the MC residual based on the studies in [1] and [4].

Many models have been used to analyze the characteristics of images statistically. One of them is
the Markov-1 model. A stochastic process whose conditional distribution depends on only a finite
number of past values has the Markov property. If the conditional distribution of the signal
depends on only a single past value, the signal is called a Markov-1 signal. The auto-covariance of
a stationary Markov-1 signal is given in equation 2.1.

C(I) = \rho^{|I|}    (2.1)

In this equation, I is the distance between the two points for which the auto-covariance is
computed, and ρ is the correlation parameter, which takes values between 0 and 1. In [4], the
decorrelating transform for this auto-covariance is derived, and it becomes the DCT as the
correlation approaches its maximum (ρ → 1). Using a separable construction, the auto-covariance of
a 2D stationary Markov-1 signal is obtained as equation 2.2 [1].

C_s(I, J) = \rho_1^{|I|} \rho_2^{|J|}    (2.2)

In this separable model, I and J are the horizontal and vertical pixel distances, and ρ₁ and ρ₂ are
the horizontal and vertical correlation parameters, defined as in equation 2.1. The decorrelating
transform for this covariance is the 2D DCT as the correlation approaches its maximum (ρ₁ → 1 and
ρ₂ → 1) [1].

Correlation parameters for images are expected to be high because of the high spatial correlation
between pixels. Taking ρ₁ = ρ₂ = 0.95 is an acceptable approximation for typical images [4], which
explains the success of the 2D DCT in image compression.

The MC residual has also been modeled with the Markov-1 model, and correlation coefficients smaller
than 0.95 have been found [5]. This shows that decorrelating the MC residual with the 2D DCT is not
as effective as decorrelating an image with the 2D DCT.

In [1], MC residuals are modeled using a generalized auto-covariance model. This model is
directionally adaptive: the parameter θ provides an additional degree of freedom by rotating the
axes. The generalized model is given by equation 2.3 [1].

C_g(\theta, I, J) = \rho_1^{|I \cos\theta + J \sin\theta|} \rho_2^{|-I \sin\theta + J \cos\theta|}    (2.3)

The aim of the generalized model is to capture local anisotropic features by rotating the axes.
When θ is 0, the separable model is obtained.
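The two auto-covariance models can be written as short functions, which makes the relationship between them easy to check numerically. This is only a sketch: the function and argument names are hypothetical, and the angle is assumed to be given in degrees.

```python
import math


def separable_cov(i, j, rho1, rho2):
    """Separable Markov-1 model: rho1^|I| * rho2^|J|."""
    return rho1 ** abs(i) * rho2 ** abs(j)


def generalized_cov(theta_deg, i, j, rho1, rho2):
    """Generalized model: the (I, J) axes are rotated by theta before applying
    the separable form."""
    t = math.radians(theta_deg)
    u = i * math.cos(t) + j * math.sin(t)
    v = -i * math.sin(t) + j * math.cos(t)
    return rho1 ** abs(u) * rho2 ** abs(v)


# With theta = 0 the generalized model reduces to the separable model.
print(generalized_cov(0, 2, 1, 0.9, 0.3), separable_cov(2, 1, 0.9, 0.3))

# With theta = 45 and a small rho2, correlation stays high along the diagonal
# I = J and decays quickly across it.
print(generalized_cov(45, 3, 3, 0.9, 0.1), generalized_cov(45, 3, -3, 0.9, 0.1))
```

The second print shows the directional behavior: the same pair of correlation parameters produces a strong correlation along one diagonal and almost none along the other.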

To compare the separable and generalized models, the image and the MC residual are modeled by
estimating the model parameters for each 8x8 block, as in [1]. First, a non-parametric
auto-covariance estimate of each 8x8 block is computed: the mean is removed, the autocorrelation is
found, and each correlation element is divided by the number of overlapping points. Second, the
parameters θ, ρ₁ and ρ₂ that minimize the mean square error between the non-parametric
auto-covariance estimate and the model are calculated. In this calculation, ρ₁ is taken as the
larger correlation parameter, and θ takes values between 0 and 180 degrees.
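The two steps described above can be sketched as follows. The estimator follows the description in the text; the fitting step is a simplification with several assumptions that are not taken from [1]: only the separable model is fitted (θ fixed at 0), the estimate is normalized by its zero-lag value, and a coarse grid search stands in for whatever optimizer was actually used. All function names are hypothetical.

```python
def autocov_estimate(block):
    """Non-parametric auto-covariance of a square block, keyed by lag (di, dj)."""
    n = len(block)
    mean = sum(sum(row) for row in block) / (n * n)
    z = [[v - mean for v in row] for row in block]  # remove the mean
    est = {}
    for dj in range(-(n - 1), n):
        for di in range(-(n - 1), n):
            total = 0.0
            for y in range(max(0, -dj), min(n, n - dj)):
                for x in range(max(0, -di), min(n, n - di)):
                    total += z[y][x] * z[y + dj][x + di]
            # Divide by the number of overlapping points at this lag.
            est[(di, dj)] = total / ((n - abs(di)) * (n - abs(dj)))
    return est


def fit_separable(est, steps=20):
    """Grid search for (rho1, rho2) minimizing the mean squared error between
    the normalized estimate and rho1^|I| * rho2^|J|. Returns (rho1, rho2, mse),
    or None for a flat block whose zero-lag value is zero."""
    c0 = est[(0, 0)]
    if c0 == 0:
        return None
    grid = [k / steps for k in range(steps + 1)]
    best = None
    for rho1 in grid:
        for rho2 in grid:
            err = 0.0
            for (di, dj), c in est.items():
                err += (c / c0 - rho1 ** abs(di) * rho2 ** abs(dj)) ** 2
            err /= len(est)
            if best is None or err < best[2]:
                best = (rho1, rho2, err)
    return best


# Tiny worked example on a 2x2 block.
est = autocov_estimate([[1, 2], [3, 4]])
print(est[(0, 0)], est[(1, 0)], est[(0, 1)])
```

Extending the search to the generalized model would add a third loop over θ and replace the model term with the rotated form of equation 2.3.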

In this thesis, the original image and the residual in Figure 2.6 are modeled with the separable
and generalized models as described in [1]. Figure 2.7 shows the results. In these plots, each
point represents a (ρ₁, ρ₂) pair estimated from one 8x8 block.

Figure 2.7a and Figure 2.7b are obtained by modeling Figure 2.6a with the separable and generalized
models, respectively. In Figure 2.7a, the points are more scattered and fewer points have ρ₁ larger
than 0.5 compared to Figure 2.7b. These two observations mean that the generalized model is more
successful than the separable model at modeling images: a smaller spread of the parameters
indicates more efficient image compression, and having more points with ρ₁ larger than 0.5
indicates that high correlation is captured more successfully.

Figure 2.7c and Figure 2.7d are obtained by modeling the MC residual in Figure 2.6d with the
separable and generalized models, respectively. Comparing them shows how well each model captures
the MC residual. In the separable-model plot, points tend to have both ρ₁ and ρ₂ smaller than 0.5.
In the generalized-model plot, points tend to have ρ₁ larger than 0.5 and ρ₂ smaller than 0.5.
Unlike the separable model, the generalized model captures the correlation in the MC residual along
the direction of ρ₁. This means that the MC residual has 1D structures and that the generalized
model can capture them.

Comparing Figure 2.7a and Figure 2.7c shows the statistical differences between the image and the
MC residual. In the region where both ρ₁ and ρ₂ are smaller than 0.5, Figure 2.7c has more (ρ₁, ρ₂)
pairs than Figure 2.7a. The smaller correlation parameters indicate lower correlation between
neighboring pixels in MC residuals than in images. Furthermore, in the region where ρ₁ is larger
than 0.5, the MC residual plot has smaller ρ₂ values. This means that MC residual pixels have less
2D correlation than image pixels and mostly have 1D correlation.

In summary, the generalized model is a generalization of the separable model that provides an
additional degree of freedom by rotating the axes. Owing to this additional degree of freedom, the
generalized model is more successful than the separable model at capturing the correlation between
pixels in both images and MC residuals. The generalized-model plots indicate that images and MC
residuals have different characteristics: images tend to have 2D structures, whereas MC residuals
tend to have a significant amount of 1D structure.


(a) Separable Model, Image (b) Generalized Model, Image

(c) Separable Model, MC residual (d) Generalized model, MC residual

Figure 2.7: Plots of correlation parameters (ρ₁ and ρ₂) estimated using separable and generalized
models. Figure 2.7a and 2.7b are modeling plots of the image in Figure 2.6a. Figure 2.7c and 2.7d
are modeling plots of the motion compensation residual in Figure 2.6d.


2.2.4 Direction-Adaptive 1D-DCTs

The analyses in section 2.2.2 demonstrate that residuals originating from object boundaries or
edges form a significant part of the MC residual. In such regions, pixels are correlated along one
direction. The statistical analysis in section 2.2.3 supports this observation: when MC residuals
are modeled with the generalized model in equation 2.3, the model plot shows that MC residual
pixels are correlated in one dimension. Hence, the 2D DCT is not the optimal choice for
decorrelating such regions of the MC residual.
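A toy example illustrates why a 1D transform can suit such regions better. It is not one of the transforms from [1]: it only compares a row-wise 1D DCT with the separable 2D DCT on a synthetic 8x8 block that contains a single horizontal line, and counts the non-zero coefficients each produces. The helper names and the threshold `eps` are assumptions.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_rows(block):
    """1D DCT applied along each row (the horizontal direction)."""
    return [dct_1d(row) for row in block]


def dct_2d(block):
    """Separable 2D DCT: rows first, then columns."""
    rows = dct_rows(block)
    cols = [dct_1d(col) for col in zip(*rows)]  # transform each column
    return [list(r) for r in zip(*cols)]        # transpose back to row order


def count_significant(coeffs, eps=1e-9):
    """Number of coefficients whose magnitude exceeds eps."""
    return sum(1 for row in coeffs for c in row if abs(c) > eps)


# Synthetic residual: one horizontal line of constant amplitude.
block = [[0.0] * 8 for _ in range(8)]
block[3] = [10.0] * 8

n_1d = count_significant(dct_rows(block))
n_2d = count_significant(dct_2d(block))
print(n_1d, n_2d)
```

The row-wise 1D DCT represents the line with a single coefficient, while the 2D DCT spreads it over eight coefficients because the vertical transform sees an impulse.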

Considering these characteristics of the MC residual, directional 1D DCTs were designed in [1]. The
aim of the design is to exploit 1D correlations in the residual and code it more efficiently.
Different transform sets are proposed for 4x4 and 8x8 blocks. The study was conducted by modifying
H.264/AVC. The sets of directional transforms for 4x4 and 8x8 blocks are shown in Figure 2.8 and
Figure 2.9.

In H.264/AVC, each macroblock is coded using one of the motion compensation block sizes 16x16,
16x8, 8x16 or 8x8. The block size is decided using Lagrangian-based rate-distortion optimization.
In standard H.264/AVC, only the 2D DCT is used to transform the MC residual [9]. In the modified
H.264/AVC, MC residuals can be transformed using either one of the direction-adaptive 1D DCTs or
the 2D DCT. The 2D DCT is kept as an option because it is a globally good transform and gives
better results in the many regions that do not have strong 1D anisotropic features.
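Lagrangian rate-distortion optimization picks the option with the lowest cost J = D + λR, where D is the distortion and R the rate. The sketch below applies that rule to a choice among transforms. It is an illustration only: the function name is hypothetical, and the distortion and rate numbers are invented for the example rather than measured.

```python
def choose_by_rd_cost(candidates, lam):
    """Return the candidate name with the lowest Lagrangian cost D + lam * R.

    `candidates` maps a name to a (distortion, rate_in_bits) pair."""
    return min(candidates, key=lambda name: candidates[name][0] + lam * candidates[name][1])


# Invented (distortion, rate) pairs for one block; not measured data.
candidates = {
    "2D DCT": (120.0, 96),
    "1D DCT, horizontal": (125.0, 60),
    "1D DCT, vertical": (180.0, 90),
}

print(choose_by_rd_cost(candidates, lam=1.0))
print(choose_by_rd_cost(candidates, lam=0.1))
```

With λ = 1.0 the cheaper-to-code horizontal 1D DCT wins; with λ = 0.1 rate matters less and the 2D DCT, which has the lowest distortion in this made-up example, is chosen.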

Experiments were conducted for three different transform configurations in [1]. For each
configuration, results were obtained for four quantization parameters: 24, 28, 32 and 36. In the
first configuration the encoder uses 4x4 2D and 1D DCTs, in the second it uses 4x4 and 8x8 2D and
1D DCTs, and in the third it uses 8x8 2D and 1D DCTs. The output bitrate and PSNR values were
compared with the 4x4, 4x4-8x8 and 8x8 2D DCT cases, respectively. The Bjontegaard-Delta (BD)
bitrate metric was used to calculate the average gain. On average, gains of 4.1%, 11.4% and 4.8%
were achieved for the 4x4, 4x4-8x8 and 8x8 cases.

Figure 2.8: Eight 4x4 1D transforms proposed in[1].


Figure 2.9: Sixteen 8x8 1D transforms proposed in[1].


CHAPTER 3

ANALYSIS OF DISPARITY COMPENSATION RESIDUALS

Investigating the characteristics of the disparity compensation residual is important for designing
better transforms for this residual. This chapter mainly focuses on analyzing its statistical
characteristics. Section 3.1 gives brief information about disparity compensation for completeness.
Section 3.2 presents an empirical analysis of the disparity compensation residual. In section 3.3,
the disparity compensation residual is analyzed statistically. Sections 3.2 and 3.3 also discuss
the differences and similarities between images and the disparity compensation residual. In section
3.4, disparity and motion compensation residuals are compared.

3.1 Disparity Compensation

In multiview video, the same scene is captured with closely placed cameras. Multiview video coding
(MVC) aims to represent the videos captured by these different cameras with as few bits as
possible.

The same point on an object is mapped to different coordinates in videos captured from different
perspectives. The difference between these coordinates is called disparity. Disparity compensation
exploits the correlation between views to reduce the bitrate by estimating the disparity for each
pixel or block. In this approach, the current frame is predicted from previously encoded frames of
neighboring views.

The MVC extension of H.264/MPEG-4 AVC is the multiview coding standard published by the ITU-T Video
Coding Experts Group and the ISO/IEC Moving Picture Experts Group [3]. In this thesis, a sample
codec for the MVC extension of H.264/MPEG-4 AVC is used. The block matching technique used for
inter-frame prediction is also used for inter-view prediction in the MVC extension [3].

As in inter-frame prediction, in inter-view prediction frames are divided into blocks and the best
matching block is determined from previously encoded blocks. The two techniques differ in the
reference frame. In inter-frame prediction, the reference frame is selected from the same view as
the current frame. In inter-view prediction, the reference frame belongs to a neighboring view.

Motion compensation and disparity compensation have the same block size options in the MVC
extension of H.264/MPEG-4 AVC: 16x16, 16x8, 8x16 and 8x8 blocks are available, and 8x8 blocks can
be further divided into 8x4, 4x8 or 4x4 blocks. As mentioned in section 2.2.1, the block size is
important because all pixels in the block are represented by the same motion or disparity vector.

As explained in section 2.2.1, searching for the best matching block at the original resolution
gives integer-pixel accuracy, and using interpolation while searching gives fractional-pixel
accuracy [12]. Like H.264/MPEG-4 AVC, the MVC extension uses the block matching algorithm with
quarter-pixel accuracy to estimate the displacement between the current and reference frames [3].
Compared to integer-pixel block matching, quarter-pixel block matching gives better results because
the disparity between two frames of different views is generally not an integer number of pixels.
Figure 3.1 shows the disparity compensation residuals obtained with integer-pixel and quarter-pixel
accuracy.

(a) Integer-pixel accuracy disparity compensation residual.

(b) Quarter-pixel accuracy disparity compensation residual.

Figure 3.1: Integer and fractional-pixel disparity compensation residuals with 8x8 blocks.


3.2 Empirical Analysis of the Disparity Compensation Residual

The disparity compensation (DC) residual is the difference between the original frame and the
disparity compensated frame. Typically, the DC residual is too large to ignore, so it is also
transmitted to the decoder. Coding the DC residual is therefore important for efficient
compression, and to code it efficiently its characteristics have to be investigated. In this
section, the DC residual is analyzed empirically using Figures 3.2, 3.3 and 3.4.

Figures 3.2, 3.3 and 3.4 show the current frame, reference frame, disparity compensated frame and
disparity compensation residual frame for several test sequences. The current frame of Figure 3.2
is frame 15 of the Exit view 1 sequence at 640x480 resolution. The current frame of Figure 3.3 is
frame 25 of the Akko&Kayo view 1 sequence at 640x480 resolution. The current frame of Figure 3.4 is
frame 30 of the Uli view 1 sequence at 1024x768 resolution. The reference frames are the frames of
view 0 with the same frame numbers. The disparity compensated frame is the result of inter-view
prediction of the current frame from the reference frame. The DC residual is the difference between
the current frame and the compensated frame.

Disparity compensation residuals can have different spatial characteristics in occluded regions,
smooth regions, texture regions, and at object boundaries or edges. This section discusses the
characteristics of the DC residual in each of these regions.

Regions captured by only one of the views are referred to as occluded regions. They are mainly
caused by differences in the viewing area and by overlapping objects. Since an occluded region
exists in only one of the views, prediction fails there, and the prediction residual has
high-magnitude components. In occluded regions the image and the DC residual show similar
characteristics. The papers on the right side of Figure 3.2a are an example of an occluded region.

Smooth regions in all three example figures (Figure 3.2, Figure 3.3, Figure 3.4) have near-zero
prediction errors. As mentioned, the block matching algorithm in MVC estimates a translational
disparity between the current and reference frames. In smooth regions, however, even if the
disparity of a block between the two frames is not translational, high spatial correlation enables
successful prediction. Therefore, in smooth regions the image and the DC residual are both smooth
and have similar characteristics. Unlike in the image, however, the mean of such regions is zero in
the DC residual.

The bookshelf in Figure 3.2a is an example of a texture region. Such regions have large prediction
errors, so the details of the bookshelf are visible in the residual. The characteristics of texture
regions in the residual frame are similar to those in the original image, except that in the
residual frame the mean of the texture regions is almost zero.

At object boundaries and edges, disparity compensation produces large errors, because translational
estimation causes mismatches along edges and boundaries. For example, the outlines of the people in
Figures 3.2, 3.3 and 3.4 and the edges of the figure on the box in Figure 3.3 have high-magnitude
components in the prediction residual frame. Object boundaries and edges have different
characteristics in the DC residual than in the image. In the DC residual, they show a 1D
characteristic, whereas in the image these regions merge with the neighboring smooth regions and
show 2D characteristics. Edges and object boundaries in DC residuals are mostly vertical because
the disparity is mainly in the horizontal direction.

In summary, the spatial characteristics of images and disparity compensation residuals differ in
some regions, especially at object boundaries and edges. There, 2D structures in images turn into
1D structures in disparity compensation residuals. Edges and object boundaries constitute a
significant portion of the DC residual, so the compression efficiency of these regions can have an
important impact on the overall compression efficiency of multiview video.

(a) Current Frame (b) Reference Frame

(c) Reconstructed Frame (d) Disparity Compensation Residual

Figure 3.2: Frame 15 of Exit view 1 sequence, its reference frame which is frame 15 of view 0 sequence,

reconstructed frame obtained by 8x8 disparity compensated prediction of the current frame

from the reference frame and prediction residual.


(a) Current Frame (b) Reference Frame

(c) Reconstructed Frame (d) Disparity Compensation Residual

Figure 3.3: Frame 25 of Akko&Kayo view 1 sequence, its reference frame which is frame 25 of view 0

sequence, reconstructed frame obtained by 8x8 disparity compensated prediction of the current frame

from the reference frame and prediction residual.


(a) Current Frame (b) Reference Frame

(c) Reconstructed Frame (d) Disparity Compensation Residual

Figure 3.4: Frame 30 of Uli view 1 sequence, its reference frame which is frame 30 of view 0 sequence,

reconstructed frame obtained by 8x8 disparity compensated prediction of the current frame from the

reference frame and prediction residual.


3.3 Auto-covariance Analysis Of Disparity Compensation Residual

In the preceding section, the characteristics of the disparity compensation (DC) residual were analyzed visually and empirically. This section focuses on the statistical characteristics of the DC residual, based on the studies in [4] and [1].

Images and DC residuals in Figures 3.2, 3.3 and 3.4 are modeled using the separable model (equation 3.1) and the generalized model (equation 3.2). The modeling process is the same as in section 2.2.3. First, the image or DC residual is divided into 8x8 blocks, and a non-parametric auto-covariance of each block is estimated. To estimate the auto-covariance, the mean is removed, the autocorrelation of the zero-mean block is computed, and each correlation element is divided by the number of overlapping points. Then, the parameters θ, ρ1 and ρ2 that minimize the mean square error between the auto-covariance estimate and the models are found. In this calculation, ρ1 is taken as the larger correlation parameter, and θ is in the range of 0 to 180 degrees.

$$C_s(I, J) = \rho_1^{|I|}\,\rho_2^{|J|} \qquad (3.1)$$

$$C_g(\theta, I, J) = \rho_1^{|I\cos\theta + J\sin\theta|}\,\rho_2^{|-I\sin\theta + J\cos\theta|} \qquad (3.2)$$
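The estimation and fitting steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the thesis: the function names are hypothetical, the auto-covariance is normalized by its zero-lag value so that it is comparable with the models (which equal 1 at zero lag), I is taken as the horizontal lag and J as the vertical lag, and the minimization is done by a coarse grid search whose step sizes are arbitrary.

```python
import numpy as np


def block_autocovariance(block):
    """Non-parametric auto-covariance estimate of one N x N block.

    The result has shape (2N-1, 2N-1); entry [N-1+J, N-1+I] is the estimate
    at vertical lag J and horizontal lag I (lag convention is an assumption).
    """
    x = np.asarray(block, dtype=float)
    x = x - x.mean()                          # remove the mean
    n = x.shape[0]
    cov = np.zeros((2 * n - 1, 2 * n - 1))
    for dj in range(-(n - 1), n):             # vertical lag J
        for di in range(-(n - 1), n):         # horizontal lag I
            # Overlapping parts of the block and its shifted copy.
            a = x[max(0, dj):n + min(0, dj), max(0, di):n + min(0, di)]
            b = x[max(0, -dj):n + min(0, -dj), max(0, -di):n + min(0, -di)]
            # Divide by the number of overlapping points.
            cov[n - 1 + dj, n - 1 + di] = (a * b).sum() / a.size
    return cov


def generalized_model(theta_deg, rho1, rho2, n):
    """Equation 3.2 evaluated at all lags of an N x N block."""
    t = np.deg2rad(theta_deg)
    lags = np.arange(-(n - 1), n)
    J, I = np.meshgrid(lags, lags, indexing="ij")   # rows: J, columns: I
    e1 = np.abs(I * np.cos(t) + J * np.sin(t))
    e2 = np.abs(-I * np.sin(t) + J * np.cos(t))
    return rho1 ** e1 * rho2 ** e2


def fit_generalized(norm_cov, rhos=np.linspace(0.0, 1.0, 21),
                    thetas=np.arange(0, 180, 15)):
    """Grid search (an assumed method) for the parameters with minimum MSE."""
    n = (norm_cov.shape[0] + 1) // 2
    best = None
    for theta in thetas:
        for r1 in rhos:
            for r2 in rhos[rhos <= r1]:       # rho1 is the larger parameter
                model = generalized_model(theta, r1, r2, n)
                err = np.mean((model - norm_cov) ** 2)
                if best is None or err < best[0]:
                    best = (err, int(theta), float(r1), float(r2))
    return best                               # (mse, theta, rho1, rho2)


def fit_block(block):
    """Fit the generalized model to one block; returns None for a flat block."""
    block = np.asarray(block, dtype=float)
    cov = block_autocovariance(block)
    n = block.shape[0]
    variance = cov[n - 1, n - 1]
    if variance <= 0:
        return None
    return fit_generalized(cov / variance)
```

Restricting `thetas` to the single value 0 and dropping the ordering constraint on the two correlation parameters would give the corresponding fit for the separable model of equation 3.1.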

The model plots are shown in Figures 3.5, 3.6 and 3.7. In these plots, each point represents one 8x8 block. Because the results for the three figures are similar, only the results in Figure 3.5 are discussed in this section.

Figures 3.5a and 3.5b are obtained by modeling the image in Figure 3.2a with the separable model and the generalized model, respectively. In the separable model plot (Figure 3.5a), the points are more dispersed than in the generalized model plot (Figure 3.5b). More compact point positions indicate better modeling and higher compression efficiency. In the generalized model plot, more points have a ρ1 larger than 0.5. Therefore, the generalized model is more successful at capturing high correlation in images. A higher correlation parameter gives higher compression efficiency.

Figures 3.5c and 3.5d are obtained by modeling the disparity residual in Figure 3.2d with the separable model and the generalized model, respectively. In the separable model plot (Figure 3.5c), most of the blocks have ρ1 and ρ2 smaller than 0.5. In the generalized model plot (Figure 3.5d), blocks tend to have ρ1 higher than 0.5 and ρ2 smaller than 0.5. High values of ρ1 together with small values of ρ2 indicate 1D correlation between DC residual pixels. In other words, the DC residual has 1D structures, and these structures can be captured by the generalized model but not by the separable model.

Comparing the generalized model plots of the image (Figure 3.5b) and the DC residual (Figure 3.5d) reveals the statistical differences between the two. The points with ρ1 larger than 0.5 represent the blocks in which high correlation is captured in the direction of ρ1. The ρ2 values of these points are smaller in the DC residual plot, which means that the probability of having correlation in two directions is smaller in the DC residual. In fact, the number of points with both ρ1 and ρ2 larger than 0.5 is very small in the DC residual plot. These observations indicate that, unlike the image, the DC residual tends to have 1D structures instead of 2D structures.

In summary, modeling images and DC residuals with the generalized model yields higher correlation parameters than the separable model. The generalized model is better at capturing high correlations in images and DC residuals. This is expected because the generalized model has more parameters than the separable model: it provides an additional degree of freedom, the rotation of the axes, and setting the angle parameter to zero gives the separable model. The generalized model plots reveal the difference between the statistical characteristics of the image and the DC residual. Correlation between neighboring pixels is lower in DC residuals. Moreover, images tend to have exclusively 2D structures, while DC residuals also tend to have a significant amount of 1D structures.

(a) Separable Model, Image (b) Generalized Model, Image

(c) Separable Model, DC residual (d) Generalized model, DC residual

Figure 3.5: Plots of correlation parameters (ρ1 and ρ2) estimated using separable and generalized models. Figures 3.5a and 3.5b are modeling plots of the image in Figure 3.2a. Figures 3.5c and 3.5d are modeling plots of the disparity compensation residual in Figure 3.2d.


(a) Separable Model, Image (b) Generalized Model, Image

(c) Separable Model, DC residual (d) Generalized model, DC residual

Figure 3.6: Plots of correlation parameters (ρ1 and ρ2) estimated using separable and generalized models. Figures 3.6a and 3.6b are modeling plots of the image in Figure 3.3a. Figures 3.6c and 3.6d are modeling plots of the disparity compensation residual in Figure 3.3d.


(a) Separable Model, Image (b) Generalized Model, Image

(c) Separable Model, DC residual (d) Generalized model, DC residual

Figure 3.7: Plots of correlation parameters (ρ1 and ρ2) estimated using separable and generalized models. Figures 3.7a and 3.7b are modeling plots of the image in Figure 3.4a. Figures 3.7c and 3.7d are modeling plots of the disparity compensation residual in Figure 3.4d.


3.4 Comparison of Disparity Compensation Residual and Motion Compensation Residual

Sections 2.2.2 and 2.2.3 provide an analysis of the MC residual, and sections 3.2 and 3.3 provide an analysis of the DC residual. In this section, the similarities and differences between MC and DC residuals are presented on the basis of these four sections.

First, MC and DC residuals are compared empirically, based on the analyses in sections 2.2.2 and 3.2. In smooth regions, MC and DC residuals show similar characteristics. Smooth regions are easy to predict, and the residuals in these regions are almost zero. Hence, in most cases, residuals in these regions are not coded.

MC and DC residuals in texture regions are also similar to each other. Texture regions are difficult to predict, and the residuals in these regions show characteristics similar to the image, except for the mean: the mean of the residual in texture regions is almost zero.

Unlike the MC residual, the DC residual has occluded regions. In these regions, the DC residual has 2D structures.

At object boundaries or edges, the DC residual has high-magnitude components, as the MC residual does. In images, these regions merge with 2D regions and have 2D characteristics. Because of the high contrast in these regions, 1D structures are formed in the residuals after the prediction step.

Lastly, when MC and DC residuals are compared statistically, based on the analyses in sections 2.2.3 and 3.3, the statistical characteristics of the two residuals also turn out to be similar. In both residuals, the correlation between pixels is weaker than in images. More importantly, both residuals have 1D anisotropic characteristics rather than 2D characteristics.

3.5 Summary and Outcomes

In [1], 1D directional transforms are proposed for compression of the MC residual. The results in [1] indicate that using these 1D transforms increases compression efficiency.

In this chapter, we analyzed the DC residual. According to the analysis results, DC residuals have 1D anisotropic characteristics, as MC residuals do.

In this thesis, based on the characteristics of the DC residual and the similarities between MC and DC residuals, we propose using the same 1D transforms for the DC residual.


CHAPTER 4

SYSTEM IMPLEMENTATION WITH 1-D DIRECTIONAL TRANSFORMS

This chapter describes the codec steps designed to add 1D transforms to the codec. Section 4.1 discusses the implementation of the transforms, and section 4.2 covers the coding of transform coefficients. In the original codec, residuals are always transformed with the 2D DCT. In our implementation, residuals are transformed using either one of the 1D transforms or the 2D DCT. To choose the optimum transform, a transform selection algorithm is needed; section 4.4 is about transform selection. The information representing the selected transform is called side information. Side information is sent to the decoder for each block in the DC residual frame. The coding of side information is described in section 4.3.

4.1 Transformation of Residuals

H.264/MPEG-4 AVC uses an integer transform in the transformation step [19]. In the integer transform, the transform and quantization steps are merged. In this thesis, a floating-point implementation of fast DCT algorithms [4, 20] is used. Using floating-point arithmetic instead of integer arithmetic increases the overall computational complexity, but it does not change the results.
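To illustrate why a 1D transform can suit the 1D structures of the residual better than the 2D DCT, the following sketch builds a floating-point orthonormal DCT-II as a plain matrix product and applies it to a synthetic 8x8 block containing a single vertical line. It is an illustration only: it uses neither the fast algorithms of [4, 20] nor the directional transforms of [1], only the vertical direction is shown, and the function names and the test block are assumptions made for this example.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix T; T @ x is the 1D DCT of a length-n vector."""
    k = np.arange(n).reshape(-1, 1)           # frequency index (rows)
    i = np.arange(n).reshape(1, -1)           # sample index (columns)
    t = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)                # DC row scaling for orthonormality
    return t


def dct2d(block):
    """Separable 2D DCT: 1D DCT on the columns, then on the rows."""
    t = dct_matrix(block.shape[0])
    return t @ block @ t.T


def dct1d_vertical(block):
    """1D DCT applied to every column independently (vertical direction)."""
    return dct_matrix(block.shape[0]) @ block


def count_significant(coeffs, tol=1e-9):
    """Number of coefficients whose magnitude exceeds a small tolerance."""
    return int(np.sum(np.abs(coeffs) > tol))


# Hypothetical residual block: one vertical line, zero elsewhere.
residual = np.zeros((8, 8))
residual[:, 3] = 10.0

print(count_significant(dct1d_vertical(residual)))   # 1
print(count_significant(dct2d(residual)))            # 8
```

For this block the vertical 1D transform packs all the energy into one coefficient, whereas the 2D DCT spreads it over eight, because the horizontal pass transforms an impulse.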

4.2 Coding of Transform Coefficients

According to the H.264/MPEG-4 AVC standard, transform coefficients can be coded using one of two methods: Context Adaptive Variable Length Coding (CAVLC) and Context Adaptive Binary Arithmetic Coding (CABAC). In this thesis, CAVLC is used, and only the details of CAVLC are discussed.

In the CAVLC method, coefficients are ordered using predefined scans. The aim of the scans is to order the coefficients from largest to smallest in magnitude. In the regular H.264/MPEG-4 AVC codec, the scans are specified considering the characteristics of the 2D DCT. In this thesis, we add new scan patterns for the 1D transforms. These scan patterns are the same as the patterns in [1]. Apart from the scan patterns, the CAVLC implementation is not modified.

The scan patterns for the 4x4 and 8x8 1D transforms are shown in Figures 4.1 and 4.2. H.264/MPEG-4 AVC uses four length-16 scans instead of one length-64 scan for the 8x8 2D DCT. As seen in Figure 4.2, we also use four length-16 scans for the 8x8 1D transforms. With the scans for 1D transforms, we try to order the coefficients from largest to smallest and to keep neighboring coefficients close to each other.
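As a rough illustration of what a scan does, the sketch below reads a 4x4 coefficient block in the conventional zigzag order of the 2D DCT and splits a length-64 coefficient sequence into four length-16 sequences by interleaving. The 1D scan patterns of Figures 4.1 and 4.2 are not reproduced here. The interleaving rule is an assumption based on how 8x8 blocks are commonly described as being handled in CAVLC, and the thesis does not state whether its 1D scans partition the coefficients in the same way. The function names are hypothetical.

```python
import numpy as np

# Conventional 4x4 zigzag scan for the 2D DCT, as raster indices (row * 4 + col).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def scan_block(coeffs, order):
    """Read a square coefficient block in the given scan order."""
    return np.asarray(coeffs).reshape(-1)[list(order)]


def split_into_four_scans(scanned64):
    """Split a length-64 scanned sequence into four length-16 sequences.

    Assumed rule: element i of the long sequence goes to sequence i % 4.
    """
    scanned64 = np.asarray(scanned64)
    if scanned64.size != 64:
        raise ValueError("expected 64 coefficients")
    return [scanned64[i::4] for i in range(4)]


# Hypothetical quantized 4x4 block with energy in the low frequencies.
block = np.array([[9, 4, 1, 0],
                  [5, 2, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 0]])
print(scan_block(block, ZIGZAG_4X4))   # non-zero values first, zeros at the end
```

A scan for a 1D transform would use a different `order` list, chosen so that the coefficients expected to be large for that transform come first.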

Figure 4.1: Scan patterns used in coefficient coding of the proposed 4x4 1D transforms [1].

Figure 4.2: Scan patterns used in coefficient coding of the proposed 8x8 1D transforms [1].

4.3 Coding of Side Information

The decoder needs to know the selected transform in order to perform the appropriate inverse transform. As in [1], this information is called side information in this thesis. In our experiments, we run H.264/MPEG-4 AVC in CAVLC mode, and the side information is also coded using a variable length code (VLC). The codewords used in the experiments are shown in Table 4.1. For transform sizes 4x4 and 8x8, 4-bit and 5-bit codewords, respectively, are used as the side information of the 1D DCTs. For both transform sizes, the 2D DCT is represented by a 1-bit codeword. In most regions, the 2D DCT is still the optimum transform, or the 1D DCTs give only slightly better results than the 2D DCT. That is why the shorter codeword is given to the 2D DCT. This choice helps to increase the overall efficiency but decreases the probability of the 1D DCTs being selected: a 1D DCT is selected over the 2D DCT only if the decrease in distortion is enough to compensate for the additional side information bits.

Table 4.1: Side Information Codewords

(a) 4x4 Block Transforms

Transform        Codeword
2-D DCT          1
1-D Transform    0XXX

(b) 8x8 Block Transforms

Transform        Codeword
2-D DCT          1
1-D Transform    0XXXX
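The codeword lengths in Table 4.1 translate directly into a per-block side information cost. The sketch below computes that cost and its expected value for a given probability of choosing the 2D DCT. The function names and the probability value in the usage line are hypothetical, not measured results. Note that a 3-bit suffix can distinguish at most 8 transforms and a 4-bit suffix at most 16.

```python
def side_info_bits(transform_size, is_2d_dct):
    """Codeword length in bits for one 8x8 block, following Table 4.1."""
    if transform_size not in (4, 8):
        raise ValueError("transform size must be 4 or 8")
    if is_2d_dct:
        return 1                              # codeword '1'
    # '0' prefix plus 3 suffix bits (4x4) or 4 suffix bits (8x8)
    return 4 if transform_size == 4 else 5


def expected_side_info_bits(transform_size, p_2d_dct):
    """Average side information bits per block for a given 2D DCT probability."""
    if not 0.0 <= p_2d_dct <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    bits_1d = side_info_bits(transform_size, False)
    bits_2d = side_info_bits(transform_size, True)
    return p_2d_dct * bits_2d + (1.0 - p_2d_dct) * bits_1d


# Hypothetical example: the 2D DCT is chosen for 75% of the 8x8 blocks.
print(expected_side_info_bits(8, 0.75))       # 2.0
```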

4.4 Transform Selection

For each residual block to be transformed, several transform options are available. Each transform affects the distortion and the bitrate of the block. Comparing the transform results and choosing the best transform is critical for compression efficiency. We use Lagrangian rate-distortion optimization to select the best transform option [15, 16].

A simple formulation of Lagrangian rate-distortion optimization is shown in equation 4.1. In this equation, D represents the distortion cost and R represents the bitrate cost. The Lagrange multiplier λ controls the trade-off between distortion and rate. The joint cost J is calculated as

$$J = D + \lambda R \qquad (4.1)$$

Lagrangian rate-distortion optimization is also used by H.264/MPEG-4 AVC to choose the best coding mode of each macroblock [17, 18]. Each macroblock is coded with all available coding options (such as 16x16, 16x8 and 8x16 MC prediction), and the coding mode with the minimum joint cost is chosen.

In our transform selection process, the total number of bits used to code the residual block is taken as the bitrate, and the mean square error of the residual block is taken as the distortion. The value of the Lagrange multiplier is the same as the value used for selecting the macroblock coding mode. The best transform of each 8x8 block is decided independently of other blocks, and one side information codeword is coded for each 8x8 block. If the transform size is 8x8, all available 8x8 transforms are applied to the 8x8 block, and the transform with the minimum cost is selected as the best transform. If the transform size is 4x4, transform selection is slightly different. In this case, one common transform is selected for the four 4x4 blocks in the 8x8 block: all available 4x4 transforms are applied to the 4x4 blocks in the 8x8 block, and the transform with the minimum cost is selected as the best transform option for the 8x8 block. Using the same transform for all 4x4 blocks in the 8x8 block reduces flexibility but increases overall efficiency.
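The selection rule described above can be sketched as a minimum search over the joint cost of equation 4.1. The sketch assumes that the rate R of a candidate is the sum of its coefficient bits and its side information bits, which is consistent with section 4.3 but not spelled out in the text. The class and function names, the candidate numbers and the λ values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Result of coding one 8x8 block with one transform option."""
    name: str
    distortion: float      # D: mean square error of the reconstructed block
    coeff_bits: int        # bits spent on the transform coefficients
    side_bits: int         # bits spent on the side information codeword


def joint_cost(candidate, lam):
    """J = D + lambda * R, with R = coefficient bits + side information bits."""
    rate = candidate.coeff_bits + candidate.side_bits
    return candidate.distortion + lam * rate


def select_transform(candidates, lam):
    """Return the candidate with the minimum joint cost."""
    return min(candidates, key=lambda c: joint_cost(c, lam))


# Hypothetical numbers for one 8x8 block.
options = [
    Candidate("2D DCT", distortion=10.0, coeff_bits=40, side_bits=1),
    Candidate("1D transform", distortion=8.0, coeff_bits=38, side_bits=5),
]
print(select_transform(options, lam=0.5).name)   # 1D transform
print(select_transform(options, lam=4.0).name)   # 2D DCT
```

With a small λ the lower distortion of the 1D candidate wins. With a large λ its two extra bits outweigh the distortion gain and the 2D DCT is chosen, which mirrors the codeword-length disadvantage discussed in section 4.3.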


CHAPTER 5

EXPERIMENTAL RESULTS AND ANALYSIS

In this chapter, the compression efficiency of using directional 1D transforms in addition to the 2D DCT in multiview coding is analyzed. The compression efficiency of directional 1D transforms in multiview and single view coding is also compared.

Experiments are conducted using three different setups. Section 5.1 explains the common environment of the experiments. Section 5.2 presents the experimental results for single view compression; in the single view experiments, 1D transforms and the 2D DCT are used to transform MC residuals. Section 5.3 provides the multiview compression results; in the multiview experiments, 1D transforms and the 2D DCT are used to transform MC and DC residuals. Section 5.4 compares the effect of adding 1D transforms in single view and multiview coding. Finally, section 5.5 investigates the multiview efficiency of 1D transforms using different frame types (P and B).

5.1 Common Properties of The Setups

In all experiments, the compression efficiency of using 1D directional transforms in addition to the 2D DCT for coding MC and/or DC prediction residuals is analyzed. The results of the proposed transforms are compared with the conventional coding method, which always uses the 2D DCT. Experiments are conducted by modifying the JMVC software (JMVC 8.5). JMVC (Joint Multiview Video Coding) is the reference software for the Multiview Video Coding (MVC) project of the Joint Video Team (JVT) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG).

To conduct the experiments, the JMVC software is modified as described in Chapter 4. Directional 1D transforms are provided as an option to transform MC and/or DC residual blocks. Only the transforms of luminance component residuals are modified; chrominance components use only the 2D DCT.

The sequences used in the experiments are shown in Table 5.1 and Figure 5.1. We use the first 180 frames of the 640x480 resolution videos and the first 120 frames of the 1024x768 resolution video. More information about the test sequences can be found in [6] and [7].


Table 5.1: Properties of the Test Sequences

Name        Resolution    Fps
Exit        640x480       25
Ballroom    640x480       25
Race1       640x480       30
Vassar      640x480       25
Uli         1024x768      25

Results are obtained with three different experimental setups. For each experiment, after the configuration is determined, encoding and decoding are repeated for four different quantization parameters (QP): 24, 28, 32 and 36. A lower QP provides higher picture quality and thus a higher bitrate.

To compare the coding efficiency of the conventional coding system (which uses only the 2D DCT) and the modified coding system (which also uses 1D transforms), their compression outputs are analyzed as follows. The average bitrate (in kbit/s) of the compressed video stream and the PSNR (in dB) between the compressed and original frames are recorded for each QP. PSNR is computed from the luminance component of the compressed and original frames, whereas the bitrate includes the compressed bitstream of both luminance and chrominance components. Using the bitrate and PSNR values for the different QPs, rate-distortion curves are plotted to compare the compression efficiency of the two codecs over a range of picture qualities. To quantify the average bitrate and PSNR differences between the rate-distortion curves of the two systems, the Bjontegaard-Delta (BD) bitrate metric [8] is used.
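The two measurements described above can be sketched as follows. The luminance PSNR assumes 8-bit samples (peak value 255). The BD bitrate follows the commonly used formulation of the metric: a third-order polynomial is fitted to the logarithm of the bitrate as a function of PSNR for each codec, and the average gap between the two fits is taken over the shared PSNR range. This is a sketch of that general procedure with hypothetical function names and data, not the exact tool used for the results in this chapter.

```python
import numpy as np


def luma_psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two luminance frames (8-bit samples assumed)."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec at equal PSNR.

    A negative value means the test codec needs less bitrate than the
    reference, i.e. a bitrate saving.
    """
    log_ref = np.log10(np.asarray(rate_ref, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))
    # Cubic fit of log-rate as a function of PSNR (exact for four points).
    poly_ref = np.polyfit(psnr_ref, log_ref, 3)
    poly_test = np.polyfit(psnr_test, log_test, 3)
    # Integrate both fits over the shared PSNR interval.
    low = max(min(psnr_ref), min(psnr_test))
    high = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyint(poly_ref)
    int_test = np.polyint(poly_test)
    area_ref = np.polyval(int_ref, high) - np.polyval(int_ref, low)
    area_test = np.polyval(int_test, high) - np.polyval(int_test, low)
    avg_log_diff = (area_test - area_ref) / (high - low)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Hypothetical rate-distortion points for four QPs (kbit/s, dB).
psnr = [33.0, 35.5, 38.0, 40.0]
rate_2d = [2000.0, 4000.0, 8000.0, 14000.0]
rate_1d = [0.9 * r for r in rate_2d]          # 10% fewer bits at equal PSNR
print(round(bd_rate(rate_2d, psnr, rate_1d, psnr), 3))
```

In this synthetic example the test codec spends 10% fewer bits at every PSNR, so the result is a BD bitrate of about -10%, which corresponds to a 10% bitrate saving in the sense used in the bar charts of this chapter.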

In summary, the experiments are conducted in four steps:

• Configure the codec.

• Compress the video using the original codec for four different QPs, and record the bitrate and PSNR values.

• Compress the video using the modified codec for four different QPs, and record the bitrate and PSNR values.

• Compare the results.


(a) Exit (b) Ballroom

(c) Race1 (d) Vassar

(e) Uli

Figure 5.1: First frames of the test sequences used in the experiments.


5.2 Single View Compression Experiment

5.2.1 Setup Properties

The aim of this setup is to analyze the efficiency increase provided by 1D transforms in single view compression. The single view results of the 1D transforms are needed in order to compare single view and multiview compression results. Results of the 1D transforms are obtained using two different transform sizes and three different encoders, so this setup includes three comparisons:

• 4x4 DCT vs. 4x4 1D (including 4x4 DCT)

• 8x8 DCT vs. 8x8 1D (including 8x8 DCT)

• 4x4-and-8x8 DCT vs. 4x4-and-8x8 1D (including 4x4-and-8x8 DCT)

Only view 1 sequences are used in the experiments. The prediction structure is shown in Figure 5.2. As seen in the figure, the first frame is coded as an I frame and the remaining frames are coded as P frames.

Figure 5.2: Prediction Structure of the Single View Compression Experiment

5.2.2 Bjontegaard-Delta Bitrate Results

Figure 5.3 shows the Bjontegaard-Delta bitrate savings of the modified encoders, which use 1D transforms and the 2D DCT together. Each plot shows the bitrate saving of the encoder with a different transform size: 4x4 DCT vs. 4x4 1D, 8x8 DCT vs. 8x8 1D, and 4x4-8x8 DCT vs. 4x4-8x8 1D.

As seen in Figure 5.3, adding the 1D directional transform option to the encoder provides bitrate savings for all test sequences and transform sizes. This means that using 1D directional transforms together with the 2D DCT for the compression of the motion compensation residual, instead of using only the 2D DCT, increases compression efficiency. The efficiency increase depends on the characteristics of the video sequence and on the encoder transform size. For all test sequences, the maximum bitrate saving is reached with the 8x8 transform size and the minimum bitrate saving with the 4x4 transform size.


(a) 4x4 1D vs 4x4 DCT (b) 8x8 1D vs 8x8 DCT (c) 4x4-8x8 1D vs 4x4-8x8 DCT

[Bar charts. Horizontal axis: test sequence (Exit, Ballroom, Race1, Vassar, Uli, Average). Vertical axis: Bitrate Savings (%), 0 to 18.]

Figure 5.3: Average bitrate savings gained by adding 1D directional transform to the encoder. These results are single view compression results of test sequences.

5.3 Multiview Compression Experiment

5.3.1 Setup Properties

The aim of this setup is to analyze the efficiency of 1D transforms in multiview compression using different transform sizes. In this way, we observe the efficiency of the 1D transforms and find out how this efficiency is affected by the transform size. The transform sizes are 4x4 and 8x8.

This setup includes three comparisons:

• 4x4 DCT vs. 4x4 1D (including 4x4 DCT)

• 8x8 DCT vs. 8x8 1D (including 8x8 DCT)

• 4x4-and-8x8 DCT vs. 4x4-and-8x8 1D (including 4x4-and-8x8 DCT)

Two views, view 0 and view 1, are used in the experiments. View 0 is taken as the base view and is compressed independently of view 1. View 1 is compressed using view 0 as a reference. In the compression of view 0, the first frame is coded as an I frame and the remaining frames are coded as P frames. In the compression of view 1, all frames are coded as P frames. The prediction structure is shown in Figure 5.4.

Figure 5.4: Prediction Structure of the Setup Two

5.3.2 Rate-Distortion Plots

Figure 5.5 shows the rate-distortion plots of the 4x4 1D vs 4x4 DCT, 8x8 1D vs 8x8 DCT and 4x4-8x8 1D vs 4x4-8x8 DCT comparisons. The plots show the MVC results of the Uli view 1 sequence. Because the rate-distortion plots of all test sequences are similar, this section presents the plots of only one test sequence.

The plots show that, for all transform sizes, directional 1D transforms give slightly better results than the 2D DCT when QP is 32 or 36. As QP decreases, the efficiency of the 1D transforms increases. One reason for the better efficiency at higher quality is that the fraction of the bitrate used for coding transform coefficients increases as QP decreases. (Author's note: a measurement could be added here.) At lower quality, more transform coefficients are zero than at higher quality, so the effect of the 1D directional transforms becomes less visible. The other reason is the bitrate cost of the side information: the side information takes up a higher percentage of the total bitrate at lower quality than at higher quality.

The plots also show that increasing the transform size increases the difference between the two curves. The difference is largest in the 8x8 case and smallest in the 4x4 case. This is because the difference between 1D and 2D transforms becomes smaller with smaller transform sizes. In the extreme case of a 1-point transform, the 1D DCT and the 2D DCT give the same result.

5.3.3 Bjontegaard-Delta Bitrate Results

Figure 5.6 shows the Bjontegaard-Delta bitrate savings obtained by adding 1D transforms to encoders with different transform sizes. The plots for the encoders with 4x4, 8x8 and 4x4-8x8 transforms are given in turn. The 1D transforms improve the coding performance for all test sequences. However, the improvement depends on the transform size and the test sequence.

Increasing the transform size has a positive effect on the bitrate saving of 1D over 2D transforms. The three charts show that the bitrate saving of the 1D transforms is largest in the 8x8 case and smallest in the 4x4 case. As explained in section 5.3.2, this is because the difference between 8x8 1D and 2D transforms is larger than the difference between 4x4 1D and 2D transforms.

The characteristics of the video sequence strongly affect the bitrate savings of the 1D transforms. In some cases, this dependency is very pronounced. For example, in the 8x8 DCT case, the bitrate saving for the Exit sequence is almost 2.5 times the bitrate saving for the Race1 sequence.


(a) 4x4 1D vs 4x4 2D DCT (b) 8x8 1D vs 8x8 2D DCT (c) 4x4-8x8 1D vs 4x4-8x8 2D DCT

[Rate-distortion plots, each with one curve for the 1D encoder and one for the 2D DCT encoder. Horizontal axis: Bitrate (kb/s), 0.2 to 1.4 x 10^4. Vertical axis: PSNR (dB), 32 to 40.]

Figure 5.5: MVC results of the Uli view 1 sequence. This view is predicted from view 0 by using different transform sizes and different transforms.

(a) 4x4 1D vs 4x4 2D DCT (b) 8x8 1D vs 8x8 2D DCT (c) 4x4-8x8 1D vs 4x4-8x8 2D DCT

[Bar charts. Horizontal axis: test sequence (Exit, Ballroom, Race1, Vassar, Uli, Average). Vertical axis: Bitrate Savings (%), 0 to 18.]

Figure 5.6: Average bitrate savings gained by adding 1D directional transform to the encoder. These results are MVC results of view 1 sequences. This view is predicted from view 0 by using different transform sizes and different transforms.

5.3.4 Bitrate for Coding Side Information

Section 4.3 mentioned that extra bits are sent in our 1D transform implementation. These extra bits, called side information, inform the decoder about the selected transform. In this section, the bitrate cost of the side information is examined.

Figure 5.10 shows the percentage of the bitrate spent on side information for the Exit sequence. The bar charts are obtained using encoders with different transform sizes. In each bar chart, each column represents the bitrate percentage for a different quantization parameter. In all three charts, the bitrate percentage of the side information increases as QP increases. Section 5.3.2 explained that decreasing QP has a positive effect on the performance of the 1D transforms. In general, as QP decreases, 1D transforms are used more often, and this increased usage raises both the bitrate saving and the bitrate percentage of the side information.

Figure 5.11 shows the average side information bitrate percentage of all test sequences when the encoder uses both 4x4 and 8x8 1D transforms. There is a correlation between the average bitrate saving and the average side information bitrate percentage: a sequence with a higher bitrate saving tends to have a higher average side information bitrate percentage. However, this correlation is strong only if the difference between the bitrate savings is large. The Vassar sequence, which has by far the largest average bitrate saving according to Figure 5.6c, also has the largest average side information bitrate percentage. On the other hand, although the Exit sequence has a higher bitrate saving than the Ballroom sequence in Figure 5.6c, it has a slightly smaller side information bitrate percentage.

Figures 5.10 and 5.11 both reveal that the side information costs a significant amount of bitrate. In our implementation, the side information bits are sent using a simple algorithm; a more efficient algorithm could be developed.

[Three bar charts, one per transform size. Horizontal axis: Quantization Parameter (24, 28, 32, 36). Vertical axis: Bitrate spent on side information (%), 0 to 6.]

Figure 5.10: Percentage of bits sent for side information. These graphs are obtained from MVC coding of Exit view 1 sequence using different transform sizes. As a multiview reference, view 0 is used.

[Bar chart. Horizontal axis: test sequence (Exit, Ballroom, Race1, Vassar, Uli, Average). Vertical axis: Bitrate spent on side information (%), 0 to 5.]

Figure 5.11: Average side information bitrate percentage of all test sequences when the encoder uses both 4x4 and 8x8 1D transforms.

5.3.5 Probabilities for Selection of Transforms

The probabilities of transform selection depend on the quantization parameter (video quality), the characteristics of the sequence and the transform size. Figure 5.12b shows the average transform selection probabilities of all sequences when 4x4 and 8x8 transforms are used with quantization parameters 24 and 36. Quantization parameter 24 represents high quality, and quantization parameter 36 represents low quality.

2D DCTs are selected more often than 1D transforms. One reason for this difference in selection probability is the length of the side information codewords. As mentioned in section 4.3, the side information codewords of the 2D DCTs, the 4x4 1D transforms and the 8x8 1D transforms are 1, 4 and 5 bits long, respectively. Since 1D transforms have longer codewords, they are at a disadvantage relative to the 2D DCTs in the rate-distortion optimization step: a 1D transform is selected only if its distortion is low enough to compensate for its higher bitrate. The other reason is that 1D transforms give successful results only in specific regions.

In high quality videos, 1D transforms are se


