Visual Information Can Be Perceived Differently


Visual information can be perceived differently. All television signals ultimately excite some response in the eye, and the viewer can only describe the result subjectively (Watkinson, 2000). Previous research has shown that the responsiveness of the human eye depends on the age of the subject. Watkinson (2000) also states that in a young person the lens is flexible and muscles distort it to perform the focusing action. In an older person, by contrast, the lens loses flexibility, which causes presbyopia.

Video signals require large bit rates. An uncompressed digital video signal of standard-definition television (SDTV) has a bit rate of 270 Mbit/s. Even when the bit depth is reduced from 10 bits to 8 bits and all blanking samples are removed, a bit rate of 166 Mbit/s remains (Bock, 2009). Even the largest disks available struggle to hold content at such rates, with an average documentary lasting around 30 minutes. However, when a video signal is sampled pixel by pixel there is quite a lot of redundant information. These redundancies are spatial, temporal and statistical.
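A quick calculation makes the scale of the problem concrete. The sketch below (in Python, assuming the 625-line ITU-R BT.601 sampling structure of 720 x 576 luma and two 360 x 576 chroma planes at 25 frames per second) reproduces the figures quoted above:

```python
# Rough SDTV bit-rate arithmetic (assumes 625-line ITU-R BT.601 sampling).

# Full 4:2:2 interface: 27 Msamples/s (13.5 MHz luma + 2 x 6.75 MHz chroma)
interface_rate = 27_000_000 * 10             # 10-bit samples -> 270 Mbit/s

# Active picture only, reduced to 8 bits: 720x576 luma + 2 x 360x576 chroma
active_samples = (720 * 576) + 2 * (360 * 576)
active_rate = active_samples * 25 * 8        # 25 frames/s -> ~166 Mbit/s

# Uncompressed storage needed for a 30-minute documentary
gigabytes = active_rate * 30 * 60 / 8 / 1e9  # ~37 GB

print(f"interface rate: {interface_rate / 1e6:.0f} Mbit/s")
print(f"active rate:    {active_rate / 1e6:.0f} Mbit/s")
print(f"30 minutes:     {gigabytes:.0f} GB uncompressed")
```

At roughly 37 GB for half an hour of material, the need for compression is evident.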

Audio-visual coding systems (MPEG) can help reduce redundancy by discarding unnecessary signal information. Human perception is largely unaffected by the removal of these signals. According to Robinson (1999), the chroma information of a colour television signal is transmitted at half the resolution of the luminance (black/white intensity) information, because our eyes have a lower spatial resolving power for chroma than for luminance. Effectively, half of the chroma information is thrown away, but our eyes fail to perceive any serious degradation in the image quality.
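The half-resolution chroma idea is easy to sketch. The hypothetical helpers below (NumPy assumed) halve the horizontal resolution of a chroma plane by averaging neighbouring samples, then restore the original width by repetition, as a simple decoder would:

```python
import numpy as np

def subsample_chroma(chroma: np.ndarray) -> np.ndarray:
    """Halve horizontal chroma resolution by averaging sample pairs."""
    return (chroma[:, 0::2].astype(np.float64) + chroma[:, 1::2]) / 2.0

def upsample_chroma(half: np.ndarray) -> np.ndarray:
    """Restore full width by sample repetition; the eye barely notices."""
    return np.repeat(half, 2, axis=1)

# Demo on a random 8x8 plane standing in for Cb or Cr.
cb = np.random.randint(0, 256, (8, 8)).astype(np.float64)
cb_half = subsample_chroma(cb)              # half the samples, half the bits
print(cb.shape, cb_half.shape, upsample_chroma(cb_half).shape)
```

Half the chroma samples are discarded outright, yet the reconstructed plane is usually indistinguishable at normal viewing distances.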


2 BACKGROUND

2.1 Introduction

The most exciting aspects of digital video are the tremendous possibilities which were not available with analogue technology. Error correction, compression, motion estimation and interpolation are difficult or impossible in the analogue domain, but are straightforward in the digital domain. Once video is in the digital domain, it becomes data, and only differs from generic data in that it needs to be reproduced with a certain timebase (Watkinson, 1994).

2.2 Human Visual System

2.2.1 Human Perception

The Human Visual System (HVS) has two obvious transducers, namely the eyes, coupled to a series of less obvious but extremely sophisticated processes which take place in the brain (Watkinson, 2008). The human concept of reality resembles a three-dimensional store in the mind. The mind perceives objects as they are whether they are in full vision or not. For example, a bottle of coke placed to the left of the viewer is not in full sight and in peripheral vision appears only as a dark lump; in the mind, however, it is still a bottle of coke with the correct colour. Moving objects do not follow this pattern, and animate objects require more attention. The human eye can see a fast-moving object clearly only when it is tracking that object. For example, when a ball is hit by a tennis player, the relative motion of the ball becomes smaller once the eye is tracking it. Wang, Ostermann and Zhang (2002) state that this phenomenon, in which the eyes automatically move to track observed objects, is known as 'smooth pursuit eye movement'.

2.2.2 The Retina

The retina (fig.1) does not respond instantly to light, but requires between 0.15 and 0.3 seconds before the brain perceives an image (Watkinson, 2008). The human eye accommodates to changes in light. When all components of the eye function correctly, light is converted to impulses and conveyed to the brain, where an image is perceived. The retina is an information-processing machine; located within it are the sensors known as rods and cones. Vision using the rods is monochromatic and sensitive to low light, so rods respond only to luminance. Cones are receptive only at higher light levels and provide sensitivity to coloured (RGB) light. The cones can be divided into three types: S-cones, M-cones and L-cones. A central part of the eye (the fovea) is densely populated with L- and M-cones and directly connected to the nervous system, allowing the highest resolution. Resolution then falls away from the fovea, so the eye must move to scan large areas of detail (Watkinson, 2008). The S-cones are slightly different: they are blue-sensitive and comparatively rare, so some colour vision is fuzzy and inferred rather than directly resolved. Having more S-cones would make the chromatic aberration of the eye more apparent.


Fig. 1 – The Human Eye (educationalelectronicsusa.com)

2.3 Audio Masking

2.3.1 Human Auditory System

Just as the eye works like a camera, the ear works like a microphone. First, air pressure changes cause the eardrum to vibrate. This vibration is carried by the three bones of the ear to the basilar membrane (Waggoner, 2010). The human ear can detect tiny amounts of distortion, and will accept a huge dynamic range (Watkinson, 2002). The ear responds to frequencies from around 20 Hz to 20 kHz. Frequencies below 20 Hz are not heard as tones but are perceived as 'rumble'. Frequencies near 20 kHz are audible only to young people, and this upper limit falls with age. According to Watkinson (2002), the human ear is most sensitive between 2 kHz and 5 kHz. Watkinson also states that although some people can detect 20 kHz at high levels, there is much evidence to suggest that most listeners cannot tell whether the upper frequency limit of a sound is 20 kHz or 16 kHz. Sound at the top of this range is high-pitched and inaudible to humans over a certain age. Most music recordings introduce high-pass and low-pass filters at 20 Hz and 18 kHz respectively to remove unwanted frequencies, although this is subject to genre.
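The band-limiting described above can be sketched with a standard filter design (assuming SciPy is available; the 20 Hz and 18 kHz corner frequencies follow the text, while the sample rate and filter order are arbitrary choices):

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48_000  # sample rate in Hz (assumed)

# Band-pass keeping the useful audio band: a high-pass edge at 20 Hz
# (removes rumble) and a low-pass edge at 18 kHz (removes the near-
# inaudible top of the spectrum).
sos = butter(4, [20, 18_000], btype="bandpass", fs=fs, output="sos")

# Apply to one second of white noise standing in for programme audio.
audio = np.random.randn(fs)
filtered = sosfilt(sos, audio)
```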

2.3.2 Psychoacoustics

Psychoacoustic coding is 'lossy' compression in which an encoder removes information that cannot be heard while preserving the information that can. According to Waggoner (2010), two sounds of nearly the same pitch can sound just like a single note, but louder than either sound would be on its own. This is known as simultaneous masking. A non-masked threshold is the quietest level at which a signal can be perceived in quiet; masked thresholds are the quietest levels at which the signal can be perceived when presented in noise (Proaudiosupport.com, 2013). To put this in context, if the non-masked threshold is 15 dB and the masked threshold is 30 dB, the amount of masking is 15 dB. Temporal masking is quite similar, but this effect occurs when similar sounds are played one after another: the ear cannot detect the second sound, so this information can be removed. Some codecs are designed to remove certain frequencies; normally, the sounds that remain lie roughly between 200 Hz and 2 kHz. One of the more recent codecs that uses this psychoacoustic model is MPEG-4.
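Both effects reduce to simple decibel arithmetic. The sketch below (illustrative values only) shows the two-tone loudness summation Waggoner describes and the masking calculation from the text:

```python
import math

def combined_level_db(l1: float, l2: float) -> float:
    """Sum two sound levels in dB by adding their intensities."""
    return 10 * math.log10(10 ** (l1 / 10) + 10 ** (l2 / 10))

def masking_amount_db(masked: float, unmasked: float) -> float:
    """Masking = rise in audibility threshold caused by the masker."""
    return masked - unmasked

# Two near-identical 60 dB tones fuse into one note about 3 dB louder.
print(round(combined_level_db(60, 60), 1))  # 63.0

# Thresholds of 30 dB (in noise) and 15 dB (in quiet) -> 15 dB of masking.
print(masking_amount_db(30, 15))            # 15
```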


Fig. 2 – The Human Ear (Infj.ulst.ac.uk)

2.4 Audio Visual Compression

2.4.1 Lossy Compression

With lossy compression, the quality of an image may be degraded. This is necessary to reach a target data rate suitable for storage and transmission. To put digital audio over a modem link, or high-definition television through a 6 MHz channel, it is inevitable that the compression process will result in some loss (Symes, 2004). The objective of lossy compression is to achieve the maximum compression with the minimum perceptible loss of quality. These techniques rely on knowledge of how the information will be received by the recipient and are known as perceptual coding systems (Symes, 2004). In multimedia environments such as video conferencing, extreme compression is required; what is sacrificed is chiefly spatial resolution, in both luminance and colour.

2.4.2 Lossless Compression

Some compression techniques are truly lossless (Symes, 2004). In theory this means that data that has been compressed can be decompressed without the loss of any original data. This is achieved by removing redundant data during the compression process; when the information is finally decompressed, it can be recreated exactly from the remaining data. Three types of redundancy are exploited in video sequences: spatial, spectral and temporal.
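Run-length encoding is perhaps the simplest example of removing redundant data reversibly (a toy sketch, not one of the coding tools MPEG actually applies to pixels):

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (value, count) pairs."""
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Rebuild the original data exactly -- nothing is lost."""
    return b"".join(bytes([value]) * count for value, count in runs)

row = bytes([200] * 12 + [30] * 4)          # a highly redundant scan line
encoded = rle_encode(row)                   # [(200, 12), (30, 4)]
assert rle_decode(encoded) == row           # lossless round trip
```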

2.5 Video Redundancy

2.5.1 Spatial Redundancy

Spatial redundancy is the similarity in colour values shared by adjacent pixels. A red sweater in a video frame will generally possess a uniform colour value, with little or no perceptual variation from one pixel to the next. MPEG-1 employs intraframe spatial compression on redundant colour values using DCT (discrete cosine transform) (www.cs.ucf.edu/).
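This pixel-to-pixel similarity can be measured directly. The sketch below (NumPy assumed) correlates each pixel with its right-hand neighbour; a smooth region scores near 1, pure noise near 0:

```python
import numpy as np

def horizontal_correlation(frame: np.ndarray) -> float:
    """Correlation between each pixel and its right-hand neighbour."""
    left = frame[:, :-1].ravel().astype(np.float64)
    right = frame[:, 1:].ravel().astype(np.float64)
    return float(np.corrcoef(left, right)[0, 1])

gradient = np.tile(np.arange(256, dtype=np.float64), (64, 1))  # smooth
noise = np.random.rand(64, 256)                                # random
print(horizontal_correlation(gradient))  # ~1.0: highly redundant
print(horizontal_correlation(noise))     # ~0.0: nothing to exploit
```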

According to Richardson (2002), the output of a CCD array is an analogue video signal, a varying electrical signal that represents a video image. Sampling the signal at a point in time produces a sampled image or frame that has defined values at a set of sampling points. The most common format for a sampled image is a rectangle with the sampling points positioned on a square or rectangular grid.

2.5.2 Temporal Redundancy

Temporal coding is different and allows a higher compression factor, but has the disadvantage that an individual picture may exist only in terms of its differences from a previous picture (Watkinson, 2008). When editing material in this format it is important to be cautious: if a picture is removed by an edit, the difference data will be insufficient to re-create the current picture. Temporal redundancy is the similarity in content between successive video frames; if frames were not redundant, there would be no perception of smooth, realistic motion in video.


Fig.3 – Temporal Redundancy (J.Angus lecture notes)
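The redundancy illustrated in fig.3 can be demonstrated by differencing frames (a minimal sketch with synthetic frames; real encoders code this prediction error rather than each frame anew):

```python
import numpy as np

def mean_abs_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute pixel difference between two frames."""
    return float(np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16))))

# A smooth synthetic frame and the same frame shifted 2 pixels (a slow pan).
frame = np.tile(np.linspace(0, 255, 720).astype(np.uint8), (576, 1))
panned = np.roll(frame, 2, axis=1)
unrelated = np.random.randint(0, 256, frame.shape, dtype=np.uint8)

print(mean_abs_difference(frame, panned))     # small: frames are redundant
print(mean_abs_difference(frame, unrelated))  # large: no redundancy to use
```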

2.5.3 Spectral Redundancy

Spectral redundancy in video is the similarity between the colour components, or "brightness", of a signal. MPEG-1 operates in the YUV colour space: RGB data is converted to YUV, and 24-bit RGB is subsampled to 4:2:0 YCbCr, where Y = luminance (brightness) and Cb, Cr = chrominance (colour difference) (www.cs.ucf.edu/). The human eye differentiates brightness more readily than differences in pure colour value.
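A sketch of the colour-space conversion follows (using the ITU-R BT.601 luma weights, a common choice for SDTV-era material; the full-range offsets here are an assumption):

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) 8-bit RGB image to full-range YCbCr (BT.601)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b             # luminance
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue difference
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red difference
    return np.clip(np.stack([y, cb, cr], axis=-1), 0, 255)

pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)     # pure red
print(rgb_to_ycbcr(pixel))   # Y ~76: red contributes little brightness
```

Once luminance is separated out, the two chrominance planes can be carried at reduced resolution, as in the 4:2:0 subsampling mentioned above.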

2.6 Motion Estimation

Motion estimation creates a model of the current frame based on available data in one or more previously encoded frames ('reference frames') (Richardson, 2002). The purpose of motion estimation in MPEG is not exactly to estimate the motion in regions of the picture, but to find a prediction region that minimises the amount of prediction-error information that needs to be coded. This is achieved using the best estimates of average motion in 16 x 16 macroblocks. However, some video-processing algorithms require accurate motion vectors, where the estimated motion is a good match to the motion perceived by an observer. There are three methods of motion estimation: block matching, pixel-recursive and pyramid.
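Block matching, the approach implied by the 16 x 16 macroblock search above, is straightforward to sketch (NumPy assumed; a real encoder replaces this exhaustive full search with faster search patterns):

```python
import numpy as np

def best_motion_vector(ref, curr, bx, by, block=16, radius=7):
    """Full-search block matching: find the (dy, dx) displacement whose
    block in `ref` minimises the sum of absolute differences (SAD) against
    the macroblock of `curr` whose top-left corner is (by, bx)."""
    target = curr[by:by + block, bx:bx + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            candidate = ref[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(candidate - target).sum())
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(ref, 3, axis=1)                      # scene panned 3 px right
print(best_motion_vector(ref, curr, bx=16, by=16))  # ((0, -3), 0)
```

Coding the vector plus the (here zero) prediction error is far cheaper than coding the macroblock itself.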

2.7 Motion Compensation

Moving objects in a 'natural' video scene are rarely aligned neatly along block boundaries; they are likely to be irregularly shaped, located at arbitrary positions and, in some cases, to change shape between frames (Richardson, 2003). Fig. 4 shows an oval-shaped object moving while a rectangular object remains static. It is difficult to find a good match in the reference frame for the highlighted macroblock, because it covers part of the moving object and part of the static object. Neither of the two matching positions shown in the reference frame is ideal.

Fig. 4 – Motion compensation of arbitrary-shaped moving objects (Richardson, 2003)

3 ANALYSIS OF SPATIAL REDUNDANCY REDUCTION

3.1 Overview

Spatial redundancy reduction is the removal of redundancy between neighbouring pixels within a frame by employing data compressors such as transform coding. The strength of transform coding in achieving data compression is that the image energy of most natural scenes is mainly concentrated in the low-frequency region, and hence in a few transform coefficients (Ghanbari, 2003). The coefficients are then quantised, with the insignificant coefficients discarded without significantly affecting the reconstructed image. However, this process is lossy and the original data cannot be recovered exactly.

Fig.5 – Joint occurrence of a pair of pixels (Ghanbari, 2003)

Fig.5 shows how transform coding can lead to data compression. Pixels x1 and x2 may take any value between 0 (black) and 255 (white). Since there are similarities between them, their joint occurrences are likely to lie around the 45-degree line shown in fig.5. If the x1x2 coordinates are rotated 45 degrees to y1y2, the joint occurrences on the new coordinates have a uniform distribution along the y1 axis but are highly peaked around zero on the y2 axis.
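Numerically, the 45-degree rotation is a tiny orthonormal transform, sketched below with illustrative pixel pairs:

```python
import numpy as np

# Neighbouring pixel pairs: similar values, clustered around x1 = x2.
pairs = np.array([[100, 104], [150, 149], [200, 205], [50, 52]], float)

# Rotate 45 degrees: y1 = (x1 + x2)/sqrt(2) carries almost all the energy
# (a scaled average), while y2 = (x1 - x2)/sqrt(2) peaks around zero.
s = 1 / np.sqrt(2)
rotation = np.array([[s, s], [s, -s]])
rotated = pairs @ rotation.T

print(np.round(rotated[:, 0], 1))  # large, slowly varying values (y1)
print(np.round(rotated[:, 1], 1))  # small values near zero (y2) -> cheap
```

Coding y1 at full precision and y2 coarsely costs far fewer bits than coding x1 and x2 independently, which is exactly the effect the DCT exploits on larger blocks.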

3.2 Discrete Cosine Transformation (DCT)

Two widely used image transforms are the discrete cosine transform (DCT) and the discrete wavelet transform. The DCT is usually applied to small blocks of an image, typically 8 x 8 squares. Since its introduction in 1974, the DCT has become the most popular transform for image and video coding (Richardson, 2003). Richardson (2003) also states that the main reasons for its popularity are that it is effective at transforming image data into a form that is easy to compress, and that it can be implemented efficiently in software and hardware.

The forward DCT converts an image (the spatial domain) into transform coefficients (the transform domain). Fig.6 shows a block of input samples to a transform process. If the samples are repeated in time-reversed order and a discrete Fourier transform is performed on the double-length sample set, a DCT is obtained.

Fig.6 – The full butterfly diagram for an FFT (Fast Fourier Transform) (Watkinson, 2008)

The coefficients of the DCT measure how tall each basis wave is (amplitude) and how quickly it goes from one peak to the next (frequency) (Waggoner, 2010). Because the human eye notices edges and changes, the important parts of the image have the highest amplitudes. However, the higher the amplitude, the less critical its precise value. High frequencies are also visible, but the exact sharpness of an edge is less important.
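The energy-compaction effect is easy to verify on an 8 x 8 block (assuming SciPy, whose `scipy.fft.dctn` applies the transform separably along both axes):

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth 8x8 block: a horizontal ramp, typical of natural image content.
block = np.tile(np.arange(8, dtype=np.float64) * 16, (8, 1))

coeffs = dctn(block, norm="ortho")   # spatial domain -> transform domain

# Energy is concentrated in a few low-frequency coefficients of the top row.
print(np.round(coeffs[0, :4], 1))
print(np.count_nonzero(np.round(coeffs)))   # only a handful are non-zero

# The inverse transform restores the block exactly (loss comes later, from
# quantizing the coefficients, not from the transform itself).
assert np.allclose(idctn(coeffs, norm="ortho"), block)
```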

3.3 Quantization

In video and audio the values to be quantized are infinitely variable voltages from an analogue source (Watkinson, 2008). Quantization brings two benefits during the compression process. Firstly, if the quantization process is correctly designed, visually significant coefficients are retained and unnecessary coefficients are discarded. Secondly, a sparse matrix containing a limited set of discrete levels can be compressed significantly. However, this can have a detrimental effect on image quality, because the reconstructed coefficients are not identical to the original coefficients and the decoded image therefore differs from the original. It must also be noted that the amount of compression and the loss of image quality depend on the number of quantization levels used. Many levels mean that coefficient precision is only slightly reduced and compression is low; few levels mean a significant reduction in coefficient precision with high compression.

Fig. 7 shows the resultant uniform probability density.

Fig. 7 – In (a) an arbitrary signal is represented
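A sketch of uniform quantization of transform coefficients shows the trade-off directly (the step sizes are illustrative; a coarse step corresponds to few quantization levels, and real codecs vary the step per coefficient):

```python
import numpy as np

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Map each coefficient to the nearest multiple of `step` (integer level)."""
    return np.round(coeffs / step).astype(np.int32)

def dequantize(levels: np.ndarray, step: float) -> np.ndarray:
    """Reconstruct approximate coefficients; the rounding error is the loss."""
    return levels.astype(np.float64) * step

coeffs = np.array([448.0, -61.2, 7.9, -3.1, 0.8, -0.4, 0.2, -0.1])

for step in (2, 16):                          # fine step vs. coarse step
    levels = quantize(coeffs, step)
    error = np.abs(dequantize(levels, step) - coeffs).max()
    print(step, levels, f"max error {error:.2f}")
# The coarse step zeroes most coefficients (better compression, more loss).
```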


