Sender Based Repair Techniques Computer Science Essay


02 Nov 2017


Voice over IP (VoIP) is a method of forwarding voice as packets over an IP network using the Internet Protocol (IP) (Kumar 2006, p.1). Sending voice over a packet-switched network has advantages such as cost savings and improved services (Shavit 2007), but the quality of the transmitted voice has not always been competitive. Packet switching offers only ‘best effort’ performance (Huston 2001): the network forwards every packet as fast as it can, gives no packet preferential treatment, and does not guarantee that all packets are delivered (Huston 2001). According to Hardman et al. (1995), ‘packet loss is a persistent problem and the end-to-end delay is also a critical factor.’ Jitter is a further problem in Voice over IP, and Mehta (2005) found that the codec used to convert the analog signal into digital form must also be considered.

2.1 Packet loss and discards

Packet loss greatly degrades voice quality and, unfortunately, is very common. Studies identify the following sources of packet loss:

Congestion of routers and gateways (Hardman et al. 1995), (Mehta 2005, p.1). When too many packets are sent at once, the router cannot handle all of them and the packets start to queue. Because the buffer size is limited, the buffer eventually overflows and the router discards the arriving packets.

The Internet routes packets one at a time, so some packets may be delayed in transit (Hardman et al. 1995), (Maheswari 2009, p.120). A packet that arrives too late can be useless to the application, and the receiver then rejects it (Mehta 2005, p.1).

Bit errors are another source of loss. Some bits may be altered while a packet travels from one place to another (Sgarson 2007). A checksum computed over the bits of the packet is attached to it. When the packet is received, the receiving node recomputes the checksum from the packet contents and compares it with the attached value. If the two differ, the packet is discarded.

Packets are also lost when the time-to-live (TTL) in the packet header expires. For this reason, even an infinite buffer does not eliminate packet loss, as stated by Nagle (1987).
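The checksum test described above for bit errors can be illustrated with a short sketch. It assumes a 16-bit one's-complement sum of the kind used in IP headers; the essay does not name a specific checksum, and the function names and payload are purely illustrative.

```python
def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (data is zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF


def is_intact(payload: bytes, checksum: int) -> bool:
    """Recompute the checksum at the receiver and compare with the attached one."""
    return ones_complement_checksum(payload) == checksum


payload = b"voice frame 0001"                # hypothetical packet contents
checksum = ones_complement_checksum(payload)  # attached by the sender

corrupted = bytearray(payload)
corrupted[3] ^= 0x04                          # a single bit flips in transit

print(is_intact(payload, checksum))           # undamaged packet is accepted
print(is_intact(bytes(corrupted), checksum))  # damaged packet is discarded
```

A checksum of this kind only detects errors; it cannot correct them, which is why the damaged packet is simply dropped.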

2.2 Types of delays

Packets are sent to their destination via a series of intermediate nodes. The sum of all delays experienced by a packet on its way to the destination is called the end-to-end delay. Brady (1971) found that, if conversation patterns are not to break down, the end-to-end delay should be kept below 600 ms in the absence of echoes. The packet size directly affects the end-to-end delay.

Some delays are relatively fixed, such as those of the coding and decoding algorithms, while others depend on the network conditions (Kostas et al. 1998, p.21).

Processing delay: the time a routing device takes to take a packet in and forward it (Ramaswamy 2004). Information takes a certain amount of time to cross the network and reach the other end, and part of that time is spent in each router, which must examine the packet’s header, select the best path, direct the packet along it and manage the data flow.

Queuing/Buffering delay: the time a packet spends in a buffer before it is transmitted onto the link (Ramaswamy 2004). When many packets arrive at once, the router cannot deal with all of them together, so it assigns priorities and forms a queue. The packets wait until the router processes them.

Propagation delay: the time taken for the signal to travel along the wire (Ramaswamy 2004).

Transmission delay: the time a router takes to put all the bits of the packet onto the wire (Ramaswamy 2004).

Packetization delay: time taken for the encoded information to be placed in a packet. This delay is also called accumulation delay since the voice accumulates in a buffer before being released (Cisco 2006). Packetization delay depends on the block size and the number of blocks.
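The delay components listed above add up to the end-to-end delay. The sketch below works through one example; all numbers (10 ms blocks, 200-byte packets, a 1 Mbit/s link, 1000 km of cable, a signal speed of about 200,000 km/s, and the fixed processing and queuing figures) are assumptions chosen for illustration, not measurements from the essay.

```python
def packetization_delay_ms(block_ms: float, blocks_per_packet: int) -> float:
    """Time spent accumulating voice blocks before the packet is released."""
    return block_ms * blocks_per_packet


def transmission_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time needed to clock all bits of the packet onto the link."""
    return packet_bytes * 8 / link_bps * 1000.0


def propagation_delay_ms(distance_km: float, speed_km_per_s: float = 200_000.0) -> float:
    """Time the signal needs to travel the length of the medium (assumed speed)."""
    return distance_km / speed_km_per_s * 1000.0


delays_ms = {
    "packetization": packetization_delay_ms(10.0, 2),
    "processing": 1.0,   # assumed per-path total
    "queuing": 4.0,      # assumed; varies with network load
    "transmission": transmission_delay_ms(200, 1_000_000),
    "propagation": propagation_delay_ms(1000.0),
}
end_to_end_ms = sum(delays_ms.values())

for name, value in delays_ms.items():
    print(f"{name:>14}: {value:6.2f} ms")
print(f"{'end-to-end':>14}: {end_to_end_ms:6.2f} ms")
```

In this example the packetization delay dominates, which is consistent with the remark that the packet size directly affects the end-to-end delay.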

2.3 Jitter

Jitter is the variation in the delay with which packets arrive, and it is present only in packet-based networks (Kumar 2006, p.5). Packets are expected to reach their destination at regular intervals, but they may experience different delays. Daniel et al. (2003) examined the characteristics and causes of network delay jitter and developed a model for simulating it. In that study, the main cause of jitter is the queuing delay experienced by the packets at the nodes. In addition, when the IP network is congested, packets may take different paths to their destination, which can also cause delay jitter. Jitter buffers are used to reduce this effect: the buffer introduces a small amount of delay so that the timing variations are smoothed out. However, the buffering itself adds to the end-to-end delay (Daniel et al. 2003, p.1738).
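The trade-off between jitter buffering and delay can be sketched with a fixed playout buffer. This is a simplified model under stated assumptions: packets are sent every 20 ms, the network delays are made-up values, and the playout clock is set by the first packet's arrival plus the buffer depth. The function name is hypothetical.

```python
def playout_schedule(send_times_ms, network_delays_ms, buffer_ms):
    """Split packet indices into (played, late) for a fixed jitter buffer.

    The first packet is played buffer_ms after it arrives; every later
    packet is due at the same offset from its own send time.
    """
    arrivals = [s + d for s, d in zip(send_times_ms, network_delays_ms)]
    offset = arrivals[0] + buffer_ms - send_times_ms[0]
    played, late = [], []
    for i, (sent, arrived) in enumerate(zip(send_times_ms, arrivals)):
        deadline = sent + offset
        (played if arrived <= deadline else late).append(i)
    return played, late


send = [0, 20, 40, 60, 80]        # one packet every 20 ms
delays = [30, 35, 70, 32, 45]     # assumed per-packet network delay in ms

for depth in (0, 20, 40):
    played, late = playout_schedule(send, delays, depth)
    print(f"buffer {depth:2d} ms: played {played}, late {late}")
```

With no buffer, every packet delayed more than the first one misses its playout time; a 40 ms buffer absorbs all the variation in this example, at the cost of 40 ms of extra end-to-end delay.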

2.4 Packet Recovery Techniques

‘Packet loss is unavoidable’, as stated by Kostas et al. (1998, p.21), but it can be controlled so that better speech quality is obtained. Many methods have been proposed, and they fall into two groups:

Sender-based

Receiver-based

2.4.1 Receiver-based techniques

Receiver-based techniques act at the receiver only. They perform error concealment, in which only an estimate of the missing packet is obtained (Ofir 2006, p.18). There are three main categories of receiver-based repair: insertion-based, interpolation-based and regeneration-based schemes.

Figure 2.1: Receiver based schemes (Hardman et al. 1998, p.44)

2.4.1.1 Insertion-based schemes

In this type of scheme, a ‘fill-in’ frame is inserted in place of a lost packet. The technique is easy to implement because the characteristics of the signal are not used to aid reconstruction, but the resulting quality is usually poor (Hardman et al. 1998, p.45).

Splicing

In splicing, the lost packet is replaced by a zero-length fill-in, that is, the packets on either side of the loss are joined together. No gap is left, but the timing of the data is changed (Hardman et al. 1998, p.45). A study by Gruber and Strawczynski (1985) shows that splicing performs poorly and is usable only at very low loss rates (<3%).

Silence substitution

In this technique, the samples of a lost packet are replaced by the value ‘0’. The gap left by the missing packet is filled with silence so that the timing relationship between the neighbouring packets is maintained (Hardman et al. 1998, p.45). The method is widely used because it is very simple to implement. However, as the packet size and loss rate increase, the result of silence substitution gets worse. Its performance is good only for short packet lengths (< 4 ms) and low loss rates (< 2 %) (Jayant and Christenssen 1981).

Noise substitution

Noise can be used in place of a lost packet. Background noise, usually additive white Gaussian noise, is inserted in the gap left by the missing packet. Warren (1982) investigated the human perception of interrupted speech and showed that the human brain can subconsciously repair a missing speech segment when the gap is filled with noise, but not when it is filled with silence. This effect is known as ‘phonemic restoration’. With noise substitution, the speech quality appears better (Miller and Licklider 1950) and intelligibility improves (Warren 1982). The timing relationship is still preserved. Moreover, during silent periods the sender can send ‘comfort noise’ for the lost packet. Noise substitution is therefore usually recommended over silence substitution (Hardman et al. 1998, p.45).

Packet Repetition

Packet repetition replaces the missing packet with a copy of the packet that arrived just before the loss. It performs well and has low complexity. The quality can be improved by fading the repeated units, that is, gradually decreasing the signal amplitude to zero. Packet repetition with fading is a step towards the interpolation techniques (Hardman et al. 1998, p.45).
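The four insertion-based schemes described in this subsection can be compared side by side in a small sketch. It is a toy model under stated assumptions: each packet is a short list of samples, a lost packet is marked with None, the noise level (standard deviation 0.01) is arbitrary, and fading is omitted. The function name `conceal` is hypothetical.

```python
import random


def conceal(frames, scheme, frame_len=4, seed=0):
    """Rebuild a sample stream from frames, where None marks a lost packet."""
    rng = random.Random(seed)
    out = []
    last_good = [0.0] * frame_len          # used if the very first packet is lost
    for frame in frames:
        if frame is not None:
            out.extend(frame)
            last_good = frame
        elif scheme == "splice":
            continue                       # zero-length fill-in: timing shifts
        elif scheme == "silence":
            out.extend([0.0] * frame_len)  # gap filled with zeros
        elif scheme == "noise":
            out.extend(rng.gauss(0.0, 0.01) for _ in range(frame_len))
        elif scheme == "repeat":
            out.extend(last_good)          # copy of the previous good packet
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return out


frames = [[1.0, 2.0, 3.0, 4.0], None, [5.0, 6.0, 7.0, 8.0]]
for scheme in ("splice", "silence", "noise", "repeat"):
    print(scheme, conceal(frames, scheme))
```

Only splicing shortens the output, which is the change in timing noted above; the other three schemes keep the stream length and therefore the timing relationship between neighbouring packets.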

2.4.1.2 Interpolation-based schemes

There are several interpolation-based methods. They interpolate from the packets surrounding a loss to produce a substitute for the missing packet (Hardman et al. 1998, p.45). Their key advantage over insertion-based techniques is that the changing characteristics of the signal are taken into account in the replacement, but they are more difficult to implement (Hardman et al. 1998, p.45).

Waveform Substitution

In this method, the audio before the loss, and optionally after it, is used to find a suitable signal to replace the missing packet. A segment taken from the correctly received speech fills in for the lost packet. Goodman et al. (1986) studied waveform substitution and found the sound quality to be better than with silence substitution or packet repetition.

Pitch Waveform Replication

Wasem et al. (1988) used a pitch detection algorithm. In this scheme, the waveform is continuously searched for positive and negative peaks, which give an approximation of the pitch. Pitch waveform replication has been found to give better results than waveform substitution.

Time Scale Modification

Time scale modification stretches the audio signal across the gap left by the missing packet. Sanneck et al. (1996) presented a scheme in which overlapping vectors of pitch cycles on each side of the loss are offset to cover the loss and averaged where they overlap. Time scale modification is computationally heavier, but the result is better than with waveform substitution and pitch waveform replication (Hardman et al. 1998, p.45).

2.4.1.3 Regeneration-Based schemes

Regeneration-based schemes use knowledge of the audio compression algorithm to derive codec parameters from which a replacement waveform is produced. The result is expected to be good because a lot of information is used in the repair, but these schemes are rarely used because they are difficult to implement (Hardman et al. 1998, p.46). There are two types:

Interpolation of transmitted state

The decoder can interpolate the state that the codec should be in. The reproduced signal gradually fades as more packets are lost. This method is not simple to implement (Hardman et al. 1998, p.46).

Model-Based Recovery

In this technique, speech is regenerated from a model to fill the missing segment and cover the loss (Hardman et al. 1998, p.46).

2.4.2 Sender-Based Repair techniques

Figure 2.2: Taxonomy of sender-based repair techniques (Hardman et al. 1998, p.41).

The Sender-based Repair technique adds a certain amount of redundancy to the information which is later used by the receiver (Floros et al. 2008). The sender-based repair can be divided into two parts:

Active retransmission

Passive channel coding

Retransmission is simple to work with. The retransmitted copy need not be the original data: the speech can be re-encoded at a lower bandwidth, depending on how much overhead is acceptable. However, retransmission adds to the communication latency. It also requires extra information such as packet sequence numbers and acknowledgements, which increases the overhead (Hardman et al. 1998, p.44).

2.5 Forward Error Correction (FEC)

A common example of passive channel coding is forward error correction (FEC). FEC techniques are generally based on error detection and correction. In FEC, a controlled number of redundant packets is transmitted together with the original packets, so that errors can be detected and corrected without retransmitting the message (Electronic Design 2000). There are many FEC algorithms, including the Hamming code, the Bose-Chaudhuri-Hocquenghem (BCH) code and the Reed-Solomon code. The data transmission channel can greatly affect a code’s performance (Electronic Design 2000). FEC can be media-independent, that is, it does not depend on the contents of the information. However, FEC schemes introduce additional delay and use more bandwidth. Media-specific FEC transmits each unit of speech in several different packets, so that if a packet is lost, another packet carrying the same unit can be used instead (Hardman et al. 1998), (Bolot and Garcia 1997). Error-correcting codes, though more complex than error-detecting codes, usually receive more attention in communication applications.
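One of the simplest media-independent FEC schemes is a single XOR parity packet sent after each group of data packets. The sketch below is offered only as an illustration of the idea of redundant packets; it is not one of the codes evaluated in this essay, it assumes equal-length packets, and it can rebuild at most one lost packet per group. Function names are hypothetical.

```python
def xor_parity(packets):
    """Build one parity packet as the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single packet marked None from the others and the parity."""
    missing = bytearray(parity)
    for pkt in received:
        if pkt is not None:
            for i, byte in enumerate(pkt):
                missing[i] ^= byte
    return bytes(missing)


group = [b"AAAA", b"BBBB", b"CCCC"]        # three hypothetical voice packets
parity = xor_parity(group)                 # fourth, redundant packet

received = [group[0], None, group[2]]      # the second packet is lost
print(recover_missing(received, parity))   # the lost packet is rebuilt
```

The cost is visible in the example: four packets are sent for three packets of speech, and the receiver must wait for the whole group before it can repair a loss, which is the additional bandwidth and delay mentioned above.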

FEC codes can be divided into:

Block codes

Convolutional codes

Block codes

The binary information sequence is divided into message blocks of fixed length. Each message block contains k information bits, which are encoded into a codeword of n digits by attaching n - k parity bits to the information bits. Linear block codes are characterised by the notation (n, k), which denotes a block code of length n with $2^k$ code words (Sklar 2001). The Reed-Solomon code is the best-known block code. Block codes are usually chosen for applications that need high speed.
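As a concrete instance of an (n, k) block code, the sketch below implements the (7, 4) Hamming code mentioned earlier: k = 4 information bits receive n - k = 3 parity bits, giving $2^4 = 16$ code words. The bit layout (parity bits at positions 1, 2 and 4) is the conventional one and is an assumption here, since the essay does not specify a layout.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one bit error and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1          # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]


message = [1, 0, 1, 1]
codeword = hamming74_encode(message)
damaged = list(codeword)
damaged[5] ^= 1                             # one bit error on the channel
print(codeword, damaged, hamming74_decode(damaged))
```

The three parity checks together spell out the position of a single error, so the decoder can correct it without any retransmission.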

2.5.1 Reed-Solomon codes

2.5.1.1 Overview and Properties

Reed-Solomon codes are non-binary cyclic error-correcting codes and belong to the class of optimal erasure codes. Since their discovery in 1959 by Irving Reed and Gus Solomon, Reed-Solomon codes have become an essential part of many wireless communication applications, satellite communication, storage devices, digital television and broadband modems (ADSL) (Chua and Pheanis 2006).

Reed-Solomon codes add redundant information to the original data. After transmission, the encoded data may contain errors. The decoder detects where the errors lie in the received data and corrects them with the help of the redundant information. The amount of redundancy matters, since it determines how many errors can be corrected.

An encoded block contains n code symbols in total, made up of k information symbols and r = n - k parity symbols. A Reed-Solomon code is represented by the notation (n, k). The code has n symbols, each consisting of m bits. The number of information symbols, k, is also known as the dimension of the code.

$(n, k) = (2^m - 1,\; 2^m - 1 - 2t)$ [2.1]

The difference (n - k), which is the number of parity symbols, is also written 2t. The Reed-Solomon decoder can correct up to t = (n-k)/2 symbols (Sylvester 2001).

The number of symbols that can be corrected is only half the number of parity symbols, because one parity symbol is needed to locate each error and another to correct it. One useful characteristic of the Reed-Solomon code is that information symbols can be added to an RS code of length n without decreasing its minimum distance (Sklar 2001). The minimum distance is given as:

$d_{min} = n - k + 1$ [2.2]

One property of Reed-Solomon codes is that they can correct burst errors (Agrawal 2010-2011, p.33). These errors are caused by fading in the communication channel. Because the code works on symbols, a symbol with a single bit error and a symbol with all of its bits in error each count as one symbol error. In the case of an erasure, the position of the error is already known, so only one parity symbol is needed to correct it. According to Agrawal (2010-2011), the ability to correct burst errors and erasures makes the Reed-Solomon code the best option for encoding and decoding.

The Reed-Solomon code is a good choice when long blocks need to be transmitted, because the error performance improves as the code block size increases. As the amount of redundancy increases, the code rate decreases and the error-correcting capability increases (Sklar 2001).
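The relations between the Reed-Solomon parameters in this section can be checked with a few lines of arithmetic. The sketch assumes the common full-length relation n = 2^m - 1; the function name and the example values (m = 8, t = 16) are illustrative and are not taken from the essay.

```python
def rs_parameters(m: int, t: int) -> dict:
    """Derive Reed-Solomon code parameters from symbol size m and capability t."""
    n = 2 ** m - 1                 # code symbols per block (assumed full length)
    k = n - 2 * t                  # information symbols
    return {
        "n": n,
        "k": k,
        "parity_symbols": n - k,   # equals 2t
        "correctable_errors": t,   # two parity symbols per unknown error
        "correctable_erasures": n - k,  # one parity symbol per known position
        "d_min": n - k + 1,
        "code_rate": k / n,
    }


params = rs_parameters(m=8, t=16)
for name, value in params.items():
    print(f"{name}: {value}")
```

For m = 8 and t = 16 this gives the (255, 223) code: 32 parity symbols, up to 16 symbol errors or 32 erasures corrected per block, and a minimum distance of 33.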

2.5.1.2 Galois field

Encoding and decoding with Reed-Solomon codes is based on a kind of field called a Galois field. A field is a set of elements on which addition, subtraction, multiplication and division (except by zero) are defined and obey the laws of commutativity, distributivity and associativity (Oz and Naor 2004, p.3). Galois fields are finite, so each element can be represented by a fixed-length binary word. A Galois field GF(p) contains p elements.

The Galois field can be extended to GF($p^m$), where m is a positive integer (Sklar 2001). Each non-zero element of the field is generated by a polynomial, and different polynomials give different representations of the field. Reed-Solomon codes use a primitive polynomial, which defines the finite field GF($2^m$). A polynomial is normally written from the low-order term to the high-order term.
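How a primitive polynomial generates the field elements can be shown for the small field GF($2^3$). The sketch assumes the primitive polynomial $1 + X + X^3$, a common textbook choice, and stores each element as an integer whose bit i is the coefficient of $X^i$. The function names are hypothetical.

```python
def build_gf(m: int, prim_poly: int):
    """Return (exp, log) tables for GF(2^m) built from a primitive polynomial.

    exp[i] is the element alpha^i; log[x] is the exponent i with alpha^i == x.
    """
    size = 1 << m
    exp = [0] * (size - 1)
    log = [None] * size            # log of 0 is undefined
    x = 1
    for i in range(size - 1):
        exp[i] = x
        log[x] = i
        x <<= 1                    # multiply by alpha
        if x & size:               # degree reached m: reduce modulo prim_poly
            x ^= prim_poly
    return exp, log


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^m) is bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int, exp, log) -> int:
    """Multiply by adding exponents of alpha modulo 2^m - 1."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % len(exp)]


exp, log = build_gf(3, 0b1011)     # 0b1011 encodes 1 + X + X^3
print(exp)                         # successive powers of alpha
print(gf_mul(3, 6, exp, log))      # alpha^3 * alpha^4 = alpha^7 = 1
```

The seven powers of alpha run through every non-zero element exactly once, which is what makes the polynomial primitive.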

2.5.2 Convolutional codes

A convolutional code adds redundant bits to the data. Its notation is (n, k, K). The ratio k/n is known as the code rate. The integer K is the constraint length and represents the number of k-bit stages in the shift register. In a convolutional encoder, the input sequence of k-bit information is passed through the shift register. The outputs of the register stages are combined and sampled to form the binary code symbols, which are then transmitted. The original information sequence can be recovered if the decoder knows the encoder’s state sequence. An important feature of the convolutional encoder is that it has memory: the n encoder outputs depend not only on the current k input bits but also on the previous K-1 inputs (Sklar 2001, p.383).

Figure 2.3: Convolutional encoder (Sklar 2001)
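A minimal sketch of such an encoder is given below for a rate 1/2 code with constraint length K = 3. The generator connections 111 and 101 are a common textbook choice and are an assumption here; the essay does not state which generators it uses. K - 1 zero bits are appended to flush the shift register.

```python
def conv_encode(bits, generators=(0b111, 0b101), K=3):
    """Rate 1/len(generators) convolutional encoder with K-1 flush bits."""
    state = 0                                  # the K-1 most recent input bits
    out = []
    for bit in list(bits) + [0] * (K - 1):
        register = (bit << (K - 1)) | state    # newest bit enters at the top
        for g in generators:
            # each output is the modulo-2 sum of the tapped register stages
            out.append(bin(register & g).count("1") % 2)
        state = register >> 1                  # shift: the oldest bit drops out
    return out


message = [1, 0, 1]
print(conv_encode(message))   # two output bits per input bit, plus flush
```

Each output pair depends on the current input bit and the two bits before it, which is the memory property described above.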

Decoding of Convolutional codes using Viterbi algorithm

The Viterbi algorithm reconstructs the maximum-likelihood path through the trellis. At each time t, the distance between the received signal and every trellis path entering each state is calculated, and only the closest path into each state is kept. Discarding the other paths keeps the decoding load low (Sklar 2001, p.401).
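A compact hard-decision version of the algorithm is sketched below, using Hamming distance as the path metric. It assumes the same rate 1/2, K = 3 code with generators 111 and 101 as a typical textbook example, and a message terminated by K - 1 zero flush bits; none of these choices is confirmed by the essay.

```python
def encode_step(bit, state, generators, K):
    """One encoder step: return (output bits, next state)."""
    register = (bit << (K - 1)) | state
    out = tuple(bin(register & g).count("1") % 2 for g in generators)
    return out, register >> 1


def conv_encode(bits, generators=(0b111, 0b101), K=3):
    state, out = 0, []
    for bit in list(bits) + [0] * (K - 1):     # flush bits return to state 0
        symbol, state = encode_step(bit, state, generators, K)
        out.extend(symbol)
    return out


def viterbi_decode(received, generators=(0b111, 0b101), K=3):
    """Hard-decision Viterbi decoder; returns the most likely message bits."""
    n = len(generators)
    n_states = 1 << (K - 1)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)     # the encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for t in range(0, len(received), n):
        symbol = received[t:t + n]
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue                       # state not reachable yet
            for bit in (0, 1):
                out, nxt = encode_step(bit, state, generators, K)
                distance = sum(o != r for o, r in zip(out, symbol))
                candidate = metrics[state] + distance
                if candidate < new_metrics[nxt]:   # keep only the survivor
                    new_metrics[nxt] = candidate
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    return paths[0][:-(K - 1)]                 # end in state 0, drop flush bits


message = [1, 0, 1]
sent = conv_encode(message)
noisy = list(sent)
noisy[3] ^= 1                                  # one channel bit error
print(viterbi_decode(noisy))
```

Only one survivor path per state is stored at each step, so the work grows with the number of states rather than with the number of possible messages.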

2.6 Theoretical Framework

In this section, some theories of digital communication will be reviewed.

Digital voice communication

Voiced and unvoiced sounds

This part discusses how speech signals are produced. Depending on the input excitation, speech production can be grouped into three cases (Sakshat Virtual Labs ‘no date’):

The input excitation is a quasi-periodic pulse train.

The input excitation is noise-like in nature.

There is no excitation.

Voiced speech occurs when the input excitation is almost periodic. Voiced sounds are formed by oscillatory vibrations of the vocal cords: the vocal folds periodically interrupt the air blown out of the lungs through the trachea, which produces the glottal wave (Sakshat Virtual Labs). The spectrum of voiced speech contains a fundamental frequency and its harmonics. This harmonic structure shows up as frequency components repeated at regular intervals.


Figure 2.4: Block diagram representation of voiced speech production (Sakshat Virtual Labs)

The duration of each cycle is known as the fundamental period (T0). The fundamental frequency (F0) of the input excitation is called the pitch frequency, and it is one of the most essential factors of the voice source (Deng and Dang ‘no date’, p.15), (Shue 2010, p.3).

The pitch depends on the intensity and composition. It is much higher in female and children’s voices: about 200 Hz for an average female voice and 200-300 Hz for children, whereas for men it is usually around 100 Hz (Cassidy 2002).

In unvoiced speech, air is forced through an obstruction in the vocal tract, resulting in turbulence, and the sound produced is usually represented by a noise source. The excitation signal has neither a fundamental frequency nor a harmonic structure (Lemmetty 1999), and its spectrum is relatively flat. This is how voiced and unvoiced speech can be distinguished.


Figure 2.5: Block diagram representation of unvoiced speech production (Sakshat Virtual Labs)

Voiced and unvoiced speech are produced in sequence and are separated by silence regions, in which there is no speech output. Silence is nevertheless important, since it makes the speech clearer and allows the information in the speech to be identified.

In analog telephony, the voice transmission spectrum is nominally 4 kHz wide. For digital telecommunication, the signal is sampled at 8 kHz, that is, at twice that bandwidth (Mitra 2001, p.3).

PCM Communication

Figure 2.6: PCM Communication (block diagram: input signal, sampling, PAM signal, quantization, encoder, PCM signal)

Figure 2.6 shows the steps required for PCM communication. Pulse Code Modulation (PCM) is used to convert analog signals into digital form (Waggener 1995). The analog input signal is first passed through a low-pass filter with a certain cutoff frequency, which blocks all frequency components above that frequency. The signal is then sampled to produce a Pulse Amplitude Modulated (PAM) signal. Sampling is the process of obtaining the values of the filtered input signal at discrete time intervals, that is, at a constant sampling frequency (Wells 2001). The minimum sampling rate ($f_s$), equal to twice the highest frequency $f_{max}$ present in the signal, is known as the Nyquist rate. The sampling frequency should be at or above the Nyquist rate so that there are enough samples to represent the analog waveform; otherwise aliasing occurs (Hadi ‘no date’, p.8).

$f_s \geq 2 f_{max}$ [2.3]
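The sampling condition can be illustrated numerically. The sketch below computes the Nyquist rate for a given bandwidth and the apparent (aliased) frequency of a tone after sampling; the 4 kHz bandwidth and 8 kHz sampling rate are the telephony figures quoted earlier, while the test tones and function names are assumptions for illustration.

```python
def nyquist_rate(f_max_hz: float) -> float:
    """Minimum sampling rate for a signal band-limited to f_max_hz."""
    return 2 * f_max_hz


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)   # spectrum folds about fs/2


fs = nyquist_rate(4000)                # telephone-band speech
print(fs)                              # required sampling rate in Hz
print(alias_frequency(3000, fs))       # below fs/2: represented correctly
print(alias_frequency(5000, fs))       # above fs/2: folds back into the band
```

A 5 kHz tone sampled at 8 kHz becomes indistinguishable from a 3 kHz tone, which is why the low-pass filter must remove such components before sampling.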

The PAM signal is continuous in amplitude and discrete in time, and it must still be converted to digital form. Each sample is allocated a discrete value from a range of possible values, the size of which depends on the number of bits used to represent each sample. This process is called quantization. Each sample is assigned to the quantization level nearest to its value. The quantization noise, or quantization error, is the difference between the original sample and the discrete value assigned to it. It can be reduced by increasing the number of quantization levels. When the quantization noise increases, the signal-to-noise ratio of the signal decreases (Wells 2001).

Quantization can be uniform or non-uniform. In uniform quantization, the quantization levels are uniformly spaced. The quantization noise is the same for all magnitudes, since the noise depends on the step size.
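The effect of the number of quantization levels on the noise can be sketched with a simple uniform quantizer. The mid-rise design, the input range of -1 to 1 and the sine test signal are all assumptions made for illustration.

```python
import math


def quantize_uniform(x: float, n_bits: int, x_max: float = 1.0) -> float:
    """Map x in [-x_max, x_max] to the centre of its uniform quantization cell."""
    levels = 2 ** n_bits
    step = 2 * x_max / levels
    index = int((x + x_max) // step)
    index = min(levels - 1, max(0, index))     # clip to the valid range
    return -x_max + (index + 0.5) * step


def sqnr_db(samples, n_bits: int) -> float:
    """Signal-to-quantization-noise ratio in dB for the given samples."""
    signal = sum(s * s for s in samples)
    noise = sum((s - quantize_uniform(s, n_bits)) ** 2 for s in samples)
    return 10 * math.log10(signal / noise)


samples = [math.sin(0.05 * i) for i in range(2000)]   # assumed test signal
for bits in (4, 8):
    print(f"{bits} bits: SQNR = {sqnr_db(samples, bits):.1f} dB")
```

The error of any sample is at most half a step, whatever its magnitude, and each additional bit halves the step size, which raises the signal-to-noise ratio by roughly 6 dB.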

A non-uniform quantization process is also known as companding. In non-uniform quantization, the step size varies, so the quantization noise is proportional to the signal size (Hadi, p.30). Noise is reduced for the weak signals that predominate, but it increases for the rarely occurring strong signals (Hadi, p.30).

Compressing the signal to be transmitted at the transmission side and expanding it at the receiving side forms the companding process. There are two companding schemes (Wells 2001) namely:

µ-law companding (used in North America)

A-law companding (used in Europe)
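The compress-then-expand idea can be sketched with the continuous µ-law characteristic, $F(x) = \mathrm{sgn}(x)\,\ln(1 + \mu|x|)/\ln(1 + \mu)$ with µ = 255. This is the idealised curve for inputs normalised to the range -1 to 1, not the segmented 8-bit encoding used in practice, and the function names are hypothetical.

```python
import math

MU = 255.0   # value used for mu-law in North American telephony


def mu_compress(x: float, mu: float = MU) -> float:
    """Compress a sample in [-1, 1] at the transmitting side."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y: float, mu: float = MU) -> float:
    """Invert the compression at the receiving side."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_compress(x)
    print(f"x = {x:4.2f} -> compressed {y:.4f} -> expanded {mu_expand(y):.4f}")
```

A weak sample at 1% of full scale is mapped to roughly 23% of the compressed range, so a uniform quantizer applied after compression spends many more levels on weak signals than on strong ones.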

Speech coding

Speech coding is the process of compressing voice signals for efficient transmission. A coding algorithm is used to minimize the bit rate of the digital representation of a signal without significant loss of signal quality. A speech coder converts digital speech into a coded representation, and a speech decoder reconstructs the speech (Johnson and Alwan 2003, p.1). Speech coders differ in bit rate, delay, complexity and perceptual quality of the speech (Johnson and Alwan 2003, p.1). A good speech coder uses a low bit rate to represent the speech while preserving good speech quality. Speech coders can process speech in blocks, but this causes a communication delay. There are two main speech coding techniques:

Waveform coding

Waveform coding tries to reproduce the speech waveform as closely as possible (Johnson and Alwan 2003, p.1). This type of coding gives good speech quality at high bit rates (Atal and Jayant 1996).

Vocoders

Vocoders keep only the spectral properties of the speech. Even at lower bit rates, clear speech can be produced (Atal and Jayant 1996).

Speech coders are used in cellular communication, videoconferencing and voice over IP.

Gilbert Loss Model

Figure 2.7: Gilbert Model

The Gilbert loss model, also known as the 2-state Markov chain model, is used to model bursty packet loss. It is simple and widely accepted for voice over IP. The network is modeled with two states. State ‘1’ represents a packet loss and state ‘0’ represents delivery of the packet to its destination (Sanneck 2000, p.70). Figure 2.7 shows the two states and whether a packet is lost or delivered in each.

The Gilbert model is usually a better approximation of the packet loss process than a model of independent random losses. The parameter p denotes the transition probability from state ‘0’ to state ‘1’, that is, the probability that the next packet is dropped given that the previous packet was not lost. The parameter q denotes the probability of remaining in state ‘1’, that is, the probability that a packet is dropped given that the previous packet was dropped.
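A minimal simulation of the model is sketched below, using p and q as defined above. The long-run loss rate p / (p + 1 - q) and the mean burst length 1 / (1 - q) follow from the two-state Markov chain; the parameter values, seed and function names are assumptions for illustration.

```python
import random


def gilbert_losses(n_packets: int, p: float, q: float, seed: int = 0):
    """Simulate the 2-state model; returns a list with 1 = lost, 0 = delivered."""
    rng = random.Random(seed)
    state, trace = 0, []
    for _ in range(n_packets):
        loss_probability = q if state == 1 else p
        state = 1 if rng.random() < loss_probability else 0
        trace.append(state)
    return trace


def stationary_loss_rate(p: float, q: float) -> float:
    """Long-run fraction of lost packets."""
    return p / (p + 1 - q)


def mean_burst_length(q: float) -> float:
    """Average number of consecutive losses once a loss has started."""
    return 1 / (1 - q)


trace = gilbert_losses(200_000, p=0.05, q=0.5, seed=1)
print(f"simulated loss rate: {sum(trace) / len(trace):.4f}")
print(f"predicted loss rate: {stationary_loss_rate(0.05, 0.5):.4f}")
print(f"mean burst length:   {mean_burst_length(0.5):.1f} packets")
```

Setting q equal to p removes the dependence on the previous packet and reduces the model to independent random loss.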

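The transition probability matrix of the Gilbert model, with rows indexed by the current state (‘0’, ‘1’) and columns by the next state, is:

$P = \begin{pmatrix} 1-p & p \\ 1-q & q \end{pmatrix}$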

2.6.5 Erasure codes

An erasure code is a forward error correction code for the binary erasure channel. Erasure codes provide space-optimal data redundancy to protect information from being lost (Aguilera 2005). These codes are used in communication and storage systems. From k blocks of source data, n blocks of encoded data are generated such that the original data can be recovered from a subset of the n encoded blocks; for an optimal code, any k of them suffice. The data is thus protected against the loss of up to n-k blocks. The code is represented as an (n, k) code.

The code rate is as follows:

$R = k/n$ [2.4]

There are different types of erasure codes:

Optimal erasure code

Near-optimal erasure code

Rateless (near-optimal fountain) erasure code (examples: LT codes, Raptor codes)

In this project, both receiver-based and sender-based repair techniques are used to conceal packet loss. The speech is encoded using two different FEC schemes, the Reed-Solomon code and the convolutional code. Both the Gilbert packet loss model and random loss are used. The two FEC schemes and the receiver-based techniques are compared to determine which combination of techniques performs better.


