02 Nov 2017
Influences of code cloning in software maintenance cost
Abstract: Software maintenance accounts for a major portion of the life-cycle cost of a software project. Customers often assume that accommodating changes to the software after product delivery is easy, but such changes add enormously to the maintenance cost. Unless a particular piece of software is compared with similar or different versions, it is not apparent how many copied fragments it contains. Maintenance nevertheless plays a big role in improving software quality. Code clones, or copied fragments, make the software difficult to modify. Different approaches have been proposed to overcome this problem. In this paper we present an analysis of a cost model which considers various parameters.
Keywords: clone detection, clone pair, code fragment, textual analysis, metrics computation
A code clone is a pair of identical or similar code fragments in the source files of a software product. It has been found that code clones make software maintenance difficult. The code clone problem can become serious, especially for large-scale industrial software. Maintenance preserves and increases the value that software provides to its users. Reducing the number of changes that are performed on the software during maintenance threatens to reduce this value. An important goal of software engineering is thus to facilitate the construction of systems that are easier and more economical to maintain.
Substantial research effort on software clones has established the negative impact of cloning on software maintenance activities in general [1]. Cloning increases the size of the code, often unnecessarily, and thus the effort required for size-related activities such as inspections. Because changes to a code fragment, such as a bug fix, often need to be applied to its duplicates as well, cloning also increases modification effort. If duplicates are missed when cloned code is modified, inconsistencies can be introduced into the system that can lead to faults, or existing faults can fail to be removed from the system. A study we published in [2] uncovered over 100 faults in productive software through analysis of unintentionally inconsistent changes to cloned code.
The success of software depends on its total life-cycle cost, in which software maintenance plays a major role. Duplicated code creates problems in the software development process. Maintenance of a software system is defined as the modification of a software product after delivery to correct faults or to improve performance. Maintenance is the most expensive phase of the software life cycle. Software Requirement Specifications are read and changed often for requirement elicitation, software design and test case specification [3].
We classified clone pairs into the following two types (Figure 1):
(1) In-module clone pair
We call a code fragment pair an "in-module clone pair" if both fragments in the pair exist in the same module.
(2) Inter-module clone pair
We call a code fragment pair an "inter-module clone pair" if the fragments in the pair exist in different modules.
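The two-way classification above can be sketched in a few lines. This is only an illustration: the function names and the sample module names are hypothetical, and each fragment is assumed to be already mapped to the name of the module that contains it.

```python
from collections import Counter

def classify_clone_pair(module_a: str, module_b: str) -> str:
    """Return the clone pair type given the modules of the two fragments."""
    return "in-module" if module_a == module_b else "inter-module"

def count_by_type(pairs):
    """Count clone pairs of each type; `pairs` holds (module, module) tuples."""
    return Counter(classify_clone_pair(a, b) for a, b in pairs)

# Hypothetical clone pairs, each given as the modules of its two fragments.
pairs = [("parser", "parser"), ("parser", "lexer"), ("ui", "ui"), ("ui", "net")]
counts = count_by_type(pairs)
print(counts["in-module"], counts["inter-module"])  # 2 2
```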
Cloning in programming has several root causes:
Systems are modularized based on principles such as information hiding, minimizing coupling and maximizing cohesion.
Programmers often reuse copied code or text as a template and then customize the pasted content.
The disadvantages of cloning are the following:
Unwanted duplicates of code increase maintenance cost.
Inconsistent changes to cloned code can create errors and lead to incorrect program behavior.
Maintenance is the most costly part of the software lifecycle.
Fabio Calefato et al [4] described how a semi-automated approach could be used to identify cloned functions within scripting code of web applications. The approach was based on the automatic selection of potential function clones and the visual inspection of selected script functions.
Stephane Ducasse et al [5] investigated a number of simple variants of string-based clone detection that normalize differences due to common editing operations, and assessed the quality of clone detection for very different case studies.
C. Kapser et al [6] presented an in-depth case study of cloning in a large software system that is in wide use, the Apache web server; they provided insights into cloning as it exists in this system, and they demonstrated techniques to manage and make effective use of the large result sets of clone detection tools.
Chanchal K. Roy et al [7] provided a qualitative comparison and evaluation of the current state-of-the-art in clone detection techniques and tools, and organized the large amount of information into a coherent conceptual framework. They began with background concepts, a generic clone detection process and an overall taxonomy of current techniques and tools.
Robert Tibshirani et al [8] applied the fused lasso method to the "hot-spot" detection problem in comparative genomic hybridization (CGH) data. The CGH signal was approximated by a piecewise function that has relatively sparse areas with nonzero values. Hence, the method was useful for determining which areas of the signal were likely to be nonzero.
Minhaz F. Zibran et al [9] presented a refactoring effort model, and proposed a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.
Hiran Dhanjia et al [10] developed a PCR-based assay that easily identifies a clone with high likelihood of producing ESBLs, including CTX-M-15.
Christopher Brown et al [11] presented a technique for the detection and removal of duplicated Haskell code. The system was implemented within the refactoring framework of the Haskell Refactorer (HaRe), and used an Abstract Syntax Tree (AST) based approach.
Previous research in clone detection has produced a number of different techniques for the identification of duplicated source code. In the following, we describe the four most commonly used approaches for clone detection [12,13,7]:
Text-based clone detection techniques work on the source code of the software system and use text transformation and normalization approaches like pattern matching and substring matching, or data mining techniques like latent semantic indexing (LSI) [15,16]. Text-based approaches are usually programming language-independent and scale well to large code bases containing millions of lines of code. However, most of these approaches are not robust to modifications of the cloned source code that are commonly carried out during software development, such as adding and deleting lines of code.
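To make the text-based idea concrete, the following is a minimal sketch and not the method of any cited tool. It normalizes each line by removing whitespace, slides a fixed-size window over the non-blank lines, and reports windows whose normalized text occurs more than once. The function names, the window size of 3, and the whitespace-only normalization are all assumptions made for illustration.

```python
from collections import defaultdict

def normalize(line: str) -> str:
    """Remove all whitespace so that layout differences do not matter."""
    return "".join(line.split())

def find_text_clones(lines, window=3):
    """Return lists of 1-based start lines whose `window`-line text repeats."""
    # Keep (original line number, normalized text) for non-blank lines only.
    norm = [(i + 1, normalize(l)) for i, l in enumerate(lines) if normalize(l)]
    index = defaultdict(list)
    for start in range(len(norm) - window + 1):
        chunk = norm[start:start + window]
        key = "\n".join(text for _, text in chunk)
        index[key].append(chunk[0][0])
    return [starts for starts in index.values() if len(starts) > 1]

lines = [
    "a = 1", "b = 2", "c = a + b", "",
    "x = 0",
    "a  =  1", "b = 2", "c = a+b",
]
clones = find_text_clones(lines)
print(clones)  # [[1, 6]]
```

Inserting a single extra line into the second copy makes the windows differ and the match disappears, which is the robustness limitation noted above.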
Token-based clone detection techniques work on higher-level abstractions of the software system. Using a lexical analysis, the textual representation of the system’s source code is transformed into token sequences, which are then surveyed for duplications. These techniques can be made robust to minor code modifications [17]. While token-based techniques are usually able to find a higher number of clones than other approaches, they report many false positives. As a result, additional manual verification of the clone detection results is needed, thus rendering these approaches less scalable for large-scale studies.
Syntax-based clone detection techniques evaluate the similarity of source code blocks by calculating and comparing metrics on a syntax tree representation of the code. These techniques usually produce a high level of precision in their results, but only moderate recall [18]. However, syntax-based approaches do not scale well to large-scale systems and typically require compilable source code, rendering them less valuable for measuring clones at the revision level.
Semantics-based clone detection techniques work on the program dependency graph level of a software system and hence require a thorough reverse engineering of the software system under study [19,20]. While empirical studies on the performance of these approaches reported very good results in terms of precision and recall [21], these techniques are usually hard to implement and do not scale well to the size of real-world software projects [22].
A code fragment (CF) is any sequence of code lines (with or without comments) of any granularity. A Code Fragment is identified by its file name and begin-end line numbers in the original code base and is denoted as a triple (CF.FileName, CF.BeginLine, CF.EndLine).
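The triple described above maps directly onto a small record type. The sketch below is one possible representation. The field names follow the triple, while the 1-based inclusive line numbering and the helper functions are assumptions, since the text does not fix them.

```python
from typing import List, NamedTuple, Sequence

class CodeFragment(NamedTuple):
    file_name: str
    begin_line: int  # assumed 1-based, inclusive
    end_line: int    # assumed 1-based, inclusive

    def size(self) -> int:
        """Number of lines covered by the fragment."""
        return self.end_line - self.begin_line + 1

def fragment_text(cf: CodeFragment, source_lines: Sequence[str]) -> List[str]:
    """Return the lines of `source_lines` that the fragment refers to."""
    return list(source_lines[cf.begin_line - 1:cf.end_line])

# Hypothetical file content and fragment.
source = ["int a;", "a = 1;", "a = a + 1;", "return a;"]
cf = CodeFragment("FileA.c", 2, 3)
print(cf.size(), fragment_text(cf, source))
```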
A code fragment that has identical or similar code fragment(s) to it in the source code is, in general, termed a code clone. A copied fragment can be used by the developer with or without minor modifications. If there are no modifications, or the modifications stay within a certain level, then the original and copied fragments are called code clones and they form a clone pair [12].
File A:
    if (x >= y) {
        z = m + y;
        m = m + 1;
    } else {
        z = m - x;
    }

File B:
    if (a >= b) {
        c = d + b;
        d = d + 1;
    } else {
        c = d - a;
    }

Figure 2. Similar code fragments in File A and File B.
Figure 2 shows two similar code fragments in which the identifiers are modified and all other statements are the same. The highlighted code in File A and File B forms a clone pair.
There are two main types of similarity between code fragments. Fragments can be similar based on the similarity of their program text, or they can be similar based on their functionality. In the following we describe the types of clones based on both textual (Types 1 to 3) and functional (Type 4) similarity, as per the clone detection literature.
Type-1: Identical code fragments except for variations in whitespace, layout and comments.
Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments.
Type-3: Copied fragments with further modifications such as changed, added or removed statements, in addition to variations in identifiers, literals, types, whitespace, layout and comments.
Type-4: Two or more code fragments that perform the same computation but are implemented by different syntactic variants.
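The difference between Type-1 and Type-2 can be illustrated with the two fragments of Figure 2 (written here with braces so that they parse as C-like code). The sketch below is a simplified illustration, not a published detector. It assumes a tiny regular-expression tokenizer, treats only `if` and `else` as keywords, replaces identifiers consistently in order of first appearance, and does not strip comments.

```python
import re

KEYWORDS = {"if", "else"}  # only the keywords needed for this example
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[<>=!]=|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")

def tokenize(code):
    """Split code into tokens; whitespace and layout are discarded."""
    return TOKEN.findall(code)

def normalize(tokens):
    """Replace identifiers by id0, id1, ... and numeric literals by 'lit'."""
    names = {}
    out = []
    for tok in tokens:
        if IDENT.fullmatch(tok) and tok not in KEYWORDS:
            if tok not in names:
                names[tok] = f"id{len(names)}"
            out.append(names[tok])
        elif tok.isdigit():
            out.append("lit")
        else:
            out.append(tok)
    return out

def classify(a, b):
    """Return 'Type-1', 'Type-2', or None for two code fragments."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:
        return "Type-1"
    if normalize(ta) == normalize(tb):
        return "Type-2"
    return None

FILE_A = "if (x >= y) { z = m + y; m = m + 1; } else { z = m - x; }"
FILE_B = "if (a >= b) { c = d + b; d = d + 1; } else { c = d - a; }"
print(classify(FILE_A, FILE_B))  # Type-2
```

Detecting Type-3 clones would additionally require tolerating added or removed statements, and Type-4 clones cannot be found by token comparison at all.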
Clones are typically created by copy and paste. Many different causes can trigger the decision to copy, paste (and possibly modify) an artifact fragment. These causes fall into two categories: (i) inherent causes and (ii) the maintenance environment.
Creating software is a difficult, intellectually challenging task. Inherent causes for cloning are those that originate in the inherent complexity of software engineering; even ideal processes and tools cannot eliminate them completely. One inherent reason is that creating a reusable abstraction is hard. It requires a detailed understanding of the commonalities and differences among its instances. When implementing a new feature that is similar to an existing one, their commonalities and differences are not always clear. A second reason is that understanding the effect of a change is hard for large software. An exploratory prototypical implementation of the change is one way to gain understanding of its impact.
The maintenance environment comprises the processes, languages and tools employed to maintain the software system. Maintainers can decide to clone code to work around a problem. First, to reuse code, an organization needs a reuse process that governs its evolution and quality assurance. Missing or unsuitable reuse processes hinder maintainers in sharing code. In response, they reuse code through duplication. Second, short-sighted project management practices can trigger cloning. Third, to make code reusable in a new context, it sometimes needs to be adapted. Poor quality assurance techniques can make the consequences of the necessary changes difficult to validate.
To evaluate the cost of open source system (OSS) code, three steps are essential:
Acquire the open source system (OSS) from the internet.
Estimate the source lines of code (SLOC) in the OSS.
Use the analytical model [23] to evaluate the cost using various new parameters.
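The SLOC estimation step can be approximated with a short script. The sketch below rests on several assumptions: it counts physical lines that are neither blank nor whole-line comments, recognizes only the `//` and `#` comment prefixes, ignores block comments, and uses a hypothetical set of file extensions. It does not implement the analytical model of [23], whose parameters are not given here.

```python
from pathlib import Path

def count_sloc(text, comment_prefixes=("//", "#")):
    """Count non-blank lines that do not start with a line-comment prefix."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefixes):
            count += 1
    return count

def count_sloc_in_tree(root, extensions=(".c", ".java", ".cs")):
    """Sum the SLOC of all files under `root` with one of `extensions`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in extensions:
            total += count_sloc(path.read_text(errors="ignore"))
    return total

sample = "// header\nint x = 0;\n\n  # note\nx = x + 1;  // trailing\n"
print(count_sloc(sample))  # 2
```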
There are multiple options for reducing maintenance cost, such as avoiding cloned code, reducing complexity and concentrating on test activities.
Code can be developed from scratch rather than reusing existing code, existing code can be adapted with minor changes, commercial off-the-shelf (COTS) tools can be used, or automatic code generators may be used in the process of developing a system. Our observations and results for the cloned code are as follows.
Table 1. Cloned SLOC per system, one column per programming language
Sys 1      869     2386    13900
Sys 2     2112    14960    10201
Sys 3    25077     4993    28175
Sys 4     1428    36499    29845
The statistics presented in Table 1 and Figure 3 show the cloned SLOC in different programming languages. In Sys 1 and Sys 3, the cloned code is observed to be high in Java, while in Sys 2 and Sys 4, .NET contains the larger amount of cloned code.
Many factors influence maintenance productivity: the type of system and domain, the development process, available tools and the experience of developers, to name just a few. Since these factors vary substantially between projects, they need to be reflected by cost estimation approaches to achieve accurate absolute results. The more factors a cost model comprises, the more effort is required for both its creation and its associated factor lookup tables, and for its instantiation in practice. If an absolute value is required, such effort is unavoidable.
The assessment of the impact of cloning differs from the general cost estimation problem in two important aspects. First, we compare efforts for two systems, the actual one and the hypothetical one without cloning, for which most factors are identical, since our maintenance environment does not change. Second, relative effort increase w.r.t. the cloning-free system is sufficient to evaluate the impact of cloning. Since we do not need an absolute result value in terms of costs, and since most factors influencing maintenance productivity remain constant in both settings, they do not need to be contained in our cost model. In a nutshell, we deliberately chose a relative cost model to keep its number of parameters and involved instantiation effort at bay.
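The argument for a relative model can be illustrated with a small calculation. The sketch below is an assumption about how the relative effort increase might be expressed, namely as the difference between the actual effort and the effort for the hypothetical clone-free system, divided by the clone-free effort. The function name and the effort figures are hypothetical.

```python
def relative_effort_increase(effort_actual, effort_clone_free):
    """Relative increase of maintenance effort caused by cloning."""
    if effort_clone_free <= 0:
        raise ValueError("clone-free effort must be positive")
    return (effort_actual - effort_clone_free) / effort_clone_free

# Hypothetical efforts in person-days for the same maintenance tasks.
base = relative_effort_increase(120.0, 100.0)

# A productivity factor that affects both systems equally cancels out,
# so it does not have to be part of the relative model.
factor = 1.5
scaled = relative_effort_increase(120.0 * factor, 100.0 * factor)
print(base, scaled)
```

Both calls yield the same relative increase of about 0.2, which shows why factors that stay constant across the two settings can be left out.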
Although the negative impact of cloning on program correctness has been stated qualitatively many times, its quantitative impact, and thus its significance in practice, remained unclear. Furthermore, while cloning in source code had been studied intensely, little was known about its extent and consequences in other software. A lack of awareness of cloning is a threat to program correctness. While the analyzed systems varied in their share of unintentional differences, and thus in the amount of cloning awareness among their developers, the negative impact of unintentionally inconsistent changes was uniform: about every second unintentionally inconsistent change had a direct impact on program correctness. These results thus give a strong indication that awareness of cloning is crucial during software maintenance. Clone control is required to achieve and maintain awareness of cloning and to alleviate the negative impact of existing clones.