Data Leakage Detection

Published Date: 02 Nov 2017

Data is a valuable asset in many organizations. Communications between clients and professionals are often "privileged": legally, they cannot be discussed with or divulged to third parties. In business, the confidentiality of information is basic to corporate security, and data leakage detection plays an indispensable part in ensuring secure transactions between organizations. In the course of business, sensitive data must sometimes be handed over to supposedly trusted third parties for enhancement or processing. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Our goal is to identify the guilty agent when the distributor's sensitive data has been leaked by one or more agents.

Perturbation is a very useful technique where the data is modified and made less sensitive before being handed to agents. For example, one can add random noise to certain attributes, or one can replace exact values by ranges [2]. Traditionally, leakage detection was handled by watermarking, e.g., a unique code is embedded in each distributed copy [3]. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but again, they involve some modification of the original data.
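The two perturbation examples mentioned above (adding random noise and replacing exact values by ranges) can be sketched in a few lines of Python. This is only an illustration under assumed names: `add_noise`, `to_range`, the record fields, and the noise scale and range width are hypothetical choices, not part of the cited work.

```python
import random


def add_noise(value, scale, rng):
    """Perturb a numeric attribute with uniform noise in [-scale, +scale]."""
    return value + rng.uniform(-scale, scale)


def to_range(value, width):
    """Replace an exact value by the range [low, low + width) that contains it."""
    low = (value // width) * width
    return (low, low + width)


# Hypothetical record; the field names are illustrative only.
rng = random.Random(0)
record = {"age": 37, "salary": 52000}
perturbed = {
    "age": to_range(record["age"], 10),               # exact age -> decade range
    "salary": add_noise(record["salary"], 1000, rng),  # salary +/- at most 1000
}
```

Both functions change the values the agent receives, which is exactly why perturbation is unsuitable when the original data must be delivered unaltered.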

Hence we propose data allocation strategies that improve our chances of detecting a guilty agent. These strategies help the distributor to "cleverly" give data to agents. We also develop an online model for distributing data to agents, in which the allocation strategies may include "fake" objects in the data delivered for each request. Section II describes the existing system and Section III defines the problem. Sections IV and V elaborate on the proposed system, and Section VI gives its scope and applications.

II. Existing System

Existing systems rely on two techniques, perturbation and watermarking, yet some applications require that the original sensitive data not be perturbed. Perturbation is a very useful technique where the data is modified and made "less sensitive" before being handed to agents. However, in some cases it is important not to alter the original distributor's data. Traditionally, leakage detection was handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but again, they involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.

Initially, these two techniques provided security against data leakage, but they alter the original sensitive data, which is not acceptable in every case. Lastly, the existing system does not capture leak scenarios online, and its focus is mostly on the data allocation problem.

III. Problem Definition

In the course of business, the owner of data sometimes gives a set of sensitive data to trusted agents so that they can perform some operation on it. A leak occurs when this confidential business data is found in some unauthorized place. Once the data is outside the jurisdiction of the corporation, the company is left unprotected, and its image and its customers' trust are damaged. Uncontrolled data leakage therefore puts the business in a vulnerable position. Because data that is no longer within the company's domain places it at serious risk, the distributor must determine whether the data leaked from one or more agents, as opposed to having been independently gathered by other means, and find the guilty agent. Here we propose data allocation strategies (across the agents) that improve the probability of identifying the guilty agent. This method works if the leaked data is obtained as it was distributed or if fake records are deleted.

IV. Algorithm

Allocation for Explicit Data Requests: the agent sends a request with an appropriate condition.

Allocation for Sample Data Requests: the agent's request carries no condition, and the agent receives data according to his query.
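The difference between the two request types can be illustrated with a short Python sketch. The function names `explicit_request` and `sample_request`, the example objects, and the use of a predicate for the condition are assumptions made for illustration; the text only states that an explicit request carries a condition and a sample request does not.

```python
import random


def explicit_request(T, cond):
    """Explicit request: return every object in T that satisfies the condition."""
    return [t for t in T if cond(t)]


def sample_request(T, m, rng):
    """Sample request: return any m distinct objects from T, with no condition."""
    return rng.sample(T, m)


# Hypothetical data set of customer records.
T = [
    {"id": 1, "state": "CA"},
    {"id": 2, "state": "NY"},
    {"id": 3, "state": "CA"},
    {"id": 4, "state": "TX"},
]
california = explicit_request(T, lambda t: t["state"] == "CA")
any_two = sample_request(T, 2, random.Random(0))
```

With an explicit request the distributor has no freedom over which real objects to return, whereas with a sample request the choice of objects is left to the distributor.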

1. Explicit Data Request

In the first place, the goal of these experiments was to see whether fake objects in the distributed data sets yield significant improvement in our chances of detecting a guilty agent. In the second place, we wanted to evaluate our e-optimal algorithm relative to a random allocation.

ALGORITHM

1: R ← ∅    (agents that can receive fake objects)
2: for i = 1, . . . , n do
3:     if bi > 0 then
4:         R ← R ∪ {i}
5:     Fi ← ∅
6: while B > 0 do
7:     i ← SELECTAGENT(R, R1, . . . , Rn)
8:     f ← CREATEFAKEOBJECT(Ri, Fi, condi)
9:     Ri ← Ri ∪ {f}
10:     Fi ← Fi ∪ {f}
11:     bi ← bi − 1
12:     if bi = 0 then
13:         R ← R \ {i}
14:     B ← B − 1
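The explicit-request allocation above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the selector and the fake-object factory are hypothetical stand-ins for SELECTAGENT and CREATEFAKEOBJECT, the selector simply picks an eligible agent at random, and the loop also stops when no eligible agent remains, a guard the pseudocode leaves implicit.

```python
import itertools
import random


def allocate_fake_objects(R, conds, b, B, select_agent, create_fake_object):
    """Distribute up to B fake objects among agents.

    R: list of sets, the real objects already allocated to each agent.
    conds: per-agent request conditions, passed through to the factory.
    b: per-agent limit on fake objects; B: total fake-object budget.
    Returns F, the list of fake-object sets; R is updated in place.
    """
    b = list(b)  # work on a copy so the caller's limits are untouched
    eligible = {i for i in range(len(R)) if b[i] > 0}
    F = [set() for _ in R]
    while B > 0 and eligible:  # extra guard: stop if nobody can take more
        i = select_agent(eligible, R)
        f = create_fake_object(R[i], F[i], conds[i])
        R[i].add(f)
        F[i].add(f)
        b[i] -= 1
        if b[i] == 0:
            eligible.discard(i)
        B -= 1
    return F


def make_random_selector(seed=0):
    """Hypothetical SELECTAGENT: choose an eligible agent uniformly at random."""
    rng = random.Random(seed)
    return lambda eligible, R: rng.choice(sorted(eligible))


def make_fake_factory():
    """Hypothetical CREATEFAKEOBJECT: emit a fresh, unique placeholder id."""
    counter = itertools.count(1)
    return lambda Ri, Fi, cond: f"fake-{next(counter)}"


R = [{"t1", "t2"}, {"t2", "t3"}, {"t1"}]
F = allocate_fake_objects(R, [None, None, None], b=[1, 2, 0], B=3,
                          select_agent=make_random_selector(),
                          create_fake_object=make_fake_factory())
```

A real fake-object factory would have to produce records that satisfy the agent's condition and look like genuine data; the placeholder strings here only show the bookkeeping.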

2. Sample Data Request

With sample data requests agents are not interested in particular objects. Hence, object sharing is not explicitly defined by their requests. The distributor is "forced" to allocate certain objects to multiple agents only if the number of requested objects exceeds the number of objects in set T. The more data objects the agents request in total, the more recipients on average an object has; and the more objects are shared among different agents, the more difficult it is to detect a guilty agent.

ALGORITHM

1: a ← 0|T|    (a[k]: number of agents who have received object tk)
2: R1 ← ∅, . . . , Rn ← ∅
3: remaining ← m1 + . . . + mn
4: while remaining > 0 do
5:     for all i = 1, . . . , n : |Ri| < mi do
6:         k ← SELECTOBJECT(i, Ri)
7:         Ri ← Ri ∪ {tk}
8:         a[k] ← a[k] + 1
9:         remaining ← remaining − 1
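The sample-request allocation above can be sketched in Python as follows. This is a sketch under assumptions: `remaining` is initialised to the total number of requested objects (the sum of all mi), each mi is assumed not to exceed |T|, and `select_least_shared` is one hypothetical choice of SELECTOBJECT that favours objects held by the fewest agents, motivated by the remark that shared objects make a guilty agent harder to detect.

```python
def allocate_samples(T, m, select_object):
    """Round-robin allocation of m[i] objects from T to each agent i.

    Returns (R, a): R[i] is the set given to agent i and a[k] counts how
    many agents received object T[k].
    """
    if any(mi > len(T) for mi in m):
        raise ValueError("each agent may request at most |T| objects")
    a = [0] * len(T)
    R = [set() for _ in m]
    remaining = sum(m)  # total number of objects still to hand out
    while remaining > 0:
        for i in range(len(m)):
            if len(R[i]) < m[i]:
                k = select_object(i, R[i], T, a)
                R[i].add(T[k])
                a[k] += 1
                remaining -= 1
    return R, a


def select_least_shared(i, Ri, T, a):
    """Hypothetical SELECTOBJECT: pick the object not yet held by agent i
    that has been given to the fewest agents so far."""
    candidates = [k for k in range(len(T)) if T[k] not in Ri]
    return min(candidates, key=lambda k: a[k])


T = ["t1", "t2", "t3", "t4"]
R_small, a_small = allocate_samples(T, [2, 2], select_least_shared)  # 4 requests, 4 objects
R_large, a_large = allocate_samples(T, [3, 3], select_least_shared)  # 6 requests, 4 objects
```

In the first call the total number of requested objects does not exceed |T|, so the two agents can receive disjoint sets; in the second call some objects must be shared.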

V. Implementation and Methodology

Fig 1. Implementation flow: Data Transfer, Fake objects addition, Guilt Model Analysis, Show the probability distribution of data leakage, Logout.

Fig 2. Methodology diagram: Data Distributor, Data Allocation, Fake Object.

Data Distributor

The data distributor is the core of the agent guilt model. The owner of the data is known as the distributor, and the party to whom the data is passed is known as the agent.

The data distributor hands over some sensitive information to a business organization. To guard against that information being leaked, the agent guilt model calls on the distributor to add fake objects to the data set that is to be transferred.

In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that this agent is guilty. The distributor adds fake objects in order to improve his effectiveness in detecting guilty agents. Our use of fake objects is inspired by the use of "trace" records in mailing lists. Fake objects are objects generated by the distributor that are not in set T. The objects are designed to look like real objects, and are distributed to agents together with T objects, in order to increase the chances of detecting agents that leak data.
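How fake objects act as "trace" records can be shown with a small Python sketch. It assumes the distributor keeps a record of which fake objects went to which agent and that each fake object was given to only one agent; the function name and the agent and object identifiers are hypothetical.

```python
def agents_with_leaked_fakes(leaked, fakes_by_agent):
    """Return, for each agent, the fake objects given to that agent that
    appear in the leaked set. Agents with no match are omitted."""
    leaked = set(leaked)
    hits = {}
    for agent, fakes in fakes_by_agent.items():
        found = leaked & set(fakes)
        if found:
            hits[agent] = found
    return hits


# Hypothetical distribution record kept by the distributor.
fakes_by_agent = {"A1": {"f1"}, "A2": {"f2", "f3"}, "A3": set()}
leaked = {"t1", "t2", "f2"}
suspects = agents_with_leaked_fakes(leaked, fakes_by_agent)
# suspects == {"A2": {"f2"}}
```

A match raises suspicion about the agent but is not proof on its own, and an agent who strips the fake records before leaking would not appear in the result.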

VI. Scope

Technology is evolving rapidly and data is becoming more accessible, so the scope for data leakage is very wide. Data leakage detection helps organizations protect the intellectual property that must be kept from competitors for the company to stay profitable. Its main benefit is the protection of information that is critical to the organization. It helps prevent the unauthorized release of confidential information, thereby minimizing the loss of intellectual property and helping the organization maintain its competitive advantage. Any organization that has an online presence and holds confidential customer information will suffer negative publicity and a damaged reputation if that data is lost, stolen, or leaked. Preventing this helps the organization maintain a positive reputation.

VII. References

[1] Peter Gordon, "Data Leakage – Threats and Mitigation," SANS Institute Reading Room, October 15, 2007.

[2] J. Clerk Ma P. Papadimitriou and H. Garcia-Molina, "Data leakage detection," Stanford University.

[3] R. Agrawal and J. Kiernan, "Watermarking Relational Databases," Proc. 28th Int'l Conf. Very Large Data Bases (VLDB '02), VLDB Endowment, pp. 155-166, 2002.

[4] P. Papadimitriou and H. Garcia-Molina, "Data Leakage Detection," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, January 2011.

[5] P. Buneman and W.-C. Tan, "Provenance in Databases," Proc. ACM SIGMOD, pp. 1171-1173, 2007.

[6] Y. Cui and J. Widom, "Lineage Tracing for General Data Warehouse Transformations," The VLDB J., vol. 12, pp. 41-58, 2003.

[7] S. Jajodia, P. Samarati, M.L. Sapino, and V.S. Subrahmanian, "Flexible Support for Multiple Access Control Policies," ACM Trans. Database Systems, vol. 26, no. 2, pp. 214-260, 2001.

[8] P. Bonatti, S.D.C. di Vimercati, and P. Samarati, "An Algebra for Composing Access Control Policies," ACM Trans. Information and System Security, vol. 5, no. 1, pp. 1-35, 2002.
