EMPLOYING DIFFERENTIAL PRIVACY FOR BIG DATA SECURITY

ABSTRACT:

Big data frequently contains large amounts of personally identifiable information, so protecting users’ privacy is a challenge. Much research has been carried out on securing big data, but existing approaches remain limited in efficient privacy management and in their handling of data sensitivity. This paper designs a mechanism that employs Differential Privacy to protect personal data and enforces both a privacy level and an access restriction level. The Differential Privacy technique acts on each request by introducing a minimum distortion to the information returned by the database system. The mechanism, DP-Data, was implemented with the Python and Java programming languages, MySQL and VMware on the Apache Hadoop platform. To test the effectiveness of DP-Data, a medical dataset with 1,048,576 instances and 12 attributes was employed. The mechanism was evaluated on its utility, scalability, accuracy, sensitivity, specificity and processing time. The results indicated an accuracy of 95.80 %, a sensitivity of 93.60 %, a specificity of 98.00 % and a processing time of 0.40 ms, with high utility and good scalability: the time taken to preserve datasets of 5,000 tuples or fewer was almost the same. These results show that applying differential privacy to the privacy problem is highly efficient. Hence, a secure big data framework based on access restriction and a preserved level of privacy offers a higher level of protection of users’ privacy than other techniques.

Keywords: Big Data, Privacy Preservation, Differential Privacy.
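
The distortion step summarised in the abstract can be illustrated with a minimal sketch. This is not the DP-Data implementation: it assumes the standard Laplace mechanism applied to a counting query, and the names dp_count and laplace_noise, the record fields and the epsilon value are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential variables with mean
    # `scale` follows a Laplace distribution centred on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Assumption: the query is a simple count, whose sensitivity is 1
    (adding or removing one person changes the result by at most 1),
    so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical medical records; the field names are illustrative only.
    patients = [
        {"age": 34, "diabetic": True},
        {"age": 51, "diabetic": False},
        {"age": 47, "diabetic": True},
        {"age": 29, "diabetic": False},
    ]
    noisy = dp_count(patients, lambda p: p["diabetic"], epsilon=0.5)
    print("noisy count of diabetic patients:", noisy)
```

In this sketch a smaller epsilon gives stronger privacy but a larger distortion of the returned count, which is the trade-off between privacy and utility that the evaluation measures.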