A Threat and Risk Assessment Approach for Big Data

Current information security and privacy classifications are being applied with some difficulty to a Big Data environment, which in the context of the public sector involves the emergence of large databases and increased data sharing. Such databases are usually classified as high-risk, resulting in costly security safeguards. However, de-identification can drastically lower the actual privacy risk posed by information. Could mapping de-identification to risk classifications allow organizations to invest more wisely in security and take advantage of the opportunities of Big Data?

Threat and Risk Assessments (TRA) are commonly required for new Canadian government programs and other public sector initiatives in order to determine whether their information assets are being protected appropriately. Their focus is on security: examining the potential for harm if information is accessed, released, or used inappropriately; analyzing potential risks to information; and identifying appropriate lifecycle safeguards to protect information.

In 2005, the federal government released the Canadian Information Security and Privacy Classification Policy as a guideline for risk assessments. This system defines four risk levels, based on criteria such as potential threats to public safety, injury to individuals or enterprises, financial loss, and damage to government relationships and reputation. Appropriate safeguards are identified for each risk level. The Ontario Ministry of Government Services has since adopted these classifications as a guide for TRAs within the Ontario Public Service.

Applying these classifications to a broad variety of public sector contexts has led to two significant problems, both related to the phenomenon of Big Data. The federal classification guidelines were clearly designed with a political context in mind: examples given for the various risk levels include cabinet documents, briefings, speeches, and contact information. At the provincial level, these classifications do not translate easily to contexts such as healthcare, where information is collected in large volumes and regularly shared between organizations. The first problem is that the large volume of information contained in healthcare databases results in a great potential for harm in the event of a breach; consequently, such databases are usually classified as high-risk. The safeguards mandated to protect high-risk information are costly, and with the emergence of Big Data, these costs are likely to grow substantially. The second problem pertains to information sharing: not only is there a possibility that information classified as high-risk is being shared with parties that have inadequate security safeguards, but the sharing of personal information also raises a number of more basic privacy issues.

To resolve these issues, government needs to stop conflating privacy with security. On the one hand, it is possible for information to be protected by adequate security safeguards but to violate privacy law nonetheless. A significant issue in the healthcare sector has been that of cascading rights when organizations share personal health information for research purposes. While all of the organizations involved may have effective security practices, the information is often disclosed and used for purposes to which patients did not consent. Because shared information is stored in multiple locations, it is often also retained longer than mandated by privacy standards. On the other hand, it is possible to protect privacy without security. Sophisticated and efficient de-identification processes can remove identifying details from records containing personal information while preserving the utility of data for research. Properly de-identified information can be shared with only a minimal risk to privacy.

The distinctions between privacy and security have two implications. First, process matters when it comes to protecting data. Excellent security safeguards will not ensure proper information management if privacy concerns are not integrated into business processes and practices. Second, de-identification can radically change information risk. Calculations of re-identification risk, the probability that an individual could be identified from their de-identified data, provide an objective measure of privacy risk. When privacy risk is very low, fewer security safeguards are needed. Thus, mapping levels of de-identification to information risk classifications could enable much more efficient and effective investment in information safeguards. An approach that unites privacy and security with regard to risk classification could well be the means to unlock the opportunities offered by Big Data while containing the costs of information security.
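
To make the idea of mapping re-identification risk to a risk classification concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a method prescribed by any classification policy. It assumes that risk is measured as the worst case across the dataset: one divided by the size of the smallest group of records that share the same quasi-identifier values, which is one common way to estimate re-identification risk. The function names, the field names, the thresholds, and the numeric levels 1 to 4 are all hypothetical; a real assessment would choose its own measure and cut-offs.

```python
from collections import Counter

def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of records
    that share the same values on the quasi-identifiers."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return 1.0 / min(groups.values())

# Hypothetical cut-offs: (upper bound on risk, risk level).
# These values are illustrative assumptions, not taken from any policy.
HYPOTHETICAL_THRESHOLDS = [(0.05, 1), (0.2, 2), (0.5, 3)]

def classify(risk):
    """Map a risk probability to one of four hypothetical levels (1 = lowest)."""
    for upper_bound, level in HYPOTHETICAL_THRESHOLDS:
        if risk <= upper_bound:
            return level
    return 4  # anything above the last cut-off gets the highest level

# Toy de-identified dataset with generalized age and region (made-up values).
records = [
    {"age_band": "30-39", "region": "K1A"},
    {"age_band": "30-39", "region": "K1A"},
    {"age_band": "30-39", "region": "K1A"},
    {"age_band": "30-39", "region": "K1A"},
    {"age_band": "40-49", "region": "M5V"},
    {"age_band": "40-49", "region": "M5V"},
]

risk = max_reidentification_risk(records, ["age_band", "region"])
level = classify(risk)
```

In this toy dataset the smallest group has two records, so the worst-case risk is 0.5. Generalizing the quasi-identifiers further, for example by using wider age bands, enlarges the groups and lowers the computed risk, which is how a dataset could move to a lower level under such a mapping.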