Privacy and ethics in data governance

With the proliferation of IoT sensors, social media and other sources of big data, people now reveal highly personal information every day, often without their knowledge or consent. This is called passive data generation. In some cases, individual consumers may have consented to the collection of their information, but it is not possible to determine precisely how that information may be used once it is folded into large data sets and shared with third-party companies and research institutes. Big data, by its very nature, depends on analyzing large volumes of data, and those volumes often include a great deal of personal data. Informed consent becomes nearly impossible to obtain, because no one can say exactly what they are looking for until they have performed the analysis. In recent years, the concept of broad consent has therefore been introduced, with the ultimate effect of removing the consent requirement altogether. Consent has become the big data analytics equivalent of a blank check.

A common response to this challenge is simply to anonymize the data. Anonymized information no longer falls under the restrictions of privacy laws and therefore does not require the individual's consent, since the data technically no longer contains protected information. Unfortunately, this logic breaks down quickly, because individuals can easily be re-identified by combining disparate data sources. A 2019 study found that 99.98% of Americans could be correctly re-identified in any data set using as few as 15 demographic attributes. There are also several examples of separately anonymized public data sets being combined to piece together private information. For instance, public data on New York taxi rides was combined with other sources to reveal private information about specific taxi drivers, including their religious practices, and even to track celebrities who had ridden in the taxis in question. All of this challenges an assumption that has governed consumer data privacy since the 1970s: that consumers give informed and rational consent for their data to be used by the organization to which they grant access. In today’s big data world, consent and anonymity are both myths. The responsibility to act ethically with respect to consumer privacy now lies with data scientists rather than with legal and compliance departments.
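To make the re-identification risk concrete, here is a minimal sketch of a linkage attack, the general technique behind cases like the ones above. It assumes an “anonymized” data set that has had names removed but still carries quasi-identifiers (here ZIP code, birth date and sex), plus a separate public data set, such as a voter roll, that pairs the same quasi-identifiers with names. All records, field names and helper functions (`uniqueness`, `link`) are hypothetical and chosen only for illustration; this is not the method used in the 2019 study or in the taxi case.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, but quasi-identifiers
# (ZIP code, birth date, sex) kept alongside a sensitive attribute.
anonymized = [
    {"zip": "10001", "birth_date": "1985-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "10001", "birth_date": "1990-07-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10002", "birth_date": "1985-03-14", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "10002", "birth_date": "1985-03-14", "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public data set (e.g. a voter roll) that includes names.
public = [
    {"name": "Alice Smith", "zip": "10001", "birth_date": "1985-03-14", "sex": "F"},
    {"name": "Bob Jones", "zip": "10001", "birth_date": "1990-07-02", "sex": "M"},
    {"name": "Carol White", "zip": "10002", "birth_date": "1985-03-14", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key(record):
    """Return the record's combination of quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def uniqueness(records):
    """Fraction of records whose quasi-identifier combination appears only once."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(anonymized, public):
    """Re-identify anonymized records that are unique on the quasi-identifiers
    and match exactly one named record in the public data set."""
    anon_counts = Counter(key(r) for r in anonymized)
    public_index = {}
    for p in public:
        public_index.setdefault(key(p), []).append(p)
    matches = []
    for r in anonymized:
        candidates = public_index.get(key(r), [])
        if anon_counts[key(r)] == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], r["diagnosis"]))
    return matches


print(f"Unique on quasi-identifiers: {uniqueness(anonymized):.0%}")  # 50%
for name, diagnosis in link(anonymized, public):
    print(f"{name} -> {diagnosis}")
```

In this toy example, half of the “anonymized” records are unique on just three attributes, and a simple join attaches a name to each of them. Carol’s two records cannot be singled out because they share the same quasi-identifier combination; requiring every combination to appear at least k times is the idea behind k-anonymity. As the 2019 study suggests, however, each additional attribute makes more combinations unique, so this kind of protection erodes quickly.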

Because passive data generation is ubiquitous and anonymized data can so easily be re-identified, big data can take on the qualities of surveillance and can easily cross a constitutionally protected line regarding individuals’ right to privacy. The main ethical question, then, is not whether to collect the information, but how and when it is ethically responsible to analyze it. This is the key ethical decision companies must make. For too long, the prevailing view has been that companies are entitled to analyze all the information they have managed to gather. It’s time for data scientists to help their organizations change the way they think. In the famous words of the fictional Dr. Ian Malcolm, portrayed by Jeff Goldblum in Jurassic Park, “Your scientists were so preoccupied with whether they could, they didn’t stop to think if they should.” It’s time for our data scientists to stop and ask themselves whether they SHOULD.