Computer Science Curriculums Must Emphasize Privacy Over Capability

Source: forbes.com

The idea of privacy is in many ways antithetical to the data-driven mindset promoted by computer science curriculums over the past decade. Massive advances in data availability and analytic capability have reshaped many curriculums around coursework that teaches how to manage, explore, assess, understand and exploit these newfound digital riches. Deep learning and other analytics courses are frequently filled beyond capacity. In contrast, the idea that programmers should give up their data riches in the name of privacy receives little attention in many curriculums.

Today’s computer science curriculums have centered on preparing tomorrow’s technology leaders to harness the digital revolution. From managing “big data” to making sense of it through deep learning, coursework has heavily emphasized the upsides of today’s digital deluge rather than the considerable harm it can wreak on privacy, safety and security.

Privacy has historically too often been lumped under the heading of cybersecurity and relegated to an afterthought. Privacy violations were largely viewed in the context of companies losing control of customer data, rather than deliberately harnessing that data in privacy-invading ways.

A company that harvested massive amounts of data from its customers, held onto it without suffering any cyber intrusion and lawfully resold that data to others was often viewed as privacy-protecting, since it successfully safeguarded the data in its hands from loss.

An increasing number of programs have begun to integrate some form of ethical training into their curriculums. Yet the focus here has largely been on programmers themselves, emphasizing how they should think about what they do with users’ data and how to address biases in their designs. Such courses may cover topics like AI explainability as a way to mitigate inadvertent demographic bias, along with consideration of algorithmic harm: why, for example, an AI system that can forcibly identify members of vulnerable communities could cause grave harm and thus should not be pursued, even if it represents a great technical achievement.

Some programs teach compliance with privacy laws like GDPR, but this guidance typically revolves around “minimal minimization,” in which data collection is adjusted to fit the letter of the law, if not its spirit, by exploiting the law’s myriad loopholes and exemptions.

Library and information science curriculums, by contrast, have historically emphasized privacy and civil liberties concerns, especially the minimization of data collection. Unlike the digital behemoths that follow us across the web, vacuuming up every byte of our online behavior and interests, libraries have historically adopted precisely the opposite stance, keeping only the bare minimum of information they need and deleting data at the first moment they can.

Libraries have historically had enormous insight into our most intimate and unfettered interests, often recording our information consumption from the first children’s books our parents checked out to read to us as infants. If libraries kept this information, they could build incredible personalized recommendation systems and generate lucrative revenue streams by reselling that data or making it available for advertising.

Instead, most public libraries have practiced absolute minimization, in which every data point is discarded the moment it is no longer needed. Rather than keeping a user’s entire checkout history over time, most public libraries have historically kept only a list of the items currently checked out, deleting each record as soon as the materials are returned.
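
To make the contrast concrete, the following is a minimal sketch, in Python, of what such a delete-on-return policy might look like in code; the class and method names are hypothetical and not drawn from any actual library system:

class CirculationDesk:
    def __init__(self):
        # item_id -> patron_id, kept only while the item is checked out
        self._current_loans = {}

    def check_out(self, item_id, patron_id):
        self._current_loans[item_id] = patron_id

    def check_in(self, item_id):
        # Deleting the record on return is the entire privacy policy:
        # nothing remains to be breached, subpoenaed or resold.
        self._current_loans.pop(item_id, None)

    def who_has(self, item_id):
        return self._current_loans.get(item_id)

desk = CirculationDesk()
desk.check_out("book-42", "patron-7")
desk.check_in("book-42")
print(desk.who_has("book-42"))  # None: no trace of the loan remains

The design choice is that there is no history table at all, so no later policy change, breach or subpoena can resurrect data that was never retained.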

This minimization was born of necessity, with libraries, especially in the pre-Internet era, being of great interest to surveillance-minded authorities.

Computer science curriculums, however, have not historically emphasized this idea of minimization at all costs. Quite the opposite: data hoarding has been embraced as the path to limitless riches. After all, even if you pay for a service today, you are still the product, as data exhaust becomes more valuable than subscription fees.

Privacy naturally conflicts with capability when it comes to data analytics. The more data there is, and the higher its resolution, the more insight algorithms can yield. Thus, the more companies prioritize privacy, actively deleting everything they can and minimizing the resolution of what they must collect, the less capability their analytics have to offer.

This represents a philosophical tradeoff. On the one hand, computer science students are taught to collect every data point they can, at the highest resolution they can, and to hoard it indefinitely. This extends all the way to diagnostic logging, which often becomes an all-or-nothing affair that has led even major companies to suffer serious security breaches. On the other hand, disciplines like library and information science emphasize privacy over capability, getting rid of data the moment it is safe to do so.
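
As a rough illustration of the alternative to all-or-nothing diagnostic logging, here is a hedged Python sketch of allow-list logging that records only a handful of non-identifying fields and drops everything else before it is ever written; the field names are assumptions chosen purely for illustration:

import logging

ALLOWED_FIELDS = {"event", "status", "duration_ms"}

def minimal_log(logger, **fields):
    # Keep only the approved fields; note which names were withheld,
    # but never record their values.
    kept = {k: v for k, v in fields.items() if k in ALLOWED_FIELDS}
    dropped = sorted(set(fields) - ALLOWED_FIELDS)
    logger.info("kept=%s dropped_fields=%s", kept, dropped)

logging.basicConfig(level=logging.INFO)
minimal_log(logging.getLogger("app"),
            event="login", status="ok", duration_ms=42,
            user_email="alice@example.com",   # never reaches the log
            session_token="abc123")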

When it comes to government surveillance, data breaches, ethically questionable research, insider threats and other privacy issues, the less data companies keep about their users, the less information there is that can be misused, and the lower their storage and analytic costs. If a company can make do with a terabyte of aggregated data rather than a petabyte of individual-level, high-resolution data, it can achieve considerable cost savings and move many of its batch analyses to real time.
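
The kind of aggregation described above might, in the simplest case, look something like the following Python sketch, which collapses hypothetical individual page-view events into per-day counts and discards the user identifiers and exact timestamps entirely:

from collections import Counter
from datetime import datetime, timezone

def aggregate(events):
    # events: (user_id, unix_timestamp, page) tuples, a shape assumed
    # purely for illustration
    counts = Counter()
    for user_id, ts, page in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        counts[(day, page)] += 1  # user_id and exact time are dropped
    return counts

events = [
    ("u1", 1700000000, "/pricing"),
    ("u2", 1700000100, "/pricing"),
    ("u1", 1700003600, "/docs"),
]
print(aggregate(events))  # only daily page counts survive

Whether such coarse aggregates are enough for the business is precisely the capability-versus-privacy judgment at issue.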

In the end, rather than enshrining mottos like “data is the new oil” in the vocabulary of tomorrow’s technology leaders, perhaps we should emphasize “privacy first” and focus on how companies can absolutely minimize the data they collect to ensure a more privacy-protecting and less Orwellian future.
