Where Insights Meet Privacy: Privacy-Preserving Machine Learning
Artificial intelligence (AI) and machine learning (ML) have the power to deliver business value and impact across a wide range of use cases, which has led to their rapidly increasing deployment across verticals. For example, the financial services industry is investing significantly in leveraging machine learning to monetize data assets, improve customer experience and enhance operational efficiencies. According to the World Economic Forum’s 2020 “Global AI in Financial Services Survey,” AI and ML are expected to “reach ubiquitous importance within two years.”
However, because the rise and adoption of AI/ML has paralleled that of global privacy demand and regulation, businesses must be mindful of the security and privacy considerations associated with leveraging machine learning. These regulations affect the collaborative use of AI/ML not only between entities but also internally, as they limit an organization’s ability to use and share data between business segments and jurisdictions. For a global bank, this could mean it’s prohibited from leveraging critical data assets from another country or region to evaluate ML models. This limitation on data inputs can directly affect the effectiveness of the model itself and the scope of its use.
The privacy and security implications of leveraging ML are broad and are often the ultimate purview of the governance or risk management functions of the organization. ML governance encompasses the visibility, explainability, interpretability and reproducibility of the entire machine learning process, including the data, outcomes and machine learning artifacts. Most often, the core focus of ML governance is on protecting and understanding the ML model itself.
In its simplest form, an ML model is a mathematical representation/algorithm that uses input data to compute a set of results that could include scores, predictions or recommendations. ML models are unique in that they are trained on labeled data (supervised ML) or learn patterns from unlabeled data (unsupervised ML) in order to produce high-quality, meaningful results. A good deal of effort goes into effective model creation, and thus models are often considered to be intellectual property and valuable assets of the organization.
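To make this concrete, here is a minimal sketch of supervised training: a logistic regression model fit by gradient descent on a tiny synthetic dataset. The data, learning rate and function names are illustrative assumptions, not taken from any production system.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000):
    """Learn weights w and bias b mapping input features to a probability."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy labeled data: a single feature separates the two classes.
X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]
w, b = train(X, y)
print(predict(w, b, [0.2]) < 0.5, predict(w, b, [0.8]) > 0.5)  # True True
```

The learned weights and bias are the model: a compact artifact distilled from the training data, which is precisely why the model itself warrants protection.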
In addition to protecting ML models based on their IP merits, models must be protected from a privacy standpoint. In many business applications, effective ML models are trained on sensitive data often covered by privacy regulations, and any vulnerability of the model itself is a direct potential liability from a privacy or regulatory standpoint.
Thus, ML models are valuable — and vulnerable. Models can be reverse engineered to extract information about the organization, including the data on which the model was trained, which may contain PII, IP or other sensitive/regulated material that could damage the organization if exposed. There are two particular model-centric vulnerabilities with significant privacy and security implications: model inversion and model spoofing attacks.
In a model inversion attack, the data over which the model was trained can be inferred or extracted from the model itself. This could result in leakage of sensitive data, including data covered by privacy regulations. Model spoofing is a type of adversarial machine learning attack that attempts to fool the model into making an incorrect decision through malicious input. The attacker observes or “learns” the model and then alters the input data, often imperceptibly, to “trick” it into making the decision that is advantageous for the attacker. This can have significant implications for common machine learning use cases, such as identity verification.
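Both attacks can be illustrated on a toy linear scoring model, assuming a white-box attacker who has obtained the model's weights. The feature names (income, age), weights and threshold below are hypothetical and chosen only to make the mechanics visible.

```python
# Hypothetical leaked model: score = w0*income + w1*age + b
w = [0.8, 0.3]
b = -0.5

def score(x):
    return w[0] * x[0] + w[1] * x[1] + b

# --- Model inversion ---
# If the attacker observes the model's output and knows the non-sensitive
# feature (age), a linear model lets them solve exactly for the sensitive one.
observed_score = score([0.7, 0.4])   # victim's true (secret) income is 0.7
known_age = 0.4
recovered_income = (observed_score - w[1] * known_age - b) / w[0]
print(round(recovered_income, 6))    # sensitive feature recovered: 0.7

# --- Model spoofing ---
# The attacker nudges an honest input along the weight vector just enough
# to push its score over the decision threshold.
threshold = 0.0
x = [0.2, 0.3]                       # honest input; scores below threshold
gap = threshold - score(x)
norm_sq = sum(wi * wi for wi in w)
delta = [(gap + 1e-6) * wi / norm_sq for wi in w]  # minimal perturbation
x_adv = [xi + di for xi, di in zip(x, delta)]
print(score(x) < threshold, score(x_adv) >= threshold)  # True True
```

Real models are far less transparent than this linear sketch, but the same principles hold: inversion attacks exploit what the model's outputs reveal about its training data, and spoofing attacks exploit small, targeted input changes that cross a decision boundary.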