Facebook Open-Sources Machine-Learning Privacy Library Opacus
Facebook AI Research (FAIR) has announced the release of Opacus, a high-speed library for applying differential privacy techniques when training deep-learning models using the PyTorch framework. FAIR reports that Opacus can achieve an order-of-magnitude speedup compared to other privacy libraries.
The library was described on the FAIR blog. Opacus provides an API and implementation of a PrivacyEngine, which attaches directly to the PyTorch optimizer during training. By using hooks in the PyTorch Autograd component, Opacus can efficiently calculate per-sample gradients, a key operation for differential privacy. Training produces a standard PyTorch model that can be deployed without changing existing model-serving code. According to FAIR,
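The "attach to the optimizer" design can be illustrated with a toy sketch in plain Python. It assumes nothing about Opacus internals: Param, SGD, ToyPrivacyEngine, and mean_rows are hypothetical names, and the per-sample gradients are filled in by hand where a real library would use Autograd hooks. The point is that the engine only wraps optimizer.step(), so the training loop and the model parameters stay exactly as they were; mean_rows is a plain average standing in for a real clip-and-noise step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Param:
    value: float
    grad: float = 0.0
    # Hypothetically filled in by backward hooks: one gradient per sample.
    per_sample_grads: List[float] = field(default_factory=list)


class SGD:
    """Toy optimizer that knows nothing about privacy."""

    def __init__(self, params: List[Param], lr: float) -> None:
        self.params = params
        self.lr = lr

    def step(self) -> None:
        for p in self.params:
            p.value -= self.lr * p.grad


class ToyPrivacyEngine:
    """Hypothetical engine that adds privacy by wrapping optimizer.step()."""

    def __init__(self, privatize: Callable[[List[List[float]]], List[float]]) -> None:
        self.privatize = privatize
        self.steps = 0

    def attach(self, optimizer: SGD) -> None:
        original_step = optimizer.step

        def private_step() -> None:
            params = optimizer.params
            n = len(params[0].per_sample_grads)
            # Row i holds sample i's gradient across every parameter.
            rows = [[p.per_sample_grads[i] for p in params] for i in range(n)]
            for p, g in zip(params, self.privatize(rows)):
                p.grad = g
            self.steps += 1
            original_step()  # the unmodified update rule

        # Shadow the bound method on this instance only.
        optimizer.step = private_step


def mean_rows(rows: List[List[float]]) -> List[float]:
    # Plain average: a stand-in for a real clip-and-noise privatizer.
    return [sum(col) / len(rows) for col in zip(*rows)]


params = [Param(0.0), Param(0.0)]
opt = SGD(params, lr=0.5)
engine = ToyPrivacyEngine(mean_rows)
engine.attach(opt)
# Pretend a backward pass produced per-sample gradients for two samples.
params[0].per_sample_grads = [3.0, 0.3]
params[1].per_sample_grads = [4.0, 0.4]
opt.step()  # training code calls step() exactly as before
print([p.value for p in params])
```

Because the privacy logic lives in the wrapped step rather than in the model, the trained values are ordinary numbers with nothing privacy-specific attached to them.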
[W]e hope to provide an easier path for researchers and engineers to adopt differential privacy in ML, as well as to accelerate DP research in the field.
Differential privacy (DP) is a mathematical definition of data privacy. The core concept of DP is to add noise to the result of a query on a dataset, so that the result is almost equally likely whether or not any single data element is included in the dataset. How much a single element may change that likelihood is controlled by a parameter, usually written epsilon (ε); smaller values mean stronger privacy. Each successive query consumes part of a fixed total allowance, called the privacy budget; once the budget is exhausted, no further queries can be answered while still guaranteeing privacy.
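As a worked illustration of noise and budgets, here is a sketch of the classic Laplace mechanism for a counting query, with a simple budget tracker that adds up the epsilon of every query and refuses any query that would exceed the total. This is a textbook DP mechanism, not what Opacus does internally (DP-SGD adds Gaussian noise to gradients), and the function and class names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivacyBudget:
    """Tracks epsilon spent, using simple sequential composition (costs add up)."""

    def __init__(self, total_epsilon: float) -> None:
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def private_count(data, predicate, epsilon, budget, rng):
    """Answer 'how many records satisfy predicate?' with epsilon-DP.

    Adding or removing one record changes a count by at most 1 (its
    sensitivity), so Laplace noise with scale 1/epsilon suffices.
    """
    budget.charge(epsilon)  # refuse the query if it would overspend
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 47, 51, 62, 29, 44]
budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    print(round(private_count(ages, lambda a: a > 40, 0.25, budget, rng), 2))
# A fifth query at epsilon=0.25 would raise: 4 x 0.25 uses the whole budget.
```

Each answer is the true count (4) plus random noise; smaller per-query epsilon means larger noise but more queries before the budget runs out.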
When this concept is applied to machine learning, it is typically applied during training, effectively guaranteeing that the model does not learn “too much” about any specific input sample. Because most deep-learning models are trained with stochastic gradient descent (SGD) or one of its variants, the privacy-preserving version is called DP-SGD. During the back-propagation step, ordinary SGD computes a single gradient tensor, averaged over an entire input “minibatch”, which is then used to update the model parameters. DP-SGD, however, must compute the gradient separately for each sample in the minibatch. How efficiently this step is implemented is the key to Opacus’s speed gains.
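The difference between the two kinds of gradient can be seen in a small plain-Python sketch, assuming a hypothetical linear model with squared loss. per_sample_gradients keeps one gradient per sample, as DP-SGD requires, while batch_gradient accumulates everything into one vector, as ordinary SGD does; the average of the former equals the latter. Computing per-sample gradients with an explicit loop like this is the slow, naive approach that an efficient implementation avoids.

```python
from typing import List

Vector = List[float]


def residual(w: Vector, b: float, x: Vector, y: float) -> float:
    # Prediction error of the linear model w.x + b on one sample.
    return sum(wj * xj for wj, xj in zip(w, x)) + b - y


def per_sample_gradients(w, b, xs, ys) -> List[Vector]:
    """What DP-SGD needs: one gradient of 0.5 * residual**2 per sample.

    Each vector lists d/dw_1 ... d/dw_k followed by d/db.
    """
    grads = []
    for x, y in zip(xs, ys):
        r = residual(w, b, x, y)
        grads.append([r * xj for xj in x] + [r])
    return grads


def batch_gradient(w, b, xs, ys) -> Vector:
    """What ordinary SGD computes: a single gradient of the mean loss.

    Contributions are summed into one accumulator, so the individual
    per-sample gradients are never kept.
    """
    total = [0.0] * (len(w) + 1)
    for x, y in zip(xs, ys):
        r = residual(w, b, x, y)
        for j, xj in enumerate(x):
            total[j] += r * xj
        total[-1] += r
    return [t / len(xs) for t in total]


w, b = [0.5, -1.0], 0.1
xs = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
ys = [1.0, 0.0, 2.0]
per_sample = per_sample_gradients(w, b, xs, ys)
batch = batch_gradient(w, b, xs, ys)
# The batch gradient is just the average of the per-sample ones.
averaged = [sum(col) / len(per_sample) for col in zip(*per_sample)]
print(per_sample)
print(batch, averaged)
```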
For computing the per-sample gradients, Opacus uses an efficient algorithm developed by Ian Goodfellow, inventor of the generative adversarial network (GAN) model. Each per-sample gradient is then clipped to a maximum norm, which bounds how much any single sample, including an outlier, can influence the update. The clipped gradients are aggregated into a single tensor, and noise is added to the result before the model parameters are updated. Because each training step constitutes a “query” of the input data, and thus spends part of the privacy budget, Opacus keeps track of the budget spent, providing real-time monitoring and the option to stop training once the budget is exhausted.
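The clip, aggregate, and noise step, together with a budget check, might look like the sketch below. It operates on per-sample gradients that are assumed to be already computed, and every name in it (clip, dp_sgd_update, train, max_norm, noise_multiplier, eps_per_step) is illustrative rather than Opacus API. Adding Gaussian noise with standard deviation noise_multiplier times max_norm to the sum of clipped gradients follows the usual DP-SGD recipe. The budget logic is deliberately naive: it charges a made-up fixed epsilon per step, whereas real privacy accountants derive the privacy spent from quantities such as the noise level, the sampling rate, and the number of steps.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def clip(grad: Vector, max_norm: float) -> Vector:
    """Scale grad down, if needed, so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_update(per_sample_grads: List[Vector], max_norm: float,
                  noise_multiplier: float, rng: random.Random) -> Vector:
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*clipped)]
    # After clipping, one sample can move the sum by at most max_norm,
    # so the noise is scaled to that bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


def train(params: Vector, batches: list, grad_fn: Callable, lr: float,
          max_norm: float, noise_multiplier: float,
          eps_per_step: float, eps_budget: float, rng: random.Random):
    """Take DP-SGD steps until data runs out or the next step would overspend.

    eps_per_step is a hypothetical fixed cost, added up naively per step.
    """
    spent = 0.0
    for batch in batches:
        if spent + eps_per_step > eps_budget + 1e-12:
            break  # stop training: budget exhausted
        update = dp_sgd_update(grad_fn(params, batch), max_norm,
                               noise_multiplier, rng)
        params = [p - lr * u for p, u in zip(params, update)]
        spent += eps_per_step
    return params, spent


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # per-sample gradients
print(dp_sgd_update(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
# Here each "batch" already is its list of per-sample gradients.
params, spent = train([0.0, 0.0], [grads] * 10, lambda p, b: b, lr=0.1,
                      max_norm=1.0, noise_multiplier=1.1,
                      eps_per_step=0.25, eps_budget=1.0, rng=rng)
print(params, spent)  # only 4 of the 10 batches fit in the budget
```

Clipping first is what makes the noise meaningful: without a bound on each sample's contribution, no finite amount of noise could hide a single outlier.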
In developing Opacus, FAIR and the PyTorch team collaborated with OpenMined, an open-source community dedicated to developing privacy techniques for ML and AI. OpenMined had previously contributed to Facebook’s CrypTen, a framework for ML privacy research, and had developed its own projects, including a privacy-preserving ML library called PySyft and a federated-learning platform called PyGrid. According to FAIR’s blog post, Opacus will now become one of the core dependencies of OpenMined’s libraries. Google released a DP library for TensorFlow, PyTorch’s major competitor, in early 2019. However, that library is not compatible with the newer 2.x versions of TensorFlow.