---
output:
  github_document:
    html_preview: false
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
set.seed(3033362) # for reproducibility
```
# diffpriv <img src="man/figures/logo.png" align="right" />
```{r, echo = FALSE}
#version <- as.vector(read.dcf('DESCRIPTION')[, 'Version'])
#version <- gsub('-', '.', version)
version <- "0.4.2.9000"
```
```{r, echo = FALSE}
#dep <- as.vector(read.dcf('DESCRIPTION')[, 'Depends'])
#m <- regexpr('R *\\(>= \\d+.\\d+.\\d+\\)', dep)
#rm <- regmatches(dep, m)
#rvers <- gsub('.*(\\d+.\\d+.\\d+).*', '\\1', rm)
rvers <- "3.4.0"
```
[![packageversion](https://img.shields.io/badge/Package%20version-`r version`-orange.svg?style=flat-square)](commits/master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/diffpriv)](https://cran.r-project.org/package=diffpriv)
[![Travis Build Status](https://travis-ci.org/brubinstein/diffpriv.svg?branch=master)](https://travis-ci.org/brubinstein/diffpriv)
[![Coverage Status](https://img.shields.io/codecov/c/github/brubinstein/diffpriv/master.svg)](https://codecov.io/github/brubinstein/diffpriv?branch=master)
[![license](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/)
[![minimal R version](https://img.shields.io/badge/R%3E%3D-`r rvers`-6666ff.svg)](https://cran.r-project.org/)
## Overview
The `diffpriv` package makes privacy-aware data science in R easy.
`diffpriv` implements the formal framework of differential privacy:
differentially private mechanisms can safely release, to untrusted third
parties, statistics computed, models fit, or arbitrary structures derived from
privacy-sensitive data. Because the framework is worst-case in nature, mechanism
development typically requires involved theoretical analysis. `diffpriv` offers
a turn-key approach to differential privacy by automating this process, using
sensitivity sampling in place of theoretical sensitivity analysis.
## Installation
Obtaining `diffpriv` is easy. From within R:
```{r eval=FALSE}
## Install the release version of diffpriv from CRAN:
install.packages("diffpriv")
## Install the latest development version of diffpriv from GitHub:
install.packages("devtools")
devtools::install_github("brubinstein/diffpriv")
```
## Example
A typical example in differential privacy is privately releasing a simple
`target` function of privacy-sensitive input data `X`, such as the mean of
`numeric` data:
```{r example-1}
## a target function we'd like to run on private data X, releasing the result
target <- function(X) mean(X)
```
First load the `diffpriv` package (installed as above) and construct a
chosen differentially-private mechanism for privatizing `target`.
```{r example-2}
## target seeks to release a numeric, so we'll use the Laplace mechanism---a
## standard generic mechanism for privatizing numeric responses
library(diffpriv)
mech <- DPMechLaplace(target = target)
```
To run `mech` on a dataset `X` we must first determine the sensitivity of
`target` to small changes in the input dataset. One avenue is to bound
sensitivity analytically (on paper; see the [vignette](http://brubinstein.github.io/diffpriv/articles/diffpriv.pdf)) and supply it
via the `sensitivity` argument at mechanism construction. That is not hard here
if we assume bounded data, but in general sensitivity can be highly non-trivial
to calculate by hand. The other approach, which we follow in this example, is
sensitivity sampling: repeatedly probing `target` to estimate sensitivity
automatically. We need only specify a distribution for generating random probe
datasets; `sensitivitySampler()` takes care of the rest. The price we pay for
this convenience is the weaker guarantee of random differential privacy.
```{r example-3}
## set a dataset sampling distribution, then estimate target sensitivity with
## sufficient samples for subsequent mechanism responses to achieve random
## differential privacy with confidence 1-gamma
distr <- function(n) rnorm(n)
mech <- sensitivitySampler(mech, oracle = distr, n = 5, gamma = 0.1)
mech@sensitivity ## DPMech and subclasses are S4: slots accessed via @
```
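The idea behind sensitivity sampling can be sketched in a few lines of base R. This is a simplified illustration, not the `diffpriv` implementation: `sample_sensitivity()` is a hypothetical helper, the probe data are assumed uniform on [0, 1] so that the exact sensitivity of the mean (1 / n) is known for comparison, and the sketch simply takes the largest observed change rather than choosing a sample size and estimate to meet a confidence level `gamma`.

```r
## Simplified sensitivity sampling in base R (not the diffpriv implementation).
## Assumption: records are drawn uniformly from [0, 1], so the exact
## sensitivity of the mean on n records is (1 - 0) / n.
target <- function(X) mean(X)

sample_sensitivity <- function(target, oracle, n, m) {
  probes <- replicate(m, {
    D <- oracle(n)            # random dataset of n records
    D_prime <- D
    D_prime[n] <- oracle(1)   # neighbouring dataset: replace the last record
    abs(target(D) - target(D_prime))
  })
  max(probes)                 # simplification: largest observed change
}

set.seed(1)
n <- 5
estimate <- sample_sensitivity(target, oracle = runif, n = n, m = 500)
exact <- 1 / n
c(estimate = estimate, exact = exact)
```

Because every probe changes a single record in [0, 1], the sampled estimate can never exceed the analytic bound; it approaches the bound as more probes are drawn.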
With a sensitivity-calibrated mechanism in hand, we can release private
responses on a dataset `X`, displayed alongside the non-private response
for comparison:
```{r example-4}
## the length of X must equal the n passed to sensitivitySampler()
X <- c(0.328, -1.444, -0.511, 0.154, -2.062)
r <- releaseResponse(mech, privacyParams = DPParamsEps(epsilon = 1), X = X)
cat("Private response r$response: ", r$response,
    "\nNon-private response target(X):", target(X))
```
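To make the roles of sensitivity and `epsilon` concrete, here is a minimal base R sketch of the textbook Laplace mechanism, which adds Laplace noise with scale `sensitivity / epsilon` to the non-private response. It is an illustration under stated assumptions rather than the package's code: `rlaplace_sketch()` and `laplace_release()` are hypothetical helpers, and the data are assumed to lie in [0, 1] so that the mean has sensitivity `1 / length(X)`.

```r
## Textbook Laplace mechanism in base R (a sketch, not diffpriv's code).
## rlaplace_sketch() is a hypothetical helper: base R has no Laplace sampler.
rlaplace_sketch <- function(n, scale) {
  ## the difference of two independent Exp(1) draws is standard Laplace
  scale * (rexp(n) - rexp(n))
}

laplace_release <- function(target, X, sensitivity, epsilon) {
  ## noise scale grows with sensitivity and shrinks as epsilon increases
  target(X) + rlaplace_sketch(1, scale = sensitivity / epsilon)
}

set.seed(42)
X <- c(0.2, 0.9, 0.4, 0.7, 0.1)   # illustrative data assumed to lie in [0, 1]
private <- laplace_release(mean, X, sensitivity = 1 / length(X), epsilon = 1)
c(private = private, non_private = mean(X))
```

A smaller `epsilon` (stronger privacy) or a larger sensitivity both widen the noise, which is why a tight sensitivity estimate matters for utility.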
## Getting Started
The above example demonstrates the main components of `diffpriv`:

* Virtual class `DPMech` for generic mechanisms that captures the non-private
  `target` and releases privatized responses from it. Current subclasses:
    + `DPMechLaplace`, `DPMechGaussian`: the Laplace and Gaussian mechanisms
      for releasing numeric responses with additive noise;
    + `DPMechExponential`: the exponential mechanism for privately
      optimizing over finite sets (which need not be numeric); and
    + `DPMechBernstein`: the Bernstein mechanism for privately releasing
      multivariate real-valued functions. See the
      [bernstein vignette](http://brubinstein.github.io/diffpriv/articles/bernstein.pdf) for more.
* Class `DPParamsEps` and subclasses for encapsulating privacy parameters.
* The `sensitivitySampler()` method of `DPMech` subclasses estimates the target
  sensitivity needed to run `releaseResponse()` of `DPMech` generic
  mechanisms. This provides an easy alternative to exact sensitivity bounds
  requiring mathematical analysis. The sampler repeatedly probes
  `DPMech@target` to estimate sensitivity to data perturbation. Running
  mechanisms with the obtained sensitivities yields random differential privacy.

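
As an illustration of private optimization over a finite set, the following base R sketch implements the textbook exponential mechanism: each candidate is sampled with probability proportional to `exp(epsilon * quality / (2 * sensitivity))`. It does not use or reproduce `DPMechExponential`; `exp_mech_sketch()`, the `votes` counts, and the quality function are all hypothetical and chosen only for the example.

```r
## Textbook exponential mechanism over a finite set (not diffpriv's code).
exp_mech_sketch <- function(candidates, quality, sensitivity, epsilon) {
  scores <- vapply(candidates, quality, numeric(1))
  ## subtracting the max is for numerical stability; probabilities are unchanged
  weights <- exp(epsilon * (scores - max(scores)) / (2 * sensitivity))
  probs <- weights / sum(weights)
  pick <- sample.int(length(candidates), 1, prob = probs)
  list(response = candidates[[pick]], probs = probs)
}

set.seed(7)
votes <- c(a = 12, b = 30, c = 9)   # hypothetical counts per candidate
out <- exp_mech_sketch(names(votes), function(r) votes[[r]],
                       sensitivity = 1, epsilon = 1)
out$response
round(out$probs, 3)
```

Here the quality of a candidate is its count, which changes by at most 1 when one individual's record changes, hence `sensitivity = 1`. Candidates need not be numeric; only their quality scores are.
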
Read the [package vignette](http://brubinstein.github.io/diffpriv/articles/diffpriv.pdf) for more, or [news](http://brubinstein.github.io/diffpriv/news/index.html)
for the latest release notes.
## Citing the Package
`diffpriv` is an open-source package released under the permissive MIT License.
Please acknowledge use of `diffpriv` by citing the paper on the sensitivity
sampler:

> Benjamin I. P. Rubinstein and Francesco Aldà. "Pain-Free Random Differential
> Privacy with Sensitivity Sampling", to appear in the 34th International
> Conference on Machine Learning (ICML'2017), 2017.

Other relevant references to cite depending on usage:

* **Differential privacy and the Laplace mechanism:**
  Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. "Calibrating
  noise to sensitivity in private data analysis." In Theory of Cryptography
  Conference, pp. 265–284. Springer Berlin Heidelberg, 2006.
* **The Gaussian mechanism:** Cynthia Dwork and Aaron Roth. "The algorithmic
  foundations of differential privacy." Foundations and Trends in Theoretical
  Computer Science 9(3–4), pp. 211–407, 2014.
* **The exponential mechanism:** Frank McSherry and Kunal Talwar. "Mechanism
  design via differential privacy." In the 48th Annual IEEE Symposium on
  Foundations of Computer Science (FOCS'07), pp. 94–103. IEEE, 2007.
* **The Bernstein mechanism:** Francesco Aldà and Benjamin I. P. Rubinstein.
  "The Bernstein Mechanism: Function Release under Differential Privacy." In
  Proceedings of the 31st AAAI Conference on Artificial Intelligence
  (AAAI'2017), pp. 1705–1711, 2017.
* **Random differential privacy:** Rob Hall, Alessandro Rinaldo, and Larry
  Wasserman. "Random Differential Privacy." Journal of Privacy and
  Confidentiality, 4(2), pp. 43–59, 2012.