# Tutorial #12: Differential Privacy I: Introduction

The recent machine learning revolution has been fueled by data, and this data frequently comes from the end-users themselves. Consider a smartphone manufacturer trying to make typing faster and more accurate, a store trying to help its customers find the products they want, or a doctor trying to interpret a medical test. Machine learning can help, but only when fed with data about how people type, what they buy, and what kinds of medical tests and results patients receive. Consequently, organizations collect more and more data about their users to build these systems.

This raises concerns around the privacy of that data. Emails, text messages, and medical records contain sensitive information and the owners rightfully expect their privacy to be strongly guarded. At the same time, the potential benefits of (for example) better medical diagnosis are undeniable. This leaves us with a challenge: how do we perform machine learning on data to enable technical advances without creating excessive risks to users' privacy?

In this two part tutorial we explore the issue of privacy in machine learning. In part I we discuss definitions of privacy in data analysis and cover the basics of differential privacy. Part II considers the problem of how to perform machine learning with differential privacy.

## Early approaches to privacy

We must first define what we actually mean by privacy in the context of data analysis. Early work struggled with how to find a formal definition of privacy that matches a user's expectations while also being practical. To see why this is hard, let's take a look at some earlier notions of privacy.

The most straightforward approach is anonymization, where identifying information is removed. For example, a name may be removed from a medical record. Unfortunately, anonymization is rarely sufficient to protect privacy as the remaining information can be uniquely identifying. For example, given the gender, postal code, age, ethnicity and height, it may be possible to identify someone uniquely, even in a very large database.

Indeed, this famously happened to the Governor of Massachusetts, William Weld, in 1997. After an insurance group released health records which had been stripped of obvious personal information like patient name and address, an aspiring graduate student was able to "deanonymize" which records belonged to Governor Weld by cross referencing with public voter rolls. This is an example of a linkage attack, where connections to other sources of information are used to deanonymize a dataset. Linkage attacks have been successful on a range of anonymized datasets including the Netflix challenge and genome data.

## k-anonymity

One approach to prevent linkage attacks is k-anonymity. A dataset is said to be $k$-anonymous if, for any person's record in the dataset, there are at least $k-1$ other records which are indistinguishable from it. So if a dataset is $k$-anonymous, then the best a linkage attack could ever do is identify a group of $k$ records which could belong to the person of interest. Even if a dataset isn't inherently $k$-anonymous, it could be made so by removing entire fields of data (like names and addresses) and selectively censoring fields of individual people who are particularly unique.

Unfortunately, $k$-anonymity isn't sufficient for anything but very large datasets with only small numbers of simple fields for each record. Intuitively, the more fields and the more possible entries there are in those fields, the more unique a record can be and the harder it is to ensure that there are $k$ equivalent records.
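
The definition of $k$-anonymity can be checked mechanically. The following is a minimal Python sketch, not a production tool: the function name `k_anonymity`, the field names, and the toy records are all invented for illustration. It reports the size of the smallest group of records that agree on the chosen fields, and shows how adding one more field (age) makes a record unique.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    This is the size of the smallest group of records that share the same
    values on every field listed in `quasi_identifiers`.
    """
    if not records:
        raise ValueError("k-anonymity is undefined for an empty dataset")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical toy records, invented for illustration only.
records = [
    {"postal_code": "M5V", "age": 34, "diagnosis": "asthma"},
    {"postal_code": "M5V", "age": 34, "diagnosis": "diabetes"},
    {"postal_code": "M5V", "age": 35, "diagnosis": "asthma"},
    {"postal_code": "H2X", "age": 61, "diagnosis": "flu"},
    {"postal_code": "H2X", "age": 61, "diagnosis": "asthma"},
]

# With both fields, the single 35-year-old in M5V forms a group of size one.
k_fine = k_anonymity(records, ["postal_code", "age"])
# Dropping the age field merges groups and raises k.
k_coarse = k_anonymity(records, ["postal_code"])
```

Censoring or coarsening fields raises $k$, but only by discarding the detail that made the data useful in the first place.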

## Centralized data and tracker attacks

Another solution is to not release the data. Instead, it is kept centralized by a trusted party which answers queries. But how do we ensure that those queries do not leak private data? One idea might be to allow only simple queries such as counting. Furthermore, answers could only be returned when there is a minimum query set size (for example, we are counting large numbers of records).

Unfortunately, this scheme is vulnerable to tracker attacks. Consider counting all records for patients who smoke, and then counting the number of patients that smoke whose name isn't Marcus Brubaker. Both queries count large numbers of records but together identify whether Marcus smokes. Even if names aren't available, a combination with a linkage attack could reveal the same information.
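
The tracker attack described above can be sketched in a few lines of Python. Everything here is hypothetical: the records (including the smoking statuses) are made up, and `count_query` and `MIN_QUERY_SET_SIZE` are illustrative names for a trusted party that refuses small query sets.

```python
MIN_QUERY_SET_SIZE = 3  # hypothetical minimum query set size

# Invented toy records; the smoking statuses are made up for illustration.
patients = [
    {"name": "Marcus Brubaker", "smokes": True},
    {"name": "Patient A", "smokes": True},
    {"name": "Patient B", "smokes": True},
    {"name": "Patient C", "smokes": True},
    {"name": "Patient D", "smokes": False},
    {"name": "Patient E", "smokes": False},
]


def count_query(records, predicate):
    """Answer a counting query, refusing query sets that are too small."""
    matching = sum(1 for record in records if predicate(record))
    if matching < MIN_QUERY_SET_SIZE:
        raise ValueError("query set too small to answer")
    return matching


def is_marcus(record):
    return record["name"] == "Marcus Brubaker"


# Asking about Marcus directly is refused because only one record matches.
try:
    count_query(patients, lambda r: is_marcus(r) and r["smokes"])
    direct_query_answered = True
except ValueError:
    direct_query_answered = False

# Two individually acceptable queries whose difference isolates Marcus.
smokers = count_query(patients, lambda r: r["smokes"])
smokers_except_marcus = count_query(
    patients, lambda r: r["smokes"] and not is_marcus(r)
)
marcus_smokes = (smokers - smokers_except_marcus) == 1
```

Each query passes the size check on its own; the leak only appears when the two exact answers are subtracted.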

## Fundamental law of information recovery

In the previous section we saw that simple approaches to data privacy are vulnerable to attacks. This raises the question of whether it is possible to guarantee individual privacy in a dataset. Indeed, an early definition of privacy ensured that nothing could be learned about an individual when data was released. In essence, this definition required that someone observing the released data would know nothing more about an individual's record than before the observation. Unfortunately, this notion is flawed: if you cannot learn anything new from observing the released data then the released data must not have any information in it that wasn't already available.

This raises a critical issue when it comes to understanding privacy with data analysis; it is impossible to allow for useful data analysis without at least some chance of learning about the underlying data. It has since been shown formally that for any query mechanism which doesn't fully destroy information, an attacker given access to enough queries could eventually reconstruct the dataset. This result is referred to as the "Fundamental Law of Information Recovery" and makes explicit an inevitable tradeoff. If you wish to extract useful information from a dataset, that brings with it a risk to the privacy of the data.

This may seem to doom the entire notion of private data analysis. But in fact it lays out clearly that the aim should be to quantify and limit how much privacy is actually lost. This is the goal of differential privacy.

## Differential Privacy

Consider an individual who is deciding whether to allow their data to be included in a database. For example, it may be a patient deciding whether their medical records can be used in a study, or someone deciding whether to answer a survey. A useful notion of privacy would be an assurance that allowing their data to be included should have negligible impact on them in the future. As we've already seen, absolute privacy is inherently impossible but what is being guaranteed here is that the chance of a privacy violation is small. This is precisely what differential privacy (DP) provides.

## Randomized response

Differential privacy builds conceptually on a prior method known as randomized response. Here, the key idea is to introduce a randomization mechanism that provides plausible deniability. Consider a survey asking people whether they cheated on their taxes. As we have seen, queries about the results of that survey could potentially convey information about a single individual. However, imagine if the responses recorded in the survey were randomized; a coin is flipped and if the result is 'heads' a random answer is recorded instead of the true answer. With a little care it is still possible to use the survey results to estimate the fraction of people who cheated on their taxes. However, every individual has plausible deniability: the recorded response may or may not be the true value and hence individual privacy is protected.

In this example, there is a parameter which is the probability that the true response is recorded. If it's very likely that the true response is recorded, then there is less privacy protection. Conversely, if it's unlikely that the true response is recorded, then there is more. It's also clear that, regardless of the probability, if an individual is surveyed multiple times, then there will be less protection, even if their answer is potentially randomized every time. Differential privacy formalizes how we define, measure and track the privacy protection afforded to an individual as functions of factors like randomization probabilities and number of times surveyed.
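
The "little care" needed to recover the population fraction can be made concrete. The sketch below is one possible reading of the survey, under the assumption (not stated above) that the "random answer" is yes or no with equal probability; the names `randomized_response`, `estimate_true_fraction` and `p_truth` are illustrative. If the true fraction is $\pi$ and the true answer is recorded with probability $p$, the expected fraction of recorded "yes" answers is $p\pi + (1-p)/2$, which can be inverted.

```python
import random


def randomized_response(true_answer, rng, p_truth=0.5):
    """Record the true answer with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return true_answer
    return rng.random() < 0.5


def estimate_true_fraction(recorded, p_truth=0.5):
    """Invert E[recorded yes] = p_truth * pi + (1 - p_truth) * 0.5 for pi."""
    observed = sum(recorded) / len(recorded)
    return (observed - (1.0 - p_truth) * 0.5) / p_truth


rng = random.Random(0)
true_fraction = 0.2  # hypothetical fraction of people who cheated
population = [rng.random() < true_fraction for _ in range(100_000)]
recorded = [randomized_response(answer, rng) for answer in population]
estimate = estimate_true_fraction(recorded)
```

No individual entry in `recorded` can be trusted, yet the aggregate estimate is close to the true fraction for large surveys. For small surveys the estimate is noisy and can even fall outside the range from 0 to 1.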

## Classical definition of differential privacy

Consider two databases $\mathcal{D}$ and $\mathcal{D}'$ which differ by only a single record. In addition, we consider a randomized mechanism $\mbox{M}[\bullet]$ that operates on the databases to produce a result. This mechanism is differentially private if the results of $\mbox{M}[\mathcal{D}]$ and $\mbox{M}[\mathcal{D}']$ are almost indistinguishable for every choice of $\mathcal{D}$ and $\mathcal{D}'$.

More formally, a mechanism $\mbox{M}[\bullet]$ is $\epsilon$-differentially private if for all subsets of output $\mathcal{S} \subset \mbox{Range}[\mbox{M}]$ and all such pairs of databases $\mathcal{D}$ and $\mathcal{D}'$
\[
\Pr(\mbox{M}[\mathcal{D}] \in \mathcal{S}) \le \exp[\epsilon] \Pr(\mbox{M}[\mathcal{D}'] \in \mathcal{S}). \tag{1}
\]
The term $\epsilon$ controls how much the output of the mechanism can differ between the two adjacent databases and captures how much privacy is lost when the mechanism is run on the database. Large values of $\epsilon$ correspond to only weak assurances of privacy while values close to zero ensure that less privacy is lost.

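
As a worked instance of equation 1, consider the coin-flip survey from the randomized response section, treating one person's recorded answer as the output of the mechanism. This sketch assumes (it is not stated above) that the "random answer" is yes or no with equal probability. The recorded answer is then "yes" with probabilities

\[
\Pr(\mbox{yes recorded} \mid \mbox{truth is yes}) = \frac{1}{2} + \frac{1}{2}\cdot\frac{1}{2} = \frac{3}{4}, \qquad
\Pr(\mbox{yes recorded} \mid \mbox{truth is no}) = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}.
\]

The ratio of these probabilities is 3, and by symmetry the same ratio bounds the "no" outcome, so under this assumption the survey satisfies equation 1 with

\[
\exp[\epsilon] = 3 \quad\Longrightarrow\quad \epsilon = \log 3 \approx 1.10.
\]
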
This definition is rather opaque and if it doesn't seem intuitive then don't worry. Later in this tutorial we will provide a number of easy-to-understand examples that will make these ideas clear. First though, we will re-frame the definition of differential privacy in terms of divergences.

## Relation to divergences

There is a close connection between $\epsilon$-DP and divergences between probability distributions. A divergence is a measure of the difference between probability distributions. It is zero if the distributions are identical and becomes larger the more that they differ.

Since the mechanism $\mbox{M}[\bullet]$ is randomized, there is a probability distribution over its output. The mechanism is $\epsilon$-differentially private if and only if
\[
\mbox{div}[\mbox{M}[\mathcal{D}] \Vert \mbox{M}[\mathcal{D}']] \leq \epsilon \tag{2}
\]
for databases $\mathcal{D}$ and $\mathcal{D}'$ differing by at most a single record. Here $\mbox{div}[\cdot \Vert \cdot]$ is the Rényi divergence of order $\alpha=\infty$. In other words, $\epsilon$ quantifies how large the divergence can be between the distributions of results when the mechanism is applied to two neighbouring datasets (figure 1).
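
For readers who want to see why equations 1 and 2 agree, it may help to write the order-$\infty$ Rényi divergence out. The notation $P$, $Q$ for the two output distributions is introduced here for illustration, and the supremum is taken over output subsets with non-zero probability:

\[
\mbox{div}[P \Vert Q] = \sup_{\mathcal{S}} \log\left[\frac{\Pr_{P}(\mathcal{S})}{\Pr_{Q}(\mathcal{S})}\right].
\]

Setting $P$ and $Q$ to the distributions of $\mbox{M}[\mathcal{D}]$ and $\mbox{M}[\mathcal{D}']$, the condition $\mbox{div}[P \Vert Q] \leq \epsilon$ says that the log ratio is at most $\epsilon$ for every $\mathcal{S}$, and exponentiating both sides gives back equation 1.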

From the perspective of someone choosing to participate in a dataset where access was $\epsilon$-differentially private, the additional costs on average will be at most a factor of $\exp[\epsilon]$ higher than if they did not participate. Setting an appropriate value of $\epsilon$ for a given scenario is a challenging problem but these connections and guarantees can be used to help calibrate it depending on both the sensitivity of the data and the needs of the analysis.

## Properties of Differential Privacy

The preceding discussion considers only a single fixed mechanism being run once. However, we've already seen that running multiple queries or using outside information could lead to privacy violations. How can we be sure this won't happen here? Differentially private mechanisms have two valuable properties that allow us to make some guarantees.

Post-Processing: Differentially private mechanisms are immune to post-processing. The composition of any function with a differentially private mechanism will remain differentially private. More formally, if a mechanism $\mbox{M}[\bullet]$ is $\epsilon$-differentially private and $\mbox{g}[\bullet]$ is any function then $\mbox{g}[\mbox{M}[\bullet]]$ is also at least $\epsilon$-differentially private. This means that privacy will be preserved even in the presence of linkage attacks.

Composition: Differentially private mechanisms are closed under composition. Applying multiple mechanisms (or the same mechanism multiple times) still results in the overall mechanism being differentially private, but with a different $\epsilon$. Specifically, a composition of $k$ mechanisms, each of which is $\epsilon$-differentially private, is at least $k\epsilon$-differentially private. This provides some guarantees about robustness to tracker attacks.

The post-processing property allows us to treat differentially private mechanisms as generic building blocks. Any of the large library of differentially private mechanisms can be combined together while still preserving differential privacy. However, the composition theorem also makes plain that there is a limit; while composition preserves privacy, the value of $\epsilon$ increases and so the amount of privacy lost increases with every mechanism that is applied. Eventually, the value of $\epsilon$ will become so large that the assurances of differential privacy become practically useless.
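
One practical consequence of the composition property is that a data holder can keep a running total of the $\epsilon$ spent and stop answering once a chosen limit is reached. The class below is a minimal sketch of that bookkeeping using the simple additive rule stated above; `PrivacyBudget`, `charge` and the numbers used are hypothetical, and the class only tracks the total rather than adding any noise itself.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under the additive composition rule."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one epsilon-DP query; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so that floating point round-off does not reject
        # a query that exactly exhausts the budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total_epsilon - self.spent


budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    # k = 4 queries at epsilon = 0.25 each cost k * epsilon = 1.0 in total.
    remaining = budget.charge(0.25)

try:
    budget.charge(0.25)
    fifth_query_allowed = True
except RuntimeError:
    fifth_query_allowed = False
```

Post-processing the answers to the four permitted queries costs nothing extra, which is why only calls to a mechanism are charged.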

## The Laplace Mechanism

We'll now examine one of the classical techniques for differential privacy. The Laplace Mechanism takes a deterministic function of a database and adds noise to the result. Much like randomizing the response to a binary question, adding noise to continuous valued functions provides "plausible deniability" of the true result and hence, privacy for any inputs into that calculation.

Let $\mbox{f}[\bullet]$ be a deterministic function of a database $\mathcal{D}$ which returns a scalar value. For example, it might count the number of entries that satisfy a condition. The Laplace mechanism works by adding noise to $\mbox{f}[\bullet]$:
\[
\mbox{M}[\mathcal{D}] = \mbox{f}[\mathcal{D}] + \xi, \tag{3}
\]

where $\xi\sim \mbox{Lap}_{\xi}[b]$ is a sample from a Laplace distribution (figure 2) with scale $b$. The Laplace mechanism is $\epsilon$-differentially private with $\epsilon = \Delta\mbox{f}/b$. The term $\Delta \mbox{f}$ is a constant called the sensitivity which depends on the function $\mbox{f}[\bullet]$.
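
To see where the relation $\epsilon = \Delta\mbox{f}/b$ comes from, the following sketch compares the output densities for two neighbouring databases. It assumes the standard zero-mean Laplace density with scale $b$, and takes $\Delta\mbox{f}$ to be the largest possible value of $|\mbox{f}[\mathcal{D}] - \mbox{f}[\mathcal{D}']|$ (the formal definition is equation 4). The symbols $z$ for an output value and $p$ for a density are introduced here for illustration.

\[
\mbox{Lap}_{\xi}[b] = \frac{1}{2b}\exp\left[-\frac{|\xi|}{b}\right]
\]

\[
\frac{p(z \mid \mathcal{D})}{p(z \mid \mathcal{D}')}
= \exp\left[\frac{|z - \mbox{f}[\mathcal{D}']| - |z - \mbox{f}[\mathcal{D}]|}{b}\right]
\leq \exp\left[\frac{|\mbox{f}[\mathcal{D}] - \mbox{f}[\mathcal{D}']|}{b}\right]
\leq \exp\left[\frac{\Delta\mbox{f}}{b}\right]
\]

The first inequality is the reverse triangle inequality. Integrating the bound over any output set $\mathcal{S}$ recovers equation 1 with $\epsilon = \Delta\mbox{f}/b$.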

Let's break down the components of this relationship. Larger amounts of noise better preserve privacy but at the expense of a less accurate response. This is controlled by the scale parameter $b$ of the Laplace distribution which makes the response given by $\mbox{M}[\bullet]$ less accurate for larger values of $b$. Here we see the tradeoff between accuracy and privacy made explicit.

However, the amount of differential privacy afforded for a fixed value of $b$ depends on the function $\mbox{f}[\bullet]$ itself. To see why, consider adding Laplacian noise with $b=1$ to (i) a function which averages people's income in dollars and (ii) a function which averages people's height in meters. Figure 2 shows that most of the probability mass of this Laplacian noise will be between $\pm 3$. Since the expected range of the income function is much larger than that of the height function, the fixed added noise will have relatively less effect for the income function.

It follows that the amount of noise must be calibrated to the properties of the function. These properties are captured by the constant $\Delta \mbox{f}$ which determines how much the output of $\mbox{f}[\bullet]$ can change with the addition or removal of a single element. Formally, $\Delta \mbox{f}$ is the $\ell_1$ sensitivity of $\mbox{f}$ and is defined as
\[
\Delta \mbox{f} = \max_{\substack{\mathcal{D}, \mathcal{D}'}} \Vert \mbox{f}[\mathcal{D}] - \mbox{f}[\mathcal{D}'] \Vert_{1} \tag{4}
\]
where $\Vert \cdot \Vert_1$ is the $\ell_1$ norm and $\mathcal{D}$ and $\mathcal{D}'$ differ in only one element.
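
Equation 3 and the relation $b = \Delta\mbox{f}/\epsilon$ translate into very little code. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the sampler relies on the fact that the difference of two independent exponential variables with mean $b$ follows a Laplace distribution with scale $b$. It is meant to show the calibration, not to be a hardened implementation (floating point subtleties that matter in real deployments are ignored).

```python
import random


def sample_laplace(scale, rng):
    """Draw from a zero-mean Laplace distribution with the given scale b.

    The difference of two independent exponential variables with mean b
    is Laplace distributed with scale b.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace noise with scale b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    return true_value + sample_laplace(scale, rng)


rng = random.Random(0)
# Hypothetical query result of 42 with sensitivity 1, released with epsilon = 0.5.
noisy_answer = laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Halving $\epsilon$ doubles the noise scale, and doubling the sensitivity does the same, which is the calibration described above.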

## Examples

To get a feel for this, we'll present a few worked examples of functions that are made differentially private using the Laplace mechanism.

## Example 1: Counting

We start with a function that counts the number of entries in the database $\mathcal{D}$ which satisfy a given property $\mathbb{I}(x)$:
\begin{equation}
\mbox{f}_{count}[\mathcal{D}] = \sum_{x\in\mathcal{D}} \mathbb{I}(x). \tag{5}
\end{equation}
Since adding or removing any element of this database can only change the count by a maximum of 1, we conclude that $\Delta \mbox{f}_{count} = 1$. Using the relation $\epsilon = \Delta \mbox{f}/b$, we can deduce that an $\epsilon$-differentially private mechanism for counting entries in a database is given by
\[
\mbox{M}_{count}[\mathcal{D}] = \mbox{f}_{count}[\mathcal{D}] + \xi, \tag{6}
\]
where $\xi\sim\mbox{Lap}_{\xi}[\epsilon^{-1}]$ is a random draw from a Laplace distribution with parameter $b=\epsilon^{-1}$ (figure 3).
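
Equation 6 can be sketched directly in Python. The database of ages, the predicate and the function name `private_count` below are hypothetical, and the Laplace sampler is the same difference-of-exponentials trick used for illustration elsewhere (an implementation choice, not something prescribed by the mechanism).

```python
import random


def sample_laplace(scale, rng):
    """Zero-mean Laplace draw: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(database, predicate, epsilon, rng):
    """Noisy count of entries satisfying `predicate`, with b = 1 / epsilon."""
    true_count = sum(1 for x in database if predicate(x))
    return true_count + sample_laplace(1.0 / epsilon, rng)


rng = random.Random(1)
ages = [23, 35, 41, 52, 67, 29, 71, 44]  # hypothetical database

# The true count of people aged 40 or over is 5; the released value is noisy.
noisy_count = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

The released value is generally not an integer and can even be negative for small counts. Rounding or clamping it afterwards is post-processing and so does not weaken the guarantee.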

## Example 2: Summing

A second simple function would be to sum the entries in the database $\mathcal{D}$ which satisfy a given property $\mathbb{I}(x)$:
\begin{equation}
\mbox{f}_{sum}[\mathcal{D}] = \sum_{x\in\mathcal{D}} x\cdot\mathbb{I}(x). \tag{7}
\end{equation}
Unfortunately, the $\ell_1$ sensitivity of this function, without any more information about the values of the field, is infinite; if the values can be arbitrarily large, then the amount that their sum could change with the addition or subtraction of a new entry is also arbitrarily large.

To work around this let's assume that $C$ is an upper bound on the absolute values in a given field so that $|x| \leq C$ for all possible values of $x$. Then $\Delta \mbox{f}_{sum} = C$ and hence, an $\epsilon$-differentially private mechanism for summing entries in a database is given by adding Laplacian noise $\xi \sim \mbox{Lap}_{\xi}[\epsilon^{-1}C]$ with parameter $b=\epsilon^{-1}C$. If we don't know the value of $C$, then we can truncate the fields to a chosen value $C$ and report the sum of the truncated values, giving the same mechanism as before but with the additional parameter $C$, where smaller values of $C$ reduce the amount of noise added for privacy budget $\epsilon$.
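
The truncation step is easy to get wrong, so here is a small hedged sketch of the clipped sum. The names `private_sum` and `clip`, and the example values, are invented for illustration; the key point is that each value is clipped to $[-C, C]$ before summing, so that the sensitivity really is $C$.

```python
import random


def sample_laplace(scale, rng):
    """Zero-mean Laplace draw: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(database, predicate, clip, epsilon, rng):
    """Noisy sum of clipped entries satisfying `predicate`, with b = clip / epsilon."""
    clipped_total = sum(
        max(-clip, min(clip, x)) for x in database if predicate(x)
    )
    return clipped_total + sample_laplace(clip / epsilon, rng)


rng = random.Random(2)
values = [10.0, -250.0, 40.0, 999.0]  # hypothetical field with outliers

# With clip = 100 the clipped values are 10, -100, 40 and 100.
noisy_total = private_sum(values, lambda x: True, clip=100.0, epsilon=1.0, rng=rng)
```

A smaller clipping value means less noise but also distorts the sum more when many entries exceed it, so the choice of $C$ trades one source of error for another.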

## Example 3: Averaging

Finally, let's consider a function that averages the entries in a database $\mathcal{D}$ which satisfy a given property $\mathbb{I}(x)$. Following from the discussion of summing, we assume that we clip the values by $\pm C$ before the calculation. There are many different ways we can go about implementing such an operation but we'll begin by again directly applying the Laplace mechanism. Consider the function
\[
\mbox{f}_{avg}[\mathcal{D}] = \begin{cases}
\frac{\sum_{x\in\mathcal{D}} x\cdot \mathbb{I}(x)}{\sum_{x\in\mathcal{D}} \mathbb{I}(x)} & \hspace{0.5cm} \sum_{x\in\mathcal{D}} \mathbb{I}(x) > 0 \\
0 & \hspace{0.5cm} \sum_{x\in\mathcal{D}} \mathbb{I}(x) = 0,
\end{cases} \tag{8}
\]
where the second case is introduced to prevent division by zero when none of the elements satisfy the property $\mathbb{I}(x)$.

The $\ell_1$ sensitivity of $\mbox{f}_{avg}$ is $C$. To see this, consider the worst case scenario where $\mathcal{D}$ is the empty database (or there are no entries which satisfy the condition) and $\mathcal{D}'$ consists of exactly one new element with value $C$; then the result of $\mbox{f}_{avg}$ will change by exactly $C$. Hence, to achieve $\epsilon$-differential privacy, we must add noise $\xi\sim \mbox{Lap}_{\xi}[\epsilon^{-1}C]$. Notice that this is the same amount of noise that we added to the sum function, but the average is typically much smaller than the sum. Hence, this mechanism has very poor accuracy. It adds a lot of noise to account for the worst case scenario of the influence of a single record when the database is empty.

## Example 4: Averaging using composition

A better approach to averaging can be found through the use of the composition and post-processing properties. We can combine the mechanisms for summing and counting to give the mechanism
\[
\mbox{M}_{comp\ avg}[\mathcal{D}] = \mbox{M}_{sum}[\mathcal{D}] / \mbox{M}_{count}[\mathcal{D}], \tag{9}
\]
which is $\epsilon_{comp\ avg} = \epsilon_{sum} + \epsilon_{count}$ differentially private. If we set $\epsilon_{sum} = \epsilon_{count} = \frac{1}{2}\epsilon_{avg}$ we get the same overall privacy cost for the two approaches to averaging. However, the accuracy of the compositional approach will be significantly better, particularly for larger databases (figure 4). The downside of this approach is that there is no longer simple additive noise on top of the true answer. This makes the relationship between $\mbox{f}_{avg}$ and $\mbox{M}_{comp\ avg}$ more complex and potentially complicates the interpretation of the output. See Section 2.5 of this book for other mechanisms for averaging.
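
The difference in accuracy between the two approaches to averaging can be checked with a small simulation. This is a sketch under several assumptions: the heights are synthetic, the property $\mathbb{I}(x)$ is taken to be true for every entry, all function names are illustrative, and the compositional version has no guard against a noisy count close to zero (which a practical version would need for small databases).

```python
import random


def sample_laplace(scale, rng):
    """Zero-mean Laplace draw: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip_value(x, clip):
    return max(-clip, min(clip, x))


def direct_private_average(values, clip, epsilon, rng):
    """Equation 8 plus Laplace noise with scale C / epsilon."""
    clipped = [clip_value(x, clip) for x in values]
    true_average = sum(clipped) / len(clipped) if clipped else 0.0
    return true_average + sample_laplace(clip / epsilon, rng)


def composed_private_average(values, clip, epsilon, rng):
    """Equation 9: noisy sum over noisy count, each released with epsilon / 2."""
    half = epsilon / 2.0
    noisy_sum = sum(clip_value(x, clip) for x in values) + sample_laplace(clip / half, rng)
    noisy_count = len(values) + sample_laplace(1.0 / half, rng)
    return noisy_sum / noisy_count


def mean_absolute_error(mechanism, values, clip, epsilon, rng, trials=200):
    """Average distance between the mechanism's output and the clipped true mean."""
    truth = sum(clip_value(x, clip) for x in values) / len(values)
    errors = [abs(mechanism(values, clip, epsilon, rng) - truth) for _ in range(trials)]
    return sum(errors) / trials


rng = random.Random(0)
heights = [rng.uniform(1.4, 2.0) for _ in range(1000)]  # synthetic heights in meters
direct_error = mean_absolute_error(direct_private_average, heights, 2.5, 1.0, rng)
composed_error = mean_absolute_error(composed_private_average, heights, 2.5, 1.0, rng)
```

For a database of this size, the noise in the direct approach is comparable to the heights themselves, while in the compositional approach the noise on the sum and the count is small relative to their magnitudes, so the ratio stays close to the true average.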

## Other mechanisms and definitions of differential privacy

The above examples were based on the Laplace mechanism. However, this is not the only mechanism that induces differential privacy. The exponential mechanism can be used to provide differentially private answers to queries whose responses aren't numeric. For instance, "what colour of eyes is most common?" or "which town has the highest prevalence of cancer?". It is also useful for constructing better mechanisms for numeric computations like medians, modes, and averages.

The Gaussian mechanism works by adding Gaussian noise instead of Laplacian noise and the level of noise is based on the $\ell_2$ sensitivity instead of $\ell_1$. The Gaussian mechanism is convenient as additive Gaussian noise is less likely to take on extreme values than Laplacian noise and is generally better tolerated by downstream analysis. Unfortunately, the Gaussian mechanism only satisfies a weaker form of differential privacy known as $(\epsilon, \delta)$-differential privacy. Formally, a mechanism $\mbox{M}[\bullet]$ is $(\epsilon, \delta)$-differentially private if
\[
\Pr[\mbox{M}[\mathcal{D}] \in \mathcal{S}] \le \exp[\epsilon] \Pr[\mbox{M}[\mathcal{D}'] \in \mathcal{S}] + \delta, \tag{10}
\]
for all subsets of output $\mathcal{S} \subset \textrm{Range}[\mbox{M}]$ and databases $\mathcal{D}$ and $\mathcal{D}'$ that differ by at most one element. $\epsilon$-differential privacy is stronger in the sense that it limits privacy loss even in worst case scenarios, which can lead to large amounts of noise being required. In contrast, $(\epsilon, \delta)$-differential privacy allows for potentially large privacy breaches but only with probability $\delta$. Informally, you can think of an $(\epsilon, \delta)$-DP method as being $\epsilon$-DP with probability $1-\delta$.
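
As a rough illustration of the Gaussian mechanism, the sketch below uses the classical calibration from the differential privacy literature, $\sigma = \Delta_2\mbox{f}\sqrt{2\log(1.25/\delta)}/\epsilon$, which is stated for $0 < \epsilon < 1$. This formula is not given in this tutorial and is brought in here as an assumption; tighter calibrations exist. The function names and example numbers are hypothetical.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise standard deviation under the classical Gaussian calibration.

    Assumed formula: sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    which is only stated for 0 < epsilon < 1.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if l2_sensitivity <= 0:
        raise ValueError("l2_sensitivity must be positive")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(true_value, l2_sensitivity, epsilon, delta, rng):
    """Return true_value plus zero-mean Gaussian noise with the calibrated sigma."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return true_value + rng.gauss(0.0, sigma)


rng = random.Random(0)
# Hypothetical count of 120 with sensitivity 1, released at epsilon = 0.5, delta = 1e-5.
noisy_count = gaussian_mechanism(120.0, 1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

Note how $\delta$ enters only through a logarithm: making the failure probability much smaller increases the noise only modestly, whereas halving $\epsilon$ doubles it.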

There are other definitions of differential privacy which aim to weaken $\epsilon$-differential privacy in different ways to make designing better mechanisms possible. The most notable of these is Rényi differential privacy which we'll see in part II of our tutorial.

## Conclusion

In this first part of this tutorial we have discussed the history of differential privacy, presented its formal definition, and showed how it can be used to construct differentially private approximations to some common database queries. For more details of these basics, the best source is this monograph.

In part II we'll cover some examples of recently developed mechanisms including ones that allow us to perform differentially private machine learning while still building on standard ML tools. Finally, we'll discuss differential privacy in the context of generative models and synthetic data generation.