This resource aims to support practitioners deploying differential privacy
in practice. It endeavors to leave the reader with a clearer intuition,
responsible guidance, and case studies of privacy budgets used
by organizations today.
In short, it aims to build a publicly available, communal body of knowledge about differential privacy implementations that various stakeholders can use to drive the identification and adoption of judicious differentially private implementations.
Public Registry
The following table presents multiple systems with publicly advertised differential privacy parameters. We also generated estimates for their equivalent parameters in other DP variants and collected their respective sources. You can click on each entry to view more detailed information.
Note: Values marked with (*) were generated by us using
this formula
to convert from Pure DP to zCDP and
this formula to convert
from zCDP to approximate DP, with ε = 1:
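Since the linked formulas are not reproduced inline, the short sketch below assumes the standard conversions from the zCDP literature, namely that pure \(\epsilon\)-DP implies \((\epsilon^2/2)\)-zCDP and that \(\rho\)-zCDP implies \((\rho + 2\sqrt{\rho \ln(1/\delta)},\ \delta)\)-DP for any \(\delta > 0\); the function names are ours for illustration.

```python
import math

def pure_dp_to_zcdp(epsilon: float) -> float:
    """Pure epsilon-DP implies (epsilon**2 / 2)-zCDP."""
    return epsilon ** 2 / 2

def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """rho-zCDP implies (rho + 2 * sqrt(rho * ln(1/delta)), delta)-DP."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

rho = pure_dp_to_zcdp(1.0)                    # epsilon = 1  ->  rho = 0.5
print(rho, zcdp_to_approx_dp(rho, delta=1e-6))
```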
Private Use Cases
The table below lists private applications of differential privacy where
parameters are not publicly disclosed. You can click on each entry to view more detailed information.
Overview
Over the last five years, the use of differential privacy as an output
disclosure control for sensitive data releases and queries has
grown substantially. This is due in part to the elegant and theoretically
robust underpinning of the differential privacy literature, in part to
the prevalence of attacks on traditional disclosure techniques, and in part
to the adoption of differential privacy by those perceived to set the
"gold standard", such as the US Census,
which acts as a form of social proof, giving greater confidence to other
early adopters.
As a reference, one way to classify the maturity and readiness of a
technology in industry is to consider its technology readiness level
(TRL) .
Systems built with differential privacy guarantees can be found between
TRL 6 and 9. In other words, some industry applications of differential privacy have only been demonstrated in relevant domains, while others have been deployed and tested in operational environments.
As such, finding common ground on privacy
deployments appears to be an urgent challenge for the DP industry.
The purpose of this document is to support the responsible adoption of differential privacy in industry. Differential privacy, as introduced in an upcoming section, is a measure of information loss about data subjects or entities. However, there are currently few guidelines or recommendations for choosing thresholds that define a reasonable balance between privacy and query accuracy. Moreover, these thresholds are often context-specific, making their selection a critical decision for any organization implementing differential privacy in practice.
In this document, inspired by the authors of , who first proposed the creation of the Epsilon Registry, we outline key dimensions for characterizing
applications of differential privacy and present real-world case
studies based on their deployment context and chosen privacy budgets.
While this is not intended as an endorsement of any
particular application, we hope it will serve as a baseline for informed
debate, and eventually, the emergence of best practices.
Core to this document is a registry of case studies featured at the top.
Much of the initial identification work draws on excellent prior contributions
from personal blogs
, government publications, and
NGO guides
. Despite this existing groundwork, the motivation behind this document lies in
expanding the number and classification of these case studies in an
open-source manner, allowing the broader community to contribute and shape a shared understanding.
If, on the other hand, the reader is more interested in an introduction to
differential privacy, there are excellent resources available, including books
and papers
, online lecture notes
,
and dedicated websites .
While this document introduces some of the terminology used in differential
privacy, it is not intended to be a standalone resource and will reference
common techniques and mechanisms only briefly, pointing to external materials for further learning.
Finally, and importantly, this document is not intended to be static. A core purpose is to periodically add new case studies to keep pace with the ever-evolving practices of industry and government applications, and to align with guidance from regulators, which is expected to become more prevalent in the coming years. If you would like to join the authors and support the registry, please visit
the Contribute page.
Official Guidance and Standardization
Before diving into the main document, it is important to note that the two
prominent standardization bodies, NIST and ISO/IEC, have been active in
providing guidance and setting standards in the space of data anonymization,
and in particular differential privacy.
ISO/IEC 20889:2018: This
standard by the ISO/IEC focuses broadly on de-identification techniques,
including synthetic data and randomization techniques. While it is partially a
normative standard, differential privacy is introduced as a formal
privacy measure in the style of an informative standard. Only
\(\epsilon\)-differential privacy is considered, alongside the Laplace, Gaussian and
Exponential mechanisms, as well as the concept of cumulative privacy loss.
Interestingly, although Gaussian noise is typically associated with
\((\epsilon, \delta)\)-differential privacy and zero-concentrated
differential privacy, as will be introduced in section (ε, δ)-Differential Privacy, these more
nuanced privacy models are not defined in the standard.
NIST SP 800-226 ipd: This
guidance paper extends far beyond ISO/IEC 20889:2018, covering multiple
privacy models, the conversion between
privacy models, basic mechanisms, and threat models for both local and
central models. It is an excellent resource for understanding
the nomenclature, security models, and goals of applying differential
privacy in practice. Throughout this document we endeavor to align the
terminology with the NIST guidance paper, leaving formal definitions to
the original source.
While the aforementioned resources are useful, neither explicitly provides guidelines on how to parameterize differential privacy models in terms of privacy budgets. Nor do they point to public benchmarks that could help the community converge on industry norms over the medium to long term. In the case of ISO/IEC 20889:2018, the definitions are also limited to the most basic formulation, which often oversimplifies real-world applications. Throughout this document, and where applicable, we link to the terminology of the standard to provide consistency for the reader.
Introduction to Differential Privacy
Randomized Response Surveys
Before the age of big data and data science, traditional data collection
faced a challenge known as evasive answer bias: individuals
withholding honest responses to survey questions out of fear that their answers might be used against them. Randomized response
emerged in the
mid-twentieth century to address this.
Randomized response is a technique for protecting the privacy of individuals
in surveys. It involves adding local noise, for example by flipping a coin
one or more times and choosing the individual's recorded response based on the
coin-flip sequence. In doing so, aggregate results remain accurate in expectation,
but any given response is uncertain. This uncertainty over the response of
an individual is one of the earliest applications of differential privacy,
although it was not called that at the time, and the quantification of
privacy was simply the weighting of probabilities determined by the
mechanism.
An example of using a conditional coin-flip to achieve plausible
deniability with a calibrated bias.
The approach of randomizing responses using a stochastic mechanism, such as coin flipping, remains at the core of differential privacy today.
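To make the coin-flip scheme concrete, here is a minimal sketch of one common variant, in which a first flip decides whether to answer honestly and, if not, a second flip fabricates the answer; the function name is ours for illustration.

```python
import random

def two_coin_response(true_answer: bool, rng=random) -> bool:
    """Classic two-coin randomized response for a yes/no question."""
    if rng.random() < 0.5:        # first coin: heads -> answer honestly
        return true_answer
    return rng.random() < 0.5     # tails -> second coin fabricates the answer

# In expectation P(yes) = 0.25 + 0.5 * true_proportion, so an analyst can
# recover the true proportion as 2 * (observed_yes_fraction - 0.25).
```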
ε-Differential Privacy
Pure epsilon-differential privacy (\(\epsilon\)-DP) is a mathematical
guarantee that enables the sharing of aggregated statistics about a dataset while
protecting individual privacy by adding random noise. Simply put,
it ensures that the outcome of any analysis is nearly the same,
regardless of whether any individual's data is included in or
removed from the dataset.
Formally, the privacy guarantee is quantified using the privacy parameter
\(\epsilon\) (epsilon). A randomized algorithm \(M\) is
\(\epsilon\)-differentially private if for all neighboring datasets
\(D_1\) and \(D_2\) (differing in at most one element), and for all
subsets of outputs \(S \subseteq \text{Range}(M)\):
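\[
\Pr[M(D_1) \in S] \;\le\; e^{\epsilon} \cdot \Pr[M(D_2) \in S]
\]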
A mechanism \(M\) satisfying this guarantee adds a calibrated amount of noise,
governed by \(\epsilon\), which introduces a certain amount of error relative to
the true value. This error can be explored in the following interactive widget.
Randomized Response was ε-Differential Privacy
Although randomized response surveys predate the formal definition of differential privacy by over 40 years, the technique directly maps to the binary mechanism used in modern differential privacy.
Assume you wish to set up the spinner originally proposed in
to achieve \(\epsilon\)-differential privacy. This can be done by asking
participants to tell the truth with probability
\(\frac{e^{\frac{\epsilon}{2}}}{1 + e^{\frac{\epsilon}{2}}}\). This is
known in the literature as the binary mechanism.
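A minimal sketch of this calibration, using the truth probability stated above (the helper names are ours for illustration), together with the standard debiasing step an analyst would apply to the aggregated answers:

```python
import math
import random

def truth_probability(epsilon: float) -> float:
    """p = e^(eps/2) / (1 + e^(eps/2)); for epsilon = 1, p is roughly 0.62."""
    return math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))

def binary_response(true_answer: bool, epsilon: float, rng=random) -> bool:
    """Answer truthfully with probability p, otherwise report the opposite."""
    p = truth_probability(epsilon)
    return true_answer if rng.random() < p else not true_answer

def debias_yes_fraction(observed: float, epsilon: float) -> float:
    """Unbiased estimate of the true proportion of 'yes' answers."""
    p = truth_probability(epsilon)
    return (observed - (1 - p)) / (2 * p - 1)
```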
This mechanism is incredibly useful for building intuition among a non-technical audience. One of the simplest questions we can ask is: “Is Alice in this dataset?” Depending on the level of privacy, the probability of answering truthfully will vary. We illustrate this relationship below.
\(\epsilon\)
Probability of Truth
Odds of Truth
While the above odds are merely illustrative, they help convey the practical meaning of epsilon
in relation to the more intuitive randomized response mechanism. As a point of reference, theorists
often advocate for \(\epsilon \approx 1\) for the differential privacy
guarantee to be meaningful.
Intuition of the Laplace Mechanism
One of the most widely used mechanisms in ε-differential privacy is the
Laplace mechanism. It is used when we are adding bounded values
together, such as counts or summations of private values, provided the
extreme values (usually referred to as bounds) of the private
values are known and hence the maximum contribution of any data
subject is bounded.
In practice, the true sum is first computed, then noise is sampled from the Laplace distribution, and finally this noise is added to the result.
For example, if you are computing a count or a sum where all the values lie within the range \((0, 1)\), the widget below shows how the
distribution of noise and expected error change as
\(\epsilon\) varies.
Note that the error is additive and so we can make claims about the
absolute error, but not the relative error of the final stochastic
result.
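To make this concrete, below is a minimal sketch of the Laplace mechanism for a bounded sum, assuming each data subject contributes a single value that is clipped to a known range; laplace_sum is our name for illustration.

```python
import numpy as np

def laplace_sum(values, lower, upper, epsilon, rng=None):
    """Differentially private sum of values bounded in [lower, upper].

    Clipping bounds each subject's contribution, so the L1 sensitivity of
    the sum is max(|lower|, |upper|) and the Laplace scale is sensitivity / epsilon.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(values, lower, upper)
    sensitivity = max(abs(lower), abs(upper))
    return clipped.sum() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Values in (0, 1): the noise scale is 1/epsilon regardless of dataset size,
# which is why the absolute (but not relative) error can be bounded.
print(laplace_sum(np.random.rand(1000), 0.0, 1.0, epsilon=1.0))
```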
(ε, δ)-Differential Privacy
(ε, δ)-differential privacy is a mathematical guarantee that extends the
concept of pure epsilon-differential privacy by allowing for a small
probability of failure, with a second privacy parameter \(\delta\). Just
as we described pure DP in our previous section, it also ensures that the
outcome of any analysis is nearly the same, regardless of whether any
individual's data is present, but further allows for a
cryptographically small chance of failure.
Formally, the privacy guarantee is now quantified using both \(\epsilon\)
(epsilon) and also \(\delta\) (delta). A randomized algorithm \(M\) is
\((\epsilon, \delta)\)-differentially private if for all neighboring
datasets \(D_1\) and \(D_2\) (differing in at most one element), and for
all subsets of outputs \(S \subseteq \text{Range}(M)\):
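\[
\Pr[M(D_1) \in S] \;\le\; e^{\epsilon} \cdot \Pr[M(D_2) \in S] + \delta
\]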
The following widget describes the expected error for noise added under
\((\epsilon, \delta)\)-DP.
Intuition of (ε, δ)-Differential Privacy
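A common mechanism for achieving \((\epsilon, \delta)\)-DP is the Gaussian mechanism. As a minimal sketch to accompany the widget (the function name gaussian_sum is ours for illustration), the classical calibration adds Gaussian noise with standard deviation \(\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \epsilon\), which is valid for \(0 < \epsilon < 1\):

```python
import numpy as np

def gaussian_sum(values, lower, upper, epsilon, delta, rng=None):
    """(epsilon, delta)-DP sum via the classical Gaussian mechanism (epsilon < 1)."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(values, lower, upper)
    sensitivity = max(abs(lower), abs(upper))   # one clipped value per subject
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped.sum() + rng.normal(loc=0.0, scale=sigma)

print(gaussian_sum(np.random.rand(1000), 0.0, 1.0, epsilon=0.5, delta=1e-6))
```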
Zero-Concentrated Differential Privacy
Zero-Concentrated Differential Privacy (zCDP) introduces a
parameter \(\rho\) (rho) to measure the concentration of privacy loss
around its expected value, enabling tighter control over cumulative privacy loss in repeated or iterative analyses.
This makes zCDP particularly useful in applications that require multiple queries or iterative data use.
Formally, a randomized algorithm \(M\) satisfies \(\rho\)-zCDP if,
for all neighboring datasets \(D_1\) and \(D_2\) (differing in at most one
element) and all \(\alpha \in (1, \infty)\), the following holds:
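\[
D_{\alpha}\!\left(M(D_1) \,\|\, M(D_2)\right) \;\le\; \rho \, \alpha
\]
where \(D_{\alpha}(\cdot \,\|\, \cdot)\) denotes the Rényi divergence of order \(\alpha\). Under composition, the \(\rho\) parameters of successive zCDP mechanisms simply add, which is what makes accounting for repeated analyses convenient.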
The Local Model
The local model in differential privacy, as defined in the ISO/IEC
, is a threat model that provides strong
privacy guarantees before data is collected by a central entity. In this
model, each user adds noise to their own data locally (for example, on
their own phone or laptop) before it is sent to a processing server. This
ensures their privacy is protected even if the data is intercepted in
transit or if they do not trust the central curator.
Since the noise is added very early in the pipeline, local differential
privacy trades off usability and accuracy for stronger individual privacy
guarantees. This means that while each user's data is protected even
before it reaches the central server, the aggregated results might be less
accurate compared to global differential privacy where noise is added
after data aggregation.
In local differential privacy, each data subject applies randomization
as a form of disclosure control locally before sharing their outputs with the
central aggregator.
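To illustrate the accuracy trade-off described above, the short simulation below (variable and function choices are ours, and the truth probability follows the calibration from the randomized response section) compares the error of a count estimated from locally randomized responses with the error of a centrally applied Laplace mechanism at the same \(\epsilon\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, epsilon = 100_000, 1.0
data = rng.random(n) < 0.3                   # true "yes" proportion is 0.3
true_count = data.sum()

# Local model: each user randomizes their own answer before sharing it.
p = np.exp(epsilon / 2) / (1 + np.exp(epsilon / 2))
reports = np.where(rng.random(n) < p, data, ~data)
local_estimate = (reports.sum() - n * (1 - p)) / (2 * p - 1)

# Central model: a trusted curator adds Laplace noise to the exact count.
central_estimate = true_count + rng.laplace(scale=1 / epsilon)

print(abs(local_estimate - true_count), abs(central_estimate - true_count))
```

With a large population, the locally randomized estimate typically deviates from the true count by orders of magnitude more than the centrally noised one.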
The Central Model
In contrast to the previous section, the central model refers to a setting where privacy mechanisms are applied centrally, after data collection. In this model, individuals provide raw data and place their trust in a curator, who is responsible for applying privacy protections during downstream processing. This approach is often referred to as the global model or server model, as defined in the ISO/IEC
.
In global differential privacy, each data subject shares their private information with a trusted aggregator, who applies randomization as a disclosure control before the data is shared more broadly.
Trusted vs Adversarial Curator
When defining a threat model, a central consideration is how much trust we place in the curator. A trusted curator is assumed to correctly implement differential privacy (DP),
whereas an adversarial curator is assumed to actively attempt to breach privacy.
These notions are closely tied to the locality of the DP model, previously defined as local and global differential privacy. In the local DP model, no trust is placed in the central curator, making it compatible with an adversarial curator—since privacy protections are applied by the user directly. In contrast, the global DP model relies on the curator to enforce privacy guarantees, and therefore assumes the curator is fully trusted.
Static vs Interactive Releases
These two concepts refer to the frequency of publishing differentially private (DP) statistics. A static release involves publishing a single, fixed output with no further interaction. In contrast, interactive releases allow for repeated access, for example, by supporting multiple queries on the dataset over time. While static releases are simpler to manage, interactive releases can offer greater utility but require more robust privacy accounting for each query due to the effects of composition.
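As a minimal sketch of what interactive accounting can look like under basic sequential composition (real deployments typically use tighter accountants based on advanced composition, zCDP, or Rényi DP; the class name is ours for illustration):

```python
class BasicAccountant:
    """Track cumulative privacy loss of pure-DP queries by summing epsilons."""

    def __init__(self, epsilon_budget: float):
        self.budget = epsilon_budget
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Record a query's epsilon, refusing it if the budget would be exceeded."""
        if self.spent + epsilon > self.budget:
            raise RuntimeError("Privacy budget exhausted")
        self.spent += epsilon

accountant = BasicAccountant(epsilon_budget=1.0)
accountant.charge(0.25)   # first interactive query
accountant.charge(0.25)   # second query; half of the budget now remains
```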
Event, Group and Entity Privacy
An important aspect of differential privacy is clearly defining what we aim to protect. Typically, our goal is to protect the atomic data subjects of a dataset, such as individuals, businesses, or other entities. However, depending on the structure of the dataset, rows may represent different types of information, and individual subjects may influence multiple records.
Event-level privacy, as described in ,
applies when we aim to protect each individual row in the dataset. A row may represent either a full data subject or a single event, for example, a credit card transaction.
Group privacy arises in settings where multiple data subjects are linked in such a way that we wish to protect the contribution of the entire group. A common example is a household in a census dataset.
Finally, there is entity-level privacy. Similar to group-level
privacy, this is when multiple records can be linked to a single entity.
An example of this would be credit card transactions. One data subject may
have zero or multiple transactions associated with them, thus in order to
protect the privacy of the entity we need to limit the effect of all
records associated with each entity.
From a technical perspective, the mechanics of the tooling for handling
groups and entities are the same, so the two terms are often used
interchangeably.
Multiple Parties and Collusions
Involving multiple parties in DP releases requires careful accounting of the privacy budget. Just as we consider an adversarial curator in the threat model, we must also account for the possibility that a group of analysts may collude to undermine the intended privacy guarantees.
In practice, collusion refers to a scenario where multiple analysts—each allocated a portion of the overall privacy budget—collaborate to combine their queries. Through composition, this can amplify the information they extract from the dataset.
Periodical Releases
This concept, closely related to the area of continual observation, involves generating multiple differentially private releases for a dataset that evolves over time.
Achieving this can be challenging
as each release must be carefully accounted for in the privacy budget.
Organizations that support DP analysis of continually updated datasets, such as some listed in our table, must carefully manage privacy budgets across both users and time periods to ensure sustained protection.