This resource aims to support practitioners deploying differential privacy. It endeavors to leave the reader with intuition, responsible guidance, and case studies of the privacy budgets used by organizations today.

Overview

Over the last five years, the use of differential privacy as an output disclosure control for sensitive data releases and queries has grown substantially. This is in part due to the elegant and theoretically robust underpinnings of the differential privacy literature, in part due to the prevalence of attacks on traditional disclosure control techniques, and in part due to the adoption of differential privacy by organizations perceived to set the "gold standard", such as the US Census Bureau, which acts as a form of social proof and gives greater confidence to other early adopters.

As a reference, one way to classify the maturity and readiness of a technology in industry is its technology readiness level (TRL). Systems built with differential privacy guarantees can be found between TRL 6 and 9. In other words, some industry applications of differential privacy have only been demonstrated in relevant environments, while others are actual systems proven in operational environments. As such, finding common ground on privacy deployments is an urgent problem for the DP industry.

The purpose of this document is to support the responsible adoption of differential privacy in industry. Differential privacy, as will be introduced in an upcoming section, is simply a measure of information loss about data subjects or entities. However, there are few guidelines or recommendations for choosing thresholds that strike a reasonable balance between privacy and query accuracy. Furthermore, in many scenarios these thresholds are context-specific, so any organization endeavoring to adopt differential privacy in practice will find their selection to be extremely important.

In this document, we describe some dimensions along which applications of differential privacy can be characterized, and we label many real-world case studies based on the setting in which they are deployed and the privacy budgets chosen. While this is not intended to act as an endorsement of any application, we hope the document will act as a baseline from which informed debate, precedent and, eventually, best practices can emerge.

Core to this document is a registry of case studies presented at the end. Much of the work of identifying these initial case studies is due to great prior work from personal blogs, government and NGO guides. Despite this pre-existing work, the motivation of this document lies in expanding the number and classification of these case studies in an open-source fashion, such that the community as a whole can contribute and shape a shared understanding.

If, on the other hand, the reader is more interested in an introduction to differential privacy, there are excellent resources available such as books and papers, online lecture notes and websites. While this document introduces some of the nomenclature of differential privacy, it is not intended to be a standalone resource and will refer to common techniques and mechanisms only with references where the reader can learn more.

Finally, and importantly, this document is not intended to be static. One core purpose behind the document is to periodically add new case studies, keeping up with the ever-evolving practices of industry and government applications and aligning with guidance from regulators, which is expected to become more prevalent in the coming years. If you would like to join the authors of this document and support the registry, please head over to the Contribute page.

Official Guidance and Standardization

Before diving into the main document, it is important to note that two prominent standardization bodies, NIST and ISO/IEC, have been active in providing guidance and standardization in the space of data anonymization, and in particular differential privacy.

ISO/IEC 20889:2018: This standard by the ISO/IEC focuses broadly on de-identification techniques, including synthetic data and randomization techniques. Although the standard is normative in part, differential privacy is introduced as a formal privacy measure in the style of an informative standard. Only \(\epsilon\)-differential privacy is considered, with the Laplace, Gaussian and Exponential mechanisms and the concept of cumulative privacy loss. Interestingly, despite Gaussian noise typically being associated with \((\epsilon, \delta)\)-differential privacy and zero-concentrated differential privacy, as will be introduced in later sections, these more nuanced privacy models are not defined.

NIST SP 800-226 ipd: This guidance paper extends far beyond ISO/IEC 20889:2018, considering multiple privacy models, the conversion between privacy models, basic mechanisms, threat models in terms of the local and central models, and more. It is an excellent resource for understanding the nomenclature, security model and goals of applying differential privacy in practice. Throughout this document we endeavor to align our terminology with the NIST guidance paper, leaving formal definitions to the original source.

While the aforementioned resources are useful, neither explicitly provides guidelines on how to choose a reasonable parameterization of differential privacy models in terms of privacy budgets, nor do they point to public benchmarks to help the community arrive at industry norms over the medium to long term. In the case of ISO/IEC 20889:2018, the definitions are also limited to the most standard cases, which are often an oversimplification for real-world applications. Over the course of this document, and where applicable, we will link to the terminology of the standard to provide a level of consistency for the reader.

Introduction to Differential Privacy

Randomized Response Surveys

Before the age of big data and data science, traditional data collection faced a challenge known as evasive answer bias: people not answering survey questions honestly for fear that their answers may be used against them. Randomized response emerged in the mid-twentieth century to address this.

Randomized response is a technique to protect the privacy of individuals in surveys. It involves adding noise locally, for example by flipping a coin multiple times and assigning the individual's recorded response based on the coin-flip sequence. In doing so, the responses are accurate in expectation, but any given response is uncertain. This uncertainty over the response of an individual was one of the first applications of differential privacy, although it was not called that at the time, and the quantification of privacy was simply the weighting of probabilities determined by the mechanism.

An example of using a conditional coin-flip to achieve plausible deniability with a calibrated bias.

This approach of randomizing the output of the answer to a question through a mechanism (a stochastic intervention such as coin flipping) is still the very backbone of differential privacy today.
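As a concrete illustration, the following minimal sketch simulates one common variant of the protocol (an assumption for illustration: answer truthfully on a first heads, otherwise let a second flip decide the answer) and shows that individual responses are deniable while the population-level estimate can still be recovered.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(true_answer: bool) -> bool:
    """Two-coin randomized response: answer truthfully on heads,
    otherwise let a second flip decide the reported answer."""
    if rng.random() < 0.5:        # first coin: tell the truth
        return bool(true_answer)
    return rng.random() < 0.5     # second coin: random answer

# Survey where 30% of respondents truly hold the sensitive attribute.
n = 100_000
truth = rng.random(n) < 0.30
reported = np.array([randomized_response(t) for t in truth])

# Debias the aggregate: P(report yes) = 0.5 * p_true + 0.25.
estimate = 2 * (reported.mean() - 0.25)
print(f"reported rate: {reported.mean():.3f}, debiased estimate: {estimate:.3f}")
```

Any single reported answer could have come from the second coin, yet the debiased aggregate converges to the true proportion as the number of respondents grows.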

ε-Differential Privacy

Pure epsilon-differential privacy (\(\epsilon\)-DP) is a mathematical guarantee that allows sharing aggregated statistics about a dataset while protecting the privacy of individuals by adding random noise. In simpler words, it ensures that the outcome of any analysis is nearly the same regardless of whether any individual's data is included in or removed from the dataset.

Formally, the privacy guarantee is quantified using the privacy parameter \(\epsilon\) (epsilon). A randomized algorithm \(M\) is \(\epsilon\)-differentially private if for all neighboring datasets \(D_1\) and \(D_2\) (differing in at most one element), and for all subsets of outputs \(S \subseteq \text{Range}(M)\):


\[ \Pr[M(D_1) \in S] \leq e^{\epsilon} \cdot \Pr[M(D_2) \in S] \]

The mechanism \(M\) adds an amount of noise calibrated by \(\epsilon\), generating outputs with a certain error relative to the true value, which can be explored with the following interactive widget.

Randomized Response was ε-Differential Privacy

Despite randomized response surveys predating the formal definition of differential privacy by over 40 years, the technique translates directly to the binary mechanism in modern differential privacy.

Suppose you wish to set up the spinner originally proposed in to achieve \(\epsilon\)-differential privacy; you can do so by asking participants to tell the truth with probability \(\frac{e^{\frac{\epsilon}{2}}}{1 + e^{\frac{\epsilon}{2}}}\). This is called the binary mechanism in the literature.

This mechanism is incredibly useful for building intuition among a non-technical audience. The most direct question we can pose about a data subject in a dataset is simply "Is Alice in this dataset?". Answering the question with different levels of privacy \(\epsilon\) yields different probabilities of telling the truth, which we display as follows.

An interactive table showing, for each value of \(\epsilon\), the corresponding probability of truth and odds of truth.

While the above odds are just an illustrative example, they bring home what epsilon actually means in terms of the more intuitive original randomized response. As a reference, theorists often advocate for \(\epsilon \approx 1\) for differential privacy guarantees to provide a meaningful privacy assurance.
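For concreteness, the short sketch below tabulates the truth probability and odds implied by the formula above for a few values of \(\epsilon\); the specific \(\epsilon\) values chosen here are illustrative and may differ from the rows shown in the interactive table.

```python
import math

def truth_probability(epsilon: float) -> float:
    """Probability of answering truthfully under the binary mechanism above."""
    return math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))

for eps in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = truth_probability(eps)
    odds = p / (1 - p)  # odds of telling the truth versus lying
    print(f"epsilon = {eps:>3}: P(truth) = {p:.3f}, odds = {odds:.2f} : 1")
```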

Intuition of the Laplace Mechanism

One of the most ubiquitous mechanisms in ε-differential privacy is the Laplace mechanism. It is used when adding bounded values together, such as counts or summations of private values, provided the extreme values (usually referred to as bounds) of the private values are known and hence the maximum contribution of any data subject is bounded.

Essentially, the sum is calculated, a draw from the Laplace distribution (with scale calibrated to the sensitivity divided by \(\epsilon\)) is made, and the resulting random variable is added to the original result. Assuming you are counting, such that every value lies in \([0, 1]\), the widget below shows how the distribution of noise and the expected error change with varying \(\epsilon\).

Note that the error is additive, so we can make claims about the absolute error, but not the relative error, of the final stochastic result.
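A minimal sketch of a differentially private count using the Laplace mechanism is shown below; it assumes each data subject contributes at most one record, so the sensitivity of the count is 1 and the noise scale is \(1/\epsilon\).

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(indicators: np.ndarray, epsilon: float) -> float:
    """Laplace mechanism for a counting query.

    Each data subject contributes at most one record, so the sensitivity
    of the count is 1 and the Laplace noise scale is 1 / epsilon.
    """
    true_count = float(indicators.sum())
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

data = rng.integers(0, 2, size=10_000)  # one 0/1 indicator per data subject
print("true count:", int(data.sum()))
print("dp count (epsilon = 1.0):", round(dp_count(data, epsilon=1.0), 1))
```

The expected absolute error of such a release is \(1/\epsilon\) irrespective of the dataset size, which is exactly why only absolute (and not relative) error claims can be made.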

(ε, δ)-Differential Privacy

(ε, δ)-differential privacy is a mathematical guarantee that extends pure epsilon-differential privacy by allowing a small probability of failure, governed by a second privacy parameter \(\delta\). Just like pure DP described in the previous section, it ensures that the outcome of any analysis is nearly the same regardless of whether any individual's data is present, but it further allows a (cryptographically) small chance that the pure guarantee does not hold.

Formally, the privacy guarantee is now quantified using both \(\epsilon\) (epsilon) and \(\delta\) (delta). A randomized algorithm \(M\) is \((\epsilon, \delta)\)-differentially private if for all neighboring datasets \(D_1\) and \(D_2\) (differing in at most one element), and for all subsets of outputs \(S \subseteq \text{Range}(M)\):


\[ \Pr[M(D_1) \in S] \leq e^{\epsilon} \cdot \Pr[M(D_2) \in S] + \delta \]

The following widget describes the expected error for noise added under \((\epsilon, \delta)\)-DP.

Intuition of (ε, δ)-Differential Privacy
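As one concrete point of reference alongside the widget, a minimal sketch of the Gaussian mechanism is given below. It assumes the classical textbook calibration \(\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \epsilon\), which is valid for \(\epsilon \leq 1\); tighter analytic calibrations exist and real deployments may use them instead.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mechanism(true_value: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    """Classical Gaussian mechanism for (epsilon, delta)-DP.

    Textbook calibration: sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    valid for epsilon <= 1. Tighter analytic calibrations exist.
    """
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return true_value + rng.normal(loc=0.0, scale=sigma)

# A count (sensitivity 1) released with epsilon = 0.5 and delta = 1e-6.
print(gaussian_mechanism(true_value=1234.0, sensitivity=1.0, epsilon=0.5, delta=1e-6))
```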

Zero-Concentrated Differential Privacy

Zero-concentrated differential privacy (also known as zCDP) introduces a parameter \(\rho\) (rho) that measures how the privacy loss concentrates around its expected value, allowing for more accurate control of privacy degradation across repeated analyses. As such, zCDP is beneficial in applications requiring multiple queries or iterative data use, some of which we describe in later sections on interactive and periodical releases.


Formally, a randomized algorithm \(M\) satisfies \(\rho\)-zCDP if, for all neighboring datasets \(D_1\) and \(D_2\) (differing in at most one element) and for all \(\alpha \in (1, \infty)\), the following holds, where \(D_{\alpha}\) denotes the Rényi divergence of order \(\alpha\):

\[ D_{\alpha}(M(D_1) \parallel M(D_2)) \leq \rho \alpha \]
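To see why zCDP is convenient for repeated analyses, the sketch below uses two standard results from the zCDP literature: a Gaussian release with sensitivity \(\Delta\) and noise scale \(\sigma\) satisfies \(\rho\)-zCDP with \(\rho = \Delta^2 / (2\sigma^2)\), and \(\rho\) parameters simply add under sequential composition. The specific numbers are only illustrative.

```python
def gaussian_zcdp_rho(sensitivity: float, sigma: float) -> float:
    """A Gaussian release with this sensitivity and noise scale satisfies
    rho-zCDP with rho = sensitivity^2 / (2 * sigma^2)."""
    return sensitivity ** 2 / (2 * sigma ** 2)

def compose_zcdp(rhos):
    """Under zCDP, the rho parameters of sequential releases simply add."""
    return sum(rhos)

# Ten Gaussian releases of a sensitivity-1 statistic with sigma = 5.
per_release = gaussian_zcdp_rho(sensitivity=1.0, sigma=5.0)
print("rho per release:", per_release)                                   # 0.02
print("total rho after 10 releases:", compose_zcdp([per_release] * 10))  # 0.2
```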

Defining the Trust Model

The Local Model

The local model in differential privacy, as defined in the ISO/IEC standard, is a threat model that provides strong privacy guarantees before data is collected by a central entity. In this model, each user adds noise to their own data locally (for example, on their own phone or laptop) before it is sent to a processing server. This ensures their privacy is protected even if the data is intercepted in transit, or in case they do not place trust in the central curator.

Since the noise is added very early in the pipeline, local differential privacy trades off usability and accuracy for stronger individual privacy guarantees. This means that while each user's data is protected even before it reaches the central server, the aggregated results may be less accurate than under global differential privacy, where noise is added after data aggregation.

In local differential privacy, each data subject applies randomization as a disclosure control locally before sharing their outputs with the central aggregator.
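To make the trade-off concrete, here is a minimal sketch of a local-model deployment, reusing the binary-mechanism truth probability \(e^{\epsilon/2}/(1 + e^{\epsilon/2})\) from earlier: each data subject perturbs their own bit before sharing it, and the aggregator debiases the noisy reports. The attribute rate and \(\epsilon\) used are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

EPSILON = 1.0
P_TRUTH = math.exp(EPSILON / 2) / (1 + math.exp(EPSILON / 2))

def local_randomize(true_bit: bool) -> bool:
    """Each data subject perturbs their own bit before it leaves their device."""
    return bool(true_bit) if rng.random() < P_TRUTH else not true_bit

def debias(reports: np.ndarray) -> float:
    """Aggregator's unbiased estimate of the true proportion of 1s.
    E[reported rate] = P_TRUTH * p + (1 - P_TRUTH) * (1 - p)."""
    return (reports.mean() - (1 - P_TRUTH)) / (2 * P_TRUTH - 1)

truth = rng.random(50_000) < 0.30           # 30% truly hold the attribute
reports = np.array([local_randomize(t) for t in truth])
print(f"reported rate: {reports.mean():.3f}, debiased estimate: {debias(reports):.3f}")
```

The estimate is unbiased but noticeably noisier than a central-model release at the same \(\epsilon\), which is exactly the accuracy cost described above.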

The Central Model

In contrast to the previous section, the central model refers to the setting where the privacy mechanisms are applied centrally, after data collection. In this model, individuals provide raw data and place their trust in the curator, who is expected to add privacy protections in downstream tasks. This is often referred to as the global model or the server model, as defined in the ISO/IEC standard.

In global differential privacy, each data subject shares their private information with the trusted aggregator. Randomization is applied as a disclosure control prior to broader dissemination.

Trusted vs Adversarial Curator

When we define a threat model, we mainly focus on how much trust we place in the curator. A trusted curator is assumed to apply DP correctly, while an adversarial curator may (and we assume it always does) attempt to breach privacy.

As such, these concepts are strongly related to the locality of our DP model, previously defined as local and global DP. In the local DP protocol, we place no trust in the central curator, so we can perfectly well accept a model in which the curator is adversarial, since the privacy guarantees are put in place by each user locally. For global DP, on the other hand, we expect the curator to put these privacy guarantees in place, and as such we place all of our trust in them.

Static vs Interactive Releases

These two concepts refer to how often we publish DP statistics. A static release involves publishing a single release with no further interactions, whilst interactive releases repeat the process, for example by allowing multiple queries on the dataset. Static releases are simpler; interactive releases can offer additional utility but require more careful accounting of the privacy loss of each query due to composition.
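For interactive releases, a common pattern is a simple privacy budget accountant. The sketch below assumes sequential composition of pure \(\epsilon\)-DP queries (where the \(\epsilon\) values of answered queries add up) and is not tied to any particular library; real systems typically use tighter accounting, for example via zCDP.

```python
class BudgetAccountant:
    """Tracks cumulative privacy loss for an interactive release,
    assuming sequential composition of pure epsilon-DP queries."""

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, query_epsilon: float) -> None:
        """Reserve budget for one query; refuse it if the budget would be exceeded."""
        if self.spent + query_epsilon > self.total_epsilon:
            raise RuntimeError("Privacy budget exhausted; query refused.")
        self.spent += query_epsilon

accountant = BudgetAccountant(total_epsilon=1.0)
accountant.charge(0.25)   # first query
accountant.charge(0.25)   # second query
print("remaining budget:", accountant.total_epsilon - accountant.spent)
```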

Event, Group and Entity Privacy

An important aspect of differential privacy is defining what it is we are endeavoring to protect. Ultimately, we are usually trying to protect the atomic data subjects of a dataset: people, businesses, entities. However, depending on the dataset itself, rows of the data table may refer to different things, and individual subjects may have a causal effect on more than one record.

Event-level privacy, as described in , refers to a setting where we protect the rows of a dataset. Each row might pertain to a single data subject in its entirety, or to a single event such as a credit card transaction.

Group privacy refers to settings where we have multiple data subjects who are linked in some manner, such that we care about hiding the contribution of the group. An example of this might be a household in the setting of a census. Finally, there is entity-level privacy. Similar to group-level privacy, this is when multiple records can be linked to a single entity; an example would be credit card transactions. One data subject may have zero or many transactions associated with them, so in order to protect the privacy of the entity we need to limit the effect of all records associated with each entity.
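A useful rule of thumb connects individual-level and group-level protection under pure DP: if a mechanism \(M\) is \(\epsilon\)-differentially private with respect to individual records, then for two datasets \(D_1\) and \(D_2\) differing in the records of a group of \(k\) data subjects,

\[ \Pr[M(D_1) \in S] \leq e^{k\epsilon} \cdot \Pr[M(D_2) \in S] \]

so protecting a group or entity of size \(k\) at a target level \(\epsilon\) effectively requires running the mechanism at level \(\epsilon / k\), or equivalently bounding each entity's total contribution.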

From a technical perspective, the mechanics of the tooling to deal with groups and entities are the same so their terminology is often used interchangeably.

Multiple Parties and Collusions

Involving multiple parties in DP releases requires additional accounting of the privacy budget. Similarly to how we described an adversarial curator, we now focus on a group of analysts who could adversarially collude against the DP release.

As a more practical example, collusion typically refers to an environment where each of several analysts is allowed a set privacy budget, but they collaborate with each other to leverage composition and derive information about the dataset that is only protected by a larger (worse) epsilon, thus breaking the intended privacy budget allocated to each of them.

Periodical Releases

This concept, often also related to the "continual observation" area of study, involves producing multiple differentially private releases of a dataset that changes periodically. Achieving this can be challenging, as each release must be carefully accounted for in the privacy budget; organizations that allow DP analysis of continually updated datasets, such as some of those present in our table, are mindful to set budgets at both the user and time level.

Differential Privacy Deployment Registry

Public Registry

The following table presents multiple systems with publicly advertised differential privacy parameters. We also generated estimates of their equivalent parameters in other DP variants, along with collecting their respective sources. You can click on each entry to display a modal with more detailed information.

Note: Values marked with (*) were generated by us using this formula to convert from pure DP to zCDP and this formula to convert from zCDP to approximate DP, with ε = 1.
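For readers who want to reproduce similar estimates, the sketch below implements the conversions most commonly cited in the zCDP literature (\(\epsilon\)-DP implies \((\epsilon^2/2)\)-zCDP, and \(\rho\)-zCDP implies \((\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta)\)-DP); these are standard results and may differ from the exact formulas linked above.

```python
import math

def pure_dp_to_zcdp(epsilon: float) -> float:
    """epsilon-DP implies (epsilon^2 / 2)-zCDP."""
    return epsilon ** 2 / 2

def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """rho-zCDP implies (rho + 2 * sqrt(rho * ln(1 / delta)), delta)-DP."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

rho = pure_dp_to_zcdp(1.0)                       # epsilon = 1
eps_approx = zcdp_to_approx_dp(rho, delta=1e-6)
print(f"epsilon = 1 pure DP -> rho = {rho:.2f} zCDP -> ({eps_approx:.2f}, 1e-6)-DP")
```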

Private Use Cases

The table below lists private applications of differential privacy where parameters are not publicly disclosed. Click on each entry to view a modal with more detailed information.