The article discusses various approaches to statistical disclosure control based on random noise that are currently being discussed for official population statistics and censuses. A particular focus is on a stringent delineation between different concepts influencing the discussion: we separate clearly between risk measures, noise distributions, and output mechanisms, putting these concepts into scope and into relation with each other. The article also remarks on utility and risk aspects of some specific output mechanisms and parameter setups, with special attention to static outputs that are rather typical in official population statistics. In particular, it is argued that unbounded noise distributions, such as plain Laplace, may jeopardize key unique census features without a clear need from a risk perspective. On the other hand, bounded noise distributions, such as the truncated Laplace or the cell key method, can contribute effectively to safeguarding these unique census features while controlling disclosure risks in census-like outputs. Finally, the article analyses some typical attack scenarios to constrain generic noise parameter ranges that suggest a good risk/utility compromise for the 2021 EU census output scenario. The analysis also shows that strictly differentially private mechanisms would be severely constrained in this scenario.

First, the article carefully delimits various statistical confidentiality concepts that often get mixed up, such as risk measure, noise distribution, and output mechanism. Then some noise distributions and output mechanisms currently discussed for population statistics and censuses are analyzed from typical risk and utility perspectives. We also present methods to infer quantitative limits on generic noise parameters, like noise variance and magnitude bound. The article finally notes that strictly differentially private approaches are severely over-constrained by such a combined risk/utility analysis in typical census scenarios like EU 2021, that is, there is no straightforward parameter choice to simultaneously ensure acceptable risk and utility properties of the statistical output.

BACKGROUND

Since Dinur and Nissim (2003) published their seminal database reconstruction theorem almost two decades ago, it has shaped and accelerated research activities across many domains involved with data protection, data privacy, and confidentiality, including disclosure control in official statistics. In its wake, “differential privacy” was proposed in 2006 (Dwork, McSherry, et al. 2006), initially as a rigorous privacy or risk measure addressing consequences of the database reconstruction theorem. Differentially private noise mechanisms were then picked up and developed further to test and improve their use for (official) statistics; see, for example, Machanavajjhala et al. (2008), Hardt and Talwar (2010), Ghosh, Roughgarden, and Sundararajan (2012), Dwork and Roth (2014), Dwork and Rothblum (2016), and Rinott, O’Keefe, Shlomo, and Skinner (2018). Now a first strict line must be drawn between differential privacy as a risk measure, and differentially private (noisy) output mechanisms that are engineered to manifestly guarantee a given differential privacy level. However, many other noisy output mechanisms, using bounded or unbounded noise distributions, can be set up to give at least a relaxed differential privacy guarantee too (Dwork, Kenthapadi, et al.). For instance, the cell key method originally proposed by Fraser and Wooton (2005), Marley and Leaver (2011), and Thompson, Broadfoot, and Elazar (2013) can be turned into a (relaxed) differentially private mechanism (Bailie and Chien 2019). On the other hand, strictly differentially private output mechanisms require unbounded noise distributions with infinite tails, which may have particularly negative effects on utility.

This article aims to first address all these different notions separately, and then to present a consolidated discussion from both risk and utility perspectives. We further focus on population and census-like statistics with typical outputs being unweighted person counts possibly arranged in contingency tables. This serves two distinct motivations: On the one hand, treating only unweighted counts simplifies many technicalities without touching key issues of the noise discussion. On the other hand, global efforts on the 2020/2021 census round are peaking right now, with many important (and urgent) points of contact with this article.
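
The contrast between unbounded and bounded noise distributions discussed above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of any particular statistical office: the counts, the noise scale, and the magnitude bound are hypothetical values, the function names are invented for this example, and the truncated Laplace is implemented by simple rejection sampling. The sketch makes no claim about the differential privacy level that either variant achieves.

```python
import random


def laplace_noise(scale, rng):
    """Unbounded Laplace(0, scale) draw, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def truncated_laplace_noise(scale, bound, rng):
    """Laplace(0, scale) draw conditioned on |noise| <= bound (rejection sampling)."""
    while True:
        x = laplace_noise(scale, rng)
        if abs(x) <= bound:
            return x


def perturb(counts, draw):
    """Add independently drawn, integer-rounded noise to each count."""
    return [c + round(draw()) for c in counts]


rng = random.Random(2021)        # fixed seed, for reproducibility only
counts = [0, 1, 3, 12, 250]      # hypothetical unweighted person counts
scale, bound = 2.0, 5            # illustrative parameters, not recommendations

unbounded = perturb(counts, lambda: laplace_noise(scale, rng))
bounded = perturb(counts, lambda: truncated_laplace_noise(scale, bound, rng))

# Bounded noise guarantees a maximum deviation per cell; unbounded noise does not.
max_dev_bounded = max(abs(b - c) for b, c in zip(bounded, counts))
print(unbounded, bounded, max_dev_bounded)
```

With the bounded variant, no published count can deviate from the true count by more than `bound`, whereas the plain Laplace variant can, with small probability, produce arbitrarily large deviations in any cell.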