EDPS IPEN workshop on Synthetic Data (on 16 June 2021)

Link to workshop with slides and videos:

A few gems:

Unsorted links from chat

Various statements from chat

  • Of course, there are things to consider; for example, in 2019 an array of papers argued that differential privacy can reinforce biases: “But it turns out that in reality the matter is actually much more complicated, as pointed out by latest research highlighting an inherent relationship between privacy and fairness. In fact, it becomes apparent that guaranteeing fairness under differentially private AI model training is impossible when one wants to maintain high accuracy. Such incompatibility of data privacy and fairness would have significant consequences. With respect to the potential of unfairness of some of the standard deep learning models, when it comes to fairness, the current differentially private learning methods fare even worse, reinforcing the biases and being even less fair to a great degree. Results like that should not exactly come as a surprise to implementers and deployers of the technology. Hiding data of small groups is actually among the features of differential privacy. In other words, it is not a bug but a feature of differential privacy. However, this feature leading to decrease of precision might not be something desirable in all use cases.” (Source, with link to papers: https://edps.europa.eu/press-publications/press-news/blog/inviting-new-perspectives-data-protection_en)
  • But fairness AND privacy most certainly are possible. And when we look at synthetic data in particular, there were some promising talks about fair synthetic data generation at this year’s ICLR conference (e.g. by Amazon) https://www.jmir.org/2020/11/e23139
  • As might be expected, the speakers’ company gives an elaboration and opinion on synthetic data’s anonymity: https://www.replica-analytics.com/web/default/files/public/tutorials/privacy-law-and-synthetic-data/presentation_html5.html
  • I am a bit surprised by the references. This talk seems to ignore a vast literature on synthetic data (including advanced analyses of privacy risks), e.g., https://arxiv.org/pdf/2011.07018.pdf
  • It is really important that you can quantitatively measure the re-identification risk at the output level. Otherwise, there is going to be a lack of confidence that identifiability issues have been truly addressed. It could be open to abuse if not done properly (i.e. claims that it is not personal data when in fact it is personal data). Done well, it has great value for certain use cases.
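    One crude, illustrative proxy for output-level re-identification risk is the share of records that are unique on a set of quasi-identifiers. Real assessments (such as those discussed in the workshop) use far richer attack models; the field names below are invented for the sketch:

    ```python
    from collections import Counter

    def uniqueness_risk(records, quasi_identifiers):
        """Fraction of records that are unique on the chosen quasi-identifiers.

        A record that shares its quasi-identifier combination with no other
        record is a prime re-identification candidate; the returned fraction
        is therefore a rough, pessimistic risk indicator.
        """
        keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
        counts = Counter(keys)
        unique = sum(1 for k in keys if counts[k] == 1)
        return unique / len(records)

    # Hypothetical toy data: two records share (age, zip), one is unique.
    sample = [
        {"age": 30, "zip": "123"},
        {"age": 30, "zip": "123"},
        {"age": 40, "zip": "999"},
    ]
    risk = uniqueness_risk(sample, ["age", "zip"])
    ```

    On this toy data the risk is 1/3, since one of three records is unique on (age, zip).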
  • Totally agree that when epsilon is large, the formal DP guarantee is basically meaningless. That said, two comments: a) you always have to use enough noise so that epsilon stays small, and b) sometimes the actual privacy protection DP provides in practical attack scenarios is stronger than the formal guarantee suggests, see https://arxiv.org/pdf/2101.04535.pdf
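    The epsilon/noise trade-off behind that comment can be sketched with the Laplace mechanism for a simple counting query. This is a toy illustration, not any speaker's actual mechanism, and the helper names are made up:

    ```python
    import math
    import random

    def laplace_noise(scale: float, rng: random.Random) -> float:
        # Inverse-CDF sampling of Laplace(0, scale):
        # u is uniform on [-0.5, 0.5); the sign of u picks the tail.
        u = rng.random() - 0.5
        return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

    def dp_count(true_count: float, epsilon: float, rng: random.Random) -> float:
        # Laplace mechanism for a counting query (sensitivity 1):
        # the noise scale is 1/epsilon, so a large epsilon adds almost
        # no noise and the formal guarantee becomes nearly vacuous.
        return true_count + laplace_noise(1.0 / epsilon, rng)

    rng = random.Random(42)
    tight = [dp_count(100.0, 10.0, rng) for _ in range(2000)]  # weak privacy
    loose = [dp_count(100.0, 0.1, rng) for _ in range(2000)]   # strong privacy
    ```

    With epsilon = 10 the released counts stay within a fraction of the true value (weak protection, high utility); with epsilon = 0.1 they scatter by tens, which is exactly why only a small epsilon carries a meaningful guarantee.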
  • “Does de-identification require consent under the GDPR and English common law?” by Khaled El Emam, Mike Hintze and Ruth Boardman
    IAPP summary: https://iapp.org/news/a/does-anonymization-or-de-identification-require-consent-under-the-gdpr/
    Full article: Journal of Data Protection & Privacy, Volume 3, Number 3, Summer 2020, pp. 291–298, Henry Stewart Publications, https://www.ingentaconnect.com/content/hsp/jdpp/2020/00000003/00000003/art00007

  • Company: www.intuite.ai
  • Company: Statice
  • Gesundheitsdatenschutz.org – GMDS working group „Datenschutz und IT-Sicherheit im Gesundheitswesen“ (Data Protection and IT Security in Healthcare, DIG)

    Really good resource with many useful and free working aids, including:

  • DPIA example (hospital information system)
    https://gesundheitsdatenschutz.org/html/dsfa-beispiel.php – includes a full DPIA plus a risk matrix (though the matrix fails to mention patient impact)

  • On Anonymization and Pseudonymization

  • On dealing with Schrems II

  • Remote maintenance of medical IT systems (Anforderungen an die (Fern-)Wartung medizinischer IT-Systeme)

  • Exchange of health data (Austausch von Gesundheitsdaten – Datenschutzrechtliche Anforderungen an Datenaustauschplattformen im Gesundheitswesen)

  • Guide to a Data Protection Concept (Leitfaden zur Erstellung eines Datenschutzkonzeptes)

  • Guide to a Security Concept (Leitfaden zur Erstellung eines IT-Sicherheitskonzeptes)

  • Guide to a Data Deletion/Data Retention Concept (Leitfaden für die Erstellung von Löschkonzepten im Gesundheitswesen)

  • Guide to monitoring and logging, audit trails (Praxishilfe zur Protokollierung und zur Erstellung von Protokollierungskonzepten im Gesundheitswesen)

  • Medical research (Medizinische Forschung unter der DS-GVO)

  • Clinical registries (Klinische Register und Datenschutz)

  • Clinical studies (Datenschutz bei Klinischen Studien)

  • Checklists and templates

  • Working aids (Praxishilfen)
    A massive list of links, e.g. for data protection impact assessments!
  • EU Commission published new Standard Data Protection Clauses for international data transfers

    New set of Standard Contractual Clauses (SCCs) for international data transfers, allowing businesses to transfer personal data to non-EU countries.

    Standard contractual clauses for controllers and processors (incl. SCCs in Annex)

  • Standard contractual clauses for international transfers

  • The 2020 GDPR evaluation report

    “These texts are final working documents. The only official text will be the one that will be published in the Official Journal in the coming days.”



  • Germany: Coordinated assessment of international data transfers (Schrems II)

    Several German Supervisory Authorities are setting out on a coordinated assessment of international data transfers – starting with questionnaires being sent to some companies.

    Here are the questionnaires (in German), in detail:
    * on email – Zum Einsatz von Dienstleistern zum E-Mail-Versand (PDF)
    * on web hosting – Zum Einsatz von Dienstleistern zum Hosting von Internet-Seiten (PDF)
    * on web tracking – Zum Einsatz von Webtracking (PDF)
    * on processing of job applicant data – Zum Einsatz von Dienstleistern zur Verwaltung von Bewerberdaten (PDF)
    * on enterprise internal exchange of customer/employee data – Zum konzerninternen Austausch von Kundendaten und Daten der Beschäftigten (PDF)