Article: In wake of the Schrems II, CNIL challenges use of Microsoft cloud storage to host public health data lakes (the Health Data Hub case – Part 1 and 2)

Good Hogan Lovells summary of the French Health Data Hub case.

https://www.engage.hoganlovells.com/knowledgeservices/news/in-wake-of-the-schrems-ii-cnil-challenges-the-use-of-microsoft-cloud-storage-to-host-public-health-data-lakes-the-health-data-hub-case-part-1_1

https://www.engage.hoganlovells.com/knowledgeservices/news/french-court-refuses-to-suspend-microsofts-hosting-of-a-public-health-data-lake-despite-cnil-opinion-the-health-data-hub-case-part-2

EDPB: Criteria for an acceptable DPIA

From Annex 2 of wp248 rev.01 Guidelines on Data Protection Impact Assessment (DPIA) and determining whether processing is “likely to result in a high risk” for the purposes of Regulation 2016/679 at https://ec.europa.eu/newsroom/article29/items/611236:

Annex 2 – Criteria for an acceptable DPIA
The WP29 proposes the following criteria which data controllers can use to assess whether or not a DPIA, or a methodology to carry out a DPIA, is sufficiently comprehensive to comply with the GDPR:

  • a systematic description of the processing is provided (Article 35(7)(a)):
    • nature, scope, context and purposes of the processing are taken into account (recital 90);
    • personal data, recipients and period for which the personal data will be stored are recorded;
    • a functional description of the processing operation is provided;
    • the assets on which personal data rely (hardware, software, networks, people, paper or paper transmission channels) are identified;
    • compliance with approved codes of conduct is taken into account (Article 35(8));
  • necessity and proportionality are assessed (Article 35(7)(b)):
    • measures envisaged to comply with the Regulation are determined (Article 35(7)(d) and recital 90), taking into account:
      • measures contributing to the proportionality and the necessity of the processing on the basis of:
        • specified, explicit and legitimate purpose(s) (Article 5(1)(b));
        • lawfulness of processing (Article 6);
        • adequate, relevant and limited to what is necessary data (Article 5(1)(c));
        • limited storage duration (Article 5(1)(e));
      • measures contributing to the rights of the data subjects:
        • information provided to the data subject (Articles 12, 13 and 14);
        • right of access and to data portability (Articles 15 and 20);
        • right to rectification and to erasure (Articles 16, 17 and 19);
        • right to object and to restriction of processing (Articles 18, 19 and 21);
        • relationships with processors (Article 28);
        • safeguards surrounding international transfer(s) (Chapter V);
        • prior consultation (Article 36).
  • risks to the rights and freedoms of data subjects are managed (Article 35(7)(c)):
    • origin, nature, particularity and severity of the risks are appreciated (cf. recital 84) or, more specifically, for each risk (illegitimate access, undesired modification, and disappearance of data) from the perspective of the data subjects:
      • risk sources are taken into account (recital 90);
      • potential impacts to the rights and freedoms of data subjects are identified in case of events including illegitimate access, undesired modification and disappearance of data;
      • threats that could lead to illegitimate access, undesired modification and disappearance of data are identified;
      • likelihood and severity are estimated (recital 90);
    • measures envisaged to treat those risks are determined (Article 35(7)(d) and recital 90);
  • interested parties are involved:
    • the advice of the DPO is sought (Article 35(2));
    • the views of data subjects or their representatives are sought, where appropriate (Article 35(9)).

Bavaria: Data Protection Checklists (incl. Guidance on TOMs)

The Bavarian DPA (BayLDA) has published a series of data protection checklists (in German), including guidance on technical and organisational measures (TOMs), at https://www.lda.bayern.de/de/checklisten.html.

Paper: Bitkom: Anonymisierung und Pseudonymisierung von Daten für Projekte des maschinellen Lernens

Anonymization and Pseudonymization of data used in Machine Learning Projects

https://www.bitkom.org/sites/default/files/2020-10/201002_lf_anonymisierung-und-pseudonymisierung-von-daten.pdf

Examples given:

  • Processing of geolocation profiles (movements)
  • Google’s COVID-19 Community Mobility Reports
  • De-coupled pseudonyms, e.g. for manufacturers remotely monitoring machine performance at customer sites (see the pseudonymization sketch after this list)
  • Speech recognition as example of federated learning
  • Anonymization and pseudonymization of medical text data using Natural Language Processing
  • Use of semantic anonymization of sensitive data with inference-based AI and active ontologies in the financial industry
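
A minimal Python sketch of the de-coupled pseudonym example, assuming a manufacturer that receives machine telemetry only under a keyed-hash pseudonym while the secret key stays with the customer or a trusted third party; all identifiers, field names and values are made up for illustration:

    import hashlib
    import hmac

    # Assumption: the key is held by the customer or a trusted third party, so the
    # manufacturer's analytics side cannot reverse the pseudonym on its own.
    SECRET_KEY = b"held-by-customer-or-trusted-third-party"

    def pseudonymize(machine_id: str) -> str:
        """Derive a stable pseudonym that is de-coupled from the real identifier."""
        return hmac.new(SECRET_KEY, machine_id.encode(), hashlib.sha256).hexdigest()

    # The remote-monitoring backend stores performance data only under the pseudonym.
    reading = {"machine_id": "press-042", "temperature_c": 71.3, "vibration_mm_s": 2.4}
    record = {**reading, "machine_id": pseudonymize(reading["machine_id"])}
    print(record)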

Key words:

    • Anonymization of structured data
      • Approaches
        • Aggregation approach
          • Generalization, Microaggregation
          • k-anonymity, l-diversity, t-closeness (see the k-anonymity sketch after this list)
          • Mondrian algorithm, MDAV method (Maximum Distance to Average Vector)
        • Randomization approach
          • Adding noise (see the Laplace-noise sketch after this list)
        • Synthetic approach
          • (Creating a synthetic model based on original data to generate “matching” synthetic data)
      • Attacks
        • Was personal data of a known person used to generate the anonymous data?
        • Which data in the anonymous data relates to personal data of a known person?
        • Predicting attributes of a known person
      • Static anonymization, Dynamic anonymization, Interactive anonymization
      • Pseudonymization
        • Format preserving encryption, Tokenization, Trusted third party, Pseudonymous Authentication (PAUTH), Oblivious transfer
      • Anonymization of texts
        • Ensure that free text includes no identifying terms (e.g. via organizational measures)
        • Masking of identifying terms as part of post-processing (see the masking sketch after this list)
        • Structuring via Natural Language Processing
        • Caveat: Author might be identifiable based on writing style
      • Anonymization of multimedia data
      • Privacy via on-prem analysis and decentralization (see also: federated learning)
        • Homomorphic encryption: fully homomorphic, partially homomorphic, somewhat homomorphic
        • Secure multi-party computation (see the secret-sharing sketch after this list)
        • Garbled circuits
      • Privacy risks related to machine learning and controls
        • Identification of persons
        • De-anonymization of data (e.g. of blurred images)
        • Membership inference
        • Model inversion
        • Defeating noise, and others
    • Federated learning
      • (Moving the algorithm to the local data instead of moving data to a central algorithm; see the federated-averaging sketch after this list)
      • (Local data doesn’t leave device)
      • AI models as personal data
      • Legal advantages of federated learning
    • Attacks and controls
      • Model inversion
        • Querying the trained AI model to reconstruct its training data
      • Membership inference (see the threshold sketch after this list)
        • Was a given data point used to train the model?
      • Model extraction
        • “Stealing” the trained model – by cloning the behaviour and predictive capabilities of a given AI model
      • Adversarial examples (creating inputs that trigger unintended responses)
      • Countermeasures
        • Restrictions on outputs
        • Adversarial Regularization
        • Distillation
        • Differential Privacy
        • Cryptography
        • Secure multi-party computation (MPC)
        • Federated machine learning
        • Differentially Private Data Synthesis (DIPS) (e.g. via Copula functions, Generative Adversarial Networks)
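
A toy Python sketch of the aggregation approach referenced above: quasi-identifiers are generalized (age bands, truncated postcodes) and k-anonymity is the size of the smallest resulting equivalence class. Records and column names are invented:

    from collections import Counter

    # Invented toy records; "age" and "zip" act as quasi-identifiers.
    records = [
        {"age": 34, "zip": "80331", "diagnosis": "A"},
        {"age": 36, "zip": "80333", "diagnosis": "B"},
        {"age": 41, "zip": "80469", "diagnosis": "A"},
        {"age": 44, "zip": "80462", "diagnosis": "C"},
    ]

    def generalize(rec):
        """Generalization: coarsen age to a decade band, truncate the postcode."""
        decade = rec["age"] // 10 * 10
        return (f"{decade}-{decade + 9}", rec["zip"][:3] + "**")

    def k_anonymity(recs):
        """k = size of the smallest equivalence class over the generalized quasi-identifiers."""
        return min(Counter(generalize(r) for r in recs).values())

    print(k_anonymity(records))  # 2 for this toy data set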
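
The randomization approach ("adding noise") and the differential privacy countermeasure can both be illustrated with the Laplace mechanism; a sketch assuming a simple counting query with sensitivity 1, standard library only:

    import random

    def laplace_noise(scale: float) -> float:
        """Laplace(0, scale) noise, drawn as the difference of two exponential samples."""
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    def noisy_count(true_count: int, epsilon: float) -> float:
        """Release a counting query (sensitivity 1) with epsilon-differential privacy."""
        return true_count + laplace_noise(1.0 / epsilon)

    print(noisy_count(42, epsilon=0.5))  # e.g. 44.7; smaller epsilon means more noise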
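
For the masking of identifying terms in free text, a crude regex-based sketch; a real pipeline would rely on NLP-based named-entity recognition, and the patterns and names below are illustrative only:

    import re

    # Illustrative patterns: e-mail addresses, phone-like numbers, and a tiny name list.
    PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "PHONE": re.compile(r"\+?\d[\d /-]{6,}\d"),
        "NAME": re.compile(r"\b(Anna Meier|Max Mustermann)\b"),
    }

    def mask(text: str) -> str:
        """Replace identifying terms with placeholder tokens as a post-processing step."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    print(mask("Patient Max Mustermann (max@example.org, +49 89 1234567) reported chest pain."))
    # -> Patient [NAME] ([EMAIL], [PHONE]) reported chest pain.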
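
Secure multi-party computation can be hinted at with additive secret sharing: each party splits its input into random shares, the parties aggregate shares locally, and only the final sum is reconstructed. A pure-Python sketch with invented inputs:

    import random

    PRIME = 2**61 - 1  # all arithmetic is done modulo a fixed prime

    def share(value: int, parties: int = 3):
        """Split a secret into additive shares; fewer than all shares reveal nothing about it."""
        shares = [random.randrange(PRIME) for _ in range(parties - 1)]
        shares.append((value - sum(shares)) % PRIME)
        return shares

    # Three parties want the joint sum without revealing their individual inputs.
    inputs = [12, 30, 7]
    all_shares = [share(v) for v in inputs]

    # Party i adds up the i-th share of every input; the partial sums are then combined.
    partial_sums = [sum(s[i] for s in all_shares) % PRIME for i in range(3)]
    print(sum(partial_sums) % PRIME)  # 49 -- no party saw another party's raw value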
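
The federated learning idea (the algorithm moves to the data, only model parameters leave the device) in a deliberately tiny sketch: each client fits a one-parameter model y ≈ w·x on local data, and the server merely averages the weights (FedAvg with equally sized clients). Data points and learning rate are made up:

    def local_update(weight, local_data, lr=0.1, epochs=5):
        """One client: gradient steps for a one-parameter least-squares model y = w * x."""
        for _ in range(epochs):
            grad = sum(2 * (weight * x - y) * x for x, y in local_data) / len(local_data)
            weight -= lr * grad
        return weight

    def federated_round(global_weight, clients):
        """Server: average the locally trained weights; raw data never leaves the clients."""
        local_weights = [local_update(global_weight, data) for data in clients]
        return sum(local_weights) / len(local_weights)

    # Hypothetical per-device data sets (x, y) with a true slope of roughly 3.
    clients = [[(1.0, 3.1), (2.0, 6.0)], [(1.5, 4.4), (3.0, 9.2)]]
    w = 0.0
    for _ in range(20):
        w = federated_round(w, clients)
    print(round(w, 2))  # about 3.03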
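
For membership inference, the simplest illustration is a confidence threshold: the attacker queries the target model and flags records on which it is suspiciously confident, exploiting memorization of training data. The "model" below is a stand-in function, not a trained model:

    # Stand-in for a trained model's top-class confidence; all values are invented.
    MEMORIZED = {("alice", 34), ("bob", 41)}

    def model_confidence(record) -> float:
        """Pretend model that is over-confident on records it saw during training."""
        return 0.97 if record in MEMORIZED else 0.62

    def is_likely_member(record, threshold: float = 0.9) -> bool:
        """Membership-inference heuristic: high confidence suggests a training-set member."""
        return model_confidence(record) >= threshold

    print(is_likely_member(("alice", 34)))  # True  -> likely part of the training data
    print(is_likely_member(("carol", 29)))  # False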