Building an Anonymization Pipeline

Book description

How can you use data in a way that protects individual privacy but still provides useful and meaningful analytics? With this practical book, data architects and engineers will learn how to establish and integrate secure, repeatable anonymization processes into their data flows and analytics in a sustainable manner.

Luk Arbuckle and Khaled El Emam from Privacy Analytics explore end-to-end solutions for anonymizing device and IoT data, based on collection models and use cases that address real business needs. These examples come from some of the most demanding data environments, such as healthcare, using approaches that have withstood the test of time.

  • Create anonymization solutions diverse enough to cover a spectrum of use cases
  • Match your solutions to the data you use, the people you share it with, and your analysis goals
  • Build anonymization pipelines around various data collection models to cover different business needs
  • Generate an anonymized version of original data or use an analytics platform to generate anonymized outputs
  • Examine the ethical issues around the use of anonymized data

Table of contents

  1. Preface
    1. Why We Wrote This Book
    2. Who This Book Was Written For
    3. How This Book Is Organized
    4. Conventions Used in This Book
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  2. 1. Introduction
    1. Identifiability
    2. Getting to Terms
      1. Laws and Regulations
      2. States of Data
    3. Anonymization as Data Protection
      1. Approval or Consent
      2. Purpose Specification
      3. Re-identification Attacks
    4. Anonymization in Practice
    5. Final Thoughts
  3. 2. Identifiability Spectrum
    1. Legal Landscape
    2. Disclosure Risk
      1. Types of Disclosure
      2. Dimensions of Data Privacy
    3. Re-identification Science
      1. Defined Population
      2. Direction of Matching
      3. Structure of Data
    4. Overall Identifiability
    5. Final Thoughts
  4. 3. A Practical Risk-Management Framework
    1. Five Safes of Anonymization
      1. Safe Projects
      2. Safe People
      3. Safe Settings
      4. Safe Data
      5. Safe Outputs
    2. Five Safes in Practice
    3. Final Thoughts
  5. 4. Identified Data
    1. Requirements Gathering
      1. Use Cases
      2. Data Flows
      3. Data and Data Subjects
    2. From Primary to Secondary Use
      1. Dealing with Direct Identifiers
      2. Dealing with Indirect Identifiers
      3. From Identified to Anonymized
      4. Mixing Identified with Anonymized
      5. Applying Anonymized to Identified
    3. Final Thoughts
  6. 5. Pseudonymized Data
    1. Data Protection and Legal Authority
      1. Pseudonymized Services
      2. Legal Authority
      3. Legitimate Interests
    2. A First Step to Anonymization
    3. Revisiting Primary to Secondary Use
      1. Analytics Platforms
      2. Synthetic Data
      3. Biometric Identifiers
    4. Final Thoughts
  7. 6. Anonymized Data
    1. Identifiability Spectrum Revisited
      1. Making the Connection
    2. Anonymized at Source
      1. Additional Sources of Data
    3. Pooling Anonymized Data
      1. Pros/Cons of Collecting at Source
      2. Methods of Collecting at Source
      3. Safe Pooling
      4. Access to the Stored Data
    4. Feeding Source Anonymization
    5. Final Thoughts
  8. 7. Safe Use
    1. Foundations of Trust
    2. Trust in Algorithms
      1. Techniques of AIML
      2. Technical Challenges
      3. Algorithms Failing on Trust
    3. Principles of Responsible AIML
    4. Governance and Oversight
      1. Privacy Ethics
      2. Data Monitoring
    5. Final Thoughts
  9. Index

Product information

  • Title: Building an Anonymization Pipeline
  • Author(s): Luk Arbuckle, Khaled El Emam
  • Release date: April 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492053439