Responsible Data Guidelines

Managing Privacy and Personally Identifiable Information in the Research Project Data Lifecycle

Intro

These Guidelines are intended to help agricultural researchers handle privacy and Personally Identifiable Information (PII) in the research project data lifecycle. They are premised on the following:

  • research participants have the right to know and consent to the collection of information that can directly or potentially identify them, including for what purposes it will be used, who it will be shared with and why;
  • any actual or potential harms associated with loss of privacy should be ethically acceptable and fully disclosed, and should not be excessive in relation to the positive impacts of using PII;
  • privacy protection and confidential handling of PII are paramount unless waived or reduced by a research participant to advance legitimate research objectives; and
  • research ethics often involves consideration of principles that contain inherent tensions, such as individual privacy protection vs. societal benefit, leading to difficult decisions that require professional judgement and are the responsibility of individual researchers and their institutions.

These Guidelines are voluntary and aspirational in nature, intended as an aid for responsible decision making. Researchers need to be pragmatic in striking a balance between open data and privacy protection in order to maximize the benefits in agronomy and international development offered by the ‘big data revolution’, while minimizing the potential for social or personal harm. These Guidelines are not intended to be exhaustive and are no substitute for a robust institutional framework to guide and operationalize decision making concerning privacy, ethics and the handling of PII in the research project data lifecycle. For example, as highlighted in the 2018 UN Global Pulse report ‘Building Ethics into Privacy Frameworks for Big Data and AI’, operationalizing an institutional framework requires three key pillars: 1) a flexible approach to creating a privacy and ethics framework; 2) data ethics leadership within the organization; and 3) tools for ethics impact assessments, or risk assessments that incorporate ethics, to consistently evaluate company-wide ethics approaches.

Further information about the method and evolution of the Guidelines is provided in the Background and Method section of this page.

These Guidelines are not meant to be exhaustive, so if you have other tips, resources or comments to add, please email bigdata@cgiar.org.

“Data can be either useful or perfectly anonymous, but never both.”

Paul Ohm – UCLA Law Review, 2010 (57 UCLA L. Rev. 1701).

Guidelines for the Data Cycle

 1. PLAN AHEAD

When: Planning & Approval (and continuously thereafter)

Develop a Data Management Plan to govern use of PII in the research project and beyond. Consider how PII will be used, stored, published, shared, archived or discarded, and take into account any compliance requirements that may apply to the collection or handling of PII (i.e. whether legal, regulatory, institutional or contractual).

 2. ANONYMIZE PII OR AVOID ITS COLLECTION

When: Planning & Approval | Collection

Only use PII if it is absolutely necessary to advance the legitimate research objectives of the project. You can maximize the participant’s privacy and minimize your compliance burden by anonymizing PII at the outset or not collecting PII in the first place.

 3. MINIMIZE PII AS FAR AS POSSIBLE

When: Planning & Approval | Collection

Minimize PII and its handling to the extent absolutely necessary for the research project (i.e. to the point beyond which further minimization would significantly impair the data’s analytic potential, the research objectives or the benefit to the participant).
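
As a rough illustration of minimization at the point of collection, the sketch below drops direct identifiers from a survey record and coarsens GPS coordinates. The field names (`name`, `phone`, `national_id`, `latitude`, `longitude`) and the choice of two decimal places are assumptions made for this example, not part of these Guidelines. Rounding to two decimals corresponds to roughly one kilometre of latitude, which may be too coarse or too fine depending on the research objective, and coarsening alone does not guarantee anonymity.

```python
# Hypothetical field names; adapt them to the survey instrument actually used.
DIRECT_IDENTIFIERS = {"name", "phone", "national_id"}


def minimize_record(record, coord_decimals=2):
    """Return a copy of `record` without direct identifiers and with
    coordinates rounded to `coord_decimals` decimal places."""
    minimized = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    for key in ("latitude", "longitude"):
        if minimized.get(key) is not None:
            minimized[key] = round(float(minimized[key]), coord_decimals)
    return minimized


# Illustrative record with made-up values.
raw = {
    "name": "A. Farmer",
    "phone": "555-0100",
    "latitude": -1.292066,
    "longitude": 36.821946,
    "crop": "maize",
    "yield_t_ha": 2.4,
}
minimal = minimize_record(raw)
print(minimal)
```

The function returns a new dictionary and leaves the original record untouched, so the decision about whether and where to keep the raw record stays with the Data Management Plan.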

 4. DO NO HARM

When: Planning & Approval | Collection

Research ethics call for the safety of research participants and their communities to be prioritized above all other concerns. Risks associated with privacy loss and whether they are ethically acceptable should be evaluated in the context of each project. Projects which expose research participants to significant or disproportionate risk of harm should be subject to independent ethics review and approval.

 5. OBTAIN INFORMED CONSENT AND BE AS TRANSPARENT AS POSSIBLE

When: Collection | Storage

Research participants should be empowered to give, deny or revoke their consent to share PII based on a clear understanding of why, how, by whom and for how long their data will be used. Ensure consent is fully informed by being as transparent as possible regarding the research objectives; the specific purpose(s) for which PII will be used; how PII will be protected; and the benefits and risks to the research participant, their community and the public at large.

 6. HANDLE PII CONFIDENTIALLY, INCLUDING FOR TRANSFER/ACCESS BY THIRD PARTIES

When: Storage | Reuse and Transfer

Maximizing privacy minimizes risk; PII should therefore be handled confidentially unless a research participant has explicitly consented to the contrary. Ensure appropriate IT and security controls are in place to protect the confidentiality of PII at rest and in transit. Transfers of PII should be undertaken on a confidential basis subject to appropriate legal and technological controls, and pro-privacy analytical tools should be used whenever feasible.

 7. USE PII FAIRLY

When: Storage | Use and Transfer

Use PII fairly and in accordance with the participant’s consent

Check to ensure your use of the data is compatible with the purpose specification and scope consented to by the research participant, including any limitations or authorizations they may have specified or should reasonably expect regarding the use of their PII.
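
One way to make this check routine is to record the scope of consent alongside the data and test each proposed use against it. The sketch below is a minimal illustration under assumptions: the `ConsentRecord` structure, its field names and the purpose labels are hypothetical, and a real consent record would need to reflect the wording actually agreed with the participant.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class ConsentRecord:
    """Hypothetical record of what a participant agreed to."""
    participant_id: str
    permitted_purposes: Set[str] = field(default_factory=set)
    revoked: bool = False
    expires_on: Optional[date] = None


def use_is_permitted(consent, purpose, today):
    """Return True only if consent is active and covers `purpose`."""
    if consent.revoked:
        return False
    if consent.expires_on is not None and today > consent.expires_on:
        return False
    return purpose in consent.permitted_purposes


# Illustrative values only.
consent = ConsentRecord(
    participant_id="P001",
    permitted_purposes={"yield_analysis"},
    expires_on=date(2025, 12, 31),
)
print(use_is_permitted(consent, "yield_analysis", date(2024, 1, 1)))
print(use_is_permitted(consent, "marketing", date(2024, 1, 1)))
```

A check like this only covers limitations the participant stated explicitly; what a participant "should reasonably expect" still calls for professional judgement.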

 8. PUBLIC-USE DATASETS CONTAINING PII ARE THE EXCEPTION

When: Publishing and discovery

As a general rule, public datasets should be anonymized to maximize privacy and minimize risk. PII should be included only if absolutely necessary to preserve the data’s analytic potential, scientific utility or benefit to the participant, subject to prior informed consent and rigorous risk assessment.
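
These Guidelines do not prescribe a particular risk-assessment method. As one possible input to such an assessment, the sketch below computes a simple k-anonymity style indicator: the size of the smallest group of records sharing the same combination of indirectly identifying attributes. The column names and the threshold of 2 are assumptions chosen for illustration, and a small group size is only one signal of re-identification risk, not a complete assessment.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count rows per combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def smallest_group_size(rows, quasi_identifiers):
    """Return the size of the smallest group (0 for an empty dataset)."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def rare_combinations(rows, quasi_identifiers, k):
    """Return the combinations shared by fewer than `k` rows."""
    sizes = group_sizes(rows, quasi_identifiers)
    return sorted(combo for combo, n in sizes.items() if n < k)


# Made-up rows with hypothetical column names.
rows = [
    {"district": "North", "age_band": "30-39", "crop": "maize"},
    {"district": "North", "age_band": "30-39", "crop": "beans"},
    {"district": "North", "age_band": "40-49", "crop": "maize"},
    {"district": "South", "age_band": "30-39", "crop": "rice"},
    {"district": "South", "age_band": "30-39", "crop": "maize"},
]
quasi = ["district", "age_band"]
print(smallest_group_size(rows, quasi))
print(rare_combinations(rows, quasi, k=2))
```

Here one respondent is the only person in their district and age band, so that record would need further aggregation or suppression before release.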

 9. ARCHIVE OR DELETE PII

When: Archiving / Discarding

PII should be retained only for as long as is necessary to achieve the research objectives. All copies of PII should be deleted once no longer needed. However, some PII (e.g. geolocation) is valuable when kept with its associated data; if long-term or indefinite retention is justified, this should be clearly explained and explicitly consented to by research participants, subject to appropriate privacy and data security safeguards.
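
A retention rule of this kind can be checked mechanically. The sketch below lists the participants whose PII has passed a retention period, skipping those who explicitly consented to indefinite retention. The record fields (`participant_id`, `collected_on`, `indefinite_retention_consented`) and the 730-day period are hypothetical; the appropriate period depends on the project and its compliance requirements.

```python
from datetime import date, timedelta


def records_due_for_deletion(records, today, retention_days):
    """Return IDs of records whose retention period has elapsed."""
    due = []
    for record in records:
        # Explicit consent to indefinite retention exempts the record.
        if record.get("indefinite_retention_consented"):
            continue
        if today > record["collected_on"] + timedelta(days=retention_days):
            due.append(record["participant_id"])
    return due


# Illustrative records with made-up dates.
records = [
    {"participant_id": "P001", "collected_on": date(2018, 3, 1),
     "indefinite_retention_consented": False},
    {"participant_id": "P002", "collected_on": date(2018, 3, 1),
     "indefinite_retention_consented": True},
    {"participant_id": "P003", "collected_on": date(2020, 6, 15),
     "indefinite_retention_consented": False},
]
due = records_due_for_deletion(records, today=date(2021, 1, 1), retention_days=730)
print(due)
```

The function only identifies candidates; deleting all copies, including backups and shared extracts, remains a separate step.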

 10. REVIEW REGULARLY

When: Continuously

Periodically review the compliance landscape and seek expert support

Privacy protection and ethical research standards are evolving quickly to keep up with the rapid pace of technological change driven by Big Data. Periodically review institutional and other compliance requirements and don’t hesitate to seek support from subject matter experts at your institution. The Big Data Platform may also be able to connect you with knowledge resources or experts to help address any challenges you are facing.

PLANNING & APPROVAL


COLLECTION

STORAGE & ANALYSIS

PUBLISHING & DISCOVERY

ARCHIVING / DISCARDING

REUSE & TRANSFER

Key Concepts at a Glance

Familiarity with the following key concepts will assist in understanding these Guidelines. More detailed coverage of each concept is available in Privacy Protection and Ethics in the Handling of PII: Key Concepts.

Personally identifiable information (PII) refers to information that can be used to identify, distinguish or trace an individual, whether on its own or in combination with other information. It includes information which can be used to directly identify an individual (e.g. unique information such as a passport number) as well as information that can potentially or indirectly identify an individual when coupled with additional information (e.g. sufficiently correlated location data such as a home address, or an electronic address or ID of a personal device).

Research ethics refers to standards of conduct expected of scientific researchers, particularly in relation to ethical issues concerning research involving human subjects. Maximizing privacy and minimizing risk are considered cornerstones of research ethics. Many scientific institutions have internal committees (variously referred to as an institutional review board (IRB), independent ethics committee (IEC), ethical review board (ERB) or research ethics board (REB)) responsible for reviewing and approving the methods proposed for research to ensure that they are ethical.

Informed consent refers to any freely given and informed indication of agreement by the data subject to the processing of their PII, which may be given either by a written or oral statement or by a clear affirmative action. Informed consent is also considered a cornerstone of research ethics.

Compliance in the context of research ethics and privacy protection refers to a broad range of sources that may impose obligations regarding the collection and handling of personally identifiable information. For example, such obligations may arise pursuant to national or regional legal frameworks (e.g. the General Data Protection Regulation of the EU), institutional policies and guidelines (e.g. as may address privacy, confidential information and/or ethics review by Institutional Review Boards), as well as contractual requirements imposed by donors and third parties (e.g. pursuant to funding, collaboration or data-sharing agreements).

Privacy by design and data minimization: ‘privacy by design’ refers to a methodology which places data protection at the heart of the design and building of systems and processes which involve collecting or processing personal data, whereas ‘data minimization’ refers to the practice of limiting the collection of personally identifiable information to that which is directly relevant and necessary to accomplish a specified purpose and ensuring that such data is not retained longer than is necessary. In the context of research ethics and privacy protection, these concepts seek to maximize privacy and data protection to the extent necessary to achieve the legitimate scientific purpose underpinning the research project, and prioritize the safeguarding of research participants ahead of the public interest in promoting open data and scientific advancement.

Privacy protection and data security: a key principle underpinning the handling of PII is that data at rest, in transit and in use should be proactively managed so as to protect privacy and ensure its security. Privacy protection includes data minimization measures intended to reduce the risk of identification, such as anonymization, aggregation, pseudonymization and encryption. Data security includes organizational, technical and physical measures for the access, control and protection of personal data throughout the data lifecycle.

Data subject rights in the context of ethics and privacy protection refers to the right of human participants in research projects (i.e. data subjects) to consent to the use of their PII for specified purposes and to retain rights related to subsequent use of the PII. These rights typically relate to access, processing, repurposing, rectification, erasure, portability, decision making, profiling, compensation and damages.
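
To make one of the privacy protection measures named above more concrete, the sketch below shows a common way to pseudonymize a direct identifier using a keyed hash (HMAC-SHA256) from the Python standard library. It is an illustration under assumptions rather than a recommended implementation: the key value, the identifier format and the truncation to 16 hexadecimal characters are all choices made for this example. Anyone holding the key can recompute the pseudonym from a known identifier and so re-link records, so pseudonymized data should generally still be handled as PII and the key stored separately under access control.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Derive a stable pseudonym from `identifier` using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so
    records can still be linked without exposing the identifier itself.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; an assumption, not a requirement.
    return digest.hexdigest()[:16]


# Placeholder key for illustration; generate and store a real key securely.
key = b"replace-with-a-randomly-generated-secret"
pseudonym = pseudonymize("passport:X1234567", key)
print(pseudonym)
```

A keyed hash is used rather than a plain hash because identifiers such as passport or phone numbers come from a small enough space that unkeyed hashes can be reversed by enumeration.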

Background and Method

The CGIAR Platform for Big Data in Agriculture & Responsible Data Management

The CGIAR Platform for Big Data in Agriculture (the Platform) advocates open data for agricultural research for development. It considers that opening up research data for scrutiny and reuse confers significant benefits to society, including accelerated scientific advancement, economic growth, increased resource efficiency, strengthened public support for research funding and increased public trust in research. The Platform also considers that open data, together with the application of new technologies and analytical approaches, will catalyze a big data revolution in agronomy and international development. It is helping to position CGIAR at the forefront of that revolution by supporting an open and interactive community among open-science researchers, open-source developers and the broader scientific community committed to opening up research data on a FAIR basis (i.e. in line with the FAIR Guiding Principles to make data findable, accessible, interoperable and reusable).

However, the Platform appreciates that not all research data can be open and that a broad range of legitimate circumstances may require data to be restricted, for example, to address ethical considerations, confidentiality, privacy, proprietary or intellectual property rights, biodiversity-related access and benefit-sharing rights, security, public interest, among others.

As an integral component of its advocacy for open data, the Platform promotes responsible data management through the entire research data lifecycle, from planning, collecting and storing to disclosing or publishing, transferring, discovering and archiving. This requires ongoing due diligence regarding legal, ethical and regulatory frameworks and disciplinary norms. Responsible data management need not be restrictive; in fact, anticipating issues that may arise in the data lifecycle allows data to be managed in ways that maximize trust and value, while minimizing risk.

Method

These Guidelines were developed from a review of best and emerging practices across various sectors in the fast-changing landscape of privacy and ethics (130 external resources) and from privacy and ethics materials sourced from seven CGIAR centers. A first draft was circulated across CGIAR for input and feedback, which was incorporated into this edition. This is an evolving document; the next stage is to consult externally for further input.

PII at the crossroads of open and responsible data

The use of personally identifiable information (PII) in a research project gives rise to potential tensions between the Platform’s dual commitment to ‘open’ and ‘responsible’ data. Such tensions arise because including PII in open data may conflict with the ethical responsibility to protect the privacy of a participant and counter the risk of harm. Faced with this dilemma the safest approach is to strip the data of PII, that is, to de-identify the data in order to anonymize it so that individuals are no longer identifiable. However, while anonymization maximizes privacy and minimizes risk to research participants, it can also compromise the analytic potential and scientific utility of the data, the research objectives or benefits that may accrue to the participant.

Resolving these tensions requires careful consideration. In collecting or using PII there is always a balance to be struck between utility and risk, as “data can be either useful or perfectly anonymous, but never both” (Paul Ohm – UCLA Law Review 2010).

This delicate balancing act must take into account not only the potential risks and harms that could result from improper action, but also those that could arise from inaction – i.e. the consequences of ‘misuse’ as well as ‘missed use’.


Credits:
Author: Rodrigo Sara, 2019
Concept and web implementation: Stefanie Neno
Photos: 1 and 2 – Stephanie Malyon, 3 and 5 – Georgina Smith, 4 – Manon Koningstein

To reference these guidelines:
Guidelines for Managing PII in the Research Project Data Lifecycle (2018), by Rodrigo Sara on behalf of CGIAR Platform for Big Data in Agriculture.