Responsible Data Guidelines
Managing Privacy and Personally Identifiable Information in the Research Project Data Lifecycle
Intro
These Guidelines are intended to help agricultural researchers handle privacy and Personally Identifiable Information (PII) in the research project data lifecycle. They are premised on the following:
- research participants have the right to know and consent to the collection of information that can directly or potentially identify them, including for what purposes it will be used, who it will be shared with and why;
- any actual or potential harms associated with loss of privacy should be ethically acceptable, fully disclosed and should not be excessive in relation to the positive impacts of using PII;
- privacy protection and confidential handling of PII is paramount unless waived or reduced by a research participant to advance legitimate research objectives; and
- research ethics often involves weighing principles that are in inherent tension, such as individual privacy protection vs. societal benefit; the resulting difficult decisions require professional judgement and are the responsibility of individual researchers and their institutions.
These Guidelines are voluntary and aspirational in nature, intended as an aid for responsible decision making. Researchers need to be pragmatic in striking a balance between open data and privacy protection in order to maximize the benefits in agronomy and international development offered by the ‘big data revolution’, while minimizing the potential for social or personal harm. These Guidelines are not intended to be exhaustive and are no substitute for a robust institutional framework to guide and operationalize decision making concerning privacy, ethics and the handling of PII in the research project data lifecycle [1].
Further information about the method and evolution of the Guidelines is provided in the Background and Method section of this page.
[1] For example, as highlighted in the 2018 UN Global Pulse report ‘Building Ethics into Privacy Frameworks for Big Data and AI’, operationalizing an institutional framework requires three key pillars: 1) a flexible approach to creating a privacy and ethics framework; 2) data ethics leadership within the organization; and 3) establishing tools for ethics impact assessments or risk assessments that incorporate ethics, to consistently evaluate company-wide ethics approaches.
These Guidelines are not meant to be exhaustive, so if you have other tips, resources or comments to add, please email bigdata@cgiar.org.
“Data can be either useful or perfectly anonymous, but never both.”
Paul Ohm, UCLA Law Review 2010 (57 UCLA L. Rev. 1701).
Guidelines for the Data Cycle
1. PLAN AHEAD
When: Planning & Approval (and continuously thereafter)
Develop a Data Management Plan to govern use of PII in the research project and beyond. Consider how PII will be used, stored, published, shared, archived or discarded, and take into account any compliance requirements that may apply to the collection or handling of PII (i.e. whether legal, regulatory, institutional or contractual).
2. ANONYMIZE PII OR AVOID ITS COLLECTION
When: Planning & Approval | Collection
Only use PII if it is absolutely necessary to advance the legitimate research objectives of the project. You can maximize the participant’s privacy and minimize your compliance burden by anonymizing PII at the outset or not collecting PII in the first place.
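Where direct identifiers cannot be avoided entirely, one common fallback is to replace them with a keyed code at the point of collection. The sketch below is illustrative only and assumes hypothetical field names (`name`, `phone`) and a hypothetical secret key. Note that this is pseudonymization, not anonymization: anyone holding the key can regenerate and so re-link the codes, and the remaining fields may still identify a participant indirectly.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for illustration). In practice it would
# be randomly generated and stored separately from the research data.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"

# Hypothetical names of the fields that directly identify a participant.
DIRECT_IDENTIFIERS = ("name", "phone")


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of an identifier, shortened to 16 hex chars."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


def pseudonymize(record: dict) -> dict:
    """Drop the direct identifiers and add a single participant code in their place."""
    joined = "|".join(str(record.get(field, "")) for field in DIRECT_IDENTIFIERS)
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_code"] = pseudonym(joined)
    return cleaned


record = {"name": "Jane Doe", "phone": "555-0100", "crop": "maize", "yield_t_ha": 3.2}
print(pseudonymize(record))
```

The same participant always receives the same code, so records can still be linked across survey rounds without storing the name or phone number alongside the research data.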
3. MINIMIZE PII AS FAR AS POSSIBLE
When: Planning & Approval | Collection
Minimize PII and its handling to the extent absolutely necessary for the research project (i.e. up to the point beyond which further minimization would significantly impair the data’s analytic potential, the research objectives or the benefit to the participant).
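As a rough illustration of minimization in practice, the sketch below keeps only an allow-list of fields and coarsens two quasi-identifiers (exact age and GPS coordinates). The field names, the ten-year age bands and the one-decimal rounding are all assumptions chosen for the example; the right level of detail depends on what the analysis actually requires.

```python
# Hypothetical allow-list: only the fields the planned analysis needs.
REQUIRED_FIELDS = {"age", "latitude", "longitude", "crop", "yield_t_ha"}


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def minimize(record: dict, decimals: int = 1) -> dict:
    """Keep only required fields and coarsen age and location."""
    kept = {k: v for k, v in record.items() if k in REQUIRED_FIELDS}
    if "age" in kept:
        kept["age"] = age_band(kept["age"])
    for axis in ("latitude", "longitude"):
        if axis in kept:
            # One decimal place of latitude corresponds to roughly 11 km.
            kept[axis] = round(kept[axis], decimals)
    return kept


raw = {"name": "Jane Doe", "age": 34, "latitude": -1.2921,
       "longitude": 36.8219, "crop": "maize"}
print(minimize(raw))
```

Fields outside the allow-list (here, `name`) are never carried forward, which is usually easier to audit than deleting unwanted fields one by one.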
4. DO NO HARM
When: Planning & Approval | Collection
Research ethics call for the safety of research participants and their communities to be prioritized above all other concerns. Risks associated with privacy loss and whether they are ethically acceptable should be evaluated in the context of each project. Projects which expose research participants to significant or disproportionate risk of harm should be subject to independent ethics review and approval.
5. ENSURE CONSENT IS FULLY INFORMED
When: Collection | Storage
Research participants should be empowered to give, deny or revoke their consent to share PII based on a clear understanding of why, how, by whom and for how long their data will be used. Ensure consent is fully informed by being as transparent as possible regarding the research objectives; the specific purpose(s) for which PII will be used; how PII will be protected; and the benefits and risks to the research participant, their community and the public at large.
6. HANDLE PII CONFIDENTIALLY
When: Storage | Reuse and Transfer
Maximizing privacy minimizes risk, so PII should be handled confidentially unless a research participant has explicitly consented to the contrary. Ensure appropriate IT and security controls are in place to protect the confidentiality of PII at rest and in transit. Transfers of PII should be undertaken on a confidential basis subject to appropriate legal and technological controls, and pro-privacy analytical tools should be used whenever feasible.
7. USE PII FAIRLY
When: Storage | Use and Transfer
Use PII fairly and in accordance with the participant’s consent
Check to ensure your use of the data is compatible with the purpose specification and scope consented to by the research participant, including any limitations or authorizations they may have specified or should reasonably expect regarding the use of their PII.
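The compatibility check described above can be made routine by recording the scope of each participant's consent in a structured form. The following is a minimal sketch under assumed names (`Consent`, `use_is_permitted` and the purpose labels are hypothetical); it covers only purpose, expiry and revocation, and does not replace a judgement about what a participant would reasonably expect.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Consent:
    """Hypothetical record of what a participant agreed to."""
    purposes: set = field(default_factory=set)  # e.g. {"yield_analysis"}
    valid_until: Optional[date] = None           # None means no end date was set
    revoked: bool = False


def use_is_permitted(consent: Consent, purpose: str, on: date) -> bool:
    """Return True only if the purpose was consented to and the consent is still in force."""
    if consent.revoked:
        return False
    if consent.valid_until is not None and on > consent.valid_until:
        return False
    return purpose in consent.purposes


consent = Consent(purposes={"yield_analysis"}, valid_until=date(2025, 12, 31))
print(use_is_permitted(consent, "yield_analysis", date(2025, 6, 1)))
print(use_is_permitted(consent, "marketing", date(2025, 6, 1)))
```

Running the check before every new use or transfer makes repurposing visible: a purpose that is not listed fails by default and prompts a return to the participant for fresh consent.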
8. PUBLISH PII ONLY AS AN EXCEPTION
When: Publishing and discovery
Public-use datasets containing PII are the exception
As a general rule, public datasets should be anonymized to maximize privacy and minimize risk. PII should be included only if absolutely necessary to preserve the data’s analytic potential, scientific utility or benefit to the participant, subject to prior informed consent and rigorous risk assessment.
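One widely used heuristic for the risk assessment mentioned above is a k-anonymity check: before release, confirm that every combination of quasi-identifier values is shared by at least k records. The Guidelines do not prescribe this technique; the sketch below, with hypothetical column names and an arbitrary default threshold of k = 5, is only one possible screening step, and it does not guard against every re-identification route.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k=5):
    """Return True if every quasi-identifier combination appears in at least k rows."""
    return smallest_group(rows, quasi_identifiers) >= k


rows = [
    {"district": "A", "age_band": "30-39", "yield_t_ha": 3.2},
    {"district": "A", "age_band": "30-39", "yield_t_ha": 2.9},
    {"district": "B", "age_band": "40-49", "yield_t_ha": 4.1},
]
print(smallest_group(rows, ["district", "age_band"]))
print(is_k_anonymous(rows, ["district", "age_band"], k=2))
```

In this example the single record from district B forms a group of one, so the dataset would need further generalization or suppression before it could be considered for public release.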
9. ARCHIVE OR DELETE PII
When: Archiving / Discarding
PII should be retained only for as long as is necessary to achieve the research objectives. All copies of PII should be deleted once no longer needed. However, some PII (e.g. geolocation) is valuable when it persists with the associated data; if long-term or indefinite retention is justified, this should be clearly explained to and explicitly consented to by research participants, subject to appropriate privacy and data security safeguards.
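A retention rule of this kind can be checked mechanically during periodic data reviews. The sketch below assumes a hypothetical per-record retention period in days and a flag recording explicit consent to indefinite retention; the function name and return values are illustrative.

```python
from datetime import date, timedelta


def retention_action(collected_on: date, retention_days: int, today: date,
                     indefinite_consent: bool = False) -> str:
    """Return 'retain' or 'delete' for a PII record under a simple retention rule."""
    if indefinite_consent:
        # The participant explicitly agreed to long-term or indefinite retention.
        return "retain"
    expires = collected_on + timedelta(days=retention_days)
    return "delete" if today > expires else "retain"


print(retention_action(date(2020, 1, 1), 365, today=date(2021, 6, 1)))
print(retention_action(date(2020, 1, 1), 365, today=date(2021, 6, 1),
                       indefinite_consent=True))
```

A record flagged for deletion should be removed from every location where a copy exists, including backups and shared drives, not only from the primary dataset.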
10. REVIEW REGULARLY
When: Continuously
Periodically review the compliance landscape and seek expert support
Privacy protection and ethical research standards are evolving fast to keep up with the rapid technological change driven by Big Data. Periodically review institutional and other compliance requirements and don’t be shy about seeking support from subject matter experts at your institution. The Big Data Platform may also be able to connect you with knowledge resources or experts to help address any challenges you are facing.
Key Concepts at a Glance
Familiarity with the following key concepts will assist in understanding these Guidelines. More detailed coverage of each concept is available in Privacy Protection and Ethics in the Handling of PII: Key Concepts.
- Personally identifiable information (PII)
- Research ethics
- Informed Consent
- Compliance
- Privacy by design and data minimization
- Privacy protection and data security
- Data subject rights
Personally identifiable information (PII) refers to information that can be used to identify, distinguish or trace an individual, whether on its own or in combination with other information. It includes information which can be used to directly identify an individual (e.g. unique information such as a passport number) as well as information that can potentially or indirectly identify an individual when coupled with additional information (e.g. sufficiently correlated location data such as a home address, or an electronic address or ID of a personal device).
Research ethics refers to standards of conduct expected of scientific researchers, particularly in relation to ethical issues concerning research involving human subjects. Maximizing privacy and minimizing risk are considered a cornerstone of research ethics. Many scientific institutions have internal committees (variously referred to as institutional review board (IRB), independent ethics committee (IEC), ethical review board (ERB), or research ethics board (REB)) responsible for reviewing and approving the methods proposed for research to ensure that they are ethical.
Informed Consent refers to any freely given and informed indication of agreement by the data subject to the processing of their PII, which may be given either by a written or oral statement or by a clear affirmative action. Informed consent is also considered a cornerstone of research ethics.
Compliance in the context of research ethics and privacy protection refers to a broad range of sources that may impose obligations regarding the collection and handling of personally identifiable information. For example, such obligations may arise pursuant to national or regional legal frameworks (e.g. the General Data Protection Regulation of the EU), institutional policies and guidelines (e.g. those addressing privacy, confidential information and/or ethics review by Institutional Review Boards), as well as contractual requirements imposed by donors and third parties (e.g. pursuant to funding, collaboration or data-sharing agreements).
Privacy by design and data minimization: ‘privacy by design’ refers to a methodology which places data protection at the heart of the design and building of systems and processes which involve collecting or processing personal data, whereas ‘data minimization’ refers to the practice of limiting the collection of personally identifiable information to that which is directly relevant and necessary to accomplish a specified purpose, and ensuring that such data is not retained longer than is necessary. In the context of research ethics and privacy protection, these concepts seek to maximize privacy and data protection to the extent necessary to achieve the legitimate scientific purpose underpinning the research project, and prioritize the safeguarding of research participants ahead of the public interest in promoting open data and scientific advancement.
Privacy protection and data security: a key principle underpinning the handling of PII is that data at rest, in transit and in use should be proactively managed so as to protect privacy and ensure its security. Privacy protection includes measures intended to reduce the risk of identification, such as anonymization, aggregation, pseudonymization and encryption. Data security includes organizational, technical and physical measures for the access, control and protection of personal data throughout the data lifecycle.
Data subject rights in the context of ethics and privacy protection refer to the rights of human participants in research projects (i.e. data subjects) to consent to the use of their PII for specified purposes and to retain rights related to subsequent use of the PII. These rights typically relate to access, processing, repurposing, rectification, erasure, portability, decision making, profiling, compensation and damages.
Background and Method
The CGIAR Platform for Big Data in Agriculture & Responsible Data Management
The CGIAR Platform for Big Data in Agriculture (the Platform) advocates open data for agricultural research for development. It considers that opening up research data for scrutiny and reuse confers significant benefits on society, including accelerated scientific advancement, economic growth, increased resource efficiency, strengthened public support for research funding and increased public trust in research. The Platform also considers that open data, together with the application of new technologies and analytical approaches, will catalyze a ‘big data revolution’ in agronomy and international development. It is helping to position CGIAR at the forefront of that revolution by supporting an open and interactive community among open-science researchers, open-source developers and the broader scientific community committed to opening up research data on a FAIR basis (i.e. in line with the FAIR Guiding Principles to make data findable, accessible, interoperable and reusable).
However, the Platform appreciates that not all research data can be open and that a broad range of legitimate circumstances may require data to be restricted, for example to address ethical considerations, confidentiality, privacy, proprietary or intellectual property rights, biodiversity-related access and benefit-sharing rights, security and the public interest, among others.
As an integral component of its advocacy for open data, the Platform promotes responsible data management throughout the entire research data lifecycle, from planning, collecting and storing through disclosing or publishing, transferring and discovery to archiving. This requires ongoing due diligence regarding legal, ethical and regulatory frameworks and disciplinary norms. Responsible data management need not be restrictive; in fact, anticipating issues that may arise in the data lifecycle allows data to be managed in ways that maximize trust and value, while minimizing risk.
Method
These guidelines were created from information collected from three sources: a review of best and emerging practices across various sectors in the fast-changing landscape of privacy and ethics (130 external resources); privacy and ethics materials sourced from seven CGIAR centers; and input and feedback gathered by circulating a first draft across CGIAR, which has been incorporated into this edition. This is an evolving document; the next stage is to consult externally for further input.
PII at the crossroads of open and responsible data
The use of personally identifiable information (PII) in a research project gives rise to potential tensions between the Platform’s dual commitment to ‘open’ and ‘responsible’ data. Such tensions arise because including PII in open data may conflict with the ethical responsibility to protect the privacy of a participant and counter the risk of harm. Faced with this dilemma the safest approach is to strip the data of PII, that is, to de-identify the data in order to anonymize it so that individuals are no longer identifiable. However, while anonymization maximizes privacy and minimizes risk to research participants, it can also compromise the analytic potential and scientific utility of the data, the research objectives or benefits that may accrue to the participant.
Resolving these tensions requires careful consideration. In collecting or using PII there is always a balance to be struck between utility and risk, as “data can be either useful or perfectly anonymous, but never both” (Paul Ohm, UCLA Law Review, 2010).
This delicate balancing act must take into account not only the potential risks and harms that could result from improper action, but also those that could arise from inaction – i.e. the consequences of ‘misuse’ as well as ‘missed use’.
Additional Resources
Ag Data Transparent
By American Farm Bureau Federation | 2014
A set of privacy and security principles for companies collecting, storing, analyzing and using farmer data. “Core Principles” serve as basic guidelines that ag tech providers should follow when collecting, using, storing and transferring farmers’ ag data. Includes Ag Data Transparent certification.
Anonymisation Decision-making Framework
By The UK Anonymisation Network (UKAN) | 2016
We present in full here for the first time the Anonymisation Decision-Making Framework, which can be applied, perhaps with minor modifications to the detail, to just about any data where confidentiality is an issue but sharing is valuable.
Anonymisation: managing data protection risk code of practice
By UK's Information Commissioner's Office (ICO) |
The code explains the issues surrounding the anonymisation of personal data, and the disclosure of data once it has been anonymised. It explains the relevant legal concepts and tests in the Data Protection Act 1998 (DPA). The code provides good practice advice that will be relevant to all organisations that need to convert personal data into a form in which individuals are no longer identifiable.
Anonymize Data Containing Personally Identifiable Information
By Paul Hendricks | 2016
Allows users to quickly and easily anonymize data containing Personally Identifiable Information (PII) through convenience functions. Online tool available on CRAN repository.
Building Ethics into Privacy Frameworks for Big Data and AI
By UN Global Pulse and the International Association of Privacy Professionals (IAPP) | 2018
The report provides an overview of how organizations can operationalize data ethics. It also discusses requirements and challenges with acquiring data access for scientific research, and how this could be done with due regard and protection of privacy.
Credits:
Author: Rodrigo Sara, 2019
Concept and web implementation: Stefanie Neno
Photos: 1 and 2 – Stephanie Malyon, 3 and 5 – Georgina Smith, 4 – Manon Koningstein
To reference these guidelines:
Guidelines for Managing PII in the Research Project Data Lifecycle (2018), by Rodrigo Sara on behalf of CGIAR Platform for Big Data in Agriculture.