Overview
This project at Crisis Text Line, a not-for-profit technology company, used a data-sharing agreement covering its text-based crisis intervention messaging service to promote secure and ethical data sharing with academic researchers.
Collaborators
Nitya Kanuri, BA
Bob Filbin, MA
Carlos Gallo, PhD
Madelyn Gould, MPH, PhD
Lisa Soleymani Lehmann, MD, PhD
Robert Levine, MD
John E. Marcotte, PhD
Brian Pascal, JD
David Rousseau, MPH
Shairi Turner, MD, MPH
Shirley Yen, PhD
Megan L. Ranney, MD, MPH
Status/Stage of Development
Planning
Funding Sources
Funding for this project came from the following sources:
The Robert Wood Johnson Foundation
The American Foundation for Suicide Prevention, Youth Who Text a Crisis Line: Understanding Needs and Help-seeking: Grant SRG-0-110-15
The National Institute of Mental Health: Grants K23MH101449 and K23MH098566
The National Institute on Drug Abuse: Grant DA027828
Practice Setting
Other: Not-for-profit Technology Company
National/Policy Context
- Many technology companies collect large amounts of health data from users. These collections of data have the potential to be useful for both science and society, but companies are often reluctant to share this data with academic researchers due to concerns about ethics and security.
- There is a lack of guidance for these companies on how to share data with academic researchers in an ethical and secure manner.
Local/Organizational Context
- Crisis Text Line (CTL) is a not-for-profit technology company that provides free, 24/7 text-based crisis intervention for individuals in the US. CTL addresses issues such as suicidality, bullying, self-harm, family conflict, and depression.
- CTL is the largest provider of text-based crisis intervention in the US, having engaged with more than 800,000 individuals in more than 1.7 million conversations. CTL tracks information on these conversations, including transcripts, metadata (timestamps and area codes), and post-conversation surveys.
- CTL sought to develop an effective way to share their data with researchers, both for scientific/societal benefits as well as to improve their own services.
- A pilot program was created to design and implement a framework of principles for effective, ethical, and secure user data sharing between CTL and academic researchers.
Patient Population Served and Payor Information
CTL serves any individual with text messaging capability in the US, regardless of insurance coverage, and does not charge individuals who use its services. For individuals with cell phone plans with AT&T, T-Mobile, Sprint, or Verizon, texting is free; those with other carriers may have to pay standard text messaging rates.
Project Research + Planning
- CTL developed a pilot program to create protocols and necessary infrastructure for secure and ethical data sharing. This process was completed through the following steps:
- 1. Creation of an ethics committee: The CTL advisory board used literature reviews and personal recommendations to create a 15-member ethics committee of academic and technology sector experts in data security, research ethics, mobile health interventions, and psychology.
- The group, spanning 13 institutions, was chaired by CTL’s chief data scientist and collaborated in person and via email, first to identify challenges in research ethics and then to form policy recommendations for potential solutions.
- Ultimately, these challenges included: (1) research ethics (respect for persons, beneficence, and justice), (2) user confidentiality and preventing possible reidentification/deanonymization of user data, (3) data security, and (4) business challenges (business reputation/perception and technology/administrative costs).
- 2. Creation of data-sharing principles and protocols: A set of key principles and protocols for data sharing was created for each of the identified challenges stated above.
- For instance, for the challenge of user confidentiality, a protocol was established not only to clear all data of personally identifiable information (e.g., names, addresses, emails, and social media handles) but also to transform or coarsen any data that could potentially compromise texter confidentiality (e.g., university name). The entire list of protocols is shown in Table 1 of the article by Pisani et al.
- 3. Development of infrastructure and program announcement: CTL hired an open data collaborations (ODC) manager to set up the needed infrastructure for these protocols. The ODC manager’s role was to work with potential research teams throughout the application and data use process (i.e., application review, negotiating data use agreements, onboarding to the data access environment). CTL announced the new data access program in February 2016 and began receiving and reviewing applications for data access from research teams.
- CTL received more than 100 applications over 5 quarters and ultimately accepted 20. Review initially took 9 months, but this was later reduced to 3 months.
- 4. Iterative refinement: The ODC manager, data ethics committee, and CTL board and staff iteratively refined the application and review process for data access applications from academic researchers.
- Applications were reviewed first by CTL staff for team competency, proposal feasibility and value, texter confidentiality, data security, and research ethics.
- The proposals that were advanced were then reviewed by the data ethics committee using a standardized scoring rubric. The technical infrastructure was also iteratively refined.
- 5. Launch of pilot program: Approved teams were onboarded to access a custom dataset via an Amazon Web Services (AWS) virtual data enclave, a large cloud platform for data storage and computing.
- The ODC manager provided ongoing technical support for research teams and reviewed data outputs and publications for potential security/confidentiality breaches.
- Evaluation of pilot program results: The program was evaluated based on the following criteria: value (to science and the organization), ethical principles and policies, ability to share data while maintaining data confidentiality, ability to provide secure access while maintaining control of data, and ability to support program with adequate financial, human, and infrastructure resources.
- Based on this evaluation, CTL found the pilot program to be promising, but limited specifically by a high financial and human resource burden, particularly given that CTL’s primary focus is not data sharing, but rather the crisis intervention services they provide.
- Exploration of alternative methods: The structure of the pilot program maximized openness to applications in order to encourage teams to apply. However, this structure was ultimately too costly in terms of financial, human resource, and infrastructure costs. In order to address this, CTL considered two alternative methods for allocation of resources. These models both have the advantage of requiring fewer resources, but both limit the scope of potential research.
- (1) Residential fellowship model: Researchers apply for a smaller number of residential fellowships, during which they have on-site data access. This would reduce the number of potential teams and impose a geographical barrier, but would also reduce the resources required.
- (2) Research partnerships: CTL fosters close ongoing collaboration with select research partners only. This model has the additional advantage of increasing CTL’s involvement in research projects.
- CTL also considered two alternative management methods to reduce resource requirements, neither of which was ultimately implemented: (1) use of a third-party vendor for data management, and (2) creation of a separate company focused on data sharing to decrease distraction and cost for the parent organization (CTL).
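The confidentiality protocol described in step 2 above (clearing personally identifiable information, then coarsening details that could indirectly identify a texter) could be sketched roughly as follows. This is a minimal illustration only, not CTL's actual pipeline: the regular expressions, the placeholder tokens, the `COARSEN` lookup, and the function name `scrub` are all hypothetical, and a real system would need far broader coverage (names, addresses, phone numbers) plus human review.

```python
import re

# Hypothetical patterns for two kinds of direct identifiers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HANDLE_RE = re.compile(r"(?<!\w)@\w{2,}")

# Hypothetical lookup of specific terms to coarsen into broad categories.
COARSEN = {
    "example state university": "[UNIVERSITY]",
}


def scrub(message: str) -> str:
    """Remove direct identifiers, then coarsen indirectly identifying terms."""
    # Emails are replaced first so their domain part is never read as a handle.
    text = EMAIL_RE.sub("[EMAIL]", message)
    text = HANDLE_RE.sub("[HANDLE]", text)
    for term, placeholder in COARSEN.items():
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return text
```

For example, `scrub("Email me at jo.doe@example.com or @jo_doe, I go to Example State University")` returns `"Email me at [EMAIL] or [HANDLE], I go to [UNIVERSITY]"`. Pattern-based scrubbing of this kind is only a first pass; the source describes additional review of data outputs and publications for potential confidentiality breaches.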
Tools or Products Developed
- Standardized scoring rubric: A standardized scoring system and associated guidance, which was iteratively refined, was created for the committee members to use when assessing research team applications for data access.
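A standardized rubric of this kind might be represented as in the sketch below. The criterion names are borrowed from the staff review step described earlier; the 1-to-5 scale, the equal weighting, the `ADVANCE_THRESHOLD` value, and the function names are illustrative assumptions, since the actual rubric is not reproduced in this summary.

```python
from statistics import mean

# Criteria taken from the staff review step; the scale and threshold
# below are assumptions for illustration, not CTL's actual rubric.
CRITERIA = (
    "team_competency",
    "feasibility_and_value",
    "texter_confidentiality",
    "data_security",
    "research_ethics",
)
ADVANCE_THRESHOLD = 3.5  # hypothetical cut-off on a 1-5 scale


def application_score(reviews: list[dict[str, int]]) -> float:
    """Average each reviewer's mean criterion score across all reviewers."""
    per_reviewer = []
    for review in reviews:
        missing = set(CRITERIA) - set(review)
        if missing:
            raise ValueError(f"review is missing criteria: {sorted(missing)}")
        per_reviewer.append(mean([review[c] for c in CRITERIA]))
    return mean(per_reviewer)


def advances(reviews: list[dict[str, int]]) -> bool:
    """Return True when the averaged score meets the assumed threshold."""
    return application_score(reviews) >= ADVANCE_THRESHOLD
```

Requiring every reviewer to score every criterion is one way to keep scores comparable across applications, which is the main purpose of a standardized rubric.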
Training
- CTL hired an open data collaborations (ODC) manager, who worked with potential research teams throughout the application and data use process as described above in “Research and Planning – Development of infrastructure and program announcement.”
Workflow Steps
- The model that CTL ultimately decided upon after performing several iterations of refinement and modification is as follows: (graphical representation in Figure 1 below)
- Organizations first must determine whether the data that they have collected is valuable to science and whether sharing this data with researchers will further the organization’s goals.
- The organization must then answer the following 4 questions:
- Do we have access to research and ethics expertise to review policies, protocols, and proposals?
- Are our data of such a type that they can be deidentified effectively and shared with researchers in a manner that protects user confidentiality?
- Can we offer secure portals for accessing data?
- Do we have the financial resources, the human capital, and the physical and digital infrastructure to support ethical sharing of data without undermining other organizational priorities?
- Organizations that answer all four of these questions affirmatively are more likely to be successful in developing an appropriate data-sharing program.
- The organization must then decide on an appropriate model for data sharing. CTL outlines 3 potential models for data sharing programs: (1) open sharing program, (2) residential researcher, and (3) selective research partnership. As described above (see “Research and Planning – Exploration of alternative methods”), the open sharing program maximizes openness to applications and supports a broad research scope, but it also carries the highest financial, human resource, and infrastructure costs. The latter two models require fewer resources but support a narrower research scope.
- The residential researcher model allows researchers to apply for a smaller number of residential fellowships, during which they have on-site data access. The research partnerships model fosters close ongoing collaboration with select research partners only.
- The organization must then develop protocols for ethical and secure data sharing, using the guidance of research and ethics experts to develop policies and protocols and review research applications. The protocols developed and used by CTL are listed in Table 1 of the article by Pisani et al.
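The four readiness questions in the workflow above can be expressed as a simple checklist. The sketch below is only an illustration of that decision step; the class name `Readiness` and its field names are hypothetical labels for the four questions, not terminology from the source.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    """One yes/no answer per readiness question, in the order listed above."""
    has_ethics_expertise: bool       # research and ethics expertise available?
    data_can_be_deidentified: bool   # data shareable while protecting users?
    can_offer_secure_access: bool    # secure portals for accessing data?
    has_sufficient_resources: bool   # funds, staff, and infrastructure?

    def unmet(self) -> list[str]:
        """Names of the questions answered 'no'."""
        return [name for name, ok in vars(self).items() if not ok]

    def ready(self) -> bool:
        """True only when all four questions are answered affirmatively."""
        return not self.unmet()
```

For instance, `Readiness(True, True, False, True).unmet()` returns `["can_offer_secure_access"]`, pointing to the gap an organization would need to close before choosing among the three sharing models.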
Outcomes
- The program was evaluated based on the following criteria: value (to science and the organization), ethical principles and policies, ability to share data while maintaining data confidentiality, ability to provide secure access while maintaining control of data, and ability to support program with adequate financial, human, and infrastructure resources.
Benefits
- CTL developed a framework for ethical and secure data sharing between technology companies and academic researchers, providing a set of principles and protocols that can help guide other technology companies in effective data sharing.
- In addition, by addressing the challenges they faced in developing this framework, CTL explored a wide range of options for models and structures that may be used for data sharing. This guidance will assist technology companies in sharing user data with researchers and significantly expand the amount of health data available for research.
- Analysis of collected user data from technology companies not only has scientific and societal benefits, but can also help the company improve the services it provides.
Intervention-Specific Challenges
- The cost of maintenance was underestimated: it was as much as, if not more than, the initial development/start-up cost. This included data hosting costs for each team (~$500/month). Furthermore, this cost is expected to increase because further refinement of the infrastructure will still be needed.
- Human resources for program management and team support were underestimated, as teams needed custom datasets, ongoing support for analysis and troubleshooting, and review of research outputs for potential security/confidentiality breaches.
- These challenges were addressed by exploration and introduction of alternative methods for resource allocation, as detailed above in “Research and Planning – Exploration of alternative methods.”
- CTL also noted two additional limitations. First, for organizations that collect data passively (e.g., GPS information) instead of actively (e.g., a user actively shares information about a mental health crisis), there are additional complexities that must be considered in maintaining user privacy and obtaining informed consent. Second, organizations that have larger and more complex datasets must be aware that research teams will have high resource requirements to handle this data, and data must be presented to research teams in a format that is as accessible as possible.
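To give a sense of scale for the hosting figure above, the following back-of-the-envelope sketch multiplies the approximate per-team cost by a number of concurrently hosted teams. The ~$500/month figure comes from the text; how many teams are hosted at once and for how long are assumptions supplied by the caller, and staff time (the other underestimated cost) is not included.

```python
HOSTING_PER_TEAM_PER_MONTH = 500  # approximate USD figure reported above


def hosting_cost(active_teams: int, months: int) -> int:
    """Estimated data hosting cost in USD for teams hosted concurrently."""
    if active_teams < 0 or months < 0:
        raise ValueError("active_teams and months must be non-negative")
    return HOSTING_PER_TEAM_PER_MONTH * active_teams * months
```

Under the assumption that all 20 accepted teams were hosted at the same time for a full year, `hosting_cost(20, 12)` gives 120000, i.e. roughly $120,000 in hosting alone.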
Glossary
- Amazon Web Services (AWS): “[T]he world’s most comprehensive and broadly adopted cloud platform, offering over 165 fully featured services from data centers globally.”
- Virtual Private Network (VPN): “[A]n encrypted connection over the Internet from a device to a network. The encrypted connection helps ensure that sensitive data is safely transmitted. It prevents unauthorized people from eavesdropping on the traffic and allows the user to conduct work remotely. VPN technology is widely used in corporate environments.”
- Open Data Collaborations (ODC) manager: The ODC manager was hired to set up the infrastructure needed to fulfill the protocols developed for ethical and secure data sharing. The ODC manager also worked with potential research teams throughout the application and data use process, including application review, negotiating data use agreements, onboarding to the data access environment, providing ongoing technical support during data access, and reviewing data outputs and publications for potential security/confidentiality breaches.
Sources
- Pisani AR, Kanuri N, Filbin B, Gallo C, Gould M, Lehmann LS, Levine R, Marcotte JE, Pascal B, Rousseau D, Turner S, Yen S, Ranney ML. Protecting User Privacy and Rights in Academic Data-Sharing Partnerships: Principles From a Pilot Program at Crisis Text Line. J Med Internet Res. 2019 Jan 17;21(1):e11507. doi: 10.2196/11507. PubMed PMID: 30664452; PubMed Central PMCID: PMC6354196.
- Crisis Text Line. (n.d.). FAQ. Retrieved May 24, 2019, from https://www.crisistextline.org/faq
- Amazon Web Services, Inc. (2019). What is AWS. Retrieved May 24, 2019, from https://aws.amazon.com/what-is-aws/
- Cisco. (n.d.). What Is a VPN? – Virtual Private Network. Retrieved May 24, 2019, from https://www.cisco.com/c/en/us/products/security/vpn-endpoint-security-clients/what-is-vpn.html