In a recent perspective published in Nature Genetics, Amalio Telenti and Xiaoqian Jiang argue that data generation, infrastructure and management are the key pillars of a modern data ecosystem, and that such an ecosystem makes medical data a durable asset.
Researchers in academia and medical institutions use medical data to improve human health. For insurance companies, technology giants and countries, both medical and health data are considered valuable commodities. However, the accessibility and usability of medical data are progressing slowly. This affects population and clinical genomic research, which relies on the quality of data and access to a large number of individuals.
Combining genome analysis data with electronic medical record (EMR) data increases the value of the genomic data. However, for many well-supported initiatives, such as gnomAD, access to metadata is absent or limited. In many cases, the data itself also reflects only a snapshot of a person in time. Telenti and Jiang emphasise that these limitations persist even though medical institutions generate large amounts of qualitative and quantitative data.
The different cultures and technical worlds that use medical data often lack a unified nomenclature. In addition, understanding of how to use information effectively and responsibly remains incomplete. Telenti and Jiang believe that progress is unequal. On the one hand, they highlight an increase in regulations and requirements that aim to uphold the ethical use and protection of data, an increase in large-scale data that is highly digitised (e.g. medical images) and progress in artificial intelligence. On the other hand, bottlenecks still exist at very basic levels of data management, including storage and retrieval. There is also little awareness, and a lack of implementation, of emerging technologies for data protection and secure analytics.
Data management and infrastructure
EMRs are complex and contain unstructured, structured and processed data. However, they do not offer easy access to raw data. A classical EMR could contain 3.5 gigabytes of data, whereas deep-phenotyping medical exams may generate more than 120 gigabytes of data for a single individual. Telenti and Jiang believe that this diversity of records and the fragmentation of data across platforms pose challenges for downstream analysis. They suggest that modernising the very basic infrastructure for data acquisition, storage and management is a top priority. Specifically, they call for standards for data lakes, databases and application programming interfaces to support diverse users.
Large volumes of data are continuously being generated by imaging and genomics, and hospital IT systems appear unprepared for them. Adoption of cloud services in the medical field is still being questioned. Telenti and Jiang highlight that cloud service providers must sign a business-associate agreement and must comply with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). These services typically offer encryption of data in transit and at rest and require two-step authentication.
Access to data is generally not considered from an end-user or patient perspective. Telenti and Jiang argue that EMRs should be easily accessible to patients and should therefore be built using consumer-centric technologies. They believe that this approach could be extended to clinical or biomedical researchers, who could use search engines to rapidly query genome variants.
EMRs store a lot of information that can be turned into more valuable resources through processing and analysis. The increasing acquisition and storage of medical records in digital form has attracted considerable interest from machine learning researchers. For example, popular learning algorithms applied to EMRs include convolutional neural networks for image analysis. Telenti and Jiang suggest that new algorithms could improve classification accuracy (the ratio of the number of correct predictions to the total number of input samples). For example, when Google Translate changed from using statistical models to neural networks, the average accuracy of translation between English and other languages increased by 7%. However, Telenti and Jiang highlight that better algorithms may not compensate for the absence of large amounts of labelled, quality data, domain knowledge or solid data infrastructures.
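The definition of classification accuracy given above can be made concrete with a short calculation. The following is a minimal Python sketch, not code from the perspective; the function name and the five example labels are invented purely for illustration.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    # Count the positions where the predicted label equals the true label.
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for five medical images (made up for this example).
actual = ["benign", "malignant", "benign", "benign", "malignant"]
predicted = ["benign", "malignant", "malignant", "benign", "malignant"]

accuracy = classification_accuracy(predicted, actual)
print(accuracy)  # 4 correct out of 5 samples -> 0.8
```

Accuracy on its own can be misleading when one class is much rarer than the other, which is one reason why the quality and labelling of the underlying data matter as much as the algorithm.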
Data portability and ownership
Healthcare data is unique: it can be copied and disseminated quickly, and it is additive, non-depletive and replicable. These properties, however, create particular challenges in controlling data assets. Data portability is the user’s right to control the free movement of their data between alternative service providers and is a critical component of the European Union’s General Data Protection Regulation (GDPR). Data portability can be disrupted by the interests of data ‘controllers’, such as direct-to-consumer testing companies, by the complexity of IT infrastructure and by the lack of universal information exchange protocols. Telenti and Jiang believe that a major issue for data portability is the adoption of standards. Although they do not discuss in detail the liability associated with managing data or the potential risks of releasing private information, Telenti and Jiang suggest that an informed-consent mechanism or educational approach is important to inform the general population about these ethical issues.
The terms EMR and electronic health record (EHR) are often used interchangeably, yet not all health attributes are medical. The EHR is the broader concept, as it includes aspects such as activity, behavioural patterns and diet preferences. Health records are generated by individuals and belong to individuals as an asset, whereas medical records are shared between providers and patients to provide necessary care and support. Personal health data is of interest to researchers, insurance companies and pharmaceutical companies seeking a deeper understanding of diagnostics, disease development and potential treatment options. Estimates suggest that the healthcare data analytics market will grow to $47.7 billion by 2024.
Solving issues of privacy protection will be key for broad access to medical data. Telenti and Jiang discuss an emerging category of encryption frameworks called encrypted operations. These allow operations to be performed on encrypted data without exposing its content. Results are then returned in an encrypted format, ensuring zero information leakage during the entire life cycle of the data.
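To illustrate what it means to operate on encrypted data, the sketch below uses a toy version of the textbook Paillier cryptosystem, an additively homomorphic scheme. This is an assumption chosen for illustration, not a scheme named in the perspective, and the tiny primes, function names and patient counts are all invented; the parameters are far too small to be secure.

```python
import math
import secrets

# Toy key material: these primes are far too small for real use and are
# chosen only so that the numbers stay readable.
P, Q = 1009, 1013
N = P * Q                      # public modulus
N_SQ = N * N
G = N + 1                      # a standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAM, -1, N)           # modular inverse of LAM (Python 3.8+), private


def encrypt(m):
    """Encrypt an integer 0 <= m < N using the public values N and G."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1   # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % N_SQ


# Two hypothetical sites encrypt their patient counts; a third party adds
# the ciphertexts without ever seeing 120 or 80.
encrypted_total = add_encrypted(encrypt(120), encrypt(80))
print(decrypt(encrypted_total))  # 200
```

Only the holder of the private values can read the result, so the party performing the addition learns nothing about the individual counts. Practical deployments rely on vetted libraries and much larger keys.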
Homomorphic encryption is used when users want to access an analytic model built by others. Data is outsourced in encrypted form, so that a single entity can store it and run algorithms on it. Secure multiparty computation converts data into secrets that are distributed among multiple entities to support secure collaboration; it has been applied in pharmacological collaboration and genome-wide association analysis. The secure enclave model is highly flexible and can accommodate many tasks. However, it does not offer the same level of security protection as the other frameworks. These frameworks can be used in combination to provide better efficiency for certain tasks.
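The idea of converting data into secrets distributed among multiple entities can be sketched with additive secret sharing, one common building block of secure multiparty computation. The example below is a simplified illustration under stated assumptions: the modulus, function names and site counts are hypothetical, and real protocols add authentication and protection against dishonest parties.

```python
import secrets

# All arithmetic is done modulo a fixed public number (an arbitrary choice here).
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n_parties random-looking shares that sum to it."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Combine all shares to recover the shared value."""
    return sum(shares) % MODULUS


# Three hypothetical sites each hold a private count of variant carriers.
site_counts = [42, 17, 8]
n_parties = 3

# Each site splits its count and sends one share to each computing party.
all_shares = [make_shares(count, n_parties) for count in site_counts]

# Party j adds up the j-th share from every site; it never sees a raw count.
partial_sums = [
    sum(site_shares[j] for site_shares in all_shares) % MODULUS
    for j in range(n_parties)
]

# Only the combined partial sums reveal the joint total.
total = reconstruct(partial_sums)
print(total)  # 67
```

No single party can learn an individual site's count from the shares it holds, yet together the parties obtain the joint total.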
Bringing algorithms to data
Many hospitals are now moving their data to the cloud. As a result, third-party algorithms can be run on the data without moving it. For end-users, edge devices, such as smartphones, are becoming increasingly powerful. Many apps collect personal health information, such as exercise data, in order to analyse personal health. Telenti and Jiang believe that bringing algorithms to data has the dual benefit of distributing the computation and protecting privacy.
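The pattern of bringing algorithms to data can be sketched as follows: each data holder runs a supplied function locally and returns only an aggregate, so the raw records stay where they are. This is a hypothetical Python illustration; the class, the function names and the heart-rate values are invented and do not correspond to any real platform.

```python
class DataSite:
    """A hypothetical data holder, such as a hospital or a smartphone."""

    def __init__(self, name, heart_rates):
        self.name = name
        self._heart_rates = list(heart_rates)  # raw data stays inside the site

    def run(self, algorithm):
        """Run a visiting algorithm locally and return only its output."""
        return algorithm(self._heart_rates)


def local_summary(values):
    """The visiting algorithm: reduce raw values to a count and a total."""
    return {"count": len(values), "total": sum(values)}


sites = [
    DataSite("hospital_a", [62, 70, 75]),
    DataSite("phone_b", [58, 66]),
]

# The coordinator receives summaries only, never individual measurements.
summaries = [site.run(local_summary) for site in sites]
pooled_mean = sum(s["total"] for s in summaries) / sum(s["count"] for s in summaries)
print(round(pooled_mean, 1))  # 66.2
```

The computation is spread across the sites and the coordinator sees only aggregates. In practice the site would also need to vet which algorithms it is willing to run, since a summary over very few records can still reveal information about individuals.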
Another opportunity offered by portable medical and health data is the ability to create smart contracts with third parties. The emergence of blockchain technology provides personal vaults to store digital information, creating an ecosystem for information exchange and utilisation. Telenti and Jiang note that the combination of data and algorithms moving to edge devices has profound implications for the health industry because individuals will gain full control of their data. As a result, medical service providers may no longer act as central data custodians for personal medical information.
Improved data infrastructures have been created that support access to the heterogeneous content within clinical records. Telenti and Jiang believe that the next challenges lie in the implementation and acceptance of encryption technologies that allow for collaboration and research without compromising data security and privacy. The opportunities of medical data extend to the private domain, including apps, where there is potential to secure ownership of data. In summary, Telenti and Jiang emphasise that data is a durable asset whose value extends beyond the purpose for which it was originally collected.