The next-generation EHR is here, and it comes with many new features
One of the most promising developments in digital health, and in the application of artificial intelligence techniques in medicine over the past decade, has been the systematic establishment of a global conceptual and methodological platform for the so-called Big data applications in healthcare and medical research and, most recently, Long data (short for longitudinal) applications.
This article presents an analysis showing that this platform and its component contributions have in fact, for quite a while, been serving the next generation of Electronic Health Records (EHR), which were originally conceived to serve the needs of applications such as those found in the Big data paradigm. This new generation, which is already underway, is thus a result of developments in this area, such as outcomes-based research, longitudinal data, blockchain technologies and new data sharing governance models.
Big data applications are picking up the pace in digital health, and with the blockchain-for-healthcare market forecast to gain considerable growth over the next few years, this is an opportune time and framework to reassert a key requirement for interoperable EHRs and for Big data quality and reliability: that of maintaining persistence of the clinical context and the structural semantics within peers as much as from peer to peer, toward an EHR paradigm as a global distributed and semantically federated data source (DIFEDS), or data sharing ecosystem.
Data storage has changed fundamentally over the past decade, and so have business and societal models around data. Access to global data can help cure cancer, stop pandemics, and deliver vaccines and cures in a fraction of the time. As systems scale up to distributed, semantically federated health data platforms, risks and threats follow: information leaks and data breaches, but mostly an inevitable compromise in data reliability and equitable access. Data governance is expected to curb these issues, and with the capacity to manage global data and global health events now well understood and studied, delivering global data semantics is now an imperative. We have to act now to exploit this momentum for change and to enable seamless connectivity and data sharing, and this cannot be done without addressing the issue of federated data quality and reliability, or without context standardisation and content regulation for coherent clinical data. This is now a strategic priority for the next generation of EHRs and for digital health innovation platforms such as Big data.
Clinical context for data semantics revisited
Throughout the ongoing pandemic, artificial intelligence in medicine, a pioneer in basic and applied research in the field, has once again been under the spotlight, primarily on account of opportunities and challenges in a very broad application domain, that of Big data.
Big data applications, operating at scales such as the daily volume of tweets, combine a suite of multidisciplinary techniques that originate in computational science, biomedical informatics, data science and cognitive science, including data mining, machine learning, and natural language processing, the latter of which has evolved into very powerful conversational AI applications, or chatbots. One such application, used in national large-scale deployments in the context of the pandemic, is the IBM Watson platform (1, 2).
Big data is growing fast.
Developments in the domain over the past decade, coupled with the urgency for actionable results in clinical research related to SARS-CoV-2 and COVID-19, have led to the establishment of a new paradigm in medical data use, one which, rather than being intimidated by the rapid rate of patient data proliferation, leverages it and harnesses modern computational power to mine knowledge and develop advanced health intelligence and research assets.
Regardless of scale, spanning populations or cohorts, such health intelligence and clinical research applications draw their data from care records and fragments of records that document outcomes and outcome-related data, in order to test and support various clinical reasoning or research hypotheses with real-life evidence, thereby advancing new paradigms in medical research (3, 4), healthcare (5) and digital health. On a first read, this is the purpose of any AI-powered digital health application. There is, however, a fundamental difference between the new and the old. Big data are here to rediscover clinical context and semantics and to reinvent the way in which these can be built into globally distributed real-world data, during searches, using AI-powered techniques, or by distributing federated semantics as part of a global EHR, as proposed in this article. In each case, new data governance paradigms, models, policies and regulation are needed and are already underway.
Same old Artificial Intelligence.
Older applications of AI are now almost ubiquitous in modern healthcare systems and work with limited-scope data sets, such as those coming from Electronic Medical Records (EMR), to assist with routine healthcare provider tasks such as screening and classification for triage (6) or risk classification (7), or to assist with flagging cases of abnormal radiology that merit further investigation (8, 9, 10). Such tools have been around for a while and are successful mostly because they face clinicians in narrow speciality domains, instead of facing patients (in the latter case as automatons that would replace clinicians), and because the clinicians they face have been convinced that they ease the burden of simple routine tasks.
Utility, safety, trust, acceptance.
Proving utility to clinicians and health systems, and gaining their trust and acceptance, is a lengthy process, much as in the case of pharmaceutical regulation in industry. However, when cognitive tools are not viewed as automatons or robots, one can demonstrate utility and safety relatively easily, by adjusting hypothesis testing sensitivity values subjectively rather than objectively, or based on policy.
For instance, with case flagging tools, hypothesis testing sensitivity might be adjusted to flag only those cases which merit further investigation based on a set of criteria, while accepting the risk that some might go undetected. A thoroughly reported example is the use of polymerase chain reaction tests to detect SARS-CoV-2 infections based on variable criteria for inclusion and exclusion (11). What is expectedly characteristic in the evaluation of AI tools, of their knowledge bases, and of the reasoning results in such cases, is an observed exacerbation of existing biases (6). This is expected in the sense that machines do as they are instructed, unless they are instructed to learn.
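To make this trade-off concrete, a minimal sketch follows (in Python, with hypothetical risk scores, labels and threshold values, not taken from any real screening tool) of how a flagging threshold might be tuned toward sensitivity at the cost of flagging more cases.

```python
# Minimal sketch: tuning a case-flagging threshold toward sensitivity.
# Scores, labels and thresholds are hypothetical illustrations only.

def flag_cases(risk_scores, threshold):
    """Flag every case whose model risk score meets or exceeds the threshold."""
    return [score >= threshold for score in risk_scores]

def sensitivity(flags, truth):
    """Fraction of truly positive cases that were flagged."""
    flagged_positives = [f for f, t in zip(flags, truth) if t]
    return sum(flagged_positives) / max(1, sum(truth))

# Hypothetical model outputs and ground truth for ten cases.
scores = [0.91, 0.15, 0.62, 0.08, 0.77, 0.33, 0.85, 0.49, 0.22, 0.68]
truth  = [1,    0,    1,    0,    1,    0,    1,    0,    0,    1]

for threshold in (0.8, 0.6, 0.4):   # lowering the threshold raises sensitivity
    flags = flag_cases(scores, threshold)
    print(threshold, sensitivity(flags, truth), sum(flags), "cases flagged")
```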
Longitudinality found in larger and deeper pools of data.
Big data AI applications are no exception to this rule, as they embody heuristic bias in their designs and thus face the same significant challenges and limitations toward what is clearly a distant future of successful integration into clinical practice as health system automatons. To illustrate this, one may consider the many chatbot applications or algorithms which assist their users with the analysis of their symptomatology for a suspected disease.
More specifically, in order to engage people in health awareness campaigns around COVID-19, as well as, in some cases, in screening and testing campaigns, many chatbots were deployed as part of outbreak management programmes to assist with symptom checking and diagnosis. What has been observed is that these chatbots have been providing advice with widely divergent healthcare consequences, ranging from confirmation of a disease diagnosis to complete dismissal, even when presented with the same symptoms by the same person (12). Bias, in these cases, may lead to important unintended and adverse consequences for the health system and for health outcomes, even within the bounds of the specific function the applications were designed to support (6, 12). The risks associated with influencing decisions, directly or indirectly, skyrocket when applications face patients instead of physicians, in which case the influence is also less evident and the risk harder to mitigate.
So if the set of ethical and safety concerns is the same for traditional and for Big data decision-impacting technology, why the hype? There are two reasons why Big data AI applications have been comparatively successful and intensely researched (13), and these have nothing to do with healthcare support value. One reason is that they have been very efficient in supporting research. The other is that, in doing so, they tap into larger and deeper sources of data. Sources which end up being richer than the organised and, at least in theory, readily accessible EMRs. Sources which may support the continuity context.
That makes them very attractive, primarily to the research community.
And if this holds true, it is clearly a very important observation to factor in when mapping out the future of digital health and healthcare innovation.
The argument is that, by tapping into diverse and expanded sources, Big data apps are attempting to solve the problem of continuity in medical or health data and records, and furthermore to conclude data journeys with an outcome. This is very important: even when EMRs and EHRs are designed with the best intentions, outcomes cannot be documented within their true clinical context unless longitudinality is integrated into designs as a quality of data collection.
More data doesn't mean better data.
Yet, despite the fundamental conceptual and methodological differences, there are also common pitfalls in the validity of the data which fuel both development and performance.
In many different ways, the Big data paradigm implies an attempt to circumvent or work around the complexities involved in data and records interoperability, which has been, and continues to be, a major stumbling block for sustainable progress in digital health and healthcare innovation.
The idea is that Big data applications can avoid being concerned with records design and the complex and dirty details of semantics therein, and instead focus on using computational ability and performance to mine purpose-specific semantics. Nonetheless, the mining is done by AI-driven tools and algorithms and these are prone to the same flaws.
Software as a Medical Device (SaMD), Big data or not, fails to be admitted into health system routines not only as a matter of design validation, which is in turn a matter of flawed knowledge or reasoning models, but also as a matter of subsequent failure during evaluation or clinical trials, which is in part due to the use or consumption of data within flawed models. Eventually, SaMD fails to inform decisions at an acceptable level of safety, efficacy and effectiveness, at least one which is on a par with the level of achievement expected from other medical devices.
On the side of the reasoning engine, there might be flaws in the heuristic knowledge and models built into AI systems, such as evoking strengths with gold standard diagnoses (14), or due to inadequate or poorly construed datasets used for training and testing in machine learning (15).
On the side of data, there is the matter of the validation of the information models and of the data semantics built into models, failure of which to reflect clinical coherence leads to cognitive or reasoning engines that produce poor interpretative results; be that with regard to data used for training and evaluation or data used during reasoning in real-life. In both cases, the rule of Garbage In Garbage Out applies.
As long as Big data apps feed on poor-quality data, reliable performance shall never be attained, as computers are good at using context for data interpretation, but cannot distinguish one interpretative context from another.
For instance, an AI cannot tell if an episode of severe postpartum haemorrhage was part of a placenta accreta, increta or percreta episode, unless this information is provided as context for the haemorrhage in the records, something which is not always the case, no matter how simple it sounds.
And this is not about simple data semantics standardisation, such as is required for data-level interoperability - for instance in the case of International Classification of Diseases standard coding - but about the higher level semantics that are built, standardised and deployed to reflect patient-centred care, longitudinality and integration, and eventually clinical coherence through persistent clinical context implementations. These data object relationships have to be stored away with the attributes and parameters of care, to ensure context is maintained and tools can carry out useful tasks with accuracy and safety.
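As a minimal illustration of such persistent relationships (the structure below is hypothetical, not drawn from any particular standard, and the ICD-10 codes are indicative only), the haemorrhage example above might be stored as:

```python
# Minimal sketch of a clinical data object carrying its own persistent context.
# Field names and identifiers are hypothetical; codes are indicative only.

postpartum_haemorrhage = {
    "type": "Condition",
    "code": {"system": "ICD-10", "value": "O72.1"},      # postpartum haemorrhage (indicative)
    "subject": "patient/123",
    "context": {
        "episode": "episode/delivery-2021-04",           # the delivery episode it belongs to
        "antecedent_condition": {"system": "ICD-10", "value": "O43.2",
                                 "label": "placenta accreta"},
        "outcome": "resolved",
    },
}

# Without the "context" block, a downstream tool sees only an isolated haemorrhage
# code and cannot tell whether it was part of a placenta accreta episode.
```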
Researchers also warn there are potential methodological pitfalls associated with machine learning and other AI approaches to prediction and interpretation, especially those introduced by the overfitting of predictive algorithms, and recommend large amounts of highly contextual data in order to expose and mitigate those risks (16).
Finally, there are the matters of ethics and of ensuring an ethical and competent integration of these tools within routine clinical practice, including the confidentiality of inferred data and consent in use and sharing (19).
Before delving further into matters of enablement for sharing, let us consider for a moment Big data uses, in order to establish a better understanding of the range of data needed and of the importance of clinical context persistence in enabling continuity as a context.
Context is important.
The importance of clinical context is well demonstrated in systems medicine, for example by the role it plays in comparative clinical pharmacogenomics studies, which show that phenotype multiplexing approaches identify greater differences in drug metabolism capacity than predicted by genotyping alone, with important implications for precision and systems medicine (17, 18).
This observation has been driving one of the better known and promising domains for Big data with AI impact, that of biomedical and clinical pharmacology research, which is geared toward systems medicine and precision or personalised medicine applications deployment. In particular, clinical pharmacogenomics and pharmacogenetics research, exploiting phenotyping, genotyping, and biomarkers development for targeted, precision treatment in, among other applications, cancer, diabetes and biological psychiatry.
Big data is also a powerful tool with which to replace the current approach to clinical trials as well as to investigate healthcare costs in relation to utilization and therapeutic options related to outcomes, among other analyses of clinical economics.
In the area of precision medicine, consider, for example, the enhanced inference capacity delivered by combining the study of genetic variations occurring in the liver enzymes required for drug metabolism (genotyping) with the study of enzyme-related pharmacokinetic variation from person to person, resulting from each individual’s ability to metabolise specific drugs (phenotyping) as a function of the former (the so-called heterogeneity of treatment response). Such inference can be used to predict tailored treatment responses, to target therapy or to determine recommended drug dosages across populations or cohorts, as carried out both by physicians and by pharmaceutical companies.
In this area, recent evaluations have indicated that machine learning can be productively integrated with biomarker-driven, in this case pharmacogenomics-driven, prediction of response to treatment and treatment outcomes (16, 20). In both of these studies, the results reported productivity improvements for clinicians in terms of narrowing down the multitude of factors and measures to those most prognostic of treatment outcomes in a given patient. This is all possible due to context-specific data interpretation and to context being reliably present in data sources - phenotyping data as opposed to genotyping data.
There are big improvements directly leveraged by the industry too. In pharmaceutical and vaccine development, Big data aims to circumvent the need for enrolling participants in randomised controlled trials, or to support pharmacovigilance studies with reliable real-world data in order to improve, for example, cancer treatment (21). Big data applications have also been leveraging AI to accelerate vaccine research (22, 23, 24), or to read the amino acid sequences forming the vast number of human proteins and protein complexes associated with disease, thereby accelerating the determination of protein shapes and revolutionising relevant medical research (25). And then there is the kind of digital phenotyping application which is a touch closer to the futuristic idea of Big data, such as the use of Facebook data to mine emotions in order to predict postpartum depression (26).
The bottom line is that, in addition to enabling new research paradigms and challenging established epistemologies across the sciences (27), Big data enable safer and better AI, by providing large data sets for training algorithms and exposing design flaws (SaMD clinical trials), hence improving overall performance and productivity. And together with delivering measurable improvements in the routine processes they inform and support, Big data applications of all sorts (28) have over the past decade been on a steep rise (13, 29, 30, 31).
There are also indirect benefits coming from Big data analytics and AI applications, which materialise as a result of the effected paradigm shift, and which will be enabling and driving unprecedented digital health innovation for years to come.
On the one side Big data push for richer and bigger data sets, which in turn shall drive the development and adoption of data sharing and governance regulation, models and policies.
On the other side and in the background, as decision and research utility increases, reliability and performance are exposed to more stringent assessment criteria, among which seems to be, once again, the presence of adequate a priori data context and interpretation semantics within data sources, such as patient, treatment planning and outcome contexts.
In the long run data quality and reliability shall improve and this will benefit the entire health sector, including digital health, research, public health and health systems innovation.
A-priori versus a-posteriori semantics enforcement
Across the sciences, Big data came about together with the rapid millennial increase in processor speeds, disk storage space and internet connectivity. With Web search engines they have common roots and goals (32), but the two are also closely interrelated. This is because, at the end of the day, Big data will be searched (33), Big data will be generated by Web search engines and user searches (34), and Big data will be generated by search engines as business intelligence, and vice versa (35, 36). Web search engines also use natural language processing, content processing techniques such as pattern matching and statistical models, semantics, and other AI-driven techniques to process user searches, to return results and relationships from an increasingly machine-readable semantic Web (32, 37), and to yield information and knowledge (38). Particularly advanced AI techniques, for instance deep learning, are also being researched for Web browser applications (39).
Although this comparison is beyond the scope of this article, it nonetheless points out that the Big data paradigm reflects our connected-world technology paradigm, in the sense that it is an innovation which leverages volumes of so-called real-world data (RWD), as opposed to the lean and restricted data managed by institutions such as banks and hospitals.
And given it relies on RWD, the Big data paradigm furthermore reflects a shift from using the hardcoded and disposable data relationships and interpretative contexts, or semantics, of engineered realities such as relational databases, to reusing interoperable data object relationships and semantics in order to create or reconfigure relationships and discover semantics while navigating the real world.
This shift is essentially one which was prompted by the need to move onto information and knowledge discovery on the basis of semi-structured and unstructured data. Structured data have traditionally resided in relational databases designed with little, if any, reusability in mind. Semi-structured data are electronic documents (XML and JSON), which are abundant in medicine primarily because of HL7. Unstructured data represent around 80% of all data and include text, images, audio and Web pages. This latter, by comparison massively extended, data set often includes public sources such as social media platforms and the data generated during posts and mined for health purposes, including content explicitly shared by users (26).
It is this latter source of data, together with semantic mining, which is mostly associated with the notion of RWD. And RWD in this context seems to be an important component definition of Big data. But it is also one which is far from being standardised and thus subject to considerable variability in interpretation (40). This means it is this type of data which is most in need of semantic mining (a posteriori) or semantic anchoring (a priori) to perform in accordance with expectations, and hence Big data have a long way to go before being established as a prevailing platform paradigm for digital health.
As far as context and semantics are concerned, after its original use, a single stored-away COVID-19 lung CT image tells very little in comparison with one captured and stored together with a subsequent outcome, a diagnosis, treatment, comorbidities and previous lung injuries, as well as other clinical information acquired as evidence and context. Of course, if at all reliable, this relationship can be mined or reconstructed during use.
Until now, the experimental mining model for semantics enforcement, using AI-driven search and association between RWD, has prevailed in Big data exploration.
The anchoring model is the future.
To illustrate the need for semantics enforcement, consider the definitions that have been developed. Some researchers define RWD on the basis of how data are curated before being presented to health-supporting applications for interpretation, analysis and use. Within this wider context of study-related data (41), some researchers adopt a definition which originates in clinical research, specifically to signify data which is collected in a non-randomised controlled trial setting (40). This particular definition is in turn subject to the definition of expectations from curation, or may signify a complete departure from traditional study-specific curations. Furthermore, curation may refer to a particular method of organisation or data structure, to the way in which the data are obtained, or to the type of patient the data belong to, as, for example, in the case of context-framing a specific retrospective or prospective study.
By adopting a wider view of curation, the above definition escapes the narrow confines of medical research and ventures into the realm of data interoperability and informing real-time clinical decisions.
This definition, however, constitutes a contradiction in terms. Big data are not supposed to require the complex data curation process that characteristically burdens institutions when performing studies for public health and health policy and management. The value rendered by this curation ranges from retrospective annotation, to prospective annotation and ground truths (41a).
A challenging definition.
Another definition, which relates to the definition of Big data as a paradigm, is that of data which is collected specifically for the purpose of measuring and observing outcomes (42). And this is the most challenging definition. According to it, when information on patients, such as symptoms, pathology results, radiology, clinical notes, electronic health records, medical claims or billing activities databases, registries, patient-generated data, mobile devices and other relevant information, is linked to an outcome, then associations and relationships can be drawn from these data.
This is the essence of cost-benefit analysis which is hard to come by in today's data environment, due to complex curation requirements and incomplete data sets. The cost may be a service outcome, such as problem resolution by surgery, a diagnostic test, a treatment, a preventative measure, and so on. Big data claims to do this efficiently and reliably by means of a posteriori curation and in real-time. The latter ability is currently more of a target than a reality, as real-time semantic discovery will nonetheless require a priori curation or semantic data anchoring to ensure accuracy and reliability (see example above).
And since the study of outcomes by definition requires data structure, curation, standardisation, or information modelling, and semantics, this definition helps to reduce some of the confusion around the role of structured data in RWD.
Structured data are fundamental.
Without some structure, ontology or semantics, which may either be curated on an ad hoc basis or made interoperable, i.e. generally available for any (re)use, it is impossible to use any data related to outcome, or any data relationship for that matter. This doesn't mean that institutional, provider data-based applications, or Big data applications cannot redefine existing semantics or relationships and thus be used to mine new semantics which are not built into the original data source.
This principle is demonstrated in data exchanges which take place between different ontologies and databases, for example with the use of the Resource Description Framework (RDF) or the HL7 Fast Healthcare Interoperability Resources (FHIR), both used for semantic annotation or re-annotation (43, 44, 45). An example is the use of RDF to map the ICD-10 ontology and semantics to ICD-9, including a vocabulary for local options and variants (46).
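A minimal sketch of such a mapping, expressed as RDF triples with the rdflib library and SKOS mapping properties, follows; the namespace URIs, the specific code pairing and the local variant are hypothetical illustrations rather than a published terminology mapping.

```python
# Minimal sketch: expressing an ICD-10 to ICD-9 mapping as RDF triples.
# Namespaces, code pairing and local variant are hypothetical illustrations.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

ICD10 = Namespace("http://example.org/icd10/")        # placeholder ontology namespaces
ICD9  = Namespace("http://example.org/icd9/")
LOCAL = Namespace("http://example.org/local-variants/")

g = Graph()
g.bind("skos", SKOS)

# Assert that an ICD-10 concept closely matches an ICD-9 concept.
g.add((ICD10["O72.1"], SKOS.closeMatch, ICD9["666.1"]))
# Record a locally defined variant alongside the standard mapping.
g.add((ICD10["O72.1"], SKOS.relatedMatch, LOCAL["severe-postpartum-haemorrhage"]))

print(g.serialize(format="turtle"))
```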
As discussed below, another possibility is to use interoperable semantics; that is semantics which may be reused in any context in an object-oriented design model, and which may be used to mine or rediscover new semantics extremely efficiently.
The idea is that interoperable semantics give rise to data objects which are always shared with their persistent, standardised context data for interpretation, thus assisting in enforcing clinical and logical coherence in reannotation and addressing precision and reliability in semantics reconstruction.
The simplest example is the ICD standardised coding used to annotate data with persistent semantics. An interoperable semantics system which is part of a DIFEDS, such as the internet-accessed Big data, would include, for a disease classification object constituting part of a larger interoperable-semantics clinical data model (CDM), the object's own semantics definition or annotation details, including the code system used and a mapping resource identifier. In this direction, Schema.org (37) created for COVID-19 a special announcement mark-up which includes some data reported by hospitals to the US CDC (47).
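As a minimal sketch (hypothetical field names, versions and identifiers, not a published model), such a disease classification object carrying its own annotation details might look like this:

```python
# Minimal sketch of a disease classification object inside an interoperable-semantics CDM.
# Field names, code system versions and resource identifiers are hypothetical.

disease_classification = {
    "object_type": "DiseaseClassification",
    "semantics": {                                   # the object's own semantics definition
        "code_system": "ICD-10",
        "code_system_version": "2019",
        "mapping_resource": "urn:example:icd10-to-icd9-map",  # where peers resolve mappings
    },
    "code": "U07.1",                                 # COVID-19, virus identified (indicative)
    "display": "COVID-19",
}

# Because the annotation details travel with the object, a receiving peer in the
# federation can interpret or re-annotate the code without out-of-band agreements.
```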
Back to the definitions.
A definition for RWD which combines the purpose of outcome-related research with clinical innovation to scale is: data derived from a number of sources that are associated with outcomes in a heterogeneous patient population in real-world settings. Same as in the case of the precision medicine illustrations above, the innovation comes from the idea of real-world evidence (RWE), which is evidence derived from RWD, denoting observational data obtained outside experimental designs and thus generated during routine clinical practice (42).
Outcomes, scale and routine clinical practice.
When combined, the above definitions of RWD imply the availability of real-time data on healthcare outcomes within a heterogeneous population; data which may also be readily used to conduct and test study experiments prospectively or retrospectively once data safeguards and standards have proliferated into global RWD.
The keynote in defining RWD is thus the elimination of the tedious data doctoring or curation process typically associated with experiments and studies rather than real-world settings. This means semantics must be either mined very efficiently and in near real-time using AI, or built into data.
Building semantics into longitudinal data is the EHR, and that has been failing.
This is essentially where Big data applications come in handy. To bridge the gap between reliable and accurate real-time RWD with, and without, observation-specific data curation, Big data applications employ AI-driven techniques and computational power in order to approach the conditions of real-time intelligence generation and the synthesis of outcome-related data.
As discussed above, to varying degrees these techniques involve mining or enforcing semantics. However, with the exponentially growing volume of Big data, efficiency based on AI has its limits. Furthermore, there is the matter of AI being unable to distinguish between good- and bad-quality data.
This article proposes a solution to this predicament which combines semantic structure for RWD anchoring with a-posteriori semantics enforcement for the problem-specific re-orientation of data. With this approach, data misinterpretation risks are mitigated by means of strong context or semantic expressions which are interoperable and hence transferable as reference information together with content.
The higher the underlying availability of interoperable semantic ontologies, models and data elements, the safer and more accurate this approach becomes.
The capabilities unleashed by an approach which combines reusable and interoperable Big data implementing a DIFEDS - on the basis of which other data can be either semantically anchored or more efficiently mined - with problem-oriented searches, AI-driven where necessary, are vast.
The EHR rediscovered.
If one looks closer into the Big data paradigm, it is evident we are observing a time-lapse of a continuous design, innovation and improvement cycle, involving diversion, conversion, discovery and definition, development and delivery (48). For instance, the Semantic Web and RDF-supported Web data interoperability certainly point in this direction. Global efforts to develop EHRs led this design diversion and discovery, including matters of longitudinality and clinical context persistence to support coherence, continuity-of-care and care integration. While attempting to deliver those, the world focused on advancing semantic modelling techniques and data governance. And with exposure to scale due to Big data, these concepts and methodologies have developed and matured to the point that the EHR can be revisited with enough experience, and the next working generation produced to connect global health data into a patient-centric, semantic superset.
If the aim is to tap into a consortium of fully digitised and structured, context-specific data, which holds together, within a clinically coherent space, other richer and less digitised data of all sorts - the highest level context being the patient - then clearly there is no difference between a search for RWD and access to a global array of successful implementations and deployments of the EHR.
Perhaps then this is the most opportune time for an EHR paradigm shift, to depart from the unsuccessful model of local record keeping, or at best national, in order to envisage and deliver a successful model which combines learnings from various Big data components toward a global scale DIFEDS, as the definition of the next-generation EHR.
A round trip from the EHR to big RWD and back
In concept, EHRs are interoperable data collection tools that incorporate both longitudinal data integration - which in turn explicitly includes various outcomes - and the integration of data from multiple providers. Longitudinality builds patient clinical context, and integration occurs as a result of shared context instances across collaborating providers. For example, a problem managed with the oversight of a GP may be concurrently managed by any number of specialists, to address for instance hypertension and glaucoma in diabetes. Shared context in this instance is created by sourcing data within one integrated record from multiple providers. Shared context is also the history of the particular patient, together with treatments that may interact and affect the decisions of each collaborator.
This essentially means that, within a longitudinal arrangement of patient events, a multitude of primary care events - including prevention, vaccination or screening, pregnancy and delivery, hypertension monitoring and management, and dispensary follow-up - drive and coordinate specialist care events, which are observed and documented in the context of the former. An example of extra-institutional data infusion within this formalised contextual organisation of data is the acquisition of measurements from wearables related to a particular condition or health problem which is remotely monitored and managed.
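A minimal sketch of this contextual organisation follows; the classes and fields are hypothetical illustrations, not the information model of ISO 13940 or any other standard.

```python
# Minimal sketch of a longitudinal, patient-centric arrangement of care events.
# Class and field names are hypothetical, not taken from any EHR standard.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CareEvent:
    event_id: str
    provider: str                       # e.g. "GP", "ophthalmologist", "wearable"
    description: str
    context_of: Optional[str] = None    # id of the coordinating primary care event

@dataclass
class HealthProblem:
    name: str
    events: List[CareEvent] = field(default_factory=list)

@dataclass
class LongitudinalRecord:
    patient_id: str
    problems: List[HealthProblem] = field(default_factory=list)

record = LongitudinalRecord(
    patient_id="patient/123",
    problems=[HealthProblem(
        name="diabetes",
        events=[
            CareEvent("e1", "GP", "hypertension monitoring and management"),
            CareEvent("e2", "ophthalmologist", "glaucoma review", context_of="e1"),
            CareEvent("e3", "wearable", "home blood pressure measurement", context_of="e1"),
        ],
    )],
)

# Specialist and wearable events point back to the coordinating primary care
# event, so the shared clinical context is preserved rather than reconstructed.
```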
Such formalisms have been under development for years, together with the concepts required to deliver tools for shared and seamless care, and standards are available for implementation (an array of standards developed on the basis of the terminology and concepts of ISO13940), including standards for reannotating data during transmission from one data use entity to another (the HL7 family of messaging interoperability standards).
Building on this conceptual architecture foundation called the EHR, data sharing is furthermore enabled by the new digital data governance paradigms discussed below. Working together, these formalisms and paradigms leverage, embrace and engage the patient, together with public health and clinical collaborators, as collective data governance agencies; agencies that, through their collaboration, design, standardise and regulate digital health evolution with new uses of data and further data and care sharing implementations and concepts.
With this interoperability and sharing enablement definition of EHRs in mind, Big data applications may simply be defined as ones which develop research and care assets to inform decisions using data from the EHRs of heterogeneous patient populations. And through this pathway, the compartmentalised definition of global health data sharing platforms surfaces, both technological and conceptual, to include standards, models, regulation, governance, legislation and legalisation, ethical sharing and use, institutionalisation, and health and public health policies and systems.
With RWD being a keynote in Big data, the Big data paradigm essentially represents a melting pot of all the changes necessary to move away from institutional and national boundaries and toward the establishment of global health data sharing platforms. Along the way, new data sharing and governance mechanisms are already being legislated and established. More data are produced along the way, and Big data apps drive their further proliferation. And all this means there is a pressing urgency to establish mechanisms that validate data inputs in order to deliver reliable outputs.
The only way to achieve this without jeopardising the progress achieved so far with this very successful paradigm is to anchor the deep end of the conceptually infinite pool of Big data onto the logical backbone, the semantics and the longitudinal clinical coherence characteristically encountered in EHRs.
So then, the round trip of digital health platform concepts development from the EHR paradigm to the RWD paradigm and back is complete.
It is ironic that the data sharing enablement objective, which has been fundamental in EHR assimilation programmes such as the US meaningful use programme, has failed to attract institutional support commensurate with its strategic value. Instead, the EHR is now erroneously almost equated with all other forms of medical data records, as a limited-capacity organisational entity that pales in comparison with the Big data platforms being envisaged. At the same time, data sharing governance is a strategic policy objective for digital health and for health sector legislature.
Some basic truths.
- Big data has been formulating the new data sharing and use paradigm in health and has also been shaping the corresponding proposed governance models and policies.
- EHRs are by definition the only organisational entity that may reliably capture longitudinally arranged patient-centric clinical data models, and as such shall continue to be a pivotal concept for Big data development.
- Since Big data application numbers are rising fast, it is imperative to understand and respond to the need for reliable and reusable patient data with a global context, rather than to insist on shaping disposable dataset instances with temporary contexts around health system transactions, such as issuing and handling an electronic referral document.
- As far as implications for EHR standards and systems development are concerned, Big data signifies a paradigm shift from provider bound data to globally interoperable health information and collaboration.
Based on the non-RCT experimental data definition of RWD, an assessment in the Real-World Data Policy Landscape in Europe 2014 report revealed several barriers that restrict further development toward full exploitation of the potential of RWD (49):
- Absence of common standards for defining the content and quality of RWD;
- Methodological barriers that may limit the potential benefits of RWD analysis;
- Governance issues underlying the absence of standards for collaboration between stakeholders;
- Privacy concerns and binding data protection legislation which can be seen to restrict access to and use of data.
Incidentally, these are the very same barriers which impede progress in the design and deployment of standardised and interoperable EHRs. Common standards concern content and content interoperability - that is, the ability of data platforms to define and share a mutually agreed interpretation and representation of data context and to thereby eliminate data misinterpretation and enable at the same time reusability. It is only by doing so that healthcare agents may freely collaborate while maintaining a shared understanding of the evolving healthcare matters pertaining to a patient as well as shared goals. Governance issues are about doing everything else to enable, implement, legislate and adopt standards. Both of these barriers are discussed below.
The fact that these are now the concerns of another major digital health paradigm is down to the fact that, in the process of building interoperability into EHRs in order to cross semantic and regulatory borders and to share information for care continuity and integration, the growth of provider and purchaser information needs outpaced the ability of EHRs to satisfy them.
For this reason, the bulk of effort and investment was redirected toward solving the component problems of interoperability, such as data sharing governance models, data trusts and regulatory enablement, which are fundamental factors inhibiting access to data pools of appropriate size and quality for AI-driven or other decision-informing initiatives. Once these issues have been addressed, and with the connectivity of handheld, intelligent data capture, patient engagement and data communication devices (mHealth) further accelerated and simplified, the search for data shall soon enough abandon true EHR interoperability in pursuit of unrestricted big data sources for all intents and purposes.
However much we believe that scale shall expose the lack of consensus regarding regulatory and governance matters, and thus mitigate them, the key issue remains. Operating on data which is inadequately verified or may be incomplete, regardless of volume, is a big risk for limited scope sources such as the EHR and an even bigger risk for the big data source concept, both of which risks eventually lead to unintended consequences and to exposing patients and their data to significant ethical and legal challenges (50, 51, 52).
In the current Big data environment, enforcing data quality regulation, together with governance, privacy and confidentiality, is at the end of the day a simple matter of enforcing digitisation and digitalisation standards, including formal validated CDMs. With the new next-gen EHR paradigm proposed, digitalisation is delegated to intelligent Web agents that may be deployed by any authorised healthcare agent, whereas digitisation and CDMs are reformulated as part of the DIFEDS component of the semantically federated EHR Web (FEHR).
EHR obstacles and challenges
Researchers in Big data and precision medicine make the following observations regarding the data harvested from the EHR (5).
Although EHRs are designed to arrange their content longitudinally, and are therefore ideal for phenotyping studies and controlled experiments, one of the largest challenges in harmonising Big data remains the definition of cases (disease) and controls (health). The EHR should be providing this capability. Furthermore, while the EHR provides a unique opportunity to structure data in a manner which allows the study of a spectrum of phenotypes, the data it contains are generally not as rigorous and complete as those collected in cohort-based studies. This is not surprising, as misclassification is often encountered within the EHR and missing data abound. Researchers further observe that, although the EHR provides opportunities to study virtually any disease as well as pleiotropic influences of risk factors such as genetic variation, among the formidable challenges related to leveraging the resources of the EHR is assurance of data quality. And with the study of many conditions relying on mining narrative text with natural language processing, rather than on more objective testing such as laboratory measures and genomic sequencing, the situation is far from ideal.
What is particularly interesting to note is that, contrary to the actual designs underlying the EHR, researchers believe EHRs were not originally designed to produce evidence, and therefore leveraging the needed data is fraught with challenges related to the collection, standardisation and curation of data. Therefore, as we move toward the use of machine learning and artificial intelligence, the use of controlled vocabularies is perceived as critical. Even more important is the need for robust definitions of the clinical phenotypes and diagnoses that accompany these samples, so as to ensure accurate comparison between cases and controls. Clearly then, EHRs have failed to provide semantic anchoring for EMR data and hence have failed to deliver the most fundamental uses underlying their original design.
So why is this experience described in the field when EHRs should be a critical component in structuring RWD for Big data applications in research and healthcare?
At the same time as Canada acknowledges failure to deliver interoperability as a key defining characteristic of any national EHR (53), and hence failure of its national EHR programme altogether, evaluations of the US nationwide EHR adoption programme show that assimilation has failed to reach key targets and objectives (54, 55). With that, a next generation of EHR paradigms and designs is already being proposed by researchers in the field. Proposals include their transformation from transaction-based and document-based systems into systems which support a full array of complementary apps that “wrap around” appropriately modified EHRs and provide significant care-plan and intelligence support (56), and EHRs with external, shared and universal data, where a subset of data about a patient’s story would not be confined to a single enterprise (57). This sounds a lot like reuse and Big data, together with very valid goals. However, the question remains as to the scope and scale of operations to be covered by these designs and how to achieve the transition to a new generation.
What is certain is that the Big data paradigm shall continue to influence this transition for the foreseeable future, hence the basic direction is already there. And while advances in Big data analytics and AI applications shall continue to offer unprecedented medical research and healthcare innovation, EHR models and systems shall never be out of the picture of this promising future, as they represent all things necessary to structure data in a manner which enables sharing and collaboration, which may leverage digital health platforms for seamless care, and which communicates clinical context in a manner which is intuitive for clinical coherence and clinician collaboration support.
For these reasons, it is suggested that the two approaches be combined into one, to deliver a new generation of EHR systems which provide for, and collectively implement, a global ecosystem of distributed, semantically federated health data.
Up until now, there has been a fundamental problem in the way EHRs have been deployed, let alone designed, which has led to data being collected with no provisions for longitudinality and no support for continuity-of-care. This is primarily due to the fact that, in the US, but to a lesser degree also in the EU, the term EHR often appears in the literature to be synonymous with the term EMR. And while EHRs are essentially a collection of EMRs, the way in which the EMRs are organised within EHRs is standardised, intuitive, interoperable, and sophisticatedly simple. EMRs are associated on the basis of processes observed in care continuity and integrated care, processes which are essential for the support of collaboration and the seamless transfer of care responsibility, be that by referral or by following a complex care plan. EHRs are not an EMR repository or a repository of EMR documents.
Setting aside the pivotal issue of implementing proper CDMs in EHRs, there has also been the obstacle of data sharing governance, with many key initiatives being set in motion as this article is written, as well as the issue of the alignment of policies in digital health with those in healthcare and public health. In order to implement EHRs, health systems must be committed to change into integrated care systems that truly leverage preventative and primary care to reduce hospitalisations and provide for continuous and integrated care, the primary objective being transformation of health systems by adoption of the value-based care model toward universal health coverage. However, not all health systems can adopt this model, nor do all perform best under it.
However attractive the simplicity of transaction systems such as electronic referral and prescription Web services might be - see for example Denmark, Greece and Estonia - the data being collected in the process shall never fulfil the purpose and goals described here. In fact, it is best to remove the process complexity from the asset development platforms altogether and keep the semantic anchoring role of the EHR at a global scale of distributed and semantically federated health data. Processes and transactions can be assigned to digital service brokers. In that way the complex issues that collaborating federated entities have to navigate regarding their local policy alignment, reform and governance are reduced, and competing incentives are removed by giving way to a global dialogue.
Governance rediscovered
Data governance refers to a set of rules and means to use data, for example through sharing mechanisms, agreements and technical standards. It implies models, structures and processes to share data in a secure manner, including through trusted third parties (58).
One of the purposes of data governance is hence to safeguard the quality and completeness of the data and evidence harvested by means of the EHR and other record structures. And, as evidenced above, what is certain is that data governance within and by the EHR has failed to deliver the opportunities expected, because of inadequate standardisation, including of those standards that derive from data governance concerns and designs - for example, supporting longitudinality in healthcare and record keeping as a matter of the policies and the health services delivery models built into digital health.
European single market for data
With the development of data governance regulation, the EU aims to establish a single European market for data, and to support the development of common European data spaces (58). The regulation will ensure access to more data for the EU economy and society and provide for more control for citizens and companies over the data they generate.
The EU data governance regulation aims to strengthen Europe's digital sovereignty in the area of data and put in place rules and means for trusted data altruism. At the same time, citizens will gain more control over their data and decide on a detailed level who will get access to their data and for what purpose. Businesses are expected to benefit from new opportunities as well as from a reduction in costs for acquiring, integrating and processing data, from lower market entry barriers, and from a reduction in time-to-market for novel products and services.
Common European Data Spaces, as well as data use between them, in nine strategic domains, were set out in the February 2020 data strategy (59): health, environment, energy, agriculture, mobility, finance, manufacturing, public administration and skills. Mechanisms shall be in place to protect and facilitate data sharing, including the General Data Protection Regulation (GDPR) and the Open Data Directive (EU) 2019/1024 of 20 June 2019 on open data and the re-use of public sector information (recast), by defining, among other things, which data can be made available, how, to whom and for what use (60, 61). The purpose is to ensure a stable and predictable environment with free flow of data at the global level, and privacy protection for personal data.
Governance for Digital Health Europe
Particularly in health, a Digital Health Europe consultation paper seeks to address the complexities and barriers in citizen-controlled data sharing and to propose models which unlock the innovation potential of a single market for data in health and of European and global data spaces (62). The consultation paper identifies interoperability and data quality as key barriers and considers all aspects involved: the policy and societal framework leading to citizen-centred data sharing governance models, how models such as health cooperatives respond to current data control challenges, the role of sharing initiatives and campaigns, and good practices for benchmarking, adaptation and adoption.
Following public consultation the paper sets forth for consideration different elements and data sharing governance models, or concepts, based on the level of individual agency over the data, and the benefits the individual gets from the data sharing. Specifically:
- Citizens as the owners of the data: concept assumes citizens have full agency.
- Citizens as integrators: concept assumes mechanisms are available for citizens to manage the integration of their data, directly or with intermediaries or brokers, managing at least data sharing authorisation, and possibly also personal data records, including the semantics for implementing the approved transfers.
- Citizens as the donors of data for public good: concept assumes that anonymised data becomes a public good or community resource which may be managed by charitable trusts or other third parties and review boards, health data cooperatives or medical research organisations (63, 64, 65, 66).
It is in fact by combining all three approaches and concepts as constituent elements of data sharing governance, that the vision of shared health data spaces can be fulfilled, from the governance perspective and as a critical step forward toward introducing global reform in data uses and regulation, and toward digital health and innovation.
With the citizen having full agency over their data, competing policies, incentives and practices are removed, and the citizen is free to function as facilitator, integrator and donor. And while platforms for managing sharing authorisation are an important infrastructural step (67), full agency for personal health data integration together with the vision of global health data pools, will eventually lead us back in the direction of resolving matters associated with the semantics of data records and the effective anchoring of data within a DIFEDS for effective and reliable data sharing and reuse. To this end, a standardisation dialogue must be appended to that pertaining to data governance.
One of the core principles and concepts of any EHR design has been, by default, the reframing of data governance and data sharing regulation, in order to facilitate institutional dialogue in the direction of enabling information flows by enabling patient-centred collective agency, to support and effect the necessary changes, and to finally overcome the institutional barriers that entrench fragmentation.
Full patient agency is not by itself an adequate means for progress. However, in addition to enabling the reframing of policy and regulation, engaging the patient in innovative health services delivery channels and virtual care - using, for example, mHealth applications and telehealth - shall both safeguard the availability of patient-centric data and evidence and enrich longitudinally arranged data sets with information that healthcare institutions cannot provide with current means. Once full agency has been enabled in data governance models, collective and collaborative agency shall be achieved. For this second level of unlocking data flows, a context maintenance platform, implementing relevant standards, is required.
In order to get there, standards, technological platform implementations and data governance should form a macro-cycle driving, in a spiral, the WEF consortium governance model presented below.
Data trusts and global data brokerage
On the side of the healthcare enterprise, participation in data sharing governance and in the facilitation of data space development and maintenance has arguments for and against, on both the organisational and the financing side. Overall, data trusts may bring together providers and research organisations into a collaborative agency arrangement geared toward regulating data quality and availability; neither can be guaranteed with the participation of citizens alone.
Data trusts can provide a legal and governance setup for this purpose; one which obliges trust administrators (the “fiduciaries”) to represent and prioritise the rights and benefits of the data providers when negotiating and contracting access to their data for use by data consumers (68). The legal fiduciary duties may be to keep or use the property for the benefit of the individual donor or the public, on the model of biobanks (69), or for the benefit of another organisation, provider or research agency. Depending on this data flow direction, different trust roles are expressed. For instance, a fiduciary who regulates data sharing with a client healthcare provider is a (global) medical record broker operating as an interoperable EHR.
From the perspective of supplying data to AI for training and testing purposes, among other uses, data trusts can oversee and enforce the ethical and consent compliant governance of data. In a cloud-native peer-to-peer architecture with shared computational resources, transparency, process coherence and accountability in electronic data governance can be achieved by enforcing healthcare agent mandates (chain of responsibility) by application of distributed ledger technology (e.g. blockchain) as part of the technology stack, thus removing the considerable legal and technological friction that currently exists in data sharing (68).
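As a minimal sketch of the chain-of-responsibility idea (an in-memory hash chain for illustration only, not an implementation of any particular distributed ledger product), each healthcare agent mandate can be recorded so that tampering with the history is detectable:

```python
# Minimal sketch of a hash-chained log of healthcare agent mandates.
# An in-memory illustration of the chain-of-responsibility idea, not a
# specific distributed ledger technology; all names are hypothetical.
import hashlib
import json
import time

def add_mandate(chain, agent, action, subject):
    """Append a mandate entry whose hash covers the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {
        "agent": agent,        # who is granted or exercises the mandate
        "action": action,      # e.g. "read", "share-for-research"
        "subject": subject,    # e.g. a patient record identifier
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)
    return chain

def verify(chain):
    """Recompute hashes to confirm no entry has been altered."""
    for i, entry in enumerate(chain):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
            return False
        if i > 0 and entry["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_mandate(chain, "gp-clinic-42", "share-for-research", "patient/123")
add_mandate(chain, "research-trust-7", "read", "patient/123")
print(verify(chain))   # True; editing any earlier entry makes this False
```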
With this architecture and technology stack in mind, the interoperable EHR becomes a next-generation EHR in the form of a DIFEDS.
The European Organisation for Rare Diseases (EURORDIS) is one such data trust initiative, collaborative or data consortium, which engages the whole spectrum of data sources, organisations and individuals active in the field of rare diseases, and promotes research on rare diseases and the commercial development of orphan drugs (70). With EURORDIS, patients engage by sharing their own rare disease data for the purpose of research. By doing so, they collaborate with other patients to build an evidence base for improving clinical outcomes, for supporting the development of drugs and devices, and for enabling discovery that helps speed up the diagnostic process, improves its accuracy and consequently reduces health costs (71). As fiduciary, EURORDIS enforces ethics and trust by researching and implementing measures for regulating and controlling access to data, to promote the use of this health-improving data channel and to reassure patients with rare diseases that they will not put their personal information at risk by participating in such data-sharing initiatives. And by ensuring that patients have ultimate control of their data, institutional obstacles are overcome and patients gain reciprocal access to research outcomes which can boost innovation and provide hope for whole communities.
Global federated data (trust) consortia
The World Economic Forum (WEF), the Australian Genomics Health Alliance (72), Canada's Genomics4RD (73), the United Kingdom's Genomics England and the US Intermountain Healthcare health system joined forces to develop a genomic data consortium governance model, in order to drive data collaboration and innovation in the study of rare disease, to mitigate potential risks for rare disease patients, and to provide the means to globally advance data standardisation and interoperability in genomics toward personalised healthcare. This being one of the Big data cases discussed above, the goal is once again the aggregation of deidentified clinical datasets of larger than medical and health record scope, linked to genomic data for phenotype discovery and prediction (74).
The governance model is based on a range of proof of concept examples led by the Global Alliance for Genomics and Health (75) on the technological model of federated data systems (FDS) that aim to address the challenge posed by the data storage requirements set forth by the formidable volume of data representing multiple genomic variants each coupled with a longitudinal health record for phenotyping.
In order to break barriers to health data sharing, both legal and policy, the project recognises both the need for data ontologies and standards to become universal over the coming years, and the challenges involved in establishing interoperability between EHRs with different ontologies and standards.
Compounding the challenges posed by interoperability and fragmentation, including fragmentation in policies, the project identified the growing lack of trust on the part of the public and of data providers as a major impediment to data sharing.
And to mitigate risks arising from the effectiveness of semantic mining and the efficiency of curation, the project suggests a federated data consortium which establishes a limited data trust around each research or use case. This includes identifying resourcing, structuring data, and deploying the API technology (74).
To overcome the obstacle of global semantics, as distinct from technological interoperability, the project created and promotes a governance model which fosters a cohesive, symbiotic relationship between institutions with otherwise differing models of consent, operations, security and technology, with a view to optimising policy and implementations for the best outcomes. The goal is that, beyond technical frameworks, the operational, policy, legal and ethical frameworks of genomic data sharing will allow for transformation in healthcare delivery, enriched reference databases representative of all ethnicities, large genomic data resources available for clinical deliberation and scientific discovery, and an informed, empowered patient community (76, 77).
FDS are defined as systems which allow authorized users to perform queries on the data within a federated network of organizations and to retrieve result sets but not original data (76). Hence two elements are distinct: the query system and the distributed data sources. The element of federation clearly then refers to the reconstruction of semantics by means of queries or the mining of semantics in distributed data. Once global standards and ontologies are adopted and implemented, a transition to federated semantics as proposed in this article and to DIFEDS shall be possible. Currently, the idea of the FDS reflects the basic conceptual and methodological ideas underlying the Big data paradigm.
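A minimal sketch of the FDS query pattern described here, assuming toy record structures and a simple count query, shows how a coordinator can obtain result sets without ever receiving the original data held by each node:

```python
# Each participating organisation keeps its records locally and exposes only a
# query endpoint. Record structure and the query itself are illustrative.
node_a = [{"dx": "E11", "age": 63}, {"dx": "I10", "age": 71}]
node_b = [{"dx": "E11", "age": 58}]

def count_query(records: list, dx_code: str) -> int:
    """A node-local query: return only an aggregate, never the records."""
    return sum(1 for r in records if r["dx"] == dx_code)

def federated_count(nodes: list, dx_code: str) -> int:
    # The coordinator sees only result sets (counts), not the original data.
    return sum(count_query(records, dx_code) for records in nodes)

print(federated_count([node_a, node_b], "E11"))  # 2 patients with type 2 diabetes
```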
The following major benefits and constraints, also identified for Big data above, are noted. Federated data enable the analysis of larger datasets to gain richer insights and facilitate cross-border data sharing without intervening in local governance and legal regulations. On the other hand, they create new types of “infomediaries” with additional risks, liabilities and constraints.
As also extensively analysed above, this collaboration model shall nonetheless, by itself, not overcome the obstacle of global semantics. This calls for a variant of the approach which federates semantics instead of requiring the mining of semantics while hoping for future common ontologies and standards. But then again, close to twenty years of EHR development have failed to deliver those. A next generation EHR is necessary to that end, one which has a global outlook, as is the case with the WEF project.
Given that the FDS is defined here with an accent on the governance side of things, much like with the success of the Big data paradigm, FDS as proposed in the WEF methodology rely to a large degree on the clear decoupling of the digitisation platform (data and data structures coupled with context and semantics) from the digitalisation platform (software services and processes), or of the data availability mechanisms from the interpretation and intelligence mechanisms. This decoupling comes with a shift from local data platforms with centralised national sharing regulation to global data platforms with special interest groups formulating data sharing policies and data governance principles for their collective agencies. What was once an effort to develop a better EHR and a better data sharing platform is now an effort to facilitate and ensure the global availability of shared data and to feed such data, through sharing initiatives and trusts, to global digital health innovation activity and business. These are the fundamental observations driving the silent development of the next generation of EHRs. On this basis, different data governance models can be applied, depending on the occasion.
Interoperability background
Loosely defined, interoperability is a quality of digital health and healthcare innovation whereby the different systems and elements, including digital and human agents and the operations which comprise the digital health ecosystem, can work together at any time, without the need for any intervention, to instantly deliver new uses and new support concepts.
Over the years there have been numerous definitions of interoperability and analyses of how to achieve it. Without delving into this highly complex domain, it is necessary to point out that there are two distinct levels at which interoperability is developed: the level of the ecosystem member and the level of member communication. The former concerns the annotation of data with persistent semantics and the latter the reannotation of data for transmission from one data use entity to another. When implementing health data reannotation for the transmission of data from one health agent to another, which is called messaging interoperability, the industry standard is provided by the HL7 standardisation body, now in its third major iteration, that of the Fast Healthcare Interoperability Resources (FHIR). This family of standards essentially provides interoperable, reusable or common, widely accepted industry-standard data connections between data capture and data use entities. To achieve this, HL7 standardises the structure of the data within messages transmitted from one element to another, as well as the data itself. The structure is provided either by means of FHIR resources or the older Clinical Document Architecture (CDA, HL7 v3) templates, such as templates for the transmission of inpatient discharge summary data, and the data standardisation is specified in terms of the relevant digitisation standards, such as ICD, LOINC, SNOMED, or ATC.
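For illustration, the following hand-rolled example shows the shape of an HL7 FHIR Observation resource coded with LOINC and UCUM units; a real implementation would be validated against the full FHIR schema and exchanged by a FHIR server rather than constructed by hand:

```python
import json

# A hand-rolled FHIR-style Observation resource, coded with LOINC and UCUM.
# The structure follows the FHIR system/code/display coding pattern, but the
# patient reference and values are illustrative only.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "718-7",
            "display": "Hemoglobin [Mass/volume] in Blood",
        }]
    },
    "subject": {"reference": "Patient/example-123"},   # illustrative reference
    "valueQuantity": {
        "value": 13.2,
        "unit": "g/dL",
        "system": "http://unitsofmeasure.org",
        "code": "g/dL",
    },
}

print(json.dumps(observation, indent=2))
```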
The structure templates are particular to the HL7 engine and, since they have been designed for the purpose of transmitting care instances, they do not embody some important concepts which enable persistent patient-centric semantics in global data spaces. Without these, or the governance mechanisms necessary to serve the purposes defined in this article, messaging interoperability leads nowhere.
Garbage in, garbage out applies once again, this time to the HL7 pipeline system. How good is this highly standardised transmission if the data entering the pipeline is poor, does not comply with the common messaging standard, or is incomplete?
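One minimal, illustrative form of gatekeeping against such garbage is to check the coded elements of an incoming message against the terminologies the receiving system actually recognises; the whitelist and message shape below are assumptions for the sketch, not part of HL7:

```python
# A toy gatekeeper for the "garbage in" problem: incoming coded elements are
# checked against the terminologies the receiving system understands.
# The whitelist and message shape are illustrative only.
ACCEPTED_SYSTEMS = {
    "http://loinc.org",
    "http://snomed.info/sct",
    "http://hl7.org/fhir/sid/icd-10",
}

def coding_is_acceptable(coding: dict) -> bool:
    return (
        coding.get("system") in ACCEPTED_SYSTEMS
        and bool(coding.get("code"))
    )

incoming = [
    {"system": "http://loinc.org", "code": "718-7"},
    {"system": "local-lab-codes", "code": "HB"},       # local code: not interoperable
    {"system": "http://snomed.info/sct", "code": ""},  # incomplete: no code
]

accepted = [c for c in incoming if coding_is_acceptable(c)]
rejected = [c for c in incoming if not coding_is_acceptable(c)]
print(len(accepted), "accepted,", len(rejected), "rejected")
```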
To ensure interoperable messaging, therefore, we must ensure the interoperability of the reannotated elements before they engage in collaboration. EHRs should be designed for machine collaboration; EMRs, at best, are not. This collaboration-enabling domain of standardisation is not the concern of HL7, but an interoperability domain that HL7 aims to influence and drive.
Messaging or electronic document interoperability is nonetheless important and has been extensively tested, in contrast to collaboration interoperability, or global clinical context semantics for interoperable agents. By far the best-known case study is the EU Patient Summary project, which started out as the European Patients Smart Open Services project (78) and developed into a European CEN standard (79) and subsequently an international standard (80). The ISO International Patient Summary (IPS) document is defined as an EHR extract containing essential healthcare information about a subject of care. As specified in EN 17269 and ISO/DIS 27269, it is designed to support the use case of unplanned, cross-border care, but it is not limited to it.
Recently, the EU, recognising the need to establish interoperability between member state EHRs, extended the reach of this initiative to include electronic prescriptions (81). Nonetheless, this extroversion lacks a necessary introversion, since the EU has in this way adopted standards for interoperable communication between national ecosystems but not for establishing interoperability within the national ecosystems or national EHRs.
What might go wrong is that the CDA documents or FHIR resources sent from one country to the other remain stored and processed for administrative purposes without ever populating national EHRs with data in the format adopted within the national system. For this to matter there needs to be an application that consumes the data, an important lesson learned from Big data applications. In the case of the IPS, the EU project most certainly did not design an application of the HL7 technology for use by citizens who reside on one side of a border and commute to the other side for work.
EHR interoperability is key: in addition to reannotation pipelines of any scale, EHR data feed the multitude of Big data applications, including AI, as well as any other digital health value supply chain that may be established in the future. Once exposed to a global or larger-scale pipeline such as the EU Data Spaces or Trusts, the defences of an already weakened interoperability immune system shall be tested beyond their limits. This level of interoperability is openly acknowledged in the US (82) and in Canada (83), and systematically ignored in the EU. This is ironic when one considers that the standard developed for this purpose is originally a European standard (ISO/CEN 13940).
One might argue that, since national ecosystems and national EHRs suffer shortcomings associated with data governance standardisation and policy alignment, a process necessary to derive a clear specification of requirements and their implications in the national context, the exercise of exposing needs and requirements at a global scale shall force an equal measure of introversion and national progress. While this is certainly the case, there is the risk of driving the development of other components to such a disproportionate extent that a great reset shall be necessary to resolve the conflicts caused by the many layers of attempted and failed connectivity interoperability.
Conceptual architecture elements
One of the key strategy components in the ongoing development of Big data platforms over cloud solutions has been blockchain and other distributed ledger technology (DLT). The global blockchain in healthcare technology market size was estimated at 231 million USD in 2018 and is anticipated to expand at a compound annual growth rate of around 70% over the next few years, reaching 10 billion USD by 2025, a 15% share of the total blockchain market (65 billion USD by 2025). Factors driving this growth include the increasing incidence of information leaks and data breaches coupled with the rising requirement to curb these issues, strategic initiatives by key players, high demand to reduce counterfeit drugs, and the need for efficient health data management in medical research (84, 85).
With a view to build on current efforts to exploit blockchain networks and other DLTs as well as various reannotation technologies such as FHIR on URLs, the proposed next generation of EHRs is one where data are separated from processes, digital health services, applications and decision supporting tasks, building on the paradigm of Big data and the need to further develop structured data.
In this new generation data are distributed to comprise a global EHR, with data governance models implemented to regulate either business models or the scale, scope and uses of data on an ad hoc basis. To ensure common data semantics are available for instant use, global standards are to be adopted to implement a distributed but federated reusable semantic anchoring platform instead of mining or reconstructing semantics for each use.
The proposed DIFEDS architecture deploys data anchors for each subject of care and annotates, by inheritance, all objects with object identifiers in order to produce the necessary RWD structure, including context formulation capacity and longitudinality, in world-wide, semantically distributed web EHRs.
Key building blocks of the architecture are as follows.
1 - Agency
Citizens are provided with or purchase an account with a distributed EHR provider, a data trust, a personal health data space, consortium or other fiduciary or data management service and generate a global identifier or key which is used to identify and annotate their data objects. This could be a personal, patient or other identifier, public encryption key, or other unique key. Providers are invited to join this data sharing and digital health collaboration platform that essentially manages access and identity.
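A toy sketch of this account setup is given below; a real deployment would use proper asymmetric key pairs issued through an accredited identity scheme, whereas here a random secret and a derived identifier merely stand in for the citizen's key material:

```python
import hashlib
import secrets

# Toy account setup: the citizen keeps a secret and publishes a derived
# identifier with which their data objects are annotated. Both are stand-ins
# for real key material and accredited identity services.
private_key = secrets.token_hex(32)                     # kept by the citizen
citizen_id = hashlib.sha256(private_key.encode()).hexdigest()[:16]

def annotate(data_object: dict, owner_id: str) -> dict:
    """Tag a data object with the global identifier of its subject of care."""
    return {**data_object, "subject_id": owner_id}

encounter = annotate({"type": "outpatient-visit", "year": 2021}, citizen_id)
print(encounter["subject_id"] == citizen_id)  # True: the object is anchored to the citizen
```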
2 - Provenance and validation
EHRs rely on the transfer of care responsibility and the validity of corresponding mandates to collect and share reliable and complete information. The blockchain transaction validation system guarantees provenance and hence the implementation of secure mandate propagation and validation, thus removing a significant barrier for ensuring high quality and availability of patient-centred data in EHRs.
Each transaction receives its own digital signature, which is associated with a specific patient. The signatures related to a specific patient are combined into a digital fingerprint which uniquely identifies the set of transactions by both the patient and the provider. Only those with access to the fingerprint can view the data, and a copy of the ledger consisting of the validated data is sent to the provider to store locally.
Each patient-provider relationship therefore has its own fingerprint, which means blockchains also help with validating data integration. The patient has control over the types of data released and, through the unique tree structure of the protocol, a provider can validate information coming from the patient as well as from other collaborating providers, thus establishing provenance, as digital signatures reflect both the origin of the data and the access trail.
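The following sketch illustrates, under simplifying assumptions, how per-transaction digests could be combined into a single Merkle-style fingerprint per patient-provider relationship; plain hashes stand in for the digital signatures an actual implementation would use:

```python
import hashlib

def h(x: str) -> str:
    return hashlib.sha256(x.encode()).hexdigest()

def merkle_root(leaves: list) -> str:
    """Combine per-transaction digests pairwise into a single fingerprint."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last digest if odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# "Signatures" here are just digests of transaction payloads; a real system
# would use asymmetric signatures tied to patient and provider identities.
transactions = [
    "patient:123|provider:clinic-a|2021-03-01|lab-result",
    "patient:123|provider:clinic-a|2021-03-15|prescription",
    "patient:123|provider:clinic-a|2021-04-02|discharge-summary",
]
fingerprint = merkle_root(transactions)
print(fingerprint[:16], "... identifies this patient-provider relationship")
```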
This validation process may for example be used to maintain clinical context during referral and discharge and to securely link episodes of care for clinical coherence on the basis of signature enforced mandates propagation. This is what makes blockchain so suitable for implementing Big data medical research applications and for the development of outcome measures (86).
3 - Immutable semantics
Blockchain and other DLTs are deployed to manage a public or permissioned ledger of the transactions which have generated global EHR data. The DLT securely identifies the subjects referred to in an object, including the patient and the healthcare provider (87). Furthermore, blockchain guarantees, internally and natively, that data cannot be changed by anybody, including mandated healthcare professionals or the patient, making data immutable and hence secure. Immutability across the data life-cycle also guarantees that the semantic context of data is not altered, a critical property for DIFEDS and the EHR.
Based on this immutable logical backbone which ensures clinical coherence, the DIFEDS EHR may use FHIR and URLs to create a virtual global EHR which references local and institution-specific EMRs on the basis of the adopted Blockchain-DLT implementation.
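A minimal sketch of such a virtual entry is shown below, with invented URLs and field names: the global record holds only a FHIR-style reference into the local EMR plus the ledger-anchored hash of the referenced content:

```python
import hashlib
import json

# Toy "virtual EHR" entry: the global record stores a FHIR-style reference
# (a URL into the local, institution-specific EMR) and the hash of the
# referenced content as recorded on the ledger. URLs and fields are invented.
local_resource = {"resourceType": "DocumentReference", "status": "current"}
content_hash = hashlib.sha256(
    json.dumps(local_resource, sort_keys=True).encode()
).hexdigest()

virtual_entry = {
    "subject_id": "citizen-0001",
    "reference": "https://emr.hospital-a.example/fhir/DocumentReference/42",
    "ledger_hash": content_hash,   # immutability anchor for the referenced data
}

def still_matches(resource: dict, entry: dict) -> bool:
    """Check the local resource against its ledger-anchored hash."""
    return hashlib.sha256(
        json.dumps(resource, sort_keys=True).encode()
    ).hexdigest() == entry["ledger_hash"]

print(still_matches(local_resource, virtual_entry))  # True unless the local copy changed
```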
4 - Semantics indexing and structural overview
The DIFEDS EHR maintains a summary record or composite index of the record data stored in the ledger which contains digitised data, that is, data coded in internationally used nomenclature systems and classification/identification codes. Each summary record object is assigned a unique and interlinked OID. Different objects are annotated to define associations between them. Episode of care (EOC) IDs can be used for this purpose as specialised OIDs. An EHR thus comprises nested EOCs and nested DLTs/blockchains, with nesting implemented by means of OIDs, annotation and DLTs.
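As a rough illustration, the composite index could take the following nested form, with episode-of-care OIDs grouping coded summary objects; the OID arcs shown are invented for the example:

```python
# Toy composite index: each summary object carries an OID, and episode of care
# (EOC) identifiers are specialised OIDs under which objects are nested.
# The OID roots used here are purely illustrative.
summary_index = {
    "2.999.1.1": {            # EOC: cardiology admission
        "2.999.1.1.1": {"code": "I21.0", "system": "ICD-10"},   # diagnosis
        "2.999.1.1.2": {"code": "B01AC06", "system": "ATC"},    # medication
    },
    "2.999.1.2": {            # EOC: follow-up outpatient visit
        "2.999.1.2.1": {"code": "8867-4", "system": "LOINC"},   # heart rate
    },
}

def objects_in_episode(index: dict, eoc_oid: str) -> dict:
    """Resolve all coded objects nested under a given episode-of-care OID."""
    return index.get(eoc_oid, {})

print(objects_in_episode(summary_index, "2.999.1.1"))
```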
5 - Privileges and consent
DLTs are used and managed to assign and revoke privileges for individual segments or entire EHRs. The management of privileges and consent is general: it refers to the account and to the uses of the data, not to the generation of records or blockchain networks.
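A toy consent register along these lines might look as follows, with segment names and grantees invented for illustration and the latest recorded decision taking precedence:

```python
from datetime import datetime, timezone

# Toy consent register kept separately from record generation: privileges are
# granted or revoked per EHR segment, and access checks consult only the most
# recent decision. Segment names and grantees are illustrative.
consent_log = []

def set_privilege(segment: str, grantee: str, allowed: bool) -> None:
    consent_log.append({
        "segment": segment,
        "grantee": grantee,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def may_access(segment: str, grantee: str) -> bool:
    for entry in reversed(consent_log):   # latest decision wins
        if entry["segment"] == segment and entry["grantee"] == grantee:
            return entry["allowed"]
    return False                          # no consent recorded: deny by default

set_privilege("mental-health", "research-consortium-x", True)
set_privilege("mental-health", "research-consortium-x", False)   # revoked
print(may_access("mental-health", "research-consortium-x"))      # False
```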
6 - Semantic anchoring of global data sources
The nested structure of distributed data objects is held together or anchored by means of the summary virtual EHR and Universal Clinical Object Semantics Identifiers (COSID) encoding semantic context structure. Semantic anchoring as described in this article may be implemented by means of a combination of blockchain techniques and object annotation in ledgers.
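The exact composition of a COSID is not prescribed here; purely as an illustration, the sketch below derives an anchor from the subject identifier, the episode-of-care OID and the clinical context code, so that objects held by different organisations resolve to the same context:

```python
import hashlib

# Toy semantic anchor: the "COSID" here is simply a digest over the subject
# anchor, the episode-of-care OID and a clinical context code, so that every
# distributed object carrying the same COSID resolves to the same clinical
# context. The composition rule is an assumption made for illustration.
def make_cosid(subject_anchor: str, eoc_oid: str, context_code: str) -> str:
    raw = f"{subject_anchor}|{eoc_oid}|{context_code}"
    return hashlib.sha256(raw.encode()).hexdigest()[:24]

obj_in_hospital_a = {"value": 13.2, "cosid": make_cosid("citizen-0001", "2.999.1.1", "718-7")}
obj_in_lab_b = {"value": 12.9, "cosid": make_cosid("citizen-0001", "2.999.1.1", "718-7")}

# Two objects held by different organisations anchor to the same context.
print(obj_in_hospital_a["cosid"] == obj_in_lab_b["cosid"])  # True
```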
7 - Problem-oriented restructuring
DIFEDS can be mined and problem-oriented records can be constructed from the original immutable records, brought together with federated semantics. These records are temporary and do not in any way affect the original local records.
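A minimal sketch of such a temporary, problem-oriented view, grouping immutable source objects by problem code without altering them, might look as follows (record fields are illustrative):

```python
# Toy problem-oriented view: immutable source objects from several providers
# are grouped by problem code into a temporary structure that never replaces
# or alters the originals. Record fields are illustrative.
source_objects = [
    {"provider": "hospital-a", "problem": "E11", "entry": "HbA1c 7.8%"},
    {"provider": "gp-practice", "problem": "E11", "entry": "metformin started"},
    {"provider": "hospital-a", "problem": "I10", "entry": "BP 150/95"},
]

def problem_oriented_view(objects: list) -> dict:
    view: dict = {}
    for obj in objects:
        view.setdefault(obj["problem"], []).append(obj)   # group, don't modify
    return view

temporary_view = problem_oriented_view(source_objects)
print(list(temporary_view))          # ['E11', 'I10']
print(len(temporary_view["E11"]))    # 2 entries drawn from two providers
```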
8 - Service providers and brokers
Service providers may connect to the DIFEDS EHR and implement electronic prescription and referral services as well as generate and export patient summaries based on consent and mandate information.
9 - Business models
Different business models for the data spaces and the service layers are being developed, some for profit and some as proof-of-concept, with the bulk of the disruption yet to come.
Ventures, platforms and risks
Existing implementations of this data sharing concept architecture include MedVault's secure platform for storing personal health records (88); Guardtime's blockchain-based trust service (90), the world's first to be accredited under the EU eIDAS regulation (89), with its HSX API platform for building secure distributed health applications (91); the Gem Health Network, a blockchain healthcare ecosystem connected to a universal data infrastructure that brings together the global community of companies and individuals taking part in the continuum of care (92, 93); MedRec, which gives patients a single access point to their data across multiple providers using smart contracts and aggregate data pointers to patient-provider relationships (94, 95); and many more (96, 97, 98).
As history has a tendency to repeat itself, what is important is that the structural facet of interoperability is addressed in this new approach to data sharing platform development (99). It is equally important that this facet is addressed at a level separate from that of messaging interoperability and FHIR resources, as it should be. With the new generation already underway, handbooks should include an extensive overview of the considerations and means required to deploy interoperability, and this is still not the case (100).
The good news is that the significant barriers to healthcare adoption of blockchains that have existed (101), including the perception of immature technology and skills, regulatory constraints, hesitation in executive buy-in, a lack of clear return on investment, and insufficient business cases, are on the way out. Business cases and models are evidently on a sharp rise, together with ROI demonstrations. As for regulatory constraints, COVID19 has taken care of that.