
Adverse drug reactions (ADRs) are key factors in determining the success of medication therapy in clinical settings. Retrospective surveillance of clinical decisions on medications requires integrative analysis of electronic medical record (EMR) data and appropriate analytical tools. We therefore suggest that future pharmacovigilance consider EMRs and human genetic databases together. We analyzed EMR-based ADR studies indexed in PubMed from 2005 to 2017 and retrospectively acquired 1161 (29.6%) articles describing drug-induced adverse reactions (e.g., in the liver, kidney, nervous system, and immune system, and inflammatory responses). Of these, only 102 (8.79%) articles contained information useful for detecting or predicting ADRs in the context of clinical medication alerts. Because insufficient EMR datasets and improper analyses may generate false warnings in clinical decision-making, efforts should be made to overcome potential problems in data-mining, analysis, statistics, and standardization. Accordingly, we address the characteristics and limitations of retrospective EMR database studies in hospital settings. Because gene expression and genetic variation among individuals affect ADRs, pharmacokinetics, and pharmacodynamics, pharmacovigilance may be optimized using suitable databases available in the public domain (e.g., genome-wide association studies (GWAS), non-coding RNAs, microRNAs, proteomics, and genetic variations), novel targets, and biomarkers. Together with analyses of newly validated biomarkers, these efforts could help repurpose clinical and translational research infrastructure and ultimately support personalized therapy that takes ADRs into account.
Large electronic medical datasets that include patients’ medical records have already proven useful in clinical research and have become essential for analyzing patient medication in the era of big data in healthcare (1). Various medication-related decision processes, such as dose guidance, drug allergy checks, and drug-drug interaction checks, have been implemented to ensure the efficiency and safety of medicines (2). Other parameters, including the disease, age, and liver and kidney function of patients, the effects of excipients, additives, or preservatives, and food-drug interactions, should also be considered for the proper administration of medications.
Several studies have shown the value of pharmacovigilance research that uses electronic medical record (EMR) data as a basis for decision support tools. EMR data may include passive and active referential information, reminders, alerts, and guidelines related to adverse drug reactions (ADRs). EMR data therefore have great potential in pharmacovigilance research and could enable the rapid identification of patients for inclusion in interventional and observational studies. EMR data and data-mining processes may allow us to produce effective decision support tools for predicting ADRs (2), with the potential to repurpose clinical and translational research infrastructure.
ADRs are commonly regarded as an important factor in determining the success of a therapy. In particular, as clinical data on ADRs are now documented electronically, efforts to compile information on ADRs have received widespread support (2). However, few studies using EMRs have observed significant benefits for patient outcomes (3), perhaps owing to small sample sizes or short investigation periods that did not allow clinically important events to be revealed. Efficient use of medications depends on the accuracy of the information in the EMR database (4). For example, duplicate medications, contraindications given a patient’s condition, and changes in drug efficacy and toxicity related to pharmacokinetic and pharmacogenomic characteristics can often be assessed proficiently by clinicians and pharmacists using EMR data. As the use of EMR data becomes more widespread, better ways of analyzing these data are increasingly important to ensure the validity of studies (5).
In our analysis of articles indexed in the PubMed database, we assessed reports published between 2005 and 2017 using the following keywords: EMRs, drug therapy, drug-related side effects, adverse reactions, drug, medication, pharmacoepidemiology, or pharmacology. The number of papers describing ADRs of major organs is shown in Fig. 1. Drug-induced liver injury (DILI) accounted for the greatest percentage of ADRs (Fig. 1), and the gastrointestinal tract, cardiovascular system, and kidney were the next most vulnerable sites. There were no significant annual differences in the total number of ADRs. To assess target organs, we counted the number of papers archived for each organ. From a total of 1421 potentially relevant publications, 1161 retrospective full-text publications were obtained after screening titles and abstracts using keywords describing ADR events and target organs; 513 reported occurrences of ADRs only, 908 reported both efficacy and ADRs, and 648 (45.6%) focused on ADR reports based on EMR database mining. In addition, 86, 892, and 443 papers reported ADRs in subjects who were healthy, who had disease(s) directly associated with the ADRs, and who had disease(s) indirectly associated with the ADRs, respectively. This review process followed the PRISMA guidelines (Fig. 2). As shown in Fig. 2, ADR reports from EMR-based quantitative analyses accounted for ~56% of all ADR studies, suggesting that EMR data-mining is still widely used for ADR assessment.
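The percentages in this paragraph are simple proportions of the reported counts, and their values depend on the denominator: 648 papers correspond to 45.6% of the 1421 screened records and to 55.8% of the 1161 full-text publications. The short sketch below recomputes these figures from the counts stated in the text only; no additional data are introduced.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)

# Counts reported in the text
screened, full_text = 1421, 1161
adr_only, efficacy_and_adr = 513, 908
emr_mining = 648
healthy, direct_disease, indirect_disease = 86, 892, 443
useful_for_alerts = 102  # from the abstract

# Both breakdowns sum to the 1421 screened records
assert adr_only + efficacy_and_adr == screened
assert healthy + direct_disease + indirect_disease == screened

print(percent(emr_mining, screened))             # 45.6
print(percent(emr_mining, full_text))            # 55.8
print(percent(useful_for_alerts, full_text, 2))  # 8.79
```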
Current papers on ADRs have several shortcomings that make it difficult to draw conclusions about the incidence or severity of ADRs in subgroups. For example, it was difficult to make a clear-cut distinction between ‘drug-induced liver injury’ and ‘no drug-induced liver injury’ and to calculate the mismatched occurrence rate of DILI (6). In particular, when researchers try to predict ADR occurrence rates and the severity of liver injury by subgroup after combining the results of retrospective and prospective studies, the total number of patients exposed to the target drugs cannot be calculated exactly. In addition, information on patient characteristics in each subgroup was not easy to obtain from retrospective EMR studies (6,7). Moreover, when ADRs are further categorized into subgroups, such as DILI and gastrointestinal tract injury, different incidence rates of ADRs can be calculated. As reports of ADRs emerge for new drugs and various clinical cases (8), an analytical methodology for combining larger-scale ADR reports is a prerequisite.
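The denominator problem described above can be made concrete. The sketch below, using hypothetical subgroup counts, computes an incidence proportion with a Wilson 95% confidence interval (one common interval for proportions, chosen here for illustration and not taken from the cited studies) and declines to pool subgroups when the number of exposed patients is unknown.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Subgroup:
    name: str
    cases: int              # patients with the ADR
    exposed: Optional[int]  # patients exposed to the drug; None if not reported

def wilson_interval(cases: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = cases / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def incidence(cases: int, exposed: Optional[int]) -> Optional[Tuple[float, float, float]]:
    """Return (proportion, lower, upper), or None when the denominator is missing."""
    if not exposed:
        return None
    lower, upper = wilson_interval(cases, exposed)
    return cases / exposed, lower, upper

def pooled_incidence(groups: List[Subgroup]) -> Optional[Tuple[float, float, float]]:
    """Pool subgroups only if every exposed count is known."""
    if any(g.exposed is None for g in groups):
        return None
    return incidence(sum(g.cases for g in groups), sum(g.exposed for g in groups))

# Hypothetical counts for illustration only
retrospective = Subgroup("retrospective EMR cohort", cases=12, exposed=400)
prospective = Subgroup("prospective cohort", cases=5, exposed=None)
print(incidence(retrospective.cases, retrospective.exposed))
print(pooled_incidence([retrospective, prospective]))  # None: denominator unknown
```

Returning `None` rather than guessing a denominator keeps the gap visible, which is the point made in the paragraph above.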
ADRs have become a clinical issue and a public health concern; they are responsible for 6.5% of all hospital admissions, and approximately one quarter of the patients were at risk of death (9,10). As knowledge of the occurrence and content of ADR reports has increased (11,12), drug safety evaluation, including ADR monitoring, has become an important issue in pharmacotherapy (13). Signal detection for ADR alerts in hospitals is therefore currently regarded as a form of active drug safety surveillance. To reduce ADRs, it is essential to identify the causal relationship between drugs and the incidence of ADRs (14). Acquiring data from laboratory signals in patients’ EMRs is the first step in identifying ADRs and other conditions in patients (15). Reviews of EMR data may be used to assess the incidence of ADRs (16). Three major methods are often used for ADR reporting: 1) retrospective chart reviews; 2) ADR reviews based on patients’ EMRs (17); and 3) spontaneous reports by clinicians or patients, which can be targeted for additional chart review.
The analytical process for an EMR database can be summarized in four steps: Institutional Review Board (IRB) approval, data extraction from the EMR database, data-mining of the extracted dataset, and statistical analysis. Procedures for the extraction, analysis, and evaluation of potential ADRs are prerequisites for generating drug safety information. Clinical outcomes from hospitals and efficacy/safety data submitted by pharmaceutical companies can be integrated to define the proper information on ADRs (Fig. 3).
Here, we focus on how to identify ADRs using hospital data. First, the study design should be approved by the IRB of the respective hospital. The protocol submitted to the IRB includes the purpose of the study, the researchers’ certificate numbers from the IRB educational program, and a detailed description of the study. The description should specify the administered drugs; the drug administration period; patient information, such as age, gender, diseases, and exclusion criteria; the laboratory signals required for the study; and the analytical methods to be applied to the acquired dataset. Second, after IRB approval, data extraction is conducted: the hospital’s medical informatics team extracts the required dataset from the EMR database. If necessary, ADR signals defined by the World Health Organization (WHO), as well as potential signals in the EMR database, can be used. In particular, specific ADR hits and signals may raise hypotheses about safety information that may affect regulatory decisions (12). In our data extraction from Seoul National University Hospital (SNUH), twenty-five laboratory signals were retrieved: phosphate, glucose, hemoglobin A1c (HbA1c), blood urea nitrogen (BUN), serum creatinine (Scr), cholesterol, protein, albumin, uric acid, total bilirubin, aspartate transaminase (AST), alanine aminotransferase (ALT), alkaline phosphatase (ALP), gamma-glutamyl transpeptidase (GGT), C-reactive protein (CRP), alpha-fetoprotein (AFP), white blood cells (WBC), absolute neutrophil count (ANC), hemoglobin (Hb), hematocrit (Hct), platelet count (PLT), international normalized ratio (INR), sodium (Na+), potassium (K+), and calcium (Ca2+). In addition, patients’ diseases, co-administered drugs, prescription patterns (e.g., dose, period, and route of administration), age, and gender can be extracted from the EMR database as candidates for general and potential signals (12).
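A laboratory “signal hit” can be operationalized as a result that exceeds a multiple of the upper limit of normal (ULN) within a fixed window after a drug is started. The sketch below assumes an extracted table of (patient, test, value, date) records and a per-patient drug start date; the ULN values, the 3 × ULN multiple, and the 90-day window are hypothetical placeholders, not the criteria used in the SNUH extraction.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test: str          # e.g. "ALT", "AST", "Scr"
    value: float
    drawn_on: date

# Hypothetical upper limits of normal; each hospital laboratory has its own reference ranges.
ULN: Dict[str, float] = {"ALT": 40.0, "AST": 40.0, "ALP": 120.0, "Scr": 1.2}

def signal_hits(results: Iterable[LabResult], drug_start: Dict[str, date], test: str,
                multiple: float = 3.0, window_days: int = 90) -> List[LabResult]:
    """Results for `test` above `multiple` x ULN within `window_days` after drug start."""
    limit = multiple * ULN[test]
    hits = []
    for r in results:
        start = drug_start.get(r.patient_id)
        if start is None or r.test != test:
            continue                      # no exposure record, or a different test
        in_window = start <= r.drawn_on <= start + timedelta(days=window_days)
        if in_window and r.value > limit:
            hits.append(r)
    return hits

drug_start = {"p1": date(2017, 1, 1), "p2": date(2017, 1, 1)}
results = [
    LabResult("p1", "ALT", 150.0, date(2017, 1, 20)),  # above 120 U/L within window: hit
    LabResult("p1", "ALT", 150.0, date(2016, 12, 1)),  # before drug start: ignored
    LabResult("p2", "ALT", 90.0, date(2017, 1, 15)),   # below 3 x ULN
    LabResult("p3", "ALT", 200.0, date(2017, 1, 15)),  # no drug exposure recorded
]
print([(r.patient_id, r.drawn_on.isoformat()) for r in signal_hits(results, drug_start, "ALT")])
```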
To assess the quality of the dataset, the number, rate, and percentage of reported serious adverse events should be considered (12). Using the extracted dataset, pharmacists and/or researchers then perform data-mining to predict ADRs. The data-mining and acquisition process exemplified in this article may help clinicians to understand ADRs. Moreover, data processing and scoring systems are required to evaluate the ADR alert system, through which clinicians optimize medication schedules at the time of prescription. The procedures of ADR data-mining and the acquisition methods using laboratory signal hits are also applicable to the assessment of other drug effects. The scoring method becomes more informative when other parameters, such as comorbidity, polypharmacy, gender, and age, are combined, yielding more reliable information on the severity, onset time and/or duration, and incidence rate of ADRs. The prevention of ADRs may ultimately contribute to reducing unnecessary healthcare costs.
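One way to combine such parameters is an additive score. The sketch below is a hypothetical illustration, not a validated instrument: the laboratory grade scale, the weights, the five-drug polypharmacy cut-off (a common convention), and the 65-year age threshold are all assumptions that would need calibration against outcome data.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical weights; a real score would be fitted and validated on outcome data.
WEIGHTS: Dict[str, int] = {"comorbidity": 1, "polypharmacy": 1, "older_age": 1, "female": 1}

@dataclass(frozen=True)
class PatientContext:
    lab_grade: int           # 0 = no signal hit ... 4 = most severe hit (hypothetical scale)
    n_comorbidities: int
    n_concurrent_drugs: int
    age: int
    sex: str                 # "F" or "M"

def adr_score(p: PatientContext, weights: Dict[str, int] = WEIGHTS) -> int:
    """Additive ADR alert score combining a lab signal grade with patient factors."""
    score = p.lab_grade
    score += weights["comorbidity"] * min(p.n_comorbidities, 3)  # cap the contribution
    if p.n_concurrent_drugs >= 5:                                 # polypharmacy cut-off (assumed)
        score += weights["polypharmacy"]
    if p.age >= 65:                                               # age threshold (assumed)
        score += weights["older_age"]
    if p.sex == "F":
        score += weights["female"]
    return score

print(adr_score(PatientContext(lab_grade=2, n_comorbidities=1, n_concurrent_drugs=6, age=70, sex="F")))  # 6
```

Keeping the weights in a separate table makes it straightforward to refit them without changing the scoring logic.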
To use ADR reports to prevent medication errors and improve the quality of pharmaceutical care, causality assessments of ADRs and criteria for ADR reporting must be developed. This requires validation of the ADR reporting process and feedback to medical teams on the potentially harmful effects of prescriptions. However, several obstacles must be overcome when EMR data are used to identify ADRs correctly.
First, there may be intrinsic errors, such as incorrect reporting, false-positive laboratory results, or incorrect grading (18). Patient compliance may also be an issue (e.g., a gap between the prescription and the patient’s actual drug administration), particularly in the following situations: 1) duplicate prescriptions ordered by different departments, even within a single hospital; 2) no prescription information from previous hospital(s); 3) no updated information after a prescription is withdrawn; and/or 4) no information on drug administration time (e.g., the exact time at which different drugs were administered). In addition, laboratory signals may be affected by the duration of drug treatment, although information on the exact treatment time or duration is not easy to retrieve. Treatment time and duration are critical factors for laboratory signals, and assuming a generic treatment time or duration may cause confusion about whether a drug-induced adverse effect occurred. Incorrect reporting, false-positive laboratory results, and/or incorrect grading lead to erroneous use of the EMR database. An example of incorrect reporting is a description error, such as confusing hypokalemia with hyperkalemia. Matching laboratory data to the description can resolve such errors, but this is not easy, may be time consuming, and is sometimes impossible. False-positive laboratory results may arise from improper specimen collection, inaccurate normalized values in the laboratory report, and/or concurrent opposite symptoms. These errors should be identified and rechecked when information is added, as well as when EMR datasets are extracted and mined.
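Matching a recorded description against the corresponding laboratory value can be partly automated. A minimal sketch for the hypokalemia/hyperkalemia example, assuming an adult serum potassium reference range of 3.5 to 5.0 mmol/L (an assumption; the local laboratory range should be used in practice):

```python
from typing import Optional

# Assumed adult serum potassium reference range (mmol/L); replace with the local laboratory range.
K_LOW, K_HIGH = 3.5, 5.0

# Laboratory category each diagnostic description should correspond to
EXPECTED = {"hypokalemia": "low", "hyperkalemia": "high"}

def potassium_category(value_mmol_l: float) -> str:
    if value_mmol_l < K_LOW:
        return "low"
    if value_mmol_l > K_HIGH:
        return "high"
    return "normal"

def description_matches_lab(description: str, value_mmol_l: float) -> Optional[bool]:
    """True/False if the description can be checked against K+, None if it is not covered."""
    expected = EXPECTED.get(description.strip().lower())
    if expected is None:
        return None
    return potassium_category(value_mmol_l) == expected

print(description_matches_lab("Hypokalemia", 3.1))   # True
print(description_matches_lab("Hyperkalemia", 3.1))  # False: flag for manual review
```

Records returning `False` would still need manual review, since the laboratory value itself may be a false positive.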
Second, a data acquisition method using laboratory signals in patients’ EMRs allows ADRs and other patient conditions to be monitored rapidly and conveniently (16,19). When such a method is built on EMRs and applied systematically, factors such as patient laboratory signals, age, organ function, pathological factors, co-administered drugs, and lifestyle should be considered (15,20–22). Because this method is retrospective, laboratory signal hits have been used for the analysis. However, extracting data from the EMR database has limitations because commercial database programs are difficult to use for data-mining (i.e., customized programs are necessary). Despite calls for greater transparency (20–23), code lists have seldom been reported in published papers (21,22). One possible reason for this poor quality of reporting is varied and inconsistent clinical coding, because EMR studies define clinical entities through such codes. When patient information, including laboratory signals, is provided by medical staff in hospitals, the lists of clinical codes are generally extracted and converted into “.csv” or “Excel” files. However, drug medication data vary and cannot be extracted perfectly by automated systems. Moreover, clinicians’ descriptions are sometimes important for observing patient conditions during medication and should be extracted after communication between clinicians and medical informatics staff. Inconsistent code selection, and the resulting small errors in the analysis of laboratory signals during EMR data-mining, may cause large numbers of misclassified patients and substantial inaccuracy in ADR prediction, producing biased outcomes and errors that affect conclusions in unpredictable ways (24). In particular, clinical definitions may change during the observation period, which may require changes to the code lists applied to patient information.
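Publishing a code list, and recording when each code was in force, makes changing clinical definitions explicit. A minimal sketch, with hypothetical hospital codes and dates, that maps a recorded code to a clinical entity only if the code was valid on the recording date:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class CodeListEntry:
    code: str
    entity: str
    valid_from: date
    valid_to: Optional[date] = None   # None means the code is still in use

# Hypothetical hospital codes: the liver-injury definition was re-coded at the start of 2011.
CODE_LIST: List[CodeListEntry] = [
    CodeListEntry("HOSP-LIV-01", "drug-induced liver injury", date(2005, 1, 1), date(2010, 12, 31)),
    CodeListEntry("HOSP-LIV-02", "drug-induced liver injury", date(2011, 1, 1)),
    CodeListEntry("HOSP-GI-01", "drug-induced gastrointestinal injury", date(2005, 1, 1)),
]

def entity_for(code: str, recorded_on: date,
               code_list: List[CodeListEntry] = CODE_LIST) -> Optional[str]:
    """Return the clinical entity for `code` if it was valid on `recorded_on`, else None."""
    for entry in code_list:
        in_force = entry.valid_from <= recorded_on and (
            entry.valid_to is None or recorded_on <= entry.valid_to)
        if entry.code == code and in_force:
            return entry.entity
    return None

print(entity_for("HOSP-LIV-01", date(2008, 6, 1)))  # drug-induced liver injury
print(entity_for("HOSP-LIV-01", date(2013, 6, 1)))  # None: code retired, review the record
```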
To improve ADR prediction from the EMR database with new analytical approaches, prescription sequence symmetry analysis (PSSA) may be used as a signal detection method (25). This method uses a simple, computationally rapid algorithm and requires a minimal dataset of only three fields: drug name, date of supply, and patient identifier. Another approach considers the rates of drug prescription and ADR occurrence simultaneously for ADR prediction (26). Because disproportionality analysis of drug prescriptions and ADR reports has limited sensitivity, it would be useful to create priority-based lists for signal detection in databases (26). In this method, receiver-operating characteristic curves, which capture the sensitivity and specificity of ADR detection, are used (Fig. 2).
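The core of PSSA can be sketched from exactly those three fields. In the simplified version below, which is not the full published method, each patient’s first supply of an index drug is compared with the first supply of a marker drug (for example, a drug that might be prescribed to treat a suspected ADR). The crude sequence ratio is the number of patients who started the index drug first divided by the number who started the marker drug first. Full PSSA implementations also restrict the analysis to incident users and adjust the ratio for prescribing trends over time, which this sketch omits; the drug names and the 365-day window are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

# One dispensing record: (patient identifier, drug name, date of supply)
Record = Tuple[str, str, date]

def crude_sequence_ratio(records: Iterable[Record], index_drug: str, marker_drug: str,
                         window_days: int = 365) -> Tuple[int, int, float]:
    """Return (index-first count, marker-first count, crude sequence ratio)."""
    first: Dict[Tuple[str, str], date] = {}
    for patient, drug, supplied in records:
        key = (patient, drug)
        if key not in first or supplied < first[key]:
            first[key] = supplied          # keep each patient's first supply of each drug

    index_first = marker_first = 0
    for patient in {p for p, _ in first}:
        a = first.get((patient, index_drug))
        b = first.get((patient, marker_drug))
        if a is None or b is None or a == b:
            continue                       # need both drugs, started on different days
        if abs((b - a).days) > window_days:
            continue                       # too far apart to suggest a sequence
        if a < b:
            index_first += 1
        else:
            marker_first += 1

    if marker_first:
        ratio = index_first / marker_first
    else:
        ratio = float("inf") if index_first else float("nan")
    return index_first, marker_first, ratio

records = [
    ("p1", "drug_X", date(2017, 1, 1)), ("p1", "drug_Y", date(2017, 2, 1)),
    ("p2", "drug_X", date(2017, 3, 1)), ("p2", "drug_Y", date(2017, 4, 1)),
    ("p3", "drug_Y", date(2017, 1, 1)), ("p3", "drug_X", date(2017, 5, 1)),
    ("p4", "drug_X", date(2017, 1, 1)),
]
print(crude_sequence_ratio(records, "drug_X", "drug_Y"))  # (2, 1, 2.0)
```

A ratio clearly above 1 suggests that the marker drug tends to follow the index drug; such a result is a signal for further review, not evidence of causality.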
Emerging evidence from numerous biomedical studies and advances in scientific technology in recent decades suggest that novel biomarkers may be valuable for predicting and/or diagnosing ADRs. Because ADRs affect many human organs and vary broadly in severity, diverse biological events (e.g., gene expression and signaling pathway activation) may be altered (27,28). This underscores the importance of identifying novel targets and/or biomarkers by elucidating complex molecular mechanisms, and of using them together with EMR database analysis in clinical situations (29). Integrating and understanding these factors may help set directions for pharmacovigilance studies, as suggested in Table 1.
One problem in the use of routine EMR data is that traditional biomarkers are limited in their ability to predict ADRs. Circulating protein biomarkers associated with specific tissue damage have been used for the diagnosis and/or prognosis of ADRs, irrespective of the etiology of the diseases and/or the drugs involved (30–34); the measurement of ALT and AST activities for liver injury is one example. Although these traditional biomarkers of tissue injury continue to be widely used, they have several limitations: they may become elevated only when tissues are significantly damaged (i.e., low sensitivity), and they may be produced by various organs and/or in response to various toxic stimuli (i.e., low specificity). In addition, they are insufficient to determine the precise mechanism of ADRs and/or the specific cell types affected. ALT often shows greater specificity than AST (31), but it has disadvantages, such as low sensitivity and possible alteration by comorbid conditions. To overcome these limitations, novel biomarkers must be identified and used in future pharmacovigilance studies. Indeed, recent progress in discovering new biomarkers based on genetic variations [e.g., genome-wide association studies (GWAS), non-coding RNAs, microRNAs], proteomics, gene networks, and signaling pathways has shown the benefits of new ADR indicators (8,30).
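Sensitivity and specificity, as used above, have standard definitions: sensitivity is the fraction of patients with the ADR whose marker reaches the decision threshold, and specificity is the fraction of patients without the ADR whose marker stays below it. The sketch below computes both at one threshold, together with the area under the receiver-operating characteristic (ROC) curve via its rank (Mann-Whitney) interpretation. The marker values and the threshold are hypothetical, not clinical cut-offs.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(cases: Sequence[float], controls: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Treat a marker value >= threshold as a positive test."""
    true_pos = sum(v >= threshold for v in cases)
    true_neg = sum(v < threshold for v in controls)
    return true_pos / len(cases), true_neg / len(controls)

def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC as P(case value > control value), counting ties as 1/2."""
    score = 0.0
    for c in cases:
        for d in controls:
            score += 1.0 if c > d else 0.5 if c == d else 0.0
    return score / (len(cases) * len(controls))

# Hypothetical marker values (e.g., ALT in U/L) in patients with and without DILI
with_adr = [150, 90, 300, 60]
without_adr = [30, 45, 80, 25]
print(sensitivity_specificity(with_adr, without_adr, threshold=100))  # (0.5, 1.0)
print(roc_auc(with_adr, without_adr))                                 # 0.9375
```

At this threshold the marker misses half of the cases (low sensitivity) while producing no false alarms; the AUC summarizes performance across all possible thresholds.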
In addition, mechanistic approaches to biomarker discovery can be applied to various tissues and organs commonly affected by ADRs, such as the liver, in which toxicity may be specific to certain cell types (30). High-mobility group box 1 (HMGB1) has been considered a prognostic biomarker for ADRs. It is released from necrotic hepatocytes and activated immune cells and is an important link between cell death, inflammation, and disease progression (30,35). HMGB1 isoforms have been shown to be more sensitive than ALT for predicting the development of DILI and adverse reactions caused by hepatotoxicants, such as acetaminophen overdose (35). Therefore, despite its limitations [i.e., time-consuming diagnostic assays (MS/MS) or the inability to distinguish between different acetyl and redox isoforms of HMGB1 (ELISA)] (35), HMGB1 is a promising novel candidate biomarker. ADRs can have a strong genetic predisposition, although it is difficult to discern the genetic components underlying any particular ADR. For example, human leukocyte antigen (HLA) alleles are highly polymorphic and are associated with different types of ADRs (36). Other examples include genes encoding drug-metabolizing enzymes and drug transporters (37,38). Thus, identifying genetic factors and implementing new genetic approaches contribute to the safer use of drugs, as already seen in some areas of clinical practice (37,38).
Genome-wide association studies (GWAS) continue to be used to offer a more comprehensive view of drug responses and ADRs (39). GWAS for ADRs are characterized by smaller sample sizes than GWAS for common diseases: often only dozens of cases and hundreds of controls are used, whereas GWAS for common diseases usually require thousands of cases and controls (39,40). ADRs are often related to immunological features, as many drugs or their metabolites can interact with major histocompatibility complex (MHC) molecules, and such associations were detected in early GWAS of ADRs (39,41). Despite the variety of drug hypersensitivity reactions, recent studies have reported links between ADRs and loci outside the MHC region, such as human leukocyte antigen (HLA) alleles (e.g.,
The human genome encodes RNAs that are not translated into proteins, known as non-coding RNAs (ncRNAs). These comprise microRNAs (miRNAs, a type of small non-coding RNA), intronic RNAs, repetitive RNAs, and long non-coding RNAs (lncRNAs). Their functions are related to the control of the transcription, stability, or translation of transcripts for the
The most well-studied ncRNAs are miRNAs, but many other types of ncRNAs with various lengths and characteristics may also have roles in regulating cellular homeostasis and disease progression (42,46). The functional impact of ncRNAs on human disease has been well described in research on abnormal miRNA expression patterns (46). For example, certain miRNAs can control various processes in tumorigenesis, neurodegenerative diseases, and cardiovascular disorders (46,47). In addition, the functions of small nucleolar RNAs (snoRNAs) and lncRNAs can also be impaired in disease (46).
MiRNAs control diverse biological processes through post-transcriptional regulation of their target genes. Specifically, miRNAs may modulate the expression of proteins responsible for pharmacokinetics, including cytochrome P450 (CYP) metabolizing enzymes and ABC or SLC transporters (48–53). Therefore, drug metabolism and disposition can be affected by miRNA-dependent alterations in gene expression and the consequent changes in biological function. Human CYP3A4 is the most abundant CYP isoform in organs such as the liver and small intestine and metabolizes more than 50% of drugs, including benzodiazepines, antivirals, and steroids (50). miR-27b and miR-298 act directly on the 3′ untranslated region (3′ UTR) of CYP3A4 mRNA (51). Other cytochrome P450 genes are also regulated by specific miRNAs [e.g., inhibition of CYP7A1 by miR-122a and miR-422a (54), and of CYP24A1 by miR-125b (55)]. Therefore, miRNAs may become valuable biomarkers for the detection of ADRs, as well as targets for drug discovery.
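Binding of a miRNA to the 3′ UTR of its target is commonly predicted from complementarity to the miRNA “seed” (nucleotides 2 to 8 from the 5′ end). The sketch below scans a 3′ UTR for perfect 7-nucleotide seed matches. The sequences are synthetic placeholders, not the real miR-27b, miR-298, or CYP3A4 sequences, and dedicated target prediction tools weigh many additional features (such as site type, conservation, and accessibility).

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, G<->C)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_match_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions in `utr` of sites complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]                 # nucleotides 2-8 (1-based) of the mature miRNA
    site = reverse_complement(seed)   # sequence the target mRNA must contain
    positions, start = [], utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions

mirna = "UACGAUCGGAUCCAUGGACUAG"                    # synthetic miRNA sequence
utr = "AAA" + "CGAUCGU" + "AAA" + "CGAUCGU" + "A"  # synthetic 3' UTR with two sites
print(seed_match_sites(mirna, utr))  # [3, 13]
```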
Many types of toxicants alter the expression of miRNAs in target organs. miRNAs are considered to be stable in plasma, and several miRNAs originating from various tissues (e.g., the liver) can be released into the bloodstream (56). Thus, miRNA dysregulation can be detected in easily obtainable biological fluids. Their early responses to toxic challenges, together with their stability, allow miRNAs to act as novel biomarkers for drug safety assessment (57–61). Recent developments in the discovery of biomarkers for DILI support the view that newly identified biomarkers can overcome the limitations of traditional markers for diagnosing ADRs and understanding their etiology (30). Specifically, miR-122, a liver-enriched miRNA, is a promising target. Circulating miR-122 has been shown to be specific for acute hepatocyte injury in acetaminophen overdose and to be more sensitive for the early detection of liver injury than traditional tests (58,59). Further research is necessary to validate miR-122 as a diagnostic or prognostic indicator of late-onset idiosyncratic DILI (30). A precise understanding of miRNA biology in the context of drug responses would therefore provide insight into specific tissue damage and the pathogenesis of injury (30). This would also provide valuable information for drug development during preclinical testing and early-phase human trials.
Given the functional roles of proteins in most cellular processes and the diverse proteomic patterns in response to environmental and/or chemical stress, including drugs, proteomics may also offer a new model for ADR assessment. A recent study applying large-scale proteomics of blood samples and pathway analysis revealed previously unknown effects of torcetrapib on immune and inflammatory functions, in addition to changes in the endocrine system, indicating that proteomic analysis can improve drug safety assessment (62). A comprehensive proteome-scale approach was developed to predict drug-protein interactions, providing information on ADRs as well as on drug repositioning (63). In addition, a method was proposed for predicting ADRs by integrating protein-protein interaction (PPI) networks with drug structures, and this integration significantly improved ADR prediction (64). Compared with genomics, proteomic techniques may convey different information and offer the advantage of early ADR detection; the two approaches therefore complement each other (65,66).
Many complex traits of drug response are associated with alterations in various biological pathways rather than with single-gene changes (67). A predictive framework has been presented in which gene expression data are transformed into the activity states of signal transduction circuits, defined as sub-pathways connecting receptor proteins to the effector proteins that ultimately trigger cellular responses (67). These mechanism-based biomarkers may provide insight into the molecular basis of drug actions (67).
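The idea of converting expression into circuit activity can be illustrated with a deliberately simplified model, which is an assumption for illustration and not the algorithm of the cited framework. In the sketch below, a unit signal enters at the receptor and is multiplied by each node’s normalized expression value along a linear chain, so that a drop in any intermediate node lowers the activity reaching the effector. Real pathway models must also handle branching and inhibitory interactions, which this sketch ignores; the node names and values are hypothetical.

```python
from typing import Dict, List

def linear_circuit_activity(circuit: List[str], node_values: Dict[str, float]) -> float:
    """Propagate a unit signal from receptor to effector along a linear chain.

    Each node value is assumed to be expression normalized to [0, 1]; the signal
    passed on by a node is the incoming signal multiplied by that node's value.
    """
    signal = 1.0
    for node in circuit:
        value = node_values[node]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{node}: normalized value must be in [0, 1]")
        signal *= value
    return signal

# Hypothetical receptor -> kinase -> effector circuit with normalized expression values
circuit = ["receptor_R", "kinase_K", "effector_E"]
baseline = {"receptor_R": 0.8, "kinase_K": 0.5, "effector_E": 1.0}
treated = {"receptor_R": 0.8, "kinase_K": 0.1, "effector_E": 1.0}
print(linear_circuit_activity(circuit, baseline), linear_circuit_activity(circuit, treated))
```

Comparing circuit activity between conditions, rather than individual gene levels, is what allows such a model to summarize a pathway-level change as a single value.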
Advances in genetic studies have allowed the mapping of individual genetic variations based on human genome sequencing (68). A number of reports suggest that genetic differences may be related to disease progression, drug responses, and ADRs (68,69). Recently, pharmacogenomic databases have been used to explain variations in biological events among individuals in the context of pharmacokinetics, pharmacodynamics, and adverse reactions (70,71). Indeed, findings from pharmacogenetic studies have already been applied clinically with a view to future customized therapies (70,72). In addition, technical advances in laboratory work and bioinformatics have allowed genetic research to be applied to more complex genetic diseases (69), suggesting its relevance to the evaluation of drug responses and ADRs, especially in oncology, neurodegenerative and cardiovascular diseases, and metabolic disorders.
Although research data showing the impact of pharmacogenomic biomarkers on drug responses have expanded rapidly, clinical application of these findings remains limited, and only a small fraction of the studies identifying novel predictive biomarkers of ADRs has been translated into clinical practice (30,71). It is important to use information on novel biomarkers in pharmacovigilance studies and to coordinate EMR data with multiple (genomic and non-genomic) biomarker panels, which can be integrated through different ‘-omics’ technologies (e.g., transcriptomics, proteomics, and metabolomics) to provide more detailed information on drug responses (30).
Given the roles of these molecules in biological functions, genetic alterations can significantly affect pharmacokinetics, drug actions, and other factors involved in therapeutic outcomes and ADRs (71,73). Of note, unpredictable, individual-specific ADRs are considered a major risk factor for safe and successful therapy (69,74). Therefore, the discovery of genetic factors affecting ADRs would greatly help to reduce medical problems and mortality in a subset of the population (69,75). Among drug-metabolizing enzymes, genetic variations in cytochrome P450s (CYPs) have been well studied. One good example is CYP2D6, one of the main enzymes responsible for drug biotransformation (i.e., of 20%–25% of clinical drugs) (76). According to the differences in gene copy numbers of
Advances in technology have allowed us to extend the application of novel biomarkers to the EMR database. For example, an ideal tool for identifying ADRs in patients at different phases of disease would combine disease-specific biomarkers with traditionally monitored EMR parameters (80). Key factors for the successful clinical application of pharmacogenomic data include the development of clinical guidelines that ensure consistent interpretation and prescribing practices, in addition to evidence-based information databases and appropriate educational programs for decision making (71,76).
This work was supported by the Education and Research Encouragement Fund of Seoul National University Hospital.
Table 1. Potential new targets and/or biomarkers for pharmacovigilance studies
| Category | Targets and/or biomarkers | Pharmacological effects and/or ADRs | References |
|---|---|---|---|
| Genetic variations (traditional) | CYP2D6 | Drug metabolism (rapid or slow metabolizers) | (75,76) |
| | SLCO1B1 | Statins (myopathy) | (77) |
| | VKORC1 | Warfarin (anticoagulant effect) | (78,79) |
| miRNAs (novel) | miR-122 | Acetaminophen (hepatotoxicity) | (57) |
| | miR-27b, miR-298 | Drug metabolism [possible effects on drugs metabolized by CYP3A4 (e.g., benzodiazepines, antivirals, and steroids)] | (50,51) |
| | miR-378 | Drug metabolism [possible effects on drugs metabolized by CYP2E1 (e.g., acetaminophen, isoniazid)] | (52,53) |
| | miR-122a, miR-422a | Bile acid synthesis (CYP7A1); possible effects on ADRs affecting the liver and biliary system and/or on responses to statins | (54,82) |
| | miR-125b | Vitamin D3 metabolism (CYP24A1); possible effects on ADRs affecting cancer susceptibility and/or calcium homeostasis | (55,83) |
| | miR-124, miR-18a-5p | Skin blistering reactions (Stevens-Johnson syndrome/toxic epidermal necrolysis, SJS/TEN) | (60,61) |
| Secretory proteins (novel) | HMGB1 | Tissue injury, immune response, acetaminophen (hepatotoxicity) | (35,38) |