Hubble: Linked Data Hub for Clinical Decision Support
Rinke Hoekstra1,3, Sara Magliacane1, Laurens Rietveld1, Gerben de Vries2,
Adianto Wibisono2, and Stefan Schlobach1
1 Department of Computer Science, VU University Amsterdam, The Netherlands
2 Department of Computer Science, University of Amsterdam, The Netherlands
3 Leibniz Center for Law, University of Amsterdam, The Netherlands
Abstract. The AERS dataset is one of the few remaining large, publicly available medical datasets that have not yet been published as Linked Data. It is uniquely positioned amidst other medical datasets.
This paper describes the Hubble prototype system for clinical decision support, which demonstrates the speed, ease and flexibility of producing and using a Linked Data version of the AERS dataset for clinical practice and research.
Keywords: linked data, adverse event, clinical decision support, healthcare
This paper describes Hubble, a prototype system for clinical decision support that demonstrates the ease with which (medical) legacy data can be turned into RDF, linked to other datasets, and used to support both clinical research and clinical practice. At the heart of the system lies a Linked Data version of the Adverse Event Reporting System (AERS) dataset of the US Food and Drug Administration (FDA). To some extent, this exercise can be categorised under the heading ‘yet another exposing of data as Linked Data’. However, the system convincingly demonstrates three important selling points of Linked Data: interoperability, interlinking, and tool availability. In particular, the system shows the huge difference in usability between the original ‘dead’ dataset and the ‘live’ Linked Data version. It shows how quickly this can be achieved using standard tools, and it validates the standard three-tier architecture that separates data, application logic and presentation. We validate the quality of the dataset by comparing it to the results of a real use scenario on the AERS dataset.
Clinical Decision Support Clinical decision support (CDS) can be defined as "the use of the computer to bring relevant knowledge to bear on the health care and well being of a patient" [2]. Clinical guidelines play a central role in CDS systems; they contain the consolidated knowledge on patient treatment. However, guidelines are slow movers, decided upon in periodic conferences where new evidence is weighed for updating a guideline document. The evidence itself, by contrast, accumulates at a tremendous pace: there are numerous clinical trials and well over 10,000 publications on breast cancer every year. Therefore, a CDS system should bring together patient information, relevant guidelines and important new findings in clinical research. In this context, the key challenge is how to ensure that the information presented by the CDS system is relevant and trustworthy.
The Data The fields of health care and life science (HCLS) have traditionally received a lot of attention from the Semantic Web community, and vice versa: Semantic Web languages and their predecessors have proven to be a convenient paradigm for representing biomedical knowledge. Vocabularies in the HCLS field are highly standardised, and computer analysis and computer-based information exchange are ubiquitous throughout the field (unlike, e.g., the Humanities). As a result, many (bio)medical databases and terminologies are now published as Linked Data, taking up about a fourth of the Linked Data cloud. Examples are medical vocabularies such as SnomedCT, MeSH, MedDRA, and the NCI Thesaurus (all part of the Unified Medical Language System (UMLS)),4 and datasets such as LinkedCT (clinical trials), Sider, Drugbank and RxNorm (drug information), and Uniprot (protein sequences), to name but a few.
The AERS dataset is one of the few remaining large, publicly available medical datasets that have not yet been published as Linked Data. An adverse event (AE) is an adverse change in health, or a side effect, that occurs while the patient is receiving treatment. A serious adverse event (SAE) is one that, amongst others, results in death, is life-threatening, requires hospitalisation or prolongation of existing hospitalisation, or results in persistent or significant disability or incapacity. Known chemotherapy-related SAEs in breast cancer (US only) were linked to 22% of hospitalisations. Clearly, from a clinical perspective, serious adverse events are very important: this is where CDS can make a huge difference.
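The seriousness criteria above amount to a simple rule: an AE is serious if at least one of its reported outcomes meets one of the criteria. A minimal sketch of that rule follows. The outcome labels, the `AdverseEvent` class and `is_serious` are hypothetical names chosen for illustration; they are not AERS field codes. Because the definition says "amongst others", the set is not meant to be complete.

```python
from dataclasses import dataclass, field

# Hypothetical outcome labels mirroring the seriousness criteria named in the text.
# The definition is not exhaustive ("amongst others"), so extend this set as needed.
SERIOUS_OUTCOMES = frozenset({
    "death",
    "life-threatening",
    "hospitalisation",   # initial or prolonged
    "disability",        # persistent or significant disability/incapacity
})


@dataclass
class AdverseEvent:
    term: str                                    # e.g. a CTCAE/MedDRA term
    outcomes: set = field(default_factory=set)   # outcome labels taken from the report


def is_serious(event: AdverseEvent) -> bool:
    """An AE counts as serious if any of its outcomes meets a seriousness criterion."""
    return not SERIOUS_OUTCOMES.isdisjoint(event.outcomes)


print(is_serious(AdverseEvent("Diarrhoea", {"hospitalisation"})))  # True
print(is_serious(AdverseEvent("Nausea", {"recovered"})))            # False
```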
System Description
Hubble follows a three-tier architecture, consisting of:

a) a 4Store triple store,5 containing the AERS dataset (AERS-LD), CTCAE,6 and a selection of DBpedia, Sider, and Drugbank;7
b) a set of SPARQL 1.1 queries and some server-side code; and
c) a client interface built with the Smart GWT Java framework.8

This section briefly discusses how we convert and link data, annotate documents and present the results to a user.
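In a three-tier set-up like this, the presentation tier does not query the triple store itself. Instead, the middle tier sends SPARQL 1.1 queries over HTTP. Below is a minimal sketch of such a middle-tier call, assuming a standard SPARQL 1.1 Protocol endpoint that can return JSON results. The endpoint URL and the function names are hypothetical. The paper does not describe Hubble's actual server-side code.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL: point this at the actual 4Store SPARQL endpoint.
ENDPOINT = "http://localhost:8080/sparql/"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the request and decode the JSON result document (needs a live endpoint)."""
    with urllib.request.urlopen(build_sparql_request(query, endpoint)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the query layer testable without a running store.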
4 See http://www.nlm.nih.gov/research/umls/.
5 See http://4store.org.
6 CTCAE, a subset of MedDRA, lists AEs for cancer therapy: http://bit.ly/zOVPUt.
7 See http://dbpedia.org, http://www4.wiwiss.fu-berlin.de/sider/ and http:
8 See http://code.google.com/p/smartgwt/.
Data Conversion & Linking The AERS data files are published on a quarterly basis as zip files containing dollar-separated tables. These zip files are roughly 20MB in size and are available from the FDA website on two separate static web pages.9 Converting this data is a five-step process: 1) scrape the FDA website, and download and unzip the data dump; 2) check the integrity of the files, applying fixes if necessary;10 3) import the data into a MySQL database; 4) dump the data to RDF following a D2RQ mapping;11 and 5) import the data into 4Store.12 This conversion was implemented as a pipeline invoked through a Python provenance wrapper, which generates provenance information expressed in the PROV-O vocabulary.13 Due to hardware limitations, we had to restrict the dataset to the years 2011 and 2012 (first two quarters), resulting in a total size of 80M triples.
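Step 2 of the pipeline, checking integrity and applying fixes, is the least standard part. Below is a minimal sketch of such a repair pass, assuming each file starts with a header line and that records carry no trailing `$`. The function name and the repair policy are illustrative and do not reproduce the authors' implementation.

```python
def read_dollar_table(text: str):
    """Parse a '$'-separated table, repairing records split by stray line breaks.

    Assumes the first line is a header and that records carry no trailing '$'.
    Returns (header, rows, rejected). Records with too many fields (e.g. an
    unescaped '$' inside a value) are set aside for manual inspection.
    """
    lines = text.splitlines()
    header = lines[0].split("$")
    rows, rejected = [], []
    buffer = ""
    for line in lines[1:]:
        # Glue a continuation line onto the pending record with a space.
        buffer = f"{buffer} {line}" if buffer else line
        fields = buffer.split("$")
        if len(fields) < len(header):
            continue                   # record seems to go on; read the next line
        if len(fields) == len(header):
            rows.append(fields)
        else:
            rejected.append(buffer)    # ambiguous: cannot tell which '$' is data
        buffer = ""
    if buffer:
        rejected.append(buffer)        # file ended mid-record
    return header, rows, rejected
```

One limitation of this policy is worth noting. A record that genuinely spans too few columns will absorb the following record, and the merged text then ends up in `rejected` rather than being silently misparsed.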
The AERS dataset is uniquely positioned amidst other HCLS datasets, providing opportunities for linking to drug-, location-, patient- and diagnosis-related information. Furthermore, reports in AERS are filled in by hand. Linking out to other datasets could help in identity reconciliation (e.g. of drug names, marketing names, and chemical substances) as well as in detecting misspellings (e.g. in manufacturer names). We specified mappings between the UMLS, Sider, LinkedCT, Drugbank, DBpedia and CTCAE datasets using the SILK link specification language [4], resulting in over 60K links based only on exact string matches.14
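SILK expresses such link rules declaratively. The underlying idea of exact-match linking can be sketched in a few lines of Python: index one dataset by label, then look up the labels of the other. This avoids comparing every pair of resources. The sketch below is not SILK's API. The case folding and the `owl:sameAs` link type are assumptions; the text only states that exact string matches were used.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def exact_match_links(source: dict, target: dict, fold_case: bool = True,
                      link_type: str = OWL_SAME_AS) -> list:
    """Link each source URI to every target URI that carries the same label.

    `source` and `target` map URIs to labels. Labels are only trimmed and,
    if `fold_case` is set, case-folded; no fuzzy matching is attempted.
    """
    def key(label: str) -> str:
        label = label.strip()
        return label.casefold() if fold_case else label

    # Hash index on the target labels, so each source label needs one lookup.
    index = defaultdict(list)
    for uri, label in target.items():
        index[key(label)].append(uri)

    return [(uri, link_type, match)
            for uri, label in source.items()
            for match in index.get(key(label), [])]
```

With hypothetical URIs, an AERS drug labelled `CAPECITABINE` would link to a target labelled `Capecitabine`, while a marketing-style entry such as `XELODA 500MG` stays unlinked. That is the price of exact matching (see footnote 14).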
Annotations Step two is the automatic annotation of scientific publications and clinical guidelines (available as PDF files) using the vocabularies in the repository. This process has three steps: stripping PDF documents to plain text, indexing the plain-text documents, and generating annotations. We use the PDFBox library15 for conversion to plain text. Each document is then divided into separate paragraphs, dubbed ‘chunks’. For each chunk, we store the coordinates of its bounding box in the PDF. The chunks are then indexed for terms (including synonyms) from the CTCAE ontology, using Lucene.16
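The chunk-and-index step can be sketched without Lucene. Below is a minimal sketch of that step in Python, assuming chunks are separated by blank lines in the extracted text. PDFBox's own paragraph detection works differently, and the bounding boxes are omitted here. A regular-expression scan stands in for the Lucene index. The function names and vocabulary entries are illustrative.

```python
import re
from collections import defaultdict


def split_chunks(text: str) -> list:
    """Split extracted plain text into paragraph 'chunks' at blank lines."""
    return [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]


def index_terms(chunks: list, vocabulary: dict) -> dict:
    """Map each concept to the ids of the chunks that mention it or a synonym.

    `vocabulary` maps a preferred label to a list of synonyms. Matching is
    case-insensitive on whole words, a crude stand-in for a Lucene index.
    """
    index = defaultdict(set)
    for concept, synonyms in vocabulary.items():
        patterns = [re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
                    for term in [concept, *synonyms]]
        for i, chunk in enumerate(chunks):
            if any(p.search(chunk) for p in patterns):
                index[concept].add(i)
    return dict(index)
```

Indexing synonyms under their preferred label is what lets a guideline that says "hand-foot syndrome" be found via the corresponding CTCAE term.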
The Annotation Ontology (AO) is a vocabulary for annotating scientific publications and documents on the Web. AO has a lightweight provenance model, which allows storing information about the authors, curation and different versions of each annotation. We use the AO format [1] to represent an annotation for every term found, using a prefix-postfix selector to identify its position inside a chunk. The advantage of this method is that annotations persist across different manifestations of the same document (PDF, HTML, XML, etc.). We also store an image selector based on the chunk bounding box; this image selector is used to highlight the relevant part of the PDF document.
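The reason a prefix-postfix selector survives a change of format is that it records the matched text together with its surrounding context, rather than character offsets. The selector can then be re-anchored in any rendering that preserves those words. Below is a minimal sketch of that idea, modelled loosely on AO's prefix-postfix selector. The class and function names are hypothetical, and the sketch does not serialise anything to RDF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrefixPostfixSelector:
    prefix: str    # text just before the term
    exact: str     # the term itself
    postfix: str   # text just after the term


def make_selector(chunk: str, start: int, end: int, context: int = 20) -> PrefixPostfixSelector:
    """Describe chunk[start:end] by its text plus up to `context` characters on each side."""
    return PrefixPostfixSelector(
        prefix=chunk[max(0, start - context):start],
        exact=chunk[start:end],
        postfix=chunk[end:end + context],
    )


def resolve(selector: PrefixPostfixSelector, text: str) -> Optional[int]:
    """Return the offset of the selected term in `text`, or None if it cannot be found.

    Works on any rendering (PDF text, HTML, XML) as long as the surrounding
    words survive unchanged; no offsets from the original rendering are needed.
    """
    pos = text.find(selector.prefix + selector.exact + selector.postfix)
    return None if pos < 0 else pos + len(selector.prefix)
```

A selector built on the plain-text chunk "Grade 3 neutropenia occurred in 12 patients." still resolves inside an HTML rendering of the same sentence wrapped in `<p>` tags.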
9 See http://1.usa.gov/uyoAI.
10 For instance, some rows contain line breaks in the wrong places, do not properly escape the separator character, or span fewer columns than expected.
11 See https://github.com/cygri/d2rq.
12 Unfortunately, exposing the data through D2R Server turned out to be too slow.
13 PROV-O-Matic, see http://github.com/Data2Semantics/, currently in the alpha stage of development. PROV-O: see http://www.w3.org/TR/prov-o/.
14 Using less exact matching on drug names can have unwanted consequences.
15 See http://pdfbox.apache.org/.
16 See http://lucene.apache.org
Fig. 1: Examples of possible visualisations (a) and Hubble interface (b)
User Interface The Hubble interface is our prototype CDS system.17 It lists patients, shows information about a selected patient, and presents more detailed information about specific parts of the patient information at the bottom (Fig. 1b). This is done in several steps, each consisting of a single SPARQL query.
First, we retrieve a list of available patient records. Second, when the user selects a record, we retrieve the actual patient data: detailed patient information (diagnoses, drugs, age, etc.), enriched with information from the Linked Life Data (LLD) endpoint (e.g. a drug description tooltip).18 At the same time, we retrieve annotations that match the patient description and display small snippets of clinical guidelines and relevant literature. From this detailed information, the user can drill down to:

- more information about a diagnosis (taken from LLD);
- similar cases in AERS-LD;
- drug information, such as its chemical structure (from LLD and Drugbank), and common AEs related to the drug;
- provenance information about an annotation; and
- the underlying text.

We have intentionally limited the amount and diversity of information presented through the interface, pending feedback from expert users.
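Because each interface step is a single SPARQL query, the client mostly needs to turn SPARQL results into rows for display. Below is a sketch for the first step (listing patient records), assuming results arrive in the standard SPARQL 1.1 Query Results JSON format. The query's `ex:` vocabulary and the function name are hypothetical, since the paper does not publish its queries.

```python
# Hypothetical query for the first step (listing patient records); the
# ex: vocabulary is invented for illustration.
PATIENT_LIST_QUERY = """
PREFIX ex: <http://example.org/hubble/>
SELECT ?patient ?age WHERE {
  ?patient a ex:PatientRecord .
  OPTIONAL { ?patient ex:age ?age }
}
"""


def bindings_to_rows(result: dict) -> list:
    """Flatten a SPARQL 1.1 JSON result document into one dict per solution.

    Variables left unbound in a solution (e.g. by OPTIONAL) map to None.
    """
    variables = result["head"]["vars"]
    return [{var: b[var]["value"] if var in b else None for var in variables}
            for b in result["results"]["bindings"]]
```

Mapping unbound variables to `None` keeps every row the same shape, which is convenient for a tabular client widget.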
The AERS-LD repository is publicly accessible through its SPARQL endpoint and can be browsed through a customised Pubby browser interface.19 This is arguably a more actionable presentation than dollar-separated files. The endpoint turned out to be very well suited for various visualisations of the underlying data (Fig. 1a).20
17 See http://aers.data2semantics.org/prototypeInterface.
18 See http://linkedlifedata.com.
19 See http://aers.data2semantics.org for more information. Pubby: http://www4.
20 Built directly from the endpoint using Sgvizler, http://sgvizler.googlecode.com/
Fig. 2: Number of co-occurrences of Adverse Events and 5-FU (Y-axis). The X-axis represents the ranking of AEs based on the one in [3].
Evaluation and Discussion
The Hubble CDS prototype was built in a very short period (literally over Christmas) and is already showing real potential for clinical research. We validated the dataset by comparing results from AERS-LD with a study into the co-occurrence of AEs with two drugs (5-FU and Capecitabine) [3]. That study was preceded by a labour-intensive effort to clean the dataset: consolidating multiple names for drugs, and removing duplicate submissions and non-drug entries. We compared AE-drug co-occurrence on the same selection of AEs as in [3], with and without taking advantage of the 60K links (see above) in AERS-LD. We did not apply any other data cleaning or harmonisation, and we used only a limited dataset. The result, depicted in Fig. 2, is far from perfect. Nevertheless, it shows that the Linked Data cloud provides a huge bootstrap for improving the quality of results.
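The comparison boils down to counting, per AE, the reports in which it co-occurs with a given drug. This is done once on raw drug names and once after using the links to merge name variants. Below is a minimal sketch of that count. The report structure and the `aliases` mapping are illustrative assumptions; `aliases` is assumed to be derived from the link set, e.g. by mapping every name linked to the same Drugbank entry onto one canonical name. This is not the procedure used to produce Fig. 2.

```python
from collections import Counter
from typing import Optional


def cooccurrences(reports, drug: str, aliases: Optional[dict] = None) -> Counter:
    """Count, for each adverse event, the reports that list both it and `drug`.

    `reports` yields (drug_names, adverse_events) pairs, one per report.
    `aliases` maps raw drug names to a canonical name; it stands in for the
    links to external datasets. Without it, only identical names are counted.
    """
    aliases = aliases or {}
    counts = Counter()
    for drugs, events in reports:
        if drug in {aliases.get(name, name) for name in drugs}:
            counts.update(set(events))   # each AE counted once per report
    return counts
```

Running this with and without the alias map shows the effect of the links directly. Reports that name the same drug differently are only counted in the second run.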
Future work includes:

- publishing the full AERS-LD dataset (all 7 years);
- increasing both the breadth and depth of annotations through supervised annotation of guidelines, combined with large-scale annotation of scientific publications; and
- improving the selection and ranking of query results in the Hubble interface based on annotations, citation indexes, and provenance-related information.
References
1. Ciccarese, P., et al.: An open annotation ontology for science on web 3.0. Journal of Biomedical Semantics (2011)
2. Greenes, R.A.: Clinical Decision Support: The Road Ahead. AP/Elsevier Science and Technology (January 2007)
3. Kadoyama, K., et al.: Adverse event profiles of 5-Fluorouracil and Capecitabine: Data mining of the public version of the FDA adverse event reporting system, AERS, and reproducibility of clinical observations. Int. J. Med. Sci. 9, 33–39 (2012)
4. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Silk – a link discovery framework for the web of data. In: 2nd Workshop about Linked Data on the Web (LDOW2009)
Source: http://laurensrietveld.nl/pdf/Hubble_Linked_Data_Hub_For_Clinical_Decision_Support.pdf