Enhanced Data Quality Improved Predictive Modeling at a World-Renowned Hospital


Sagence’s expertise in data quality, data visualization and enterprise data warehousing helped a world-renowned hospital use predictive models to improve outcome measures. We did this by tracking the frequency and location of data entry errors, creating an enterprise data warehouse to connect disparate systems and developing robust analytic dashboards. Our solution gave clinicians, data scientists and administrators access to data previously siloed across the hospital. With higher-quality data feeding the models and dashboards, and with a greater ability to visualize and share that data, our client is making data-driven decisions that make a difference in people’s lives.


The hospital had invested in data scientists to build predictive models designed to help psychometricians refine instruments and improve outcome measures. But, as is common across industries, there was little confidence in the quality of the data feeding those models. According to a recent report published by CrowdFlower, data scientists still spend 53% of their time collecting, labeling, cleaning and organizing data. Only 9% of their time is spent refining algorithms, 10% on mining data for patterns and 19% on building and modeling data. Harvard Business Review reported in 2016 that IBM estimated the yearly cost of poor data quality at $3.1 trillion in the US alone. The hospital knew it needed to improve data quality.


Remediating the data quality issues presented a challenge for the data scientists. Identifying when, where and by whom poor data was being entered into the hospital’s EHR (the hospital uses Cerner as its health information technology platform) was nearly impossible because doing so relied on controls set in Cerner’s front end. Further complicating the hospital’s ability to make data-driven decisions was a lack of data visualization. While a data scientist can quickly make sense of numbers in a spreadsheet, hospital administrators found the task time-consuming and wanted a better solution that would work across departments and job functions.


The root of the hospital’s problem wasn’t a lack of vision, unqualified data scientists, or a badly defined data strategy. They were aware that experience and expertise matter and they knew they required a data specialist to help them. They turned to Sagence to find a solution to the data quality and data visualization challenges and to build a data warehouse to bridge existing data silos.


Sagence first set out to solve the data quality issue, the hospital’s barrier to data-driven decision-making. The team scanned 1,000+ data fields across 50+ therapeutic assessments and five disciplines to determine how to visualize the data and to provide plots of numeric and categorical data. We then interviewed therapists, nurses and clinicians to learn acceptable thresholds for outcome measurements. For example, nurses were asked to provide limits for heart rate, body temperature and other vital signs that, when exceeded, would be life-threatening. With this information, the team created a valid values table to detect data quality issues. We wrote data quality rules and used them to develop two analytic dashboards that represent the data visually. We also evaluated business intelligence tools for the hospital, which led to a software selection.
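To illustrate how a valid values table can drive data quality rules, here is a minimal Python sketch. It is not the implementation used at the hospital: the field names, the numeric limits and the `check_record` helper are all hypothetical, and the limits are placeholders rather than clinical guidance.

```python
from dataclasses import dataclass

# Hypothetical valid values table: field name -> (lower limit, upper limit).
# These limits are illustrative placeholders, not clinical guidance and not
# the thresholds the hospital's nurses actually supplied.
VALID_VALUES = {
    "heart_rate_bpm": (30.0, 220.0),
    "body_temp_c": (32.0, 42.0),
}


@dataclass
class Issue:
    record_id: str
    field_name: str
    value: object
    reason: str


def check_record(record_id, record):
    """Return the data quality issues found in one record (dict of field -> value)."""
    issues = []
    for field_name, (low, high) in VALID_VALUES.items():
        value = record.get(field_name)
        if value is None:
            issues.append(Issue(record_id, field_name, value, "missing"))
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            issues.append(Issue(record_id, field_name, value, "not numeric"))
            continue
        if not low <= number <= high:
            issues.append(Issue(record_id, field_name, value, "out of range"))
    return issues
```

Keeping the limits in a table rather than in the rule logic means clinicians can revise a threshold without anyone touching the checking code.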


The first dashboard had an immediate impact on the data scientists’ ability to use their predictive models with confidence. It tracks data quality issues and monitors the overall health of the hospital’s therapeutic assessment and clinical measures data stored in Cerner by analyzing the frequency and location of data entry errors. Because the dashboard shows when and where data entry errors occur, errors can be remediated promptly. As a result, the data is either clean before it goes into the models, or the models can later be adjusted for known data quality issues.
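The kind of aggregation behind a frequency-and-location view can be sketched in a few lines of Python. This is an assumption about the general approach, not a description of the actual dashboard: the `summarize_errors` function and the `unit` and `entered_on` keys are hypothetical.

```python
from collections import Counter


def summarize_errors(issues):
    """Count flagged entries by location and by calendar month.

    `issues` is an iterable of dicts with two hypothetical keys:
    "unit" (where the entry was made) and "entered_on" (ISO date string).
    """
    issues = list(issues)  # allow generators; we iterate twice
    by_unit = Counter(issue["unit"] for issue in issues)
    # The first seven characters of an ISO date are the year and month.
    by_month = Counter(issue["entered_on"][:7] for issue in issues)
    return by_unit, by_month
```

Calling `by_unit.most_common()` on the result ranks locations by error count, which is one simple way to decide where re-education would help most.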


The second dashboard enables nurses, doctors, therapists, statisticians and researchers to understand basic usage statistics for various therapeutic assessments, such as the time it takes to climb a flight of stairs, and for groups of clinical measures, such as oral temperature or heart rate.


Therapists often rely on academic papers to provide norms for patient performance on assessments; however, these papers frequently report small sample sizes (50-100) and use sample populations that do not match a hospital’s patient mix. Our client is now using the data collected in the enterprise data warehouse to establish hospital-specific norms, providing more comprehensive context for patient performance and progress. Therapists can view plots of 400+ data fields, including the distributions of assessment scores for the hospital’s population, the relative frequency of clinical measures regarding patient functionality and potential outliers in the data.
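As a rough sketch of how hospital-specific norms and potential outliers might be derived from collected scores, the Python below computes quartiles and flags values outside Tukey's fences (1.5 times the interquartile range). The source does not say which outlier method was used, so the fence rule and the `summarize_scores` function are assumptions chosen for illustration.

```python
import statistics


def summarize_scores(scores):
    """Return quartiles and Tukey-fence outliers for a list of numeric scores."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [s for s in scores if s < low_fence or s > high_fence]
    return {
        "n": len(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": outliers,
    }
```

A flagged value is only a candidate for review: it may be a data entry error, or it may be a genuine but unusual patient result.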


The enterprise data warehouse virtually eliminates the need for analysts to collect data by providing it to them directly. With the new dashboards, clinicians, data scientists and administrators are observing distributions and basic statistics for numerous clinical measures. This information gives them an understanding of the data and its associated value ranges, enabling data-driven decision-making, re-education and continuous improvement in areas such as outlier detection, execution of therapeutic assessments and data intake methods.