Establishing Rosetta Stones for Your Company’s Historical Data

Far too often, designers of relational data models forget the lesson of the Rosetta Stone. They focus their time and energy on modeling data scenarios that exist right now, forgetting that the dimension of time must always be accounted for in a database. Without this consideration, older data can quickly become a useless and forgotten inconvenience, rather than an asset that can be leveraged for measuring historical trends and for recreating singular points in time.

After the fall of the Western Roman Empire, the ability to read and translate the Ancient Egyptian language was lost. Egyptian hieroglyphics were common sights in ruins and artifacts that survived from the days of the pharaohs, but over the subsequent 1,400 years, no living scholar possessed the knowledge to understand them. This all changed in 1798 after the discovery of a notable slab in the town of Rosetta. This Rosetta Stone was inscribed with text for a decree commemorating the crowning of King Ptolemy V in Egypt, and the decree was copied from Egyptian hieroglyphics to Egyptian demotic and then to ancient Greek, all on the same stone. In 1822, the French scholar Jean-François Champollion was able to draw upon knowledge of the Greek alphabet and a basic understanding of demotic to produce an alphabet for the Egyptian hieroglyphics. Using this basic roadmap, the world’s collective knowledge of the Egyptian language expanded so that surviving relics and texts around the world could be translated.

Far too often, designers of relational data models forget the lesson of the Rosetta Stone. They focus their time and energy on modeling data scenarios that exist right now, forgetting that the dimension of time must always be accounted for in a database. Without this consideration, older data can quickly become a useless and forgotten inconvenience, rather than an asset that can be leveraged for measuring historical trends and for recreating singular points in time. Why is the proper storage of historical data so important for an organization? An example may help.

At health insurance companies, benefits data such as covered services and deductible limits are maintained for each health plan. On rare occasions, this data changes outside of the normal enrollment transfer periods. For example, a state government might mandate that a particular service is covered by all insurance plans starting on a particular date, or a minor plan provision might be changed to correct a mistake in the benefits setup process (yes, this does happen). So if a claim arrives on May 7 for adjudication, the claim must be processed according to the rules in place on the claim’s date of service, which we’ll assume is April 29. But what would happen if the plan’s benefits changed on May 1? The claim might need to be adjudicated against a snapshot of the benefits that existed before May 1.

When the requirements are this complex, some data modelers choose to incorporate two sets of start and end dates, one for the business effective date of the record and one for the system effective date. This approach causes new implications. When looking at historical claims, how do we know which set of benefits were in place at the time? The solution could be the storage of a pointer to the benefits rule package that was effective at the time. In certain examples like this, the current snapshot of data is used more than 99% of the time, while historical data is only needed in less than 1% of transactions. An additional strategy for dealing with this type of historical data is to store data in tables that span the time dimension but then use a view exposing only the most recent point-in-time data. Most transactions can query the view, but the base tables would be used to answer any questions about historical data.

There are no hard and fast rules for these tricky concepts. Data architecture always has been more art than science, and each of these concepts can inspire an entire textbook’s worth of discussion. The good news for companies is that many talented data architects exist, ones who understand the subtleties that are required to construct a relational model that properly accounts for time. There’s no need for historical data to be an untranslatable relic of the past. Instead, well-crafted data architecture can unlock benefits and insights from historical and point-in-time data.

Contributed by Kevin Henderson and Alan Matsumura.