Fitting a Square Peg in a Round Hole – Managing Unstructured Data

Enterprise data is expected to grow by 800% in five years, with 80% of that growth represented by unstructured data. Yet extracting intelligence from large volumes of unstructured data remains one of the biggest challenges organizations face. Software will need to match on the meaning of words, or more specifically their intent, rather than on the mere presence of keywords. Is your company prepared for this growth? Sagence’s text analytics and information extraction solution can help meet this challenge.

Watch a baby play with a shape-sorting cube, the kind where they have to take, say, a square block and fit it into its matching slot, and you’ll quickly see why traditional computing software finds unstructured data challenging. Unstructured data today includes everything from images and PDF documents to social media posts and website content, none of which fits neatly into the clearly defined tables of a relational database. Because unstructured data isn’t formally described and organized according to a relational model, it is difficult to manage, store, and analyze efficiently with existing tools and techniques.

Yet the growth of unstructured data is simply a reality businesses can no longer ignore. Gartner predicts that enterprise data will grow by 800% in five years, with 80% of that growth represented by unstructured data. This explosion contributes directly to the “Big Data” problems that businesses face. Is your company prepared for this growth? If your answer is no, you aren’t alone.

Just as a baby hasn’t yet developed the spatial skills needed to fit a square block into its slot, most businesses have yet to develop the solutions, skill sets, and tools needed to manage their unstructured data. Many companies keep it in silos such as email systems, file shares, and cloud storage, which makes it difficult for people to find the data they need. Traditionally, companies resolve heterogeneous, siloed data problems by integrating and structuring the data in enterprise data warehouses built with Extract-Transform-Load (ETL) technologies. For unstructured data, however, ETL-style data movement is neither feasible nor economical.

According to Gartner, one of the biggest challenges organizations face is extracting intelligence from large volumes of unstructured data. This requires computing software to have a keen understanding of business context and to identify the data that is most relevant to the business. It also means that search will need to match on the meaning of words, or more specifically their intent, and not just on the presence of keywords. Traditional search capabilities (e.g., in SharePoint or e-mail servers) are limited and fail to account for changing business context, which leads to incorrect search results.

Managing unstructured data poses a unique set of challenges that call for unconventional technical and strategic solutions, supported by non-traditional IT skill sets and tools. Organizations that rise to the challenge will gain a competitive advantage.

Contributed by Connie Koo and Alex Wu.