“The enterprise of the future will thrive on data-driven decision making”.
You have probably heard this said, and repeated, by futurists, industry analysts, and data scientists alike. Every company generates and owns vast amounts of data, and the digital revolution sweeping the globe is only adding to it. What better opportunity to put all this data to good use than deploying Artificial Intelligence (AI) based systems that can derive deep insights from it and provide key inputs for making winning business decisions?
While that may be largely true, what gets lost in the fine print is the journey an enterprise must undertake to go from simply owning (or having access to) a lot of data to being a successful company that uses this data to make decisions that move real business metrics, such as those related to revenue (top line, bottom line, or both), customers (acquisition, retention, etc.), organizational operations, and others.
What makes this journey different from, and more interesting than, other transformations enterprises undertake is that here ‘data’ takes center stage. Earlier, data was used only as an input, while the business logic was independent of the data, codified separately into the application that crunched the input data and generated outputs the business could consume. Now, the data is the input, and the business logic (if one can even call it that) is also hidden somewhere within the data, as invisible patterns to be revealed by complex AI algorithms. The output is more data, which the business can consume only after further post-processing and delivery in human-friendly formats. So the entire process is truly driven by data, with the AI algorithm sitting in between as a black box. This is the new paradigm of AI-driven applications and data-driven enterprises.
What immediately follows is that data becomes the lifeblood of these systems and therefore deserves a lot of attention in its own right. One of its key aspects is quality. Quality can mean many things, but broadly it measures how readily usable the data is. The important thing to remember is that, until now, data quality was defined and measured by how easily the data could be consumed, either by humans or by existing (non-AI) systems. Data was therefore captured, prepared, and stored accordingly, often together with additional metadata and other forms of derived data.
What happens when we feed this existing data, as is, to AI systems? To understand this, we need a quick look under the hood of AI applications. Before an AI application is put into production and made to work with real data, it needs to ‘learn’ how to do so. This activity, called ‘Training’, is carried out in the lab. During training, AI applications require data in a specific form: fields and values in a particular format and order, and free of any additional information that is not needed for learning. Without these constraints, the AI system will ‘learn’ incorrectly and derive incorrect or inconsistent meaning from the input data. Picking the right AI algorithm for a given data set, and curating the data to choose the right fields, their types, forms, frequency, etc., becomes key to getting the desired business output. This is as much an art as a science, and it is primarily what data scientists do. Consequently, once these systems go live and start working with real data, we have to put in place a pre-processing mechanism that cleanses and preps the data for the AI system’s consumption, so that it keeps producing correct and consistent outputs.
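To make the pre-processing idea more concrete, here is a minimal Python sketch, assuming raw records arrive as dictionaries from an existing system. The field names, `FEATURE_SCHEMA`, `prepare_record` and `prepare_batch` are all hypothetical, chosen only for illustration. The sketch keeps only the fields a model was trained on, puts them in a fixed order, coerces each to the expected type and format, and drops extra metadata. Records that cannot be converted are set aside rather than guessed at.

```python
from datetime import datetime
from typing import Optional

# Hypothetical schema: the fields a model was trained on, in the exact order
# it expects them, each paired with a converter that enforces type and format.
FEATURE_SCHEMA = [
    ("age", int),
    # Strip thousands separators such as "1,200.50" before converting.
    ("annual_spend", lambda v: float(str(v).replace(",", ""))),
    # Normalize ISO dates to a day count (proleptic Gregorian ordinal).
    ("signup_date", lambda v: datetime.strptime(str(v).strip(), "%Y-%m-%d").toordinal()),
]


def prepare_record(raw: dict) -> Optional[list]:
    """Turn one raw record into an ordered feature vector.

    Fields not listed in FEATURE_SCHEMA (e.g. audit metadata) are ignored.
    Returns None if a required field is missing or cannot be converted.
    """
    features = []
    for name, convert in FEATURE_SCHEMA:
        value = raw.get(name)
        if value is None or value == "":
            return None
        try:
            features.append(convert(value))
        except (TypeError, ValueError):
            return None
    return features


def prepare_batch(records):
    """Split raw records into model-ready rows and rejected records."""
    ready, rejected = [], []
    for record in records:
        row = prepare_record(record)
        if row is None:
            rejected.append(record)
        else:
            ready.append(row)
    return ready, rejected


if __name__ == "__main__":
    raw = [
        # Extra "last_edited_by" metadata is dropped during preparation.
        {"age": "42", "annual_spend": "1,200.50", "signup_date": "2021-03-15",
         "last_edited_by": "crm-sync"},
        # Missing age: rejected instead of being fed to the model.
        {"age": "", "annual_spend": "300", "signup_date": "2022-01-01"},
    ]
    ready, rejected = prepare_batch(raw)
    print(len(ready), len(rejected))
```

Rejecting incomplete or malformed records, instead of silently filling them in, is one possible design choice; in practice the rules for imputing, correcting, or discarding such records are part of the data preparation work described here.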
There is considerable work in this step: by some estimates, almost 60% of the work data scientists do is related to cleansing and prepping data. Enterprises often fail to grasp the complexity and scale of this task, and almost always underestimate it when allocating time, budget, and resources.
In the next discussion, we will take a closer look at some of these data preparation challenges, how they affect the adoption of AI, and some thoughts on how enterprises should approach the subject.
If you are working in this area and have encountered some of the things discussed here, please do share your experience in the comments section.