Let us quickly recap. In the previous article, we discussed how data-driven decision-making is going to shape the future of successful enterprises, and we touched upon the importance of data quality for any enterprise that wants to adopt AI as part of its strategy. In this article, we will talk more about data and the challenges associated with using it for AI applications.
When deploying an AI application to solve a business problem, it is very important to begin at the beginning. Like all important things in life, the following questions may sound basic and simple, but they are key to the final success of the project.
- Define the problem – What is it that you want the AI application to do?
- Output to Outcome – How will you use the AI output to improve your business outcome?
- Check the Data – Do you have the right set of data to solve this problem?
While the first two can also be very challenging to answer, we focus on the third question in this article.
At this stage, it is important to make a clear distinction between two very specific types of outcomes (there are others as well) that are typically desired by the business while deploying an AI application.
- To augment human capabilities
- To automate basic, simple and repetitive tasks, releasing humans for more valuable work
A simple example will demonstrate this key difference. For the first outcome, consider large sets of data from all possible areas of the business, brought together in one place to look for hidden insights on customer behavior, sales projections, the next bubble, etc. An employee looking at this data would probably do a good job. However, a well-chosen AI algorithm would likely do a better job, and do it much faster.
For the second outcome, consider an existing process where employees perform mundane and repetitive tasks that nevertheless rely on some unique human faculty, such as looking at an image to derive meaning or listening to a conversation to respond adequately. Putting an AI application in place to replace the human in this workflow, partially or completely, is not a trivial job and is far more complicated than the prior example of parsing data.
Consequently, selecting the right AI algorithm, picking and curating the right set of data to train it, deploying the AI and putting in place a continuous learning loop all differ greatly in scale and complexity between the two examples. The AI used in the first example is generally called Machine Learning (ML), and the AI used in the second is a specific subset of ML called Deep Learning (DL). In both cases, the challenges related to the data are probably the trickiest to handle, and with DL even more so.
Before we look at the actual data and the challenges it poses, there is one more important concept that needs to be touched upon: supervised and unsupervised learning. A large proportion of the AI applications in use today serve business needs that require definitive outputs. These are best produced by algorithms such as regression or classification, which use the supervised learning method. Problems like clustering, on the other hand, are better solved by unsupervised learning algorithms.
In supervised learning, as the name suggests, we tell the AI algorithm what a set of input data means and what meaning (prediction/inference/class) should be derived from it. For example, if the input is an image of a cat, then we also need to provide an additional data field (a label) along with the image, which says that this image is a cat. If you extend this to the large and complex datasets found in real-life business situations, it is easy to see how quickly this becomes a huge exercise of data tagging and labeling that must be completed before we can feed the data to any AI application. This labeling then becomes part of the overall data cleansing and preparation exercise.
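To make the difference concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy measurements, the names `labeled`, `fit_supervised`, `predict` and `cluster_unsupervised`, and the choice of a nearest-centroid classifier and k-means are all hypothetical stand-ins, not methods prescribed by this article. The point to notice is that the supervised function needs a human-provided label for every record, while the clustering function never sees one.

```python
from collections import defaultdict

# Hypothetical toy data: (height_cm, weight_kg) plus a human-provided label.
labeled = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]
# The same records with the labels stripped off.
unlabeled = [features for features, _ in labeled]


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_supervised(examples):
    """Nearest-centroid classifier: one centroid per label in the data."""
    by_label = defaultdict(list)
    for features, label in examples:
        by_label[label].append(features)
    return {label: mean(points) for label, points in by_label.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


def cluster_unsupervised(points, k=2, iterations=10):
    """Plain k-means: groups points without ever seeing a label."""
    centroids = list(points[:k])  # naive initialisation, fine for a toy example
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: sq_dist(centroids[i], p)) for p in points
        ]
        # Move each centroid to the mean of its current members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = mean(members)
    return assignment


model = fit_supervised(labeled)
prediction = predict(model, (24.0, 3.8))   # a named class, learned from labels
clusters = cluster_unsupervised(unlabeled)  # anonymous group ids only
print(prediction)
print(clusters)
```

The supervised model can answer with a meaningful name such as "cat" only because someone labeled every training record first. The clustering step separates the same records into two groups, but the group ids are arbitrary numbers that a person still has to interpret.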
We will take a closer look at the processes pertaining to the data pipeline and its management in the next discussion. Based on what we have discussed so far, it seems likely that considerable resources need to be devoted to preparing the data before a technology-led transformation, in which AI is used to make data-driven decisions, can take place.