Will they? Won't they? - Use Artificial Intelligence to predict customer behaviour!
by Dr. Shashi Barak, on Jun 6, 2019 8:14:00 PM
As enterprises strive to increase their market share and stay ahead in a competitive business environment, they increasingly need to turn to technology. Predictive Analytics is one such technology. It studies the behavior of existing and potential customers, drawing on sources such as social media channels and their activity on a company's website, to forecast future scenarios, including the probability that a customer will make a purchase.
Processing the gathered information to extract key features (essential patterns and attributes), and then keeping only the most appropriate of them, helps to create a simple yet powerful predictive model. Predictive Analytics has helped Amazon increase its sales by 30%. According to a report from McKinsey, predictive maintenance helps reduce call centre costs by approximately 20-50%.
What is Predictive Analytics?
Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning, which analyze current and historical facts to make predictions about future or otherwise unknown events.
How to create a powerful Predictive Model?
A highly effective model is built from a minimal set of features that still contains enough information to train the model to make accurate predictions or classifications.
Selecting the useful features out of N given features is a hard combinatorial problem. Each feature is either in or out of a subset, so there are 2^N possible subsets. For a problem with 20 features, that is 2^20 = 1,048,576 possible feature subsets. With 40 features it grows to 2^40 = 1,099,511,627,776, far too many to evaluate one by one.
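The subset count above can be checked with a short Python sketch. The article itself contains no code, so Python is our choice here. The sketch enumerates every subset for a small N with `itertools.combinations` and compares the total against the closed form 2^N. This count includes the empty subset. Excluding it gives 2^N - 1.

```python
from itertools import combinations

def count_subsets_by_enumeration(n_features: int) -> int:
    """Count every subset (including the empty one) by listing them explicitly."""
    features = range(n_features)
    return sum(1 for k in range(n_features + 1) for _ in combinations(features, k))

def count_subsets_closed_form(n_features: int) -> int:
    """Each feature is either in or out of a subset, giving 2**N choices."""
    return 2 ** n_features

# Brute-force enumeration agrees with 2**N for a small N...
assert count_subsets_by_enumeration(10) == count_subsets_closed_form(10) == 1024

# ...but for realistic N only the closed form is practical.
for n in (20, 40):
    print(n, count_subsets_closed_form(n))
# prints:
# 20 1048576
# 40 1099511627776
```

Enumeration is only feasible for small N. That is exactly why exhaustive search over all subsets is impractical and why the selection strategies below exist.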
To avoid unnecessary computational burden, the number of selected input variables should not be too high. At the same time, it should not be too low, or the selected variables will not carry the essential information the model needs.
To build an efficient model, the number of features should be optimal. The behavior of the phenomenon under study should be described by the smallest set of informative, non-redundant variables.
Feature selection methods:
Feature selection methods fall into three categories: filters, wrappers, and embedded methods.
- Filter method: In filter methods, each variable is given a numerical score using a statistical measure such as Pearson’s correlation coefficient, information gain, mutual information, or maximum relevance. Generally, the variables with the highest scores are selected. If two features contain the same information, one of them can be removed using correlation, mutual information, or some other criterion.
- Wrapper method: The wrapper method uses the actual prediction/classification algorithm to build a model on a subset of features and then evaluates its performance. It tries different subsets of features and selects the subset for which the model performs best. One drawback of this approach is that it is computationally very expensive, because a new model has to be trained and evaluated for every candidate subset.
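- Embedded method: Unlike filter and wrapper methods, embedded methods integrate feature selection into the training of the machine learning model itself, so selection happens as part of model generation. For example, L1-regularised (Lasso) regression can shrink the coefficients of less useful features to exactly zero while the model is being fitted.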
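A filter method can be sketched in plain Python as follows. This is an illustrative sketch, not the author's implementation. It scores each feature by its absolute Pearson correlation with the target and keeps the top k. It also skips any feature whose correlation with an already-selected feature exceeds a threshold, which handles the "same information" case described above. The names `filter_select` and `redundancy_threshold`, the threshold of 0.9, and the synthetic data are all hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def filter_select(features, target, k, redundancy_threshold=0.9):
    """Hypothetical filter: rank features by |corr(feature, target)| and keep up
    to k of them, skipping any feature highly correlated with one already kept."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if len(selected) == k:
            break
        redundant = any(
            abs(pearson(features[name], features[kept])) > redundancy_threshold
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected, scores

# Synthetic data: x1_copy duplicates x1's information; noise is irrelevant.
rng = random.Random(0)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x1_copy = [v + 0.05 * rng.gauss(0, 1) for v in x1]
x2 = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
target = [2 * a + b + 0.1 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

features = {"x1": x1, "x1_copy": x1_copy, "x2": x2, "noise": noise}
selected, scores = filter_select(features, target, k=2)
print(selected, {name: round(s, 2) for name, s in scores.items()})
```

With this data, one of `x1`/`x1_copy` is kept and its near-duplicate is dropped as redundant. `x2` fills the second slot, and `noise` scores close to zero. Each feature is judged on its own, which is what makes filters cheap.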
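Embedded methods differ from filter methods in that they use the model itself, rather than a standalone statistical score, to judge features. They differ from wrapper methods in that selection happens within the training process, with the model parameters updated iteratively according to model performance. The wrapper method, by contrast, trains a separate model for each candidate set of features and considers only its final performance.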
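As an example of an embedded method, here is a minimal sketch of L1-regularised linear regression (Lasso) fitted by cyclic coordinate descent in plain Python. The article does not name a specific embedded technique. Lasso is our choice because feature selection falls out of the fitting itself: the soft-thresholding step sets the weights of uninformative features to exactly zero. The sketch minimises (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 and assumes centred features and target, so it needs no intercept. The function names, alpha = 0.1, and the synthetic data are illustrative.

```python
import random

def soft_threshold(value, alpha):
    """Shrink value towards zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    X is a list of rows; columns and y are assumed centred (no intercept term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # y - Xw, with w = 0 initially
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated weight.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

def centre_columns(rows):
    """Subtract each column's mean so the model needs no intercept."""
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

# Synthetic data: only features 0 and 1 drive the target; features 2-4 are irrelevant.
rng = random.Random(42)
X = centre_columns([[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)])
y_raw = [3 * row[0] - 2 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print([round(wj, 2) for wj in w], selected)
```

The surviving non-zero weights are the selected features. Raising `alpha` zeroes out more weights and so selects fewer features. Lowering it towards zero approaches ordinary least squares, which keeps every feature.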
Why is the Wrapper approach preferred over Filter and Embedded methods?
Suppose we have 26 features, viz. a, b, c, d, ..., z. Each individual attribute may not be informative by itself, while a combination of them may be. For example, b and c may carry no information separately, while (b + c) or b*c does. A filter approach may miss this because it evaluates features in isolation, not in combination. A wrapper approach can capture it because it evaluates subsets with an actual prediction or classification algorithm, provided that algorithm can model the interaction.
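The b*c example can be reproduced in a small Python sketch. The data is synthetic and assumed: b, c, and an irrelevant feature d each take the value -1 or +1 at random, and the label is b*c. Each feature's Pearson correlation with the label is close to zero, so a correlation-based filter sees nothing useful. A wrapper that exhaustively tries subsets of up to two features finds {b, c}. It scores each subset with a simple majority-vote lookup-table classifier on held-out data. The lookup-table classifier, the helper names, and the data are hypothetical, and this classifier was chosen because it can represent the interaction.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))

def lookup_table_accuracy(train, test, subset):
    """Fit a majority-vote table keyed on the chosen features; return test accuracy."""
    votes = defaultdict(Counter)
    for row, label in train:
        votes[tuple(row[f] for f in subset)][label] += 1
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    correct = 0
    for row, label in test:
        key = tuple(row[f] for f in subset)
        prediction = votes[key].most_common(1)[0][0] if key in votes else fallback
        correct += prediction == label
    return correct / len(test)

def wrapper_search(train, test, names, max_size=2):
    """Exhaustive wrapper: score every subset of up to max_size features."""
    best_subset, best_accuracy = (), 0.0
    for size in range(1, max_size + 1):
        for subset in combinations(names, size):
            accuracy = lookup_table_accuracy(train, test, subset)
            if accuracy > best_accuracy:
                best_subset, best_accuracy = subset, accuracy
    return best_subset, best_accuracy

rng = random.Random(7)
rows = [{f: rng.choice((-1, 1)) for f in ("b", "c", "d")} for _ in range(400)]
labels = [row["b"] * row["c"] for row in rows]
data = list(zip(rows, labels))
train, test = data[:300], data[300:]

# Filter view: each feature on its own looks uninformative.
filter_scores = {f: abs(pearson([r[f] for r in rows], labels)) for f in ("b", "c", "d")}
print(filter_scores)

# Wrapper view: the pair (b, c) predicts the label perfectly.
best_subset, best_accuracy = wrapper_search(train, test, ("b", "c", "d"))
print(best_subset, best_accuracy)
```

The filter scores stay near zero while the wrapper reaches perfect held-out accuracy with (b, c). With a classifier that cannot represent the interaction, such as a linear model on b and c alone, the wrapper would miss it too. This is why the choice of classifier inside the wrapper matters.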
In Summary:
The optimal set of features should contain the minimum number of input variables required to describe the behavior of the system or phenomenon under study. It should include as few redundant variables as possible while still providing maximum information. Identifying this optimal set makes it possible to build a model that is more accurate, efficient, simple, and easy to interpret.
A common rule of thumb in analytics is to use a filter method when the dataset has a large number of features and a wrapper method when the number of features is moderate. In practice, however, it is usually better to use a wrapper method to select key features. It takes into account the performance of the actual classifier you intend to use, and different classifiers vary widely in how they use the available information.
By using these simple yet powerful techniques to build predictive models, enterprises can improve their sales and operating expense (OpEx) figures and stay ahead in a globally competitive business environment.