Source – huffingtonpost.com
If an enterprise has Big Data on hand, the urge to convert it into actionable information that would help business thrive seems obvious. Let’s zoom in on how we can apply Machine Learning to business. To illustrate the case, we’ll use the results of a problem-solving session that took place at a datathon.
The Knot to Unravel: Real-world Client Data
A bank, a sponsor of the datathon, provided proprietary real-world client data in an anonymized form. The datathon participants were to analyze the datasets by generating multiple hypotheses and identifying the viable ones. The problem was expected to be solved using cluster analysis, a method of unsupervised learning.
Unsupervised Learning – a machine learning algorithm that teaches the computer system to identify inferences in datasets. They would consist of input data without labeled responses. Cluster analysis helps find hidden patterns or grouping in data based on specific parameters. For instance, this could be segmentation of subscribers of a mobile services provider.
Multiparameter Data as Basis for Hypotheses
The team was to make hypotheses using data with disparate input parameters. They included description of the product or service acquired, the amount paid using the credit cards issued by the bank, and user demographics – age and sex. The majority of the data fell into categories based on high-level payment destinations – shops, gas stations, services, etc. Some of the categories had a more detailed level of description, for instance, AliExpress, Uber, Burger King, iTunes. The major hypothesis made by the team was as follows: if they analyzed the user money spending patterns, they would generate a rather informative user portrait.
Data Processing, Pattern and Correlation Analysis
Processing Unlabeled Data
The team processed the unlabeled data as follows: reduced its dimensionality, performed clustering and correlation analysis. For this, they used such tech and tools as Python, t-SNE, DBSCAN, and Matplotlib. The participants also sanity-checked the data against the real-world parameters. Thus, they identified an outlier in payment destination values that amounted to 16,000 for an Uber ride. When studied closely, the amount turned out to have a foreign currency attribute. Once the team converted the amounts in major currencies to a common currency and screened the rare ones, the data became more informative.
Identifying Patterns and Correlations
The team managed to identify several meaningful patterns and interdependencies. The graph analysis and cluster analysis used by the participants demonstrated a correlation among the clients commuting via Uber and those who shop on iTunes. Another cluster located nearby showed that credit card holders with foreign currency accounts are young people who are regulars at local bars and restaurants.
Unbiased Hypotheses Verification
The Datathon participants test-proved that unsupervised learning is a great fit for validating unbiased hypotheses. This method does not aim to identify cause-and-effect relations or achieve stable results, which otherwise may add subjectivity to the data processing results. For instance, the assumption that an average fast-food lover would frequent different fast-food brands did not prove valid during this problem-solving session.
The unsupervised learning method will benefit those service providers who have accumulated large data volumes about their clients. It enables analysts and marketers to obtain an impartial insight into customer behavior: how the client activity changes if the company has modified its service or introduced a new one; whether the existing service offering has a weak spot, which needs fixing.