How do we define a ‘churned customer’? How do you start building a churn model when the data is unlabeled? Defining ‘churned’ for an unlabeled dataset can be more challenging than building a supervised prediction model with acceptable accuracy.

Often, when we build a machine learning model for churn prediction, we are given a labeled dataset in which churned customers are already identified. Why do businesses adopt subscription, membership and contract-based models? Besides collecting personal and transaction details, managing customers and analyzing spending behavior, these models let a business identify departing customers with certainty and later use the data to work out which factors affect a customer’s propensity to terminate services. Unrenewed memberships and terminated subscriptions and contracts are the best churn indicators. Nonetheless, there are cases where the available data falls short of our expectations. …


Traditionally, we hard-coded a series of steps or procedures, fed in the input data, and the program output the results. With machine learning, we apply an algorithm that learns from historical data and tells the program what to do and how to complete the task. Can you see how the sequence of intermediate actions differs?

Hard-coded instructions have limitations: we might miss useful information, or our thinking may be bounded. For instance, suppose we are trying to find rules that produce accurate predictions of future events from the available data. The very first step in the process is to ask yourself some questions. What are you going to learn? What are you going to predict? Do you have data available? What kinds of data do you have? What type of learning problem is it? These questions are interrelated: the answer to one gives you a clue about the next. Is it a supervised or unsupervised problem? If it is a supervised problem, is it a classification or regression problem? Here is an introduction to supervised and unsupervised learning. …


Emphasizing customer retention as much as acquiring new customers ensures business sustainability and growth. We build a churn prediction model to identify the underlying churn factors and the customers inclined to leave. Choosing the right evaluation metrics helps to build a more practically useful model.

The telecommunications industry has shown exponential growth in line with rising demand driven by technological advancement. Competition among service providers is so fierce that they execute different strategies to meet customers’ needs. Retaining existing customers is now as important as searching for new ones.

Exploratory Analysis

The dataset has 7032 instances and 21 columns, comprising an ID column, 3 numerical attributes, 16 categorical attributes and the target (‘Churn’) column. There are no missing values.


Have you ever wondered how and where to apply ANOVA? We make dozens of decisions every day; how do we judge which option is best among those available? Or are they equally good? ANOVA helps us make a wiser decision.

A z-test or t-test is used when comparing the means of one or two populations. However, the Type I error rate (alpha) compounds when we compare more than two means. Say we are testing 3 populations at alpha = 0.05: running three pairwise t-tests pushes the true overall alpha above 0.05, though it stays below 0.15. ANOVA is a basic statistical analysis that tests the null hypothesis that all population means are equal at a predefined alpha level, which eliminates the compounding effect.
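
The compounding of alpha described above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the three pairwise tests are independent (pairwise t-tests on shared samples are not strictly independent, so treat the figure as an approximation); the function and variable names are illustrative only.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


alpha = 0.05
num_groups = 3
# Comparing every pair among 3 groups takes 3 t-tests.
num_pairwise_tests = num_groups * (num_groups - 1) // 2

# Assumption: the tests are independent of one another.
fwer = familywise_error_rate(alpha, num_pairwise_tests)
# Simple upper bound: the sum of the per-test alphas.
upper_bound = alpha * num_pairwise_tests

print(f"overall alpha is about {fwer:.4f}, bounded above by {upper_bound:.2f}")
```

Under this assumption the overall alpha is roughly 0.14, which sits between 0.05 and 0.15 as stated.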


Rapid advancement of technology has led to an information explosion. We leave a ‘footprint’ with every interaction we make online. The surging availability of information creates challenges in data storage and management, and prompts us to look for the value in data. Let’s walk through the design process of a simple database for a website.

A relational database is a better option than a spreadsheet for working with high-dimensional data. With a spreadsheet we may face replication, redundancy and inconsistency. Systematic data storage allows more efficient and effective information management and retrieval than manual operations on a spreadsheet. (Imagine a dataset with thousands of columns.) We will design a relational database that organizes data in tables that can link to one another, applying a data modeling technique, ER modeling, through a series of steps: conceptual, logical and physical data models. Let’s go over some simple ER modeling terms:

Entity: An object or component of data to be stored. …


What behaviors do your customers share? What are the answers to your business questions? Customer segmentation is the solution, but how do we do it? How can you use it for decision making? RFM analysis is applied here with Python, showing its simplicity and its use of the most basic information available in purchase records.

Introduction

How much do you spend to attract new customers compared with retaining existing ones? To sustain and expand a business, one should realize that retaining existing customers is as important as acquiring new ones. If customers leave at a greater rate than new customers arrive, the customer base is actually shrinking. To a certain extent, the effort to retain customers outweighs the search for new potential customers.

Not every deal is profitable, and not all customers are financially attractive to the business. It is crucial to ensure that the resources allocated or deployed are in line with the profit or value a customer carries. …
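
As a rough illustration of the RFM idea mentioned above, the sketch below derives recency (days since the last purchase), frequency (number of purchases) and monetary value (total spend) per customer from raw purchase records. The records, the snapshot date and the function name `rfm_table` are all hypothetical, and this is not necessarily how the full article implements the analysis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2020, 1, 5), 120.0),
    ("A", date(2020, 3, 1), 80.0),
    ("B", date(2019, 11, 20), 35.0),
    ("C", date(2020, 2, 14), 200.0),
    ("C", date(2020, 2, 28), 150.0),
    ("C", date(2020, 3, 10), 50.0),
]
# Hypothetical reference date against which recency is measured.
snapshot = date(2020, 3, 15)


def rfm_table(records, snapshot_date):
    """Return {customer: {"recency", "frequency", "monetary"}}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in records:
        # Track the most recent purchase date per customer.
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency": (snapshot_date - last_purchase[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_purchase
    }


rfm = rfm_table(transactions, snapshot)
for customer, scores in sorted(rfm.items()):
    print(customer, scores)
```

A low recency combined with high frequency and monetary values (customer C in this toy data) points to an active, valuable customer, while a high recency (customer B) suggests one who may already have left.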


What are bias and variance? How do we interpret a learning curve? How do we diagnose bias and variance? And what should we do to deal with them?

Introduction

In supervised learning, we are given target variables to compare against predictions when judging model performance. We assume there is an unknown model, f, that best describes the data; our task is to find an estimate of f. The main sources of learning error in a model are noise, bias, and variance. Noise is irreducible by the learning process. Our goal is always to build a model that generalizes well beyond the training data.

Is oversized good? (from disneyclips.com)

Bias-Variance

Bias reflects a model’s learning ability: it measures the difference between the true values and the predicted values. In most cases we make some assumption about the model; for example, when applying linear regression, we assume the input and output have a linear relationship. Often the relationship in a real-world problem is non-linear, so the estimated model does not fit the data well. The erroneous assumption leads to high bias. …


Kick-start machine learning with the ideas of supervised, semi-supervised and unsupervised learning. The first question that comes to mind when you are given a dataset: is it labeled, unlabeled, or partially labeled?

Introduction

Traditionally, a human told a program the rules or how to do the job, and it executed the instructions. Now we feed the machine with data; the algorithm learns and comes up with a set of rules, producing a program that completes the task by applying those rules. Back in my university days, my machine learning course began by introducing supervised and unsupervised learning. Soon after, I was exposed to ‘semi-supervised learning’. Identifying the type of learning is the first step in tackling a problem.

Supervised Learning

A labeled dataset is a set of data with predictors (input variables) and outputs (response/target variables). Supervised learning is where you have such a dataset and search for the function that best maps the predictors to the relevant target. Because response variables are available, we can compare predictions with actual labels and adjust the model to reduce mispredictions and improve it. …


Discussing and illustrating a simplified game: a strategic situation with pure strategies.

Introduction

OPEC is an organization formed by a group of oil-producing countries. It has a market share of around 44% of global crude oil supply. The members comply with the cartel’s agreements to stabilize oil supply and limit price fluctuation in the market. Members are expected to act in their mutual interest. The problem arises because there is an economic incentive for members to cheat. This is a simultaneous-move, infinitely repeated game with complete information. The complete payoff matrix is available for analysis. The players choose their strategies without knowing their opponent’s choice, and the game repeats whenever supply, demand and the oil price fluctuate beyond the favorable range in a way that may crash the oil market. …


Linear regression models the linear relationship between real-valued predictors and target variables. We find the function that best represents the data.

Introduction

Linear regression is the first regression model we learn, as early as high school. We look for the regression line that fits samples drawn from a population using the most common statistical method, ordinary least squares (OLS) regression. We estimate the model parameters. However, the estimating functions get complex as more independent variables are included in the model. Gradient descent (GD) is another option, widely applied to training a range of models. It is simpler to implement for a linear regression model.

Here, for simplicity, we build a simple linear regression model from scratch. …
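
To make the gradient descent idea concrete, here is a minimal from-scratch sketch for simple linear regression (one predictor). It minimizes mean squared error with batch gradient descent; the toy data, the learning rate, the number of epochs and the function name are assumptions chosen for illustration, not values taken from the article.

```python
def fit_simple_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = slope * x + intercept by batch gradient descent on MSE."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(epochs):
        # Residuals of the current line on every sample.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean squared error.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical noiseless data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

On this toy data the estimates converge towards a slope of 2 and an intercept of 1. With unscaled or larger-valued predictors the learning rate would likely need to be smaller for the updates to remain stable.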

About

Hs.T

Data.
