Often, when we build a machine learning model for churn prediction, we are given a supervised dataset in which churned customers are identified and labelled. Why do businesses adopt subscription, membership and contract-based business models? Besides collecting personal and transaction details, managing customers and analysing spending behaviour, these models let a business identify leaving customers with certainty and later use the data to work out which factors affect a customer’s propensity to terminate services. Unrenewed memberships and terminated subscriptions and contracts are the best churn indicators. Nonetheless, there are cases where the available data falls short of our expectations. …

Hard-coding instructions imposes some limitations: we might miss useful information, or our thinking may be bounded. For instance, suppose we are trying to find rules that produce accurate predictions of future events from the data available. The very first step in the process is to ask yourself some questions: What are you going to learn? What are you going to predict? Do you have data available? What kinds of data do you have? What type of learning problem is it? These questions are interrelated; the answer to one gives you a clue about the next. Is it a supervised or an unsupervised problem? If it is a supervised problem, is it a classification or a regression problem? Here is an introduction to supervised and unsupervised learning. …

The telecommunication industry has been showing exponential growth in line with rising demand driven by technological advancement. Competition among service providers is so fierce that they execute different strategies to meet customers’ needs. Retaining existing customers is now as important as acquiring new ones.

**Exploratory Analysis**

The dataset has 7032 instances and 21 columns, comprising ID information, 3 numerical attributes, 16 categorical attributes and the target (‘Churn’) column. There are no missing values.

A z-test or t-test is used when comparing the means of one or two populations. However, the Type I error rate (alpha) compounds when we compare more than two means. Say we are testing 3 populations at alpha=0.05: running three pairwise t-tests pushes the true overall alpha level above 0.05, though it stays below 0.15. ANOVA is a basic statistical analysis that tests the null hypothesis that all population means are equal at a predefined alpha level, which avoids the compounding effect.
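The compounding effect can be checked with a few lines of arithmetic. The sketch below is illustrative only: `familywise_error_rate` and `one_way_anova_f` are hypothetical helper names, the formula 1 - (1 - alpha)^m assumes the tests are independent (pairwise t-tests on the same groups are not strictly independent, so treat it as an approximation), and the three groups are made-up toy numbers.

```python
from math import comb


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Three populations need comb(3, 2) = 3 pairwise t-tests.
n_pairwise_tests = comb(3, 2)
fwer = familywise_error_rate(0.05, n_pairwise_tests)
print(f"{n_pairwise_tests} pairwise tests, approximate overall alpha: {fwer:.4f}")

# Toy data (assumption): one ANOVA tests all three means in a single pass.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(toy_groups)
print(f"one-way ANOVA F statistic: {f_stat:.2f}")
```

With alpha=0.05 and three tests the approximate overall rate is about 0.143, which sits inside the 0.05 to 0.15 range quoted above. The F statistic would still need to be compared against an F distribution with (2, 6) degrees of freedom to obtain a p-value.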

A relational database is a better option than a spreadsheet for working with high-dimensional data. With a spreadsheet, we may face replication, redundancy and inconsistency. Systematic data storage allows more efficient and effective information management and retrieval than manual operations on a spreadsheet. (Imagine a dataset with thousands of columns.) We are going to design a relational database that organizes data in tables that can be linked to one another, by applying a data modeling technique, ER modeling, through a series of steps: conceptual, logical and physical data models. Let’s understand some simple terms for ER modeling:

**Entity:** Objects/components of data to be stored. …

**Introduction**

How much do you spend on attracting new customers compared with retaining existing ones? To sustain and expand a business, one should realize that retaining existing customers is as important as acquiring new ones. If customers are leaving faster than new customers are arriving, our customer base is actually shrinking. To a certain extent, the effort spent on retaining customers outweighs the search for new potential customers.

Not every deal is profitable, and not all customers are financially attractive to the business. It is crucial to ensure that the resources allocated or deployed are in line with the profit or value a customer carries. …

**Introduction**

In supervised learning, target variables are provided so that predictions can be compared against them to judge model performance. We assume there is an unknown model, *f*, that best describes the data; our task is to find an estimate of *f*. The main sources of learning error in a model are noise, bias, and variance. Noise cannot be reduced by the learning process. Our goal is always to build a model that generalizes well beyond the training data.
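For squared-error loss, these three error sources are commonly written as the decomposition below. This is the standard textbook form, sketched under the assumption that observations follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$; the symbols $\hat{f}$, $\varepsilon$ and $\sigma^2$ are notation introduced here for illustration, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

Only the first two terms depend on the learned model; the last term is the irreducible part.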

**Bias-Variance**

Bias reflects a model’s ability to learn: it measures the difference between the true values and the predicted values. In most circumstances, we make some assumptions about the model; for example, when applying linear regression, we assume the input and output have a linear relationship. Often the relationship in a real-world problem is non-linear, so the estimated model does not fit the data well. The erroneous assumption leads to high bias. …
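A small sketch of how a wrong linearity assumption shows up as error that no amount of fitting removes. The data here is a made-up toy set where the true relationship is y = x², and `fit_line` and `mse` are hypothetical helper names used only for illustration.

```python
def fit_line(features, targets):
    """Least-squares slope and intercept for a single input feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    sxx = sum((f - mean_f) ** 2 for f in features)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_f
    return slope, intercept


def mse(features, targets, slope, intercept):
    """Mean squared error of the fitted line on the given data."""
    n = len(features)
    return sum((slope * f + intercept - t) ** 2 for f, t in zip(features, targets)) / n


# Toy data (assumption): the true relationship is quadratic, y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

# Model 1 assumes y is linear in x: the wrong functional form.
slope_lin, intercept_lin = fit_line(xs, ys)
mse_linear = mse(xs, ys, slope_lin, intercept_lin)

# Model 2 assumes y is linear in x^2: the right functional form.
xs_squared = [x ** 2 for x in xs]
slope_sq, intercept_sq = fit_line(xs_squared, ys)
mse_quadratic = mse(xs_squared, ys, slope_sq, intercept_sq)

print(f"linear in x:   MSE = {mse_linear:.2f}")
print(f"linear in x^2: MSE = {mse_quadratic:.2f}")
```

On this symmetric toy set the straight line collapses to a flat line at the mean of y, so its error stays high even though the data contains no noise at all. That leftover error comes from the assumption, which is what high bias means here.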

**Introduction**

Traditionally, a human told a program the rules or how to do the job, and it executed the instructions. Now, we feed the machine data; the algorithm learns and comes up with a set of rules, producing a program that completes the task by applying those rules. During my university studies, my machine learning course began by introducing the ideas of supervised and unsupervised learning. Soon after, I was exposed to ‘semi-supervised learning’. Identifying the type of learning is the first step in tackling a problem.

**Supervised Learning**

A labeled dataset is a set of data with predictors (input variables) and outputs (response/target variables). Supervised learning is where you have such a dataset and are searching for the function that best maps the predictors to the relevant target. With response variables available, we can compare predictions against actual labels and then adjust the model to reduce mispredictions. …

**Introduction**

OPEC is an organization formed by a group of oil-producing countries. It has a market share of around 44% of global crude oil supply. Members are meant to comply with the cartel in order to stabilize oil supply and limit price fluctuation in the market, and they are expected to act in their mutual interest. The problem arises because members have an economic incentive to cheat. This is a simultaneous-move, infinitely repeated game with complete information. The complete payoff matrix is available for analysis. The players choose their strategies without knowing their opponent’s choice, and the game repeats whenever supply-demand and oil prices fluctuate beyond the favorable range in a way that may crash the oil market. …

**Introduction**

Linear regression is the first model we learn in regression analysis, as early as high school. We look for the regression line that fits samples drawn from a population using the most common statistical method, ordinary least squares (OLS) regression, and we estimate the model parameters. However, the estimating functions get complex as more independent variables are included in the model. Gradient descent (GD) is another option that is widely applied to train a range of models. It is simpler to implement for a linear regression model.

Here, for simplicity, we build a simple linear regression model from scratch. …
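A minimal sketch of what gradient descent for simple linear regression could look like, shown next to the closed-form OLS estimate as a check. This is not the article's own implementation: the function names, the learning rate, the iteration count and the toy data (generated from y = 2x + 1) are all assumptions chosen for illustration.

```python
def fit_ols(xs, ys):
    """Closed-form OLS estimates for y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


def fit_gd(xs, ys, lr=0.05, n_iters=5000):
    """Batch gradient descent on mean squared error for y = w * x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data (assumption): noise-free points on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]

w_ols, b_ols = fit_ols(xs, ys)
w_gd, b_gd = fit_gd(xs, ys)
print(f"OLS: w = {w_ols:.4f}, b = {b_ols:.4f}")
print(f"GD:  w = {w_gd:.4f}, b = {b_gd:.4f}")
```

The learning rate matters: on this data a step size much above 0.08 makes the updates diverge, so the value 0.05 is tuned to the toy inputs and would likely need adjusting, or the inputs rescaling, on real data.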
