Association Rule (Market Basket Analysis) — Your Business Booster

Widespread connectivity and ever-improving tools and devices are pushing most of our activities toward digitalization. Sometimes we get lost in the enormous amount of data we hold, whether from internal databases, surveys, sample collections, or web crawling. This post discusses ideas for data usage, mainly in the retail business, based on association rule mining, although the technique itself suits a number of industries including bioinformatics, manufacturing, and pricing. We will go through both the idea and its implementation.

Hshan.T
9 min read · Jan 10, 2022

The adoption of software and computer systems to automate business processes means every move leaves a footprint. The initial purpose of these systems was to record and store business transactions automatically in a more organized manner, especially in retailing and e-commerce, with intentions ranging from business monitoring to management. In this new era, you can supercharge your business with data-driven strategies and decision making built on the vast amount of records stored from day one. But how? The term ASSOCIATION RULE sheds light on the relevant questions. The content below focuses mainly on market basket analysis in the retail sector, but a few examples of applications in other industries are shared as well.

Association Rules

Association rule mining is a rule-based method for discovering relations, correlations, and co-occurrences between variables, well represented by ‘if-then’ statements. Do note, however, that these are not causal relationships. Each rule is composed of two elements: if {antecedent}, then {consequent}. The antecedent does not cause the consequent; rather, the consequent happens in combination with the antecedent. The model tries to learn patterns from a pool of variables in a database where all data points seem independent: it searches the combinations of attributes for relationships, and among those relationships it tries to find interesting subsets to define as RULES. The idea is statistically based and was published long ago; it involves no advanced technique, yet it remains quite useful for data mining today because of the insights it reveals. In fact, many building blocks of the sophisticated models catching attention today were published decades ago; their popularity stayed low until the recent decade’s explosion in computing power.

Let’s imagine we have tabular data of shape N x M, such that each row represents a sample and each column an attribute, so we have N samples with M attributes each. A rule defines a relationship: if {A} then {B}, where A and B are subsets of attributes. Of course, in practice we are not looking for a rule for each sample. Our goal is to uncover the statistically significant patterns across the N samples. If we have 100k samples and only 10 of them share the same subset of attributes, that pattern is not frequent or significant enough to call a rule.

Use Cases for Association Rule

  • Market basket analysis — The typical example in retail, e-commerce, or any shopping transactions: the business determines the sets of items most likely to be purchased together. This will be the example for the more technical content below, since open-source data is available.
  • Medical diagnosis — Associated symptoms are analyzed for disease diagnosis; after a rule is scientifically proven, doctors can apply it in practice as evidence to detect diseases.
  • Loss leader strategy — Not every deal is expected to be profitable; some deals are made at cost or at a low margin just to attract new customers with a propensity to buy the associated items. The complete flow for this strategy certainly involves some pricing analysis of the individual items in subsets identified through association rules.

Measures Definition

We mentioned in the previous paragraph that each rule formed is statistically significant. The question is: how do we compute and define that significance? Quite a number of measures are used to evaluate how interesting a relationship is, including all-confidence, collective strength, leverage, and others beyond the scope of this post. Below are a few common measures applied in the market basket demonstration here:

  • support(A, B) — Probability of transactions in which item A and item B occur together. The higher, the better: they are often bought together. Mathematically, this is the probability of the intersection of A and B, P(A ∩ B).
  • confidence(A -> B) — Probability that transactions containing all items in A also contain the items in B. This differs from support in that, given a transaction contains itemset A, we ask how many of those transactions also contain itemset B. This is a conditional probability in mathematical terms, P(B|A). Unlike support, it does not give an outlook on the association across the entire dataset.
  • lift(A -> B) — support(A, B) divided by the product of support(A) and support(B); the denominator is what support(A, B) would be if A and B were independent. A lift of 1 means A and B are independent. Lift < 1 suggests a substitute relationship between A and B (a customer who buys A might not buy B, and vice versa). Lift > 1 means a strong association between A and B.
  • conviction(A -> B) — The ratio of the expected frequency of A occurring without B if they were independent to the observed frequency of incorrect predictions made by the rule, i.e. (1 − support(B)) / (1 − confidence(A -> B)). High conviction means the consequent depends strongly on the antecedent.
  • others, such as all confidence, collective strength, leverage, etc.
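The measures above can be computed with nothing more than set operations and counting. Below is a minimal sketch on a small hypothetical basket dataset (the transactions are made up for illustration):

```python
# Toy transactions; each is a set of purchased items (hypothetical data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

A, B = {"bread"}, {"milk"}
sup_ab = support(A | B)                       # P(A ∩ B)
conf = sup_ab / support(A)                    # P(B|A)
lift = sup_ab / (support(A) * support(B))     # 1 => independent
conviction = (1 - support(B)) / (1 - conf) if conf < 1 else float("inf")

print(f"support={sup_ab:.2f} confidence={conf:.2f} "
      f"lift={lift:.2f} conviction={conviction:.2f}")
```

Here bread and milk co-occur in 3 of 5 baskets, so support is 0.6 and confidence is 0.6 / 0.8 = 0.75; the lift below 1 hints that the two items do not reinforce each other in this toy data.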

Algorithms

The next question, once we know the measures used for evaluation, is: how do we use them to search the entire dataset for rules? There are multiple algorithms available, suited to different data properties. I am not planning to dive deep into each of them here; to keep things digestible, they are discussed only along with their implementation and illustration in this and future posts.

  • Apriori
  • Eclat
  • FP growth
  • Multi-Relation Association Rules (MRAR)

Demonstration of Market Basket Analysis

The scope of the remaining sections narrows to a specific application of association rules in the retail sector: market basket analysis (MBA). To be frank, MBA is simple. It uses the most basic mathematics (typically sets and probability) and analytics knowledge to provide an effective way to find the likelihood of various items being purchased together in a single transaction. Are you an online shopping lover? Is the term ‘frequently bought together’ familiar to you? Association rules are what happen behind the scenes.

Insights drawn from MBA help to boost business by developing more effective marketing strategies in a number of ways:

  • store layout/display decision to increase cross selling
  • catalog design and cross marketing for online business
  • customized promotion for target customers
  • loss leader strategy
  • purchasing trends
  • customers needs and behavior

Implementation of Association Rules

Three frequent-itemset mining approaches are discussed and compared: the Apriori, Eclat, and FP-Growth algorithms, implemented in Python. You can easily find public datasets for MBA. For the methods in the remaining sections, we do not consider the number of units purchased; the information we need is just binary data, with each transaction a set of items. These mining algorithms extract frequent itemsets, and the association rules impart information on antecedents and consequents.
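The binary format means one row per transaction and one boolean column per item. A minimal stdlib sketch of this encoding (mlxtend’s TransactionEncoder produces the same layout as a DataFrame; the item names are made up):

```python
# Convert raw transactions into the binary (one-hot) format the mining
# algorithms expect: one row per transaction, one column per item,
# True where the item appears in that transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "jam"],
]

items = sorted({item for t in transactions for item in t})
onehot = [[item in t for item in items] for t in transactions]

print(items)
for row in onehot:
    print([int(v) for v in row])
```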

Apriori Algorithm

Given a support threshold T, we search for subsets of items that occur in at least T transactions; T is how we define a ‘frequent’ itemset. Apriori works by extending frequent subsets of items one item at a time: if {A, B} is currently identified as a frequent itemset, it is extended to {A, B, C} in the next step, and we count the number of transactions with that combination. This iterative process continues until no further extension is found. Apriori holds to the principle that if a subset is infrequent, all of its supersets are infrequent; conversely, all subsets of a frequent itemset are frequent.

Fig 1: Searching for frequent itemsets with support threshold. (Apriori)
Fig 2: Dataset.
Fig 3: Apriori from mlxtend generating frequent itemsets.
Fig 4: Association rule from mlxtend generating antecedent and consequent. (Apriori)
Fig 5: Number of rules formed at different confident level. (Apriori)
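The figures use mlxtend’s apriori and association_rules at scale; the level-wise search itself can be sketched in a few lines of plain Python (toy data, fractional min_support instead of an absolute count T):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: extend frequent itemsets one item at a time,
    pruning candidates that have an infrequent subset."""
    n = len(transactions)
    freq, current = {}, []
    for item in sorted({i for t in transactions for i in t}):
        s = sum(item in t for t in transactions) / n
        if s >= min_support:
            freq[frozenset([item])] = s
            current.append(frozenset([item]))
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        current = []
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                freq[c] = s
                current.append(c)
        k += 1
    return freq

transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk", "butter"}, {"bread", "milk"},
]
result = apriori(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(set(itemset), round(s, 2))
```

With min_support=0.4, the triple {bread, milk, butter} appears in only 1 of 5 baskets and is never counted, because the loop stops extending once a level produces no frequent candidates.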

Limitation of Apriori:

  • Inefficiency: searching time increases exponentially with dataset size. The algorithm scans the whole dataset repeatedly for every subset formed, to test it against the support threshold.
  • Memory issues, especially when the dataset is large.

Equivalence Class Clustering and Bottom-Up Lattice Traversal (ECLAT) Algorithm

In contrast to Apriori, a breadth-first search algorithm, Eclat is a depth-first search algorithm. Theoretically it is more efficient in computation time and memory, and more scalable than Apriori. Nonetheless, execution time was much longer than Apriori with the library used below. The algorithm runs recursively: itemsets with more than T transactions are combined to form the itemsets of the next level N, until no more pairs can be formed. To make this clearer, the figure below illustrates the process. Instead of running until N=4 as in Apriori, it stops at N=3.

Fig 6: Searching for frequent itemsets with support threshold. (ECLAT)
Fig 7: Input dataset format for pyEclat.
Fig 8: ECLAT mining.
Fig 9: Formatting output.
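The core of Eclat is the vertical layout: each item maps to the set of transaction ids (“tid-list”) containing it, itemsets are extended depth-first, and support is just the length of the intersected tid-lists. A minimal pure-Python sketch of that idea (not pyECLAT’s actual implementation, and on made-up data):

```python
transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk", "butter"}, {"bread", "milk"},
]
min_count = 2  # support threshold T as an absolute transaction count

# Vertical layout: item -> set of transaction ids containing it.
tidlists = {}
for tid, t in enumerate(transactions):
    for item in t:
        tidlists.setdefault(frozenset([item]), set()).add(tid)

frequent = {}

def eclat(nodes):
    """Depth-first extension: intersect tid-lists of sibling itemsets."""
    for i, (itemset, tids) in enumerate(nodes):
        frequent[itemset] = len(tids)
        suffix = []
        for other, other_tids in nodes[i + 1:]:
            merged = tids & other_tids          # support via intersection
            if len(merged) >= min_count:
                suffix.append((itemset | other, merged))
        if suffix:
            eclat(suffix)

eclat(sorted(((s, t) for s, t in tidlists.items() if len(t) >= min_count),
             key=lambda kv: sorted(kv[0])))
print({tuple(sorted(k)): v for k, v in frequent.items()})
```

Note the output is plain counts per itemset: as the limitations below mention, Eclat itself does not hand you confidence or lift; those must be derived afterwards.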

Limitation of Eclat:

  • Memory issues remain if we expect a dataset with large subsets of items in most of the transactions.
  • Does not provide the additional insights on occurrences computed in the Apriori implementation, such as confidence, lift, etc.

Frequent Pattern (FP) Growth Algorithm

The FP-Growth algorithm is seen as an alternative that overcomes Apriori’s shortcomings by representing the entire dataset as a tree. The tree grows and splits as we go through each transaction, with the items of each transaction considered in descending order of support; the more branches shared near the top of the tree, the smaller the tree constructed. Items that do not meet the minimum support are discarded, and no candidate itemsets are generated. Hence, the search space is reduced significantly and execution time is shorter. The diagrams below give an intuition of how the mining works.

Fig 10: Computing and ordering by support count. The table should be further expanded by forming itemsets of size more than 1.
Fig 11(a): Tree construction based on simplified table given in Fig 7.
Fig 11(b): Tree construction based on simplified table given in Fig 7.
Fig 12: FP-Growth from mlxtend, generating table of itemsets with support more than min set. (Using same dataset as Apriori)
Fig 13: Association rule from mlxtend generating antecedent and consequent. (FP-growth)
Fig 14: Number of rules formed at different confident level. (FP-Growth)
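The construction step described above (count supports, drop infrequent items, reorder each transaction by descending support, insert into a prefix-sharing tree) can be sketched as follows; the mining phase is omitted, and mlxtend’s fpgrowth handles the full algorithm. The data is again made up:

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Build an FP-tree: discard infrequent items, reorder each
    transaction by descending support, and insert it so that shared
    prefixes share branches near the root."""
    counts = Counter(i for t in transactions for i in t)
    keep = {i for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    for t in transactions:
        # Descending support order; item name breaks ties deterministically.
        ordered = sorted((i for i in t if i in keep),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root

transactions = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
    {"milk", "butter"}, {"bread", "milk"},
]
root = build_fp_tree(transactions, min_count=2)
# Few children at the root: frequent items share prefix branches.
print({item: child.count for item, child in root.children.items()})
```

On this toy data the root has only two children (bread and milk), because four of the five reordered transactions start with the most frequent item and get compressed onto one branch; that compression is exactly why a single pass over the tree replaces Apriori’s repeated dataset scans.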

Limitation of FP-growth:

  • The FP-tree can be cumbersome and expensive to build.
  • Memory issues: a long list of items is stored and scanned if min_support_count is set low.

Summary

As technology and the data science field advance, dozens of sophisticated models are published every day. Simple, basic models can get left behind as we marvel at how powerful the recent innovations are. Nonetheless, no model best fits all problems, even though there is always a best model for a particular problem, and we still need testing to prove that a proposed model works. Why not start from the fundamental solution for each situation encountered? If a simple method solves the problem, why spend so much time and resources on a complex model for negligible improvement? You might lose the best timing to stand ahead of other businesses, and there are always times when simple methods outperform complex ones. This post covers only the popular and commonly known algorithms; I look forward to exploring more of the algorithms available for association rule mining in future posts. I will mark the end of this post with something I think is worth a thought: will the quantity purchased of each item (the item ‘weight’) in each transaction contribute somehow to MBA? And how?
