What is FP-Growth?
Overview
Frequent pattern-growth (FP-Growth) is the mining of pattern itemsets, subsequences, and substructures that appear frequently in a dataset.
A Frequent itemset refers to the most common items bought together. A Subsequence where items are bought by a customer is called a frequent sequential pattern.
A Substructure refers to different structural forms, such as subgraphs and subtrees, that are combined with frequent structural patterns to analyze and find relations with different items of frequent itemsets.
Note: An example of frequent itemset mining is Market Basket Analysis.
Generating Frequent Pattern-Growth
To generate frequent pattern-growth, we do the following:
- In the first scan of the database, we try to find the minimum support of
itemsets. - If the frequency of items is less than the specified minimum support, we simply delete them from the database.
- The set of frequent items is sorted in the descending order of support.
- Then, an FP-tree is constructed.
- The initial node of the FP-tree is called the
rootnode. It is labelednull. - The items in the transactions are processed when the node is created.
- Whenever we try to pass the
itemset, we can create achildnode and expand it if theitemsetfollows therootnode. - If a branch encounters a common prefix for the existing path, we increment the value of the
rootnode. - We repeat the process for each
itemsetuntil we reach the end of theitemsetsin the transaction. - To facilitate tree traversal, we build an item header table so that each item points to occurrences in thetree via a chain of
node-links.
1 of 6
Frequent Item Sets are:
{K,E,M,O:1},{K,E,O:1},{K,M:1}}{{K,E,M:1},{K,E:2}}{{K,E:2},{K:1}}{{K:4}}