Association rule mining is a data mining and machine learning technique used to find how likely data items in large datasets are to occur together. For example, statistical analysis of users' interactions with a website via association rules can help developers improve the website.
Association rules are useful in medical and analytical fields, where data scientists describe the relationships between items in a data set based on specific criteria.
An association rule comprises an antecedent followed by a consequent.
An antecedent is like the if condition in an if-then statement, while the consequent is the then part. Machine learning models in data mining observe the relations and sequences formed by these association rules and make predictions about the most important patterns.
The antecedent is an item (or set of items) found in the data set, while the consequent is an item (or set of items) found in combination with it.
To understand antecedents and consequents better, let's take a look at an example:
In the diagram above, the machine learning models highlight some of the data items (shown in red). The highlighted items are those that come before frequently repeated data items. After identifying them, the ML models track which data items come next in combination with the highlighted items. For example, 'cake' occurred after 'eggs' multiple times.
After parsing and traversing the data many times, the models observe a pattern and make predictions.
For example, the first rule may read, "If there is an egg, then cake may also be there." Likewise, the second rule may read, "If there is a football, then cake may also be there." Furthermore, the third rule may read, "If there is a pie, then bread may also be there."
The example above shows how association rules work. Data scientists and data miners use this technique to perform statistical analysis and train their models.
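The counting step behind rules like these can be sketched in a few lines of Python. This is only an illustration: the `transactions` list is invented for the example (it is not the data behind the diagram), and `count_rules` is a hypothetical helper name, not part of any library.

```python
from collections import Counter
from itertools import permutations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"eggs", "cake"},
    {"eggs", "cake", "football"},
    {"pie", "bread"},
    {"football", "cake"},
    {"pie", "bread", "eggs"},
]

def count_rules(transactions):
    """Count, for each ordered pair (antecedent, consequent),
    how many transactions contain both items."""
    counts = Counter()
    for basket in transactions:
        # Every ordered pair of distinct items is a candidate rule.
        for antecedent, consequent in permutations(sorted(basket), 2):
            counts[(antecedent, consequent)] += 1
    return counts

rule_counts = count_rules(transactions)
for (antecedent, consequent), n in rule_counts.most_common(3):
    print(f"If there is {antecedent}, then {consequent} may also be there "
          f"(seen together {n} times)")
```

Raw counts like these only show which pairs co-occur often. They do not yet say how reliable a rule is, which is what the measures in the next part address.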
Support and confidence are two measures that determine the effectiveness of an association rule.
Support refers to how often the items in a rule appear together in the mined database, usually expressed as a fraction of all transactions. In contrast, confidence refers to how often the rule holds in practice: the fraction of transactions containing the antecedent that also contain the consequent.
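Another measure used for analysis is lift. It is the ratio of the observed frequency of co-occurrence to the frequency expected if the antecedent and consequent were independent. For a rule A → B, the standard formula is: lift(A → B) = support(A and B together) / (support(A) × support(B)). A lift above 1 suggests that the items occur together more often than chance would predict.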
These are the essential criteria for measuring an association rule's effectiveness.
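As a minimal sketch, the three measures can be computed directly from a list of transactions. The data and the function names `support`, `confidence`, and `lift` below are assumptions made for illustration; the code also assumes the antecedent and the consequent each appear in at least one transaction, since otherwise the divisions are undefined.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"eggs", "cake"},
    {"eggs", "cake", "football"},
    {"pie", "bread"},
    {"football", "cake"},
    {"pie", "bread", "eggs"},
]

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Observed co-occurrence divided by the co-occurrence expected
    if antecedent and consequent were independent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

rule = ({"eggs"}, {"cake"})
print("support:   ", support(transactions, rule[0] | rule[1]))
print("confidence:", confidence(transactions, *rule))
print("lift:      ", lift(transactions, *rule))
```

With this made-up data, the rule "eggs → cake" appears in 2 of 5 transactions, holds in 2 of the 3 transactions that contain eggs, and has a lift slightly above 1. For real workloads, an established library is usually a better choice than hand-written loops.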
Different fields of science use association rules for analysis and calculations. Some examples are:
In medical diagnosis, association rule mining estimates the probability of an illness from the symptoms that occur together. Further diagnosis and analysis widen the scope of the procedure as new signs and factors are added to the data set.
In retail, observing customer purchases in the database helps data mining models make recommendations. A website integrated with such a model can optimize its catalog layout so that the items users are more inclined to buy are shown first. Association rules help determine these preferences by observing the buying patterns of users and customers.