What are the rules of association in data mining?

Overview of association rules in data mining

Association rule mining is a data mining and machine learning technique used to discover relationships between data items in large datasets and to estimate how likely those items are to occur together. For example, analyzing users' interactions with a website via association rules can help developers improve the website.

Association rules are useful in medical and analytical fields, where data scientists describe the relationships between data items based on specific criteria.

How association rules work

An association rule comprises an antecedent followed by a consequent.

An antecedent is like the if condition in an if-then statement, while a consequent is the then part. Machine learning models in data mining observe the relations and sequences captured by these association rules and make predictions about the most important patterns.

Data items found in the data set are antecedents, while the data items found in combination with them are known as consequents.

Example

To understand antecedents and consequents better, let's take a look at an example:

In the diagram above, the machine learning models highlight some of the data items in red. These highlighted items are the ones that come before recurring data items. After identifying them, the ML models track which data items come next in combination with the highlighted items. For example, 'cake' occurred after 'eggs' multiple times.

After parsing and traversing the data many times, the models observe a pattern and make predictions.

For example, the first rule may read, "If there is an egg, then cake may also be there." Likewise, the second rule may read, "If there is a football, then cake may also be there." Furthermore, the third rule may say, "If there is a pie, then bread may also be there."

The example above illustrates how association rules work. Data scientists and data miners use this technique to perform statistical analysis and train their models.
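As a rough illustration of the idea above, the following Python sketch counts how often each ordered pair of items appears together and reads each pair as a candidate "if antecedent, then consequent" rule. The `transactions` list and the helper name `count_candidate_rules` are assumptions made up for this example. They are not taken from the diagram, and real mining tools use more efficient algorithms than this brute-force count.

```python
from collections import Counter
from itertools import permutations

# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"eggs", "cake"},
    {"eggs", "cake", "football"},
    {"pie", "bread"},
    {"football", "cake"},
    {"pie", "bread", "eggs"},
]

def count_candidate_rules(transactions):
    """Count how often each ordered (antecedent, consequent) pair co-occurs."""
    counts = Counter()
    for basket in transactions:
        # Every ordered pair of distinct items in a basket is a candidate rule.
        for antecedent, consequent in permutations(sorted(basket), 2):
            counts[(antecedent, consequent)] += 1
    return counts

rule_counts = count_candidate_rules(transactions)
for (antecedent, consequent), n in rule_counts.most_common(3):
    print(f"If there is {antecedent}, then {consequent} may also be there "
          f"(seen together {n} times)")
```

Raw co-occurrence counts like these are only a starting point, since they do not yet say how reliable a rule is.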

Effectiveness criteria

Support and confidence are two concepts that determine the effectiveness of association rules.

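Support refers to how often the items in an association rule appear together in the mined database, usually expressed as a fraction of all transactions. In contrast, confidence refers to how often the rule holds: of the transactions that contain the antecedent, it is the fraction that also contain the consequent.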
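To make these two measures concrete, here is a minimal Python sketch that computes them for one rule. The `transactions` data and the function names `support` and `confidence` are hypothetical and chosen only for this illustration.

```python
# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"eggs", "cake"},
    {"eggs", "cake", "football"},
    {"pie", "bread"},
    {"football", "cake"},
    {"pie", "bread", "eggs"},
]

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, antecedent | consequent) / antecedent_support

rule_support = support(transactions, ["eggs", "cake"])
rule_confidence = confidence(transactions, ["eggs"], ["cake"])
print(f"support(eggs, cake) = {rule_support:.2f}")
print(f"confidence(eggs -> cake) = {rule_confidence:.2f}")
```

In this made-up data, eggs and cake appear together in 2 of 5 transactions, and 2 of the 3 transactions with eggs also contain cake.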

Another element used for analysis is lift. It is the ratio of the observed frequency of co-occurrence to the frequency that would be expected if the items were independent. The formula to calculate lift is given below:

$\text{Lift}=\frac{P(x,y)}{P(x) \times P(y)}$

The numerator is the joint probability of items $x$ and $y$.
The denominator is the product of the individual probabilities of items $x$ and $y$.

These are the essential criteria for measuring an association rule's effectiveness.
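The lift formula can be sketched in Python as follows. The `transactions` data and the helper names `probability` and `lift` are assumptions for this example, and probabilities are estimated simply as the fraction of transactions containing the items.

```python
# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"eggs", "cake"},
    {"eggs", "cake", "football"},
    {"pie", "bread"},
    {"football", "cake"},
    {"pie", "bread", "eggs"},
]

def probability(transactions, items):
    """Estimate P(items) as the fraction of transactions containing all of them."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def lift(transactions, x, y):
    """Lift = P(x, y) / (P(x) * P(y))."""
    p_x = probability(transactions, [x])
    p_y = probability(transactions, [y])
    if p_x == 0 or p_y == 0:
        raise ValueError("lift is undefined when an item never occurs")
    return probability(transactions, [x, y]) / (p_x * p_y)

lift_eggs_cake = lift(transactions, "eggs", "cake")
lift_eggs_bread = lift(transactions, "eggs", "bread")
print(f"lift(eggs, cake)  = {lift_eggs_cake:.3f}")
print(f"lift(eggs, bread) = {lift_eggs_bread:.3f}")
```

A lift above 1 suggests the two items occur together more often than independence would predict, a lift near 1 suggests no association, and a lift below 1 suggests they occur together less often than expected.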

Real-life applications

Different fields of science use association rules for analysis and calculations. Some examples are:

Medical domain

With association rule mining, combinations of symptoms can be used to estimate the probability of an illness. The scope of the analysis grows as diagnoses add new signs and factors to the data set.

Marketing domain

Observing customer purchases in the database helps data mining models make recommendations. A website integrated with such a model can optimize its catalog layout so that the items users are more inclined to buy are shown first. Association rules help determine these preferences by observing the buying patterns of users and customers.
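As a simplified sketch of this idea, the Python snippet below ranks the items most often bought together with a given item, using the confidence of each one-item rule as the score. The `purchases` data and the `recommend` function are hypothetical. A production recommender would typically also filter by minimum support and lift.

```python
from collections import Counter

# Hypothetical purchase history, invented purely for illustration.
purchases = [
    {"eggs", "cake"},
    {"eggs", "cake", "football"},
    {"pie", "bread"},
    {"football", "cake"},
    {"pie", "bread", "eggs"},
]

def recommend(purchases, item, top_n=2):
    """Return up to `top_n` (other_item, confidence) pairs for `item`,
    ordered by how often other_item was bought together with it."""
    baskets_with_item = [basket for basket in purchases if item in basket]
    if not baskets_with_item:
        return []  # no purchase history for this item
    co_counts = Counter(
        other for basket in baskets_with_item for other in basket if other != item
    )
    # Sort by descending count, then alphabetically to break ties predictably.
    ranked = sorted(co_counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(other, count / len(baskets_with_item)) for other, count in ranked[:top_n]]

suggestions = recommend(purchases, "eggs")
for other, score in suggestions:
    print(f"Customers who bought eggs also bought {other} (confidence {score:.2f})")
```

A catalog page could then show the highest-scoring items first.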
