
Adithya Challa

In some data-mining applications, such as clustering, it is essential to measure how similar or dissimilar objects are to each other.

A **similarity measure** for two objects $i$ and $j$ returns `1` if they are similar and `0` if they are dissimilar.

A **dissimilarity measure** works the opposite way: it returns `1` if the objects are dissimilar and `0` if they are similar.

Similarity and dissimilarity measures also help with removing outliers: an object that is highly dissimilar to all other objects can be identified as a potential outlier, which makes such data quicker to eliminate.
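As a minimal sketch of this idea, assuming objects described only by nominal attributes and the matching-based dissimilarity $(p - m)/p$ used in this article, the code below flags an object as a potential outlier when its average dissimilarity to all other objects exceeds a cut-off. The names `simple_matching_dis` and `flag_outliers`, the default `threshold=0.9`, and the sample records are hypothetical choices for illustration only.

```python
def simple_matching_dis(a, b):
    """Dissimilarity (p - m) / p: fraction of nominal attributes on which a and b differ."""
    p = len(a)                                  # total number of attributes
    m = sum(x == y for x, y in zip(a, b))       # attributes in the same state
    return (p - m) / p


def flag_outliers(objects, threshold=0.9):
    """Return indices of objects whose mean dissimilarity to all others exceeds threshold.

    threshold=0.9 is a hypothetical cut-off chosen for illustration.
    """
    n = len(objects)
    if n < 2:
        return []                               # nothing to compare against
    flagged = []
    for i in range(n):
        total = sum(simple_matching_dis(objects[i], objects[j])
                    for j in range(n) if j != i)
        if total / (n - 1) > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical records: three that resemble each other and one that differs everywhere.
    data = [
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("red", "large", "round"),
        ("blue", "huge", "flat"),
    ]
    print(flag_outliers(data))  # [3]
```

Averaging over all other objects keeps one accidental mismatch from flagging a record; how high the cut-off should be depends on the data.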

Similarity and dissimilarity are together referred to as measures of **proximity**.

Similarity can often be expressed as a function of dissimilarity.

*For objects described by nominal attributes, similarity and dissimilarity can be calculated as:*

$dis(i,j) = 1 - \frac{m}{p} = \frac{p - m}{p}$

$sim(i,j) = 1 - dis(i,j) = \frac{m}{p}$

- $i, j$ are the objects being compared, i.e., the row and column indices of the **dissimilarity matrix**.
- $m$ is the number of matches, i.e., the number of attributes for which $i$ and $j$ are in the same state.
- $p$ is the total number of attributes.
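A minimal Python sketch of these two formulas, assuming each object is given as a tuple of nominal attribute values in a fixed order. The function names `nominal_dis` and `nominal_sim` are hypothetical.

```python
def nominal_dis(obj_i, obj_j):
    """dis(i, j) = 1 - m/p = (p - m)/p for objects described by nominal attributes."""
    if len(obj_i) != len(obj_j) or not obj_i:
        raise ValueError("objects need the same, non-zero number of attributes")
    p = len(obj_i)                                  # total number of attributes
    m = sum(a == b for a, b in zip(obj_i, obj_j))   # number of matches
    return (p - m) / p


def nominal_sim(obj_i, obj_j):
    """sim(i, j) = 1 - dis(i, j) = m/p."""
    return 1 - nominal_dis(obj_i, obj_j)


# Two attributes, one match: m = 1, p = 2.
print(nominal_dis(("A", "Excellent"), ("A", "Fair")))  # 0.5
print(nominal_sim(("A", "Excellent"), ("A", "Fair")))  # 0.5
```

Defining `nominal_sim` through `nominal_dis` mirrors the relation $sim(i,j) = 1 - dis(i,j)$, so the two values always add up to `1`.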

A **dissimilarity matrix** stores a collection of *proximities* that are available for all pairs of objects.

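In a dissimilarity matrix, $dis(i,i) = 0$ and $dis(i,j) = dis(j,i)$, so only the entries below the diagonal need to be computed.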
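A dissimilarity matrix for a list of objects can be sketched in Python as a lower-triangular list of rows, assuming the matching-based dissimilarity $(p - m)/p$ and 0-based indices (the article numbers objects from 1). The helper names `simple_matching_dis`, `dissimilarity_matrix`, and `lookup` are hypothetical.

```python
def simple_matching_dis(a, b):
    """(p - m) / p: fraction of nominal attributes on which a and b differ."""
    p = len(a)
    m = sum(x == y for x, y in zip(a, b))
    return (p - m) / p


def dissimilarity_matrix(objects, dis=simple_matching_dis):
    """Lower-triangular dissimilarity matrix for n objects (0-based indices).

    Row i stores dis(i, 0), ..., dis(i, i - 1) followed by dis(i, i) = 0.
    The upper triangle is omitted because dis(i, j) = dis(j, i).
    """
    return [[dis(objects[i], objects[j]) for j in range(i)] + [0.0]
            for i in range(len(objects))]


def lookup(matrix, i, j):
    """Read dis(i, j) for any i, j by mirroring into the lower triangle."""
    return matrix[i][j] if j <= i else matrix[j][i]


matrix = dissimilarity_matrix([("A",), ("B",), ("A",)])
print(matrix)              # [[0.0], [1.0, 0.0], [0.0, 1.0, 0.0]]
print(lookup(matrix, 0, 1))  # 1.0
```

Storing only the lower triangle roughly halves the memory used, and `lookup` recovers any entry through symmetry.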

Let’s look at an example and try to find similarity and dissimilarity measures.

| Obj Id | Grade | Progress | Numeric |
|---|---|---|---|
| 1 | A | Excellent | 45 |
| 2 | B | Fair | 22 |
| 3 | C | Good | 64 |
| 4 | A | Excellent | 28 |
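To see $m$ and $p$ at work on more than one attribute, the sketch below encodes the table above and applies $dis(i,j) = (p - m)/p$ to both Grade and Progress ($p = 2$). It assumes Progress is treated as nominal (it could also be handled as an ordinal attribute) and leaves out Numeric, since the matching formula is meant for nominal attributes. The names `TABLE`, `NOMINAL`, and `m_and_dis` are hypothetical.

```python
# The example table, keyed by the 1-based Obj Id values.
TABLE = {
    1: {"Grade": "A", "Progress": "Excellent", "Numeric": 45},
    2: {"Grade": "B", "Progress": "Fair", "Numeric": 22},
    3: {"Grade": "C", "Progress": "Good", "Numeric": 64},
    4: {"Grade": "A", "Progress": "Excellent", "Numeric": 28},
}

# Assumption: only these attributes are treated as nominal.
NOMINAL = ("Grade", "Progress")


def m_and_dis(i, j, attributes=NOMINAL):
    """Return (m, dis(i, j)) where dis(i, j) = (p - m) / p."""
    p = len(attributes)
    m = sum(TABLE[i][a] == TABLE[j][a] for a in attributes)
    return m, (p - m) / p


# Visit each pair once (j < i), as in a lower-triangular matrix.
for i in TABLE:
    for j in TABLE:
        if j < i:
            m, d = m_and_dis(i, j)
            print(f"dis({i},{j}): m = {m}, p = {len(NOMINAL)}, dis = {d}")
```

In this particular table, objects 1 and 4 agree on both attributes and every other pair agrees on neither, so the two-attribute result has the same pattern as the Grade-only calculation.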

While constructing a *dissimilarity matrix*, we assign `1` to *dissimilar* objects and `0` to *similar* objects.
For a *similarity matrix*, it is the other way around.

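The proximity measure for the Grade attribute is calculated below. Since only this one attribute is compared, $p = 1$ and $m$ is either `1` (same grade) or `0` (different grades).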

The *dissimilarity matrix* values are calculated as shown below:

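$dis(2,1) = 1$, since the grades $(B, A)$ differ

$dis(3,1) = 1$, since the grades $(C, A)$ differ

$dis(3,2) = 1$, since the grades $(C, B)$ differ

$dis(4,1) = 0$, since the grades $(A, A)$ match

$dis(4,2) = 1$, since the grades $(A, B)$ differ

$dis(4,3) = 1$, since the grades $(A, C)$ differ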
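The same Grade-only values can be reproduced with a short sketch. It assumes the grades are stored in a dictionary keyed by object id; `GRADE`, `grade_dis`, and `grade_matrix` are hypothetical names.

```python
# Grade of each object, keyed by Obj Id.
GRADE = {1: "A", 2: "B", 3: "C", 4: "A"}


def grade_dis(i, j):
    """With p = 1, dis(i, j) = (1 - m) / 1 is 0 for equal grades and 1 otherwise."""
    m = 1 if GRADE[i] == GRADE[j] else 0
    return 1 - m


# Lower-triangle entries dis(i, j) with j < i.
grade_matrix = {(i, j): grade_dis(i, j) for i in GRADE for j in GRADE if j < i}

for (i, j), value in sorted(grade_matrix.items()):
    print(f"dis({i},{j}) = {value}")
```

Sorting the pairs prints them in the order $(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)$, matching the list of values.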

The *similarity matrix* values for this are shown below:

$sim(2,1)=1-dis(2,1) =0$

$sim(3,1)=1-dis(3,1)=0$

$sim(3,2)=1-dis(3,2) =0$

$sim(4,1)=1-dis(4,1) =1$

$sim(4,2)=1-dis(4,2) =0$

$sim(4,3)=1-dis(4,3) =0$
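Because $sim(i,j) = 1 - dis(i,j)$, the whole similarity matrix can be obtained from the dissimilarity matrix in one pass. The sketch below assumes the Grade dissimilarities computed above are stored as a lower-triangular list of rows (diagonal included); `DIS`, `to_similarity`, and `SIM` are hypothetical names.

```python
# Lower-triangular Grade dissimilarity matrix: row i holds dis(i, 1), ..., dis(i, i).
DIS = [
    [0],
    [1, 0],
    [1, 1, 0],
    [0, 1, 1, 0],
]


def to_similarity(dis_matrix):
    """Apply sim(i, j) = 1 - dis(i, j) entry by entry (the diagonal becomes 1)."""
    return [[1 - d for d in row] for row in dis_matrix]


SIM = to_similarity(DIS)
for row in SIM:
    print(" ".join(str(v) for v in row))
```

The diagonal turns into `1` because every object is completely similar to itself.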

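The matrices from the example problem, for the Grade attribute with rows and columns ordered by object id, are given below:

$$
dis = \begin{bmatrix}
0 & & & \\
1 & 0 & & \\
1 & 1 & 0 & \\
0 & 1 & 1 & 0
\end{bmatrix}
\qquad
sim = \begin{bmatrix}
1 & & & \\
0 & 1 & & \\
0 & 0 & 1 & \\
1 & 0 & 0 & 1
\end{bmatrix}
$$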
