What is the proximity measure for ordinal attributes?

A proximity measure quantifies how similar or dissimilar two data points are, which makes it essential for tasks such as clustering and classification. Ordinal data (ranked or rated values) needs special care because its values have a meaningful order but no fixed numeric spacing. Statistics such as Spearman’s rank correlation or Goodman and Kruskal’s gamma coefficient measure the association between ranked variables; to measure the distance between individual objects, a common approach is to convert the ordered values to ranks, normalize them, and apply a distance function.

Let's understand how to calculate the proximity measure for ordinal attributes using the example below.

Example: proximity measure for ordinal attributes

Suppose we have ten objects described by an ordinal attribute named Test, which takes one of five ordered values: Excellent, Very Good, Good, Fair, and Poor. The value for each object is listed in the table of Test values below.

Step 1: Replace each value for the Test attribute by its rank

For each object in the dataset, replace its Test value with a numeric rank. Doing so preserves the order of the values and makes it possible to measure the distance or similarity between them. Ranks are assigned in ascending order, from 1 for the lowest value (Poor) to 5 for the highest (Excellent). For example:

Object 1 has the value Excellent and receives rank 5, since Excellent is the highest of the five values.
Object 2 has the value Good and receives rank 3, since Good is the third of the five values.
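The rank assignment in Step 1 can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the names `LEVELS`, `RANK`, and `test_values` are assumptions introduced here, and the Test values are copied from the example table.

```python
# Ordered categories from lowest to highest, as described in the example.
LEVELS = ["Poor", "Fair", "Good", "Very Good", "Excellent"]

# Rank 1 is the lowest level and rank 5 is the highest.
RANK = {level: position for position, level in enumerate(LEVELS, start=1)}

# Test values for Objects 1 to 10, taken from the example table.
test_values = ["Excellent", "Good", "Poor", "Good", "Very Good",
               "Fair", "Poor", "Good", "Fair", "Very Good"]

ranks = [RANK[value] for value in test_values]
print(ranks)  # [5, 3, 1, 3, 4, 2, 1, 3, 2, 4]
```

Keeping the level order in a single list means the mapping stays correct if a level is added or renamed.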

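After assigning ranks, the Test values for Objects 1 to 10 become 5, 3, 1, 3, 4, 2, 1, 3, 2, and 4.

Step 2: Normalize the ranking

Map each rank r onto the range [0, 1] using z = (r - 1) / (M - 1), where M is the number of ordered values (here, M = 5). Excellent becomes 1, Very Good 0.75, Good 0.5, Fair 0.25, and Poor 0. The normalized values for all ten objects are listed in the second table below.

Step 3: Use Euclidean distance to find the dissimilarity matrix

With a single attribute, the Euclidean distance between two objects reduces to the absolute difference of their normalized ranks: d(i, j) = |z_i - z_j|.

In our case: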

Distance between Object 1 and 2: |1 - 0.5| = 0.5
Distance between Object 1 and 3: |1 - 0| = 1
Distance between Object 1 and 4: |1 - 0.5| = 0.5
Distance between Object 1 and 5: |1 - 0.75| = 0.25
Distance between Object 1 and 6: |1 - 0.25| = 0.75
Distance between Object 1 and 7: |1 - 0| = 1
Distance between Object 1 and 8: |1 - 0.5| = 0.5
Distance between Object 1 and 9: |1 - 0.25| = 0.75
Distance between Object 1 and 10: |1 - 0.75| = 0.25
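The distances listed above can be reproduced with a short Python sketch. It assumes the normalization z = (r - 1) / (M - 1) with M = 5 levels, which is consistent with the normalized values in the example (1, 0.75, 0.5, 0.25, 0); the function name `normalize_rank` is hypothetical.

```python
def normalize_rank(rank, num_levels):
    """Map a rank in 1..num_levels onto the range [0, 1]."""
    return (rank - 1) / (num_levels - 1)

# Ranks of Objects 1 to 10 (Excellent = 5, ..., Poor = 1).
ranks = [5, 3, 1, 3, 4, 2, 1, 3, 2, 4]
z = [normalize_rank(r, 5) for r in ranks]

# With one attribute, the distance is the absolute difference of
# the normalized ranks. Index 0 holds Object 1.
distances_from_1 = [abs(z[0] - z[other]) for other in range(1, len(z))]

for object_id, distance in enumerate(distances_from_1, start=2):
    print(f"Object 1 vs Object {object_id}: {distance}")
```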

Note: The dissimilarity matrix is symmetric, since the distance from Object i to Object j equals the distance from Object j to Object i. Once the lower-left triangle is calculated, there’s no need to calculate the upper-right triangle separately.

Similarly, calculate the distance for the remaining pairs to complete the 10 × 10 dissimilarity matrix.
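The pairwise computation can be sketched in Python as follows. This is an illustrative sketch under the assumption that the distance is the absolute difference of normalized ranks; it fills only the lower-left triangle and mirrors each entry, and the variable names are introduced here for illustration.

```python
# Normalized ranks of Objects 1 to 10 from the example.
z = [1.0, 0.5, 0.0, 0.5, 0.75, 0.25, 0.0, 0.5, 0.25, 0.75]
n = len(z)

# Start with zeros: every object has distance 0 to itself.
matrix = [[0.0] * n for _ in range(n)]

for i in range(n):
    for j in range(i):  # lower-left triangle only (j < i)
        d = abs(z[i] - z[j])
        matrix[i][j] = d
        matrix[j][i] = d  # mirror into the upper-right triangle

# Print the lower-left triangle, one row per object.
for i, row in enumerate(matrix):
    print(" ".join(f"{value:.2f}" for value in row[: i + 1]))
```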

As a result, we can observe that:

Objects 1 and 3 are among the most dissimilar pairs, with the maximum dissimilarity score of 1.00.
Objects 3 and 7 have identical Test values, so their dissimilarity score is 0.00.
Objects 5 and 10 also have identical Test values, with a dissimilarity score of 0.00.
Objects 1 and 6 are highly dissimilar, with a dissimilarity score of 0.75.
Objects 3 and 9 are fairly similar, with a dissimilarity score of 0.25.
Objects 1 and 2 are moderately dissimilar, with a dissimilarity score of 0.50.
Objects 2 and 4 have identical Test values as well, with a dissimilarity score of 0.00.
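These observations can be checked programmatically. The sketch below is one possible way to do it, assuming the normalized ranks from the example; the dictionary `z` and the other names are introduced here for illustration only.

```python
from itertools import combinations

# Normalized rank of each object, keyed by Object Identifier.
z = {1: 1.0, 2: 0.5, 3: 0.0, 4: 0.5, 5: 0.75,
     6: 0.25, 7: 0.0, 8: 0.5, 9: 0.25, 10: 0.75}

# Dissimilarity for every unordered pair of objects.
pairs = {(i, j): abs(z[i] - z[j]) for i, j in combinations(z, 2)}

largest = max(pairs.values())
most_dissimilar = [pair for pair, d in pairs.items() if d == largest]
identical = [pair for pair, d in pairs.items() if d == 0.0]

print("Most dissimilar:", most_dissimilar)
print("Identical:", identical)
```

Listing all pairs at the extreme values shows ties that a hand-picked list can miss, such as Objects 1 and 7 also scoring 1.00.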

In conclusion, calculating dissimilarity matrices using appropriate proximity measures for ordinal attributes is instrumental in revealing patterns within ranked data.

Object Identifier	Test
1	Excellent
2	Good
3	Poor
4	Good
5	Very Good
6	Fair
7	Poor
8	Good
9	Fair
10	Very Good

Object Identifier	Test (normalized rank)
1	1
2	0.5
3	0
4	0.5
5	0.75
6	0.25
7	0
8	0.5
9	0.25
10	0.75