What is the proximity measure for nominal attributes?

Proximity measures for nominal attributes are used in data analysis and machine learning to quantify how similar or different two objects are when those objects are described by categories that follow no specific order. Attributes whose values are such unordered categories (two or more possible values) are called nominal attributes. Numbers can be compared by how far apart they are, but two nominal values can only be equal or different, so nominal attributes need a different approach. Proximity measures for nominal attributes provide that approach: they compare objects by counting how many attribute values match, which makes it possible to work with categorical data effectively.

Example

Suppose we have five fictional companies whose performance in three market segments is recorded as nominal attributes, and we want to find out which companies are similar and which are not.

Company   Technology   Healthcare   Energy
A         strong       moderate     strong
B         moderate     strong       stable
C         stable       stable       strong
D         strong       moderate     stable
E         strong       strong       moderate

In the example above, we have five companies and three attributes: technology, healthcare, and energy. The values “strong,” “moderate,” and “stable” represent the different levels or states of performance within each market segment, and here they are treated as unordered categories.

Proximity measure formula for nominal attributes

In the example above, we have five objects: companies A, B, C, D, and E. The dissimilarity between two objects i and j described by nominal attributes is calculated as the ratio of mismatches:

$$d(i, j) = \frac{p - m}{p}$$

p: the total number of nominal attributes describing each object.

m: the number of matches, i.e., the number of attributes for which objects i and j have the same value.
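As a minimal sketch, the formula translates directly into Python. The function name `nominal_dissimilarity` is hypothetical, and returning a `fractions.Fraction` (so that values such as 2/3 stay exact) is our own choice rather than part of the method; the example input uses companies A and D from the table.

```python
from fractions import Fraction
from typing import Sequence


def nominal_dissimilarity(i: Sequence[str], j: Sequence[str]) -> Fraction:
    """Hypothetical helper: d(i, j) = (p - m) / p for two nominal objects."""
    if len(i) != len(j) or not i:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(i)  # total number of attributes
    m = sum(a == b for a, b in zip(i, j))  # number of attributes with equal values
    return Fraction(p - m, p)


# Companies A and D from the table: (technology, healthcare, energy)
a = ("strong", "moderate", "strong")
d = ("strong", "moderate", "stable")
print(nominal_dissimilarity(a, d))  # 1/3
```

Two identical objects give 0 and two objects with no matching values give 1, which are the bounds the formula implies.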

Pairs of objects

Let’s pair up the five objects (A, B, C, D, E) to calculate the distance between each pair. The measure is symmetric, d(i, j) = d(j, i), so comparing A with B gives the same result as comparing B with A. We therefore only consider the 10 unique pairs and skip the redundant ones.

[Figure: Pairs of objects]
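One way to list the unique pairs in Python, as a small sketch, is `itertools.combinations`, which yields each unordered pair exactly once. This enumeration approach is our own illustration, not something the original specifies.

```python
from itertools import combinations

objects = ["A", "B", "C", "D", "E"]

# combinations() yields ("A", "B") but never ("B", "A"); that is enough
# because the dissimilarity measure is symmetric.
pairs = list(combinations(objects, 2))
print(len(pairs))  # 10
print(pairs)
```

For n objects this produces n(n - 1)/2 pairs, here 5 × 4 / 2 = 10.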

Calculate the distance between objects

Let’s calculate the proximity measure for each pair using the formula above. The total number of attributes, p, is 3 and stays the same for every pair. The number of matches, m, depends on the pair, for example:

  • For objects C and A, m = 1: both have “strong” for the energy attribute.

  • For objects D and A, m = 2: both have “strong” for technology and “moderate” for healthcare.
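The match counting for these two pairs can be sketched in Python as follows. The `companies` dictionary mirrors the table, and the helper name `matching_attributes` is hypothetical; it returns the attributes on which two companies agree, so its length is m.

```python
ATTRIBUTES = ("Technology", "Healthcare", "Energy")

# Data copied from the example table: (technology, healthcare, energy)
companies = {
    "A": ("strong", "moderate", "strong"),
    "B": ("moderate", "strong", "stable"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
    "E": ("strong", "strong", "moderate"),
}


def matching_attributes(x, y):
    """Hypothetical helper: (attribute, value) pairs where companies x and y agree."""
    return [
        (attr, vx)
        for attr, vx, vy in zip(ATTRIBUTES, companies[x], companies[y])
        if vx == vy
    ]


for x, y in [("C", "A"), ("D", "A")]:
    matches = matching_attributes(x, y)
    print(f"m({x}, {y}) = {len(matches)}: {matches}")
```

This prints m(C, A) = 1 with the energy match and m(D, A) = 2 with the technology and healthcare matches, as in the two bullets.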

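The calculation for all the pairs, expressing the proximity measure for nominal attributes in terms of dissimilarity with p = 3, is:

d(B, A) = (3 - 0) / 3 = 1
d(C, A) = (3 - 1) / 3 = 2/3
d(C, B) = (3 - 0) / 3 = 1
d(D, A) = (3 - 2) / 3 = 1/3
d(D, B) = (3 - 1) / 3 = 2/3
d(D, C) = (3 - 0) / 3 = 1
d(E, A) = (3 - 1) / 3 = 2/3
d(E, B) = (3 - 1) / 3 = 2/3
d(E, C) = (3 - 0) / 3 = 1
d(E, D) = (3 - 1) / 3 = 2/3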
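The same calculation for all ten pairs can be sketched in Python. The data is copied from the table, the helper name `dissimilarity` is hypothetical, and pairs are printed as d(later, earlier), e.g. d(B, A), to follow the lower-triangular convention used in the text.

```python
from fractions import Fraction
from itertools import combinations

# Data copied from the example table: (technology, healthcare, energy)
companies = {
    "A": ("strong", "moderate", "strong"),
    "B": ("moderate", "strong", "stable"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
    "E": ("strong", "strong", "moderate"),
}


def dissimilarity(x, y):
    """Hypothetical helper returning p, m and d(x, y) = (p - m) / p."""
    vx, vy = companies[x], companies[y]
    p = len(vx)
    m = sum(a == b for a, b in zip(vx, vy))
    return p, m, Fraction(p - m, p)


results = {}
for earlier, later in combinations(sorted(companies), 2):
    p, m, d = dissimilarity(later, earlier)
    results[(later, earlier)] = d
    print(f"d({later}, {earlier}) = ({p} - {m}) / {p} = {d}")
```

Each printed line has the same form as the list above, for example "d(B, A) = (3 - 0) / 3 = 1".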

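The dissimilarity matrix is shown below. Because d(i, j) = d(j, i), only the lower triangle is filled in, and every object has a dissimilarity of 0 with itself:

     A     B     C     D     E
A    0
B    1     0
C    2/3   1     0
D    1/3   2/3   1     0
E    2/3   2/3   1     2/3   0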
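A short sketch of building the full matrix in Python and printing its lower triangle follows. The layout and the helper name `d` are our own assumptions; values are converted with `str()` before padding because older Python versions do not accept a width specifier when formatting a `Fraction` directly.

```python
from fractions import Fraction

# Data copied from the example table: (technology, healthcare, energy)
companies = {
    "A": ("strong", "moderate", "strong"),
    "B": ("moderate", "strong", "stable"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
    "E": ("strong", "strong", "moderate"),
}
names = sorted(companies)


def d(x, y):
    """Hypothetical helper: ratio of mismatches (p - m) / p."""
    p = len(companies[x])
    m = sum(a == b for a, b in zip(companies[x], companies[y]))
    return Fraction(p - m, p)


# Full symmetric matrix; only the lower triangle is printed below.
matrix = [[d(r, c) for c in names] for r in names]

print("     " + "".join(f"{n:>6}" for n in names))
for i, r in enumerate(names):
    row = "".join(f"{str(matrix[i][j]):>6}" for j in range(i + 1))
    print(f"{r:<5}{row}")
```

Storing the full matrix and printing half of it keeps lookups simple (matrix[i][j] works in either order) while the output matches the lower-triangular form.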

The value of d(i, j) ranges from 0 to 1: it is 0 when objects i and j match on every attribute and 1 when they match on none. The closer the value is to 0, the more similar the two objects are; the closer it is to 1, the more dissimilar they are. The summary is given below:

  • Companies B and A, C and B, D and C, and E and C (d = 1) completely differ in performance: they do not match in any market segment.

  • Companies D and A (d = 1/3) show the highest level of similarity in the data, matching in two of the three segments.

  • Companies C and A, D and B, E and A, E and B, and E and D (d = 2/3) have a moderate level of dissimilarity, matching in only one segment.
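The grouping in this summary can be reproduced with a small Python sketch that buckets pairs by their dissimilarity value. The bucketing approach and the helper name `d` are illustrative assumptions, not part of the original method.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Data copied from the example table: (technology, healthcare, energy)
companies = {
    "A": ("strong", "moderate", "strong"),
    "B": ("moderate", "strong", "stable"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
    "E": ("strong", "strong", "moderate"),
}


def d(x, y):
    """Hypothetical helper: ratio of mismatches (p - m) / p."""
    p = len(companies[x])
    m = sum(a == b for a, b in zip(companies[x], companies[y]))
    return Fraction(p - m, p)


# Bucket each unique pair (written as later-earlier, e.g. "D-A") by its value.
groups = defaultdict(list)
for earlier, later in combinations(sorted(companies), 2):
    groups[d(later, earlier)].append(f"{later}-{earlier}")

for value in sorted(groups):
    print(value, groups[value])
```

Sorting the keys lists the most similar group first: D-A alone at 1/3, then the five pairs at 2/3, then the four completely different pairs at 1.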

Proximity measures for nominal attributes help us understand how companies’ performance compares across different areas. They are also useful in marketing, for example to group customers by categorical preferences. In general, they offer a practical way to compare categorical data and make better decisions in various industries.

Copyright ©2024 Educative, Inc. All rights reserved