Proximity measures for nominal attributes are tools used in data analysis and machine learning to quantify how similar or dissimilar two objects are when they are described by categories that have no inherent order. An attribute whose values are such unordered categories (two or more of them) is called a nominal attribute. Unlike numbers, which can be compared by subtraction, nominal values can only be checked for equality, so they need a different approach. Proximity measures turn these equality checks into a numeric score of likeness or difference, which makes it possible to work with categorical data effectively.
Suppose we have some fictional companies whose performance across several market segments is recorded as nominal attributes, and we want to determine which companies are similar and which are not.
| Company | Technology | Healthcare | Energy |
|---|---|---|---|
| A | strong | moderate | strong |
| B | moderate | strong | stable |
| C | stable | stable | strong |
| D | strong | moderate | stable |
| E | strong | strong | moderate |
In the example above, we have five companies and three attributes: Technology, Healthcare, and Energy. The values “strong,” “moderate,” and “stable” are the possible performance states within each market segment, treated here as unordered labels.
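Each company is an object, so we have five objects: A, B, C, D, and E. The dissimilarity between two objects i and j described by nominal attributes is:

$$d(i, j) = \frac{p - m}{p}$$

where m is the number of matches (attributes on which i and j take the same value) and p is the total number of attributes.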
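As a minimal sketch, assuming each object is stored as a Python tuple of attribute values in a fixed order (Technology, Healthcare, Energy), the dissimilarity d(i, j) = (p - m)/p can be computed by counting matching positions. The function name `nominal_dissimilarity` is hypothetical, and `fractions.Fraction` is used only to keep the results exact.

```python
from fractions import Fraction


def nominal_dissimilarity(x, y):
    """Return d(i, j) = (p - m) / p for two equal-length tuples of nominal values."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(x)  # p: total number of attributes
    m = sum(1 for a, b in zip(x, y) if a == b)  # m: number of matching attributes
    return Fraction(p - m, p)


# Profiles from the table, in the order (Technology, Healthcare, Energy)
A = ("strong", "moderate", "strong")
D = ("strong", "moderate", "stable")
print(nominal_dissimilarity(D, A))  # 1/3
```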
Next, we pair up the five objects (A, B, C, D, E) to calculate the distance between them. Because the measure is symmetric, comparing (A, B) is equivalent to comparing (B, A), so we only need the 10 unique pairs: (B, A), (C, A), (C, B), (D, A), (D, B), (D, C), (E, A), (E, B), (E, C), and (E, D).
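Generating the unique pairs can be sketched with Python’s standard `itertools.combinations`, which yields each unordered pair exactly once. It lists pairs as (A, B) rather than (B, A); by symmetry both orders give the same dissimilarity.

```python
from itertools import combinations

companies = ["A", "B", "C", "D", "E"]

# combinations() yields each unordered pair once: ("A", "B") appears,
# ("B", "A") does not, which is enough because d(i, j) = d(j, i).
pairs = list(combinations(companies, 2))
print(len(pairs))  # 10
print(pairs)
```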
Let’s calculate the proximity measure using the formula. Every company is described by the same three attributes, so p = 3 for every pair; only the number of matches m changes from pair to pair.
For objects C and A, the only match is “strong” for the Energy attribute, so m = 1 and d(C, A) = (3 - 1)/3 = 2/3 ≈ 0.67.
For objects D and A, there are two matches, “strong” for Technology and “moderate” for Healthcare, so m = 2 and d(D, A) = (3 - 2)/3 = 1/3 ≈ 0.33.
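To see where m comes from, here is a small illustrative helper, `matching_attributes` (a hypothetical name), that lists the attributes on which two companies share a value. It assumes the profiles are stored in a dictionary keyed by company, with values in the order Technology, Healthcare, Energy.

```python
ATTRIBUTES = ("Technology", "Healthcare", "Energy")

# Subset of the table needed for the two worked examples
profiles = {
    "A": ("strong", "moderate", "strong"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
}


def matching_attributes(i, j):
    """Return the names of the attributes on which companies i and j have the same value."""
    return [name for name, a, b in zip(ATTRIBUTES, profiles[i], profiles[j]) if a == b]


print(matching_attributes("C", "A"))  # ['Energy']  -> m = 1
print(matching_attributes("D", "A"))  # ['Technology', 'Healthcare']  -> m = 2
```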
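The calculation for all the pairs is done below:
d(B, A) = (3 - 0)/3 = 1
d(C, A) = (3 - 1)/3 ≈ 0.67
d(C, B) = (3 - 0)/3 = 1
d(D, A) = (3 - 2)/3 ≈ 0.33
d(D, B) = (3 - 1)/3 ≈ 0.67
d(D, C) = (3 - 0)/3 = 1
d(E, A) = (3 - 1)/3 ≈ 0.67
d(E, B) = (3 - 1)/3 ≈ 0.67
d(E, C) = (3 - 0)/3 = 1
d(E, D) = (3 - 1)/3 ≈ 0.67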
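The dissimilarity matrix is shown below. Because d(i, j) = d(j, i) and every object has d(i, i) = 0, only the lower triangle is filled in:

| | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0 | | | | |
| B | 1 | 0 | | | |
| C | 0.67 | 1 | 0 | | |
| D | 0.33 | 0.67 | 1 | 0 | |
| E | 0.67 | 0.67 | 1 | 0.67 | 0 |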
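The whole lower-triangular dissimilarity matrix can be reproduced with a short sketch like the one below. It assumes the table is stored as a dictionary of tuples and uses a hypothetical helper `d` that implements (p - m)/p; values are printed to two decimals to mirror the rounded figures in this example.

```python
from fractions import Fraction

# Table from the example, in the order (Technology, Healthcare, Energy)
profiles = {
    "A": ("strong", "moderate", "strong"),
    "B": ("moderate", "strong", "stable"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
    "E": ("strong", "strong", "moderate"),
}


def d(x, y):
    """Nominal dissimilarity (p - m) / p, kept as an exact fraction."""
    p = len(x)
    m = sum(a == b for a, b in zip(x, y))  # True counts as 1
    return Fraction(p - m, p)


names = list(profiles)
# Row k holds d(names[k], names[j]) for every j <= k (lower triangle incl. diagonal)
matrix = [[d(profiles[i], profiles[j]) for j in names[: k + 1]] for k, i in enumerate(names)]

for name, row in zip(names, matrix):
    print(name, " ".join(f"{float(v):.2f}" for v in row))
# A 0.00
# B 1.00 0.00
# C 0.67 1.00 0.00
# D 0.33 0.67 1.00 0.00
# E 0.67 0.67 1.00 0.67 0.00
```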
The value of d(i, j) is 0 when two objects match on every attribute and 1 when they match on none. The closer the value is to 0, the more similar the objects are; the closer it is to 1, the more dissimilar they are. The summary is given below:
Companies B and A, C and B, D and C, and E and C (d = 1) share no performance state in any market segment, so they are completely different.
Companies D and A (d ≈ 0.33) are the most similar pair, matching on two of the three market segments.
Companies C and A, D and B, E and A, E and B, and E and D (d ≈ 0.67) match on only one market segment, so they are moderately dissimilar in their performance.
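The grouping in the summary can also be derived mechanically by bucketing pairs on their dissimilarity value. This is an illustrative sketch with hypothetical names (`profiles`, `d`, `groups`); pairs are labeled in the same "later-earlier" order used above (for example, B-A).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

profiles = {
    "A": ("strong", "moderate", "strong"),
    "B": ("moderate", "strong", "stable"),
    "C": ("stable", "stable", "strong"),
    "D": ("strong", "moderate", "stable"),
    "E": ("strong", "strong", "moderate"),
}


def d(x, y):
    """Nominal dissimilarity (p - m) / p as an exact fraction."""
    p = len(x)
    m = sum(a == b for a, b in zip(x, y))
    return Fraction(p - m, p)


# Bucket every unique pair by its dissimilarity value
groups = defaultdict(list)
for i, j in combinations(profiles, 2):
    groups[d(profiles[i], profiles[j])].append(f"{j}-{i}")

for value in sorted(groups):
    print(f"d = {float(value):.2f}: {', '.join(groups[value])}")
# d = 0.33: D-A
# d = 0.67: C-A, E-A, D-B, E-B, E-D
# d = 1.00: B-A, C-B, D-C, E-C
```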
Proximity measures for nominal attributes help us compare companies’ performance across different areas. The same idea applies wherever data is categorical; in marketing, for example, businesses can use it to group customers with similar preferences. It’s a practical way to support better decisions in various industries.