Trusted answers to developer questions

Related Tags

python

How to derive a new column of hashes from other columns in pandas

Khizar Hayat Saani

The pandas library in Python is used to create and manipulate data frames. It is used extensively when cleaning data for machine learning algorithms.

Using the pandas library, we can derive a new column, such as a column of hashes, from existing columns (or features) using two main methods.

Direct method

We can derive a new column directly by performing the required operation element-wise on the desired columns.

This is a fast and simple way to add a new column to the data frame.

import pandas as pd

# Creating a Data Frame
df = pd.DataFrame({'Item':['Milk', 'Eggs', 'Sugar', 'Bread'],
                    'Quantity':[2, 12, 5, 1],
                    'Price':[80, 100, 150, 120]})

print('Old Data Frame:\n')
print(df)

print('\n------------------------------------------------\n')

# Deriving a new column 'Cost' from 'Quantity' and 'Price'
df['Cost'] = df['Quantity'] * df['Price']

print('New Data Frame:\n')
print(df)

In the above code, we begin by creating a data frame that contains everyday grocery items, the quantity bought, and the individual price of the items.

Using the direct method, we derive a new column, Cost, that shows the total cost of buying each item. The cost is calculated by multiplying the Quantity and Price values in each row.
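The direct method also works when the derived values are hashes. The sketch below assumes that pandas' built-in pd.util.hash_pandas_object() is a suitable hash function for the task. The column name RowHash and the choice of Item and Quantity as inputs are our own assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['Milk', 'Eggs', 'Sugar', 'Bread'],
                   'Quantity': [2, 12, 5, 1],
                   'Price': [80, 100, 150, 120]})

# Hash the Item and Quantity values of each row into one unsigned 64-bit integer.
# index=False leaves the row label out of the hash, so only the values matter.
# 'RowHash' is a hypothetical column name chosen for this example.
df['RowHash'] = pd.util.hash_pandas_object(df[['Item', 'Quantity']], index=False)

print(df)
```

Because the whole operation runs on the selected columns at once, no Python-level loop over the rows is needed.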

Apply method

We can also derive a new column using the apply() method of a data frame.

The apply() method allows us to pass a function and, with axis=1, apply it to every row of the data frame. With the default axis=0, the function is applied to every column instead.

Since we pass a user-defined function to apply(), this method is more flexible than the direct method, although it is usually slower because the function is called once per row.

import pandas as pd

# Creating a Data Frame
df = pd.DataFrame({'Item':['Milk', 'Eggs', 'Sugar', 'Bread'],
                    'Price':[80, 100, 150, 120]})

print('Old Data Frame:\n')
print(df)

print('\n------------------------------------------------\n')

# Defining a function that derives the stock status from a single row
def stock_status(row):
    if row['Item'] == 'Sugar':
        return 'Out of Stock'
    return 'In Stock'

# Deriving a new column 'Status' by applying the function to every row
df['Status'] = df.apply(stock_status, axis=1)

print('New Data Frame:\n')
print(df)

Here, we use a similar data frame, this time without the Quantity column. We define a function, stock_status(), that takes a row of the data frame and checks whether the item is Sugar.

If the item is Sugar, the function returns "Out of Stock". In all other cases, it returns "In Stock". The apply() method collects the returned values, one per row, and we assign them to the new Status column.
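The apply method can produce a column of hashes in the same way. The following is a minimal sketch, assuming that a SHA-256 hex digest from Python's standard hashlib module is the kind of hash we want. The helper row_digest(), the column name Digest, and the "|" separator are hypothetical choices made for this example.

```python
import hashlib

import pandas as pd

df = pd.DataFrame({'Item': ['Milk', 'Eggs', 'Sugar', 'Bread'],
                   'Price': [80, 100, 150, 120]})

# Hypothetical helper: builds a text key from the row and hashes it.
def row_digest(row):
    # A separator keeps ('ab', 'c') and ('a', 'bc') from producing the same key.
    key = f"{row['Item']}|{row['Price']}"
    return hashlib.sha256(key.encode('utf-8')).hexdigest()

# Deriving a new column 'Digest' by applying the helper to every row
df['Digest'] = df.apply(row_digest, axis=1)

print(df)
```

Each value in Digest is a 64-character hexadecimal string. Rows with identical Item and Price values receive identical digests.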

Copyright ©2022 Educative, Inc. All rights reserved