Trusted answers to developer questions

Related Tags

pandas
dataframe
python
communitycreator

How to use the sample() function on a DataFrame in pandas

Sheza Naveed

What is a DataFrame?

A DataFrame is a commonly used 2-dimensional data structure.

It is a table that consists of columns and rows and is used primarily as an object in pandas.

sample dataframe

Requirements

The examples require the pandas library, which is imported as shown below.


import pandas as pd

Code

Example

A DataFrame can be built from a dictionary of lists, as shown below. This one lists countries, assigns each to a group, and gives each an a_score and a b_score.

Both scores are made-up values for this example.

import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]

country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A','A','B','A','B','B','C','A','C','C']

df = pd.DataFrame({'group':groups, 'country':country, 'a_score':a_score, 'b_score':b_score})
print(df)
An example dataframe

The sample() function

The sample() function returns a random selection of items from a DataFrame or Series. For a DataFrame, it selects rows by default.

It is useful when you want to work with a random subset of your data.
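Since sample() also works on a Series, a minimal sketch of that case may help. The Series below and its values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical Series, used only for illustration.
scores = pd.Series([4, 5, 7, 8, 2], name="a_score")

# Draw two values at random; the result is also a Series.
picked = scores.sample(n=2)
print(picked)
```

The values printed will differ from run to run because the selection is random.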

Syntax

A basic call to the function looks as follows.


mysample = df.sample(n=5)

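Parameters

The size of the sample is set with one of the following:

  • the number of items to return, using the n parameter (n = 3)

    or

  • the fraction of items to return, using the frac parameter (frac = 0.5)

The two cannot be used together. If neither is given, sample() returns a single item.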
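The sketch below shows how n and frac affect the size of the result, using a small ten-row DataFrame that is assumed here purely for illustration. It also shows that pandas raises a ValueError when both parameters are passed at once.

```python
import pandas as pd

# Hypothetical ten-row DataFrame, used only for illustration.
df = pd.DataFrame({"value": range(10)})

by_count = df.sample(n=3)        # exactly 3 rows
by_ratio = df.sample(frac=0.5)   # half of the 10 rows
default = df.sample()            # neither given: 1 row

print(len(by_count), len(by_ratio), len(default))  # 3 5 1

# n and frac are mutually exclusive.
both_rejected = False
try:
    df.sample(n=3, frac=0.5)
except ValueError as err:
    both_rejected = True
    print("ValueError:", err)
```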

Return value

The method returns a new object of the same type as the caller (a DataFrame or a Series) that contains the sampled items.
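To make the return value concrete, here is a small sketch with an assumed four-row DataFrame. It checks that the result is a new DataFrame, that the sampled rows keep their original index labels, and that the original DataFrame is left unchanged.

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration.
df = pd.DataFrame({
    "country": ["Peru", "India", "Canada", "Brazil"],
    "a_score": [6, 2, 7, 8],
})

subset = df.sample(n=2)

print(type(subset).__name__)   # DataFrame
print(subset.index.tolist())   # two of the original labels 0..3
print(len(df))                 # 4, the original is not modified

# Optionally renumber the sampled rows from 0.
renumbered = subset.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```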

Example

The following example prints the full DataFrame, then a random sample of three rows, and then a random sample of half of the rows, as shown below.

import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]

country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A','A','B','A','B','B','C','A','C','C']

df = pd.DataFrame({'group':groups, 'country':country, 'a_score':a_score, 'b_score':b_score})
print("the main dataframe")
print(df)

print("")
print("sample_num")

sample_num = df.sample(n=3)
print(sample_num)

print("")
print("sample_frac")

sample_frac = df.sample(frac=0.5)
print(sample_frac)
sample function
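Because the selection is random, the example prints different rows on each run. If a repeatable sample is needed, one option is the random_state parameter of sample(), which seeds the random number generator. The sketch below assumes a small DataFrame and an arbitrary seed of 42, both chosen only for illustration.

```python
import pandas as pd

# Hypothetical DataFrame, used only for illustration.
df = pd.DataFrame({
    "country": ["Pakistan", "USA", "Canada", "Brazil", "India"],
    "a_score": [4, 5, 7, 8, 2],
})

# The same seed produces the same sample each time.
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)

print(first.equals(second))  # True
```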
