DataFrame
A DataFrame is a commonly used 2-dimensional data structure. It is a table made up of rows and columns, and it is the primary data object in pandas.
Working with DataFrames requires the pandas library, which is conventionally imported as shown below.
import pandas as pd
A DataFrame can be created as shown below. This one lists countries that have been placed in different groups, each with its own a_score and b_score. Both scores are made-up values for this example.
import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

# Build the DataFrame from a dict of equal-length lists (one list per column)
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})
print(df)
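To see the two dimensions described above, here is a small sketch that inspects the same DataFrame. It uses only the standard shape, columns and index attributes; nothing about the data is changed.

```python
import pandas as pd

# Same data as the example above
a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium',
           'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
df = pd.DataFrame({'group': groups, 'country': country,
                   'a_score': a_score, 'b_score': b_score})

# The two dimensions of the table: (number of rows, number of columns)
print(df.shape)           # (10, 4)
# Column labels, in the order the dict keys were given
print(list(df.columns))   # ['group', 'country', 'a_score', 'b_score']
# Row labels: a default integer index running from 0 through 9
print(list(df.index))
```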
sample() function
The sample() function randomly selects items (rows, by default) from a DataFrame or a Series.
It is useful when you want to draw a random sample from a larger set of data.
A basic call looks like this:
mysample = df.sample(n=5)
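As a quick check of what that call returns, here is a minimal sketch on a trimmed-down version of the example data (two columns only, for brevity). It relies on the documented n parameter and on the default replace=False of sample(); replace is a real sample() parameter that this article does not otherwise cover.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Pakistan', 'USA', 'Canada', 'Brazil', 'India',
                'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland'],
    'a_score': [4, 5, 7, 8, 2, 3, 1, 6, 9, 10],
})

mysample = df.sample(n=5)
print(mysample)

# Exactly n rows come back, with all of the original columns
print(len(mysample))                        # 5
# Sampled rows keep their original index labels, so they can be traced back to df
print(mysample.index.isin(df.index).all())  # True
# With the default replace=False, no row is drawn twice
print(mysample.index.is_unique)             # True
```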
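The size of the sample can be given either as a number of rows with the n parameter (for example, n=3) or as a fraction of the rows with the frac parameter (for example, frac=0.5). Only one of the two may be passed in a single call.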
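The difference between the two size parameters can be sketched as follows, again on a two-column version of the example data. On 10 rows, frac=0.5 yields 5 rows, and passing both n and frac makes pandas raise a ValueError.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Pakistan', 'USA', 'Canada', 'Brazil', 'India',
                'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland'],
    'a_score': [4, 5, 7, 8, 2, 3, 1, 6, 9, 10],
})

by_count = df.sample(n=3)       # a fixed number of rows
by_ratio = df.sample(frac=0.5)  # a share of the rows: 0.5 * 10 = 5
print(len(by_count), len(by_ratio))  # 3 5

# n and frac are mutually exclusive; giving both raises ValueError
raised = False
try:
    df.sample(n=3, frac=0.5)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```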
The method returns a new object of the same type (a DataFrame or a Series) containing the sampled items; the original object is left unchanged.
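A short sketch of that return behaviour, assuming the same example scores: sampling a DataFrame gives back a DataFrame, sampling a single column (a Series) gives back a Series, and the source DataFrame keeps all of its rows.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Pakistan', 'USA', 'Canada', 'Brazil', 'India',
                'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland'],
    'a_score': [4, 5, 7, 8, 2, 3, 1, 6, 9, 10],
})

rows = df.sample(n=2)               # DataFrame in, DataFrame out
scores = df['a_score'].sample(n=2)  # Series in, Series out
print(type(rows).__name__, type(scores).__name__)  # DataFrame Series

# sample() returns a new object; df itself still has all 10 rows
print(len(df))  # 10
```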
The following example prints the full DataFrame first, then three randomly chosen rows, and then half of the rows.
import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India', 'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']
df = pd.DataFrame({'group': groups, 'country': country, 'a_score': a_score, 'b_score': b_score})

print("the main dataframe")
print(df)
print("")

# Draw exactly 3 random rows
print("sample_num")
sample_num = df.sample(n=3)
print(sample_num)
print("")

# Draw half of the rows (0.5 * 10 = 5 rows)
print("sample_frac")
sample_frac = df.sample(frac=0.5)
print(sample_frac)
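Because the rows are chosen at random, the output of the example above changes on every run. When a repeatable result is needed (for a tutorial or a test, say), sample() also accepts a random_state seed. A minimal sketch, assuming the same data; the seed value 42 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['Pakistan', 'USA', 'Canada', 'Brazil', 'India',
                'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland'],
    'a_score': [4, 5, 7, 8, 2, 3, 1, 6, 9, 10],
})

# Passing the same integer seed makes the "random" choice repeatable
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))  # True

# Without a seed, repeated calls generally return different rows
print(df.sample(frac=0.5))
```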