Creating a DataFrame From Arrays and Lists

A pandas DataFrame can be created in a number of ways. Let's look at some of them.

Create a DataFrame from a NumPy ndarray

Because a DataFrame is laid out like a 2D NumPy array (rows and columns), we can create one directly from a NumPy ndarray.

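The input array should be two-dimensional: its first axis becomes the rows and its second axis the columns. A 1D array is also accepted and becomes a single column, but an array with three or more dimensions raises a ValueError.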

If you pass a raw NumPy ndarray without any labels, both the row index and the column names default to integers starting at 0. You can also assign your own column names, which will be discussed in a later lesson.
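A minimal sketch of these rules, assuming a reasonably recent pandas and NumPy; the arrays are made up for illustration. It checks the default integer labels for a 2D input, the single-column result for a 1D input, and the ValueError for a 3D input.

```python
import numpy as np
import pandas as pd

# 2D input: rows come from axis 0, columns from axis 1, labels default to 0, 1, ...
arr2d = np.arange(6).reshape(2, 3)
df2d = pd.DataFrame(arr2d)
print(list(df2d.index), list(df2d.columns))  # [0, 1] [0, 1, 2]

# 1D input: accepted and turned into a single column labeled 0
df1d = pd.DataFrame(np.array([10, 20, 30]))
print(df1d.shape)  # (3, 1)

# 3D input: the constructor refuses it with a ValueError
try:
    pd.DataFrame(np.zeros((2, 2, 2)))
    raised_3d = False
except ValueError as err:
    raised_3d = True
    print("ValueError:", err)
```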

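import pandas as pd
import numpy as np
# A 2 x 3 array of samples from the standard normal distribution
d = np.random.normal(size=(2, 3))
print("The original NumPy array")
print(d)
print("---------------------")
# Wrap the array in a DataFrame; index and column labels default to 0, 1, ...
df = pd.DataFrame(d)
print("The DataFrame")
print(df)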

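The call np.random.normal(size=(2, 3)) creates a NumPy ndarray with 2 rows and 3 columns, filled with random samples from the standard normal distribution.

Passing that ndarray to pd.DataFrame creates the DataFrame object. Because no labels are given, the rows are labeled 0 and 1, and the columns 0, 1 and 2.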
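To see exactly how the array maps onto the DataFrame, the sketch below swaps in a seeded generator (np.random.default_rng, an assumption added only so the numbers are reproducible) and checks that element [i, j] of the array lands in row i, column j.

```python
import numpy as np
import pandas as pd

# Seeded generator so the values are the same on every run
rng = np.random.default_rng(0)
d = rng.normal(size=(2, 3))
df = pd.DataFrame(d)

# Same shape: axis 0 -> rows, axis 1 -> columns
print(df.shape)  # (2, 3)

# Element d[1, 2] is stored in row 1, column 2 of the DataFrame
print(df.iloc[1, 2] == d[1, 2])  # True

# Converting back yields the original values
print(np.array_equal(df.to_numpy(), d))  # True
```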

Create a DataFrame from a dictionary of lists

We have already learned how to create a pandas Series from a dictionary. We can also create a DataFrame object from a dictionary of lists. The difference is that for a Series the dictionary keys become the index, whereas for a DataFrame the keys become the column names and each list supplies that column's values.
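A small comparison, using made-up dictionaries, of where the keys end up in each case. The last part goes slightly beyond the text: it shows that the lists in a dictionary of lists must all have the same length, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Series from a dict: the keys become the index
s = pd.Series({"a": 1, "b": 2})
print(list(s.index))  # ['a', 'b']

# DataFrame from a dict of lists: the keys become the column names,
# and the rows get the default integer index
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(list(df.columns))  # ['a', 'b']
print(list(df.index))  # [0, 1]

# Lists of different lengths cannot be lined up into rows
try:
    pd.DataFrame({"a": [1, 2], "b": [3]})
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```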

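You can also give each column its own index by nesting a dictionary inside another dictionary: the outer keys become the column names and the inner keys become the row labels.

pandas then aligns the columns on those labels. A label that appears in more than one inner dictionary produces a single shared row; any other label produces a new row in which the columns that lack it are filled with NaN. Because NaN is a floating-point value, an integer column that receives a NaN is converted to float64.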
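A minimal sketch of this alignment with a made-up nested dictionary: the label r2 is shared by both columns, while r1 and r3 each appear in only one of them. The checks look up cells by label rather than by row position.

```python
import pandas as pd

nested = {
    "x": {"r1": 1, "r2": 2},
    "y": {"r2": 20, "r3": 30},
}
df = pd.DataFrame(nested)
print(df)

# r2 appears in both inner dicts, so it is one shared row holding both values
print(df.loc["r2", "x"], df.loc["r2", "y"])

# r1 has no "y" value and r3 has no "x" value: those cells are NaN
print(pd.isna(df.loc["r1", "y"]), pd.isna(df.loc["r3", "x"]))

# The NaNs force both integer columns to float64
print(df.dtypes)
```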

The example code below shows both cases: a dictionary of lists, which gets the default integer index, and a nested dictionary, whose inner keys become the index.

Create a DataFrame from a dict of lists and from a nested dict
import pandas as pd
# example 1: init a dataframe by dict without index
d = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8]}
df = pd.DataFrame(d)
print("The DataFrame ")
print(df)
print("---------------------")
print("The values of column a are {}".format(df["a"].values))
# example 2: init a dataframe by dict with different index
d = {"a": {"a1":1, "a2":2, "c":3}, "b":{"b1":2, "b2":4, "c":9}}
df = pd.DataFrame(d)
print("The DataFrame ")
print(df)

In example 1, a Python dict of lists is created and then passed to pd.DataFrame to create a DataFrame object. Each key becomes a column, and since no index is given, the rows are labeled 0 to 3.

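In example 2, a nested Python dictionary is created and then passed to pd.DataFrame. The inner keys a1, a2, c, b1 and b2 become the row labels. Only c appears in both inner dictionaries, so it is the only row with values in both columns; every other row has NaN in the column whose inner dictionary lacks that label.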
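To check this description against the actual result, the sketch below rebuilds the same nested dictionary as example 2 and inspects rows by label, so it does not depend on the row order pandas chooses.

```python
import pandas as pd

# Same nested dict as example 2
d = {"a": {"a1": 1, "a2": 2, "c": 3}, "b": {"b1": 2, "b2": 4, "c": 9}}
df = pd.DataFrame(d)

# Five distinct inner keys give five rows, two outer keys give two columns
print(df.shape)  # (5, 2)

# "c" is the only label in both inner dicts, so it is the only complete row
complete_rows = df.dropna().index.tolist()
print(complete_rows)  # ['c']

# Every other row is missing exactly one of the two values
print(df.drop(index="c").isna().sum(axis=1).tolist())  # [1, 1, 1, 1]
```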