
Different ways to create Pandas Dataframe

Last Updated : 08 Oct, 2024

A DataFrame is the most commonly used Pandas object. It is created with the pd.DataFrame() constructor, which accepts several kinds of input, so there are several ways to create a Pandas DataFrame in Python.

Example: Creating a DataFrame from a Dictionary

Python
import pandas as pd

# Initialize a dictionary of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

print(df)

Output:

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18

Explanation: Here, a dictionary named data is created. The dictionary contains two keys: 'Name' and 'Age'.

  • The value for 'Name' is a list of names: ['Tom', 'nick', 'krish', 'jack'].
  • The value for 'Age' is a list of corresponding ages: [20, 21, 19, 18].
  • This dictionary structure is suitable for creating a DataFrame, as it allows each key to represent a column in the resulting DataFrame.

Pandas Create Dataframe Syntax

pandas.DataFrame(data, index, columns)

Parameters:

  • data: The data from which the DataFrame is created. It can be a list, a dictionary, a Series, a NumPy array, another DataFrame, or a scalar value (together with index and columns).
  • index: Optional row labels. If omitted, the rows are labeled with integers from 0 to n-1, where n is the number of rows.
  • columns: Optional column labels. If omitted and the data carries no column names, the columns are labeled with integers from 0 to n-1, where n is the number of columns.

Returns:

  • DataFrame object
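
As a small sketch of how these parameters interact (the values and labels below are made up for illustration), the same data can be built once with default labels and once with explicit ones:

```python
import pandas as pd

# Illustrative data: two rows, two columns
rows = [[1, 2], [3, 4]]

# No index or columns given: both default to 0..n-1
df_default = pd.DataFrame(rows)

# Explicit row labels (index) and column labels (columns)
df_labeled = pd.DataFrame(rows, index=['r1', 'r2'], columns=['x', 'y'])

print(df_default)
print(df_labeled)
```

In both cases the constructor returns a DataFrame object; only the labels differ.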

Now that we have discussed the DataFrame() function, let’s look at the different ways to create a Pandas DataFrame.

Pandas DataFrames are essential for effective data handling and analysis in Python. Each method offers unique advantages depending on the data source and format.

Create an Empty DataFrame

An empty DataFrame can be created by calling the DataFrame() constructor of the Pandas library with no arguments.

Python
# Importing Pandas to create DataFrame
import pandas as pd

# Creating Empty DataFrame and Storing it in variable df
df = pd.DataFrame()

print(df)

Output:

Empty DataFrame
Columns: []
Index: []
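
A common variation, shown here as a minimal sketch with illustrative column names, is to declare the columns up front while still leaving the DataFrame without rows. The empty attribute reports whether a DataFrame holds any data:

```python
import pandas as pd

# An empty DataFrame that already knows its column names
df_empty = pd.DataFrame(columns=['Name', 'Age'])

print(df_empty.empty)          # True when there are no rows
print(list(df_empty.columns))
print(df_empty.shape)          # (rows, columns)
```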

Creating a DataFrame from Lists or Arrays

Python
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)

Output:

   Name  Age
0   tom   10
1  nick   15
2  juli   14

Explanation: To create a Pandas DataFrame from a list of lists, pass it to pd.DataFrame(). Each inner list becomes one row, and the columns parameter supplies the column names.
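
The array case works the same way. The following is a minimal sketch, with made-up values, that passes a 2D NumPy array instead of a list of lists:

```python
import numpy as np
import pandas as pd

# Illustrative 2D array: three rows, two columns
arr = np.array([[1, 4], [2, 5], [3, 6]])

# Each row of the array becomes a row of the DataFrame
df_arr = pd.DataFrame(arr, columns=['A', 'B'])

print(df_arr)
```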

Create DataFrame from List of Dictionaries

Python
import pandas as pd

# Initialize a list of dictionaries.
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30}]

# Creates DataFrame.
df = pd.DataFrame(data)

print(df)

Output:

    a   b   c
0   1   2   3
1  10  20  30

Explanation: A Pandas DataFrame can be created by passing a list of dictionaries as input data. Each dictionary becomes one row, and by default the dictionary keys are used as column names.

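Another example creates a Pandas DataFrame from a list of dictionaries together with explicit row labels. A key that is missing from one of the dictionaries (here 'a' in the first one) is filled with NaN.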

Python
import pandas as pd

# Initialize a list of dictionaries
data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]

# Creates pandas DataFrame by passing
# Lists of dictionaries and row index.
df = pd.DataFrame(data, index=['first', 'second'])

print(df)

Output:

         b   c     a
first    2   3   NaN
second  20  30  10.0
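
The output shows column a as 10.0 rather than 10 because NaN is a floating-point value, so the column is stored as float64. As a sketch of one possible follow-up (the fill value 0 is an arbitrary choice for illustration), the gap can be filled and the column converted back to integers:

```python
import pandas as pd

data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]
df = pd.DataFrame(data, index=['first', 'second'])

# 'a' is missing from the first dictionary, so that cell is NaN
print(df['a'].isna())

# Fill the gap with an illustrative default and restore an integer dtype
df_filled = df.fillna({'a': 0}).astype({'a': 'int64'})

print(df_filled)
```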

Creating a DataFrame from Another DataFrame

Python
import pandas as pd

original_df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
})

# Double brackets select a list of columns and return a DataFrame
new_df = original_df[['Name']]
print(new_df)

Output:

    Name
0    Tom
1   Nick
2  Krish
3   Jack

Explanation: You can create a new DataFrame based on an existing DataFrame by selecting specific columns or rows.
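
Row selection follows the same idea. The sketch below, which uses an arbitrary age threshold for illustration, filters rows with a boolean condition and calls copy() so that later changes to the new DataFrame do not affect the original:

```python
import pandas as pd

original_df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
})

# Keep only the rows where Age is at least 20, as an independent copy
adults = original_df[original_df['Age'] >= 20].copy()

# Modifying the copy leaves original_df unchanged
adults['Age'] = adults['Age'] + 1

print(adults)
print(original_df)
```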

Create DataFrame from a Dictionary of Series

Python
import pandas as pd

# Initialize a dictionary of Series.
d = {'one': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd'])}

# creates Dataframe.
df = pd.DataFrame(d)

print(df)

Output:

   one  two
a   10   10
b   20   20
c   30   30
d   40   40

Explanation: A DataFrame can be created by passing a dictionary of Series to pd.DataFrame(). Each key becomes a column, and the resulting index is the union of the indexes of all the Series passed.
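
The union behaviour is easier to see when the Series do not share the same index. In this minimal sketch (values chosen for illustration), the first Series has no label 'd', so that cell becomes NaN:

```python
import pandas as pd

d = {'one': pd.Series([10, 20, 30], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

# The index of the result is the union of both Series indexes
df = pd.DataFrame(d)

print(df)
```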

Create DataFrame using the zip() function

Python
import pandas as pd

# List1
Name = ['tom', 'krish', 'nick', 'juli']

# List2
Age = [25, 30, 26, 22]

# Pair the two lists element-wise with zip()
# and collect the result into a list of tuples.
list_of_tuples = list(zip(Name, Age))

# Converting lists of tuples into
# pandas Dataframe.
df = pd.DataFrame(list_of_tuples,
                  columns=['Name', 'Age'])

print(df)

Output:

    Name  Age
0    tom   25
1  krish   30
2   nick   26
3   juli   22

Explanation: The zip() function pairs the two lists element-wise into tuples, one tuple per row. Passing the resulting list of tuples to pd.DataFrame(), along with the column names, creates the DataFrame.
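
One point worth checking when using this approach: zip() stops at the shortest input, so lists of unequal length silently lose data. A minimal sketch with deliberately mismatched, made-up lists:

```python
import pandas as pd

names = ['tom', 'krish', 'nick']
ages = [25, 30]  # one value short on purpose

# zip() yields only as many pairs as the shortest list has items
pairs = list(zip(names, ages))

df = pd.DataFrame(pairs, columns=['Name', 'Age'])

print(df)
```

Comparing len(names) and len(ages) before zipping is a simple way to catch this.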

Create a DataFrame by Providing the Index Label Explicitly

Python
import pandas as pd

# Initialize a dictionary of lists.
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}

# Creates pandas DataFrame.
df = pd.DataFrame(data, index=['rank1',
                               'rank2',
                               'rank3',
                               'rank4'])

# print the data
print(df)

Output:

       Name  marks
rank1   Tom     99
rank2  Jack     98
rank3  nick     95
rank4  juli     90

Explanation: To create a DataFrame by providing the index label explicitly, you can use the index parameter of the pd.DataFrame() constructor. The index parameter takes a list of index labels as input, and the DataFrame will use these labels for the rows of the DataFrame.
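
Once rows carry explicit labels, they can be looked up by label with loc. A short sketch reusing the same data:

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}

df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Select a whole row by its label
print(df.loc['rank2'])

# Select a single cell by row label and column name
print(df.loc['rank3', 'marks'])
```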

Different ways to create Pandas Dataframe – FAQs

What are the methods for DataFrame in Python?

Some common methods for Pandas DataFrame include:

  • head(): Returns the first n rows.
  • tail(): Returns the last n rows.
  • info(): Provides a summary of the DataFrame.
  • describe(): Generates descriptive statistics.
  • sort_values(): Sorts the DataFrame by specified columns.
  • groupby(): Groups the DataFrame using a mapper or by series of columns.
  • merge(): Merges DataFrame or named series objects with a database-style join.
  • apply(): Applies a function along the axis of the DataFrame.
  • drop(): Removes specified labels from rows or columns.
  • pivot_table(): Creates a pivot table.
  • fillna(): Fills NA/NaN values.
  • isnull(): Detects missing values.
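
A few of these methods can be combined in a short sketch. The team and score columns are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'team': ['x', 'y', 'x', 'y'],
                   'score': [10, 40, 30, 20]})

# sort_values() + head(): the two highest-scoring rows
top = df.sort_values('score', ascending=False).head(2)

# groupby(): total score per team
totals = df.groupby('team')['score'].sum()

print(top)
print(totals)
```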

Which data types can be used to create DataFrame?

DataFrames can be created using various data types including:

  • Dictionaries of arrays, lists, or series.
  • Lists of dictionaries.
  • 2D NumPy arrays.
  • Series.
  • Another DataFrame.

Python
import numpy as np
import pandas as pd

# From a dictionary of lists
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# From a list of dictionaries
df2 = pd.DataFrame([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}])

# From a 2D NumPy array
df3 = pd.DataFrame(np.array([[1, 4], [2, 5], [3, 6]]), columns=['A', 'B'])

# From a dictionary of Series
df4 = pd.DataFrame({'A': pd.Series([1, 2, 3]), 'B': pd.Series([4, 5, 6])})

How many data types are there in a Pandas DataFrame?

A pandas DataFrame can contain multiple data types across its columns, such as:

  • int64 : Integer values.
  • float64 : Floating-point values.
  • object : Text or mixed types.
  • datetime64[ns] : Date and time values.
  • bool : Boolean values.

You can check the data types of a DataFrame using the dtypes attribute.

Python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.1, 6.2], 'C': ['x', 'y', 'z']})
print(df.dtypes)

# Output:
# A      int64
# B    float64
# C     object
# dtype: object

Why use DataFrame instead of a Dataset?

DataFrames are specifically designed for data manipulation and analysis, offering several advantages over general datasets:

  • Integrated handling of missing data.
  • Label-based indexing for rows and columns.
  • Powerful data alignment and broadcasting.
  • Extensive functionality for data manipulation, aggregation, and transformation.
  • Better performance for operations involving structured data.
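
Two of these points, label-based alignment and missing-data handling, can be illustrated with a small sketch. The labels and values are made up, and the single column name v is hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'v': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'v': [10, 20, 30]}, index=['b', 'c', 'd'])

# Addition aligns rows by label; labels present in only one frame give NaN
total = df1 + df2

# add() with fill_value treats a missing entry as 0 instead
filled = df1.add(df2, fill_value=0)

print(total)
print(filled)
```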
  • Integration with a variety of data sources and file formats.

What type is a DataFrame in Pandas?

In pandas, a DataFrame is of the type pandas.core.frame.DataFrame.

Python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(type(df))

# Output: <class 'pandas.core.frame.DataFrame'>


