
How to Save Pandas Dataframe as gzip/zip File?

Last Updated : 26 Nov, 2020

Pandas is an open-source Python library built on top of NumPy. It offers data structures and operations for manipulating numerical data and time series, and it is popular mainly because it makes importing and analyzing data much easier.

Converting to zip/gzip file

The to_pickle() method in Pandas pickles (serializes) the given object to a file. Its syntax is given below:

Syntax:

DataFrame.to_pickle(path,
                   compression='infer',
                   protocol=pickle.HIGHEST_PROTOCOL)

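This method supports compression formats such as zip, gzip, bz2, and xz. With compression='infer' (the default), the format is chosen from the file extension in path: '.zip', '.gz', '.bz2', or '.xz'. The examples below show how to save a DataFrame as a zip file and as a gzip file.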
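To make the difference between inferred and explicit compression concrete, here is a minimal sketch. It assumes a pandas version that infers gzip from the '.gz' extension; the file names and the leading_bytes helper are hypothetical and only serve the illustration. It checks whether each written file starts with the two-byte gzip signature.

```python
import os
import tempfile

import pandas as pd

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def leading_bytes(path, n=2):
    """Return the first n bytes of a file (hypothetical helper)."""
    with open(path, "rb") as fh:
        return fh.read(n)


df = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp:
    inferred = os.path.join(tmp, "inferred.gz")    # '.gz' is a recognized extension
    unknown = os.path.join(tmp, "unknown.gzip")    # '.gzip' is not recognized
    explicit = os.path.join(tmp, "explicit.gzip")  # compression passed explicitly

    df.to_pickle(inferred)                      # compression='infer' picks gzip
    df.to_pickle(unknown)                       # falls back to no compression
    df.to_pickle(explicit, compression="gzip")  # gzip regardless of the name

    inferred_is_gzip = leading_bytes(inferred) == GZIP_MAGIC
    unknown_is_gzip = leading_bytes(unknown) == GZIP_MAGIC
    explicit_is_gzip = leading_bytes(explicit) == GZIP_MAGIC

print(inferred_is_gzip, unknown_is_gzip, explicit_is_gzip)
```

The practical point is that an unrecognized extension does not raise an error: the file is simply written uncompressed, so passing compression explicitly is the safer choice when the file name is not under your control.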

Example 1: Save Pandas Dataframe as zip File

Python3




# importing packages
import pandas as pd

# dictionary of data
dct = {'ID': {0: 23, 1: 43, 2: 12, 3: 13, 4: 67},
       'Name': {0: 'Ajay', 1: 'Deep', 2: 'Deepanshi', 3: 'Mira', 4: 'Yash'},
       'Marks': {0: 89, 1: 97, 2: 45, 3: 78, 4: 56},
       'Grade': {0: 'B', 1: 'A', 2: 'F', 3: 'C', 4: 'E'}}

# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)

# using to_pickle function to form file
# by default, the compression type is inferred from the file extension in the specified path
# file will be created in the given path
data.to_pickle('file.zip')



Example 2: Save Pandas Dataframe as gzip File.

Python3




# importing packages
import pandas as pd
  
# dictionary of data
dct = {"C1": range(5), "C2": range(5, 10)}
  
# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)
  
# using to_pickle function to form file
# the compression type can also be selected explicitly
# note: compression='infer' recognizes the '.gz' extension, not '.gzip'
# file will be created in the given path
data.to_pickle('file.gz', compression='gzip')

Reading zip/gzip file

In order to read the created files, you'll need to use the read_pickle() method. Its syntax is given below:

Syntax:

pandas.read_pickle(filepath_or_buffer,
               compression='infer')

Example 1: Reading zip file

Python3

# importing packages
import pandas as pd

# reading from the zip file
print(pd.read_pickle('file.zip'))

Example 2: Reading gzip File.

Python3

# importing packages
import pandas as pd

# reading from the gzip file
print(pd.read_pickle('file.gz'))

From the above two examples, we can see that both compressed files are read by the read_pickle() method in the same way. Only the file name changes, because the compression type is inferred from the file extension (.zip or .gz).
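A round trip is an easy way to confirm that nothing is lost when a DataFrame is saved and read back. The sketch below is one possible check, not part of the original examples: it writes to a temporary directory, the file names frame.zip and frame.gz are hypothetical, and it assumes a pandas version that infers compression from the '.zip' and '.gz' extensions.

```python
import os
import tempfile
import zipfile

import pandas as pd

original = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "frame.zip")
    gz_path = os.path.join(tmp, "frame.gz")

    # write with the compression inferred from each extension
    original.to_pickle(zip_path)
    original.to_pickle(gz_path)

    # read back, again relying on compression='infer'
    from_zip = pd.read_pickle(zip_path)
    from_gz = pd.read_pickle(gz_path)

    # the standard library can confirm that the first file is a zip archive
    zip_is_archive = zipfile.is_zipfile(zip_path)

# DataFrame.equals compares shape, dtypes, and values
zip_round_trip_ok = original.equals(from_zip)
gz_round_trip_ok = original.equals(from_gz)
print(zip_round_trip_ok, gz_round_trip_ok, zip_is_archive)
```

Keep in mind that these files contain pickled data inside the compressed container, so they are meant to be read back with read_pickle() and should only be loaded from sources you trust.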

