
How to Save Pandas Dataframe as gzip/zip File?

Last Updated : 26 Nov, 2020

Pandas is an open-source library built on top of the NumPy library. It is a Python package that offers various data structures and operations for manipulating numerical data and time series. It is popular mainly because it makes importing and analyzing data much easier. Pandas is fast and offers high performance and productivity for users.

Converting to zip/gzip file

The to_pickle() method in Pandas is used to pickle (serialize) the given object to a file. Its syntax is as follows:

Syntax:

DataFrame.to_pickle(path,
                    compression='infer',
                    protocol=pickle.HIGHEST_PROTOCOL)
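The protocol argument is passed on to Python's pickle module. The following is a minimal sketch, assuming pandas is installed and using a temporary directory with a hypothetical file name (plain.pkl). It saves an uncompressed pickle with protocol 4 and inspects the first two bytes of the file. Pickle protocols 2 and above begin with the PROTO opcode (0x80), followed by the protocol number.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical file name; compression=None disables any compression
    path = os.path.join(tmp, "plain.pkl")
    df.to_pickle(path, compression=None, protocol=4)

    with open(path, "rb") as fh:
        header = fh.read(2)  # PROTO opcode followed by the protocol number

    restored = pd.read_pickle(path, compression=None)

print(header)               # b'\x80\x04'
print(restored.equals(df))  # True
```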

This method supports zip, gzip, bz2, and xz compression. The examples below show how to save a DataFrame as a zip file and as a gzip file.
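With the default compression='infer', the format is chosen from the file extension. The following is a minimal sketch, assuming a small stand-in DataFrame and a temporary directory. It writes one file for each of the four formats and checks that each file begins with that format's well-known "magic number" bytes. It also checks that read_pickle() infers the same format when reading the file back.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})

# standard leading bytes ("magic numbers") of each compressed format
MAGIC = {
    ".zip": b"PK\x03\x04",
    ".gz": b"\x1f\x8b",
    ".bz2": b"BZh",
    ".xz": b"\xfd7zXZ\x00",
}

detected = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext, magic in MAGIC.items():
        path = os.path.join(tmp, "data" + ext)
        # compression='infer' (the default) picks the format from ext
        df.to_pickle(path)
        with open(path, "rb") as fh:
            detected[ext] = fh.read(len(magic)) == magic
        # read_pickle also infers the format from the extension
        assert pd.read_pickle(path).equals(df)

print(detected)  # {'.zip': True, '.gz': True, '.bz2': True, '.xz': True}
```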

Example 1: Save Pandas Dataframe as zip File

# importing packages
import pandas as pd

# dictionary of data
dct = {'ID': {0: 23, 1: 43, 2: 12, 3: 13, 4: 67},
       'Name': {0: 'Ajay', 1: 'Deep', 2: 'Deepanshi', 3: 'Mira',
                4: 'Yash'},
       'Marks': {0: 89, 1: 97, 2: 45, 3: 78, 4: 56},
       'Grade': {0: 'B', 1: 'A', 2: 'F', 3: 'C', 4: 'E'}
       }

# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)

# using to_pickle function to form file
# by default, the compression type is inferred from the file extension
# in the specified path (".zip" here)
# file will be created in the given path
data.to_pickle('file.zip')


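Output:

   ID       Name  Marks Grade
0  23       Ajay     89     B
1  43       Deep     97     A
2  12  Deepanshi     45     F
3  13       Mira     78     C
4  67       Yash     56     E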
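The .zip output is an ordinary zip archive, so Python's standard zipfile module can open it. The following is a minimal sketch, assuming a small stand-in DataFrame and a temporary directory instead of the file.zip from the example. It checks that the archive holds a single member and that this member is a plain pickle of the DataFrame. The member's name is not checked, because it may depend on the pandas version.

```python
import os
import pickle
import tempfile
import zipfile

import pandas as pd

df = pd.DataFrame({"ID": [23, 43, 12], "Grade": ["B", "A", "F"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.zip")
    df.to_pickle(path)  # zip compression inferred from ".zip"

    with zipfile.ZipFile(path) as zf:
        members = zf.namelist()
        # zf.read() decompresses the member, giving the raw pickle bytes
        restored = pickle.loads(zf.read(members[0]))

print(len(members))         # 1
print(restored.equals(df))  # True
```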

 

Example 2: Save Pandas Dataframe as gzip File

# importing packages
import pandas as pd

# dictionary of data
dct = {"C1": range(5), "C2": range(5, 10)}

# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)

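# using to_pickle function to form file
# the compression type can also be selected explicitly with the
# compression parameter; ".gz" is the extension pandas associates
# with gzip, so the file can be read back with compression='infer'
# file will be created in the given path
data.to_pickle('file.gz', compression='gzip')


Output:

   C1  C2
0   0   5
1   1   6
2   2   7
3   3   8
4   4   9

Reading zip/gzip file

To read the created files, use the read_pickle() method. Its syntax is as follows:

pandas.read_pickle(filepath_or_buffer,
                   compression='infer')

Example 1: Reading zip file

# importing packages
import pandas as pd

# reading from the zip file and printing
print(pd.read_pickle('file.zip'))


Output:

   ID       Name  Marks Grade
0  23       Ajay     89     B
1  43       Deep     97     A
2  12  Deepanshi     45     F
3  13       Mira     78     C
4  67       Yash     56     E

Example 2: Reading gzip File

# importing packages
import pandas as pd

# reading from the gzip file and printing
print(pd.read_pickle('file.gz'))


Output:

   C1  C2
0   0   5
1   1   6
2   2   7
3   3   8
4   4   9

The two examples above show that read_pickle() reads both compressed files with no change except the file name. Because compression='infer' is the default, the compression type is deduced from the ".zip" and ".gz" extensions.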
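Inference only works for extensions that pandas recognizes. For gzip, that extension is .gz, so compression='infer' does not detect a name ending in .gzip as gzip. The following is a minimal sketch, assuming a hypothetical file name (data.gzip) in a temporary directory. It passes compression='gzip' explicitly when writing and when reading. It then reopens the same file with Python's standard gzip and pickle modules, which shows that the result is an ordinary gzip-compressed pickle.

```python
import gzip
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical name whose extension pandas does not infer from
    path = os.path.join(tmp, "data.gzip")

    # state the compression explicitly on both write and read
    df.to_pickle(path, compression="gzip")
    via_pandas = pd.read_pickle(path, compression="gzip")

    # the file is a standard gzip stream wrapping a pickle
    with gzip.open(path, "rb") as fh:
        via_stdlib = pickle.load(fh)

print(via_pandas.equals(df))  # True
print(via_stdlib.equals(df))  # True
```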


