Select rows that contain specific text using Pandas
Last Updated: 07 Apr, 2021
While preprocessing data in a pandas DataFrame, you often need to find the rows that contain specific text. This article discusses several methods for finding the rows that contain specific text in the columns of a DataFrame.
Dataset in use:
| job | Age_Range | Salary | Credit-Rating | Savings | Buys_Hone |
| --- | --- | --- | --- | --- | --- |
| Own | Middle-aged | High | Fair | 10000 | Yes |
| Govt | Young | Low | Fair | 15000 | No |
| Private | Senior | Average | Excellent | 20000 | Yes |
| Own | Middle-aged | High | Fair | 13000 | No |
| Own | Young | Low | Excellent | 17000 | Yes |
| Private | Senior | Average | Fair | 18000 | No |
| Govt | Young | Average | Fair | 11000 | No |
| Private | Middle-aged | Low | Excellent | 9000 | No |
| Govt | Senior | High | Excellent | 14000 | Yes |
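The examples in this article read the data from "Assignment.csv". As a minimal sketch for following along without that file, the dataset listed above can be built in memory. It is an assumption here that the CSV holds exactly these six columns and nine rows.

```python
import pandas as pd

# Rows copied from the dataset listed above.
# Assumption: Assignment.csv holds the same six columns and nine rows.
columns = ["job", "Age_Range", "Salary", "Credit-Rating", "Savings", "Buys_Hone"]
rows = [
    ("Own", "Middle-aged", "High", "Fair", 10000, "Yes"),
    ("Govt", "Young", "Low", "Fair", 15000, "No"),
    ("Private", "Senior", "Average", "Excellent", 20000, "Yes"),
    ("Own", "Middle-aged", "High", "Fair", 13000, "No"),
    ("Own", "Young", "Low", "Excellent", 17000, "Yes"),
    ("Private", "Senior", "Average", "Fair", 18000, "No"),
    ("Govt", "Young", "Average", "Fair", 11000, "No"),
    ("Private", "Middle-aged", "Low", "Excellent", 9000, "No"),
    ("Govt", "Senior", "High", "Excellent", 14000, "Yes"),
]

# Build the DataFrame directly instead of reading it from disk
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Calling df.to_csv("Assignment.csv", index=False) afterwards would write a file that the read_csv() calls in the examples can load.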
Method 1 : Using contains()
This method filters rows with the str.contains() method. Here the rows are filtered on the ‘Credit-Rating’ column: the .str accessor exposes string methods on the column, and contains() takes a pattern (treated as a regular expression by default) and returns True for every value in which the pattern is found. The resulting boolean Series is then used to select the matching rows.
Example:
Python3
import pandas as pd

# Load the dataset
df = pd.read_csv("Assignment.csv")

# Keep only the rows whose Credit-Rating contains 'Fair'
df = df[df['Credit-Rating'].str.contains('Fair')]
print(df)
Output :
Rows with Credit-Rating containing Fair
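In real data the text may differ in letter case, and some cells may be missing. The sketch below shows how the case, na and regex parameters of str.contains() can handle this. The small in-memory DataFrame is a hypothetical sample (the lowercase "fair" and the missing value are not in the original dataset) so that the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical sample with a lowercase value and a missing value,
# used only to illustrate the optional parameters.
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Own"],
    "Credit-Rating": ["Fair", "fair", "Excellent", None],
    "Savings": [10000, 15000, 20000, 13000],
})

# case=False ignores letter case.
# na=False treats missing values as non-matches.
# regex=False matches the text literally instead of as a regular expression.
mask = df["Credit-Rating"].str.contains("fair", case=False, na=False, regex=False)

fair_rows = df[mask]
print(fair_rows)
```

Passing regex=False is a reasonable default when the search text comes from user input, since characters such as "." or "(" would otherwise be interpreted as regular expression syntax.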
Method 2 : Using itertuples()
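This method iterates over the rows with itertuples() and uses the string method find() to pick the rows that contain the desired text. itertuples() returns an iterator that yields a named tuple for each row in the DataFrame. The first field of each tuple (position 0) is the index, so x[2] refers to the second column, Age_Range. find() returns -1 when the text is not present. itertuples() is generally faster than the iterrows() method of pandas.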
Example:
Python3
import pandas as pd

# Load the dataset
df = pd.read_csv("Assignment.csv")

# x[0] is the index, so x[2] is the Age_Range column
for x in df.itertuples():
    if x[2].find('Young') != -1:
        print(x)
Output :
Rows with Age_Range as Young
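Because itertuples() yields named tuples, fields can also be read by name instead of by position. The following is a minimal sketch on a hypothetical in-memory sample; it collects the index labels of the matching rows and then selects them with loc. Note that columns whose names are not valid Python identifiers (such as Credit-Rating) are renamed to positional names, so attribute access by the original name only works for columns like Age_Range.

```python
import pandas as pd

# Hypothetical sample with a few rows shaped like the dataset above
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Own"],
    "Age_Range": ["Middle-aged", "Young", "Senior", "Young"],
    "Savings": [10000, 15000, 20000, 17000],
})

# row.Index holds the row label; row.Age_Range holds the column value
matching_labels = [
    row.Index
    for row in df.itertuples()
    if "Young" in row.Age_Range
]

# Select the matching rows as a new DataFrame
young_rows = df.loc[matching_labels]
print(young_rows)
```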
Method 3 : Using iterrows()
This method iterates over the rows with iterrows() and uses the in operator to pick the rows that contain the desired text. iterrows() returns an iterator yielding each index value along with a Series containing the data in that row. It is generally slower than itertuples() because it builds a Series for every row, which involves a lot of extra type checking.
Example:
Python3
import pandas as pd

# Load the dataset
df = pd.read_csv("Assignment.csv")

# Print selected fields of every row whose job contains 'Govt'
for index, row in df.iterrows():
    if 'Govt' in row['job']:
        print(index, row['job'], row['Age_Range'],
              row['Salary'], row['Savings'], row['Credit-Rating'])
Output :
Rows with job as Govt
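Printing inside the loop shows the matches but does not keep them. As a sketch on a hypothetical in-memory sample, the loop can instead collect the index labels of the matching rows and build a DataFrame from them. For comparison, the same selection is also done with str.contains(), which avoids the explicit loop.

```python
import pandas as pd

# Hypothetical sample with a few rows shaped like the dataset above
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Govt"],
    "Age_Range": ["Middle-aged", "Young", "Senior", "Senior"],
    "Savings": [10000, 15000, 20000, 14000],
})

# Collect the index label of every row whose job contains 'Govt'
matches = []
for index, row in df.iterrows():
    if "Govt" in row["job"]:
        matches.append(index)

govt_rows = df.loc[matches]
print(govt_rows)

# Loop-free selection of the same rows, shown for comparison
same_rows = df[df["job"].str.contains("Govt", regex=False)]
print(same_rows)
```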
Method 4 : Using regular expressions
This method uses regular expressions to find the rows with the desired text. search() is a function of the re module: re.search(pattern, string) is similar to re.match(), but it looks for a match anywhere in the string instead of only at the beginning. We iterate over every row and test the job value at each index against the pattern ‘Govt’, selecting only the rows that match.
Example:
Python3
from re import search
import pandas as pd

# Load the dataset
df = pd.read_csv("Assignment.csv")

# Print selected fields of every row whose job matches the pattern 'Govt'
for ind in df.index:
    if search('Govt', df.loc[ind, 'job']):
        print(df.loc[ind, 'job'], df.loc[ind, 'Savings'],
              df.loc[ind, 'Age_Range'], df.loc[ind, 'Credit-Rating'])
Output :
Rows where job is Govt
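The difference between search() and match() described above can be seen in a short sketch. The sample below is hypothetical: the value "Semi-Govt" does not appear in the original dataset and is added only so that one entry contains the text without starting with it.

```python
import re
import pandas as pd

# Hypothetical sample; "Semi-Govt" is an invented value for illustration
df = pd.DataFrame({
    "job": ["Own", "Govt", "Private", "Semi-Govt"],
    "Savings": [10000, 15000, 20000, 12000],
})

# Compile the pattern once and reuse it for every row
pattern = re.compile(r"Govt")

# search() finds the pattern anywhere in the string
search_hits = [bool(pattern.search(job)) for job in df["job"]]

# match() only succeeds when the pattern occurs at the start of the string
match_hits = [bool(pattern.match(job)) for job in df["job"]]

anywhere = df.loc[search_hits]
at_start = df.loc[match_hits]
print(anywhere)
print(at_start)
```

Here search() selects both the "Govt" and "Semi-Govt" rows, while match() selects only the "Govt" row.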