Google SERP and Website Scraping with Python
While preparing content briefs, one of the analyses we do is to browse the SERP for our target keywords and look at the pages ranking on the first page. The headings used in these blog posts are especially important to us. Of course, to reach these articles, we normally have to search for each keyword, open the results one by one, and examine their headings.
One day, my co-worker Duygu Garip and I were talking about how much time this process takes, and we started to think about how we could make it easier through automation. Duygu pointed out that if we could pull this data from the sites automatically, our work would be much faster and we would save a lot of time.
We can list what we need to do here in two steps:
1- Searching for the relevant keywords on Google
2- Going to the URLs in the SERP and pulling the H headings from those pages.
At the end of our research, we achieved this with Python. Let's see how to write this code.
1- Install
Let's install the required libraries first:
pip install google-api-python-client
pip install selenium
pip install webdriver-manager
pip install pandas
pip install openpyxl
2- Import Process
Now we can start writing our code by opening a new Python file.
To avoid confusion later, let's do all of the imports at the beginning. As you write the code, you will see where each imported library is used.
from googleapiclient.discovery import build
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
3- Pulling Data from Google
First we need to pull data from Google. For this we need two things: an API Key and a CSE (Custom Search Engine) ID.
Once we have them, we can search Google with the following function and return the results.
my_api_key = 'Your API Key'
my_cse_id = "Your CSE ID"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res

result = google_search("Apple", my_api_key, my_cse_id)
print(result)
The return value here is in JSON format, and you can extract whatever information you need from it.
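For example, here is a minimal sketch of how the ranking URLs and page titles could be read out of that response (assuming the result dictionary returned by google_search above):

# A minimal sketch: read the ranking URLs and titles out of the JSON response.
# The 'items' key may be missing when a query returns no results, so check for it first.
if 'items' in result:
    for item in result['items']:
        print(item['title'], item['link'])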
But we want to search for more than one keyword, and we also want to visit the URLs on the results page and pull their H headings. At this point, we need to make some changes and additions.
4- Scraping Heading Tags
In the article I wrote about internal link opportunities, I talked about how to scrape the hrefs on a site using Python. Here we will use a similar code structure.
First, we will pull the URLs from the JSON data returned by the code above. Then we will open these URLs with the Selenium library, scrape the H headings we want, and finally export them to an Excel file.
The final version of our code looks like this:
from googleapiclient.discovery import build
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

my_api_key = 'Your API Key'
my_cse_id = "Your CSE ID"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res

KEYWORDS = ["tea", "coffee", "cola"]
df = pd.DataFrame(columns=['Keywords', 'URLs', 'Headings', 'Contents'])

for keyword in KEYWORDS:
    try:
        result = google_search(keyword, my_api_key, my_cse_id)
        if 'items' in result:
            URLS = [item['link'] for item in result['items']]
            for URL in URLS:
                options = webdriver.ChromeOptions()
                options.add_argument('--headless')
                driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)
                driver.set_page_load_timeout(10)
                try:
                    driver.get(URL)
                    # Collect every heading level on the page, one row per heading
                    for level in range(1, 7):
                        elements = driver.find_elements(By.XPATH, f'//h{level}')
                        for header in elements:
                            df.loc[len(df.index)] = [keyword, URL, f"H{level}", header.text]
                except Exception:
                    # Pages that time out or fail to load are recorded as "No Data"
                    df.loc[len(df.index)] = [keyword, URL, "No Data", "No Data"]
                finally:
                    driver.quit()
    except Exception:
        print("No Data")

df.to_excel(r'/Users/gokaysevilmis/Downloads/Result.xlsx')
print("Done")
By exporting the heading tags into Excel, you can easily analyse the headings.
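For instance, here is a quick sketch of how the exported file could be analysed with pandas (assuming the same column names and file path used above):

import pandas as pd

# Read the exported file back and count how often each heading text appears
# across the ranking pages, broken down by heading level (H1-H6).
df = pd.read_excel(r'/Users/gokaysevilmis/Downloads/Result.xlsx')
summary = (
    df.groupby(['Headings', 'Contents'])
      .size()
      .reset_index(name='Count')
      .sort_values('Count', ascending=False)
)
print(summary.head(20))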
Thanks to Duygu Garip for her support in the making of this study.