BeautifulSoup – Error Handling

Last Updated : 17 Feb, 2023
While scraping data from websites, we all run into several kinds of errors: some are hard to understand and some are simple syntax mistakes. Here we will discuss the types of exceptions commonly faced while coding a scraping script.

Error During Fetching of Website

When fetching website content we need to be aware of the errors that can occur along the way. These may be an HTTPError, a URLError, an AttributeError, or an XML parser error. We will now discuss each one in turn.

HTTPError:

An HTTPError occurs when we request a page that is not present or not available on the server. When we pass a broken link to the request, the server responds with an error status such as 404 "Page Not Found", and calling raise_for_status() on the response raises requests.exceptions.HTTPError.

Example :

Python3

# importing modules
import requests
from requests.exceptions import HTTPError

# example URL; any reachable page works here
url = "https://www.geeksforgeeks.org/"

try:
    response = requests.get(url)
    # raises HTTPError if the status code is 4xx or 5xx
    response.raise_for_status()
except HTTPError as hp:
    print(hp)
else:
    print("it's worked")


Output:

it's worked

Because the URL points to an existing page, the request succeeds and no error occurs. Now we trigger an HTTPError by changing the link to a page that does not exist.
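What raise_for_status() does can also be seen without touching the network, by building a Response object by hand and giving it an error status. This is a test-only trick (the URL-less Response is an assumption for illustration), not something needed in real scraping code:

```python
import requests
from requests.exceptions import HTTPError

# build a bare Response and pretend the server answered 404
response = requests.models.Response()
response.status_code = 404

try:
    # 4xx status, so this raises HTTPError
    response.raise_for_status()
except HTTPError as hp:
    print(hp)
```

A 2xx status code would make raise_for_status() return without raising, which is why the earlier example printed "it's worked".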

Python3

# importing modules
import requests
from requests.exceptions import HTTPError

# a deliberately broken link (this page does not exist)
url = "https://www.geeksforgeeks.org/this-page-does-not-exist/"

try:
    response = requests.get(url)
    # raises HTTPError because the server returns 404
    response.raise_for_status()
except HTTPError as hp:
    print(hp)
else:
    print("it's worked")


Output:

This time raise_for_status() raises an HTTPError, and the printed message reports a 404 Client Error for the requested URL.

URLError:

A URLError occurs when the server itself cannot be found, for example when the domain name in the URL is wrong or cannot be resolved. It is raised by urllib (from urllib.error); note that the requests library wraps the same failure in its own requests.exceptions.ConnectionError instead, so the examples below use urllib.request.urlopen.

Example:

Python3

# importing modules
from urllib.request import urlopen
from urllib.error import URLError

# example URL; the domain resolves, so no URLError here
url = "https://www.geeksforgeeks.org/"

try:
    response = urlopen(url)
except URLError as ue:
    print("The Server Could Not be Found")
else:
    print("No Error")


Output:

No Error

Here the program executes correctly and prints "No Error". Now we change the URL to a domain that does not exist, to provoke the URLError:

Python3

# importing modules
from urllib.request import urlopen
from urllib.error import URLError

# a made-up domain that cannot be resolved
url = "https://www.this-domain-does-not-exist.invalid/"

try:
    response = urlopen(url)
except URLError as ue:
    print("The Server Could Not be Found")
else:
    print("No Error")


Output:

The Server Could Not be Found
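Since HTTPError is a subclass of URLError, a single try block can handle both cases, as long as the except HTTPError clause comes first. A minimal sketch with urllib (the helper name and URLs are just illustrations):

```python
# Handling HTTPError and URLError together with urllib.
# HTTPError subclasses URLError, so it must be caught first.
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(url):
    """Return a short label describing how fetching `url` went."""
    try:
        response = urlopen(url, timeout=10)
    except HTTPError as hp:   # the server replied, but with 4xx/5xx
        return "HTTPError: %d" % hp.code
    except URLError:          # the server could not be reached at all
        return "URLError: server not found"
    else:
        return "OK: %d" % response.status


# the .invalid TLD is reserved and never resolves,
# so this always takes the URLError branch
print(fetch_status("https://example.invalid/"))
```

If the clauses were swapped, the except URLError branch would swallow HTTP errors too, and the status code would be lost.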

AttributeError:

An AttributeError is raised when an invalid attribute reference is made or an attribute assignment fails. With BeautifulSoup this most often happens when we access a tag that is not present in the document: soup.sometag is shorthand for soup.find("sometag"), which returns None when the tag is missing, and accessing anything on that None raises an AttributeError.

Let's take an example to illustrate an AttributeError while web scraping with BeautifulSoup:

Python3

# importing modules
import requests
import bs4

# example URL; any page works, since the tag below never exists
url = "https://www.geeksforgeeks.org/"

# getting response from server
response = requests.get(url)

# parsing the html
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# soup.NoneExistingTag is None (no such tag in the page),
# so accessing .SomeTag on it raises AttributeError
print(soup.NoneExistingTag.SomeTag)


Output:

The traceback ends with:

AttributeError: 'NoneType' object has no attribute 'SomeTag'
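To avoid the AttributeError, check the result of find() for None before touching its attributes. A short sketch parsing an inline HTML snippet, so no network access is needed:

```python
# Guarding against AttributeError: find() returns None when
# the tag is missing, so test for None before chaining.
import bs4

html = "<html><body><p>Hello</p></body></html>"
soup = bs4.BeautifulSoup(html, "html.parser")

tag = soup.find("div")          # no <div> here, so this is None
if tag is not None:
    print(tag.text)
else:
    print("tag not found")      # this branch runs
```

The same guard works for chained lookups: check each intermediate result before descending further.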

XML Parser Error:

We have all run into parser errors while writing web scraping scripts. BeautifulSoup makes it easy to get past them: the document is handed to the constructor together with the name of the parser to use.

When parsing XML content we generally pass 'xml' or 'lxml-xml' as the second parameter of the BeautifulSoup constructor, written after the document itself. Both names refer to lxml's XML parser, so the lxml package must be installed; if the requested parser is not available, BeautifulSoup raises a FeatureNotFound error.

Syntax:

soup = bs4.BeautifulSoup(response, 'xml')

or

soup = bs4.BeautifulSoup(response, 'lxml-xml')

A related point of confusion is what happens when find() or find_all() is asked for an element that is missing from the document: no error is raised at all. Instead, find_all() returns an empty list [] and find() returns None.

Python3

import requests
import bs4

# example URL; the class below is deliberately absent
url = "https://www.geeksforgeeks.org/"

response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'xml')

# no <div> with this class exists, so find() returns None
print(soup.find('div', class_='this-class-is-not-in-the-document'))


Output:

None
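The difference between find() and find_all() on a missing element can be shown with a small self-contained sketch. It parses an inline document with the built-in html.parser, so it runs without lxml installed or any network access:

```python
# find() vs find_all() when no element matches:
# find() yields None, find_all() yields an empty list.
import bs4

doc = "<data><item>1</item><item>2</item></data>"
soup = bs4.BeautifulSoup(doc, "html.parser")

print(soup.find("missing"))        # None
print(soup.find_all("missing"))    # []
print(len(soup.find_all("item")))  # 2
```

Because of this, code that loops over find_all() results simply does nothing when no element matches, while code that chains off find() must guard against None as shown in the AttributeError section.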

