BeautifulSoup – Error Handling
Last Updated: 17 Feb, 2023
Sometimes, while scraping data from websites, we run into several types of errors: some are hard to make sense of and some are simple syntax mistakes. Here we will discuss the types of exceptions that come up while writing a scraping script.
Error During Fetching of Website
When we fetch the content of a website, we need to be aware of the errors that can occur during the fetch. These may be an HTTPError, a URLError, an AttributeError, or an XML parser error. We will discuss each of them one by one.
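Before going through each error individually, here is a minimal sketch of guarding a single fetch against the first two of them; the URL is only a placeholder, and the exception classes are the ones the requests library raises for bad status codes and unreachable servers:
Python3
import requests
import bs4
from requests.exceptions import HTTPError, ConnectionError

url = "https://www.geeksforgeeks.org/"  # placeholder URL for illustration

try:
    response = requests.get(url)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
except HTTPError as hp:
    print("HTTP error:", hp)
except ConnectionError:
    print("The server could not be reached")
else:
    print("Fetched and parsed successfully")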
HTTPError:
HTTPError occurs when we perform a web scraping operation on a page that is not present or not available on the server. If we provide a wrong link in the request and run the program, the server responds with an error such as "Page Not Found", and raise_for_status() turns that response into an HTTPError.
Example:
Python3
import requests
from requests.exceptions import HTTPError  # raise_for_status() raises this class

url = "https://www.geeksforgeeks.org/"  # example URL; any reachable page works

try:
    response = requests.get(url)
    # raises HTTPError if the server answered with a 4xx/5xx status code
    response.raise_for_status()
except HTTPError as hp:
    print(hp)
else:
    print("it's worked")
Output:
The link we provided works correctly, so no error occurs. Now let's trigger an HTTPError by changing the link to a page that does not exist.
Python3
import requests
from requests.exceptions import HTTPError

# example of a link to a page that does not exist; the server answers with 404
url = "https://www.geeksforgeeks.org/this-page-does-not-exist/"

try:
    response = requests.get(url)
    # the 404 response makes raise_for_status() raise HTTPError
    response.raise_for_status()
except HTTPError as hp:
    print(hp)
else:
    print("it's worked")
Output:
URLError:
A URLError occurs when the URL we request points to a server that cannot be found, typically because the domain name is wrong or unreachable. It is raised by urllib (for example by urllib.request.urlopen), and its message indicates that the server could not be found.
Example:
Python3
from urllib.request import urlopen
from urllib.error import URLError

url = "https://example.com/"  # example URL; any reachable site works

try:
    # urlopen() raises URLError when the server cannot be found
    response = urlopen(url)
except URLError:
    print("The Server Could Not be Found")
else:
    print("No Error")
Output:
Here the program executes correctly and prints "No Error". Now we change the URL to a domain that does not exist in order to show the URLError:
Python3
from urllib.request import urlopen
from urllib.error import URLError

# the reserved '.invalid' top-level domain never resolves, so the server cannot be found
url = "https://this-domain-does-not-exist.invalid/"

try:
    response = urlopen(url)
except URLError:
    print("The Server Could Not be Found")
else:
    print("No Error")
Output:
AttributeError:
An AttributeError is raised when an invalid attribute reference is made or an attribute assignment fails. With BeautifulSoup this usually happens when we try to access a tag that is not present on the page: the lookup returns None, and accessing anything on that None raises an AttributeError.
Here is an example that demonstrates the AttributeError while web scraping with BeautifulSoup:
Python3
import requests
import bs4

url = "https://www.geeksforgeeks.org/"  # example URL; any HTML page works

response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# soup.NoneExistingTag is None (no such tag), so .SomeTag raises AttributeError
print(soup.NoneExistingTag.SomeTag)
Output:
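The code above fails with an AttributeError, because None has no attribute called SomeTag. Since this article is about handling errors, the same try/except pattern used in the earlier sections also works here; this is only a sketch that reuses a placeholder URL and the made-up tag names from the example:
Python3
import requests
import bs4

url = "https://www.geeksforgeeks.org/"  # placeholder URL for illustration

response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')

try:
    # the missing tag makes the first lookup return None,
    # and the second attribute access raises AttributeError
    print(soup.NoneExistingTag.SomeTag)
except AttributeError:
    print("The tag was not found on the page")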
XML Parser Error:
Parser errors also come up while writing web scraping scripts. BeautifulSoup makes parsing a document into a navigable tree very easy, so if we get stuck on a parser error we can usually get past it simply by telling BeautifulSoup which parser to use.
When the content we are parsing is XML (or the default HTML parser has trouble with it), we generally pass 'xml' or 'lxml-xml' as the second parameter of the BeautifulSoup constructor, written right after the document itself.
Syntax:
soup = bs4.BeautifulSoup(response, 'xml')
or
soup = bs4.BeautifulSoup(response, 'lxml-xml')
A related situation is when the element we pass to find() or find_all() is missing from the document (or we do not pass one at all): instead of raising an error, find() returns None and find_all() returns an empty list [].
Python3
import requests
import bs4

url = "https://www.geeksforgeeks.org/"  # example URL; any page works
response = requests.get(url)

# the 'xml' parser requires the lxml package to be installed
soup = bs4.BeautifulSoup(response.text, 'xml')

# find() returns None because no <div> with this class exists in the document
print(soup.find('div', class_='that not present in html content'))
Output:
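The find() call above prints None because no matching <div> exists. As noted in the paragraph before the example, find_all() behaves differently: it returns an empty list [] when nothing matches. A short sketch, again using a placeholder URL and a made-up class name:
Python3
import requests
import bs4

url = "https://www.geeksforgeeks.org/"  # placeholder URL for illustration

response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# find() -> None and find_all() -> [] when nothing matches
missing = soup.find_all('div', class_='class-that-is-not-present')
print(missing)
if not missing:
    print("No matching <div> elements were found")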