urllib is a Python 3 package that lets you access and interact with websites through their URLs (Uniform Resource Locators). It provides several modules for working with URLs; the ones covered here are urllib.request, urllib.parse, and urllib.error.
The urlopen function, from the urllib.request module, opens the specified URL. This is shown in the code snippet below:
from urllib.request import urlopen
myURL = urlopen("http://www.google.com/")
print(myURL.read())
The URL is opened and its HTML code is returned.
Once the URL has been opened, the read() function returns the entire HTML code of the webpage as a bytes object.
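Because read() returns bytes, the result usually needs to be decoded before it can be handled as text. The following is a minimal sketch of one way to do that; the helper name fetch_text and the utf-8 fallback are assumptions made for illustration, not part of urllib. The example call uses a data: URL so that it runs without a network connection.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Open url and return its body as a str (hypothetical helper)."""
    # The with block closes the response once the body has been read.
    with urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Use the charset declared in the Content-Type header, if any;
        # falling back to utf-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset, errors="replace")


# A data: URL is handled locally by urlopen, so no network access is needed.
print(fetch_text("data:text/plain;charset=utf-8,Hello%20urllib"))  # Hello urllib
```

The same helper can be called with an http:// or https:// URL, in which case the request goes out over the network.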
The code snippet below shows the usage of urlparse, from the urllib.parse module:
from urllib.parse import urlparse
parsedUrl = urlparse('https://www.educative.io/track/python-for-programmers')
print(parsedUrl)
The URL is split into its components, such as the protocol scheme used (scheme), the network location (netloc), and the path to the webpage (path).
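Each component of the parsed result can also be read as a named attribute. The sketch below illustrates this; the query string (?ref=home) and fragment (#intro) appended to the URL are made up for the example so that every field has a value.

```python
from urllib.parse import urlparse, parse_qs

# The query string and fragment are hypothetical, added for illustration.
url = "https://www.educative.io/track/python-for-programmers?ref=home#intro"
parts = urlparse(url)

print(parts.scheme)    # https
print(parts.netloc)    # www.educative.io
print(parts.path)      # /track/python-for-programmers
print(parts.query)     # ref=home
print(parts.fragment)  # intro

# parse_qs turns the query string into a dict of lists.
print(parse_qs(parts.query))  # {'ref': ['home']}

# geturl() reassembles the components into a URL string.
print(parts.geturl() == url)  # True
```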
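The urllib.error module is used to catch exceptions raised by urllib.request. These exceptions, or errors, are classified as follows: URLError is raised when the URL cannot be opened at all, for example because the server cannot be reached, and HTTPError, a subclass of URLError, is raised when the server responds with an HTTP error status code such as 404.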
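from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    myURL = urlopen("http://ww.educative.xyz/")
except HTTPError as e:
    # The server responded, but with an error status code
    print('HTTP Error code: ', e.code)
except URLError as e:
    # The request failed before any HTTP response was received
    print('URL Error: ', e.reason)
else:
    print('No Error.')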
A request to open http://ww.educative.xyz/ is caught by the URLError handler because the URL is invalid. Experiment with the exceptions by opening different URLs.
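One way to experiment is to wrap the try/except pattern in a small function and call it with several URLs. The sketch below assumes a hypothetical helper named check_url and a 5-second timeout, neither of which comes from urllib itself. Note that HTTPError has to be caught before URLError, since it is a subclass of URLError. The example call uses a made-up URL scheme, which triggers a URLError without any network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, timeout=5):
    """Return a short status string for url (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout):
            return "No Error."
    except HTTPError as e:
        # Caught first: HTTPError is a subclass of URLError.
        return f"HTTP Error code: {e.code}"
    except URLError as e:
        return f"URL Error: {e.reason}"


# An unknown scheme fails locally, before any connection is attempted.
print(check_url("nosuchscheme://example"))
# URL Error: unknown url type: nosuchscheme
```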