How to find all broken links using Selenium webdriver in Python
Selenium overview
Selenium is an open-source tool for automating web browsers. We'll learn how to find the broken links on a web page using Selenium in Python.
We'll follow the steps mentioned below to find the broken links:
- Find all links present on the web page.
- Send an HTTP request to each link and get its status code.
- Decide, based on the status code, whether the link is broken.
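The last step leaves open which status codes count as broken. As a minimal sketch, assuming a hypothetical helper named is_broken (not part of Selenium or requests), one common convention treats client errors (4xx) and server errors (5xx) as broken and accepts everything below 400:

```python
def is_broken(status_code: int) -> bool:
    """Return True for 4xx and 5xx responses, False for anything below 400.

    The 400 threshold is an assumption; tighten or loosen it as needed.
    """
    return status_code >= 400


# Usage: classify a few sample status codes
for code in (200, 301, 404, 500):
    print(code, "broken" if is_broken(code) else "ok")
```

A stricter rule, such as treating anything other than 200 as broken, will also flag redirects, so the right threshold depends on what we want to report.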
Example
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service
    import requests

    # specify where the Chrome driver is located on your PC
    PATH = r"C:\Users\educative\Documents\chromedriver\chromedriver.exe"

    # get an instance of the web driver (Selenium 4 passes the driver path through Service)
    driver = webdriver.Chrome(service=Service(executable_path=PATH))

    # provide the website URL here
    driver.get("http://demo.guru99.com/test/newtours/")

    # get all links (anchor elements that have an href attribute)
    all_links = driver.find_elements(By.CSS_SELECTOR, "a[href]")

    # check each link to see whether it is broken
    for link in all_links:
        # extract the URL from the href attribute
        url = link.get_attribute('href')

        # send a HEAD request to the URL and get the result
        result = requests.head(url)

        # if the status code is not 200, print the URL (customize this condition as needed)
        if result.status_code != 200:
            print(url, result.status_code)

    # close the browser when done
    driver.quit()
Explanation
- Lines 1–4: We import the required packages.
- Line 7: We provide the path where we placed the driver of the web browser. For Chrome, it is chromedriver.exe in the Windows environment.
- Line 10: We get the instance of the webdriver.
- Line 13: We provide the URL to the driver.get() method to open it.
- Line 16: We use the find_elements() method to get all links present on the current web page.
- Line 19: We use the for-in loop to loop through each link returned in the above step.
- Line 21: We extract the URL from the element.
- Line 24: We send an HTTP request to the URL and store the result.
- Lines 27–28: We check whether the status code is not equal to 200. If so, we consider the link broken and print it. We can also customize this condition according to our needs.
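The loop above sends a request for every href it extracts. In practice, some anchors carry no usable href, some use schemes such as mailto: or javascript: that an HTTP client can't request, and the same URL often appears several times on one page. As a minimal sketch, assuming a hypothetical helper named checkable_urls (our own name, not a Selenium or requests API), we could filter and deduplicate the hrefs before checking them:

```python
from urllib.parse import urldefrag, urlsplit


def checkable_urls(hrefs):
    """Return the unique http(s) URLs from hrefs, in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # the anchor had no href, or it was empty
        # drop the #fragment so page.html and page.html#top count as one URL
        url, _fragment = urldefrag(href.strip())
        if urlsplit(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar schemes
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Usage with sample hrefs (hypothetical values for illustration)
sample = [None, "http://example.com/a", "http://example.com/a#top", "mailto:me@example.com"]
print(checkable_urls(sample))
```

With Selenium, the input could be built as checkable_urls(link.get_attribute('href') for link in all_links), and the request loop would then iterate over the returned list instead of the raw elements.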