Trusted answers to developer questions
Related Tags

selenium
python

How to find all broken links using Selenium webdriver in Python

Gutha Vamsi Krishna

Selenium overview

Selenium is an open-source tool for automating web browsers. In this answer, we'll learn how to find the broken links on a web page using Selenium in Python.

We'll follow the steps mentioned below to find the broken links:

  • Find all links present on the web page.
  • Send an HTTP request to each link and get its status code.
  • Decide from the status code whether the link is broken.
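
The third step leaves open which status codes count as broken. A minimal sketch of one possible rule is shown here, assuming that 4xx and 5xx responses are broken and that 3xx responses are redirects rather than failures. The function name classify_status and the labels it returns are hypothetical, chosen only for illustration.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse label (illustrative rule only)."""
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        # requests.head() does not follow redirects by default,
        # so 301/302 responses can show up here.
        return "redirect"
    if 400 <= status_code < 600:
        return "broken"
    return "unknown"


for code in (200, 301, 404, 500):
    print(code, classify_status(code))
```

A stricter rule, such as treating every code other than 200 as broken, is also reasonable. The right choice depends on whether redirects should be reported.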

Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
import requests

# Specify where the Chrome driver is located on your PC
PATH = r"C:\Users\educative\Documents\chromedriver\chromedriver.exe"

# Get an instance of the web driver (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(PATH))

# Provide the website URL here
driver.get("http://demo.guru99.com/test/newtours/")

# Get all links that have an href attribute
all_links = driver.find_elements(By.CSS_SELECTOR, "a[href]")

# Check whether each link is broken
for link in all_links:
    # Extract the URL from the href attribute
    url = link.get_attribute('href')

    # Send a request to the URL and get the result (the timeout avoids waiting forever)
    result = requests.head(url, timeout=10)

    # If the status code is not 200, print the URL (customize this condition as needed)
    if result.status_code != 200:
        print(url, result.status_code)

# Close the browser when done
driver.quit()
Find all broken links using Selenium WebDriver in Python
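
Not every href value can be checked with an HTTP request. An anchor may have an empty or missing href, and links such as mailto: or javascript: are not HTTP URLs, so passing them to requests raises an error. The following is a minimal sketch of a filter that could be applied to the extracted values before sending requests. The helper name checkable_urls and the sample list are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def checkable_urls(hrefs):
    """Return unique http(s) URLs from a list of href values, preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        # Skip anchors without a usable href (None or empty string).
        if not href:
            continue
        # Keep only URLs that an HTTP client can actually request.
        if urlsplit(href).scheme not in ("http", "https"):
            continue
        # Skip duplicates so each URL is requested once.
        if href in seen:
            continue
        seen.add(href)
        result.append(href)
    return result


# Sample values of the kind get_attribute('href') might return.
hrefs = [
    "http://demo.guru99.com/test/newtours/",
    None,
    "mailto:someone@example.com",
    "javascript:void(0)",
    "http://demo.guru99.com/test/newtours/",
    "https://example.com/page",
]
print(checkable_urls(hrefs))
# → ['http://demo.guru99.com/test/newtours/', 'https://example.com/page']
```

Removing duplicates also reduces the number of requests when the same URL appears in several places on the page, for example in the header and the footer.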

Explanation

  • Lines 1–4: We import the required packages.
  • Line 7: We provide the path to the web browser's driver. For Chrome on Windows, this is chromedriver.exe.
  • Line 10: We get the instance of the webdriver.
  • Line 13: We provide the URL to the driver.get() method to open it.
  • Line 16: We use the find_elements() method to get all links present on the current web page.
  • Line 19: We use a for loop to iterate over each link returned in the previous step.
  • Line 21: We extract the URL from the element.
  • Line 24: We send an HTTP request to the URL and store the result.
  • Lines 27–28: If the status code is not equal to 200, we consider the link broken and print it. We can customize this condition according to our needs.
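
The loop described above stops with an exception if a single request fails, for example on a timeout or an unreachable host, and some servers reject HEAD requests altogether. The following is a rough sketch of one way to make the check more tolerant, not a definitive implementation. It assumes that codes of 400 and above mean broken, that a 405 or 501 reply to HEAD warrants a retry with GET, and that a failed request counts as a broken link. The names find_broken_links, Fetcher, and fake_fetch are hypothetical, and the fetcher is passed in as a parameter so that the sketch runs without network access.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical signature: takes an HTTP method name and a URL, returns the status code.
Fetcher = Callable[[str, str], int]


def find_broken_links(urls: Iterable[str], fetch: Fetcher) -> List[Tuple[str, Optional[int]]]:
    """Return (url, status) pairs for links that look broken.

    The status is None when the request itself failed.
    """
    broken = []
    for url in urls:
        try:
            status = fetch("HEAD", url)
            # Some servers do not support HEAD; retry those with GET.
            if status in (405, 501):
                status = fetch("GET", url)
        except Exception:  # with requests, requests.RequestException is the narrower choice
            broken.append((url, None))
            continue
        if status >= 400:
            broken.append((url, status))
    return broken


# A fake fetcher with canned responses, standing in for real HTTP calls.
def fake_fetch(method: str, url: str) -> int:
    table = {
        ("HEAD", "http://example.com/ok"): 200,
        ("HEAD", "http://example.com/missing"): 404,
        ("HEAD", "http://example.com/no-head"): 405,
        ("GET", "http://example.com/no-head"): 200,
    }
    if (method, url) not in table:
        raise ConnectionError("unreachable")
    return table[(method, url)]


urls = [
    "http://example.com/ok",
    "http://example.com/missing",
    "http://example.com/no-head",
    "http://example.com/down",
]
print(find_broken_links(urls, fake_fetch))
# → [('http://example.com/missing', 404), ('http://example.com/down', None)]
```

With the requests library, the fetcher could be written as lambda method, url: requests.request(method, url, timeout=10, allow_redirects=True).status_code, which follows redirects and reports the status of the final response.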
