How to download/upload files using Selenium WebDriver via Chrome

Overview

In Selenium WebDriver, downloading and uploading files is browser-specific. It also depends on the structure and design of the website on which the task is performed. Put simply, the task is easier in Chrome than in Firefox, as the latter requires extra setup and parameters. As for website-related issues, we'll tackle those by performing the same task on the same website in several ways.

Note: Confirm that this article is still up to date. Browsers and Selenium both change their behavior from release to release. The latest versions of Chrome and Selenium were used at the time of publishing.

Download

The script for downloading a file from a website is simple. It consists mainly of boilerplate code to get Selenium and Chrome running. The download itself takes only one or two lines.

Boilerplate

Here's the boilerplate code:

Note: To confirm that the script below works, check that Chrome opens Python's Downloads page. Any brief warning printed in the terminal can be ignored.

from selenium import webdriver
import time
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Setting the current directory as the download directory.
# You can change it to an absolute path, for example, "D:\Selenium\Test".
prefs = {"download.default_directory": "."}

options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get('https://www.python.org/downloads/')

time.sleep(60)
driver.close()
Boilerplate for use with Chrome

Let's go through the boilerplate code above:

  • Lines 1 and 2: We import webdriver, which performs browser actions such as clicking and navigation. The time library keeps the script from terminating (and the browser from closing) right away. In line 20, we pause for a minute before the script finishes running.
  • Line 3: We import ChromeDriverManager from WebDriver Manager, which supplies the driver binary required for Chrome. Make sure WebDriver Manager is installed (running pip install webdriver-manager will install it).
  • Lines 4 and 5: We import Options and Service. Options is used to set preferences. As we can see later in the code, it sets the location where downloaded files are saved. Service is used to initialize the browser.
  • Lines 8 and 9: We require these arguments (--no-sandbox and --disable-dev-shm-usage) for Selenium to work with our nifty widgets.
  • Lines 16 and 18: We start the browser and open Python's Downloads page.
  • Line 21: We close the browser.
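
Chrome generally expects download.default_directory to be an absolute path, so it can help to resolve the directory before building the preferences. The following is a minimal sketch, not part of the original script: the helper name build_download_prefs is hypothetical, and download.prompt_for_download is an extra Chrome preference that the boilerplate above does not set.

```python
import os


def build_download_prefs(directory="."):
    """Return a Chrome "prefs" dict that saves downloads into `directory`.

    Hypothetical helper: the name and the defaults are illustrative.
    """
    # Resolve relative paths such as "." against the current working directory.
    absolute_dir = os.path.abspath(directory)
    # Create the directory if it does not exist yet.
    os.makedirs(absolute_dir, exist_ok=True)
    return {
        "download.default_directory": absolute_dir,
        # Assumption: skip the "Save as" dialog so the download starts unattended.
        "download.prompt_for_download": False,
    }


prefs = build_download_prefs(".")
print(prefs["download.default_directory"])
```

The returned dictionary can be passed to options.add_experimental_option("prefs", prefs) exactly as in the boilerplate.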

Now, let's see how we can download (the latest version of) Python from the Downloads page.

The driver.find_element() function

The function responsible for grabbing elements from web pages is called find_element(). This is used with the driver.

We can use the following eight locators to find the download link, hyperlinked image, or tag to download our file from. Apart from XPATH, LINK_TEXT, and PARTIAL_LINK_TEXT, most of these refer to a single attribute of an HTML tag:

  • find_element(By.ID, "this_id")
  • find_element(By.NAME, "that_name")
  • find_element(By.XPATH, "this_xpath")
  • find_element(By.LINK_TEXT, "that_text")
  • find_element(By.PARTIAL_LINK_TEXT, "part_of_that_text")
  • find_element(By.TAG_NAME, "this_tag")
  • find_element(By.CLASS_NAME, "that_class")
  • find_element(By.CSS_SELECTOR, "those_css")

We'll only use three of them in this answer.

Note: We also need to have from selenium.webdriver.common.by import By in our script for the function to work.

The LINK_TEXT locator

The first locator we'll use is LINK_TEXT. This one is fairly simple: it matches an <a> tag (a hyperlink) by its exact visible text. The text can sit directly inside the <a> tag or inside tags nested within it, such as the following:

  • <p>
  • <span>
  • Any heading tags like <h1>

When we inspect the download button in Chrome, it will show us the following code:

Notice the text in the highlighted HTML tag.
  • Line 1: From the image above, we can see that we need to search for the text Download Python 3.10.5. Check the code example below to see this locator in action.

Note: Python will be updated after this Answer is published, so the version number shown on the site will change and the code must change with it. If no file downloads after running the code below, update the Python version number in line 1.

download_python = driver.find_element(By.LINK_TEXT, "Download Python 3.10.5")

# Click the element that holds the text you've searched for.
download_python.click()

time.sleep(60)
driver.close()
Using LINK_TEXT to download a file
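
The fixed time.sleep(60) keeps the browser open whether or not the download has finished. As an alternative, the script could poll the download directory. The sketch below rests on one assumption: Chrome keeps an in-progress download under a temporary name ending in .crdownload and renames it when the download completes. The helper name wait_for_download and its parameters are hypothetical.

```python
import os
import time


def wait_for_download(directory, existing, timeout=60.0, poll_interval=0.5):
    """Return the names of files that appeared in `directory` after `existing`.

    Hypothetical helper. Assumes unfinished Chrome downloads end in
    ".crdownload"; raises TimeoutError if nothing finishes in time.
    """
    deadline = time.monotonic() + timeout
    known = set(existing)
    while True:
        # Only look at files that were not there before the download started.
        new_names = set(os.listdir(directory)) - known
        partial = {name for name in new_names if name.endswith(".crdownload")}
        finished = new_names - partial
        if finished and not partial:
            return sorted(finished)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {directory!r}")
        time.sleep(poll_interval)


# Usage sketch with the names from the script above:
# existing = os.listdir(".")
# download_python.click()
# print(wait_for_download(".", existing))
```

Taking the directory listing before the click matters because the download directory may already contain other files.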

The CSS_SELECTOR locator

  • Line 1: We'll use the CSS_SELECTOR locator. The link to the file we want to download is in the href attribute of the download button. We extract that link by telling the function to look for the <a> tag whose class attribute holds the value button, which is written as a.button. Check the code block below to see how this locator works.

Note: This locator will return the first element that matches the specified criteria.

download_python = driver.find_element(By.CSS_SELECTOR, "a.button").get_attribute("href")

# We tell Chrome to open the link we've extracted
# for the download file.
driver.get(download_python)

time.sleep(60)
driver.close()
Using CSS_SELECTOR to download a file
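
Once the href has been extracted, it can be useful to know which file name to expect in the download directory. The sketch below derives it from the last path segment of the URL. The helper name filename_from_url is hypothetical, the example URL is illustrative only, and a server may override the name through a Content-Disposition header, so the result is best treated as a guess.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the last path segment of `url`, ignoring query and fragment."""
    path = urlparse(url).path
    # Decode percent-escapes such as %20 in the file name.
    return unquote(posixpath.basename(path))


# Illustrative URL only; the real href depends on the current Python release.
example_href = "https://www.python.org/ftp/python/3.10.5/python-3.10.5-amd64.exe"
print(filename_from_url(example_href))  # python-3.10.5-amd64.exe
```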

The XPATH locator

  • Line 1: We can see from the image above that our required element is in a specific sequence of HTML tags. We can simply specify a path to the tag we want and extract the href link.

We can start our path from any element as long as we don't need to go up the tree. We'll instruct our function to look at the <div> that has class='download-for-current-os'. Therefore, our path starts with .//div[@class='download-for-current-os'].

We can then list the HTML tags to pass through to reach our desired element: .//div[@class='download-for-current-os']/div/p/a. At any level, we can narrow a tag down by any of its attributes (as we did with the first <div>). Also, if a level contains multiple elements of the same tag, we can select one by its index, which starts at 1, such as a[2] for the second <a>. Take a look below to see how this works.

Note: The XPATH locator should be used as a last resort or if the webpage is confirmed to not be dynamic.

download_python = driver.find_element(By.XPATH, ".//div[@class='download-for-current-os']/div/p/a").get_attribute("href")
driver.get(download_python)

time.sleep(60)
driver.close()
Using XPATH to download a file
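
The path and index rules described above can be tried without a browser. The sketch below uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath but handles this expression. The markup is a hypothetical, simplified stand-in for the real page, which is larger and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not copied from the real Downloads page.
PAGE = """
<body>
  <div class="download-for-current-os">
    <div>
      <p>
        <a href="/ftp/python-installer.exe">Download Python</a>
        <a href="/downloads/release-notes/">Release notes</a>
      </p>
    </div>
  </div>
</body>
"""

root = ET.fromstring(PAGE.strip())

# Without an index, find() returns the first matching <a>.
first_link = root.find(".//div[@class='download-for-current-os']/div/p/a")
# XPath indices start at 1, so a[2] is the second <a> at that level.
second_link = root.find(".//div[@class='download-for-current-os']/div/p/a[2]")

print(first_link.get("href"))
print(second_link.get("href"))
```

The same expression string is what gets passed to find_element(By.XPATH, ...) in the script above.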

Upload

We can upload a file to a webpage using the find_element() function. Again, there are many ways to do so. We could visit a website, have the script click the "Upload File" or "Add File" button, and then select the file we want to upload. Unfortunately, the file picker that opens is an operating system dialog that Selenium cannot control, so this would require third-party tools. We'll try to upload without these tools.

Most websites use hidden <input> tags for file uploads. The goal is to find those tags and give them the path of the files on our local machine. We'll see how to do so using the following site.

Note: Educative does not endorse the website used in this example.

In this case, the <input> tag does not have the hidden attribute, but that makes no difference.

In the image above, inspecting the blue 'Add File' button will highlight the <div> or the <label>, or (if you're lucky) even the <input> tag.

Now there are many ways to go about this. We can use CSS_SELECTOR to pinpoint the tags that contain the required <input> tag and then select the tag itself, using the following: .uploader-btn-wrap input[type='file']. There is also a far easier way if there is only one <input> tag or if the <input> tag we want to upload to is the very first one in the page structure (as seen in the inspect window). We can just use XPATH for this.

Let's look at the code and comments below to understand this:

from selenium import webdriver
import time
from selenium.webdriver.common.by import By

# Used to introduce a waiting mechanism.
from selenium.webdriver.support.ui import WebDriverWait

# Library of useful functions.
from selenium.webdriver.support import expected_conditions as EC

from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
prefs = {"download.default_directory": "."}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# The service we would like to upload our file to.
driver.get('https://www.filemail.com/share/upload-file')

# This is not required and should be removed from your own script.
# This is here so that there is a gap between the time the webpage
# loads and the file is uploaded as, otherwise, the action is instantaneous.
time.sleep(15)

# The 'wait' object will wait up to 30 seconds for a condition
# to be met each time it is invoked later.
wait = WebDriverWait(driver, 30)

# Tell the browser to wait until our required element, the
# 'Add File' and the <input> tag it contains, are loaded.
# Then you supply the path to your file to the <input> tag. 
wait.until(EC.presence_of_element_located((By.XPATH, ".//input[@type='file']"))).send_keys("/usercode/Educative_Logo.png")

time.sleep(60)
driver.close()
Using XPATH to upload a file
  • Line 32: The timeout passed to WebDriverWait(driver, 30) will vary depending on the site. If the element does not appear within that time, the script throws a TimeoutException. Also, some websites may not allow us to download or upload files before we accept cookies. In that case, use the following lines before the download/upload part:
got_cookie = driver.find_element(By.ID, 'accept-cookie-notification')
got_cookie.click()
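
The two lines above raise an error when the page shows no cookie banner. A more tolerant variant, sketched below, relies on find_elements(), which returns an empty list when nothing matches. The helper name click_if_present is hypothetical, and the element ID remains specific to the site being automated.

```python
def click_if_present(driver, by, value):
    """Click the first element matching (by, value); return False if none exists.

    Hypothetical helper. `driver` is any object that offers Selenium's
    find_elements(by, value) method.
    """
    matches = driver.find_elements(by, value)
    if not matches:
        # No banner on this page, so there is nothing to accept.
        return False
    matches[0].click()
    return True


# Usage sketch with the names from the script above:
# click_if_present(driver, By.ID, "accept-cookie-notification")
```

Because the helper only reports whether it clicked something, the same script can run against pages with and without a banner.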

Copyright ©2024 Educative, Inc. All rights reserved