In Selenium WebDriver, downloading and uploading files is browser-specific. It also depends on the structure and design of the website the task is performed on. Put simply, the task is easier in Chrome than in Firefox, as the latter requires extra setup and parameters to do the same thing. As for website-related issues, we'll tackle those by performing the same task on the same website in several different ways.
Note: Make sure that this article is still up to date. Browsers, like Selenium, change their behavior with each release. The latest versions of Chrome and Selenium were used at the time of publishing.
The script for downloading a file from a website is simple. It mainly consists of boilerplate code to get Selenium and Chrome running; the download itself takes only one or two lines.
Here's the boilerplate code:
Note: When you run the script below, Chrome should open Python's download page. You can ignore any brief warning printed in the terminal.

```python
import os
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Use the current working directory as the download directory.
# Chrome expects an absolute path here, so resolve "." first.
# You can change it to, for example, r"D:\Selenium\Test".
prefs = {"download.default_directory": os.path.abspath(".")}
options.add_experimental_option("prefs", prefs)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('https://www.python.org/downloads/')

# Keep the browser open for a minute before closing it.
time.sleep(60)
driver.close()
```
Let's go through the boilerplate code above:
- We import webdriver as it is useful in performing browser-specific actions such as clicking, navigation, and so on.
- The time library keeps the script from terminating (and the browser from closing) too early. Near the end of the script, time.sleep(60) asks for a one-minute gap before the script finishes running.
- ChromeDriverManager from webdriver_manager downloads a ChromeDriver that matches the installed Chrome, so we don't have to manage the driver binary ourselves (pip install webdriver-manager will install it).
- We also import Options and Service. Options is used to set preferences; as we can see later in the code, it sets the location where you want to download your files. Service is used to start the ChromeDriver process that controls the browser.

Now, let's see how we can download the latest version of Python from the Downloads page.
The driver.find_element() function
The function responsible for grabbing elements from web pages is called find_element(). It is called on the driver object, as in driver.find_element(...).
We can use the following eight locators to find the download link, hyperlinked image, or tag that our file is downloaded from. Apart from XPATH, CSS_SELECTOR, LINK_TEXT, and PARTIAL_LINK_TEXT, each of these refers to a single attribute or to the tag name of an HTML element:

```python
find_element(By.ID, "this_id")
find_element(By.NAME, "that_name")
find_element(By.XPATH, "this_xpath")
find_element(By.LINK_TEXT, "that_text")
find_element(By.PARTIAL_LINK_TEXT, "part_of_that_text")
find_element(By.TAG_NAME, "this_tag")
find_element(By.CLASS_NAME, "that_class")
find_element(By.CSS_SELECTOR, "those_css")
```
We'll only use three of them in this answer.
Note: We also need the line from selenium.webdriver.common.by import By in our script for these locators to work.
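The LINK_TEXT locator
The first locator we'll use is LINK_TEXT. This one is fairly simple. We can use it when the element we download our file from is an <a> (link) tag and we know its exact visible text. That text may sit in tags nested inside the link, such as a <span>, but text in a <p>, <span>, or <h1> that is not inside an <a> tag won't be found this way.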
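To make the exact-match rule concrete, here is a small, browser-free sketch that mimics how LINK_TEXT and PARTIAL_LINK_TEXT pick a link. It is only a rough model built on Python's html.parser, not Selenium's implementation; the helper names (LinkCollector, find_link), the page markup, and the href are all hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (visible text, href) pairs for every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []        # (text, href) tuples in document order
        self._in_link = False  # are we currently inside an <a> tag?
        self._href = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Text counts toward the link only if it sits inside an <a>,
        # including text in nested tags such as <span>.
        if self._in_link:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse runs of whitespace, roughly like rendered text.
            text = " ".join("".join(self._chunks).split())
            self.links.append((text, self._href))
            self._in_link = False


def find_link(html, text, partial=False):
    """Return the href of the first matching <a>, or None.

    partial=False models LINK_TEXT (exact match);
    partial=True models PARTIAL_LINK_TEXT (substring match).
    """
    collector = LinkCollector()
    collector.feed(html)
    for link_text, href in collector.links:
        matched = text in link_text if partial else link_text == text
        if matched:
            return href
    return None


# Hypothetical markup: the <h1> repeats the text but is not a link.
PAGE = """
<h1>Download Python 3.10.5</h1>
<p><a class="button" href="/hypothetical/python-installer">
  <span>Download Python 3.10.5</span>
</a></p>
"""

print(find_link(PAGE, "Download Python 3.10.5"))         # /hypothetical/python-installer
print(find_link(PAGE, "Download Python 3.10.6"))         # None
print(find_link(PAGE, "Download Python", partial=True))  # /hypothetical/python-installer
```

The second call shows why the version number in the code must track the site, and the third shows why PARTIAL_LINK_TEXT can be a more forgiving choice when only part of the text is stable.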
When we inspect the download button in Chrome, we can see that it is a link whose text reads Download Python 3.10.5. Check the code example below to see this locator in action.
Note: Python will have released a newer version since this Answer was published. The version number shown on the site will therefore differ, and the text in the code must change with it. If you don't see a file downloading after running the code below, update the version number in the find_element() call.
```python
# Find the <a> tag whose visible text matches exactly.
download_python = driver.find_element(By.LINK_TEXT, "Download Python 3.10.5")
# Click the element that holds the text you've searched for.
download_python.click()
time.sleep(60)
driver.close()
```
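Instead of sleeping for a fixed 60 seconds, one option is to poll the download directory until the file has finished. The sketch below assumes Chrome's convention of giving unfinished downloads a .crdownload suffix; the helper name wait_for_download and its parameters are hypothetical, and this is a heuristic rather than a Selenium feature.

```python
import os
import time


def wait_for_download(directory, existing=(), timeout=60.0, poll=0.5):
    """Wait until a new, finished file appears in `directory`.

    `existing` holds file names that were already there before the
    download started; they are ignored. Assumes Chrome marks in-progress
    downloads with a ".crdownload" suffix. Returns the sorted list of new
    finished files, or raises TimeoutError after `timeout` seconds.
    """
    ignore = set(existing)
    deadline = time.monotonic() + timeout
    while True:
        names = [n for n in os.listdir(directory) if n not in ignore]
        in_progress = [n for n in names if n.endswith(".crdownload")]
        finished = [n for n in names if not n.endswith(".crdownload")]
        # Done once something new exists and nothing is still downloading.
        if finished and not in_progress:
            return sorted(finished)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No finished download in {directory!r} after {timeout} s")
        time.sleep(poll)
```

In the LINK_TEXT example, you could record before = set(os.listdir(download_dir)) just before download_python.click() and then call wait_for_download(download_dir, existing=before) in place of time.sleep(60), where download_dir is the same absolute path passed to download.default_directory.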
The CSS_SELECTOR locator
Next, let's use the CSS_SELECTOR locator. When we inspect the download button, we can see that the link for the file we want to download is in the href attribute of the <a> tag. We extract that link by telling the function to look for the <a> tag whose class attribute contains the value button, which is written as the selector a.button. Check the code block below to see how this locator works.
Note: find_element() returns the first element on the page that matches the specified criteria.
```python
# Grab the first <a> tag with the "button" class and read its link.
download_python = driver.find_element(By.CSS_SELECTOR, "a.button").get_attribute("href")
# We tell Chrome to open the link we've extracted
# for the download file.
driver.get(download_python)
time.sleep(60)
driver.close()
```
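The selector a.button matches <a> tags whose class list contains the whole token button, and find_element() stops at the first such tag. The browser-free sketch below models just those two rules with html.parser; the helper names and markup are hypothetical, and real CSS selectors support far more than this.

```python
from html.parser import HTMLParser


class FirstMatch(HTMLParser):
    """Finds the first <wanted_tag> whose class list contains wanted_class."""

    def __init__(self, wanted_tag, wanted_class):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.wanted_class = wanted_class
        self.match = None  # attributes of the first matching element

    def handle_starttag(self, tag, attrs):
        if self.match is not None or tag != self.wanted_tag:
            return
        attributes = dict(attrs)
        # Class selectors compare whole, whitespace-separated tokens.
        if self.wanted_class in (attributes.get("class") or "").split():
            self.match = attributes


def select_first(html, tag, cls):
    """Rough model of find_element(By.CSS_SELECTOR, f"{tag}.{cls}")."""
    parser = FirstMatch(tag, cls)
    parser.feed(html)
    return parser.match


# Hypothetical markup.
PAGE = """
<a class="buttons" href="/not-this-one">Wrong class token</a>
<div class="button">Right class, wrong tag</div>
<a class="button primary" href="/first-match">First</a>
<a class="button" href="/second-match">Second</a>
"""

print(select_first(PAGE, "a", "button")["href"])  # /first-match
```

Because only the first match is returned, a page with several a.button links may need a more specific selector or find_elements(), which returns every match.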
The XPATH locator
Finally, we can use the XPATH locator to walk down to the <a> tag and extract its href link. We can start our path from anywhere in the page, as long as we only need to move down the tree. We'll instruct our function to look for the <div> that has class='download-for-current-os'. Therefore, the start of our path will be .//div[@class='download-for-current-os'].

We can then list the HTML tags we want to go through to reach our desired element: .//div[@class='download-for-current-os']/div/p/a. At any level, we can narrow a tag down by whichever attribute we like (as we did with the first <div>). Also, if a level contains several elements with the same tag, we can pick one by index, as in a[2] (XPath indices start at 1). Take a look below to see how this works.
Note: The XPATH locator should be used as a last resort, or only if the webpage is confirmed not to be dynamic, since a path like this breaks as soon as the page structure changes.
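```python
# Walk down from the <div> with the given class to the <a> tag and read its link.
download_python = driver.find_element(
    By.XPATH, ".//div[@class='download-for-current-os']/div/p/a"
).get_attribute("href")
driver.get(download_python)
time.sleep(60)
driver.close()
```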
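Python's standard xml.etree.ElementTree understands a small subset of XPath, which is enough to try out the path above without a browser. The markup below is a hypothetical, simplified stand-in, not python.org's actual page, and ElementTree requires well-formed XML, so treat this only as a way to experiment with the path syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real python.org page.
PAGE = """
<body>
  <div class="download-for-current-os">
    <div>
      <h1>Download the latest version</h1>
      <p>
        <a href="/downloads/first">Download Python</a>
        <a href="/downloads/second">Other platforms</a>
      </p>
    </div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# Same path as the Selenium example: find the <div> by its class anywhere
# below the root, then step down div -> p -> a. find() returns the first match.
first = root.find(".//div[@class='download-for-current-os']/div/p/a")
print(first.get("href"))   # /downloads/first

# An index picks the n-th <a> among its siblings; XPath indices start at 1.
second = root.find(".//div[@class='download-for-current-os']/div/p/a[2]")
print(second.get("href"))  # /downloads/second
```

One caveat carries over to the browser: [@class='...'] compares the whole attribute string, so the path stops matching if the site adds a second class to that <div>. In Selenium's full XPath engine, contains(@class, 'download-for-current-os') is a common, more tolerant alternative (ElementTree does not support it).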
We can upload a file to a webpage using the find_element() function as well. Again, there are many ways to do so. We could visit a website, have the script click the "Upload File" or "Add File" button, and then select the file we want to upload. Unfortunately, clicking that button opens the operating system's file dialog, which Selenium cannot control, so this approach requires third-party tools. We'll upload without these tools.
Most websites use <input type="file"> tags, often hidden, for file uploads. The goal is to find those tags and send them the path of the file on our local machine with send_keys(). We'll see how to do so using the following site.
Note: Educative does not endorse the website used in this example.
In the image above, inspecting the blue 'Add File' button will highlight the <div>, the <label>, or (if you're lucky) even the <input> tag.
There are many ways to go about this. We can use CSS_SELECTOR to pinpoint the tags that the required <input> tag sits inside and then select the <input> tag itself, using a selector such as .uploader-btn-wrap input[type='file']. There is a far easier way if there is only one <input> tag, or if the <input> tag we want to upload to is the very first one structurally (as seen in the inspect window): we can just use XPATH, as in .//input[@type='file'].
Let's look at the code and comments below to understand this:
```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Ready-made conditions to wait for (element present, clickable, and so on).
from selenium.webdriver.support import expected_conditions as EC
# Used to introduce a waiting mechanism.
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
prefs = {"download.default_directory": "."}
options.add_experimental_option("prefs", prefs)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# The service we would like to upload our file to.
driver.get('https://www.filemail.com/share/upload-file')

# This is not required and should be removed from your own script.
# It adds a gap between the moment the webpage loads and the upload,
# because otherwise the action is instantaneous.
time.sleep(15)

# 'wait' makes the browser wait up to 30 seconds
# whenever it is invoked later.
wait = WebDriverWait(driver, 30)

# Wait until the required element (the <input> tag behind 'Add File')
# is present in the page, then send the path to our file to it.
wait.until(
    EC.presence_of_element_located((By.XPATH, ".//input[@type='file']"))
).send_keys("/usercode/Educative_Logo.png")

time.sleep(60)
driver.close()
```
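ChromeDriver generally expects an absolute path to an existing file when you call send_keys() on a file input, and a bad path only surfaces as an error from the browser. A small helper, such as the hypothetical upload_path() below, resolves relative paths and fails early with a clear message.

```python
import os


def upload_path(path):
    """Return an absolute path suitable for send_keys() on <input type="file">.

    Expands "~" and resolves relative paths against the current working
    directory; raises FileNotFoundError if the file does not exist.
    """
    absolute = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(absolute):
        raise FileNotFoundError(f"Upload file not found: {absolute}")
    return absolute
```

With it, the last step of the upload script could read .send_keys(upload_path("Educative_Logo.png")), assuming the image sits in the directory the script is run from.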
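The wait time, set here with WebDriverWait(driver, 30), will vary depending on the site. If the element doesn't appear within that time, wait.until() raises a TimeoutException and the script stops. Also, some websites may not let us download or upload files until we accept their cookies. In that case, add lines like the following before the download/upload part (the ID is site-specific, so inspect your own site's cookie button to find the right one):

```python
got_cookie = driver.find_element(By.ID, 'accept-cookie-notification')
got_cookie.click()
```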
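To make the role of the timeout concrete, here is a simplified, browser-free model of what WebDriverWait(driver, 30).until(...) does: it re-checks a condition every poll interval (0.5 seconds by default in Selenium) and gives up once the timeout has passed. This is a sketch, not Selenium's code; the real version passes the driver to the condition, ignores NoSuchElementException by default, and raises Selenium's own TimeoutException. The name wait_until is hypothetical.

```python
import time


def wait_until(condition, timeout=30.0, poll_frequency=0.5, ignored_exceptions=()):
    """Call condition() until it returns something truthy, then return it.

    Exceptions listed in ignored_exceptions are treated as "not ready yet".
    Raises TimeoutError once `timeout` seconds have passed without success.
    """
    end = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored_exceptions:
            pass  # e.g. the element is not in the page yet
        if time.monotonic() > end:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)
```

A condition that starts succeeding after a few polls returns normally, while one that never succeeds raises after roughly `timeout` seconds. That is why a timeout that is too short for a slow site makes the upload script fail.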