How to use Selenium to scrape data from a website

Pradip Thapa
3 min read · Mar 21, 2023


Web scraping has become an integral part of the web development process. It allows you to extract data from websites and use it for various purposes, such as analysis or building new applications. Python has a range of libraries that can be used for web scraping, including BeautifulSoup and Scrapy. However, when it comes to scraping dynamic websites, pages that need JavaScript execution or user interaction before their data appears, a tool like Selenium comes in handy.

In this article, we will explore a code snippet that demonstrates how to use Selenium to scrape data from a website. Specifically, we will look at how to extract information about the countries of the world from https://www.scrapethissite.com/pages/simple/. We will extract data such as the country name, capital city, population, and area.

Before we dive into the code, it is essential to understand what Selenium is and what it does. Selenium is a browser automation framework: through a driver executable such as ChromeDriver, it controls a real browser and can perform tasks such as clicking buttons and filling out forms on web pages. Because it simulates user interaction with a website, it is useful for scraping dynamic pages that a plain HTTP request cannot render.
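
To make that concrete, here is a minimal sketch of simulated interaction: typing into a search form and submitting it. The URL and the input name "q" are hypothetical placeholders, not part of the site we scrape below.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://example.com/search")  # hypothetical page with a search form

# Locate a text input, type a query, and press Enter, as a user would
search_box = driver.find_element(By.NAME, "q")  # assumes an input named "q"
search_box.send_keys("selenium")
search_box.send_keys(Keys.RETURN)

driver.quit()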

Code Explanation

Let’s take a look at the code:

from selenium import webdriver
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Set the path to the Chrome driver (a raw string keeps the
# backslashes from being read as escape sequences)
serv_obj = Service(r"F:\Automation Testing\selenium\chromedriver.exe")

# Launch Chrome using the driver
demo = webdriver.Chrome(service=serv_obj)

# The URL we want to scrape
url = "https://www.scrapethissite.com/pages/simple/"

# Load the website
demo.get(url)

The first thing we do is import the necessary libraries: Selenium and pandas, a library for data manipulation. We also import the Service class from the selenium.webdriver.chrome.service module and By from selenium.webdriver.common.by. The Service class is used to configure the driver, and By is used to specify the strategy for locating web elements on a page.
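
XPath is only one of the strategies By supports. For reference, a few common alternatives look like this (the element names here are illustrative placeholders, not taken from the site):

# Illustrative locator strategies; the names below are placeholders
demo.find_element(By.ID, "main-header")             # by id attribute
demo.find_element(By.CLASS_NAME, "country-name")    # by a single CSS class
demo.find_element(By.CSS_SELECTOR, "h3.country-name")
demo.find_element(By.TAG_NAME, "h3")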

Next, we set the path to the Chrome driver using the Service class (note the raw string prefix r, which keeps the Windows backslashes from being interpreted as escape sequences). The Chrome driver is the executable that Selenium uses to control the Chrome browser. We then launch Chrome with webdriver.Chrome(), passing in the service object.

We define the URL we want to scrape, which in this case is https://www.scrapethissite.com/pages/simple/. Finally, we load the website using the get() method of the demo object.
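
As an aside, if you are on Selenium 4.6 or newer, Selenium Manager can download and locate a matching driver automatically, so the explicit driver path above is optional:

# On Selenium 4.6+, Selenium Manager resolves the driver automatically
demo = webdriver.Chrome()
demo.get(url)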

Next, we define a series of lists that we will use to store the data we want to extract. The code uses the find_elements method of the web driver object to find all the elements that match the specified XPath expression for each of the data categories we want to scrape. The text of each element is then extracted and stored in a list.

# country_name
country_name = []
cn = demo.find_elements(By.XPATH, "//h3[@class='country-name']")
for country in cn:
    result = country.text
    country_name.append(result)

# country_capital
country_capital = []
cc = demo.find_elements(By.XPATH, "//span[@class='country-capital']")
for capital in cc:
    result = capital.text
    country_capital.append(result)

# country_population
country_population = []
cp = demo.find_elements(By.XPATH, "//span[@class='country-population']")
for population in cp:
    result = population.text
    country_population.append(result)

# country_area
country_area = []
ca = demo.find_elements(By.XPATH, "//span[@class='country-area']")
for area in ca:
    result = area.text
    country_area.append(result)
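
Since the four loops follow the same pattern, they could optionally be condensed into a small helper. The function scrape_column below is a name introduced for this sketch, not part of the original code:

# Optional refactor: return the text of every element matching an XPath
def scrape_column(driver, xpath):
    return [el.text for el in driver.find_elements(By.XPATH, xpath)]

country_name = scrape_column(demo, "//h3[@class='country-name']")
country_capital = scrape_column(demo, "//span[@class='country-capital']")
country_population = scrape_column(demo, "//span[@class='country-population']")
country_area = scrape_column(demo, "//span[@class='country-area']")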

Once we have extracted all the data, we can use the pandas library to assemble it into a DataFrame and save it to an Excel file.

df = pd.DataFrame({
    "Country Name": country_name,
    "Capital City": country_capital,
    "Population": country_population,
    "Area (sq. km)": country_area
})
df.to_excel("WorldCountries.xlsx")

# Close the browser once we are done
demo.quit()

Conclusion

In this article, we have discussed how to use Selenium to extract data from a website and store it in an Excel file. Selenium provides a powerful and flexible way to extract data from websites, and its use cases extend beyond just web scraping. With the help of libraries like pandas, we can easily process and store the extracted data for further analysis.
