This article shows how to scrape attraction data (names, ratings, and review counts) from Tripadvisor with Python and save it to a CSV file.
1. Import Python Libraries
Import the necessary Python libraries: requests to download the page, BeautifulSoup (from the bs4 package) to parse the HTML, and csv to save the data.
Python
import requests
from bs4 import BeautifulSoup
import csv
2. Send a Request to Get Webpage Information
Use the requests library to send an HTTP GET request for the Tripadvisor page. The Response object is stored in response, and the raw page body (as bytes) is stored in html_content.
Python
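url = ""  # left blank in the original; fill in the Tripadvisor page to scrape
headers = {"User-Agent": "Mozilla/5.0"}  # browser-like UA; the default python-requests UA may be blocked
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # raise an HTTPError on 4xx/5xx instead of parsing an error page
html_content = response.content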
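Because the URL is left blank above, requests.get would fail before sending anything: requests raises MissingSchema for an empty string. A small guard can catch this earlier and give a clearer message. The sketch below uses only the standard library's urllib.parse. The function name validate_url is hypothetical, and it only checks that the URL is an absolute http(s) address. It does not check that the URL points at Tripadvisor.

```python
from urllib.parse import urlparse


def validate_url(url):
    """Return url unchanged if it is an absolute http(s) URL with a host; else raise ValueError."""
    parts = urlparse(url)
    # Relative paths like "example.com/page" parse with an empty scheme and netloc
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    return url


print(validate_url("https://example.com/page"))
```

In the request step, the call would become requests.get(validate_url(url), ...). A missing or relative URL then fails with a ValueError before any network traffic.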
3. Parse the Webpage Information
Parse the downloaded HTML with BeautifulSoup, using Python's built-in "html.parser" backend, so the page can be queried with CSS selectors.
Python
soup = BeautifulSoup(html_content, "html.parser")
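BeautifulSoup parses any HTML string, so you can test the selectors offline before sending requests to Tripadvisor. The markup in SAMPLE_HTML below is hypothetical. It only imitates the class names used in the next step and is not copied from a real Tripadvisor page. The sketch shows how select and select_one behave, including the None returned when an element is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the class names used in step 4;
# it is not taken from a real Tripadvisor page.
SAMPLE_HTML = """
<div class="location-meta-card">
  <div class="XfVdV">Sample Tower</div>
  <svg class="UctUV" title="4.5 of 5 bubbles"></svg>
  <span class="biGQs z">1,234</span>
</div>
<div class="location-meta-card">
  <div class="XfVdV">Sample Museum</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# select() returns every match as a list
cards = soup.select("div.location-meta-card")
names = [card.select_one("div.XfVdV").get_text(strip=True) for card in cards]
# select_one() returns None when nothing matches (the second card has no rating SVG)
ratings = [card.select_one("svg.UctUV") for card in cards]
print(len(cards), names)
```

The None in ratings is why the extraction code guards every select_one call before reading text or attributes from it.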
4. Extract the Required Data
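Select every attraction card and pull out its name, rating, and review count. Missing ratings and review counts fall back to default values. The class names used here (such as XfVdV and UctUV) look auto-generated, so they may change when Tripadvisor updates its site. Check them against the live page before running the code.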
Python
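import json
def text_or(item, selector, default):
    # Stripped text of the first element matching selector, or default if there is none
    el = item.select_one(selector)
    return el.get_text(strip=True) if el else default
def rating_of(item):
    # First word of the rating SVG's title attribute, or "N/A" if the SVG or title is missing
    svg = item.select_one("svg.UctUV")
    title = svg.get("title", "") if svg else ""
    return title.split()[0] if title.strip() else "N/A"
attractions = soup.select("div.location-meta-card")
data = [
    {
        "name": text_or(item, "div.XfVdV", "N/A"),
        "rating": rating_of(item),
        "reviews": text_or(item, "span.biGQs.z", "0"),
    }
    for item in attractions
]
# Preview the first five records
print(json.dumps(data[:5], indent=2, ensure_ascii=False))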
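The extracted rating and reviews fields are strings. If you need numbers for sorting or analysis, a small post-processing step can convert them. This sketch makes three assumptions: ratings look like "4.5" (or "N/A"), review counts are comma-grouped integers such as "1,234" or "1,234 reviews", and abbreviated counts like "1.2K" do not occur (they are not handled). The helper names to_rating and to_review_count are hypothetical.

```python
import re


def to_rating(text):
    """Convert a rating string like "4.5" to a float; return None for "N/A" or other non-numbers."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def to_review_count(text):
    """Return the first comma-grouped integer in text ("1,234 reviews" -> 1234), or 0 if none."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else 0


# Hypothetical records shaped like the dictionaries built in step 4
records = [
    {"name": "Sample Tower", "rating": "4.5", "reviews": "1,234 reviews"},
    {"name": "Sample Museum", "rating": "N/A", "reviews": "0"},
]
# Copy each record, replacing only the two string fields with numeric values
cleaned = [
    {**r, "rating": to_rating(r["rating"]), "reviews": to_review_count(r["reviews"])}
    for r in records
]
print(cleaned)
```

Converting to None rather than 0 for a missing rating keeps unrated attractions distinguishable from low-rated ones.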
5. Save the Data to a CSV File
Write the records to a CSV file with csv.DictWriter, using the dictionary keys as the column headers.
Python
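# Write the records from step 4 (the list named data) to a CSV file
if data:
    filename = "tripadvisor_paris_attractions.csv"
    headers_csv = list(data[0].keys())
    # utf-8-sig writes a byte-order mark so Excel recognizes the file as UTF-8
    with open(filename, "w", encoding="utf-8-sig", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=headers_csv)
        writer.writeheader()
        writer.writerows(data)
    print(f"Saved {len(data)} rows to {filename}")
else:
    print("No attractions found; check the URL and CSS selectors.")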
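To check the CSV step without scraping anything, you can write a few records to a temporary file and read them back. The records below are hypothetical. The sketch shows two things. First, the "utf-8-sig" encoding starts the file with the UTF-8 byte-order mark. Second, reading with the same encoding strips that mark again, so the first header comes back as a clean "name".

```python
import csv
import os
import tempfile

rows = [  # hypothetical records shaped like the ones built in step 4
    {"name": "Sample Tower", "rating": "4.5", "reviews": "1,234"},
    {"name": "Café Sample", "rating": "N/A", "reviews": "0"},
]


def write_rows(path, rows):
    """Write dict rows with a header line; utf-8-sig prepends a byte-order mark."""
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read the file back as dicts; utf-8-sig removes the byte-order mark."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    write_rows(path, rows)
    with open(path, "rb") as f:
        raw = f.read()  # raw bytes, to inspect the byte-order mark
    loaded = read_rows(path)

print(raw[:3])
print(loaded == rows)
```

The value "1,234" contains a comma, so DictWriter quotes it on write and DictReader restores it unchanged on read.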