As a highly dynamic web application, Google Maps loads content asynchronously via JavaScript: new results keep arriving as the user scrolls. This makes traditional static scraping tools ineffective. Browser automation tools such as Selenium and Playwright can simulate user actions like clicking, typing, and scrolling, which makes them well suited to dynamic sites of this kind.
This article focuses on how to use browser automation (the examples below use Playwright) to accurately extract place ratings and review counts from Google Maps, and offers solutions to common selector issues encountered when locating elements. Given how much information a Google Maps page packs in, how can rating and review data be obtained efficiently? The following sections walk through the method step by step.
First, import the necessary Python libraries and configure the browser driver. Then navigate to the Google Maps homepage and submit a search.
Python
print(f"[{searchQuery}] Task started, navigating to the Maps homepage...")
await page.goto("https://www.google.com/maps", wait_until="networkidle")

search_input_selector = "#searchboxinput"
print(f"[{searchQuery}] Searching for keyword: '{searchQuery}'...")
await page.wait_for_selector(search_input_selector)
await page.fill(search_input_selector, searchQuery)

# Start listening for the results API call *before* submitting the search;
# otherwise the response may arrive before the listener is attached.
async with page.expect_response(lambda res: apiQueryFragment in res.url) as response_info:
    await page.press(search_input_selector, "Enter")
    print(f"[{searchQuery}] Search submitted, waiting for the API response...")
response = await response_info.value
print(f"[{searchQuery}] ✅ API response intercepted!")
Next, implement a scrolling mechanism. Google Maps search results use lazy loading: more results appear only when the user scrolls to the bottom of the results panel. To collect a reasonably complete set of results, this scrolling behavior must be simulated.
JavaScript
console.log(`[${searchQuery}] First page loaded, starting to scroll...`);
const resultsSelector = 'div[role="feed"]';
await page.waitForSelector(resultsSelector, { timeout: 15000 });

let previousHeight;
for (let i = 0; i < 20; i++) {
  // Scroll the results container to its bottom and report its height.
  const currentHeight = await page.evaluate(selector => {
    const container = document.querySelector(selector);
    if (!container) return -1; // Container not found: return an error code.
    container.scrollTo(0, container.scrollHeight);
    return container.scrollHeight;
  }, resultsSelector);

  // Give lazily loaded results time to arrive.
  await new Promise(resolve => setTimeout(resolve, 3000));

  if (currentHeight === previousHeight) {
    console.log(`[${searchQuery}] Reached the bottom; no more content is loading.`);
    break;
  }
  previousHeight = currentHeight;
  console.log(`[${searchQuery}] Scrolled (round ${i + 1}), current content height: ${currentHeight}`);
}
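For readers following along in Playwright's Python API rather than JavaScript, the same scroll-until-stable loop can be sketched as below. This is a sketch under assumptions, not a drop-in implementation: `page` is assumed to be a Playwright async `Page`, and the selector, round limit, and pause mirror the JavaScript version above.

```python
import asyncio

async def scroll_results_feed(page, max_rounds=20, pause_s=3.0):
    """Scroll the results feed until its height stops growing (or max_rounds)."""
    selector = 'div[role="feed"]'
    await page.wait_for_selector(selector, timeout=15000)
    previous_height = None
    for _ in range(max_rounds):
        current_height = await page.evaluate(
            """(sel) => {
                const c = document.querySelector(sel);
                if (!c) return -1;           // container not found
                c.scrollTo(0, c.scrollHeight);
                return c.scrollHeight;
            }""",
            selector,
        )
        await asyncio.sleep(pause_s)  # give lazy-loaded results time to arrive
        if current_height == previous_height:
            break  # height is stable: nothing more is loading
        previous_height = current_height
    return previous_height
```

Stopping on a stable height (rather than a fixed round count alone) keeps the loop from wasting time once the feed is exhausted.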
Finally, parse the collected text and use regular expressions to extract the ratings and review counts.
Python
import re

def extract_rating_and_reviews(text):
    """Pull a numeric rating and a review count out of a free-form snippet."""
    # These patterns cover the sample strings below; real Maps labels
    # may need additional cases.
    rating_match = re.search(r"(\d+(?:\.\d+)?)\s*(?:stars?|bubbles?)", text, re.IGNORECASE)
    reviews_match = re.search(r"([\d,]+)\s*reviews?", text, re.IGNORECASE)
    if reviews_match is None:
        # Fall back to a parenthesized count such as "(765)".
        reviews_match = re.search(r"\(([\d,]+)\)", text)
    rating = float(rating_match.group(1)) if rating_match else None
    reviews = int(reviews_match.group(1).replace(",", "")) if reviews_match else None
    return rating, reviews

if __name__ == "__main__":
    raw_texts = [
        "4.5 stars based on 1,234 reviews",
        "Rating: 5.0 bubbles, 987 Reviews",
        "4 stars (765)",
        "No rating yet",
        "Just 5 reviews",
        "3.8 stars",
    ]
    extracted_data = []
    print("--- Extracting data with regular expressions ---")
    for text in raw_texts:
        rating, reviews = extract_rating_and_reviews(text)
        print(f"Original text: '{text}' -> Rating={rating}, Reviews={reviews}")
        extracted_data.append({"original_text": text, "rating": rating, "review_count": reviews})
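Once the records are collected, persisting them is straightforward with the standard library. The sketch below writes `extracted_data`-shaped dictionaries to CSV; the column names match the keys used above, and writing to an in-memory buffer is only for demonstration (in practice you would open a real file).

```python
import csv
import io

def write_ratings_csv(records, fileobj):
    """Write extracted rating records to CSV; None values become empty cells."""
    writer = csv.DictWriter(
        fileobj, fieldnames=["original_text", "rating", "review_count"]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)

# Example usage with an in-memory buffer:
records = [
    {"original_text": "4.5 stars based on 1,234 reviews", "rating": 4.5, "review_count": 1234},
    {"original_text": "No rating yet", "rating": None, "review_count": None},
]
buffer = io.StringIO()
write_ratings_csv(records, buffer)
print(buffer.getvalue())
```

Keeping the original text alongside the parsed fields makes it easy to audit rows where the regex returned `None`.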