When scraping Google Maps, the results are not all rendered on page load; more content appears only after scrolling or submitting a search. This is because the page uses a lazy loading mechanism, so the scraper has to simulate those user actions to obtain the rest of the data.
What is a Lazy Loading Mechanism?
Lazy loading is a technique used on long pages where content is loaded only when it enters the browser's viewport. Loading all content immediately upon entering the page would be time-consuming and result in a poor user experience.
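To make the idea concrete, here is a minimal sketch of how a page might implement lazy loading with the browser's IntersectionObserver API. The function name enableLazyLoading, the sentinel element, and the loadNextBatch callback are illustrative assumptions, not Google Maps' actual implementation. The observer constructor is passed in as a parameter only so the sketch can also be exercised outside a browser.

```javascript
// Hypothetical sketch of lazy loading; not taken from any real site.
// `sentinel` is assumed to be an element placed at the end of the list.
function enableLazyLoading(sentinel, loadNextBatch, Observer = globalThis.IntersectionObserver) {
  const observer = new Observer((entries) => {
    for (const entry of entries) {
      // isIntersecting is true once the sentinel overlaps the viewport.
      if (entry.isIntersecting) {
        loadNextBatch();
      }
    }
  });
  observer.observe(sentinel);
  return observer;
}
```

Under this model, the next batch is never requested until the end of the list comes into view, which is why a scraper has to scroll.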
Benefits of Lazy Loading:
Enhances User Experience: Shortens the perceived page load time, so users are less likely to lose patience and leave.
Reduces Server Load and Browser Burden: Decreases the workload on the server and the browser, reducing memory consumption.
How to Simulate User Scrolling in Puppeteer?
Wrap the scrolling logic in a loop. The general principle is: record the container's scroll height after each scroll; if the height has stopped increasing, the bottom has been reached and the loop should exit. If the height has increased, update the recorded height and continue to the next iteration of the loop.
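The stop condition can be separated from the browser, as in the following sketch. The name scrollUntilStable and its parameters are assumptions made for illustration: scrollAndMeasure stands in for whatever scrolls the container once, waits for loading, and resolves to the new height.

```javascript
// Hypothetical helper: keep scrolling until the measured height stops growing.
// `scrollAndMeasure` is assumed to scroll once and resolve to the container height.
async function scrollUntilStable(scrollAndMeasure, maxScrolls = 100) {
  let previousHeight = -1;
  let scrolls = 0;
  for (let i = 0; i < maxScrolls; i++) {
    const currentHeight = await scrollAndMeasure();
    scrolls++;
    if (currentHeight <= previousHeight) {
      break; // Height did not grow: treat this as the bottom.
    }
    previousHeight = currentHeight;
  }
  return { scrolls, finalHeight: previousHeight };
}
```

Keeping the loop free of Puppeteer calls makes the exit logic easy to check with a fake height sequence, while the maxScrolls cap still guards against pages that never stop growing.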
Code Steps:
First, set up a listener:
The listener stays active for every subsequent API response on the page that matches the specified criteria, whether it was triggered by the initial load or by scrolling.
javascript
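// apiQueryFragment, allRawDataBlocks, responseCount and searchQuery are defined elsewhere in the script.
page.on('response', async (response) => {
  // Only handle responses whose URL contains the target API fragment.
  if (response.url().includes(apiQueryFragment)) {
    try {
      const responseText = await response.text();
      // Extract the outer JSON object that starts with {"c": ...
      const jsonMatch = responseText.match(/(\{"c":[\s\S]*\})/);
      if (jsonMatch && jsonMatch[1]) {
        const jsonBlock = jsonMatch[1];
        const outerJson = JSON.parse(jsonBlock);
        const dataString = outerJson.d;
        // Strip the )]}' prefix so the inner payload is valid JSON.
        const cleanedText = dataString.replace(")]}'\n", "");
        const rawPostJson = JSON.parse(cleanedText);
        allRawDataBlocks.push(rawPostJson);
        responseCount++;
        console.log(`[${searchQuery}] ✅ Captured API data packet #${responseCount}`);
      }
    } catch (e) {
      // Log instead of silently swallowing, so parsing problems stay visible.
      console.warn(`[${searchQuery}] Failed to parse API response: ${e.message}`);
    }
  }
});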
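The parsing steps inside the listener can also be pulled into a small standalone function, which makes them easier to test without a browser. This is only a sketch: the name parseApiPayload is hypothetical, and the payload shape (an outer object with a string field d that carries a )]}' prefix) is inferred from the listener code rather than from any documented API.

```javascript
// Hypothetical helper mirroring the listener's parsing steps.
// Returns the parsed inner payload, or null when the text has no matching block.
function parseApiPayload(responseText) {
  const jsonMatch = responseText.match(/(\{"c":[\s\S]*\})/);
  if (!jsonMatch) return null;
  const outerJson = JSON.parse(jsonMatch[1]);
  if (typeof outerJson.d !== 'string') return null;
  // Remove the )]}' prefix that precedes the inner JSON.
  const cleanedText = outerJson.d.replace(")]}'\n", "");
  return JSON.parse(cleanedText);
}
```

With such a helper, the listener body would reduce to calling parseApiPayload(await response.text()) and pushing any non-null result into allRawDataBlocks.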
Simulate the user search process:
First, navigate to the maps homepage. Then, type the desired search query into the input box and click the search button or press the Enter key. Finally, wait for the page to load. Once the first page of results has loaded completely, simulate mouse scrolling to retrieve more data.
javascript
console.log(`[${searchQuery}] Task started, navigating to the Maps homepage...`);
await page.goto(postUrl, { waitUntil: "networkidle2" });
await page.type('#searchboxinput', searchQuery);
console.log(`[${searchQuery}] Submitting search, waiting for the page to load...`);
// Start waiting for the navigation before pressing Enter, so a fast navigation is not missed.
await Promise.all([
  page.waitForNavigation({ waitUntil: "networkidle2", timeout: 60000 }),
  page.keyboard.press('Enter'),
]);
console.log(`[${searchQuery}] Page loaded, ready to start scrolling...`);
Add the scrolling loop:
Define a selector (resultsSelector) for a container present on the page, representing the viewable area displaying the results.
Declare a variable previousHeight to store the height of the content container after the previous scroll, which will be used later to determine if the bottom has been reached.
Use a for loop with a maximum limit (e.g., 100 scrolls) to prevent infinite loops and resource consumption in case of unexpected behavior.
Use if (currentHeight === previousHeight) { ... } to compare the current height with the previous height. If they are equal, it indicates the bottom has been reached, and the loop should break.
container.scrollTo(0, container.scrollHeight); programmatically scrolls the container to its maximum height (i.e., the bottom), which is the key step in simulating scrolling. return container.scrollHeight; returns the current total scroll height to the Node.js environment, assigning it to currentHeight.
Log the progress after each scroll.
javascript
const resultsSelector = 'div[role="feed"]'; // Selector for the container that holds all results.
await page.waitForSelector(resultsSelector, { timeout: 15000 }); // Wait for the result container to appear.
let previousHeight;
for (let i = 0; i < 100; i++) { // Scroll at most 100 times to prevent infinite loops.
  // Scroll the container to its bottom and read back its total scroll height.
  const currentHeight = await page.evaluate(selector => {
    const container = document.querySelector(selector);
    if (!container) return -1;
    container.scrollTo(0, container.scrollHeight);
    return container.scrollHeight;
  }, resultsSelector);
  if (currentHeight === -1) {
    console.log(`[${searchQuery}] Unable to find the result container, stopping the scroll.`);
    break;
  }
  await new Promise(resolve => setTimeout(resolve, 3000)); // Wait 3 seconds for new data to load.
  if (currentHeight === previousHeight) {
    console.log(`[${searchQuery}] Reached the bottom; no more content is being added.`);
    break; // The height did not change since the previous scroll, so the bottom has been reached.
  }
  previousHeight = currentHeight;
  console.log(`[${searchQuery}] Scroll #${i + 1} done, current content height: ${currentHeight}`);