This document discusses how to open multiple web pages concurrently and retrieve their data at the same time. This approach processes data efficiently, runs tasks in parallel, and reduces wasted resources: it is faster and more convenient than processing tasks sequentially one by one, and it requires only a single browser instance.
How to Open Multiple Pages Simultaneously?
Use Promise.all() for parallel processing. The steps are as follows:
First, identify the target locations to search for. Launch a single browser; there is no need to start a new browser for each address.
Next, create a separate page for each address to be searched. Each address runs in its own page within the shared browser instance to prevent interference. Use Promise.all() to trigger the scraping tasks for all location pages concurrently.
Finally, after all tasks are completed, consolidate the scraped data into a single array for storage.
Processing Steps:
1. Define an array containing all the target locations to scrape.
javascript
const searchQueries = [
  "New York, NY, Starbucks",
  "Los Angeles, California, Vegan Restaurant",
];
2. Import Puppeteer and launch a browser.
javascript
const puppeteer = require("puppeteer");

let browser;
try {
  browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
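Here, headless: false opens a visible browser window so you can watch the pages being scraped, and defaultViewport: null lets each page use the full size of that window rather than Puppeteer's default viewport. On a server you would typically run headless instead. The try block is closed with catch/finally in the complete example at the end of this document.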
3. Create a page for each search query to obtain information from the different target websites (a sketch of the scrapeSinglePost helper follows the snippet below).
javascript
const scrapePromises = searchQueries.map(query =>
  scrapeSinglePost(browser, query)
);
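The scrapeSinglePost helper used above is not shown in the original snippets. Below is a minimal sketch of what it might look like; the URL, selector, and extraction logic are placeholders to replace with your actual target site. The key points are that each task opens its own page with browser.newPage(), closes it when done, and catches its own errors so a failed search returns rawData: null instead of rejecting.
javascript
// Minimal sketch of a per-query scraping task (assumed helper; adapt the URL,
// selectors, and extraction logic to your actual target site).
async function scrapeSinglePost(browser, searchQuery) {
  // Each query gets its own page (tab) inside the shared browser instance.
  let page;
  try {
    page = await browser.newPage();

    // Placeholder URL: replace with the real search page you are scraping.
    const url = `https://example.com/search?q=${encodeURIComponent(searchQuery)}`;
    await page.goto(url, { waitUntil: "networkidle2" });

    // Placeholder extraction: collect the text of each result element.
    const rawData = await page.evaluate(() =>
      Array.from(document.querySelectorAll(".result-item"), el => el.textContent.trim())
    );

    return { searchQuery, rawData };
  } catch (error) {
    // Catch errors here so one failed search does not reject Promise.all().
    console.error(`Scraping failed for "${searchQuery}":`, error.message);
    return { searchQuery, rawData: null };
  } finally {
    if (page) {
      await page.close();
    }
  }
}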
4. Use Promise.all() to wait for all the scraping tasks to finish.
javascript
console.log("All tasks have been scheduled and are waiting for completion...");
const resultsFromAllTasks = await Promise.all(scrapePromises);
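Promise.all() rejects as soon as any one of the promises rejects. That is why, in the scrapeSinglePost sketch above, each task catches its own errors and returns rawData: null for a failed search: one broken page does not abort the other tasks, and the failed results are simply filtered out in the next step.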
5. After all tasks finish, return the retrieved information and consolidate the scraped data into a single array for storage (one way to store it is sketched after the snippet below).
javascript
const allRawData = resultsFromAllTasks
  .filter(result => result.rawData !== null)
  .map(result => ({
    query: result.searchQuery,
    data: result.rawData
  }));
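The steps mention consolidating the data for storage without showing the storage itself. One simple option, sketched below, is to write the consolidated array to a JSON file using Node's built-in fs module; the file name results.json is only an example.
javascript
const fs = require("fs");

// Write the consolidated results to disk as pretty-printed JSON.
fs.writeFileSync("results.json", JSON.stringify(allRawData, null, 2), "utf8");
console.log(`Saved ${allRawData.length} result sets to results.json`);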
Complete Code Example:
javascript
const puppeteer = require("puppeteer");

const searchQueries = [
  "New York, NY, Starbucks",
  "Los Angeles, California, Vegan Restaurant",
];

(async () => {
  let browser;
  console.log(`Starting concurrent scraping tasks, a total of ${searchQueries.length} searches...`);
  try {
    browser = await puppeteer.launch({
      headless: false,
      defaultViewport: null,
    });

    // Create one scraping task (and one page) per search query.
    // scrapeSinglePost is assumed to be defined in the same file (see the sketch in step 3).
    const scrapePromises = searchQueries.map(query =>
      scrapeSinglePost(browser, query)
    );

    console.log("All tasks have been scheduled and are waiting for completion...");
    const resultsFromAllTasks = await Promise.all(scrapePromises);

    // Consolidate the data scraped by all pages into a single array.
    const allRawData = resultsFromAllTasks
      .filter(result => result.rawData !== null)
      .map(result => ({
        query: result.searchQuery,
        data: result.rawData
      }));

    console.log(`Collected data for ${allRawData.length} of ${searchQueries.length} searches.`);
  } catch (error) {
    console.error("Concurrent scraping failed:", error);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
})();
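To run the complete example, scrapeSinglePost must be defined in the same file (see the sketch in step 3). Because the browser is closed in the finally block, it shuts down whether the scraping succeeds or fails, and each query's page exists only for the lifetime of its own task.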