In the era of big data, review data is an extremely valuable type of information. For businesses, researchers, or data analysts, Google Maps review data can help them understand genuine user feedback on businesses, attractions, hotels, hospitals, and more, enabling them to make well-informed decisions.
1. The Value of Google Maps Review Data
①Business Analysis: Uncover user satisfaction points and pain points through reviews to optimize services.
②Travel Recommendations: Reviews for hotels, restaurants, and attractions serve as crucial references for user choices.
③Public Opinion Monitoring: The volume and sentiment of reviews can reflect market trends.
④Data Mining: Review text supports sentiment analysis and keyword extraction.
2. Target API Analysis
When accessing a specific location's page on Google Maps, you'll notice that reviews are not loaded all at once but are fetched in batches via asynchronous requests. The core API endpoint for this is:
https://www.google.com/maps/rpc/listugcposts?authuser=0&hl=el&pb=...
The pb parameter is the crucial part. It encodes the request's inputs, such as the Place ID, page size, pagination token, and request ID. If we can construct this pb string correctly, we can replicate the frontend's requests and retrieve the complete review data.
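One practical way to study the pb string is to copy a listugcposts request URL from the browser's developer tools (Network tab) and pull the parameter out. The sketch below assumes such a captured URL; extract_pb and split_pb are hypothetical helper names, and the sample URL in the usage is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def extract_pb(captured_url: str) -> str:
    """Return the pb query parameter from a captured listugcposts URL."""
    # parse_qs percent-decodes values (note: it also turns a literal "+" into a space).
    params = parse_qs(urlsplit(captured_url).query, keep_blank_values=True)
    if "pb" not in params:
        raise ValueError("no pb parameter in URL")
    return params["pb"][0]


def split_pb(pb: str) -> list[str]:
    """Split a pb string into its "!"-separated segments, e.g. "1m6", "1s<placeID>"."""
    return pb.split("!")[1:] if pb.startswith("!") else pb.split("!")


url = "https://www.google.com/maps/rpc/listugcposts?authuser=0&hl=el&pb=!1m6!1sabc%3A123!6m4!4m1!1e1"
print(split_pb(extract_pb(url)))
```

Splitting on "!" is safe here because the variable values (place ID, page token) are percent-encoded before they are inserted, so a raw "!" only ever marks a segment boundary.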
3. Core Parameter Parsing
By reverse-engineering the frontend's requests, we can summarize the URL construction rules as follows:
placeID: Extracted from the Google Maps share link, typically found in the !1sxxxx segment.
pageToken: A token used for pagination. It is empty for the first page and returned in the API response for subsequent pages.
pageSize: The number of reviews returned per request, e.g., 20.
requestID: A session request ID, usually a randomly generated string.
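The placeID and requestID values can be produced with a few lines of standard-library Python. This is a sketch under assumptions: the URL in the usage is a made-up example of a full place URL (a short share link would presumably need to be expanded to its full form first), extract_place_id and new_request_id are hypothetical helper names, and the 16-character alphanumeric request ID format is a guess rather than something the API is known to require.

```python
import re
import secrets
import string
from urllib.parse import unquote

# Same pattern the article's URL generator uses: the first "!1s<value>" segment.
PLACE_ID_RE = re.compile(r"!1s([^!]+)")


def extract_place_id(map_url: str) -> str:
    """Return the decoded placeID from a full Google Maps place URL."""
    match = PLACE_ID_RE.search(map_url)
    if match is None:
        raise ValueError(f"no !1s segment in {map_url!r}")
    return unquote(match.group(1))  # e.g. "%3A" becomes ":"


def new_request_id(length: int = 16) -> str:
    """Generate a random alphanumeric requestID (the format is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


url = "https://www.google.com/maps/place/Example/data=!4m6!3m5!1s0x0%3A0x1!8m2!3d0!4d0"
print(extract_place_id(url), new_request_id())
```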
The pb parameter is built by concatenating segments roughly like these:
!1m6!1s{placeID}
!6m4!4m1!1e1!4m1!1e3
!2m2!1i{pageSize}!2s{pageToken}
!5m2!1s{requestID}!7e81
!8m9!2b1!3b1!5b1!7b1
!12m4!1b1!2b1!4m1!1e1!11m0!13m1!1e1
Joining these segments into one string and appending it as the pb parameter of the API URL yields a complete request URL.
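The segments above follow a recognizable pattern. Judging from the template itself, each segment appears to be "!" + a field number + a one-letter type + a value, where s looks like a string, i an integer, b a boolean, e an enum, and m a nested group whose value counts every segment nested beneath it at any depth (for example, !1m6 covers !1s{placeID} plus the five segments of !6m4!4m1!1e1!4m1!1e3). The parser below is a sketch built on that inferred reading, not on any documented format; parse_pb is a hypothetical name. It is handy for checking that a hand-edited pb string still has consistent group counts.

```python
import re

# Inferred segment shape: field number, one-letter type, then the value.
SEGMENT_RE = re.compile(r"(\d+)([a-z])(.*)", re.DOTALL)


def parse_pb(pb: str) -> list:
    """Parse a pb string into nested (field, type, value) tuples.

    Assumption: the value of an "m" segment is the number of segments
    nested under it at any depth, as the article's template suggests.
    """
    if not pb.startswith("!"):
        raise ValueError("pb string must start with '!'")
    segments = []
    for raw in pb[1:].split("!"):
        match = SEGMENT_RE.fullmatch(raw)
        if match is None:
            raise ValueError(f"malformed segment: {raw!r}")
        segments.append((int(match.group(1)), match.group(2), match.group(3)))

    def build(start: int, end: int) -> list:
        # Group segments[start:end] into siblings, recursing into "m" groups.
        nodes, i = [], start
        while i < end:
            field, kind, value = segments[i]
            if kind == "m":
                child_end = i + 1 + int(value)
                if child_end > end:
                    raise ValueError(f"group !{field}m{value} overruns its parent")
                nodes.append((field, kind, build(i + 1, child_end)))
                i = child_end
            else:
                nodes.append((field, kind, value))
                i += 1
        return nodes

    return build(0, len(segments))
```

Filled in with placeholder values (placeID "PLACE", pageSize 20, an empty first-page token, requestID "REQ"), the template parses into six top-level groups with fields 1, 2, 5, 8, 11, and 13, and every group count lines up exactly, which is what suggests the counting rule above.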
4. Python Implementation: Generating the URL
```python
import re
import urllib.parse


# Method of the scraper class (the enclosing class is not shown in this excerpt).
def _generate_url(self, map_url, page_token, page_size, request_id):
    # The place ID is the value of the first "!1s..." segment in the map URL.
    place_id_regex = re.compile(r"!1s([^!]+)")
    match = place_id_regex.search(map_url)
    if not match:
        raise ValueError(f"Could not extract place ID from URL: {map_url}")

    # Decode any existing percent-encoding, then encode exactly once,
    # so an already-encoded ID is never double-encoded.
    raw_place_id = urllib.parse.unquote(match.group(1))
    encoded_place_id = urllib.parse.quote(raw_place_id)
    encoded_page_token = urllib.parse.quote(page_token)  # "" for the first page

    pb_components = [
        f"!1m6!1s{encoded_place_id}",
        "!6m4!4m1!1e1!4m1!1e3",
        f"!2m2!1i{page_size}!2s{encoded_page_token}",
        f"!5m2!1s{request_id}!7e81",
        "!8m9!2b1!3b1!5b1!7b1",
        "!12m4!1b1!2b1!4m1!1e1!11m0!13m1!1e1",
    ]
    pb_string = "".join(pb_components)
    return f"https://www.google.com/maps/rpc/listugcposts?authuser=0&hl=el&pb={pb_string}"
```
5. Pagination Handling
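Each response body starts with the prefix )]}' followed by a newline, which Google adds to guard against JSON hijacking. After stripping it, the body parses as a JSON array whose element at index 1 is the token for the next page; the code below treats anything other than a string there as "no more pages".
```python
import json


def get_nested_element(data, *indices):
    # Stand-in for the article's helper (its definition is not shown): safe nested indexing.
    for index in indices:
        if not isinstance(data, list) or not 0 <= index < len(data):
            return None
        data = data[index]
    return data


def extract_next_page_token(data):
    # Strip the anti-JSON-hijacking prefix )]}' before parsing.
    text = data.decode("utf-8", errors="ignore")
    prefix = ")]}'\n"
    if text.startswith(prefix):
        text = text[len(prefix):]
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        return ""
    token = get_nested_element(result, 1)  # index 1 holds the next page token
    return token if isinstance(token, str) else ""
```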
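Pagination then reduces to a loop: send the first request with an empty pageToken, read the next token out of each response, and stop when none comes back. The generator below is a sketch of that loop; iterate_pages is a hypothetical name, and it takes the fetch and token-extraction steps as functions so it can be wired to _generate_url, _fetch_review_page, and extract_next_page_token, or tested with stubs. The max_pages cap and the repeated-token check are defensive additions of ours, not behavior the API is known to require.

```python
from typing import Callable, Iterator


def iterate_pages(
    fetch_page: Callable[[str], bytes],
    extract_token: Callable[[bytes], str],
    max_pages: int = 50,
) -> Iterator[bytes]:
    """Yield raw response bodies, following page tokens until none is returned."""
    token = ""  # the first page is requested with an empty pageToken
    seen: set[str] = set()
    for _ in range(max_pages):  # hard cap as a safety net
        body = fetch_page(token)
        yield body
        token = extract_token(body)
        if not token or token in seen:
            break  # empty token: last page; repeated token: avoid looping forever
        seen.add(token)
```

Wired to the article's functions, fetch_page could be something like lambda token: scraper._fetch_review_page(scraper._generate_url(map_url, token, 20, request_id)), where scraper, map_url, and request_id are placeholders for your own objects.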
6. Simulating Requests
With URL generation and pagination handling in place, we can now send requests the same way the frontend does.
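```python
import httpx


# Method of the scraper class; self.http_client is assumed to be an httpx.Client.
def _fetch_review_page(self, url):
    try:
        resp = self.http_client.get(url, timeout=10)
        resp.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx responses
        return resp.content  # raw response body as bytes
    except httpx.RequestError as e:
        # Transport-level failure: DNS, connection, timeout, etc.
        raise RuntimeError(f"Fetch error for {url}: {e}") from e
    except httpx.HTTPStatusError as e:
        raise RuntimeError(f"{url}: unexpected status code: {e.response.status_code}") from e
```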
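The method returns the raw response body as bytes. This payload contains the review data, and it is also what extract_next_page_token parses to find the token for the next page.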