TikTok is a global short-video social platform that has attracted sustained attention from millions of users, thanks to its unique algorithm and precise user targeting. Scraping TikTok data enables the quick identification of potential new customers and facilitates the discovery of new marketing opportunities. This article will explain how to scrape TikTok search data and parse the returned information.
First, we need to install the necessary Python libraries.
Bash
pip install requests pandas PyExecJS loguru fake-useragent
Next, we create a TiktokUserSearch class. Within the __init__ method, we build the default request headers, in particular the User-Agent, merge in any custom headers, and set up the output file.
Python
from datetime import datetime
from fake_useragent import UserAgent

class TiktokUserSearch:
    def __init__(self, output_file=None, headers=None):
        # 1. Dynamically generate a User-Agent and set a more complete default request header
        try:
            ua = UserAgent()
            default_user_agent = ua.chrome
        except Exception:
            # Fall back to a fixed User-Agent if fake_useragent cannot provide one
            default_user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
        default_headers = {
            "User-Agent": default_user_agent,
            "Referer": "https://www.tiktok.com/",
            "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7",
        }
        # 2. If the caller passes in custom headers, use them to update the defaults
        if headers and isinstance(headers, dict):
            default_headers.update(headers)
        self.headers = default_headers
        self.cookies = None  # Empty for now; set in subsequent methods
        # 3. Default to a timestamped filename when no output file is given
        self.output_file = output_file if output_file else f'tiktok_videos_{datetime.now().strftime("%Y%m%d_%H%M%S")}.csv'
        print(f"The crawler is initialized. User-Agent: {self.headers['User-Agent'][:30]}...")
        print(f"The data will be saved to: {self.output_file}")
A method is needed to convert a cookie string into the dictionary format required by the requests library.
Python
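    # Method of TiktokUserSearch
    def cookie_str_to_dict(self, cookie_str) -> dict:
        cookie_dict = {}
        # Split on ';' and strip whitespace so "a=1;b=2" and "a=1; b=2" both work
        for cookie in cookie_str.split(';'):
            cookie = cookie.strip()
            if '=' not in cookie:
                continue  # Skip empty or malformed fragments
            key, value = cookie.split('=', 1)
            cookie_dict[key] = value
        return cookie_dict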
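A quick way to check the conversion is to run it on a short placeholder string. The sketch below uses a standalone function that mirrors the method above (without `self`); the cookie values are made up for illustration, and real ones would be copied from a logged-in browser session.

```python
def cookie_str_to_dict(cookie_str: str) -> dict:
    """Standalone variant of the method above, for illustration only."""
    cookie_dict = {}
    for pair in cookie_str.split(";"):
        pair = pair.strip()
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, so values containing '=' stay intact
        key, value = pair.split("=", 1)
        cookie_dict[key] = value
    return cookie_dict


# Placeholder cookie string (values are invented)
sample = "msToken=abc123==; lang=en; "
cookies = cookie_str_to_dict(sample)
print(cookies)
```

Splitting on the first `=` only matters because token values often end in `=` padding, which would otherwise be cut off.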
The next crucial step is to extract key device fingerprints such as msToken from the cookie and send them with each request. Most important of all, a pre-reverse-engineered JavaScript file is called through execjs to generate the X-Bogus dynamic signature in real time. An automatic retry mechanism is then added to all network requests to handle network fluctuations, which keeps the scraping task robust and its success rate high.
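The flow described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the article's actual implementation: `with_retry` and `build_signed_url` are hypothetical helper names, the endpoint is a placeholder, passing msToken and X-Bogus as query parameters is assumed, and `sign` stands in for whatever the JavaScript file exposes (for example a wrapper around `execjs.compile(js_source).call(...)`).

```python
import time
from urllib.parse import urlencode


def with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:  # real code would catch requests.RequestException
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


def build_signed_url(base_url, params, cookies, sign):
    """Add msToken from the cookies, then append the signature.

    `sign` is a hypothetical callable that takes the query string and
    returns the X-Bogus value (assumed to wrap the JavaScript signer).
    """
    query = dict(params)
    query["msToken"] = cookies.get("msToken", "")
    query_string = urlencode(query)
    return f"{base_url}?{query_string}&X-Bogus={sign(query_string)}"


# Demonstration with stand-ins: a fetch that fails twice, and a fake signer
attempts = []

def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return {"status": "ok"}

result = with_retry(flaky_fetch, max_attempts=3, sleep=lambda seconds: None)
url = build_signed_url(
    "https://example.com/api/search",  # placeholder endpoint
    {"keyword": "coffee"},
    {"msToken": "abc"},
    sign=lambda query_string: "FAKE_SIGNATURE",
)
print(result, url)
```

Passing the signer and the sleep function in as arguments keeps the sketch testable without a JavaScript runtime or real waiting.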
Requests are then sent to fetch the data, and the returned data is parsed and saved to a file in CSV format.
Python
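    # Method of TiktokUserSearch; requires `import os` and `import pandas as pd` at the top of the file
    def parse_data(self, data_list):
        # ... Extract various fields ...
        df = pd.DataFrame(video_data)
        # Write the header row only if the output file does not exist yet
        file_exists = os.path.isfile(self.output_file)
        df.to_csv(self.output_file, mode='a', header=not file_exists, ...)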
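The append-with-header-once pattern used here can be tried in isolation. The sketch below deliberately swaps pandas for the standard-library `csv` module so it runs without extra dependencies; `append_rows` and the field names are hypothetical, and the fields actually extracted from the response are not shown in this article.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append rows to a CSV file, writing the header only on first use."""
    file_exists = os.path.isfile(path)
    with open(path, "a", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not listed in fieldnames
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        if not file_exists:
            writer.writeheader()
        writer.writerows(rows)


fields = ["video_id", "description"]  # hypothetical field names
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Two separate batches, as would happen across result pages or keywords
append_rows(path, [{"video_id": "1", "description": "first"}], fields)
append_rows(path, [{"video_id": "2", "description": "second"}], fields)

with open(path, newline="", encoding="utf-8") as fh:
    lines = fh.read().splitlines()
print(lines)
```

Checking for the file before opening it in append mode is what keeps the header from being repeated on every batch.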
Define the keywords to scrape and proceed with scraping in a loop.
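A loop of this kind might look like the sketch below. It is an illustration only: `scrape_keywords` is a hypothetical helper, `search` stands in for whatever method on the class sends the request and parses the result, and the random pause between keywords is an assumption rather than something the article specifies.

```python
import random
import time


def scrape_keywords(keywords, search, delay_range=(1.0, 3.0), sleep=time.sleep):
    """Run search(keyword) for each keyword and return result counts."""
    summary = {}
    for index, keyword in enumerate(keywords):
        try:
            results = search(keyword)
            summary[keyword] = len(results)
        except Exception as exc:
            # One failing keyword should not stop the whole run
            print(f"Keyword {keyword!r} failed: {exc}")
            summary[keyword] = 0
        if index < len(keywords) - 1:
            # Pause between keywords to avoid sending requests in a burst
            sleep(random.uniform(*delay_range))
    return summary


# Stand-in search function so the loop can be run without network access
def fake_search(keyword):
    if keyword == "tea":
        raise RuntimeError("blocked")
    return [{"id": 1}, {"id": 2}]

summary = scrape_keywords(["coffee", "tea"], fake_search, sleep=lambda seconds: None)
print(summary)
```

In a real run, `search` would be replaced by the scraper's own request method and the default `time.sleep` would be left in place.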