LinkedIn Scraper is a Python library designed to extract data from LinkedIn profile pages. Recently, however, it has started failing when it processes the work experience section of a profile. The cause is a change LinkedIn made to the structure of its front-end pages.
LinkedIn recently changed the layout of the work experience section (served under the /experience path), and that change broke the existing parsing logic. The specific issues are:
1. The old pvs-list CSS selector no longer matches anything, because that class has been removed from the page.
2. The new main container is a
3. The internal structure has become more complex, likely involving additional nested
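A cheap safeguard against this kind of breakage is a canary check. It reports whether the class that the old parser depended on still appears in a fetched page. The sketch below uses only the standard library's html.parser. It assumes that pvs-list is a class name, and has_class is a hypothetical helper name chosen for illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Split on whitespace so "pvs-list__item" does not count as "pvs-list".
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def has_class(html, class_name):
    """Return True if any element in the HTML carries class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count > 0
```

Running has_class(page_html, "pvs-list") on each fetched /experience page, and logging when it flips to False, would surface a layout change before it shows up as empty or missing data.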
To address these issues, we propose the following solutions:
Analyze the new page structure: compare the old and new pages with a diff tool to find elements that can serve as new anchors for the selectors.
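Raw HTML diffs are noisy because text content and generated attribute values change on every load. One option is to reduce each saved page to a skeleton first: one line per element, showing its nesting depth, tag name and sorted classes. The two skeletons can then be diffed with difflib. This is a minimal sketch that assumes you have saved an old and a new snapshot of the /experience page. The helper names skeleton and structure_diff are hypothetical.

```python
import difflib
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not increase the depth.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class SkeletonParser(HTMLParser):
    """Reduce a page to one line per element: indentation, tag name, sorted classes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        classes = sorted((dict(attrs).get("class") or "").split())
        self.lines.append("  " * self.depth + tag + "".join("." + c for c in classes))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def skeleton(html):
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    return parser.lines


def structure_diff(old_html, new_html):
    """Unified diff of the two skeletons; text and non-class attributes are ignored."""
    return list(difflib.unified_diff(
        skeleton(old_html), skeleton(new_html),
        fromfile="old", tofile="new", lineterm="",
    ))
```

Printing "\n".join(structure_diff(old_html, new_html)) shows the removed containers as "-" lines and their replacements as "+" lines. If LinkedIn uses generated class names, those may still add noise and could be filtered out before diffing.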
Update the CSS selectors or XPath expressions. If the new structure is clear and exposes stable IDs or data attributes, update the CSS selectors. If it is complex and has no stable hooks, switch to XPath, which can also anchor on document structure and text.
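The two strategies can be contrasted on a small example. The sketch first locates the experience items through a stable data attribute, which is the same hook a CSS selector such as section[data-view="experience"] li would use. If that attribute is missing, it falls back to an XPath-style path anchored on the visible "Experience" heading.

Several parts of this sketch are assumptions. The markup, the data-view attribute and the function names are all hypothetical. The code uses the standard library's ElementTree, which supports only a subset of XPath and requires well-formed XML. On live LinkedIn pages, the browser's own XPath engine (for example through Selenium's By.XPATH) or lxml would be the realistic choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the new experience markup.
SNIPPET = """
<main>
  <section data-view="experience">
    <h2>Experience</h2>
    <ul>
      <li><span>Engineer</span><span>Acme</span></li>
      <li><span>Intern</span><span>Globex</span></li>
    </ul>
  </section>
</main>
"""


def items_by_data_attribute(root):
    # Preferred: a stable data attribute identifies the container directly.
    return root.findall(".//section[@data-view='experience']/ul/li")


def items_by_heading_text(root):
    # Fallback: select the section whose h2 child's text is exactly "Experience".
    return root.findall(".//section[h2='Experience']/ul/li")


def find_experience_items(root):
    # Try the most stable locator first, then fall back to the structural one.
    return items_by_data_attribute(root) or items_by_heading_text(root)


root = ET.fromstring(SNIPPET.strip())
titles = [li.find("span").text for li in find_experience_items(root)]
```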
Adjust the data extraction logic to match the new HTML structure. Sometimes the selectors are not the only thing that changes: the way the data is laid out in the DOM may change as well.
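As an example of the data itself moving around, suppose the employment dates now arrive as a single text line, such as "Jan 2020 - Present · 4 yrs 2 mos", instead of in separate elements. That format, the accepted separators and the parse_date_range helper are assumptions to verify against the real page.

The sketch splits the line with a regular expression and discards the duration part. When the text does not match, it returns (None, None), so the caller can log the problem rather than store a malformed value.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "<start> - <end>", optionally followed by "· <duration>".
# A hyphen or an en dash (U+2013) may separate the dates; U+00B7 is the middle dot.
_DATE_RANGE = re.compile(
    r"^\s*(?P<start>[^-\u2013\u00b7]+?)\s*[-\u2013]\s*"
    r"(?P<end>[^\u00b7]+?)\s*(?:\u00b7.*)?$"
)


def parse_date_range(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a date line into (start, end); return (None, None) if the format is unexpected."""
    match = _DATE_RANGE.match(text)
    if match is None:
        return None, None
    return match.group("start"), match.group("end")
```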
We also recommend adding a validation layer and exception handling to make the code more robust. For example, a field extractor can try several selectors in order of preference and log a warning when none of them match:
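```python
import logging

from selenium.webdriver.common.by import By

log = logging.getLogger(__name__)


def get_title(element):
    """Return the title text under a Selenium WebElement, or None if no selector matches."""
    # Try the newest, most precise selector first (selectors here are illustrative)
    matches = element.find_elements(By.XPATH, "./h1[@data-testid='title']")
    if matches:
        return matches[0].text
    # Fall back to the old selector, which may still be valid on some pages
    matches = element.find_elements(By.CSS_SELECTOR, "h1.old-title-class")
    if matches:
        return matches[0].text
    # Every selector failed: log it and return None
    log.warning("All title selectors failed to match")
    return None
```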
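The fallback selector above protects a single field. A validation layer can sit one level higher and check each assembled record before it is returned. The sketch below is a minimal, hypothetical version: the Experience dataclass, its fields, and the validate and extract_all helpers are illustrative names, not part of LinkedIn Scraper's API. Each entry is parsed inside its own try block, so one entry that no longer fits the new layout is logged and skipped instead of aborting the whole profile.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

log = logging.getLogger(__name__)


@dataclass
class Experience:
    title: Optional[str]
    company: Optional[str]
    date_range: Optional[str] = None


def validate(exp: Experience) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    if not exp.title or not exp.title.strip():
        problems.append("missing title")
    if not exp.company or not exp.company.strip():
        problems.append("missing company")
    return problems


def extract_all(entries: Iterable[object],
                parse_entry: Callable[[object], Experience]) -> List[Experience]:
    """Parse and validate each entry independently; skip and log the failures."""
    results = []
    for index, entry in enumerate(entries):
        try:
            exp = parse_entry(entry)
        except Exception:
            # Broad on purpose: any layout surprise in one entry should not stop the rest.
            log.exception("Failed to parse experience entry %d", index)
            continue
        problems = validate(exp)
        if problems:
            log.warning("Experience entry %d failed validation: %s", index, ", ".join(problems))
            continue
        results.append(exp)
    return results
```

Dropping invalid records keeps the output clean. An alternative is to keep them with their list of problems attached, which makes future layout regressions easier to spot in the scraped data.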