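I'm working on my master's project, and I want to scrape LinkedIn. Right now I've run into a problem when I try to scrape a user's education page: https://www.linkedin.com/in/williamhgates/details/education/
I want to collect all of a user's education entries. In this example, I want to scrape "Harvard University" from the span with class "mr1 hoverable-link-text t-bold", but I can't get it to work.
Here is the HTML from the LinkedIn page: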
<li class="pvs-list__paged-list-item artdeco-list__item pvs-list__item--line-separated " id="profilePagedListComponent-ACoAAA8BYqEBCGLg-vT-ca6mMEqkpp9nVffJ3hc-EDUCATION-VIEW-DETAILS-profile-ACoAAA8BYqEBCGLg-vT-ca6mMEqkpp9nVffJ3hc-NONE-da-DK-0">
<!----><div class="pvs-entity
pvs-entity--padded pvs-list__item--no-padding-when-nested
">
<div>
<a class="optional-action-target-wrapper
display-flex" target="_self" href="https://www.linkedin.com/company/1646/">
<div class="ivm-image-view-model pvs-entity__image ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex
">
<!----> <img width="48" src="https://media-exp1.licdn.com/dms/image/C4E0BAQF5t62bcL0e9g/company-logo_100_100/0/1519855919126?e=1668643200&v=beta&t=BL0HxGNOasVbI3u39HBSL3n7H-yYADkJsqS3vafg-Ak" loading="lazy" height="48" alt="Harvard University logo" id="ember59" class="ivm-view-attr__img--centered EntityPhoto-square-3 lazy-image ember-view">
</div>
</div>
</a>
</div>
<div class="display-flex flex-column full-width align-self-center">
<div class="display-flex flex-row justify-space-between">
<a class="optional-action-target-wrapper
display-flex flex-column full-width" target="_self" href="https://www.linkedin.com/company/1646/">
<div class="display-flex align-items-center">
<span class="mr1 hoverable-link-text t-bold">
<span aria-hidden="true"><!---->Harvard University<!----></span><span class="visually-hidden"><!---->Harvard University<!----></span>
</span>
<!----><!----><!----> </div>
<!----> <span class="t-14 t-normal t-black--light">
<span aria-hidden="true"><!---->1973 - 1975<!----></span><span class="visually-hidden"><!---->1973 - 1975<!----></span>
</span>
<!----> </a>
<!---->
<div class="pvs-entity__action-container">
<!----> </div>
</div>
<div class="pvs-list__outer-container">
<!----> <ul class="pvs-list
">
<li class=" ">
<div class="pvs-list__outer-container">
<!----><!----><!----></div>
</li>
</ul>
<!----></div>
</div>
</div>
</li>
I tried the following code:
education = driver.find_element("xpath", '//*[@id="profilePagedListComponent-ACoAAA8BYqEBCGLg-vT-ca6mMEqkpp9nVffJ3hc-EDUCATION-VIEW-DETAILS-profile-ACoAAA8BYqEBCGLg-vT-ca6mMEqkpp9nVffJ3hc-NONE-da-DK-0"]/div/div[2]/div[1]/a/div/span/span[1]/').text
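print(education)
I keep getting this error:
Message: no such element: Unable to locate element:
Can anyone help? I'd like a script that loops through the education entries and saves the place of education and the years attended.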
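Posted on 2022-08-12 22:48:03
To extract the text Harvard University, ideally you need to induce WebDriverWait (https://stackoverflow.com/a/50474905/7429447), and you can use either of the locator strategies covered in https://stackoverflow.com/a/48056120/7429447.
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python.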
Posted on 2022-08-18 10:04:39
Thanks, everyone!
In the end, I ended up with the following code:
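from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes `driver` is an existing, logged-in WebDriver session on the education page.
get_education_school = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='pvs-list ']/li//span[contains(@class, 'hoverable-link-text')]//span[1]")))]
get_education_years = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='pvs-list ']/li//span[contains(@class, 't-14 t-normal t-black--light')]//span[1]")))]
results_education_school = []
results_education_years = []
# zip() pairs the n-th school with the n-th year range and stops at the shorter list.
for school, years in zip(get_education_school, get_education_years):
    results_education_school.append(school)
    results_education_years.append(years)
print(results_education_school)
print(results_education_years)
Posted on 2022-08-12 15:08:13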
I would start by getting the list of education entries.
from selenium.webdriver.common.by import By

education_list = driver.find_element(By.CSS_SELECTOR, 'ul.pvs-list')
# loop through education_list for place and years
# would recommend relative locators for this task.
# find the image and get the first and second span with text inside of them.
I'm adding more detail to the code now. Please hold on.
https://stackoverflow.com/questions/73336045