
Q: Handling UL tags in web scraping, Python 3.6

Stack Overflow user

Asked 2019-03-08 09:26:38

Answers: 1 | Views: 578 | Followers: 0 | Votes: 0

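I want to scrape "Table:" and "Release date:" from the URL: https://www150.statcan.gc.ca/n1/en/type/data?geoname=A0002&p=0#

I am using the Selenium web driver for the scraping.

Below are the tags as they appear in the page source.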

<ul>
    <!-- Some HTML data -->
</ul>

<ul data-offset="0">
    <li class="ndm-item">
        <!-- Some HTML tags -->
    </li>
</ul>

<ul>
    <!-- Some HTML tags -->
</ul>

I want the details from the second "ul" tag, the one that has the "data-offset" attribute.

# Assumption: Soup is a BeautifulSoup object built from the page source
for Class_L1 in Soup.findAll('ul', {'data-offset': "0"}):
    for Class_L2 in Class_L1('li', {'class': 'ndm-item'}):
        for Class_L3 in Class_L2('div', {'class': 'ndm-result-container'}):
            for Class_L4 in Class_L3.findAll('div', {'class': 'ndm-result-productid'}):
                Table = str(Class_L4.get_text()).strip()
                print(Table)
            for Class_L4 in Class_L3.findAll('div', {'class': 'ndm-result-date'}):
                Release_Date = str(Class_L4.get_text()).strip()
                print(Release_Date)

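The problem is that the source contains multiple 'ul' tags with data-offset="0", and I only want the details from the second 'ul' tag that has data-offset="0".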
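One way to restrict the loop to the second matching list is to collect every 'ul' with data-offset="0" and index into the result. The following is only a minimal sketch: SAMPLE_HTML and scrape_second_list are hypothetical names, and the markup is an invented stand-in for the real page, which is larger and rendered by JavaScript.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the structure described in the question.
SAMPLE_HTML = """
<ul><li>navigation</li></ul>
<ul data-offset="0">
  <li class="ndm-item">
    <div class="ndm-result-container">
      <div class="ndm-result-productid">Table: 00-00-0001</div>
      <div class="ndm-result-date">Release date: 2019-01-01</div>
    </div>
  </li>
</ul>
<ul data-offset="0">
  <li class="ndm-item">
    <div class="ndm-result-container">
      <div class="ndm-result-productid">Table: 00-00-0002</div>
      <div class="ndm-result-date">Release date: 2019-02-02</div>
    </div>
  </li>
</ul>
<ul><li>footer</li></ul>
"""


def scrape_second_list(html):
    """Return (table, release_date) pairs from the second ul[data-offset="0"]."""
    soup = BeautifulSoup(html, "html.parser")
    matching_lists = soup.find_all("ul", attrs={"data-offset": "0"})
    if len(matching_lists) < 2:
        return []  # fewer than two matching lists: nothing to report
    second_list = matching_lists[1]  # index 1 is the second match

    rows = []
    for container in second_list.select("li.ndm-item div.ndm-result-container"):
        table = container.find("div", class_="ndm-result-productid")
        date = container.find("div", class_="ndm-result-date")
        if table is not None and date is not None:
            rows.append((table.get_text(strip=True), date.get_text(strip=True)))
    return rows


print(scrape_second_list(SAMPLE_HTML))
```

With Selenium, the html argument could be driver.page_source once the page has finished loading. Indexing the result of find_all keeps the choice of list explicit, at the cost of breaking if the page changes the order of its lists.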

web-scraping

beautifulsoup

python-3.x

pandas

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-03-08 09:56:06

You can use an nth-of-type style selector. This is based on:

I want to scrape "Table:" & "Release date:" from the URL

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www150.statcan.gc.ca/n1/en/type/data?geoname=A0002&p=0'
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)  # wait up to 10 seconds for each lookup

# Table identifiers inside the element with id "all"
tables = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#all .ndm-result-productid")))
tableInfo = [table.text for table in tables]

# Date elements that are the second child of their parent
date_elements = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#all .ndm-result-date:nth-child(2)")))
dates = [date.text for date in date_elements]

# Pair each table identifier with its release date
results = list(zip(tableInfo, dates))
print(results)
driver.quit()
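The answer mentions nth-of-type while its selector uses :nth-child(2), and the two are not interchangeable. The sketch below illustrates the difference with BeautifulSoup's select, which supports both pseudo-classes. The markup, SAMPLE_HTML and the texts helper are hypothetical and only approximate the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the element types used on the real page may differ.
SAMPLE_HTML = """
<div id="all">
  <div class="ndm-result-container">
    <span class="ndm-result-date">first child</span>
    <span class="ndm-result-date">second child</span>
  </div>
  <div class="ndm-result-container">
    <p>a paragraph</p>
    <span class="ndm-result-date">second child, first span</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def texts(selector):
    """Return the stripped text of every element matched by a CSS selector."""
    return [element.get_text(strip=True) for element in soup.select(selector)]


# :nth-child(2) counts all element siblings, whatever their tag name.
nth_child = texts("#all .ndm-result-date:nth-child(2)")

# :nth-of-type(2) counts only siblings that share the element's tag name.
nth_of_type = texts("#all .ndm-result-date:nth-of-type(2)")

print(nth_child)
print(nth_of_type)
```

In this sample, :nth-child(2) matches a date element in both containers, while :nth-of-type(2) matches only the one in the first container, because the span in the second container is the first span among its siblings. Which pseudo-class is appropriate depends on the actual markup of the page.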

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/55060196
