Q: Parsing dates with BS4

Stack Overflow user

Asked 2020-07-24 21:19:22

Answers: 1, Views: 74, Followers: 0, Votes: 0

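I have the following fragment of HTML and want to extract the date information (e.g. 31-Dec-18). I would appreciate it if anyone could lend a guiding hand on doing this with BS4.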

<th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-19</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-18</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-17</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-16</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-15</time></th>

I used bs4 to find the 'time' tags, but every entry comes back with its text missing (e.g. 31-Dec-15). Does anyone know why?

import requests
from bs4 import BeautifulSoup

# Fetch the page and parse the raw HTML returned by the server
page = requests.get("https://www.reuters.com/companies/MBBM.KL/financials")
soup = BeautifulSoup(page.content, 'html.parser')
soup.find_all('time')

[<time class="TextLabel__text-label___3oCVw TextLabel__gray___1V4fk TextLabel__regular___2X0ym"></time>, <time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time>, <time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time>, <time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time>, <time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time>, <time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg"></time>]

html

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-07-25 00:20:39

Try this:

from bs4 import BeautifulSoup

# HTML fragment from the question
html = '<th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-19</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-18</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-17</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-16</time></th><th scope="column"><time class="TextLabel__text-label___3oCVw TextLabel__black___2FN-Z TextLabel__medium___t9PWg">31-Dec-15</time></th>'

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <time> tag
times = [tag.get_text() for tag in soup.select('time')]
for label in times:
    print(label)

Prints:

31-Dec-19
31-Dec-18
31-Dec-17
31-Dec-16
31-Dec-15
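
If the extracted strings are needed as actual dates rather than text, the standard library can parse them. The following is a minimal sketch, assuming every label follows the day, abbreviated month, two-digit year pattern shown above; `parse_labels` is a hypothetical helper name, and matching `%b` against "Dec" assumes an English locale.

```python
from datetime import datetime


def parse_labels(labels):
    """Convert strings such as '31-Dec-19' into datetime.date objects."""
    # %d = day of month, %b = abbreviated month name, %y = two-digit year
    return [datetime.strptime(label.strip(), "%d-%b-%y").date() for label in labels]


# The strings printed by the snippet above
times = ["31-Dec-19", "31-Dec-18", "31-Dec-17", "31-Dec-16", "31-Dec-15"]
dates = parse_labels(times)
for d in dates:
    print(d.isoformat())  # first line printed: 2019-12-31
```

`strptime` raises `ValueError` on a label that does not match the pattern, so an empty string from an unrendered tag would fail loudly instead of being skipped.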

Edit: to get the times from the live site in Python, use Selenium:

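from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 API; the geckodriver path is specific to this machine
driver = webdriver.Firefox(service=Service('c:/program/geckodriver.exe'))

driver.get('https://www.reuters.com/companies/MBBM.KL/financials')
driver.implicitly_wait(5)  # wait up to 5 seconds when locating elements
times = driver.find_elements(By.CSS_SELECTOR, 'time')

# Skip the first <time> tag (it carries a different class in the question's output)
for tag in times[1:]:
    print(tag.text)
driver.quit()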

Output:

31-Dec-19
31-Dec-18
31-Dec-17
31-Dec-16
31-Dec-15

Note that you need both selenium and geckodriver; in this example I load geckodriver from c:/program/geckodriver.exe.
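
The empty `<time>` tags in the question are consistent with the page filling in its dates with JavaScript after load, which `requests` does not execute. That is an inference from the outputs in this thread, not something the answer states. A quick way to check any fetched HTML is to count how many `<time>` tags carry text; the sketch below uses shortened stand-ins for the two cases, and `split_time_tags` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def split_time_tags(html):
    """Return (texts of non-empty <time> tags, number of empty <time> tags)."""
    soup = BeautifulSoup(html, "html.parser")
    texts = [tag.get_text(strip=True) for tag in soup.find_all("time")]
    filled = [t for t in texts if t]
    return filled, len(texts) - len(filled)


# Shortened stand-ins for the two cases in this thread (class attributes omitted)
static_html = '<th scope="column"><time></time></th><th scope="column"><time></time></th>'
rendered_html = '<th scope="column"><time>31-Dec-19</time></th><th scope="column"><time>31-Dec-18</time></th>'

print(split_time_tags(static_html))    # ([], 2)
print(split_time_tags(rendered_html))  # (['31-Dec-19', '31-Dec-18'], 0)
```

If every tag comes back empty, as in the question, a browser-driven approach such as the Selenium snippet above is one way to obtain the rendered HTML.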

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/63074408
