I am trying to scrape a news website for news published on a specific date. The function's output is:
<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>
How can I print only the datetime? Printing i.text does not seem to work.
Here is the code:
import requests
from bs4 import BeautifulSoup
import datetime
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', None)

def okx_scrap():
    b = []
    url = 'https://www.okex.com/support/hc/en-us/sections/360000030652-Latest-Announcements'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    small_soup = soup.find_all(class_="article-list-link")
    url_1st = 'https://www.okex.com/support'
    # Getting yesterday's date
    for i in small_soup:
        full_url = url_1st + i['href']
        page2 = requests.get(full_url)
        soup2 = BeautifulSoup(page2.content, 'html.parser')
        small_soup2 = soup2.find_all('li', {'class': 'meta-data'})
        # print(small_soup2)
        for i in small_soup2:
            print(i)

okx_scrap()
Posted on 2022-02-06 07:26:01
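Instead of find_all, use find, because each page contains only one such entry. Also extract the time tag rather than the li, and read its datetime attribute: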
import requests
from bs4 import BeautifulSoup

def okx_scrap():
    b = []
    url = 'https://www.okex.com/support/hc/en-us/sections/360000030652-Latest-Announcements'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    small_soup = soup.find_all(class_="article-list-link")
    url_1st = 'https://www.okex.com/support'
    # Getting yesterday's date
    for i in small_soup:
        full_url = url_1st + i['href']
        page2 = requests.get(full_url)
        soup2 = BeautifulSoup(page2.content, 'html.parser')
        # Each article page has a single <time> tag; print its datetime attribute
        print(soup2.find('time')['datetime'])

Output:
>>> okx_scrap()
2022-01-30T08:56:09Z
2022-01-29T05:41:18Z
2022-01-28T10:15:02Z
2022-01-28T07:29:11Z
2022-01-28T06:45:48Z
2022-01-28T03:13:18Z
...
Posted on 2022-02-06 07:03:34
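If BeautifulSoup is not at hand (for example, when testing offline), the same lookup as soup2.find('time')['datetime'] can be sketched with the standard library's html.parser. This is a swapped-in technique, not the answer's method, and the class name FirstTimeTag is hypothetical. It also shows why i.text was not what the question wanted: the text inside the tag is the human-readable "January 30, 2022 08:56", while the ISO timestamp exists only in the datetime attribute.

```python
from html.parser import HTMLParser


class FirstTimeTag(HTMLParser):
    """Hypothetical helper: record the datetime attribute and text of the first <time> tag."""

    def __init__(self):
        super().__init__()
        self.timestamp = None  # value of the datetime="..." attribute
        self.text = ""         # human-readable text between <time> and </time>
        self._inside = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "time" and not self._done:
            # attrs is a list of (name, value) pairs; data-datetime is a separate key
            self.timestamp = dict(attrs).get("datetime")
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.text += data

    def handle_endtag(self, tag):
        if tag == "time" and self._inside:
            self._inside = False
            self._done = True  # ignore any later <time> tags


html = ('<li class="meta-data"><time data-datetime="relative" '
        'datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">'
        'January 30, 2022 08:56</time></li>')

parser = FirstTimeTag()
parser.feed(html)
print(parser.timestamp)  # 2022-01-30T08:56:09Z
print(parser.text)       # January 30, 2022 08:56
```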
Treat i as a string. If i is not already a string (here it is a BeautifulSoup Tag), convert it with the built-in str() first:
i = str(i)
i = i.split("><")[1]
i = i.split("datetime=")[2]
i = i.split("\"")[1]
print(i)
# 2022-01-30T08:56:09Z
Posted on 2022-02-06 07:25:13
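The split chain above depends on the exact attribute order: index [2] works only because data-datetime="relative" comes before datetime=. With the attributes reversed, the same chain returns "relative". A slightly more robust sketch of the same string approach uses str.partition with a leading space in the separator, so data-datetime is skipped regardless of order. The helper name extract_datetime is hypothetical.

```python
def extract_datetime(tag_html):
    """Return the value of the datetime="..." attribute, or '' if it is absent.

    The leading space in the separator means data-datetime="..." never matches,
    because that attribute name is preceded by '-' rather than a space.
    """
    _, found, rest = tag_html.partition(' datetime="')
    if not found:
        return ""
    value, _, _ = rest.partition('"')  # everything up to the closing quote
    return value


i = ('<li class="meta-data"><time data-datetime="relative" '
     'datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">'
     'January 30, 2022 08:56</time></li>')
print(extract_datetime(i))  # 2022-01-30T08:56:09Z
```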
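You can use a regular expression. Note that this pattern matches both the datetime and the title attribute, so the timestamp appears twice in the result: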
import re
string = '<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>'
pattern = r"(\d{1,4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z)"
output = re.findall(pattern, string)
print(output)
# output:
# ['2022-01-30T08:56:09Z', '2022-01-30T08:56:09Z']
https://stackoverflow.com/questions/71004860
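To get a single match, and to use it for the question's goal of keeping only announcements from a specific date, the regex can be anchored to the datetime attribute and the result parsed with datetime.strptime. This is a minimal sketch, assuming the timestamps always have the YYYY-MM-DDTHH:MM:SSZ form seen in this thread; DATETIME_ATTR and published_on are hypothetical names.

```python
import re
from datetime import date, datetime

# Match datetime="..." only when preceded by whitespace, so that
# data-datetime="relative" is skipped.
DATETIME_ATTR = re.compile(r'\sdatetime="([^"]+)"')


def published_on(tag_html, day):
    """Return True if the tag's datetime attribute falls on `day` (UTC date)."""
    match = DATETIME_ATTR.search(tag_html)
    if match is None:
        return False
    # The trailing "Z" (UTC) is matched as a literal; the result is a naive datetime.
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return stamp.date() == day


string = ('<li class="meta-data"><time data-datetime="relative" '
          'datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">'
          'January 30, 2022 08:56</time></li>')

print(DATETIME_ATTR.search(string).group(1))    # 2022-01-30T08:56:09Z
print(published_on(string, date(2022, 1, 30)))  # True
print(published_on(string, date(2022, 1, 29)))  # False
```

For "yesterday", one could pass date.today() - timedelta(days=1) (with timedelta imported from datetime). Keep in mind that date.today() uses local time while these timestamps are UTC, so results can differ near midnight.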