Q: BeautifulSoup website scraping - HTML parsing

Stack Overflow user

Asked on 2018-10-20 22:34:00

Answers: 1 · Views: 33 · Follows: 0 · Votes: 0

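I'm trying to scrape data from a website with beautifulsoup4 and retrieve only the information between the HTML tags so I can put it into an Excel document. At the moment I can only get the whole HTML elements (tags included) from the page.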

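import csv

import requests
from bs4 import BeautifulSoup

page = requests.get('https://genericurlhere.com')  # placeholder URL from the question
soup = BeautifulSoup(page.text, 'html.parser')

# newline='' stops the csv module from writing blank lines between rows on Windows
f = csv.writer(open('web_scrape.csv', 'w', newline=''))
f.writerow(['Item', 'Description'])

# find_all() returns Tag objects, so printing them shows the full HTML, tags included
heading = soup.find_all("h4", class_="list-group-item-heading")
print(heading)
print('-------------------')
desc = soup.find_all("p", class_='list-group-item-text')
print(desc)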

Tags: beautifulsoup, html-parsing, python

1 answer

Stack Overflow user

Accepted answer

Posted on 2018-10-20 23:01:06

Try using .text:

desc = soup.find_all("p", class_='list-group-item-text')
desc = [e.text for e in desc]  # keep only the text inside each matched element, without the tags
print(desc)
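To go from extracted text to the spreadsheet the question asks about, the sketch below combines the answer's idea with the csv module the question already uses. It is a minimal, self-contained example under stated assumptions: the HTML is hypothetical sample markup reusing the question's class names, it assumes headings and descriptions appear in matching order, and it uses get_text(strip=True) (equivalent to .text plus whitespace stripping) so the cells contain no stray spaces or newlines.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the class names used in the question.
html = """
<div class="list-group">
  <h4 class="list-group-item-heading">Widget</h4>
  <p class="list-group-item-text">A small widget.</p>
  <h4 class="list-group-item-heading">Gadget</h4>
  <p class="list-group-item-text">  A handy gadget.  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns only the text inside each tag, with surrounding whitespace removed.
headings = [h.get_text(strip=True) for h in soup.find_all("h4", class_="list-group-item-heading")]
descs = [p.get_text(strip=True) for p in soup.find_all("p", class_="list-group-item-text")]

# Pair each heading with its description (assumes both lists are in the same order).
rows = list(zip(headings, descs))

# Write to an in-memory buffer; swap io.StringIO() for
# open("web_scrape.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Item", "Description"])
writer.writerows(rows)

print(rows)
print(buffer.getvalue())
```

A CSV file opens directly in Excel; if a native .xlsx file is needed, the same rows list can be handed to an Excel-writing library instead of csv.writer.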

Note that you can also use [] to get an HTML tag's attributes, e.g. each['id'].
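Attribute access works on each Tag returned by find_all(). The sketch below uses hypothetical markup (the id and href values are made up for illustration) to show subscript access, the safer .get() for attributes that may be missing, and the fact that class comes back as a list because it is a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; ids and hrefs are invented for this illustration.
html = """
<p class="list-group-item-text" id="item-1"><a href="/items/1">Widget</a></p>
<p class="list-group-item-text" id="item-2"><a href="/items/2">Gadget</a></p>
"""
soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.find_all("p", class_="list-group-item-text")

# each["id"] raises KeyError if the attribute is absent.
ids = [each["id"] for each in paragraphs]

# each.get("title") returns None instead of raising when the attribute is absent.
titles = [each.get("title") for each in paragraphs]

# class is multi-valued in HTML, so BeautifulSoup returns it as a list of strings.
classes = [each["class"] for each in paragraphs]

# each.a is the first <a> inside the paragraph; combine an attribute with its text.
links = [(each.a["href"], each.a.text) for each in paragraphs]

print(ids)
print(titles)
print(classes)
print(links)
```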

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/52910659
