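I'm trying to scrape data from a website with beautifulsoup4 and retrieve only the information between the HTML tags so I can put it into an Excel document. Currently I can only get the entire HTML from the page.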
import csv
import requests
from bs4 import BeautifulSoup
# Placeholder URL; requests needs the full URL including the scheme (http:// or https://).
page = requests.get('genericurlhere.com')
soup = BeautifulSoup(page.text, 'html.parser')
f = csv.writer(open('web_scrape.csv', 'w', newline=''))  # newline='' avoids blank rows on Windows
f.writerow(['Item', 'Description'])
heading = soup.find_all("h4", class_="list-group-item-heading")
print(heading)
print('-------------------')
desc = soup.find_all("p", class_='list-group-item-text')
print(desc)

Posted on 2018-10-20 23:01:06

Try using .text:
desc = soup.find_all("p", class_='list-group-item-text')
desc = [e.text for e in desc] # only text within tags from the html elements.
print(desc)

Note that you can also use [] to get an HTML tag's attributes, for example: each['id'].
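
To tie this back to the question, here is a minimal sketch that pulls only the text out of each heading/description pair and writes it as rows. The sample markup, the wrapping <a> elements with id attributes, and the helper names extract_rows and write_csv are all assumptions made for illustration; the real page may be structured differently. It writes CSV to an in-memory buffer so it runs without a network connection or a file on disk.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<div class="list-group">
  <a id="item-1" class="list-group-item">
    <h4 class="list-group-item-heading"> First item </h4>
    <p class="list-group-item-text">First description</p>
  </a>
  <a id="item-2" class="list-group-item">
    <h4 class="list-group-item-heading">Second item</h4>
    <p class="list-group-item-text">Second description</p>
  </a>
</div>
"""


def extract_rows(html):
    """Return (id, heading, description) tuples, one per list item."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("a", class_="list-group-item"):
        heading = item.find("h4", class_="list-group-item-heading")
        desc = item.find("p", class_="list-group-item-text")
        rows.append((
            # item["id"] also works, but raises KeyError if the attribute is missing.
            item.get("id", ""),
            heading.get_text(strip=True) if heading else "",
            desc.get_text(strip=True) if desc else "",
        ))
    return rows


def write_csv(rows, out):
    """Write a header line followed by one CSV line per row."""
    writer = csv.writer(out)
    writer.writerow(["Id", "Item", "Description"])
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass open('web_scrape.csv', 'w', newline='') as the out argument. Looping over each container element, rather than zipping two separate find_all results, keeps a heading paired with its own description even if one of them is missing.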
https://stackoverflow.com/questions/52910659