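I want to scrape the full text of this page: https://www.ecb.europa.eu/press/pressconf/2016/html/is161020.en.html. That is, from "Ladies and gentlemen," to the end, "...So you can actually see that the gap between lending to SMEs and lending to large companies has narrowed considerably."
However, my code only scrapes up to "We are now at your disposal for questions" (the middle of the text). I would greatly appreciate any help with this problem.
Here is the code: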
from bs4 import BeautifulSoup
import urllib.request  # the request submodule must be imported explicitly
import pandas as pd
import ssl
import os
import time
import string
# Disable HTTPS certificate verification (insecure; only for quick experiments)
ssl._create_default_https_context = ssl._create_unverified_context
# Load the HTML source code of the given URL
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15'
headers = {'User-Agent':user_agent,}
url = "https://www.ecb.europa.eu/press/pressconf/2016/html/is161020.en.html"
req = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(req)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
article = soup.find('article')
paragraphs = article.find_all('p')
print(article)
Posted on 2018-11-06 13:25:58
The full text is in the paragraphs when the page is parsed with html5lib:
import requests
from bs4 import BeautifulSoup
resp = requests.get('https://www.ecb.europa.eu/press/pressconf/2016/html/is161020.en.html')
soup = BeautifulSoup(resp.content, 'html5lib')
article = soup.find('article')
paragraphs = article.find_all('p')
Source: https://stackoverflow.com/questions/53172731
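To turn the list of paragraphs into a single block of text, something like the following might work. This is only a sketch: the helper name `article_text` and the inline `sample` HTML are hypothetical and not part of the original answer, and the parser defaults to `html.parser` here only so that the sample runs without extra dependencies.

```python
from bs4 import BeautifulSoup


def article_text(html, parser="html.parser"):
    """Return the text of every <p> inside the first <article>, one per line.

    `article_text` is a hypothetical helper written for illustration.
    """
    soup = BeautifulSoup(html, parser)
    article = soup.find("article")
    if article is None:
        # No <article> element was found in the parsed tree.
        return ""
    paragraphs = article.find_all("p")
    # get_text() drops the tags; strip=True trims surrounding whitespace.
    return "\n".join(p.get_text(" ", strip=True) for p in paragraphs)


# Made-up sample markup standing in for the real page.
sample = """
<html><body><article>
<p>Ladies and gentlemen,</p>
<p>We are now at your disposal for questions.</p>
<p>Closing remark.</p>
</article></body></html>
"""

text = article_text(sample)
print(text)
```

For the real page, the same helper could be called as `article_text(resp.content, parser="html5lib")`, matching the parser used in the answer above; that requires the `html5lib` package to be installed.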