I even tried the commands shown on pypi.org, but no article gets downloaded.
from newspaper import Article
url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.html

article.html only gives an empty string ''. When I try article.parse(), it raises this error:
You must `download()` an article first!
I have already tried
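from time import sleep
from newspaper.article import ArticleDownloadState, ArticleException

slept = 0
while article.download_state == ArticleDownloadState.NOT_STARTED:
    # Raise exception if article download state does not change after 10 seconds
    if slept > 9:
        raise ArticleException('Download never started')
    sleep(1)
    slept += 1

This still does not solve the problem.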
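Posted on 2018-11-11 21:39:16
Sometimes you have to clean the link first, for example when it comes from an RSS feed.
For Google redirect links, urlparse from urllib.parse can be used.
Example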
google_url = 'https://www.google.com/url?rct=j&sa=t&url=https://www.timesnownews.com/international/article/european-union-chief-donald-tusk-lashes-out-at-donald-trump-stance-on-europe/311933&ct=ga&cd=CAIyHDlhZGYyMmM4NzAwYzNlZDc6Y28udWs6ZW46R0I&usg=AFQjCNHrsEaxxjXvWB3wM_1aRjNg6aeZvw'

Get the value that follows url=:
from urllib.parse import urlparse, parse_qs
url = urlparse(google_url)
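print(parse_qs(url.query)['url'][0])

Also note that the output is overwritten if the values are not assigned individually.
When testing the script like this, the output only includes article.text: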
article = Article('https://www.google.com/url?rct=j&sa=t&url=https://www.timesnownews.com/international/article/european-union-chief-donald-tusk-lashes-out-at-donald-trump-stance-on-europe/311933&ct=ga&cd=CAIyHDlhZGYyMmM4NzAwYzNlZDc6Y28udWs6ZW46R0I&usg=AFQjCNHrsEaxxjXvWB3wM_1aRjNg6aeZvw')
article.download()
article.parse()
article.top_image
article.text

This works when testing the script:
top_image = article.top_image
text = article.text
print(top_image, text)

https://stackoverflow.com/questions/51793998
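The link-cleaning step from the answer can be wrapped in a small helper so that ordinary links pass through untouched. This is only a minimal sketch: the function name clean_google_redirect and the check on the host and the /url path are assumptions made for illustration, not part of newspaper or of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def clean_google_redirect(link: str) -> str:
    """Return the target of a Google redirect link, or the link unchanged.

    Hypothetical helper: the host/path check below is an assumed heuristic.
    """
    parsed = urlparse(link)
    # Only unwrap links that look like https://www.google.com/url?...&url=<target>
    if parsed.netloc.endswith("google.com") and parsed.path == "/url":
        targets = parse_qs(parsed.query).get("url")
        if targets:
            return targets[0]
    # Anything else is returned as-is
    return link
```

The cleaned link could then be passed to Article(...) before calling download() and parse(), as in the snippets above.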