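This is my code for fetching and parsing the information I need from wordsinasentence.com, a site that provides useful context sentences for a given word: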
# First, import requests to fetch the HTML of the target page.
# In this case the website is https://wordsinasentence.com
import requests
target = input("The word you want to search: ")
res = requests.get("https://wordsinasentence.com/" + target + "-in-a-sentence/")
# Also check the response so that a failed request is flagged as an error.
try:
    res.raise_for_status()
except Exception as e:
    print("There was a problem connecting to the wordsinasentence server:", e)
# The raw response is hard to read, so it needs to be parsed.
# Use BeautifulSoup to make it readable.
import bs4
html_soup = bs4.BeautifulSoup(res.text, 'html.parser')
# Check that it has been parsed correctly.
# Now extract the definition of the target word.
keywords = html_soup.select('Definition')
If I run select('Definition') as shown, it keeps returning an empty list, even though printing the html_soup variable shows the following:
<p onclick='responsiveVoice.speak("not done for any particular reason; chosen or done at random");' style="font-weight: bold; font-family:Arial; font-size:20px; color:#504A4B;padding-bottom:0px;">Definition of Arbitrary</p>
[]
What could be the problem?
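One possible reading, offered as a sketch rather than a confirmed diagnosis: `select()` takes a CSS selector, so the bare word `'Definition'` is interpreted as a tag name and looks for `<Definition>` elements, not for text containing that word. The snippet below reuses the `<p>` element from the question as a stand-in for the downloaded page (with the `style` attribute shortened), and the names `no_match`, `headings`, and `onclick_values` are illustrative only.

```python
import bs4

# The <p> element from the question, used as a stand-in for the real page.
# The style attribute is shortened here; that is an assumption for brevity.
html = (
    '<p onclick=\'responsiveVoice.speak("not done for any particular reason; '
    'chosen or done at random");\' style="font-weight: bold;">'
    'Definition of Arbitrary</p>'
)
html_soup = bs4.BeautifulSoup(html, 'html.parser')

# 'Definition' is read as a tag name, so this searches for <Definition> elements.
no_match = html_soup.select('Definition')

# Select <p> elements instead and filter on their visible text.
headings = [p for p in html_soup.select('p')
            if p.get_text().startswith('Definition of')]

# In this snippet the definition text itself sits inside the onclick attribute.
onclick_values = [p['onclick'] for p in headings]

print(len(no_match), [p.get_text() for p in headings])
```

If the real page contains other `<p>` elements, the text filter is what narrows the result down to the heading; whether every word's page follows the same layout is not something the question confirms.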
Posted on 2017-11-09 20:05:32
https://stackoverflow.com/questions/47195598