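I want to scrape all of the top 50 search results from the Google Trends top stories page: https://www.google.com/trends/
However, when I run the code below, no search results appear; it looks like I only get the labels: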
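import re
import requests
from bs4 import BeautifulSoup

wikis = ["https://www.google.com/trends/"]
for wiki in wikis:
    website = requests.get(wiki)
    soup = BeautifulSoup(website.content, "lxml")
    # Compare tag.name (a string), not the Tag object, so <script> elements are really skipped
    text = ''.join([element.text for element in soup.body.find_all(lambda tag: tag.name != 'script', recursive=False)])
    new = re.sub(r'[^a-zA-Z \n]', '', text)
Output: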
MyAccountSearchMapsYouTubePlayNewsGmailDriveCalendarTranslatePhotosMoreShoppingWalletFinanceDocsBooksBloggerContactsEven more from GoogleSign inYou are using unsupported browser Some features may not work correctly Upgrade to a modern browser such as Google ChromeTrends has upgraded to a newer version which is not supported by this devicedismiss
Any help?
Posted on 2015-09-10 05:33:05
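Your problem is that https://www.google.com/trends/ is generated by JavaScript. So you cannot use requests for this: it is only an HTTP library, so it downloads the raw HTML and never executes the JavaScript that renders the results.
To test how a website looks in the browser with JavaScript disabled, for example in Firefox:
about:config
javascript.enabled = false
Search Google/SO for a library that supports JavaScript.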
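To illustrate why the question's code returned only navigation labels and the "unsupported browser" notice, here is a minimal sketch using only the standard library. The `RAW_HTML` string is a made-up stand-in for a JavaScript-rendered page (it is not the real Trends markup), and `VisibleText` and `visible_text` are hypothetical helper names. It shows that text a script would insert at runtime is simply not present in the HTML an HTTP client receives.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a JavaScript-rendered page: the interesting data
# would only be created when a browser runs the <script>; the raw HTML that an
# HTTP client downloads contains just the navigation and a fallback notice.
RAW_HTML = """
<html><body>
<div id="nav">Search Maps News</div>
<div id="app"></div>
<noscript>You are using an unsupported browser.</noscript>
<script>
document.getElementById("app").textContent = "Trending story 1";
</script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


text = visible_text(RAW_HTML)
print(text)
```

The empty `<div id="app">` stays empty because nothing executes the script, which is the same situation requests is in when it fetches a page rendered on the client side.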
Posted on 2021-09-21 16:39:42
In my case, the request name is: topdailytrends?hl=en-US&tz=-180&geo=UA.
After clicking that name, the Headers tab shows the URL to pass to requests.get(); in my example it is: https://trends.google.com/trends/api/topdailytrends?hl=en-US&tz=-180&geo=UA
Example code:
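import requests, json

# The response body starts with a )]}', prefix that is not valid JSON, so remove it before parsing
response = requests.get('https://trends.google.com/trends/api/topdailytrends?hl=en-US&tz=-180&geo=UA').text.replace(")]}',", "").strip()
json_data = json.loads(response)
# The trending entries are nested under default -> trendingSearches
trending_searches = json_data['default']['trendingSearches']
print(json.dumps(trending_searches, indent=2, ensure_ascii=False))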
----------
'''
[
  {
    "title": "З днем Ангела Марії",
    "formattedTraffic": "10K+",
    "trendingSearchUrl": "/trends/trendingsearches/daily?geo=UA#%D0%97%20%D0%B4%D0%BD%D0%B5%D0%BC%20%D0%90%D0%BD%D0%B3%D0%B5%D0%BB%D0%B0%20%D0%9C%D0%B0%D1%80%D1%96%D1%97",
    "country": "Ukraine"
  }
  ...
]
'''
https://stackoverflow.com/questions/32489042
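The two steps in the answer above (building the request URL from its query parameters and stripping the `)]}',` prefix before parsing) can be sketched offline roughly as follows. The helper names `build_url`, `parse_payload`, and `titles` are hypothetical, and `SAMPLE` is a hand-made payload modeled on the output shown above, not a live response. Whether the endpoint still answers in this shape is an assumption.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the answer above; its current availability is an assumption.
BASE_URL = "https://trends.google.com/trends/api/topdailytrends"
# Prefix the answer strips from the response body before calling json.loads.
PREFIX = ")]}',"


def build_url(hl="en-US", tz=-180, geo="UA"):
    """Assemble the request URL from the three query parameters used in the answer."""
    return f"{BASE_URL}?{urlencode({'hl': hl, 'tz': tz, 'geo': geo})}"


def parse_payload(raw):
    """Drop the leading prefix if present, then parse the rest as JSON."""
    body = raw.strip()
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    return json.loads(body)


def titles(payload):
    """Pull the title of every entry under default -> trendingSearches."""
    return [item["title"] for item in payload["default"]["trendingSearches"]]


# Hand-made sample that mimics the structure of the output shown above.
SAMPLE = PREFIX + "\n" + json.dumps({
    "default": {
        "trendingSearches": [
            {"title": "З днем Ангела Марії", "formattedTraffic": "10K+", "country": "Ukraine"}
        ]
    }
})

url = build_url()
result = titles(parse_payload(SAMPLE))
print(url)
print(result)
```

Checking for the prefix with `startswith` instead of a blanket `replace` avoids touching the same character sequence if it ever appears inside the JSON data itself. With a real response, `requests.get(build_url()).text` would take the place of `SAMPLE`.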