Question: Scrape the HTML title and match it against a wordlist - Python 3

Stack Overflow user

Asked on 2017-05-26 00:04:04

Answers: 1 · Views: 217 · Followers: 0 · Votes: 0

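I'm relatively new to Python. Here is my question:

I want to specify a website and have a Python module (e.g. BeautifulSoup) scrape the page's <title>. If the title matches any word in a word list, print "Bingo", otherwise print "Nothing here!".

Below is my code. Does anyone have suggestions or ideas on how to make this work?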

import urllib.request
from bs4 import BeautifulSoup

Match = ("Whois", "domain", "IP", "search")

soup = BeautifulSoup(urllib.request.Request("https://whois.domaintools.com/"))
if (soup.title.string in Match):
    print ("Bingo")
else:
    print ("Nothing here!")
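The snippet above has two separate problems. First, `urllib.request.Request(...)` only describes a request; nothing is downloaded until it is passed to `urllib.request.urlopen()`, so BeautifulSoup never receives any HTML. Second, `soup.title.string in Match` asks whether the entire title equals one of the tuple's elements, not whether a word occurs inside the title. A minimal stdlib-only sketch of both fixes follows. It assumes the browser-like User-Agent from the accepted answer is still enough to get past the site's scraper check (not verified), and the title string is made up for illustration. `fetch` and `title_matches` are hypothetical helper names, and `fetch` is defined but not called because it needs network access.

```python
import urllib.request

Match = ("Whois", "domain", "IP", "search")
url = "https://whois.domaintools.com/"
# Browser-like User-Agent (the same string the accepted answer uses)
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"}

# Request() only builds the request object; nothing is sent yet
req = urllib.request.Request(url, headers=headers)


def fetch(request):
    """Send the request with urlopen() and decode the body (needs network access)."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def title_matches(title, words):
    """True if any word occurs anywhere in the title (case-sensitive substring test)."""
    return any(word in title for word in words)


# Hypothetical title for illustration, not the site's real one
title = "Whois Lookup and IP Search"
print(title in Match)               # False: tuple membership needs an exact element match
print(title_matches(title, Match))  # True: "Whois" occurs inside the title
```

With a live page, the HTML returned by `fetch(req)` would be handed to `BeautifulSoup(..., 'html.parser')` exactly as in the answer below. Note that urllib normalizes header names with `str.capitalize()`, so the stored key is `User-agent`.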

python-3.x

beautifulsoup

1 answer

Stack Overflow user

Accepted answer

Posted on 2017-05-26 01:02:17

Using the requests module:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://whois.domaintools.com/')

soup = BeautifulSoup(r.text, 'html.parser')
print(r.text)

This prints the following message:

Please contact memberservices@domaintools.com and reference error #4311

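I suspected this was because the site blocks scrapers. Indeed, once we specify a browser-like User-Agent, the page loads correctly. The fixed version therefore becomes: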

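import requests
from bs4 import BeautifulSoup

Match = ("Whois", "domain", "IP", "search")

# Browser-like User-Agent; without it the site returned the error message above
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get('https://whois.domaintools.com/', headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')
# get_text() returns '' instead of None; also guard against pages with no <title>
title = soup.title.get_text() if soup.title else ''

for m in Match:
    if m in title:  # case-sensitive substring test
        print('Bingo!')
        break  # Exit checking loop
else:
    # for/else: runs only when the loop finished without hitting break
    print('Nothing here!')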
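The loop above uses a case-sensitive substring test, so "whois" in lowercase would not match, and a short word such as "IP" would match inside unrelated words like "Shipping". A minimal sketch of a stricter alternative, assuming whole-word, case-insensitive matching is what the question means by "matches any word": it compiles the word list into one regular expression with `\b` word boundaries and `re.IGNORECASE`. `build_pattern` and `check_title` are hypothetical helper names; `title` would be the string extracted with BeautifulSoup (or `None` if the page has no `<title>`).

```python
import re

Match = ("Whois", "domain", "IP", "search")


def build_pattern(words):
    """Compile one case-insensitive regex that matches any of the words as a whole word."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def check_title(title, words):
    """Return 'Bingo!' if a whole word from `words` appears in `title`, else 'Nothing here!'."""
    if title and build_pattern(words).search(title):
        return "Bingo!"
    return "Nothing here!"


print(check_title("Free WHOIS lookup", Match))  # Bingo! (case-insensitive)
print(check_title("Shipping rates", Match))     # Nothing here! ("ip" inside "Shipping" is ignored)
print(check_title("Domains for sale", Match))   # Nothing here! ("Domains" is not the whole word "domain")
print(check_title(None, Match))                 # Nothing here! (page without a title)
```

The trade-off is visible in the third call: whole-word matching rejects plurals and other inflections, so whether it beats the plain substring test depends on the word list.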

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/44191561
