
Web scraping: titles

Stack Overflow user
Asked 2020-09-19 15:57:32
Answers: 1, Views: 52, Followers: 0, Votes: 1

I have often scraped titles from websites before, but this time I can't get it to work and I don't know why.

Here is my code:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import pandas as pd
import ssl
from time import sleep
from random import randint
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
    
html = urlopen("https://officialblackwallstreet.com/directory/")
bsObj = soup(html.read())
bws_titles_bags = []
bws_names = bsObj.findAll(["a","title data-original-title"])

Result:

<img alt="" class="attachment-javo-tiny size-javo-tiny wp-post-image" height="80" sizes="(max-width: 80px) 100vw, 80px" src="https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-80x80.jpg" srcset="https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-80x80.jpg 80w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-150x150.jpg 150w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-300x300.jpg 300w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-768x768.jpg 768w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-1024x1024.jpg 1024w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-600x600.jpg 600w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-250x250.jpg 250w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-132x133.jpg 132w" width="80"> </img></div>
</a>, <a href="https://officialblackwallstreet.com/biz/zmena-inc/">
<div class="img-wrap-shadow">

How can I retrieve, for example, the title "McClean Photography", and the others?

Thanks for your help.


1 Answer

Stack Overflow user

Answered 2020-09-19 16:21:14

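The data is loaded dynamically via Ajax from a different URL, so it is not in the HTML returned for the directory page itself. You can use this example to get the titles: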

import json
import requests
from bs4 import BeautifulSoup


api_url = 'https://officialblackwallstreet.com/wp-admin/admin-ajax.php'

params = {
    'post_type':  'item',
    'type':    2,
    'page':    1,
    'ppp': 9,
    'action':  'post_list',
    'order':   'DESC',
    'orderby': 'date',
    'keyword': ''
}


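# POST the form fields to the Ajax endpoint and decode the JSON response
data = requests.post(api_url, data=params).json()

# Uncomment this to print all data:
# print(json.dumps(data, indent=4))

# Each marker carries the listing title; BeautifulSoup decodes HTML entities in it
for m in data['markers']:
    print(BeautifulSoup(m['info']['post_title'], 'html.parser').text)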

Prints:

McClean Photography
Zmena INC.
Hippie Adjacent
YourAdminOnline.com
Don’t Sweat The Technique
Asanee Coaching Services
Joy Street Design
Natural Ash Body
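
The request above asks for `page` 1 only. To collect more titles, a minimal sketch could loop over pages until nothing comes back. Several things here are assumptions, not confirmed by the answer: that incrementing `page` returns the next batch, that an empty or missing `markers` list means the listing is exhausted, and that titles contain only HTML entities, so the standard library's `html.unescape` can stand in for the BeautifulSoup call. The names `fetch_page` and `iter_titles` are hypothetical helpers.

```python
import html
from typing import Callable, Dict, Iterator

API_URL = "https://officialblackwallstreet.com/wp-admin/admin-ajax.php"


def fetch_page(page: int, per_page: int = 9) -> Dict:
    """POST the same form fields as in the answer, for one page number."""
    import requests  # third-party; imported here so iter_titles runs without it

    params = {
        "post_type": "item", "type": 2, "page": page, "ppp": per_page,
        "action": "post_list", "order": "DESC", "orderby": "date", "keyword": "",
    }
    response = requests.post(API_URL, data=params, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_titles(fetch: Callable[[int], Dict], max_pages: int = 50) -> Iterator[str]:
    """Yield titles page by page; stop at the first page with no markers.

    Treating an empty 'markers' list as the end of the listing is an
    assumption about the endpoint. max_pages is a safety cap.
    """
    for page in range(1, max_pages + 1):
        markers = fetch(page).get("markers") or []
        if not markers:
            break
        for marker in markers:
            # Decode entities such as &#8217; (assumes no HTML tags in titles)
            yield html.unescape(marker["info"]["post_title"])
```

Calling `list(iter_titles(fetch_page))` would then gather the titles from every page. Passing the fetch function as a parameter keeps the loop testable without network access, and adding a short `time.sleep` between requests would be a reasonable courtesy to the site.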
Votes: 0
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/63970689
