I used to scrape titles from websites regularly, but this time I can't, and I don't know why.
You can see my code below:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import pandas as pd
import ssl
from time import sleep
from random import randint
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
html = urlopen("https://officialblackwallstreet.com/directory/")
bsObj = soup(html.read(), "html.parser")
bws_titles_bags = []
bws_names = bsObj.findAll(["a","title data-original-title"])

Result:
<img alt="" class="attachment-javo-tiny size-javo-tiny wp-post-image" height="80" sizes="(max-width: 80px) 100vw, 80px" src="https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-80x80.jpg" srcset="https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-80x80.jpg 80w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-150x150.jpg 150w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-300x300.jpg 300w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-768x768.jpg 768w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-1024x1024.jpg 1024w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-600x600.jpg 600w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-250x250.jpg 250w, https://officialblackwallstreet.com/wp-content/uploads/2020/09/Newport-Avenue-Ocean-Beach-McClean-Photography-132x133.jpg 132w" width="80"> </img></div>
</a>, <a href="https://officialblackwallstreet.com/biz/zmena-inc/">
<div class="img-wrap-shadow">

How can I retrieve, for example, the title "McClean Photography", and the others?

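Thanks for your help.
Posted on 2020-09-19 16:21:14
The data is loaded dynamically via Ajax from a different URL, so the titles are not in the HTML that urlopen returns. You can use this example to get the titles: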
import json
import requests
from bs4 import BeautifulSoup
api_url = 'https://officialblackwallstreet.com/wp-admin/admin-ajax.php'
params = {
    'post_type': 'item',
    'type': 2,
    'page': 1,
    'ppp': 9,
    'action': 'post_list',
    'order': 'DESC',
    'orderby': 'date',
    'keyword': ''
}
data = requests.post(api_url, data=params).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for m in data['markers']:
    # post_title may contain HTML markup or entities, so reduce it to plain text
    print(BeautifulSoup(m['info']['post_title'], 'html.parser').text)

Prints:
McClean Photography
Zmena INC.
Hippie Adjacent
YourAdminOnline.com
Don’t Sweat The Technique
Asanee Coaching Services
Joy Street Design
Natural Ash Body

https://stackoverflow.com/questions/63970689
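
If you want to keep the title extraction separate from the network call (for example, to test it offline), a minimal sketch could look like the following. It assumes the decoded JSON has the same `markers` / `info` / `post_title` shape used in the answer above. The helper names `strip_tags` and `extract_titles` and the sample data are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup here only so the snippet has no third-party dependencies.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


def extract_titles(data):
    """Return the plain-text title of every marker in the decoded JSON."""
    titles = []
    for marker in data.get("markers", []):
        raw = marker.get("info", {}).get("post_title", "")
        if raw:  # skip markers without a title
            titles.append(strip_tags(raw))
    return titles


# Made-up sample shaped like the response above (not real site data).
sample = {
    "markers": [
        {"info": {"post_title": "<b>Example &amp; Co.</b>"}},
        {"info": {"post_title": "Plain Title"}},
        {"info": {}},
    ]
}
print(extract_titles(sample))  # ['Example & Co.', 'Plain Title']
```

With the real endpoint you would pass in the result of `requests.post(api_url, data=params).json()` instead of `sample`.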