I'm trying to use this code to get some public information about a YouTube channel (the API is not suitable for this task).
Code example:
import re
import json
import requests
from bs4 import BeautifulSoup
URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)
# Uncomment to view all the data
# print(json.dumps(data))
# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)
# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]
print("Channel Views:", stats["viewCountText"]["simpleText"])
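print("Joined:", stats["joinedDateText"]["runs"][1]["text"])
Expected result (this worked fine six months ago):
Joined: Jun 30, 2007
But now I get:
AttributeError: 'NoneType' object has no attribute 'group'
The traceback points to this line:
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)
Could you help fix this code so that it works again and returns the data?
Any help would be greatly appreciated, thanks.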
Posted on 2022-03-23 21:02:32
Your code works fine.
import re
import json
import requests
from bs4 import BeautifulSoup
URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)
# Uncomment to view all the data
# print(json.dumps(data))
# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)
# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]
print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])
Output:
Channel Views: 1,12,94,125টি ভিউ
Joined: 30 জুন, 2007
Posted on 2022-03-23 20:49:37
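In fact, BeautifulSoup isn't really being used here at all. You're just fetching the raw text and searching it as a string.
That's the problem with web scraping. YouTube has changed its JavaScript, and that variable no longer exists. We don't know what you're trying to find, but your current approach won't work. In practice, you'll probably need to use Selenium to run the JavaScript and extract the information from the DOM.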
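Whichever way you fetch the page, the AttributeError itself comes from calling .group(1) on a failed match: re.search returns None when the pattern is not found. Below is a minimal sketch of a more defensive version that works on the raw HTML string (no BeautifulSoup), returns None instead of crashing, and looks up the renderer by key rather than by the fixed tabs[5] position. It assumes the page still embeds a "var ytInitialData = {...}" assignment, which may not be the case. The names extract_initial_data, find_key and SAMPLE_HTML are hypothetical helpers made up for this illustration, and SAMPLE_HTML is a hand-written stand-in, not real YouTube output.

```python
import json
import re

# Marker taken from the question; assumption: the page still contains it.
MARKER_RE = re.compile(r"var ytInitialData\s*=\s*")


def extract_initial_data(html):
    """Return the parsed ytInitialData object, or None if it cannot be found."""
    match = MARKER_RE.search(html)
    if match is None:
        # This is the case that raised AttributeError in the original code.
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (such as ";</script>").
        obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return obj


def find_key(node, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hand-written stand-in for a downloaded page (not real YouTube data).
SAMPLE_HTML = (
    '<script>var ytInitialData = {"contents": {"tabs": [{"x": 1}, '
    '{"channelAboutFullMetadataRenderer": {"viewCountText": '
    '{"simpleText": "123 views"}}}]}};</script>'
)

data = extract_initial_data(SAMPLE_HTML)
stats = find_key(data, "channelAboutFullMetadataRenderer") if data else None
if stats is None:
    print("ytInitialData or the about metadata was not found")
else:
    print("Channel Views:", stats["viewCountText"]["simpleText"])
```

With a real page you would pass requests.get(URL).text to extract_initial_data. If it returns None, the data is simply not in the static HTML you received, and a browser-driven approach such as Selenium is the likely next step.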
https://stackoverflow.com/questions/71593817