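I have the code below, which I eventually want to use for web scraping and analysis.
My code has been running for almost an hour, but it doesn't seem to get a response back from this site.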
import bs4 as bs
from urllib.request import urlopen as ureq
my_url2 = 'https://www.dreamteamfc.com/g/#tournament/stats-centre-stats'
ureq(my_url2)

Posted on 2021-05-23 05:34:37
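The data you're looking for is loaded via Ajax from a different URL, so BeautifulSoup doesn't see it in the page you requested. Also, use the requests module to fetch the page/JSON data: it handles compression, redirects, etc. automatically.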
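Since the script in the question appeared to wait for close to an hour, it may also help to bound the request. The following is a minimal sketch and not part of the original answer: the helper name fetch_json and the 10-second default are assumptions chosen for illustration. It relies on the timeout argument of requests.get and on raise_for_status(), so a stalled or failed request surfaces as an exception rather than hanging.

```python
import requests


def fetch_json(url, timeout=10.0):
    """Fetch a URL and decode its JSON body (hypothetical helper, for illustration)."""
    # requests applies no timeout by default, so one is passed explicitly here.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError if the server answered with a 4xx/5xx status.
    response.raise_for_status()
    # Decode the response body as JSON.
    return response.json()
```

It would be called as fetch_json(url) with the JSON URL, wrapped in try/except requests.RequestException if you want to report the failure yourself.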
To load the data, use the following example:
import json
import requests
url = "https://nuk-data.s3.eu-west-1.amazonaws.com/json/players_tournament.json"
data = requests.get(url).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# print some data to screen:
for player in data:
    print(
        "{:<15} {:<15} {}".format(
            player["first_name"], player["last_name"], player["cost"]
        )
    )

Prints:
Cristiano       Ronaldo         7000000
Goran           Pandev          1000000
David           Marshall        2000000
Jesús           Navas           3000000
Kasper          Schmeichel      3000000
Sergio          Ramos           5000000
Raúl            Albiol          2000000
Giorgio         Chiellini       3500000
...and so on.

EDIT: To load the data into a DataFrame, you can use pd.json_normalize:
import json
import requests
import pandas as pd
url = "https://nuk-data.s3.eu-west-1.amazonaws.com/json/players_tournament.json"
data = requests.get(url).json()
df = pd.json_normalize(data)
print(df)
df.to_csv("data.csv", index=False)

Prints:
id first_name last_name squad_id cost status positions locked injury_type injury_duration suspension_length cname stats.round_rank stats.season_rank stats.games_played stats.total_points stats.avg_points stats.high_score stats.low_score stats.last_3_avg stats.last_5_avg stats.selections stats.owned_by stats.MIN stats.SMR stats.SMB stats.GS stats.ASS stats.YC stats.RC stats.PM stats.PS stats.CS stats.GC stats.star_man_awards stats.7_plus_ratings stats.goals stats.assists stats.cards stats.clean_sheets tournament_stats.star_man_awards tournament_stats.7_plus_ratings tournament_stats.goals tournament_stats.assists tournament_stats.cards tournament_stats.clean_sheets
0 14937 Cristiano Ronaldo 359 7000000 playing [4] 0 None None None None 0 0 9 0 0 0 0 0 0 22760 41.3 764 0 0 15 0 1 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0
1 15061 Goran Pandev 504 1000000 playing [4] 0 None None None None 0 0 0 0 0 0 0 0 0 50 0.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 15144 David Marshall 115 2000000 playing [1] 0 None None None None 0 0 0 0 0 0 0 0 0 166 0.3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 17740 Jesús Navas 118 3000000 playing [3] 0 None None None None 0 0 0 0 0 0 0 0 0 154 0.3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 17745 Kasper Schmeichel 369 3000000 playing [1] 0 None None None None 0 0 9 0 0 0 0 0 0 3261 5.9 810 0 0 0 0 1 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0
5 17861 Sergio Ramos 118 5000000 playing [2] 0 None None None None 0 0 9 0 0 0 0 0 0 14647 26.6 712 0 0 1 0 1 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0
...and so on.

It also saves data.csv (the original answer showed a screenshot of the file opened in LibreOffice).
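The dotted column names in the output above (stats.goals, tournament_stats.goals and so on) come from pd.json_normalize flattening nested dictionaries. Here is a small sketch of that behaviour. The two records are made-up sample data shaped loosely like the feed's entries, not real values from it.

```python
import pandas as pd

# Hypothetical sample records; the values are invented for illustration.
sample = [
    {"id": 1, "first_name": "Ann", "last_name": "Lee",
     "stats": {"goals": 2, "assists": 1}},
    {"id": 2, "first_name": "Bob", "last_name": "Ray",
     "stats": {"goals": 0, "assists": 3}},
]

# Nested keys become "parent.child" columns by default.
df = pd.json_normalize(sample)
print(df.columns.tolist())

# The separator can be changed, e.g. to get "stats_goals" instead.
df_underscore = pd.json_normalize(sample, sep="_")

# Keep only a few columns before analysing or saving.
small = df[["first_name", "last_name", "stats.goals"]]
print(small)
```

Selecting a subset such as small before calling to_csv keeps the saved file narrower than the full table shown above.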

https://stackoverflow.com/questions/67654312