I'm working on this project in Python 3.8. I need to download the data into Pandas and eventually build a database (SQL or Access) for all Premier League teams for 2018 and 2019. I'm trying to do this with Beautiful Soup. I have code that works with soccerbase.com, but it doesn't work on sofascore.com. Another user has helped me write the code below so far. Can anyone help?
import json
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)
json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
# 'Sheffield United'
json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name'] # 'Manchester United'
json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
# 3
json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']
print(json_object)
How do I loop over this to cover the whole universe of teams? My goal is to get, for every team, the event date, competition, home team, home score, away team, and away score, e.g. 2019 Premier League, Chelsea, 1-2.
I'm new to this; how can I get there?
Posted on 2019-12-07 10:36:53
This code just works. It doesn't capture the site's entire database, but it is a robust scraper.
import simplejson as json
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = "https://www.sofascore.com/football///json"
r = requests.get(url)
soup = bs(r.content, 'lxml')
json_object = json.loads(r.content)
headers = ['Tournament', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Status', 'Start Date']
consolidated = []
for tournament in json_object['sportItem']['tournaments']:
    rows = []
    for event in tournament["events"]:
        row = []
        row.append(tournament["tournament"]["name"])
        row.append(event["homeTeam"]["name"])
        if "current" in event["homeScore"].keys():
            row.append(event["homeScore"]["current"])
        else:
            row.append(-1)
        row.append(event["awayTeam"]["name"])
        if "current" in event["awayScore"].keys():
            row.append(event["awayScore"]["current"])
        else:
            row.append(-1)
        row.append(event["status"]["type"])
        row.append(event["formatedStartDate"])
        rows.append(row)
    df = pd.DataFrame(rows, columns=headers)
    consolidated.append(df)
pd.concat(consolidated).to_csv(r'Path.csv', sep=',', encoding='utf-8-sig',
                               index=False)
Courtesy of @praful-surve.
Posted on 2019-11-26 05:24:00
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.soccerbase.com/teams/home.sd'
r = requests.get(url)
soup = bs(r.content, 'html.parser')
teams = soup.find('div', {'class': 'headlineBlock'}, text='Team').next_sibling.find_all('li')
teams_dict = {}
for team in teams:
    link = 'https://www.soccerbase.com' + team.find('a')['href']
    team = team.text
    teams_dict[team] = link
consolidated = []
for k, v in teams_dict.items():
    print('Acquiring %s data...' % k)
    headers = ['Team', 'Competition', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Date Keep']
    r = requests.get('%s&teamTabs=results' % v)
    soup = bs(r.content, 'html.parser')
    h_scores = [int(i.text) for i in soup.select('.score a em:first-child')]
    a_scores = [int(i.text) for i in soup.select('.score a em + em')]
    limit = len(a_scores)
    team = [k for i in soup.select('.tournament', limit=limit)]
    comps = [i.text for i in soup.select('.tournament a', limit=limit)]
    dates = [i.text for i in soup.select('.dateTime .hide', limit=limit)]
    h_teams = [i.text for i in soup.select('.homeTeam a', limit=limit)]
    a_teams = [i.text for i in soup.select('.awayTeam a', limit=limit)]
    df = pd.DataFrame(list(zip(team, comps, h_teams, h_scores, a_teams, a_scores, dates)),
                      columns=headers)
    consolidated.append(df)
pd.concat(consolidated).to_csv(r'#your file location address', sep=',', encoding='utf-8-sig', index=False)
Posted on 2019-11-25 03:27:45
Start here:
https://www.sofascore.com/football///json
It gives the scores in JSON format. Scraping the main page won't get you this data; it isn't in the main page's source. This should get you started.
You can load it like this:
url = 'https://www.sofascore.com/football///json'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
Here is an example of how to extract data from the JSON. Eventually you will have to loop over the data, but this will get you started on how to pull it out:
json_object = json.loads(r.content)
json_object['sportItem']['tournaments'][0]['events'][0]['homeTeam']['name']
#'Sheffield United'
json_object['sportItem']['tournaments'][0]['events'][0]['awayTeam']['name'] #'Manchester United'
json_object['sportItem']['tournaments'][0]['events'][0]['homeScore']['current']
#3
json_object['sportItem']['tournaments'][0]['events'][0]['awayScore']['current']
# 3
I hope this helps.
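The indexing above can be turned into the loop the question asks for. A minimal sketch, using an inline sample that mimics the JSON shape shown (the tournament and event values here are illustrative, not a real response):

```python
import pandas as pd

# Inline sample mirroring the sofascore JSON structure described above.
json_object = {
    "sportItem": {
        "tournaments": [
            {
                "tournament": {"name": "Premier League"},
                "events": [
                    {
                        "homeTeam": {"name": "Sheffield United"},
                        "awayTeam": {"name": "Manchester United"},
                        "homeScore": {"current": 3},
                        "awayScore": {"current": 3},
                    }
                ],
            }
        ]
    }
}

rows = []
for tournament in json_object["sportItem"]["tournaments"]:
    for event in tournament["events"]:
        rows.append({
            "Tournament": tournament["tournament"]["name"],
            "Home Team": event["homeTeam"]["name"],
            # .get() guards against events with no score yet
            "Home Score": event["homeScore"].get("current"),
            "Away Team": event["awayTeam"]["name"],
            "Away Score": event["awayScore"].get("current"),
        })

df = pd.DataFrame(rows)
print(df.shape)
```

With the real response, the same two nested loops walk every tournament and every event, so you get the whole universe of teams in one frame.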
Update:
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.soccerbase.com/teams/home.sd'
r = requests.get(url)
soup = bs(r.content, 'html.parser')
teams = soup.find('div', {'class': 'headlineBlock'}, text='Team').next_sibling.find_all('li')
teams_dict = {}
for team in teams:
    link = 'https://www.soccerbase.com' + team.find('a')['href']
    team = team.text
    teams_dict[team] = link
team = []
comps = []
dates = []
h_teams = []
a_teams = []
h_scores = []
a_scores = []
consolidated = []
for k, v in teams_dict.items():
    print('Acquiring %s data...' % k)
    headers = ['Team', 'Competition', 'Home Team', 'Home Score', 'Away Team', 'Away Score', 'Date Keep']
    r = requests.get('%s&teamTabs=results' % v)
    soup = bs(r.content, 'html.parser')
    h_scores.extend([int(i.text) for i in soup.select('.score a em:first-child')])
    limit_scores = [int(i.text) for i in soup.select('.score a em + em')]
    a_scores.extend(limit_scores)
    limit = len(limit_scores)
    team.extend([k for i in soup.select('.tournament', limit=limit)])
    comps.extend([i.text for i in soup.select('.tournament a', limit=limit)])
    dates.extend([i.text for i in soup.select('.dateTime .hide', limit=limit)])
    h_teams.extend([i.text for i in soup.select('.homeTeam a', limit=limit)])
    a_teams.extend([i.text for i in soup.select('.awayTeam a', limit=limit)])
df = pd.DataFrame(list(zip(team, comps, h_teams, h_scores, a_teams, a_scores, dates)),
                  columns=headers)
You can search and print with:
df[df['Team'] == 'Wolves']
print(df.to_string())
to get cool data:
df.groupby('Team').agg({'Home Score': 'mean', 'Away Score': 'mean'})
                Home Score  Away Score
Team
Arsenal           2.105263    1.368421
Aston Villa       1.687500    1.625000
Bournemouth       1.266667    1.066667
Brighton          1.533333    1.200000
Burnley           1.642857    1.357143
Chelsea           1.900000    1.850000
Crystal Palace    1.142857    0.928571
Everton           1.375000    1.312500
Leicester         1.312500    1.750000
Liverpool         1.857143    1.761905
Man City          2.050000    1.600000
Man Utd           1.421053    0.894737
Newcastle         1.571429    0.785714
Norwich           1.642857    1.357143
Sheff Utd         1.066667    1.066667
Southampton       1.125000    2.187500
Tottenham         1.888889    1.555556
Watford           1.500000    1.125000
West Ham          1.533333    1.466667
Wolves            1.280000    1.440000
or
df[df['Away Team'] == 'Leicester'].agg({'Home Score': 'mean', 'Away Score': 'mean'})
Home Score    0.722222
Away Score    2.388889
dtype: float64
Pretty cool. df.T is nice too, and if you go this route there is df.to_sql(). I hope my changes help, and I'm always happy to help more.
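Since the end goal is a SQL database, df.to_sql() with the standard-library sqlite3 driver covers that step. A hedged sketch (the table name, column set, and sample row are assumptions mirroring the frame built above; swap ':memory:' for a file path like 'epl.db' to get a persistent database):

```python
import sqlite3
import pandas as pd

# Tiny stand-in for the scraped results frame above.
df = pd.DataFrame(
    [["Chelsea", "Premier League", "Chelsea", 1, "Leicester", 2, "07 Dec 2019"]],
    columns=["Team", "Competition", "Home Team", "Home Score",
             "Away Team", "Away Score", "Date Keep"],
)

# ':memory:' keeps the example self-contained; use a file path for real use.
conn = sqlite3.connect(":memory:")
df.to_sql("results", conn, if_exists="replace", index=False)

# Query the table back to confirm the round trip.
back = pd.read_sql("SELECT * FROM results WHERE [Home Team] = 'Chelsea'", conn)
print(back["Away Score"].iloc[0])  # 2
conn.close()
```

Column names with spaces need the square-bracket quoting shown in the SQL, or you can rename them (e.g. Home_Score) before writing.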
https://stackoverflow.com/questions/59024776