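I have a data frame of player statistics. What I want is a covariance matrix of the MB statistic between players, to see which players tend to perform well together and which tend to drag each other down.
Note that not every player plays in every game.
I'd like something like this, where each 'x' is the relevant covariance value.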
Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh etc, etc
1 Damian Lillard x x x x
2 C.J. McCollum x x x x
3 Allen Crabbe x x x x
4 Noah Vonleh x x x x
5 Ed Davis x x x x
6 Al-Farouq Aminu x x x x
7 Evan Turner x x x x
8 Maurice Harkless x x x x
9 Meyers Leonard x x x x
10 Mason Plumlee x x x x
11 Shabazz Napier x x x x
> df
Player.Name Tm MB DS Game
1 Damian Lillard POR 54.8 59.50 20161025
11 C.J. McCollum POR 30.9 32.50 20161025
16 Allen Crabbe POR 24.1 28.25 20161025
19 Noah Vonleh POR 14.2 15.25 20161025
22 Ed Davis POR 17.9 18.00 20161025
26 Al-Farouq Aminu POR 16.3 18.25 20161025
34 Evan Turner POR 20.5 19.25 20161025
64 Maurice Harkless POR 4.7 5.25 20161025
65 Meyers Leonard POR 2.7 2.25 20161025
68 Mason Plumlee POR 4.7 4.00 20161025
290 Maurice Harkless POR 35.6 35.75 20161027
295 Mason Plumlee POR 36.6 36.75 20161027
299 Damian Lillard POR 41.5 44.25 20161027
309 C.J. McCollum POR 26.8 27.50 20161027
318 Allen Crabbe POR 17.2 16.25 20161027
349 Noah Vonleh POR 5.0 4.75 20161027
358 Evan Turner POR 10.7 10.50 20161027
359 Ed Davis POR 5.6 5.50 20161027
364 Shabazz Napier POR 0.0 0.00 20161027
369 Al-Farouq Aminu POR 13.6 13.25 20161027
545 Damian Lillard POR 56.5 58.25 20161029
557 C.J. McCollum POR 49.5 51.25 20161029
610 Mason Plumlee POR 22.9 22.50 20161029
611 Allen Crabbe POR 22.6 22.75 20161029
637 Evan Turner POR 15.6 16.75 20161029
649 Al-Farouq Aminu POR 27.9 28.25 20161029
673 Ed Davis POR 8.9 9.50 20161029
704 Noah Vonleh POR 4.8 5.00 20161029
719 Maurice Harkless POR 9.6 11.00 20161029
723 Meyers Leonard POR 6.2 6.25 20161029
728 Shabazz Napier POR 0.0 0.00 20161029
Data:
structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu",
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee",
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier",
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee",
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis",
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1,
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8,
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9,
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18,
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25,
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75,
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L,
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L,
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L,
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName",
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")发布于 2016-12-15 01:49:12
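I think the first thing you need to do is reshape the data so that each row is a game and each column holds one player's MB for that game. Assuming the data is in dat: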
dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB" "Game"
#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
timevar = 'PlayerName')
dat.wide[1:4, 1:4]
Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1 20161025 54.8 30.9 24.1
11 20161027 41.5 26.8 17.2
21 20161029 56.5 49.5 22.6
#compute using cov function; pairwise.complete.obs skips games a player missed (NA)
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]
MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard 67.46333 71.10833 28.370 17.23
MB.C.J. McCollum 71.10833 146.34333 20.495 -23.61
MB.Allen Crabbe 28.37000 20.49500 13.170 12.75
MB.Noah Vonleh 17.23000 -23.61000 12.750 28.84
Answered 2016-12-15 01:44:59
You can use the cov() function to do this, for example:
cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName
> cov_mat[1:3,1:3]
Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard 11.0450 3.76 9.75250
C.J. McCollum 3.7600 1.28 3.32000
Allen Crabbe 9.7525 3.32 8.61125
If you want correlations instead, just replace cov() with cor().
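For what it's worth, here is a minimal sketch of the cov()/cor() swap applied to a game-by-player matrix, which is the shape the question asks about. It uses only a subset of the MB values posted in the question (Lillard, McCollum and Leonard, who missed the 20161027 game). The names wide, cov_m, cor_m and n_shared are my own, and tapply() is swapped in for reshape() purely to keep the example short, so treat it as an illustration under those assumptions rather than anyone's definitive method.

```r
# Subset of the MB values from the question, in long format.
# Meyers Leonard has no row for game 20161027.
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum", "Meyers Leonard",
                 "Damian Lillard", "C.J. McCollum",
                 "Damian Lillard", "C.J. McCollum", "Meyers Leonard"),
  MB   = c(54.8, 30.9, 2.7, 41.5, 26.8, 56.5, 49.5, 6.2),
  Game = c(20161025, 20161025, 20161025, 20161027, 20161027,
           20161029, 20161029, 20161029)
)

# One row per game, one column per player; a missed game becomes NA.
# (wide is a hypothetical name; each cell holds exactly one value here.)
wide <- tapply(dat$MB, list(dat$Game, dat$PlayerName), sum)

# Covariance and correlation, each pair using only the games both played.
cov_m <- cov(wide, use = "pairwise.complete.obs")
cor_m <- cor(wide, use = "pairwise.complete.obs")

# Number of games each pair of players has in common.
n_shared <- crossprod((!is.na(wide)) + 0)

print(round(cov_m, 3))
print(round(cor_m, 3))
print(n_shared)
```

With pairwise deletion every entry can rest on a different number of games, so it seems worth looking at n_shared before reading much into a value. Leonard shares only two games with the others in this subset, and a correlation computed from two points is necessarily +1 or -1 (or NA), whatever the players actually did.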
https://stackoverflow.com/questions/41155014