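I'm new to R and would appreciate some help with a problem. I currently have two data frames. The first, ign_temp, lists video game titles (about 18,000) with their corresponding platforms (30+ types). Some titles appear more than once because they were released on several platforms, as shown below. This df was filtered down to just the title and platform columns of the original database, which has many more columns (id, url, release, etc.).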
ign_temp:
title platform
LittleBigPlanet Playstation Vita
Splice Playstation Vita
NHL 13 Xbox
NHL 13 Android
Wild iPhone
Mark of the Ninja Xbox 360
Mark of the Ninja PC
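.......
I have another data frame, ign_revised, which holds a sample of the games from the data above plus additional columns such as score and year. Each game appears exactly once, one row per game, and I added new columns for the platforms it might appear on (from Android through Xbox One, about 24 platforms), as shown below (condensed view):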
ign_revised:
id score_phrase title score genre year Android Arcade ... Xbox One
315 Cool Abzu 7.5 Puzzle 2012 Android Arcade ... Xbox One
87 Poor Alan 5.0 Action 2014 Android Arcade ... Xbox One
.....
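598 Great NHL 13 8.5 Sports 2013 Android Arcade ... Xbox One
ign_revised is sorted alphabetically by title, and the platform columns (Android ... Xbox One) currently just repeat the platform name for all 1,600+ titles in this data frame.
My main question: is there a way, something like a for loop, to take the title and the platform columns (Android ... Xbox One) from ign_revised, match them against the corresponding title and platform in ign_temp, and change the values in the Android ... Xbox One columns of ign_revised so that they show 1 (the title appears on that platform) or 0 (it does not)? The result would look like this: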
ign_revised (final result):
id score_phrase title score genre year Android Arcade ... Xbox One
315 Cool Abzu 7.5 Puzzle 2012 0 1 ... 0
87 Poor Alan 5.0 Action 2014 0 0 ... 1
.....
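598 Great NHL 13 8.5 Sports 2013 1 0 ... 1
In my actual ign_revised data, the title is in the third column and the platform columns, starting with Android, begin at column 12, if that helps.
Pseudocode: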
for (i in 1:nrow(ign_revised)) {
  for (j in 12:ncol(ign_revised)) {
    # Match current title and platform to ign_temp
    # Assign current cell (i, j) value with 1 or 0 based on match
  }
}
Thanks!
@Gregor
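Edit 1: Sorry, I can't seem to format the revised code properly in a comment reply. Since ign_temp needs all 18,625 games rather than just the 7 I listed above from my original df, should I modify it like this: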
all_title <- ign$title
all_platform <- ign$platform
ign_temp <- structure(list(title = all_title, platform = all_platform, .Names = c("title","platform"), row.names = c(1, -18625L), class = c("data.frame")))
ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,value.var = "value", fill = 0)
merge(ign_revised[1:11], ign_temp_wide)
I'm not sure, because I get this error:
Error in rep(1, nrow(data)) : invalid 'times' argument
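The error above is consistent with ign_temp not being a data frame at that point. In the structure() call as posted, .Names, row.names and class all sit inside list(), so they become list elements rather than attributes, and nrow() of a plain list is NULL. Whether that is the exact path taken inside dcast is an assumption. A minimal sketch, using a small made-up ign as a stand-in for the real 18,625-row data:

```r
# Hypothetical stand-in for the full ign data frame (the real one has 18,625 rows)
ign <- data.frame(
  title = c("NHL 13", "NHL 13", "Wild"),
  platform = c("Xbox 360", "PlayStation 3", "iPhone"),
  stringsAsFactors = FALSE
)

# Same shape as the posted call: class is passed inside list(),
# so it becomes a list element, not an attribute
as_posted <- structure(list(title = ign$title, platform = ign$platform,
                            class = "data.frame"))
is.data.frame(as_posted)  # FALSE
is.null(nrow(as_posted))  # TRUE

# Building the data frame directly avoids hand-written attributes
ign_temp <- data.frame(title = ign$title, platform = ign$platform,
                       stringsAsFactors = FALSE)
ign_temp$value <- 1
```

With ign_temp built this way, nrow(ign_temp) is a number and the reshape step has a real data frame to work on.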
Edit 2: Added dput output for ign_revised, ign_temp, and ign_temp_wide.
dput(droplevels(head(ign_temp, 7)))
structure(list(title = c("LittleBigPlanet PS Vita", "LittleBigPlanet PS Vita -- Marvel Super Hero Edition",
"Splice: Tree of Life", "NHL 13", "NHL 13", "Total War Battles: Shogun",
"Double Dragon: Neon"), platform = c("PlayStation Vita", "PlayStation Vita",
"iPad", "Xbox 360", "PlayStation 3", "Macintosh", "Xbox 360"),
value = c(1, 1, 1, 1, 1, 1, 1)), .Names = c("title", "platform",
"value"), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
dput(droplevels(head(ign_temp_wide, 7)))
structure(list(title = c("#IDARB", "007 Legends", "1001 Spikes",
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), Android = c(0, 0, 0, 0, 0, 0, 0), Arcade = c(0, 0, 0, 0, 0,
0, 0), iPad = c(0, 0, 0, 0, 0, 0, 0), iPhone = c(0, 0, 0, 0,
0, 0, 0), Linux = c(0, 0, 0, 0, 0, 0, 0), Macintosh = c(0, 0,
0, 0, 0, 0, 0), `New Nintendo 3DS` = c(0, 0, 0, 0, 0, 0, 0),
`Nintendo 3DS` = c(0, 0, 1, 0, 0, 0, 0), `Nintendo DS` = c(0,
0, 0, 0, 0, 0, 0), `Nintendo DSi` = c(0, 0, 0, 0, 0, 0, 1
), Ouya = c(0, 0, 0, 0, 0, 0, 0), PC = c(0, 0, 1, 1, 1, 0,
0), `PlayStation 3` = c(0, 1, 0, 0, 0, 1, 0), `PlayStation 4` = c(0,
0, 1, 0, 0, 0, 0), `PlayStation Portable` = c(0, 0, 0, 0,
0, 0, 0), `PlayStation Vita` = c(0, 0, 1, 0, 0, 0, 0), SteamOS = c(0,
0, 0, 0, 0, 0, 0), `Web Games` = c(0, 0, 0, 0, 0, 0, 0),
Wii = c(0, 0, 0, 0, 0, 0, 0), `Wii U` = c(0, 1, 1, 0, 0,
0, 0), `Windows Phone` = c(0, 0, 0, 0, 0, 0, 0), `Windows Surface` = c(0,
0, 0, 0, 0, 0, 0), `Xbox 360` = c(0, 1, 0, 0, 0, 1, 0), `Xbox One` = c(1,
0, 0, 0, 0, 0, 0)), .Names = c("title", "Android", "Arcade",
"iPad", "iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS",
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3",
"PlayStation 4", "PlayStation Portable", "PlayStation Vita",
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface",
"Xbox 360", "Xbox One"), row.names = c(NA, 7L), class = "data.frame")
dput(droplevels(head(ign_revised, 7)))
structure(list(X1 = c(18007L, 145L, 17730L, 17325L, 18475L, 17699L,
16486L), score_phrase = c("Good", "Bad", "Great", "Great", "Great",
"Good", "Mediocre"), title = c("#IDARB", "007 Legends", "1001 Spikes",
"140", "1979 Revolution", "2014 FIFA World Cup Brazil", "3 Heroes -- Crystal Soul"
), url = c("/games/it-draws-a-red-box/xbox-one-20014945", "/games/007-legends/xbox-360-132394",
"/games/1001-spikes/wii-u-132248", "/games/140-game/pc-20007190",
"/games/1979-the-game/pc-115360", "/games/2014-fifa-world-cup/ps3-20012688",
"/games/3-heroes-crystal-soul/dsi-126064"), platform = c("Xbox One",
"Xbox 360", "Wii U", "PC", "PC", "PlayStation 3", "Nintendo DSi"
), score = c(7.5, 4.5, 8, 8, 8, 7.5, 5), genre = c("Party", "Action",
"Platformer", "Platformer", "Action, Adventure", "Sports", "Adventure"
), editors_choice = c("N", "N", "N", "N", "N", "N", "N"), release_year = c(2015L,
2012L, 2014L, 2013L, 2016L, 2014L, 2012L), release_month = c(1L,
10L, 6L, 10L, 4L, 4L, 1L), release_day = c(14L, 16L, 8L, 16L,
21L, 17L, 5L), Android = c("Android", "Android", "Android", "Android",
"Android", "Android", "Android"), Arcade = c("Arcade", "Arcade",
"Arcade", "Arcade", "Arcade", "Arcade", "Arcade"), iPad = c("iPad",
"iPad", "iPad", "iPad", "iPad", "iPad", "iPad"), iPhone = c("iPhone",
"iPhone", "iPhone", "iPhone", "iPhone", "iPhone", "iPhone"),
Linux = c("Linux", "Linux", "Linux", "Linux", "Linux", "Linux",
"Linux"), Macintosh = c("Macintosh", "Macintosh", "Macintosh",
"Macintosh", "Macintosh", "Macintosh", "Macintosh"), `New Nintendo 3DS` = c("New Nintendo 3DS",
"New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS",
"New Nintendo 3DS", "New Nintendo 3DS", "New Nintendo 3DS"
), `Nintendo 3DS` = c("Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS",
"Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS", "Nintendo 3DS"
), `Nintendo DS` = c("Nintendo DS", "Nintendo DS", "Nintendo DS",
"Nintendo DS", "Nintendo DS", "Nintendo DS", "Nintendo DS"
), `Nintendo DSi` = c("Nintendo DSi", "Nintendo DSi", "Nintendo DSi",
"Nintendo DSi", "Nintendo DSi", "Nintendo DSi", "Nintendo DSi"
), Ouya = c("Ouya", "Ouya", "Ouya", "Ouya", "Ouya", "Ouya",
"Ouya"), PC = c("PC", "PC", "PC", "PC", "PC", "PC", "PC"),
`PlayStation 3` = c("PlayStation 3", "PlayStation 3", "PlayStation 3",
"PlayStation 3", "PlayStation 3", "PlayStation 3", "PlayStation 3"
), `PlayStation 4` = c("PlayStation 4", "PlayStation 4",
"PlayStation 4", "PlayStation 4", "PlayStation 4", "PlayStation 4",
"PlayStation 4"), `PlayStation Portable` = c("PlayStation Portable",
"PlayStation Portable", "PlayStation Portable", "PlayStation Portable",
"PlayStation Portable", "PlayStation Portable", "PlayStation Portable"
), `PlayStation Vita` = c("PlayStation Vita", "PlayStation Vita",
"PlayStation Vita", "PlayStation Vita", "PlayStation Vita",
"PlayStation Vita", "PlayStation Vita"), SteamOS = c("SteamOS",
"SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS", "SteamOS"
), `Web Games` = c("Web Games", "Web Games", "Web Games",
"Web Games", "Web Games", "Web Games", "Web Games"), Wii = c("Wii",
"Wii", "Wii", "Wii", "Wii", "Wii", "Wii"), `Wii U` = c("Wii U",
"Wii U", "Wii U", "Wii U", "Wii U", "Wii U", "Wii U"), `Windows Phone` = c("Windows Phone",
"Windows Phone", "Windows Phone", "Windows Phone", "Windows Phone",
"Windows Phone", "Windows Phone"), `Windows Surface` = c("Windows Surface",
"Windows Surface", "Windows Surface", "Windows Surface",
"Windows Surface", "Windows Surface", "Windows Surface"),
`Xbox 360` = c("Xbox 360", "Xbox 360", "Xbox 360", "Xbox 360",
"Xbox 360", "Xbox 360", "Xbox 360"), `Xbox One` = c("Xbox One",
"Xbox One", "Xbox One", "Xbox One", "Xbox One", "Xbox One",
"Xbox One")), .Names = c("X1", "score_phrase", "title", "url",
"platform", "score", "genre", "editors_choice", "release_year",
"release_month", "release_day", "Android", "Arcade", "iPad",
"iPhone", "Linux", "Macintosh", "New Nintendo 3DS", "Nintendo 3DS",
"Nintendo DS", "Nintendo DSi", "Ouya", "PC", "PlayStation 3",
"PlayStation 4", "PlayStation Portable", "PlayStation Vita",
"SteamOS", "Web Games", "Wii", "Wii U", "Windows Phone", "Windows Surface",
"Xbox 360", "Xbox One"), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))我还从两个df中检查了两个标题列的类型,因为它们都是“字符”。
typeof(ign_temp$title)
[1] "character"
> typeof(ign_revised$title)
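[1] "character"
@Gregor However, the merge still doesn't seem to work. Since it is an inner join, I also tried specifying by = "title", but the platform columns in ign_revised still remain unchanged. Any suggestions?
merge(ign_revised[1:11], ign_temp_wide, by = "title")
Posted on 2017-02-19 21:27:59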
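I would first convert your ign_temp data frame to wide format, creating the dummy variables you need, and then join it to the ign_revised data.
Using this input: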
ign_temp = structure(list(title = c("LittleBigPlanet", "Splice", "NHL 13",
"NHL 13", "Wild", "Mark of the Ninja", "Mark of the Ninja"),
platform = c("Playstation Vita", "Playstation Vita", "Xbox",
"Android", "iPhone", "Xbox 360", "PC")), .Names = c("title",
"platform"), row.names = c(NA, -7L), class = c("data.frame"))
ign_temp$value = 1
ign_temp_wide = reshape2::dcast(title ~ platform, data = ign_temp,
value.var = "value", fill = 0)
ign_temp_wide
#               title Android iPhone PC Playstation Vita Xbox Xbox 360
# 1   LittleBigPlanet       0      0  0                1    0        0
# 2 Mark of the Ninja       0      0  1                0    0        1
# 3            NHL 13       1      0  0                0    1        0
# 4            Splice       0      0  0                1    0        0
# 5              Wild       0      1  0                0    0        0
Then the join is simple. This should work:
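merge(ign_revised[1:11], ign_temp_wide)
You just need an inner join between the non-platform columns of ign_revised (I used 1:11 because you said the platforms start at column 12) and the whole of ign_temp_wide. base::merge works, but you can pick your favorite method from How to join in R. If the join gives you trouble, make sure title is a character column in both data frames. I also assume the column name "title" is the same in both data frames.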
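One possible reason the platform columns in ign_revised look unchanged after the join is that merge() returns a new data frame and leaves its inputs alone, so the result has to be assigned. The sketch below shows that, plus a base-R way to build the 0/1 columns without reshape2. It is an illustration on toy data, not the code from the question: ign_meta is a hypothetical stand-in for ign_revised[1:11], and its scores are made up.

```r
# Toy long-format data with the same shape as ign_temp in the answer
ign_temp <- data.frame(
  title = c("LittleBigPlanet", "Splice", "NHL 13", "NHL 13",
            "Wild", "Mark of the Ninja", "Mark of the Ninja"),
  platform = c("Playstation Vita", "Playstation Vita", "Xbox", "Android",
               "iPhone", "Xbox 360", "PC"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for ign_revised[1:11]: one row per title
ign_meta <- data.frame(
  title = c("Abzu", "NHL 13", "Wild"),
  score = c(7.5, 8.5, 7.0),
  stringsAsFactors = FALSE
)

# Base-R alternative to reshape2::dcast: one 0/1 column per platform
titles <- sort(unique(ign_temp$title))
platforms <- sort(unique(ign_temp$platform))
flags <- sapply(platforms, function(p) {
  as.integer(titles %in% ign_temp$title[ign_temp$platform == p])
})
ign_temp_wide <- data.frame(title = titles, flags,
                            check.names = FALSE, stringsAsFactors = FALSE)

# merge() does not modify its inputs, so keep the returned data frame
inner <- merge(ign_meta, ign_temp_wide, by = "title")

# all.x = TRUE keeps titles with no match; their flags come back as NA
left <- merge(ign_meta, ign_temp_wide, by = "title", all.x = TRUE)
left[platforms][is.na(left[platforms])] <- 0L
```

The inner join drops "Abzu" because it has no row in the toy ign_temp, while the left join keeps it with all platform flags set to 0. Which behaviour is wanted depends on whether every title in ign_revised is expected to appear in ign_temp.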
https://stackoverflow.com/questions/42332416