Q: A faster way to convert a list to a data.frame when some column values are missing

Stack Overflow user

Asked 2017-02-01 14:23:16

Answers: 2, Views: 72, Followers: 0, Votes: 0

I have this list

> head(train)
[[1]]
[[1]]$Physics
[1] 8

[[1]]$Chemistry
[1] 7

[[1]]$PhysicalEducation
[1] 3

[[1]]$English
[1] 4

[[1]]$Mathematics
[1] 6

[[1]]$serial
[1] 195490

.
.
[[6]]
[[6]]$Physics
[1] 2

[[6]]$Chemistry
[1] 1

[[6]]$Biology
[1] 2

[[6]]$English
[1] 4

[[6]]$Mathematics
[1] 8

[[6]]$serial
[1] 182318

Each sublist has any 5 of these 12 elements, plus one additional element named serial.

columns <- c("Physics", "Chemistry", "PhysicalEducation", "English", 
             "Mathematics", "serial", "ComputerScience", "Hindi", "Biology", 
             "Economics", "Accountancy", "BusinessStudies")

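I am trying to convert this list into a data frame.

At the moment I am using this for loop, which fills one row per iteration. It works, but it takes a very long time.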

colclass <- rep("numeric",12)
comby <- read.table(text = '', colClasses = colclass, col.names = columns)  
for(i in 1:length(train)){
    comby[i,names(train[[i]])] <- train[[i]]
}

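I tried do.call(rbind, train), but that does not work, because it keeps putting new data into the old columns from the first iteration.

What is a better, faster way to do this? I have about 1.5 million observations.

Desired output: the data frame should have all the columns, with NA wherever there is no value. I am also interested in whether this can be made faster without any additional packages.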

 Physics Chemistry PhysicalEducation English Mathematics serial ComputerScience Hindi Biology Economics Accountancy
1       8         7                 3       4           6 195490              NA    NA      NA        NA          NA
2       1         1                 1       3           3 190869              NA    NA      NA        NA          NA
3       1         2                 2       1           2   3111              NA    NA      NA        NA          NA
4       8         7                 6       7           7  47738              NA    NA      NA        NA          NA
5       1         1                 1       3           2  85520              NA    NA      NA        NA          NA
6       2         1                NA       4           8 182318              NA    NA       2        NA          NA
  BusinessStudies
1              NA
2              NA
3              NA
4              NA
5              NA
6              NA

Here is the reproducible code

train <- "[{\"Physics\":8,\"Chemistry\":7,\"PhysicalEducation\":3,\"English\":4,\"Mathematics\":6,\"serial\":195490},{\"Physics\":1,\"Chemistry\":1,\"PhysicalEducation\":1,\"English\":3,\"Mathematics\":3,\"serial\":190869},{\"Physics\":1,\"Chemistry\":2,\"PhysicalEducation\":2,\"English\":1,\"Mathematics\":2,\"serial\":3111},{\"Physics\":8,\"Chemistry\":7,\"PhysicalEducation\":6,\"English\":7,\"Mathematics\":7,\"serial\":47738},{\"Physics\":1,\"Chemistry\":1,\"PhysicalEducation\":1,\"English\":3,\"Mathematics\":2,\"serial\":85520},{\"Physics\":2,\"Chemistry\":1,\"Biology\":2,\"English\":4,\"Mathematics\":8,\"serial\":182318},{\"Physics\":3,\"Chemistry\":4,\"PhysicalEducation\":5,\"English\":5,\"Mathematics\":8,\"serial\":77482},{\"Accountancy\":2,\"BusinessStudies\":5,\"Economics\":3,\"English\":6,\"Mathematics\":7,\"serial\":152940},{\"Physics\":5,\"Chemistry\":6,\"Biology\":7,\"English\":3,\"Mathematics\":8,\"serial\":132620}]"
train <- rjson::fromJSON(train)

data-manipulation

2 Answers

Stack Overflow user

Accepted answer

Posted 2017-02-01 15:34:19

As a starting point, you can use purrr::map, as follows:

Sample dataset:

x <- list(list(physics=8,
               Chemistry=7,
               PhysicalEducation=3,
               English=4,
               serial=195490),
          list(physics=2,
               Chemistry=1,
               Biology=2,
               English=4,
               Mathematics=8,
               serial=182318))

Sol. 1: the shortest, avoiding a loop

library(purrr)  # map_dbl() and %>%; older purrr releases call .default ".null"
zzz <- sapply(columns, function(n) map_dbl(x, n, .default = NA_real_)) %>%
        data.frame()

This gives:

> zzz
  Physics Chemistry PhysicalEducation English Mathematics serial ComputerScience Hindi Biology Economics
1      NA         7                 3       4          NA 195490              NA    NA      NA        NA
2      NA         1                NA       4           8 182318              NA    NA       2        NA
  Accountancy BusinessStudies
1          NA              NA
2          NA              NA

If you want to understand how this works, see the longer solutions below.

Sol. 2: manual assignment

Pick the values for each column:

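z <- data.frame(
    # .default fills in NA when a sublist lacks the name (formerly .null)
    serial = map_dbl(x, "serial", .default = NA_real_),
    Biology = map_dbl(x, "Biology", .default = NA_real_),
    Chemistry = map_dbl(x, "Chemistry", .default = NA_real_)
)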

This gives:

> z
  serial Biology Chemistry
1 195490      NA         7
2 182318       2         1
>

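Sol. 3: predefined data frame and a for loop

Create a data frame of fixed size:
zz <- data.frame(matrix(NA, nrow = length(x), ncol = 12))
Assign the names:
names(zz) <- columns
Assign the values from the list:
for(i in 1:ncol(zz)){ zz[columns[i]] <- map_dbl(x, columns[i], .default = NA_real_) }

This gives: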

> zz
  Physics Chemistry PhysicalEducation English Mathematics serial ComputerScience Hindi Biology Economics
1      NA         7                 3       4          NA 195490              NA    NA      NA        NA
2      NA         1                NA       4           8 182318              NA    NA       2        NA
  Accountancy BusinessStudies
1          NA              NA
2          NA              NA
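
Since the question asks for a solution without extra packages, the column-by-column extraction used above can also be sketched in base R. This is a minimal sketch and not part of the original answer: get_or_na is a hypothetical helper name, the sample data is illustrative, and the code assumes every stored value is a single number.

```r
columns <- c("Physics", "Chemistry", "PhysicalEducation", "English",
             "Mathematics", "serial", "ComputerScience", "Hindi", "Biology",
             "Economics", "Accountancy", "BusinessStudies")

# Illustrative sample in the same shape as the question's data
x <- list(list(Physics = 8, Chemistry = 7, PhysicalEducation = 3,
               English = 4, Mathematics = 6, serial = 195490),
          list(Physics = 2, Chemistry = 1, Biology = 2,
               English = 4, Mathematics = 8, serial = 182318))

# Hypothetical helper: return the named element as a number, or NA if absent
get_or_na <- function(el, name) {
  v <- el[[name]]
  if (is.null(v)) NA_real_ else as.numeric(v)
}

# One vapply() pass per column; sapply() binds the columns into a matrix
# (this relies on x having more than one element)
zzz_base <- as.data.frame(
  sapply(columns, function(n) vapply(x, get_or_na, numeric(1), name = n))
)
```

As in Sol. 1, the work is done per column rather than per row, so the number of passes depends on the 12 column names and not on the number of observations.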

Votes: 1

Stack Overflow user

Posted 2017-02-01 15:08:34

By combining Reduce and Map, you can do this in base R.

Data

Here is a dataset that matches your structure.

set.seed(1234)
temp <- replicate(7, setNames(replicate(7, sample(1:10, 1), simplify=FALSE), letters[1:7]),
                  simplify=FALSE)

To build a data.frame from it, you can use

Reduce(rbind, Map(data.frame, temp))
  a b c  d e f  g
1 2 7 7  7 9 7  1
2 3 7 6  7 6 3 10
3 3 9 3  3 2 3  4
4 4 2 1  3 9 6 10
5 9 1 5  3 4 6  2
6 8 3 3 10 9 6  7
7 4 7 4  6 7 5  3

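Here, data.frame builds a data.frame from each inner list. Map applies it to every element of the outer list, producing a list of data.frames. Finally, Reduce with rbind combines the data.frames in that list into a single data.frame.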
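
rbind on data.frames requires matching column names, so the Reduce approach fits lists like temp, where every inner list has the same names. For inner lists with differing names, as in the question, one base R option is to fill a preallocated NA matrix with a single indexed assignment. The following is a sketch under stated assumptions rather than a benchmarked solution: list_to_df and train_small are hypothetical names, and every stored value is assumed to be a single number.

```r
# Hypothetical helper: build a data.frame with one column per entry of `columns`
list_to_df <- function(lst, columns) {
  n    <- lengths(lst)                              # values per sublist
  rows <- rep(seq_along(lst), n)                    # row index of each value
  nms  <- unlist(lapply(lst, names), use.names = FALSE)
  cols <- match(nms, columns)                       # column index of each value
  vals <- unlist(lst, use.names = FALSE)

  m <- matrix(NA_real_, nrow = length(lst), ncol = length(columns),
              dimnames = list(NULL, columns))
  keep <- !is.na(cols)                              # skip names not in `columns`
  m[cbind(rows, cols)[keep, , drop = FALSE]] <- vals[keep]
  as.data.frame(m)
}

# Illustrative data: two sublists with different names
train_small <- list(list(Physics = 8, serial = 1),
                    list(Biology = 2, serial = 2))
res <- list_to_df(train_small, c("Physics", "Biology", "serial"))
```

The matrix is allocated once and filled in one step, which avoids growing a data.frame row by row.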

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/41982086
