Using purrr::map() to extract data from an irregular list

Stack Overflow user
Asked 2019-08-15 20:02:37
Answers: 4 · Views: 506 · Followers: 0 · Votes: 4

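Given a list with several elements, the goal is to put them into a data frame. The map_df function from the purrr package works very well for regular lists, but throws an error on irregular ones.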

For example, following the tutorial, this works:

library(purrr)
library(repurrrsive) # The data comes from this package


map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))

# A tibble: 30 x 6
   name               culture  gender    id born                                   alive
   <chr>              <chr>    <chr>  <int> <chr>                                  <lgl>
 1 Theon Greyjoy      Ironborn Male    1022 In 278 AC or 279 AC, at Pyke           TRUE 
 2 Tyrion Lannister   ""       Male    1052 In 273 AC, at Casterly Rock            TRUE 
 3 Victarion Greyjoy  Ironborn Male    1074 In 268 AC or before, at Pyke           TRUE 
 4 Will               ""       Male    1109 ""                                     FALSE
 5 Areo Hotah         Norvoshi Male    1166 In 257 AC or before, at Norvos         TRUE 
 6 Chett              ""       Male    1267 At Hag's Mire                          FALSE
 7 Cressen            ""       Male    1295 In 219 AC or 220 AC                    FALSE
 8 Arianne Martell    Dornish  Female   130 In 276 AC, at Sunspear                 TRUE 
 9 Daenerys Targaryen Valyrian Female  1303 In 284 AC, at Dragonstone              TRUE 
10 Davos Seaworth     Westeros Male    1319 In 260 AC or before, at King's Landing TRUE 
# … with 20 more rows

However, if an element is removed from the list, the function fails.

got_chars[[1]]["gender"]<-NULL
map_dfr(got_chars, magrittr::extract, c("name", "culture", "gender", "id", "born", "alive"))

#Error: Argument 3 is a list, must contain atomic vectors

The desired output would have NA for the missing elements. What is an elegant solution? I suspect it involves purrr::possibly(), but I have not figured it out yet.


4 Answers

Stack Overflow user

Answered 2019-08-15 20:41:16

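One approach is to define a partial()ly specified pluck() that extracts the name of interest and returns NA if it is missing. Pass the modified pluck() to a double map, where the inner map iterates over the names to extract and the outer map iterates over the got_chars list: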

v <- set_names(c("name", "culture", "gender", "id", "born", "alive"))
map_dfr( got_chars, ~map(v, partial(pluck, .x, .default=NA)) )
# # A tibble: 30 x 6
#    name             culture  gender    id born                             alive
#    <chr>            <chr>    <chr>  <int> <chr>                            <lgl>
#  1 Theon Greyjoy    Ironborn NA      1022 In 278 AC or 279 AC, at Pyke     TRUE 
#  2 Tyrion Lannister ""       Male    1052 In 273 AC, at Casterly Rock      TRUE 
#  3 Victarion Greyj… Ironborn Male    1074 In 268 AC or before, at Pyke     TRUE 
#  4 Will             ""       Male    1109 ""                               FALSE
#  5 Areo Hotah       Norvoshi Male    1166 In 257 AC or before, at Norvos   TRUE 
#  6 Chett            ""       Male    1267 At Hag's Mire                    FALSE
#  7 Cressen          ""       Male    1295 In 219 AC or 220 AC              FALSE
#  8 Arianne Martell  Dornish  Female   130 In 276 AC, at Sunspear           TRUE 
#  9 Daenerys Targar… Valyrian Female  1303 In 284 AC, at Dragonstone        TRUE 
# 10 Davos Seaworth   Westeros Male    1319 In 260 AC or before, at King's … TRUE 
# # … with 20 more rows

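To clarify, .x iterates over got_chars: it appears inside the lambda function specified with ~, so it corresponds to the outer map. The function for the inner map is specified with partial(), which supplies the got_chars element currently being looked at (i.e., .x) as the first argument to pluck(). The modified pluck() takes the name to extract as its (new) first argument, so it can be passed as-is to the inner map without any additional ~.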
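
To see what the partially applied pluck() does in isolation, here is a minimal sketch on a small made-up list. The list `chars` and its values are hypothetical stand-ins for got_chars, and the outer lambda is written as an explicit function(ch) instead of ~ purely for readability. Like the answer above, it assumes dplyr is installed, since map_dfr() relies on it.

```r
library(purrr)

# Hypothetical toy data: the first element has no "gender" field
chars <- list(
  list(name = "Theon", id = 1022L),
  list(name = "Tyrion", gender = "Male", id = 1052L)
)

# Self-named character vector, so the inner map() returns a named list
v <- set_names(c("name", "gender", "id"))

# pluck() with .default returns NA instead of NULL for a missing field
missing_gender <- pluck(chars[[1]], "gender", .default = NA)

# Outer map walks the elements; inner map walks the names to extract.
# partial() fixes the current element `ch` as pluck()'s first argument.
res <- map_dfr(chars, function(ch) map(v, partial(pluck, ch, .default = NA)))
print(res)
```

Each inner map() returns a named list of length-one values, which map_dfr() then binds as one row per element.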

Votes: 3

Stack Overflow user

Answered 2019-08-16 10:35:21

An inherent problem is how [ (or its alias magrittr::extract) behaves when the element we are trying to extract is missing:

list(a = 1)["b"]
# $<NA>
# NULL

magrittr::extract(list(a = 1), "b")
# $<NA>
# NULL

We can define:

extract_if_present <- function(x, y) {
  x[intersect(y, names(x))]
}

which behaves as:

extract_if_present(list(a = 1), "b")
# named list()

Row-binding with the missing elements then "just works":

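# Assumption: got_chars_mutilated is got_chars with "gender" removed from its
# first element, as in the question
map_dfr(
  got_chars_mutilated,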
  extract_if_present,
  c("name", "culture", "gender", "id", "born", "alive")
)
# # A tibble: 30 x 6
#    name               culture     id born                                   alive gender
#    <chr>              <chr>    <int> <chr>                                  <lgl> <chr> 
#  1 Theon Greyjoy      Ironborn  1022 In 278 AC or 279 AC, at Pyke           TRUE  NA    
#  2 Tyrion Lannister   ""        1052 In 273 AC, at Casterly Rock            TRUE  Male  
#  3 Victarion Greyjoy  Ironborn  1074 In 268 AC or before, at Pyke           TRUE  Male  
#  4 Will               ""        1109 ""                                     FALSE Male  
#  5 Areo Hotah         Norvoshi  1166 In 257 AC or before, at Norvos         TRUE  Male  
#  6 Chett              ""        1267 At Hag's Mire                          FALSE Male  
#  7 Cressen            ""        1295 In 219 AC or 220 AC                    FALSE Male  
#  8 Arianne Martell    Dornish    130 In 276 AC, at Sunspear                 TRUE  Female
#  9 Daenerys Targaryen Valyrian  1303 In 284 AC, at Dragonstone              TRUE  Female
# 10 Davos Seaworth     Westeros  1319 In 260 AC or before, at King's Landing TRUE  Male  
# # … with 20 more rows

The column order gets a bit shuffled, depending on the order of the rows and on what each one is missing.
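
If a fixed column order matters, one option (a sketch, not part of the original answer) is to reorder the result by the requested names afterwards. The toy list `chars` and the vector `wanted` below are hypothetical stand-ins for got_chars and the field names.

```r
library(purrr)

extract_if_present <- function(x, y) {
  x[intersect(y, names(x))]
}

# Hypothetical toy data: the first element has no "gender" field
chars <- list(
  list(name = "Theon", id = 1022L),
  list(name = "Tyrion", gender = "Male", id = 1052L)
)
wanted <- c("name", "gender", "id")

# Columns appear in order of first occurrence, so "gender" ends up last
res <- map_dfr(chars, extract_if_present, wanted)
shuffled_names <- names(res)

# Keep only the requested names that exist, in the requested order
res <- res[, intersect(wanted, names(res))]
print(res)
```

Using intersect() rather than `wanted` directly avoids an error when a requested field is absent from every element.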

Votes: 3

Stack Overflow user

Answered 2021-09-15 23:00:01

Love that tutorial! At the end of it, the author says:

When programming, it is safer, but more cumbersome, to explicitly specify types and build the data frame the usual way.

You can use this more verbose approach and set the default value to NA.

got_chars %>% {
  tibble(
    name = map_chr(., "name"),
    culture = map_chr(., "culture"),
    gender = map_chr(., "gender", .default = NA),
    id = map_chr(., "id"),
    born = map_chr(., "born"),
    alive = map_chr(., "alive")
  )
}
# # A tibble: 30 x 6
#    name               culture    gender id    born                                     alive
#    <chr>              <chr>      <chr>  <chr> <chr>                                    <chr>
#  1 Theon Greyjoy      "Ironborn" NA     1022  "In 278 AC or 279 AC, at Pyke"           TRUE 
#  2 Tyrion Lannister   ""         Male   1052  "In 273 AC, at Casterly Rock"            TRUE 
#  3 Victarion Greyjoy  "Ironborn" Male   1074  "In 268 AC or before, at Pyke"           TRUE 
#  4 Will               ""         Male   1109  ""                                       FALSE
#  5 Areo Hotah         "Norvoshi" Male   1166  "In 257 AC or before, at Norvos"         TRUE 
#  6 Chett              ""         Male   1267  "At Hag's Mire"                          FALSE
#  7 Cressen            ""         Male   1295  "In 219 AC or 220 AC"                    FALSE
#  8 Arianne Martell    "Dornish"  Female 130   "In 276 AC, at Sunspear"                 TRUE 
#  9 Daenerys Targaryen "Valyrian" Female 1303  "In 284 AC, at Dragonstone"              TRUE 
# 10 Davos Seaworth     "Westeros" Male   1319  "In 260 AC or before, at King's Landing" TRUE 
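
Note that map_chr() turns id and alive into character columns, as the output above shows. Below is a sketch of the same idea with type-specific mappers, assuming id is an integer and alive a logical, as in the question's output. The toy list `chars` is hypothetical and stands in for got_chars.

```r
library(purrr)
library(tibble)

# Hypothetical toy data: the first element has no "gender" field
chars <- list(
  list(name = "Theon", id = 1022L, alive = TRUE),
  list(name = "Tyrion", gender = "Male", id = 1052L, alive = TRUE)
)

res <- tibble(
  name   = map_chr(chars, "name"),
  # A typed NA keeps the column character even when the field is missing
  gender = map_chr(chars, "gender", .default = NA_character_),
  id     = map_int(chars, "id"),     # stays integer
  alive  = map_lgl(chars, "alive")   # stays logical
)
print(res)
```

The typed mappers fail loudly if a field has an unexpected type, which is the safety the quoted tutorial remark refers to.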
Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57515535