
Summarizing a single column with dcast

Stack Overflow user
Asked 2017-03-22 23:06:58
3 answers, 650 views, 0 followers, score 0

I want to pivot my data so that I can use dcast to get the average survival rate, but it does not seem to be possible:

Data

PassengerId Survived Pclass Name                                                Sex    Age SibSp Parch Ticket           Fare    Cabin Embarked
1           0        3      Braund, Mr. Owen Harris                             male   22  1     0     A/5 21171        7.25          S
2           1        1      Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38  1     0     PC 17599         71.2833 C85   C
3           1        3      Heikkinen, Miss. Laina                              female 26  0     0     STON/O2. 3101282 7.925         S

Code for the sample data:

df <- structure(list(
  PassengerId = 1:6,
  Survived = structure(c(1L, 2L, 2L, 2L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
  Pclass = c(3L, 1L, 3L, 1L, 3L, 3L),
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
           "Allen, Mr. William Henry",
           "Moran, Mr. James"),
  Sex = c("male", "female", "female", "female", "male", "male"),
  Age = c(22, 38, 26, 35, 35, NA),
  SibSp = c(1L, 1L, 0L, 1L, 0L, 0L),
  Parch = c(0L, 0L, 0L, 0L, 0L, 0L),
  Ticket = c("A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450", "330877"),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583),
  Cabin = c("", "C85", "", "C123", "", ""),
  Embarked = c("S", "C", "S", "S", "S", "Q")),
  .Names = c("PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp",
             "Parch", "Ticket", "Fare", "Cabin", "Embarked"),
  row.names = c(NA, 6L), class = "data.frame")

What I have tried so far:

reshape2::dcast(titanic, Sex ~ ., mean)

Desired output:

Row Label  Average of Survived 
Male       3.14156  
Female     3.14156

Currently, it returns NA for both groups, with the following warning:

     Sex  .
1 female NA
2   male NA
Warning messages:
1: In mean.default(.value[0], ...) :
  argument is not numeric or logical: returning NA

I think this might be possible with the cast function in reshape, but is it possible with reshape2?


3 answers

Stack Overflow user

Accepted answer

Posted 2017-03-24 18:06:21

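So you can indeed use dcast, but Survived is a factor, which makes mean() return NA with a warning. You also need to tell dcast which column to aggregate via value.var; without it, reshape2 guesses the value column, which here is the last column, the character column Embarked. It turns out the column order does not matter either, which is surprising.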
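To see where the NA in the question comes from, here is a minimal base-R reproduction. It assumes nothing beyond base R; the vector names `survived` and `embarked` and the helper `mean_or_warning()` are hypothetical, with values copied by hand from the sample data. Both a factor and a character vector make mean() fall through to mean.default(), which warns and returns NA.

```r
# Minimal base-R reproduction of the warning from the question.
# Hypothetical vectors, copied by hand from the sample data.
survived <- factor(c("0", "1", "1", "1", "0", "0"))  # Survived, stored as a factor
embarked <- c("S", "C", "S", "S", "S", "Q")          # Embarked, the last (character) column

# Hypothetical helper: return mean(x), or the warning message if mean() signals one.
mean_or_warning <- function(x) {
  tryCatch(mean(x), warning = function(w) conditionMessage(w))
}

mean_or_warning(survived)
# [1] "argument is not numeric or logical: returning NA"
mean_or_warning(embarked)
# [1] "argument is not numeric or logical: returning NA"
```

A numeric input such as `c(0, 1, 1)` passes through the helper unchanged and simply returns its mean.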

df$Survived <- as.numeric(as.character(df$Survived))
reshape2::dcast(df, Sex~., mean, value.var = "Survived")
#     Sex .
#1 female 1
#2   male 0
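The `as.character()` step in the conversion above matters: calling `as.numeric()` directly on a factor returns the internal level codes (1 and 2 here), not the 0/1 values the labels spell out, so the "averages" would land between 1 and 2 instead of being survival proportions. A small base-R sketch, using a hypothetical copy of the Survived column:

```r
# Hypothetical copy of Survived as it appears in the dput() output:
# a factor with levels "0" and "1".
survived <- factor(c("0", "1", "1", "1", "0", "0"))

as.numeric(survived)                     # integer codes of the levels, not the values
# [1] 1 2 2 2 1 1
as.numeric(as.character(survived))       # convert via the labels: the intended 0/1 values
# [1] 0 1 1 1 0 0
as.numeric(levels(survived))[survived]   # same result; converts each level only once
# [1] 0 1 1 1 0 0
```

The last form is the idiom recommended in the R FAQ; it is equivalent to the `as.character()` route and only converts the (few) levels rather than every element.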
Score: 2

Stack Overflow user

Posted 2017-03-22 23:31:59

How about trying dplyr?

library(dplyr)
output <-  df  %>% 
  dplyr::mutate(Survived = as.numeric(as.character(Survived))) %>%  
  dplyr::select(Sex, Survived) %>% 
  dplyr::group_by(Sex) %>% 
  dplyr::summarise(average_of_survived = mean(Survived))
output
## A tibble: 2 × 2
#     Sex average_of_survived
#   <chr>               <dbl>
#1 female                   1
#2   male                   0
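For readers without dplyr, the same grouped mean can be sketched in base R with `aggregate()`. This is a swapped-in alternative, not part of the original answer, and it builds a hypothetical two-column copy of the sample data so the block runs on its own. It also renames the columns to mirror the "desired output" layout from the question.

```r
# Hypothetical cut-down copy of the question's sample data: only the columns used here.
df <- data.frame(
  Sex = c("male", "female", "female", "female", "male", "male"),
  Survived = factor(c("0", "1", "1", "1", "0", "0")),
  stringsAsFactors = FALSE
)

# Convert the factor to its 0/1 values first (same step as in the answers above).
df$Survived <- as.numeric(as.character(df$Survived))

# Grouped mean with base R: one row per value of Sex, groups sorted alphabetically.
res <- aggregate(Survived ~ Sex, data = df, FUN = mean)

# Rename the columns to mirror the "desired output" in the question.
names(res) <- c("Row Label", "Average of Survived")
res
```

With this sample, `res` has one row for female (average 1) and one for male (average 0), matching the dplyr result.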
Score: 2

Stack Overflow user

Posted 2017-03-24 18:24:43

This can be done with dcast() from the reshape2 (or data.table) package, as shown in the OP's own answer.

Without dcast(), you can also aggregate directly with data.table:

library(data.table)
setDT(df)[, Survived := as.numeric(as.character(Survived))][, mean(Survived), by = Sex]
#      Sex V1
#1:   male  0
#2: female  1

Here, df is the data given by dput() in the question, and the [...] calls are chained to form a one-liner.

A more concise version of the above is:

setDT(df)[, mean(as.numeric(as.character(Survived))), by = Sex]
Score: 2
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/42964309
