Maintaining Date/POSIXct columns when converting column classes/types

Stack Overflow user
Asked 2018-04-27 00:21:16
Answers: 1  Views: 254  Followers: 0  Votes: 0

I have a data frame with 400 columns, several of which contain dates. In the representative example below, I want to:

  1. Convert factors to numeric, character, or POSIXct
  2. Convert characters to numeric/integer where that makes sense
  3. Convert any column that contains dates to POSIXct, whether it is a factor, a character, or a Date.

set.seed(123)
df1 <- data.frame(
  A = as.numeric(1:10),
  B = sample(seq(as.POSIXct('2000/01/01'), as.POSIXct('2018/01/01'), by = "day"), size = 10),
  C = as.numeric(sample(20:90, size = 10)),
  D = sample(c("yes", "no"), size = 10, replace = TRUE),
  E = as.factor(sample(1000:2000, size = 10)),
  F = as.character(c("test", "test2", "test3", "test4", "test5", "test6", "test7", "test8", "test9", "test10")),
  G = as.factor(c("test", "test2", "test3", "test4", "test5", "test6", "test7", "test8", "test9", "test10")),
  H = as.character(sample(seq(as.POSIXct('2000/01/01'), as.POSIXct('2018/01/01'), by = "day"), size = 10)),
  stringsAsFactors = FALSE
)
df1
    A                   B  C   D    E      F      G                   H
1   1 2005-03-06 00:00:00 87  no 1963   test   test 2002-07-27 23:00:00
2   2 2014-03-?? 00:00:00 51  no 1902  test2  test2 2007-06-?? 23:00:00
3   3 2007-05-11 23:00:00 66  no 1690  test3  test3 2007-06-11 00:00:00
4   4 2015-11-22 00:00:00 58  no 1793  test4  test4 2006-??-20 23:00:00
5   5 2016-12-02 ??:00:00 26  no 1024  test5  test5 2002-09-27 ??:00:00
6   6 2000-10-26 00:00:00 79  no 1475  test6  test6 2002-06-30 23:00:00
7   7 2009-06-30 ??:00:00 35  no 1754  test7  test7 2004-11-?? 00:00:00
8   8 2016-01-19 ??:00:00 22  no 1215  test8  test8 2008-05-17 23:00:00
9   9 2009-11-30 00:00:00 40 yes 1315  test9  test9 2004-10-12 00:00:00
10 10 2008-03-17 00:00:00 85 yes 1229 test10 test10 2015-06-03 23:00:00

unlist(lapply(df1, class))
          A          B1          B2           C           D           E           F           G           H 
  "numeric"   "POSIXct"    "POSIXt"   "numeric" "character"    "factor" "character"    "factor" "character" 
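Goal 3 depends on the source class: a factor has to be parsed through its labels rather than its integer codes, a Date needs a time of day attached, and an existing POSIXct only needs its time zone settled. The sketch below uses a hypothetical helper, `as_posixct_any()` (not from the question, the answer, or any package), and assumes the values are already date-like and that UTC is an acceptable time zone.

```r
# Hypothetical helper: coerce a factor, character, Date or POSIXct vector
# to POSIXct. Assumes the values are date-like and that UTC is wanted.
as_posixct_any <- function(x, tz = "UTC") {
  if (inherits(x, "POSIXct")) {
    # Already a date-time: keep the instants, only change the display time zone
    attr(x, "tzone") <- tz
    return(x)
  }
  if (inherits(x, "Date")) {
    # A Date has no time of day: use midnight of that date in `tz`
    return(as.POSIXct(format(x), tz = tz))
  }
  # Factor or character: parse the text labels, never the factor codes
  as.POSIXct(as.character(x), tz = tz)
}

# Usage on the three source classes named in goal 3
from_factor <- as_posixct_any(factor(c("2002-07-27", "2007-06-11")))
from_char   <- as_posixct_any("2005-03-06 00:00:00")
from_date   <- as_posixct_any(as.Date("2018-01-01"))
```

Going through `as.character()` for factors keeps the conversion explicit; a factor of dates passed to `as.numeric()` instead would silently yield level codes.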

So far I have tried the following, but it neither keeps column B as POSIXct nor converts the character date column (H) to POSIXct:

df1_clean <- df1 %>% mutate_all(funs(type.convert(as.character(.), as.is = TRUE)))
unlist(lapply(df1_clean, class))
      A           B           C           D           E           F            G           H 
      "integer" "character"   "integer" "character"   "integer" "character" "character" "character" 
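The class output above shows why column B is lost: `as.character()` turns each POSIXct value into plain text, and `type.convert()` only tries logical, integer, numeric and complex types before falling back to character, so it has no rule that turns that text back into a date-time. Factor columns such as E come back as integers because their labels happen to be numbers. A minimal base-R illustration with made-up values:

```r
# A single POSIXct value, flattened to text as in the attempt above
b <- as.POSIXct("2005-03-06 12:30:00", tz = "UTC")
b_text <- as.character(b)
b_back <- type.convert(b_text, as.is = TRUE)
class(b_back)   # "character": type.convert() has no date-time rule

# A factor whose labels are numbers converts cleanly
e <- factor(c("1963", "1902", "1690"))
e_back <- type.convert(as.character(e), as.is = TRUE)
class(e_back)   # "integer"
```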

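For this small dataset I could just name the columns and convert B and H to POSIXct with lubridate, but I'd like the conversion to happen automatically across the whole data frame.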
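One way to make this automatic, sketched in base R under a few assumptions: dates are written year-month-day (as in columns B and H), a column counts as a date column only if every non-blank value parses as one, and UTC is an acceptable time zone. The helpers `is_date_like()`, `clean_column()` and `clean_df()` are hypothetical names for illustration. The key difference from the attempt above is that each column's original class is inspected first, so an existing POSIXct column such as B is returned untouched instead of being flattened to character.

```r
# Hypothetical helper: TRUE if every non-blank value parses as YYYY-MM-DD
# (strptime() ignores a trailing time such as " 23:00:00").
is_date_like <- function(x) {
  if (!is.character(x) && !is.factor(x)) return(FALSE)
  v <- as.character(x)
  v <- v[!is.na(v) & nzchar(v)]
  length(v) > 0 && all(!is.na(as.Date(v, format = "%Y-%m-%d")))
}

# Hypothetical helper: pick a target class for one column
clean_column <- function(x, tz = "UTC") {
  if (inherits(x, "POSIXct")) return(x)                           # keep as is
  if (inherits(x, "Date")) return(as.POSIXct(format(x), tz = tz)) # midnight
  if (is_date_like(x)) {
    v <- as.character(x)
    v[which(!nzchar(v))] <- NA                                    # blanks -> NA
    return(as.POSIXct(v, tz = tz))
  }
  if (is.factor(x) || is.character(x)) {
    return(type.convert(as.character(x), as.is = TRUE))           # numbers or text
  }
  x                                                                # numeric etc.
}

# Apply to every column; `df[] <-` keeps the data.frame structure
clean_df <- function(df, tz = "UTC") {
  df[] <- lapply(df, clean_column, tz = tz)
  df
}

# Small example shaped like df1 (values invented for illustration)
df <- data.frame(
  A = as.numeric(1:3),
  B = as.POSIXct(c("2005-03-06", "2014-03-01", "2007-05-11"), tz = "UTC"),
  D = c("yes", "no", "no"),
  E = factor(c("1963", "1902", "1690")),
  G = factor(c("test", "test2", "test3")),
  H = c("2002-07-27", "2007-06-11", "2006-01-20"),
  J = as.Date(c("2001-01-01", "2002-02-02", "2003-03-03")),
  stringsAsFactors = FALSE
)
out <- clean_df(df)
sapply(out, function(col) class(col)[1])
```

In this example B, H and J end up as POSIXct, E as integer, D and G as character, and A stays numeric. Because `is_date_like()` requires every value to parse, a column mixing dates and free text falls through to `type.convert()` instead of being forced into POSIXct.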

Any help would be greatly appreciated! Thanks, Moe


1 Answer

Stack Overflow user

Answered 2018-04-27 04:29:01

This may not be the most elegant way, but it seems to work for me.

#install.packages("tidyverse")
#install.packages("dataCompareR")
library("tidyverse")
library("dataCompareR")



# create reproducible df
set.seed(123)
df1 <- data.frame(
  A = as.numeric(1:10),
  B = sample(seq(as.POSIXct('2000/01/01', tz = "UTC"), as.POSIXct('2018/01/01', tz = "UTC"), by="day"), size=10),
  C = as.numeric(sample(20:90, size = 10)),
  D = sample(c("yes", "no"), size=10, replace = TRUE),
  E = as.factor(sample(1000:2000, size = 10)),
  F = as.character(c("test","test2","test3","test4","test5","test6","test7","test8","test9","test10")),
  G = as.factor(c("test","test2","test3","test4","test5","test6","test7","test8","test9","test10")),
  H = as.character(sample(seq(as.POSIXct('2000/01/01', tz = "UTC"), as.POSIXct('2018/01/01', tz = "UTC"), by="day"), size=10)),stringsAsFactors=FALSE
)
df1 #look at df

unlist(lapply(df1, class)) #look at df classes


df1_clean <- df1 %>% mutate(across(everything(), ~ type.convert(as.character(.x), as.is = TRUE))) # reassign classes with type.convert (every column is passed in as character); across() replaces the deprecated funs()
unlist(lapply(df1_clean, class)) #look at df classes now

#check if a column is a Date - https://stackoverflow.com/questions/18178451/is-there-a-way-to-check-if-a-column-is-a-date-in-r
# TRUE for a column if at least one value parses as YYYY-MM-DD
tmp <- sapply(df1_clean, function(x) !all(is.na(as.Date(as.character(x), format = "%Y-%m-%d"))))

# where tmp is TRUE, convert the corresponding column to POSIXct
for (i in which(tmp)) {
  df1_clean[[i]] <- as.POSIXct(df1_clean[[i]], tz = "UTC")
}

df1_clean #look at df
unlist(lapply(df1_clean, class)) #look at df classes


comp <- rCompare(df1, df1_clean) #compare your dfs before and after using the dataCompareR package
summary(comp) # check summary
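One property of the answer's date check is worth knowing before running it on 400 columns: `!all(is.na(...))` is TRUE as soon as a single value in a column parses as a date, so a mostly-text column with one date-like entry would also be sent to `as.POSIXct()`. Requiring every non-missing value to parse is a stricter alternative. The column below is made up for illustration:

```r
# Made-up column: free text with one date-like entry
notes <- c("call back", "2018-04-27", "n/a", "done")
parsed <- as.Date(notes, format = "%Y-%m-%d")

# Rule used in the answer: TRUE if at least one value parses
answer_rule <- !all(is.na(parsed))

# Stricter rule: TRUE only if every non-missing value parses
strict_rule <- all(!is.na(parsed[!is.na(notes)]))

c(answer = answer_rule, strict = strict_rule)
```

Here the answer's rule flags the column while the strict rule does not. Swapping the strict rule into `tmp` would also need a guard for all-missing columns, since `all()` of an empty vector is TRUE.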
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/50053578
