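I am working on my first data capstone project. I imported the CSV files and combined them into a single data frame without any problems. I was able to drop some columns using:
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng))
I am using the past 12 months of Divvy trip data from https://divvy-tripdata.s3.amazonaws.com/index.html
Edit: I was trying to separate the columns, but someone suggested using difftime, so I have now added this:
all_trips$ride_length <- as.difftime(all_trips$ended_at, all_trips$started_at, units = "mins")
It created the new column, but every value is NA. I suspect this is because the columns are of type chr. I have not yet found a way to change the data type without losing the data, and I am still looking. Any help is greatly appreciated.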
I was asked to edit the question and add the output of dput(head(all_trips)):
dput(head(all_trips))
structure(list(ride_id = c("3564070EEFD12711", "0B820C7FCF22F489",
"89EEEE32293F07FF", "84D4751AEB31888D", "5664BCF0D1DE7A8B", "AA9EB7BD2E1FC128"
), rideable_type = c("electric_bike", "classic_bike", "classic_bike",
"classic_bike", "electric_bike", "classic_bike"), started_at = c("4/6/2022 17:42",
"4/24/2022 19:23", "4/20/2022 19:29", "4/22/2022 21:14", "4/16/2022 15:56",
"4/21/2022 16:52"), ended_at = c("4/6/2022 17:54", "4/24/2022 19:43",
"4/20/2022 19:35", "4/22/2022 21:23", "4/16/2022 16:02", "4/21/2022 16:56"
), start_station_name = c("Paulina St & Howard St", "Wentworth Ave & Cermak Rd",
"Halsted St & Polk St", "Wentworth Ave & Cermak Rd", "Halsted St & Polk St",
"Desplaines St & Randolph St"), start_station_id = c("515", "13075",
"TA1307000121", "13075", "TA1307000121", "15535"), end_station_name = c("University Library (NU)",
"Green St & Madison St", "Green St & Madison St", "Delano Ct & Roosevelt Rd",
"Clinton St & Madison St", "Canal St & Adams St"), end_station_id = c("605",
"TA1307000120", "TA1307000120", "KA1706005007", "TA1305000032",
"13011"), member_casual = c("member", "member", "member", "casual",
"member", "member"), ride_length = structure(c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = "difftime", units = "mins")), row.names = c(NA,
6L), class = "data.frame")
Posted on 2022-05-21 19:16:36
It sounds like the problem is getting the started_at and ended_at data into the correct format. The as.POSIXlt() function can help here:
all_trips$started_at <- as.POSIXlt(all_trips$started_at, format = "%m/%d/%Y %H:%M", tz="EST")
all_trips$ended_at <- as.POSIXlt(all_trips$ended_at, format = "%m/%d/%Y %H:%M", tz="EST")
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at)
https://stackoverflow.com/questions/72325692
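For anyone who wants to check the conversion on a small sample, here is a minimal, self-contained sketch built from three rows of the dput() output in the question. It first reproduces the NA result (the second positional argument of as.difftime() is `format`, so started_at ends up being used as a format string), then parses both columns and computes the ride length in minutes. Two things here are assumptions and not part of the original answer: as.POSIXct() is swapped in for as.POSIXlt(), since it stores one number per row and tends to behave better as a data frame column, and the time zone "America/Chicago" is a guess based on where Divvy operates. The `trips` data frame is a hypothetical stand-in for all_trips.

```r
# Three sample rows copied from the dput() output in the question.
trips <- data.frame(
  started_at = c("4/6/2022 17:42", "4/24/2022 19:23", "4/20/2022 19:29"),
  ended_at   = c("4/6/2022 17:54", "4/24/2022 19:43", "4/20/2022 19:35"),
  stringsAsFactors = FALSE
)

# Reproduce the problem: as.difftime(tim, format, units) treats its second
# positional argument as a format string, so the text fails to parse.
broken <- as.difftime(trips$ended_at, trips$started_at, units = "mins")
print(broken)  # every value is NA

# Parse the character columns into date-times.
# Assumption: the timestamps are Chicago local time.
fmt <- "%m/%d/%Y %H:%M"
tz  <- "America/Chicago"
trips$started_at <- as.POSIXct(trips$started_at, format = fmt, tz = tz)
trips$ended_at   <- as.POSIXct(trips$ended_at,   format = fmt, tz = tz)

# difftime() takes the two time points; units = "mins" fixes the unit
# instead of letting R pick one automatically.
trips$ride_length <- difftime(trips$ended_at, trips$started_at, units = "mins")
print(trips$ride_length)  # 12, 20 and 6 minutes for these rows
```

If the result is needed as a plain number for summaries or plots, as.numeric(trips$ride_length) drops the difftime class while keeping the values in minutes.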