I'm using R, and I have a set of weather readings in four columns, each holding comma-separated values, like this:
pollutant air_quality_idx air_quality_cat air_quality_cat_idx
PM2.5,PM10,OZONE 28,6,24 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 28,5,25 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 26,23,4 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 26,23,3 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 27,22,3 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 27,24,2 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 50,4,27 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 54,4,22 Moderate,Good,Good 2,1,1
PM2.5,PM10,OZONE 56,5,22 Moderate,Good,Good 2,1,1
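PM2.5,PM10,OZONE 60,5,28 Moderate,Good,Good 2,1,1
The order of the three readings (ozone, PM10, PM2.5) changes randomly from row to row. I'd like to split these 4 columns, each holding 3 comma-separated values, into 12 columns and group them so that the first values from the four columns go together, the second values go together, and the third values go together. Does anyone have any ideas? Thanks!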
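For anyone wanting to reproduce this, here is a minimal sketch of loading the sample into R. It assumes the four columns are separated by whitespace, as displayed above, and it uses the name mydf only because the answers below refer to a data frame by that name. Only three of the ten rows are included to keep it short.

```r
# Assumption: columns are whitespace-separated, while the values
# inside each column are comma-separated.
mydf <- read.table(
  text = "pollutant air_quality_idx air_quality_cat air_quality_cat_idx
PM2.5,PM10,OZONE 28,6,24 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 26,23,4 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 54,4,22 Moderate,Good,Good 2,1,1",
  header = TRUE,
  colClasses = "character"  # keep "28,6,24" etc. as text for later splitting
)

# Inspect the structure: 3 rows, 4 character columns
str(mydf)
```

Reading every column as character avoids factor conversion on older R versions, so the strings can be passed straight to a splitting function.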
Posted 2018-02-09 15:46:31
My "splitstackshape" package is well suited to this kind of transformation:
library(splitstackshape)
cSplit(mydf, names(mydf), direction = "long")
# pollutant air_quality_idx air_quality_cat air_quality_cat_idx
# 1: PM2.5 28 Good 1
# 2: PM10 6 Good 1
# 3: OZONE 24 Good 1
# 4: PM2.5 28 Good 1
# 5: PM10 5 Good 1
# ---
# 26: PM10 5 Good 1
# 27: OZONE 22 Good 1
# 28: PM2.5 60 Moderate 2
# 29: PM10 5 Good 1
# 30: OZONE 28 Good 1
In base R, you could also do the following:
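# Split every column on commas, flatten, and let type.convert() pick the type.
# as.is = TRUE keeps text as character and avoids the warning that newer
# R versions give when the argument is omitted.
data.frame(lapply(mydf, function(x)
  type.convert(unlist(strsplit(as.character(x), ",", TRUE)), as.is = TRUE)))
Alternatively, with the "tidyverse", you could try: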
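library(tidyverse)
# across() replaces the deprecated mutate_all(funs(...)) form, and
# unnest() is given an explicit cols argument as current tidyr expects.
mydf %>%
  mutate(across(everything(), ~ strsplit(as.character(.x), ","))) %>%
  unnest(cols = everything())
Posted 2018-02-09 02:10:32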
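Here is one solution, though perhaps not the most elegant. It uses tidyr::separate to split each column into three, then dplyr::select to break the resulting 12-column table into three 4-column tables, gives them the same column names, and stacks them with bind_rows.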
library(tidyverse)
tbl <- read_table2(
"pollutant air_quality_idx air_quality_cat air_quality_cat_idx
PM2.5,PM10,OZONE 28,6,24 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 28,5,25 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 26,23,4 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 26,23,3 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 27,22,3 Good,Good,Good 1,1,1
OZONE,PM2.5,PM10 27,24,2 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 50,4,27 Good,Good,Good 1,1,1
PM2.5,PM10,OZONE 54,4,22 Moderate,Good,Good 2,1,1
PM2.5,PM10,OZONE 56,5,22 Moderate,Good,Good 2,1,1
PM2.5,PM10,OZONE 60,5,28 Moderate,Good,Good 2,1,1",
col_types = "cccc"
)
# Split each comma-separated column into three positional columns
separated <- tbl %>%
  separate(pollutant, c("pol1", "pol2", "pol3"), sep = ",") %>%
  separate(air_quality_idx, c("aqi1", "aqi2", "aqi3"), sep = ",") %>%
  separate(air_quality_cat, c("aqc1", "aqc2", "aqc3"), sep = ",") %>%
  separate(air_quality_cat_idx, c("aci1", "aci2", "aci3"), sep = ",")
# Stack the first, second, and third positions under common column names
output <- bind_rows(
  separated %>%
    select(ends_with("1")) %>%
    set_names(c("pol", "aqi", "aqc", "aci")),
  separated %>%
    select(ends_with("2")) %>%
    set_names(c("pol", "aqi", "aqc", "aci")),
  separated %>%
    select(ends_with("3")) %>%
    set_names(c("pol", "aqi", "aqc", "aci"))
)
# A tibble: 30 x 4
pol aqi aqc aci
<chr> <chr> <chr> <chr>
1 PM2.5 28 Good 1
2 PM2.5 28 Good 1
3 OZONE 26 Good 1
4 OZONE 26 Good 1
5 OZONE 27 Good 1
6 OZONE 27 Good 1
7 PM2.5 50 Good 1
8 PM2.5 54 Moderate 2
9 PM2.5 56 Moderate 2
10 PM2.5 60 Moderate 2
# ... with 20 more rows
https://stackoverflow.com/questions/48684556
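Both answers above return 30 stacked rows. If the 12-column layout described in the question is wanted instead, with the first values of all four columns side by side, then the second values, then the third, the following base R sketch is one way to get it. The two-row mydf and the "_1", "_2", "_3" suffixes are illustrative assumptions, not part of the original thread.

```r
# Hypothetical two-row sample in the same shape as the question's data
mydf <- data.frame(
  pollutant = c("PM2.5,PM10,OZONE", "OZONE,PM2.5,PM10"),
  air_quality_idx = c("28,6,24", "26,23,4"),
  air_quality_cat = c("Good,Good,Good", "Good,Good,Good"),
  air_quality_cat_idx = c("1,1,1", "1,1,1"),
  stringsAsFactors = FALSE
)

# Split each column into a character matrix: one row per input row,
# one column per position (1st, 2nd, 3rd value)
pieces <- lapply(mydf, function(x) {
  do.call(rbind, strsplit(x, ",", fixed = TRUE))
})

# For each position, pull that position out of all four columns,
# then bind the three 4-column blocks side by side
wide <- do.call(cbind, lapply(1:3, function(i) {
  block <- as.data.frame(
    lapply(pieces, function(m) m[, i]),
    stringsAsFactors = FALSE
  )
  names(block) <- paste0(names(mydf), "_", i)
  block
}))

names(wide)
```

All values stay character here; numeric columns such as air_quality_idx_1 can be converted afterwards with as.numeric() if needed.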