我正在寻找一种方法,从现有变量名中提取出部分信息,创建新变量来存储提取的信息,并根据新定义的变量将数据帧转换为长格式。处理心理治疗调查的回应。
使用R或SQL。
包含在退出变量名称中的信息:
每一次个人参与节目都是一集
主体=填写调查的个人(可以是参与者、母亲、父亲等)
类型=当前调查的名称(注:有些调查有以“_”分隔的其他识别信息)
实例=自入院或出院后的天数
description =问题号或该列特有的其他信息
目前,每一段信息都由"_“分隔。
这是格式:episode_subject_type_instance_description
## Have data currently in this format, but with almost 5000 variables
tibble(case_name = c("Joe", "Mary", "Jane"),
episode1_student_survey1_day0_Q1 = c(1, 2, 3),
episode1_student_survey1_day0_Q2 = c("A", "B", "C"))
# A tibble: 3 x 3
case_name episode1_student_survey1_day0_Q1 episode1_student_survey1_day0_Q2
<chr> <dbl> <chr>
1 Joe 1 A
2 Mary 2 B
3 Jane 3 C
## Want to transform to long like this:
tibble(case_name = c("Joe", "Joe", "Mary", "Mary", "Jane", "Jane"),
episode = "episode1",
subject = "student",
type = "survey1",
instance = "day0",
description = c("Q1", "Q1", "Q1", "Q2", "Q2", "Q2"),
value = c(1, 2, 3, "A", "B", "C"))
# A tibble: 6 x 7
case_name episode subject type instance description value
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Joe episode1 student survey1 day0 Q1 1
2 Joe episode1 student survey1 day0 Q1 2
3 Mary episode1 student survey1 day0 Q1 3
4 Mary episode1 student survey1 day0 Q2 A
5 Jane episode1 student survey1 day0 Q2 B
6 Jane episode1 student survey1 day0 Q2 C 我假设有办法一次把每一条信息都取出来,但不知道该怎么做。
感谢所有的帮助!!
发布于 2021-07-16 20:48:05
在R中,将列类型转换为'case_name‘以外的character,然后使用pivot_longer和separate name列将其重新定义为'long’格式
library(dplyr)
library(tidyr)
nm1 <- c("episode", "subject", "type", "instance", "description")
df1 %>%
mutate(across(-case_name, as.character)) %>%
pivot_longer(cols = -case_name) %>%
separate(name, into = nm1)-output
# A tibble: 6 x 7
case_name episode subject type instance description value
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Joe episode1 student survey1 day0 Q1 1
2 Joe episode1 student survey1 day0 Q2 A
3 Mary episode1 student survey1 day0 Q1 2
4 Mary episode1 student survey1 day0 Q2 B
5 Jane episode1 student survey1 day0 Q1 3
6 Jane episode1 student survey1 day0 Q2 C 数据
df1 <- tibble(case_name = c("Joe", "Mary", "Jane"),
episode1_student_survey1_day0_Q1 = c(1, 2, 3),
episode1_student_survey1_day0_Q2 = c("A", "B", "C"))发布于 2021-07-16 21:04:48
这里有一个解决方案,它使用melt()从wide格式到long格式进行整形,并使用新的measure()函数来拆分列名:
library(data.table) # development version 1.14.1 used
melt(setDT(df1), measure.vars = measure(episode, subject, type, instance, description,
sep = "_"))case\_name episode subject type instance description value 1: Joe episode1 student survey1 day0 Q1 1 2: Mary episode1 student survey1 day0 Q1 2 3: Jane episode1 student survey1 day0 Q1 3 4: Joe episode1 student survey1 day0 Q2 A 5: Mary episode1 student survey1 day0 Q2 B 6: Jane episode1 student survey1 day0 Q2 C
https://stackoverflow.com/questions/68415057
复制相似问题