My dataset looks like this (only a small excerpt). For a given subject (here SUBJECT = 5), I have 3 tests performed at D-1, D1 8h and D2 24h:
    SUBJECT TIME   TEST                    RESULT UNITS RANGES
591 5       D-1    Leukoyte count urine    1      /?L   |-< 15|-
592 5       D-1    Erythrocyte count urine 0      /?L   |-< 19|-
593 5       D-1    Glucose dipstick urine  Normal None  |+ from 50 mg/dL-|-
684 5       D1 8h  Leukoyte count urine    0      /?L   |-< 15|-
687 5       D1 8h  Erythrocyte count urine 0      /?L   |-< 19|-
683 5       D1 8h  Glucose dipstick urine  Normal None  |+ from 50 mg/dL-|-
694 5       D2 24h Leukoyte count urine    1      /?L   |-< 15|-
695 5       D2 24h Erythrocyte count urine 0      /?L   |-< 19|-
696 5       D2 24h Glucose dipstick urine  Normal None  |+ from 50 mg/dL-|-

I would like to reorganise these data into a table with the following columns:

TEST D-1 D1 8h D2 24h UNITS RANGES

so that I get one row per test.
I have been struggling with "table" and "aggregate", but I cannot find the right approach, even though I am sure it is not complicated.
Could you help me?
Thanks.
Here is the dput:
> dput(dataset)
structure(list(SUBJECT = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L
), TIME = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("D-1",
"D1 8h", "D2 24h", "D4 72h"), class = "factor"), TEST = structure(c(35L,
24L, 28L, 35L, 24L, 28L, 35L, 24L, 28L), .Label = c("", "Alkaline phosphatase",
"APTT", "Basophils", "Basophils (%)", "Calcium", "CD19", "CD19 abs.",
"CD3", "CD3 abs.", "CD4/CD8 ratio", "CD4+", "CD4+ abs.", "CD56",
"CD56 absolute", "CD8+", "CD8+ abs.", "Chloride", "CK (creatine kinase)",
"Creatinine", "Direct bilirubin (conjug)", "Eosinophils", "Eosinophils (%)",
"Erythrocyte count urine", "Erythrocyte dipstick urine", "Gamma GT",
"Glucose", "Glucose dipstick urine", "GOT (AST)", "GPT (ALT)",
"Hematocrit", "Hemoglobin", "Ketone bodies urine", "Leukocyte esterase urine",
"Leukoyte count urine", "Lymphocytes", "Lymphocytes (%)", "Monocytes",
"Monocytes (%)", "Neutrophils", "Neutrophils (%)", "pH urine",
"Platelet count", "Potassium", "Protein urine", "PT INR", "Red blood cell count",
"Reticulocytes", "Reticulocytes %", "Serum Albumine", "Sodium",
"Total bilirubin", "Total cholesterol", "Total protein", "Triglycerides",
"Urea", "Urine glucose quantitative", "Urine protein quantitative",
"White blood cell count"), class = "factor"), RESULT = c("1",
"0", "Normal", "0", "0", "Normal", "1", "0", "Normal"), UNITS = c("/?L",
"/?L", "None", "/?L", "/?L", "None", "/?L", "/?L", "None"), RANGES = c("|-< 15|-",
"|-< 19|-", "|+ from 50 mg/dL-|-", "|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-",
"|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-")), .Names = c("SUBJECT",
"TIME", "TEST", "RESULT", "UNITS", "RANGES"), row.names = c(591L,
592L, 593L, 684L, 687L, 683L, 694L, 695L, 696L), class = "data.frame")

Posted on 2019-02-23 18:20:26
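Is this what you are after? If so, I believe the question should be marked as a duplicate of the one on reshaping data from long to wide format in R.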
library(tidyverse)
spread(dataset, key = TIME, value = UNITS)
#  SUBJECT TEST                    RESULT RANGES              D-1  D1 8h D2 24h
#1 5       Erythrocyte count urine 0      |-< 19|-            /?L  /?L   /?L
#2 5       Glucose dipstick urine  Normal |+ from 50 mg/dL-|- None None  None
#3 5       Leukoyte count urine    0      |-< 15|-            <NA> /?L   <NA>
#4 5       Leukoyte count urine    1      |-< 15|-            /?L  <NA>  /?L

Edit

In a comment, Evan pointed out that the code above is wrong: the column to spread is RESULT, not UNITS. The correct solution is
spread(dataset, key = TIME, value = RESULT)
#  SUBJECT TEST                    UNITS RANGES              D-1    D1 8h  D2 24h
#1 5       Erythrocyte count urine /?L   |-< 19|-            0      0      0
#2 5       Glucose dipstick urine  None  |+ from 50 mg/dL-|- Normal Normal Normal
#3 5       Leukoyte count urine    /?L   |-< 15|-            1      0      1

Or, if the OP wants the columns reordered, do the following.
dataset %>%
  spread(key = TIME, value = RESULT) %>%
  select(SUBJECT, TEST, `D-1`:`D2 24h`, UNITS, RANGES)
#  SUBJECT TEST                    D-1    D1 8h  D2 24h UNITS RANGES
#1 5       Erythrocyte count urine 0      0      0      /?L   |-< 19|-
#2 5       Glucose dipstick urine  Normal Normal Normal None  |+ from 50 mg/dL-|-
#3 5       Leukoyte count urine    1      0      1      /?L   |-< 15|-

https://stackoverflow.com/questions/54843788
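A note for readers on tidyr 1.0.0 or later: `spread()` is superseded there by `pivot_wider()`. The following is a minimal sketch of the same reshape with the newer function, not part of the original answer. The small `dataset` built here is only a stand-in for the asker's `dput` output (an assumption: `TIME` and `TEST` are plain character columns rather than factors), and `wide` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Stand-in for the asker's data (assumption: character columns, not factors)
dataset <- data.frame(
  SUBJECT = 5L,
  TIME    = rep(c("D-1", "D1 8h", "D2 24h"), each = 3),
  TEST    = rep(c("Leukoyte count urine", "Erythrocyte count urine",
                  "Glucose dipstick urine"), times = 3),
  RESULT  = c("1", "0", "Normal", "0", "0", "Normal", "1", "0", "Normal"),
  UNITS   = rep(c("/?L", "/?L", "None"), times = 3),
  RANGES  = rep(c("|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-"), times = 3),
  stringsAsFactors = FALSE
)

# One row per SUBJECT/TEST/UNITS/RANGES combination, one column per TIME value
wide <- dataset %>%
  pivot_wider(names_from = TIME, values_from = RESULT) %>%
  select(SUBJECT, TEST, `D-1`:`D2 24h`, UNITS, RANGES)

wide
```

Unlike `spread()`, `pivot_wider()` keeps the rows in order of first appearance instead of sorting them, so the row order may differ from the output shown above.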
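Since the question mentions base R functions, here is a rough sketch of the same long-to-wide reshape without any packages, using `reshape()` from the stats package. It is an alternative to the answer above, not the answerer's method. The `dataset` below is again a small stand-in for the asker's data (assumption: character columns instead of factors), and `wide_base` is a hypothetical name.

```r
# Stand-in for the asker's data (assumption: character columns, not factors)
dataset <- data.frame(
  SUBJECT = 5L,
  TIME    = rep(c("D-1", "D1 8h", "D2 24h"), each = 3),
  TEST    = rep(c("Leukoyte count urine", "Erythrocyte count urine",
                  "Glucose dipstick urine"), times = 3),
  RESULT  = c("1", "0", "Normal", "0", "0", "Normal", "1", "0", "Normal"),
  UNITS   = rep(c("/?L", "/?L", "None"), times = 3),
  RANGES  = rep(c("|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-"), times = 3),
  stringsAsFactors = FALSE
)

# Columns in idvar identify a row; TIME supplies the new column names;
# the remaining column (RESULT) is spread across them
wide_base <- reshape(
  dataset,
  idvar     = c("SUBJECT", "TEST", "UNITS", "RANGES"),
  timevar   = "TIME",
  direction = "wide"
)

# reshape() names the new columns "RESULT.<time>"; strip that prefix
names(wide_base) <- sub("^RESULT\\.", "", names(wide_base))

# Put the columns in the order requested in the question
wide_base <- wide_base[, c("SUBJECT", "TEST", "D-1", "D1 8h", "D2 24h",
                           "UNITS", "RANGES")]
wide_base
```

The tidyr version is shorter, but this one may be handy where installing packages is not an option.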