Question: Reorganizing data by a given variable

Stack Overflow user

Asked 2019-02-23 16:46:30

1 answer · 67 views · 0 followers · 0 votes

My dataset looks like this (just a small excerpt): for a given subject (here SUBJECT = 5), I have 3 tests, each performed at D-1, D1 8h and D2 24h:

    SUBJECT   TIME                    TEST RESULT UNITS              RANGES
591       5    D-1    Leukoyte count urine      1   /?L            |-< 15|-
592       5    D-1 Erythrocyte count urine      0   /?L            |-< 19|-
593       5    D-1  Glucose dipstick urine Normal  None |+ from 50 mg/dL-|-
684       5  D1 8h    Leukoyte count urine      0   /?L            |-< 15|-
687       5  D1 8h Erythrocyte count urine      0   /?L            |-< 19|-
683       5  D1 8h  Glucose dipstick urine Normal  None |+ from 50 mg/dL-|-
694       5 D2 24h    Leukoyte count urine      1   /?L            |-< 15|-
695       5 D2 24h Erythrocyte count urine      0   /?L            |-< 19|-
696       5 D2 24h  Glucose dipstick urine Normal  None |+ from 50 mg/dL-|-

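I would like to reorganize this data into a table with the following columns:

TEST   D-1   D1 8h   D2 24h   UNITS   RANGES

so that I get one row per test.

I have been experimenting with "table" and "aggregate", but I can't find the right way to do it, although I'm sure it is not complicated.

Could you help me?

Thanks

Here is the dput: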

> dput(dataset)
structure(list(SUBJECT = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L
), TIME = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("D-1", 
"D1 8h", "D2 24h", "D4 72h"), class = "factor"), TEST = structure(c(35L, 
24L, 28L, 35L, 24L, 28L, 35L, 24L, 28L), .Label = c("", "Alkaline phosphatase", 
"APTT", "Basophils", "Basophils (%)", "Calcium", "CD19", "CD19 abs.", 
"CD3", "CD3 abs.", "CD4/CD8 ratio", "CD4+", "CD4+ abs.", "CD56", 
"CD56 absolute", "CD8+", "CD8+ abs.", "Chloride", "CK (creatine kinase)", 
"Creatinine", "Direct bilirubin (conjug)", "Eosinophils", "Eosinophils (%)", 
"Erythrocyte count urine", "Erythrocyte dipstick urine", "Gamma GT", 
"Glucose", "Glucose dipstick urine", "GOT (AST)", "GPT (ALT)", 
"Hematocrit", "Hemoglobin", "Ketone bodies urine", "Leukocyte esterase urine", 
"Leukoyte count urine", "Lymphocytes", "Lymphocytes (%)", "Monocytes", 
"Monocytes (%)", "Neutrophils", "Neutrophils (%)", "pH urine", 
"Platelet count", "Potassium", "Protein urine", "PT INR", "Red blood cell count", 
"Reticulocytes", "Reticulocytes %", "Serum  Albumine", "Sodium", 
"Total bilirubin", "Total cholesterol", "Total protein", "Triglycerides", 
"Urea", "Urine glucose quantitative", "Urine protein quantitative", 
"White blood cell count"), class = "factor"), RESULT = c("1", 
"0", "Normal", "0", "0", "Normal", "1", "0", "Normal"), UNITS = c("/?L", 
"/?L", "None", "/?L", "/?L", "None", "/?L", "/?L", "None"), RANGES = c("|-< 15|-", 
"|-< 19|-", "|+ from 50 mg/dL-|-", "|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-", 
"|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-")), .Names = c("SUBJECT", 
"TIME", "TEST", "RESULT", "UNITS", "RANGES"), row.names = c(591L, 
592L, 593L, 684L, 687L, 683L, 694L, 695L, 696L), class = "data.frame")

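1 Answer

Stack Overflow user

Answered 2019-02-23 18:20:26

Is this what you are after? If so, I believe the question should be marked as a duplicate of reshaping data from long to wide in R.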

library(tidyverse)

spread(dataset, key = TIME, value = UNITS)
#  SUBJECT                    TEST RESULT              RANGES  D-1 D1 8h D2 24h
#1       5 Erythrocyte count urine      0            |-< 19|-  /?L   /?L    /?L
#2       5  Glucose dipstick urine Normal |+ from 50 mg/dL-|- None  None   None
#3       5    Leukoyte count urine      0            |-< 15|- <NA>   /?L   <NA>
#4       5    Leukoyte count urine      1            |-< 15|-  /?L  <NA>    /?L
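
A quick way to see why this first attempt yields two rows for `Leukoyte count urine`: with `value = UNITS`, the `RESULT` column stays among the columns that identify a row, so each distinct result gets its own row. The sketch below rebuilds the excerpt from the question as a plain data frame (character columns instead of factors, an assumption made for brevity) and counts the distinct identifier combinations for both choices of the value column. The variable names `n_rows_units` and `n_rows_result` are made up for illustration.

```r
# Minimal copy of the excerpt from the question (characters instead of factors)
dataset <- data.frame(
  SUBJECT = 5L,
  TIME    = rep(c("D-1", "D1 8h", "D2 24h"), each = 3),
  TEST    = rep(c("Leukoyte count urine", "Erythrocyte count urine",
                  "Glucose dipstick urine"), times = 3),
  RESULT  = c("1", "0", "Normal", "0", "0", "Normal", "1", "0", "Normal"),
  UNITS   = rep(c("/?L", "/?L", "None"), times = 3),
  RANGES  = rep(c("|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-"), times = 3),
  stringsAsFactors = FALSE
)

# Spreading UNITS leaves RESULT as an identifier: Leukoyte has results 0 and 1,
# so it splits into two rows.
n_rows_units <- nrow(unique(dataset[, c("SUBJECT", "TEST", "RESULT", "RANGES")]))
n_rows_units   # 4

# Spreading RESULT leaves UNITS as an identifier, which is constant per test.
n_rows_result <- nrow(unique(dataset[, c("SUBJECT", "TEST", "UNITS", "RANGES")]))
n_rows_result  # 3
```

The number of rows in the wide table equals the number of distinct combinations of the columns that are neither the key nor the value, which is why `RESULT` has to be the value column here.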

Edit

In his comment, Evan corrected the above. The correct solution is

spread(dataset, key = TIME, value = RESULT)
#  SUBJECT                    TEST UNITS              RANGES    D-1  D1 8h D2 24h
#1       5 Erythrocyte count urine   /?L            |-< 19|-      0      0      0
#2       5  Glucose dipstick urine  None |+ from 50 mg/dL-|- Normal Normal Normal
#3       5    Leukoyte count urine   /?L            |-< 15|-      1      0      1

Or, if the OP wants the columns reordered, do the following.

dataset %>%
  spread(key = TIME, value = RESULT) %>%
  select(SUBJECT,TEST, `D-1`:`D2 24h`, UNITS, RANGES)
#  SUBJECT                    TEST    D-1  D1 8h D2 24h UNITS              RANGES
#1       5 Erythrocyte count urine      0      0      0   /?L            |-< 19|-
#2       5  Glucose dipstick urine Normal Normal Normal  None |+ from 50 mg/dL-|-
#3       5    Leukoyte count urine      1      0      1   /?L            |-< 15|-
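
For readers on tidyr 1.0.0 or later, where `spread()` is superseded by `pivot_wider()`, the same reshaping could be sketched as follows. This is not part of the original answer. The data frame below is a hand-built copy of the excerpt from the question (character columns instead of factors), and the name `wide` is chosen only for illustration.

```r
library(tidyr)
library(dplyr)

# Minimal copy of the excerpt from the question (characters instead of factors)
dataset <- data.frame(
  SUBJECT = 5L,
  TIME    = rep(c("D-1", "D1 8h", "D2 24h"), each = 3),
  TEST    = rep(c("Leukoyte count urine", "Erythrocyte count urine",
                  "Glucose dipstick urine"), times = 3),
  RESULT  = c("1", "0", "Normal", "0", "0", "Normal", "1", "0", "Normal"),
  UNITS   = rep(c("/?L", "/?L", "None"), times = 3),
  RANGES  = rep(c("|-< 15|-", "|-< 19|-", "|+ from 50 mg/dL-|-"), times = 3),
  stringsAsFactors = FALSE
)

# One column per TIME level, filled with RESULT; all remaining columns
# (SUBJECT, TEST, UNITS, RANGES) identify a row.
wide <- dataset %>%
  pivot_wider(names_from = TIME, values_from = RESULT) %>%
  select(SUBJECT, TEST, `D-1`:`D2 24h`, UNITS, RANGES)

wide
```

Unlike `spread()`, `pivot_wider()` returns a tibble and keeps the rows in order of first appearance rather than sorting them by the identifier columns, so the row order may differ from the output shown above.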

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54843788
