
Create a new column with a sequential number and label based on multiple column values

Stack Overflow user
Asked 2020-10-22 01:14:28
2 answers, 62 views, 0 followers, 0 votes

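I'm in a bit of a rut with my R coding right now. I've been trying to use the mutate, seq and rep functions to generate a new column that iterates over multiple column values and different conditions, but the result isn't right. Here is a snippet of my data: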

library(tidyverse)
library(data.table)
library(stringr)

lipidData <- data.frame("Type"=c(rep("LDL",5),rep("HDL",5)),
                        "featureID"=c(12,12,12,12,13,13,14,15,16,17),
                        "featureID2"=c(21,22,23,26,31,31,31,31,38,40))
# My attempt (gives the wrong result)
lipidWrong <- lipidData %>%
  group_by(Type, featureID) %>%
  group_by(Type, featureID2) %>%
  mutate(lipidName = paste0(rep("lipid", n()), "_", seq(1, n())))
lipidWrong
#    Type  featureID featureID2 lipidName
#    <fct>     <dbl>      <dbl> <chr>
#  1 LDL          12         21 lipid_1
#  2 LDL          12         22 lipid_1
#  3 LDL          12         23 lipid_1
#  4 LDL          12         26 lipid_1
#  5 LDL          13         31 lipid_1
#  6 HDL          13         31 lipid_1
#  7 HDL          14         31 lipid_2
#  8 HDL          15         31 lipid_3
#  9 HDL          16         38 lipid_1
# 10 HDL          17         40 lipid_1
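
A note on why the attempt above numbers rows inside each group: by default a second group_by() replaces the existing grouping rather than adding to it, and seq(1, n()) then counts rows within each (Type, featureID2) group. The sketch below uses a small made-up data frame (df is hypothetical, not the asker's data) and assumes dplyr 1.0.0 or later for the .add argument. Keeping both groupings with .add = TRUE would not produce the desired numbering either; it only illustrates the default behaviour.

```r
library(dplyr)

# Hypothetical three-row example, not the original lipidData
df <- data.frame(
  Type = c("LDL", "LDL", "HDL"),
  featureID = c(12, 12, 13),
  featureID2 = c(21, 22, 31)
)

# The second group_by() replaces the first grouping by default
replaced <- df %>%
  group_by(Type, featureID) %>%
  group_by(Type, featureID2)
group_vars(replaced)
# "Type" "featureID2"

# .add = TRUE keeps the existing grouping and appends to it
added <- df %>%
  group_by(Type, featureID) %>%
  group_by(featureID2, .add = TRUE)
group_vars(added)
# "Type" "featureID" "featureID2"
```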

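Instead of the incorrect table above, I want lipidName to be grouped by Type and featureID first, and then by Type and featureID2. If rows have the same Type and featureID, they should count as the same lipid in lipidName. If rows have the same Type and featureID2, they should also count as the same lipid. Since my actual dataset contains more than 100,000 rows, it would be great to know how to number the lipids sequentially across the whole dataset, rather than only within the n() of each group_by group.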

I would like my result to look like this:

lipidCorrect
   Type featureID featureID2 lipidName
1   LDL        12         21   lipid_1 # same type and featureID
2   LDL        12         22   lipid_1 # same type and featureID
3   LDL        12         23   lipid_1 # same type and featureID
4   LDL        12         26   lipid_1 # same type and featureID
5   LDL        13         31   lipid_2 # although featureID is the same as in row 6, it has a different type
6   HDL        13         31   lipid_3 # same type and featureID2
7   HDL        14         31   lipid_3 # same type and featureID2
8   HDL        15         31   lipid_3 # same type and featureID2
9   HDL        16         38   lipid_4 
10  HDL        17         40   lipid_5

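If I'm doing something wrong with group_by() and mutate(), please let me know, and also whether there is a better way to produce the desired result.

Thanks!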


2 Answers

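Stack Overflow user

Answered 2020-10-22 05:11:59

If I understood the question correctly (with the help of @Gregor Thomas's nice clarifying questions and comments), a (clumsy) tidyverse-based solution could look like this.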

library(dplyr)
library(stringr)

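lipidData %>%
  # Flag rows whose (Type, featureID) group has more than one member
  group_by(Type, featureID) %>%
  mutate(lipidGroup1 = +(n() > 1)) %>%
  # Flag rows whose (Type, featureID2) group has more than one member
  group_by(Type, featureID2) %>%
  mutate(lipidGroup2 = +(n() > 1)) %>%
  ungroup() %>%
  # Rows that belong to neither kind of group are lipids on their own
  mutate(lipidGroup3 = +(lipidGroup1 == 0 & lipidGroup2 == 0)) %>%
  # Keep the flag only on the first row of each multi-row group
  group_by(Type, featureID) %>%
  mutate(lipidGroup1 = if_else(n() > 1 & row_number() == 1, 1, 0)) %>%
  group_by(Type, featureID2) %>%
  mutate(lipidGroup2 = if_else(n() > 1 & row_number() == 1, 1, 0)) %>%
  ungroup() %>%
  # Each flagged row starts a new lipid; the running total is the lipid number
  mutate(lipidName = str_c('lipid_', cumsum(lipidGroup1 + lipidGroup2 + lipidGroup3))) %>%
  select(-starts_with('lipidGroup'))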

#    Type  featureID featureID2 lipidName
#    <chr>     <dbl>      <dbl> <chr>    
#  1 LDL          12         21 lipid_1  
#  2 LDL          12         22 lipid_1  
#  3 LDL          12         23 lipid_1  
#  4 LDL          12         26 lipid_1  
#  5 LDL          13         31 lipid_2  
#  6 HDL          13         31 lipid_3  
#  7 HDL          14         31 lipid_3  
#  8 HDL          15         31 lipid_3  
#  9 HDL          16         38 lipid_4  
# 10 HDL          17         40 lipid_5 
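
The core trick in this answer is to mark the first row of every lipid and take a running total of those marks. A minimal base R sketch of that idea, using a made-up key vector (key is hypothetical and stands in for whatever identifies a lipid), might look like the following. Like the pipeline above, it relies on rows of the same lipid being adjacent, as they are in the sample data.

```r
# Hypothetical key: one label per row, with equal labels in contiguous runs
key <- c("a", "a", "a", "b", "c", "c", "d")

# TRUE on the first row and wherever the key differs from the previous row
starts <- c(TRUE, key[-1] != key[-length(key)])

# The running total of run starts is a dataset-wide sequential number
id <- cumsum(starts)
lipidName <- paste0("lipid_", id)
lipidName
# "lipid_1" "lipid_1" "lipid_1" "lipid_2" "lipid_3" "lipid_3" "lipid_4"
```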
Votes: 0

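Stack Overflow user

Answered 2020-10-22 21:44:14

Here is a version that uses a helper variable to keep track of which grouping generated the unique ID, and then converts that helper into the final variable: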

library(dplyr)  # cur_group_id() requires dplyr >= 1.0.0

lipidData %>%
  group_by(Type, featureID) %>% 
  mutate(
    name_id = case_when(n() > 1 ~ paste("fid1", cur_group_id()), TRUE ~ NA_character_)
  ) %>%
  group_by(Type,featureID2) %>% 
  mutate(
    name_id = case_when(is.na(name_id) ~ paste("fid2", cur_group_id()), TRUE ~ name_id)
  ) %>%
  ungroup() %>%
  mutate(
    lipidName = paste("lipid", as.integer(factor(name_id, levels = unique(name_id))), sep = "_")
  ) %>%
  select(-name_id)
# # A tibble: 10 x 4
#    Type  featureID featureID2 lipidName
#    <chr>     <dbl>      <dbl> <chr>    
#  1 LDL          12         21 lipid_1  
#  2 LDL          12         22 lipid_1  
#  3 LDL          12         23 lipid_1  
#  4 LDL          12         26 lipid_1  
#  5 LDL          13         31 lipid_2  
#  6 HDL          13         31 lipid_3  
#  7 HDL          14         31 lipid_3  
#  8 HDL          15         31 lipid_3  
#  9 HDL          16         38 lipid_4  
# 10 HDL          17         40 lipid_5  
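
The last step of this answer turns the helper keys into consecutive numbers in order of first appearance. A small sketch of that step in isolation, with a made-up name_id vector (the values are hypothetical), shows why levels = unique(name_id) matters, and that base R's match() gives the same numbering under the same assumption:

```r
# Hypothetical helper keys, in row order
name_id <- c("fid1 6", "fid1 6", "fid2 5", "fid2 1", "fid2 1", "fid2 2")

# Levels in order of first appearance, so the codes count up along the rows
by_factor <- as.integer(factor(name_id, levels = unique(name_id)))
by_factor
# 1 1 2 3 3 4

# match() against the unique values yields the same numbering
by_match <- match(name_id, unique(name_id))
by_match
# 1 1 2 3 3 4

# Without explicit levels, factor() sorts the levels, so the codes
# no longer follow the row order
by_sorted <- as.integer(factor(name_id))
```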
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64468534
