Q: Checking for duplicate IDs

Stack Overflow user

Asked 2017-08-11 04:28:09

Answers: 1 | Views: 57 | Followers: 0 | Votes: 0

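I have a dataset containing people's names and ID numbers. Some people are listed two or three times. Each person has an ID number; if they are listed more than once, their ID number stays the same as long as it is the same person. It looks like this: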

Name  david  david  john  john  john  john  megan  bill  barbara  chris  chris
ID    1      1      2     2     2     2     3      4     5        6      6

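I need to make sure that these ID numbers are correct and that different people do not share the same ID number. While doing this, I would like to create a new variable that assigns new ID numbers, so that I can compare the new ID numbers with the old ones. I want to write a command that says "if their names are the same, make their ID numbers the same." How would I do that? Does this make sense?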

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-08-11 20:08:40

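There are many ways to do this, some of which have already been suggested above. I usually use a dplyr version to spot and remove duplicate or bad cases. Here is an example with different outputs depending on your goal.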

library(dplyr)

# example with one bad case: "John" is listed under two different IDs (2 and 3)
dt <- data.frame(Name = c("david","davud","John","John","megan"),
                 ID = c(1,1,2,3,3), stringsAsFactors = FALSE)


# spot names with more than 1 unique IDs
dt %>%
  group_by(Name) %>%
  summarise(NumIDs = n_distinct(ID)) %>%
  filter(NumIDs > 1)

# # A tibble: 1 x 2
#    Name NumIDs
#   <chr>  <int>
# 1  John      2
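
The check above flags names that carry more than one ID. The question also asks about the opposite problem, different people sharing one ID. A minimal sketch of that reverse check follows, assuming the same example data; the column names `NumNames` and `Names` and the object `shared_ids` are made up here for illustration.

```r
library(dplyr)

# same example data as above, repeated so this block runs on its own
dt <- data.frame(Name = c("david","davud","John","John","megan"),
                 ID = c(1,1,2,3,3), stringsAsFactors = FALSE)

# spot IDs that are attached to more than 1 unique name
shared_ids <- dt %>%
  group_by(ID) %>%
  summarise(NumNames = n_distinct(Name),
            Names = paste(sort(unique(Name)), collapse = ",")) %>%
  filter(NumNames > 1)

# flags ID 1 (david, davud) and ID 3 (John, megan)
shared_ids
```

Note that ID 1 is flagged only because of the spelling difference between "david" and "davud"; whether that is a typo or two different people has to be decided by looking at the data.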


# spot names with more than 1 unique IDs and the actual IDs
dt %>%
  group_by(Name) %>%
  mutate(NumIDs = n_distinct(ID)) %>%
  filter(NumIDs > 1) %>%
  ungroup()

# # A tibble: 2 x 3
#    Name    ID NumIDs
#   <chr> <dbl>  <int>
# 1  John     2      2
# 2  John     3      2


# spot names with more than 1 unique IDs and the actual IDs - alternative
dt %>%
  group_by(Name) %>%
  mutate(NumIDs = n_distinct(ID)) %>%
  filter(NumIDs > 1) %>%
  group_by(Name, NumIDs) %>%
  summarise(IDs = paste0(ID, collapse=",")) %>%
  ungroup()

# # A tibble: 1 x 3
#      Name NumIDs   IDs
#     <chr>  <int> <chr>
#   1  John      2   2,3
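
For the other part of the question, a new ID variable built from the names so it can be compared with the old one, one possible sketch is below. It assumes that an identical name always means the same person, which may not hold in real data. The column names `NewID` and `IDConsistent` and the object `checked` are hypothetical, chosen only for this example.

```r
library(dplyr)

# same example data as above, repeated so this block runs on its own
dt <- data.frame(Name = c("david","davud","John","John","megan"),
                 ID = c(1,1,2,3,3), stringsAsFactors = FALSE)

checked <- dt %>%
  # NewID: one integer per distinct name, numbered in order of first appearance
  mutate(NewID = match(Name, unique(Name))) %>%
  group_by(Name) %>%
  # IDConsistent: TRUE when every row with this name carries the same old ID
  mutate(IDConsistent = n_distinct(ID) == 1) %>%
  ungroup()

checked
```

`NewID` is numbered independently of the old `ID`, so the two columns are not expected to be equal row by row; the useful comparison is whether rows that share a `NewID` also share an `ID`, which is what `IDConsistent` records. Matching is on the exact string, so a misspelling such as "davud" gets its own `NewID` unless the names are cleaned first.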

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45622917
