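I searched the forum for an answer but couldn't find one. I want to loop over the unique values of a data frame column (IN_FID) and add the values of another column (NEAR_FID) associated with that value (there may be one or more) to a list. Then I add the IN_FID to a list. If any of its NEAR_FID values has already been seen during this process, the IN_FID should not be added to the list. I know I haven't included it in the code, but ideally I would also like to loop over the IN_FID values in random order rather than sequentially. What am I doing wrong in this code?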
eagle
   IN_FID NEAR_FID
1       2        1
2       2        2
3       2        3
4       8        4
5       9        2
6       9        7
7       9        8
8       9        9
9      16        2
10     16       11
11     21       12
p.good = list()
p.bad = list()
INFIDS = unique(eagle$IN_FID)
NEARFIDS = unique(eagle$NEAR_FID)
t.used = NEARFIDS
for (i in INFIDS) {
sub = eagle[eagle$IN_FID == i, ]
x = sub$NEAR_FID
if (all(x) %in% t.used){
p.good = c(p.good, i)
t.used[t.used != all(x)]
} else {
p.bad = c(p.bad, i)
}
}

The expected output would be:
p.good
[1] 2 8 21 (because NEAR_FID of 2 is present in 9 and 16)
p.bad
[1] 9 16
t.used
= empty because it will have used the values during the loop

Answered 2017-07-12 19:43:35

You can use the duplicated() function.
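As a small illustration of what duplicated() flags, here is a sketch using the NEAR_FID values from the question. The names near_fid and dup are chosen for this example only.

```r
# NEAR_FID column from the question's data
near_fid <- c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)

# TRUE marks a value that already appeared earlier in the vector
dup <- duplicated(near_fid)
print(dup)

# Row positions of the repeated values: rows 5 and 9,
# the second and third occurrence of the value 2
index_dup <- which(dup)
print(index_dup)
```

Only the later occurrences are flagged, so the first row containing a value is never marked as a duplicate.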
index_dup = which(duplicated(eagle$NEAR_FID))
p.bad = unique(eagle$IN_FID[index_dup])
index_bad = c()
for (i in p.bad){
index_bad = c(index_bad,which(eagle$IN_FID == i))
}
p.good = unique(eagle$IN_FID[-index_bad])

For randomization, you can shuffle the row order of the data and then apply the code above again.
eagle_random <- eagle[sample(1:nrow(eagle)), ]

Answered 2017-07-12 19:22:02
Declare p.good and p.bad as vectors rather than lists.
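To show why the starting value matters, here is a minimal sketch comparing the two. The names p.list and p.vec are made up for this illustration.

```r
# Starting from a list: c() keeps the result a list
p.list <- list()
for (i in c(2, 8, 21)) p.list <- c(p.list, i)
str(p.list)  # a list holding three single numbers

# Starting from NULL: c() builds a plain numeric vector
p.vec <- NULL
for (i in c(2, 8, 21)) p.vec <- c(p.vec, i)
print(p.vec)  # a numeric vector with the values 2, 8 and 21
```

A plain vector prints on one line, like the expected output in the question, and works directly with %in%.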
p.good = NULL
p.bad = NULL
INFIDS = unique(eagle$IN_FID)
NEARFIDS = unique(eagle$NEAR_FID)
t.used = NEARFIDS

Instead of min:max, iterate over the elements of the vector with for (i in INFIDS):
library(dplyr) # provides %>% and filter()
for (i in INFIDS) {
  x = (eagle %>% filter(IN_FID == i))$NEAR_FID # combine into single statement
  if (all(x %in% t.used)) { # was all(x) %in% t.used before
    p.good = c(p.good, i)
    t.used = t.used[!(t.used %in% x)] # was t.used != all(x)
  } else {
    p.bad = c(p.bad, i)
  }
}

Output:
p.good
[1] 2 8 21
p.bad
[1] 9 16
t.used
[1] 7 8 9 11 # some values were not eliminated as you expected

Random sampling

Change for (i in INFIDS) to for (i in sample(INFIDS)). Use set.seed(1) to make the random sampling reproducible.
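Putting the pieces together, the loop can be wrapped in a function that takes the visiting order as an argument. This is only a sketch in base R: classify_fids, visit_order and unused are hypothetical names that do not appear in the answers above, and the data frame is retyped from the question.

```r
# Data from the question
eagle <- data.frame(
  IN_FID   = c(2, 2, 2, 8, 9, 9, 9, 9, 16, 16, 21),
  NEAR_FID = c(1, 2, 3, 4, 2, 7, 8, 9, 2, 11, 12)
)

# classify_fids() is a hypothetical helper, not part of the answers.
# visit_order is the order in which IN_FID values are processed.
classify_fids <- function(df, visit_order) {
  good <- NULL
  bad <- NULL
  unused <- unique(df$NEAR_FID)  # NEAR_FID values not yet claimed
  for (i in visit_order) {
    x <- df$NEAR_FID[df$IN_FID == i]
    if (all(x %in% unused)) {
      # every NEAR_FID of this IN_FID is still free: accept and claim them
      good <- c(good, i)
      unused <- unused[!(unused %in% x)]
    } else {
      bad <- c(bad, i)
    }
  }
  list(good = good, bad = bad, unused = unused)
}

ids <- unique(eagle$IN_FID)

# Sequential order, as in the loop above
in_order <- classify_fids(eagle, ids)
print(in_order)

# Random order; set.seed() makes the shuffle repeatable
set.seed(1)
shuffled <- classify_fids(eagle, sample(ids))
print(shuffled)
```

The result depends on the visiting order. For example, if 9 happens to be visited before 2, then 9 claims NEAR_FID 2 and ends up in good, while 2 ends up in bad.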
https://stackoverflow.com/questions/45063619