我试图做一个以前看起来很简单的操作,但是我没有在网络上找到一个明确的解决方案。
我有这样的桌子:
tibble(
block = c(1,1,1,2,2,2),
tag = letters[1:6],
start = c(15,54,78,27,45,80),
end = c(50,80,90,40,76,100),
direction = c(-1,-1,1,1,1,1),
anchor = c(FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
) -> df1
# A tibble: 6 x 6
block tag start end direction anchor
<dbl> <chr> <dbl> <dbl> <dbl> <lgl>
1 1 a 15 50 -1 FALSE
2 1 b 54 80 -1 TRUE
3 1 c 78 90 1 FALSE
4 2 d 27 40 1 FALSE
5 2 e 45 76 1 TRUE
6 2 f 80 100 1 FALSE 我在block列中得到了组,一个组中只有一个anchor。给定这个(**direction * -1**),如果锚方向是,则需要反转块内的坐标,-1 (**direction[anchor] == -1),也需要保持锚坐标(**start & end**)**,并调整另一个坐标和anchor == FALSE坐标,以保持它们保持新月,但具有相同的比例(长度和距离到上下游标记)。
为了简化,如果组的锚点是-1,则需要重新确定坐标。这意味着,如果anchor == -1那么:
ancho * -1之间的距离
然后,输出只需要如下所示:
tibble(
block = c(1,1,1,2,2,2),
tag = c("c", "b", "a", "d", "e", "f"),
start = c(44,54,84,27,45,80),
end = c(56,80,119,40,76,100),
direction = c(-1,1,1,1,1,1),
anchor = c(FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
) -> df2
# A tibble: 6 x 6
block tag start end direction anchor
<dbl> <chr> <dbl> <dbl> <dbl> <lgl>
1 1 c 44 56 -1 FALSE
2 1 b 54 80 1 TRUE
3 1 a 84 119 1 FALSE
4 2 d 27 40 1 FALSE
5 2 e 45 76 1 TRUE
6 2 f 80 100 1 FALSE如下所示,长度和对距离保持不变:
df1 %>%
group_by(block) %>%
mutate(
TagDistance = lead(start) - end,
len = end - start
)
# A tibble: 6 x 8
# Groups: block [2]
block tag start end direction anchor TagDistance len
<dbl> <chr> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
1 1 a 15 50 -1 FALSE 4 35
2 1 b 54 80 -1 TRUE -2 26
3 1 c 78 90 1 FALSE NA 12
4 2 d 27 40 1 FALSE 5 13
5 2 e 45 76 1 TRUE 4 31
6 2 f 80 100 1 FALSE NA 20
df2 %>%
group_by(block) %>%
mutate(
TagDistance = lead(start) - end,
len = end - start
)
# A tibble: 6 x 8
# Groups: block [2]
block tag start end direction anchor TagDistance len
<dbl> <chr> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
1 1 c 44 56 -1 FALSE -2 12
2 1 b 54 80 1 TRUE 4 26
3 1 a 84 119 1 FALSE NA 35
4 2 d 27 40 1 FALSE 5 13
5 2 e 45 76 1 TRUE 4 31
6 2 f 80 100 1 FALSE NA 20图形表示是这样的:
library(ggplot2)
library(gggenes)
df1 %>%
ggplot(aes(xmin = start, xmax = end, y = as.factor(block), forward = direction, fill = anchor)) +
geom_gene_arrow() +
geom_gene_label(aes(label = tag)) +
theme_genes()
#
df2 %>%
ggplot(aes(xmin = start, xmax = end, y = as.factor(block), forward = direction, fill = anchor)) +
geom_gene_arrow() +
geom_gene_label(aes(label = tag)) +
theme_genes()


提前感谢
发布于 2020-08-19 15:13:31
我解决了,很蠢,也许还有更好的解决方案?
tibble(
block = c(1,1,1,2,2,2),
tag = letters[1:6],
start = c(15,54,78,27,45,80),
end = c(50,80,90,40,76,100),
direction = c(-1,-1,1,1,1,1),
anchor = c(FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
) -> a
a
# A tibble: 6 x 6
block tag start end direction anchor
<dbl> <chr> <dbl> <dbl> <dbl> <lgl>
1 1 a 15 50 -1 FALSE
2 1 b 54 80 -1 TRUE
3 1 c 78 90 1 FALSE
4 2 d 27 40 1 FALSE
5 2 e 45 76 1 TRUE
6 2 f 80 100 1 FALSE 然后,我按block分组并执行许多算术操作,这些操作是启发式的,如下所示:
a %>%
group_by(block) %>%
mutate(
anchor_direction = direction[anchor],
position_relative_to_anchor = case_when(
anchor ~ NA_character_,
(start < start[anchor]) | (start == start[anchor] && end < end[anchor]) ~ "upstream",
start > start[anchor] ~ "downstream"
),
TagDistance = if_else(
position_relative_to_anchor == "upstream",
start[anchor] - end,
start - end[anchor]
),
length = end - start,
newstart = case_when(
anchor ~ start,
anchor_direction == 1 ~ start,
position_relative_to_anchor == "upstream" ~ end[anchor] + TagDistance,
position_relative_to_anchor == "downstream" ~ start[anchor] - TagDistance
),
newend = case_when(
anchor ~ end,
anchor_direction == 1 ~ end,
position_relative_to_anchor == "upstream" ~ newstart + length,
position_relative_to_anchor == "downstream" ~ newstart - length
),
start = case_when(
anchor ~ start,
anchor_direction == 1 ~ start,
position_relative_to_anchor == "upstream" ~ newstart,
position_relative_to_anchor == "downstream" ~ newend
),
end = case_when(
anchor ~ end,
anchor_direction == 1 ~ end,
position_relative_to_anchor == "upstream" ~ newend,
position_relative_to_anchor == "downstream" ~ newstart
)
) %>%
arrange(block,start,end) %>%
mutate(
direction = direction * anchor_direction
) %>%
select(
-c(
anchor_direction,
position_relative_to_anchor,
TagDistance,
length,
newstart,
newend
)
) -> a
a
# A tibble: 6 x 6
# Groups: block [2]
block tag start end direction anchor
<dbl> <chr> <dbl> <dbl> <dbl> <lgl>
1 1 c 44 56 -1 FALSE
2 1 b 54 80 1 TRUE
3 1 a 84 119 1 FALSE
4 2 d 27 40 1 FALSE
5 2 e 45 76 1 TRUE
6 2 f 80 100 1 FALSE最后,我与预期的结果进行了比较:
tibble(
block = c(1,1,1,2,2,2),
tag = c("c", "b", "a", "d", "e", "f"),
start = c(44,54,84,27,45,80),
end = c(56,80,119,40,76,100),
direction = c(-1,1,1,1,1,1),
anchor = c(FALSE,TRUE,FALSE,FALSE,TRUE,FALSE)
) -> b
setdiff(a, b)
# A tibble: 0 x 6
# Groups: block [0]
# … with 6 variables: block <dbl>, tag <chr>, start <dbl>, end <dbl>, direction <dbl>, anchor <lgl>任何更好的解决办法都是受欢迎的。
https://stackoverflow.com/questions/63476981
复制相似问题