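I have some data for plotting that contains a location ID and a grid ID along with the actual data. A grid can contain several locations, but some grids have only one. I need to assign each location ID a page number based on two conditions: each page contains data from a single grid ID only, and each page can plot at most 4 locations. Sample data and my solution are shown below. My approach seems clunky, and I would love to know whether anyone has a data.table or dplyr way of doing this.
Data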
Sol3<-structure(list(Location = c("29N05W21H001M", "29N05W33A004M",
"29N04W20A001M", "29N04W20A002M", "29N04W20A003M", "29N04W20A004M",
"29N04W28D001M", "29N05W14L001M", "28N04W04P001M", "27N04W05G002M",
"27N04W34P001M", "29N03W18M001M", "29N04W15E002M", "29N04W35B001M",
"27N03W20C001M", "27N04W25Q001M", "27N04W35E001M", "26N03W17B001M",
"26N04W25J001M", "25N03W19N001M", "27N03W10B001M", "27N03W16K003M",
"27N02W31C001M", "27N03W23D001M", "25N03W10L001M", "25N03W10L002M",
"25N03W10L003M", "25N03W10L004M", "25N03W10L005M", "25N03W11B001M",
"25N03W11B002M", "25N03W11B003M", "27N02W30C002M", "27N02W30C003M",
"26N02W14G001M", "26N02W15C001M", "26N02W16C001M", "26N02W17E001M",
"26N02W21Q001M", "26N02W29R001M", "25N02W09G001M", "25N02W21B001M",
"24N02W02E001M", "24N02W12P001M", "24N02W23G001M", "25N02W34K001M",
"24N01W05J003M", "24N01W05Q002M", "25N01W32P001M", "24N01W18N001M",
"24N02W25G001M"), G_ID = c("C-2", "C-2", "D-2", "D-2", "D-2",
"D-2", "D-2", "D-2", "D-3", "D-4", "D-5", "E-2", "E-2", "E-2",
"E-4", "E-4", "E-4", "E-5", "E-5", "E-6", "F-4", "F-4", "F-5",
"F-5", "F-6", "F-6", "F-6", "F-6", "F-6", "F-6", "F-6", "F-6",
"G-4", "G-4", "G-5", "G-5", "G-5", "G-5", "G-5", "G-5", "G-6",
"G-6", "G-7", "G-7", "G-7", "G-7", "H-7", "H-7", "H-7", "H-8",
"H-8"), G_N = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 5L, 6L,
6L, 6L, 7L, 7L, 7L, 8L, 8L, 9L, 10L, 10L, 11L, 11L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 14L, 14L, 14L, 14L, 14L,
14L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L)), row.names = c(NA,
-51L), class = c("data.table", "data.frame"), .internal.selfref = "<pointer: 0x0c632498>")
Solution
library("data.table") # Working with a data.table object
Sol3$Pagenumber <- NA
Sol3$Pagenumber[1] <- 1
for (n in 2:length(Sol3$Location)) {
  # Start a new page when the grid ID changes; otherwise stay on the current page
  if (Sol3$G_ID[n - 1] != Sol3$G_ID[n]) Sol3$Pagenumber[n] <- Sol3$Pagenumber[n - 1] + 1 else Sol3$Pagenumber[n] <- Sol3$Pagenumber[n - 1]
  # A fifth consecutive location in the same grid also starts a new page
  if (n >= 5) { if (Sol3$G_ID[n - 4] == Sol3$G_ID[n]) Sol3$Pagenumber[n] <- Sol3$Pagenumber[n - 4] + 1 }
}
Sol3$Pagenumber # Desired result
Posted on 2021-02-23 02:38:39
How about this?
Base R
# Requires the zoo package for rollapplyr()
k <- 6
Sol3$Page2 <-
  Sol3$PG +
  cumsum(
    zoo::rollapplyr(Sol3$PG, k,
                    function(z) length(z) == k && z[1] != z[k] && all(z[-1] == z[k]),
                    partial = TRUE)
  )
Sol3
# Location PG Pagenumber Page2
# <char> <int> <num> <int>
# 1: 29N05W21H001M 1 1 1
# 2: 29N05W33A004M 1 1 1
# 3: 29N04W20A001M 2 2 2
# 4: 29N04W20A002M 2 2 2
# 5: 29N04W20A003M 2 2 2
# 6: 29N04W20A004M 2 2 2
# 7: 29N04W28D001M 3 3 3
# 8: 29N05W14L001M 3 3 3
# 9: 28N04W04P001M 3 3 3
# 10: 27N04W05G002M 4 4 4
# 11: 27N04W34P001M 5 5 5
# 12: 29N03W18M001M 6 6 6
# 13: 29N04W15E002M 6 6 6
# 14: 29N04W35B001M 6 6 6
# 15: 27N03W20C001M 7 7 7
# 16: 27N04W25Q001M 7 7 7
# 17: 27N04W35E001M 7 7 7
# 18: 26N03W17B001M 8 8 8
# 19: 26N04W25J001M 8 8 8
# 20: 25N03W19N001M 9 9 9
# 21: 27N03W10B001M 10 10 10
# 22: 27N03W16K003M 10 10 10
# 23: 27N02W31C001M 11 11 11
# 24: 27N03W23D001M 11 11 11
# 25: 25N03W10L001M 12 12 12
# 26: 25N03W10L002M 12 12 12
# 27: 25N03W10L003M 12 12 12
# 28: 25N03W10L004M 12 12 12
# 29: 25N03W10L005M 13 13 13
# 30: 25N03W11B001M 13 13 13
# 31: 25N03W11B002M 13 13 13
# 32: 25N03W11B003M 13 13 13
# 33: 27N02W30C002M 13 14 14
# 34: 27N02W30C003M 13 14 14
# 35: 26N02W14G001M 14 15 15
# 36: 26N02W15C001M 14 15 15
# 37: 26N02W16C001M 14 15 15
# 38: 26N02W17E001M 14 15 15
# 39: 26N02W21Q001M 15 16 16
# 40: 26N02W29R001M 15 16 16
# 41: 25N02W09G001M 15 16 16
# 42: 25N02W21B001M 15 16 16
# 43: 24N02W02E001M 16 17 17
# 44: 24N02W12P001M 16 17 17
# 45: 24N02W23G001M 16 17 17
# 46: 25N02W34K001M 16 17 17
# 47: 24N01W05J003M 17 18 18
# 48: 24N01W05Q002M 17 18 18
# 49: 25N01W32P001M 17 18 18
# 50: 24N01W18N001M 18 19 19
# 51: 24N02W25G001M 18 19 19
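#          Location    PG Pagenumber Page2
k <- 6 is specific to your n-4 reference: since you care about a PG being repeated more than 4 times (a look-back of 5), k needs to be at least 5. However, if we only looked back 5, we would detect the 5th, 6th, 7th, and so on, and each detection would increment the page again. To prevent that, we catch only the 5th repetition: the case where 5 of the 6 values in the window are the same but differ from the first value in the window.
Basically, the rolling window looks like: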
#  [1]  1  1  2  2  2  2  3  3  3  4  5  6  6  6  7  7  7  8  8  9 10 10 11 11 12
        ^  `-----------'
        | if these five are the same value,
        | and this value is different from the first --,
        |                                              |
        `----------------------------------------------'
          then we return a TRUE (effectively a 1)
Later in the sequence,
# [26] 12 12 12 13 13 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 18
             ^--`------------'
             the window of length six has one 12 and five 13s,
             the last five are all the same (trigger), and different from the first (trigger),
             and for safe keeping (since we need partial=TRUE), the length of the vector
             is six (trigger)
However, when the window shifts to the right,
# [26] 12 12 12 13 13 13 13 13 13 14 14 14 14 15 15 15 15 16 16 16 16 17 17 17 18
                ^--`------------'
                the last five are the same (trigger), but the first is not different (NO trigger)
The window itself produces a logical vector. Inside cumsum, those values are converted to integer and end up incrementing the page number as needed.
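To make the window rule concrete without zoo, here is a minimal base R sketch on a made-up vector. The name pg and its values are hypothetical toy data, not the PG column above, and the checks are written out by hand with vapply purely for illustration; this is not meant to replace the rollapplyr call.

```r
# Hypothetical toy data: two 1s, six 2s, two 3s
pg <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3)
k <- 6

# For each position i, look at the window pg[(i - k + 1):i].
# Flag TRUE only when the window is complete, its first value differs
# from its last, and every value after the first equals the last.
flag <- vapply(seq_along(pg), function(i) {
  if (i < k) return(FALSE)          # incomplete window, same role as length(z) == k
  z <- pg[(i - k + 1):i]
  z[1] != z[k] && all(z[-1] == z[k])
}, logical(1))

# cumsum() turns the logical flags into a running offset
page <- pg + cumsum(flag)
page
```

Only the fifth 2 is flagged, so the first four 2s stay on page 2 and everything after is shifted by one. Note that, as sketched, the rule fires once per run, so a run is split into at most two pages.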
data.table
Since you tagged data.table, here you go, though it is a little anticlimactic.
library(data.table)
setDT(Sol3)[, Page2 := PG +
  cumsum(
    zoo::rollapplyr(PG, k,
                    function(z) length(z) == k && z[1] != z[k] && all(z[-1] == z[k]),
                    partial = TRUE)
  ) ]
Posted on 2021-02-23 02:03:37
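A dplyr solution: first use each row's position within its PG group, via row_number(), to generate a page number within the group, then use that as a second grouping variable. The page number is then simply the group id.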
library(dplyr)
Sol3 %>%
  group_by(PG) %>%
  mutate(PageInGroup = floor((row_number() - 1) / 4)) %>%
  group_by(PageInGroup, .add = TRUE) %>%
  mutate(PageNum = cur_group_id())
https://stackoverflow.com/questions/66325734
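The chunk-within-group idea from the dplyr answer can also be sketched in base R. This is a rough illustration under assumptions: g_id is a hypothetical toy vector shaped like the first rows of G_ID in the question, and pages are numbered in order of first appearance, whereas cur_group_id() numbers groups by their sorted keys.

```r
# Hypothetical toy data modelled on the first rows of G_ID in the question
g_id <- c("C-2", "C-2", "D-2", "D-2", "D-2", "D-2", "D-2", "D-2", "D-3")

# Position of each row within its grid (1, 2, 3, ... per grid ID)
pos <- ave(seq_along(g_id), g_id, FUN = seq_along)

# Chunk index within the grid: rows 1-4 -> 0, rows 5-8 -> 1, ...
chunk <- (pos - 1L) %/% 4L

# One page per (grid, chunk) pair, numbered in order of first appearance
key <- paste(g_id, chunk, sep = "|")
page <- match(key, unique(key))
page
```

Unlike the rolling-window rule, this handles grids with more than 8 locations, because every block of four rows gets its own chunk index.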