
Create a month ID column

Stack Overflow user
Asked 2021-09-13 16:06:01
2 answers, 61 views, 0 followers, score 1

I have a data frame that currently looks like this:

 Month      Park             
   <date>     <chr>            
  2019-04-01 Arbour Lake East   
  2019-07-01 Arbour Lake East             
  2019-07-01 Arbour Lake East                      
  2019-09-01 Arbour Lake East                         
  2019-09-01 Arbour Lake East                       
  2019-10-01 Arbour Lake East                       
  2020-01-01 Arbour Lake East                        
  2020-01-01 Arbour Lake East                       
  2020-02-01 Arbour Lake East                       
  2020-02-01 Arbour Lake East                    
  2020-03-01 Arbour Lake East              
  2020-04-01 Arbour Lake East                 
  2020-05-01 Arbour Lake East            
  2020-11-01 Arbour Lake East        
  2020-12-01 Arbour Lake East                      
  2021-04-01 Arbour Lake East               
  2019-09-01 Arbour Lake West                
  2019-09-01 Arbour Lake West             
  2019-10-01 Arbour Lake West                
  2020-05-01 Arbour Lake West 

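I want to create a new column, a month ID, where 1 is the first month found in a given park and 2 is the second month found in that same park, regardless of whether those months are actually consecutive. For example, month 1 might be September because that is the first month recorded for Nosehill Gravel Pit park, and month 2 might be November because that is the second month recorded for that park. The same IDs (1, 2, 3, ...) will appear in different parks, because each ID only represents the position of a month within its own park. Rows with exactly the same month and year within the same park should receive the same ID.

Here is what I would like the column to look like: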

 Month    Month_id  Park                        
  2019-04-01  01 Arbour Lake East   
  2019-07-01  02 Arbour Lake East            
  2019-07-01  02 Arbour Lake East                      
  2019-09-01  03 Arbour Lake East                         
  2019-09-01  03 Arbour Lake East                       
  2019-10-01  04 Arbour Lake East                       
  2020-01-01  05 Arbour Lake East                        
  2020-01-01  05 Arbour Lake East                       
  2020-02-01  06 Arbour Lake East                       
  2020-02-01  06 Arbour Lake East                    
  2020-03-01  07 Arbour Lake East              
  2020-04-01  08 Arbour Lake East                 
  2020-05-01  09 Arbour Lake East            
  2020-11-01  10 Arbour Lake East         
  2020-12-01  11 Arbour Lake East                      
  2021-04-01  12 Arbour Lake East               
  2019-09-01  01 Arbour Lake West                
  2019-09-01  01 Arbour Lake West             
  2019-10-01  02 Arbour Lake West                
  2020-05-01  03 Arbour Lake West 

I really don't know how to do this, so any pointers would be appreciated!

More information:

> dput(Data.frame[1:4])
structure(list(Month = structure(c(18383, 18383, 18414, 18414, 
18444, 18718, 18322, 18687, 18687, 18293, 18293, 18383, 18444, 
18475, 18506, 18536, 18567, 18567, 18628, 18748, 18748, 18779, 
18809, 18078, 18078, 18109, 18109, 18628, 18628, 18444, 18444, 
18475), class = "Date"), Park = c("Aspen Heights", "Aspen Heights", 
"Aspen Heights", "Aspen Heights", "Aspen Heights", "Aspen Heights", 
"Auburn Bay", "Auburn Bay", "Auburn Bay", "Bayview", "Bayview", 
"Bayview", "Bayview", "Bayview", "Bayview", "Bayview", "Bayview", 
"Bayview", "Bayview", "Bayview", "Bayview", "Bayview", "Bayview", 
"Cranston", "Cranston", "Cranston", "Cranston", "Cranston", "Cranston", 
"Currie Barracks", "Currie Barracks", "Currie Barracks"), Aggr_Code = c("1", 
"2", "1", "2", "1", "1", "1", "1", "2", "1", "2", "1", "1", "1", 
"1", "1", "1", "2", "1", "1", "2", "1", "1", "1", "2", "1", "2", 
"1", "2", "1", "2", "1"), AC_events_per_month = c(4, 1, 4, 1, 
2, 1, 1, 2, 1, 1, 1, 1, 3, 2, 4, 2, 6, 2, 3, 1, 1, 1, 1, 8, 4, 
2, 1, 3, 3, 2, 1, 1)), row.names = c(NA, -32L), groups = structure(list(
    Month = structure(c(18078, 18109, 18293, 18322, 18383, 18383, 
    18414, 18444, 18444, 18444, 18475, 18475, 18506, 18536, 18567, 
    18628, 18628, 18687, 18718, 18748, 18779, 18809), class = "Date"), 
    Park = c("Cranston", "Cranston", "Bayview", "Auburn Bay", 
    "Aspen Heights", "Bayview", "Aspen Heights", "Aspen Heights", 
    "Bayview", "Currie Barracks", "Bayview", "Currie Barracks", 
    "Bayview", "Bayview", "Bayview", "Bayview", "Cranston", "Auburn Bay", 
    "Aspen Heights", "Bayview", "Bayview", "Bayview"), .rows = structure(list(
        24:25, 26:27, 10:11, 7L, 1:2, 12L, 3:4, 5L, 13L, 30:31, 
        14L, 32L, 15L, 16L, 17:18, 19L, 28:29, 8:9, 6L, 20:21, 
        22L, 23L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -22L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

2 Answers

Stack Overflow user

Accepted answer

Answered 2021-09-13 17:39:18

library(tidyverse)
df %>%
  group_by(Park) %>%
  # factor() sorts the unique months within each park; the integer codes become the IDs
  mutate(ID = sprintf("%02d", as.integer(factor(Month))))
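
The code above works because factor() uses the sorted unique months of each park as its levels, so as.integer() returns each month's position among them. As a hedged alternative, the same per-park numbering can be sketched with dplyr::dense_rank(), which gives tied values the same rank and leaves no gaps. The small data frame below is made up for illustration, and the column name Month_id follows the question rather than the answer.

```r
library(dplyr)

# Small made-up sample in the same shape as the question's data
df <- data.frame(
  Month = as.Date(c("2019-04-01", "2019-07-01", "2019-07-01",
                    "2019-09-01", "2019-09-01", "2019-10-01")),
  Park  = c("Arbour Lake East", "Arbour Lake East", "Arbour Lake East",
            "Arbour Lake West", "Arbour Lake West", "Arbour Lake West")
)

res <- df %>%
  group_by(Park) %>%
  # dense_rank() ranks months within each park; identical months share a rank
  mutate(Month_id = sprintf("%02d", dense_rank(Month))) %>%
  ungroup()

res$Month_id
```

Because the ranking is done on the Date values themselves, the rows do not need to be sorted beforehand.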

In base R, you could do:

transform(df, ID = ave(as.character(Month), Park, FUN = ordered))
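
A more explicit base R variant, offered only as a sketch, looks up each month's position among the sorted unique months of its park with match(). The sample data and the Month_id column name are assumptions made for illustration.

```r
# Small made-up sample in the same shape as the question's data
df <- data.frame(
  Month = as.Date(c("2019-04-01", "2019-07-01", "2019-07-01",
                    "2019-09-01", "2019-09-01", "2019-10-01")),
  Park  = c("Arbour Lake East", "Arbour Lake East", "Arbour Lake East",
            "Arbour Lake West", "Arbour Lake West", "Arbour Lake West")
)

# Within each park, find where each month sits among that park's sorted unique months
id <- ave(as.numeric(df$Month), df$Park,
          FUN = function(m) match(m, sort(unique(m))))

# Format as zero-padded text, as in the question's desired output
df$Month_id <- sprintf("%02d", as.integer(id))

df
```

Keeping the IDs numeric until the final sprintf() call makes the zero padding an explicit, separate step.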
Score: 0

Stack Overflow user

Answered 2021-09-13 16:42:26

Here is a solution using the datastep() function from the libr package.

First, create the sample data:

# Create data
df <- read.table(header = TRUE, text = '
 Month      Park             
  2019-04-01 "Arbour Lake East"   
  2019-07-01 "Arbour Lake East"             
  2019-07-01 "Arbour Lake East"                      
  2019-09-01 "Arbour Lake East"                         
  2019-09-01 "Arbour Lake East"                       
  2019-10-01 "Arbour Lake East"                       
  2020-01-01 "Arbour Lake East"                        
  2020-01-01 "Arbour Lake East"                       
  2020-02-01 "Arbour Lake East"                       
  2020-02-01 "Arbour Lake East"                    
  2020-03-01 "Arbour Lake East"              
  2020-04-01 "Arbour Lake East"                 
  2020-05-01 "Arbour Lake East"            
  2020-11-01 "Arbour Lake East"        
  2020-12-01 "Arbour Lake East"                      
  2021-04-01 "Arbour Lake East"               
  2019-09-01 "Arbour Lake West"                
  2019-09-01 "Arbour Lake West"             
  2019-10-01 "Arbour Lake West"                
  2020-05-01 "Arbour Lake West"')
 
df$Month <- as.Date(df$Month)

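Second, generate the ID column. The data step loops through the data frame row by row. The by parameter sets up by-groups on both Month and Park, so first. is TRUE on the first row of each Month/Park combination. You can then use the data[n. - 1, "Park"] construct to compare the current Park value with the previous row's and reset the ID whenever the park changes.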

library(libr)

# Perform datastep to calculate id
df2 <- datastep(df, by = c("Month", "Park"),
                retain = list(Month_id = 0),
                keep = c("Month", "Month_id", "Park"),
                {
                  if (n. > 1) {
                    if (Park != data[n. - 1, "Park"])
                      Month_id <- 0
                  }
                  
                  if (first.) {
                  
                    Month_id <- Month_id + 1
                    
                  }
                })

# Add leading zero to id
df2$Month_id <- sprintf("%02d", df2$Month_id)

Here are the results:

df2
#         Month Month_id             Park
# 1  2019-04-01       01 Arbour Lake East
# 2  2019-07-01       02 Arbour Lake East
# 3  2019-07-01       02 Arbour Lake East
# 4  2019-09-01       03 Arbour Lake East
# 5  2019-09-01       03 Arbour Lake East
# 6  2019-10-01       04 Arbour Lake East
# 7  2020-01-01       05 Arbour Lake East
# 8  2020-01-01       05 Arbour Lake East
# 9  2020-02-01       06 Arbour Lake East
# 10 2020-02-01       06 Arbour Lake East
# 11 2020-03-01       07 Arbour Lake East
# 12 2020-04-01       08 Arbour Lake East
# 13 2020-05-01       09 Arbour Lake East
# 14 2020-11-01       10 Arbour Lake East
# 15 2020-12-01       11 Arbour Lake East
# 16 2021-04-01       12 Arbour Lake East
# 17 2019-09-01       01 Arbour Lake West
# 18 2019-09-01       01 Arbour Lake West
# 19 2019-10-01       02 Arbour Lake West
# 20 2020-05-01       03 Arbour Lake West
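
For readers without libr installed, the retain-and-reset logic of the data step above can be sketched as a plain R loop. This is an illustration of the idea, not how datastep() is implemented; the sample data is made up, and the sketch assumes the rows are sorted by Park and then Month.

```r
# Small made-up sample in the same shape as the question's data
df <- data.frame(
  Month = as.Date(c("2019-04-01", "2019-07-01", "2019-07-01",
                    "2019-09-01", "2019-09-01", "2019-10-01")),
  Park  = c("Arbour Lake East", "Arbour Lake East", "Arbour Lake East",
            "Arbour Lake West", "Arbour Lake West", "Arbour Lake West")
)

# The loop relies on row order, so sort by park and month first
df <- df[order(df$Park, df$Month), ]

month_id <- integer(nrow(df))
counter <- 0L  # plays the role of the retained Month_id

for (i in seq_len(nrow(df))) {
  new_park  <- i == 1L || df$Park[i] != df$Park[i - 1L]
  new_month <- new_park || df$Month[i] != df$Month[i - 1L]

  if (new_park)  counter <- 0L            # reset at the start of each park
  if (new_month) counter <- counter + 1L  # first row of a new Park/Month group

  month_id[i] <- counter
}

# Add the leading zero, as in the answer
df$Month_id <- sprintf("%02d", month_id)

df
```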

You can also do this with dplyr, but I'll let someone else answer that.

Score: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/69165884
