我在r中有下面提到的df:
df<-
ID Date Instance Price Volume Grade
K-1 2021-07-01 04:01:20 1 1000 2 5000
K-1 2021-07-02 22:01:29 2 1000 1.4 4500
K-1 2021-07-04 13:05:04 3 800 1.4 4500
K-2 2021-07-01 09:11:26 1 7000 1.5 3500
K-2 2021-07-01 14:01:40 2 2500 1.5 3000
K-3 2021-07-05 12:23:24 1 1000 2.1 2000
K-4 2021-07-06 14:11:40 1 15000 2.5 1500
K-4 2021-07-08 14:19:25 2 18000 2.7 2500通过使用上面的dataframe,我需要根据特定的条件导出以下细节。
在哪里,我需要根据Price列,Volume和Grade中的变化是在更高的方面还是在更低的方面来识别。Instance只不过是一个特殊的惟一ID的原始修改。
Diff_DateTime是唯一ID的最后一个实例减去该ID的第一个实例之间的日期差异。如果小于24小时,则生成一个值(以小时为单位)或在Days.Rng_Price中为桶0-500的差值范围,500-1500、1500-3000和3000+Rng_Vol为桶0-0.5的差值范围,0.5-1、1-2和2+Rng_Grd为桶0-5001, 500-1000, 1000-2000and2000+`的差值范围。
初始和最终是基于实例的第一和最后的价格、数量和等级值。
所需Df<-
ID Count_Instance Chng_Pri Chng_Vol Chng_Grd Diff_DateTime Rng_Pric Rng_Vol Rng_Grd Initial_Pri Final_Pri Initial_Vol Final_Vol Initial_Grd Final_Grd
K-1 3 Low Low Low 3.5 Days 0-500 0.5-1.0 0-500 1000 800 2 1.4 5000 4500
K-2 2 Low Constant Low 5h 3000+ 0 0-500 7000 2500 1.5 1.5 3500 3000
K-3 1 No Inst No Inst No Inst No Inst No Inst No Inst No Inst No Inst No Inst No Inst No Inst No Inst No Inst
K-4 2 High High High 2 Days 1500-3000 0-0.5 500-1000 15000 18000 2.5 2.7 1500 2500发布于 2021-07-29 19:42:11
你可以用
library(tidyverse)
df %>%
group_by(ID) %>%
arrange(ID, Instance) %>%
summarise(
Count_Instance = n(),
Diff_Date = if_else(Count_Instance != 1,
difftime(last(Date), first(Date), units=c("hours")),
NA_real_),
Rng_Price = case_when(
Count_Instance == 1 ~ NA_character_,
abs(last(Price) - first(Price)) <= 500 ~ "0-500",
abs(last(Price) - first(Price)) <= 1500 ~ "500 - 1500",
abs(last(Price) - first(Price)) <= 3000 ~ "1500 - 3000",
TRUE ~ "3000+"
),
Rng_Vol = case_when(
Count_Instance == 1 ~ NA_character_,
abs(last(Volume) - first(Volume)) == 0 ~ "0",
abs(last(Volume) - first(Volume)) <= 0.5 ~ "0 - 0.5",
abs(last(Volume) - first(Volume)) <= 1 ~ "0.5 - 1",
abs(last(Volume) - first(Volume)) <= 2 ~ "1 - 2",
TRUE ~ "2+"
),
Rng_Grd = case_when(
Count_Instance == 1 ~ NA_character_,
abs(last(Grade) - first(Grade)) == 0 ~ "0",
abs(last(Grade) - first(Grade)) <= 500 ~ "0 - 500",
abs(last(Grade) - first(Grade)) <= 1000 ~ "500 - 1000",
abs(last(Grade) - first(Grade)) <= 2000 ~ "1000 - 2000",
TRUE ~ "2000+"
),
Initital_Pri = if_else(Count_Instance != 1, first(Price), NA_real_),
Final_Pri = if_else(Count_Instance != 1, last(Price), NA_real_),
Initial_Vol = if_else(Count_Instance != 1, first(Volume), NA_real_),
Final_Vol = if_else(Count_Instance != 1, last(Volume), NA_real_),
Initial_Grd = if_else(Count_Instance != 1, first(Grade), NA_real_),
Final_Grd = if_else(Count_Instance != 1, last(Grade), NA_real_)
)回传
# A tibble: 4 x 12
ID Count_Instance Diff_Date Rng_Price Rng_Vol Rng_Grd Initital_Pri Final_Pri Initial_Vol Final_Vol
<chr> <int> <drtn> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 K-1 3 81.062222 hours 0-500 0.5 - 1 0 - 500 1000 800 2 1.4
2 K-2 2 4.837222 hours 3000+ 0 0 - 500 7000 2500 1.5 1.5
3 K-3 1 NA hours NA NA NA NA NA NA NA
4 K-4 2 48.129167 hours 1500 - 3000 0 - 0.5 500 - 10~ 15000 18000 2.5 2.7
# ... with 2 more variables: Initial_Grd <dbl>, Final_Grd <dbl>缺少三件事:
您需要一些函数才能将provided.
,所以不能将No Inst放入具有实数的列中,除非您想将这些数字转换为character/string.。
数据
df <- structure(list(ID = c("K-1", "K-1", "K-1", "K-2", "K-2", "K-3",
"K-4", "K-4"), Date = structure(c(1625104880, 1625256089, 1625396704,
1625123486, 1625140900, 1625480604, 1625573500, 1625746765), class = c("POSIXct",
"POSIXt"), tzone = ""), Instance = c(1, 2, 3, 1, 2, 1, 1, 2),
Price = c(1000, 1000, 800, 7000, 2500, 1000, 15000, 18000
), Volume = c(2, 1.4, 1.4, 1.5, 1.5, 2.1, 2.5, 2.7), Grade = c(5000,
4500, 4500, 3500, 3000, 2000, 1500, 2500)), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))https://stackoverflow.com/questions/68581067
复制相似问题