Filtering data based on rank and conditions

Stack Overflow user
Asked on 2019-10-13 23:52:22
1 answer, 44 views, 0 followers, 1 vote

I have some data that looks similar to the following:

# A tibble: 2,717 x 6
# Groups:   date [60]
   symbol date       monthly.returns score totals score_rank
   <chr>  <date>               <dbl> <dbl>  <dbl>      <int>
 1 GIS    2010-01-29        0.0128   0.436  119.           2
 2 GIS    2010-02-26        0.00982  0.205  120.           1
 3 GIS    2010-03-31       -0.0169   0.549   51.1          3
 4 GIS    2010-04-30        0.0123   0.860   28.0          4
 5 GIS    2010-05-28        0.000984 0.888   91.6          4
 6 GIS    2010-06-30       -0.00267  0.828   15.5          4
 7 GIS    2010-07-30       -0.0297   0.482   81.7          2
 8 GIS    2010-08-31        0.0573   0.408   57.2          3
 9 GIS    2010-09-30        0.0105   0.887   93.3          4
10 GIS    2010-10-29        0.0357   0.111   96.6          1
# ... with 2,707 more rows

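I have a score_rank column. Whenever the totals column is greater than 100, I want to filter the data as follows:

1) when score_rank = 1, take the top 5% of observations based on the score column

2) when score_rank = 2 or 3, take a random 5% sample of the observations

3) when score_rank = 4, take the bottom 5% of observations based on the score column.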
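
To make the three rules concrete, here is a minimal base R sketch on a plain numeric vector. It assumes "top 5%" means values at or above the 95th percentile and "bottom 5%" means values at or below the 5th percentile. The vector score and the fixed seed are made up for illustration and are not the data from the question.

```r
set.seed(1)            # assumption: fixed seed, only so the sketch is reproducible
score <- runif(200)    # hypothetical scores, not the question's data
n <- length(score)

# Rule 1: top 5% = values at or above the 95th percentile
top5 <- score[score >= quantile(score, probs = 0.95)]

# Rule 2: random 5% = round(0.05 * n) positions drawn without replacement
k <- round(0.05 * n)
random5 <- score[sample.int(n, size = k)]

# Rule 3: bottom 5% = values at or below the 5th percentile
bottom5 <- score[score <= quantile(score, probs = 0.05)]

# Number of observations kept by each rule
lengths(list(top = top5, random = random5, bottom = bottom5))
```

On small groups, round(0.05 * n) can be 0, in which case the random rule keeps nothing; whether that is acceptable depends on the use case.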

Data:

tickers <- c("GIS", "KR", "MKC", "SJM", "EL", "HRL", "HSY", "K", 
             "KMB", "MDLZ", "MNST", "PEP", "PG", "PM", "SYY", "TAP", "TSN", "WBA", "WMT",
             "MMM", "ABMD", "ACN", "AMD", "AES", "AON", "ANTM", "APA", "CSCO", "CMS", "KO", "GRMN", "GPS",
             "JEC", "SJM", "JPM", "JNPR", "KSU", "KEYS", "KIM", "NBL", "NEM", "NWL", "NFLX", "NEE", "NOC", "TMO", "TXN", "TWTR")

library(tidyquant)
data <- tq_get(tickers,
               get = "stock.prices",              # Collect the stock price data from 2010 - 2015
               from = "2010-01-01",
               to = "2015-01-01") %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,                 # Convert the data from daily prices to monthly prices
               mutate_fun = periodReturn,
               period = "monthly",
               type = "arithmetic")

data$score <- runif(nrow(data), min = 0, max = 1)
data$totals <- runif(nrow(data), min = 10, max = 150)

data <- data %>%
  group_by(date) %>%
  mutate(
    score_rank = ntile(score, 4)
  )

Edit: added code.


1 Answer

Stack Overflow user

Accepted answer

Posted on 2019-10-14 00:59:58

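Here is one option using filter. Create a list of functions (fs), one for each set of 'score_rank' values, and use map2 to loop over the list of 'score_rank' vectors and the list of functions in parallel. In each iteration, filter the rows where 'totals' is greater than 100 and 'score_rank' is %in% the vector supplied by map2, then apply the matching function to the 'score' column to filter that subset further. Finally, bind the subsetted data with the rows of the data where 'totals' is less than or equal to 100.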

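library(purrr)
library(dplyr)

# One predicate per rule; each returns a logical vector over the rows in scope
fs <- list(
  as_mapper(~ . >= quantile(., probs = 0.95)),                             # rank 1: top 5% by score
  as_mapper(~ row_number() %in% sample(row_number(), round(0.05 * n()))),  # ranks 2-3: random 5%
  as_mapper(~ . <= quantile(., probs = 0.05))                              # rank 4: bottom 5% by score
)

# Pair each set of ranks (.x) with its predicate (.y), filter the rows with
# totals > 100, then append the untouched rows with totals <= 100
map2_df(list(1, c(2, 3), 4), fs, ~
  data %>%
    filter(totals > 100, score_rank %in% .x) %>%
    filter(.y(score))
) %>%
  bind_rows(data %>% filter(totals <= 100))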
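
The question's data has to be downloaded with tq_get, so here is a rough, self-contained sketch that applies the same three rules to simulated data using dplyr only. It is an alternative formulation rather than the answer's method: toy, keep_5pct and the bucket column are hypothetical names introduced for illustration, and the selection here is pooled across all dates.

```r
library(dplyr)

set.seed(123)  # assumption: fixed seed, only so the sketch is reproducible

# Simulated stand-in for the question's data: 40 symbols on 3 dates
toy <- tibble(
  symbol = rep(paste0("S", 1:40), times = 3),
  date   = rep(as.Date(c("2010-01-29", "2010-02-26", "2010-03-31")), each = 40),
  score  = runif(120),
  totals = runif(120, min = 10, max = 150)
) %>%
  group_by(date) %>%
  mutate(score_rank = ntile(score, 4)) %>%
  ungroup()

# Hypothetical helper: logical keep-vector for the scores of one bucket
keep_5pct <- function(score, bucket) {
  n <- length(score)
  if (bucket == "top") {
    score >= quantile(score, probs = 0.95, names = FALSE)
  } else if (bucket == "bottom") {
    score <= quantile(score, probs = 0.05, names = FALSE)
  } else {
    seq_len(n) %in% sample.int(n, size = round(0.05 * n))
  }
}

# Rows with totals > 100, labelled by the rule that applies to their rank
high <- toy %>%
  filter(totals > 100) %>%
  mutate(bucket = case_when(
    score_rank == 1 ~ "top",
    score_rank == 4 ~ "bottom",
    TRUE            ~ "random"   # ranks 2 and 3
  ))

# Apply the matching rule within each bucket
picked <- high %>%
  group_by(bucket) %>%
  filter(keep_5pct(score, bucket[1])) %>%
  ungroup() %>%
  select(-bucket)

# Rows with totals <= 100 pass through unchanged
result <- bind_rows(picked, filter(toy, totals <= 100))
```

Note that data in the question is still grouped by date when the answer's code runs, so those filters are evaluated within each date. To get per-date selection in this sketch, group by date and bucket instead of bucket alone.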
Votes: 2
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/58365244
