Q: R data.table - sample by group with different sampling proportion

Stack Overflow user

Asked 2019-10-15 13:21:56

1 answer, 623 views, 0 followers, 1 vote

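I would like to draw a random sample from a data.table group by group, but it should be possible to sample a different proportion from each group.

If I wanted to sample the fraction sampling_fraction from every group, I could take inspiration from this question and its related answers and do something like this: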

library(data.table)

DT = data.table(a = sample(1:2), b = sample(1:1000,20))

group_sampler <- function(data, group_col, sample_fraction){
  # this function samples sample_fraction <0,1> from each group in the data.table
  # inputs:
  #   data - data.table
  #   group_col - column(s) used to group by
  #   sample_fraction - a value between 0 and 1 indicating what % of each group should be sampled
  data[,.SD[sample(.N, ceiling(.N*sample_fraction))],by = eval(group_col)]
}

# what % of data should be sampled
sampling_fraction = 0.5

# perform the sampling
sampled_dt <- group_sampler(DT, 'a', sampling_fraction)

But what if I wanted to sample 10% from group 1 and 50% from group 2?

Tags: data.table, oversampling

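1 Answer

Stack Overflow user

Accepted answer

Answered 2019-10-15 13:33:38

You can use .GRP, but you need to make sure the fractions are matched to the correct groups. You may want to define group_col as a factor variable.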

group_sampler <- function(data, group_col, sample_fractions) {
  # this function samples a group-specific fraction <0,1> from each group in the data.table
  # inputs:
  #   data - data.table
  #   group_col - column used to group by
  #   sample_fractions - vector of values between 0 and 1, one per group (in sorted group order),
  #                      indicating what fraction of each group should be sampled
  stopifnot(length(sample_fractions) == uniqueN(data[[group_col]]))
  data[, .SD[sample(.N, ceiling(.N*sample_fractions[.GRP]))], keyby = group_col]
}
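
As a quick check of the .GRP version, here is a minimal sketch. The toy data, the seed, and the `counts` variable are assumptions for illustration, not part of the original answer. Because the function groups with keyby, groups are processed in sorted order, so the first fraction applies to the smallest group value.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed only so the run is repeatable

# hypothetical toy data: 10 rows in group 1 and 10 rows in group 2
DT <- data.table(a = rep(1:2, each = 10), b = sample(1:1000, 20))

group_sampler <- function(data, group_col, sample_fractions) {
  # one fraction per group is required
  stopifnot(length(sample_fractions) == uniqueN(data[[group_col]]))
  # .GRP is the running group counter; with keyby it follows sorted group order
  data[, .SD[sample(.N, ceiling(.N * sample_fractions[.GRP]))], keyby = group_col]
}

# 10% of group 1, 50% of group 2
sampled_dt <- group_sampler(DT, "a", c(0.1, 0.5))

# rows kept per group: ceiling(10 * 0.1) and ceiling(10 * 0.5)
counts <- sampled_dt[, .N, by = a]
print(counts)
```

Note that ceiling() rounds up, so a small group always contributes at least one row whenever its fraction is above zero.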

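Edit in response to chinsoon12's comment:

It would be safer (rather than relying on the correct order) to use this as the last line of the function:

data[, .SD[sample(.N, ceiling(.N*sample_fractions[[unlist(.BY)]]))], keyby = group_col]

Then pass sample_fractions as a named vector:

group_sampler(DT, 'a', sample_fractions= c(x = 0.1, y = 0.9))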
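
One detail worth checking when using the named-vector lookup: if the group column is numeric, `[[` receives a number and indexes sample_fractions by position, not by name. The sketch below is not part of the original answer. It converts the group value with as.character() so the lookup always goes by name. The helper name `group_sampler_named`, the name check, and the toy data are assumptions for illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the run is repeatable

# hypothetical toy data: integer group column, 10 rows per group
DT <- data.table(a = rep(c(2L, 1L), each = 10), b = sample(1:1000, 20))

# hypothetical helper: looks each fraction up by the group's name
group_sampler_named <- function(data, group_col, sample_fractions) {
  # every group value must have a matching name in sample_fractions
  stopifnot(setequal(names(sample_fractions),
                     as.character(unique(data[[group_col]]))))
  data[, .SD[sample(.N, ceiling(.N * sample_fractions[[as.character(unlist(.BY))]]))],
       keyby = group_col]
}

# the order of the vector does not matter here, only the names do
res <- group_sampler_named(DT, "a", c("2" = 0.5, "1" = 0.1))

# rows kept per group: ceiling(10 * 0.1) for group 1, ceiling(10 * 0.5) for group 2
counts_named <- res[, .N, keyby = a]
print(counts_named)
```

This sketch assumes a single grouping column. With several grouping columns, unlist(.BY) has more than one element and a different key (for example a pasted combination of the values) would be needed.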

4 votes

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/58395772
