Q: Split a sample dataset into equal numbers of positive and negative samples

Stack Overflow user

Asked 2018-01-21 09:03:55

1 answer, 881 views, 0 follows, 1 vote

I'm trying to draw a sample from a huge dataset in a memory-efficient way, such that the number of +ve samples equals the number of -ve samples.

The data has a 4:2 ratio of +ve to -ve, so I'm trying to produce a sample with a 2:2 ratio.

    A   B   C class   
0   0   1   2   0
1   3   4   5   0
2   6   7   8   1
3   9   10  11  1
4   12  13  14  1
5   15  16  17  1

Desired output:

    A   B   C   class   
0   0   1   2   0
1   3   4   5   0
2   6   7   8   1
3   9   10  11  1

I tried sampling it with Python code and the pandas value_counts function, but that approach is not memory-efficient.
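For the memory concern, one option (a sketch, not taken from the original thread) is to sample integer row positions with NumPy and copy only the selected rows once with `iloc`, rather than materializing a filtered DataFrame for each class. The helper name `balanced_sample`, its `label_col` and `seed` parameters, and the assumption that the labels are exactly 0 and 1 are all hypothetical; it also assumes the full frame already fits in memory.

```python
import numpy as np
import pandas as pd


def balanced_sample(data: pd.DataFrame, label_col: str = "class", seed=None) -> pd.DataFrame:
    """Hypothetical helper: return an equal number of rows for labels 0 and 1.

    Only integer row positions are sampled; the row data itself is copied once,
    by the final data.iloc[...] selection, with no intermediate per-class DataFrames.
    """
    labels = data[label_col].to_numpy()
    idx0 = np.flatnonzero(labels == 0)  # positions of class-0 rows
    idx1 = np.flatnonzero(labels == 1)  # positions of class-1 rows
    n = min(len(idx0), len(idx1))       # size of the smaller class
    rng = np.random.default_rng(seed)
    keep = np.concatenate([
        rng.choice(idx0, size=n, replace=False),  # n random class-0 positions
        rng.choice(idx1, size=n, replace=False),  # n random class-1 positions
    ])
    keep.sort()                         # keep the original row order
    return data.iloc[keep]


# The question's example data: 2 rows of class 0, 4 rows of class 1.
df = pd.DataFrame({
    "A": [0, 3, 6, 9, 12, 15],
    "B": [1, 4, 7, 10, 13, 16],
    "C": [2, 5, 8, 11, 14, 17],
    "class": [0, 0, 1, 1, 1, 1],
})
out = balanced_sample(df, seed=0)
print(out)
```

Because the smaller class here has only 2 rows, both class-0 rows are always kept and 2 of the 4 class-1 rows are chosen at random.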

Tags: python, pandas, dataframe, machine-learning

1 Answer

Stack Overflow user

Answered 2018-01-21 09:13:01

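import pandas as pd

positive = data[data['class'] == 0]  # class-0 rows (the smaller class here): keep all of them
negative = data[data['class'] == 1].sample(n=positive.shape[0])  # random class-1 rows, same count
final = pd.concat([positive, negative])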
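The first approach assumes class 0 is the smaller class. A variant that works whichever class is smaller can be sketched with `DataFrameGroupBy.sample` (available in pandas 1.1 and later); this is a substitute technique, not the answerer's code, and it makes no claim to use less memory. The example frame simply rebuilds the question's data, and the `random_state` value is arbitrary.

```python
import pandas as pd

# Rebuild the question's example data: 2 rows of class 0, 4 rows of class 1.
data = pd.DataFrame({
    "A": [0, 3, 6, 9, 12, 15],
    "B": [1, 4, 7, 10, 13, 16],
    "C": [2, 5, 8, 11, 14, 17],
    "class": [0, 0, 1, 1, 1, 1],
})

# Size of the smallest class (2 here), whichever label it has.
n = int(data.groupby("class").size().min())

# Draw n random rows from every class; random_state only makes the draw repeatable.
final = data.groupby("class").sample(n=n, random_state=42)
print(final.sort_index())
```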

Or:

import numpy as np
positive_len = np.sum(data['class'] == 0)  # number of rows with class 0
# Class-0 rows sort to the top; keep them plus the first positive_len class-1 rows (a fixed pick, not random)
final = data.sort_values('class', kind='mergesort').iloc[:2 * positive_len]
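The sort-and-slice approach is easy to check on the question's example data. With a stable sort (`kind="mergesort"`) the class-1 rows keep their original order, so the slice returns exactly the desired output (rows 0 to 3). If a random choice of class-1 rows is wanted instead, one option (an addition, not part of the original answer) is to shuffle with `sample(frac=1)` before sorting; this makes a full shuffled copy of the frame, so it is not a memory-saving option for a huge dataset. The `random_state` value is arbitrary.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example data: 2 rows of class 0, 4 rows of class 1.
data = pd.DataFrame({
    "A": [0, 3, 6, 9, 12, 15],
    "B": [1, 4, 7, 10, 13, 16],
    "C": [2, 5, 8, 11, 14, 17],
    "class": [0, 0, 1, 1, 1, 1],
})

positive_len = np.sum(data["class"] == 0)  # number of class-0 rows

# Deterministic pick: all class-0 rows, then the first positive_len class-1 rows.
final = data.sort_values("class", kind="mergesort").iloc[:2 * positive_len]
print(final)  # rows 0, 1, 2, 3, matching the desired output

# Random pick of class-1 rows: shuffle first (a full copy), then stable-sort and slice.
shuffled_pick = (
    data.sample(frac=1, random_state=0)
    .sort_values("class", kind="mergesort")
    .iloc[:2 * positive_len]
)
print(shuffled_pick)
```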

Votes: 1

Original page content provided by Stack Overflow; translation supported by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/48362692
