Q: Sample limit in a sampling script?

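Stack Overflow user

Asked 2020-04-08 15:30:41

Answers: 1 · Views: 68 · Followers: 0 · Votes: 1

I'm writing a script that takes a sample from each category in an Excel file. The sampling percentage varies with the length of the data, but I'd like to know whether there is a way to set a minimum of 5 items per sample, even when 1% would return only, say, 2 items. Any help would be appreciated.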

import pandas as pd
df = pd.read_excel(r"C:\Users\****\Desktop\Audit_catalogs\****.xlsx")

df2 = df.loc[(df['Track Item']=='Y')]
print(len(df2))

def sample_per(df2):
    # Choose the sampling fraction from the total number of tracked rows.
    if len(df2) >= 15000:
        return df2.groupby('Category').apply(lambda x: x.sample(frac=0.01))
    elif len(df2) > 10000:
        return df2.groupby('Category').apply(lambda x: x.sample(frac=0.03))
    else:
        return df2.groupby('Category').apply(lambda x: x.sample(frac=0.05))


final = sample_per(df2)

df.loc[df['Retailer Item ID'].isin(final['Retailer Item ID']), 'Track Item'] = 'Audit'

df.to_csv('****_Audit.csv',index=False)

pandas

python

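1 Answer

Stack Overflow user

Accepted answer

Answered 2020-04-08 16:01:37

You can use x.size * 0.01 to check how many values the 1% sample would return, and use sample(n=5) instead of sample(frac=0.01) when that number is below 5.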

.apply(lambda x: x.sample(n=5) if x.size*0.01 < 5 else x.sample(frac=0.01))
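One caveat about the check above: on a DataFrame, size counts cells (rows times columns), so it matches the row count only when the frame has a single column, as in the demo that follows. For a multi-column frame such as the one in the question, len(x) is probably the safer thing to compare against. A minimal sketch with a made-up two-column frame (the values are purely illustrative):

```python
import pandas as pd

# Hypothetical three-row, two-column group, for illustration only.
group = pd.DataFrame({
    "Category": [1, 1, 1],
    "Retailer Item ID": ["a", "b", "c"],
})

n_cells = group.size  # rows * columns
n_rows = len(group)   # rows only

print(n_cells, n_rows)
```

With more columns the gap grows, so a threshold written against size would switch to the fractional sample too early.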

import pandas as pd
import random

random.seed(1)  # seed Python's random module so the generated categories are always the same

data = {'Category': [random.choice([1,2,2,2,3]) for x in range(1000)]} # columns
df = pd.DataFrame(data)
print(df)

# --- before ---
df1 = df.groupby('Category').apply(lambda x: x.sample(frac=0.01))
print('--- before ---')
print(df1['Category'].value_counts())

# --- after ---
df2 = df.groupby('Category').apply(lambda x: x.sample(n=5) if x.size*.01 < 5 else x.sample(frac=0.01))
print('--- after ---')
print(df2['Category'].value_counts())

Result

--- before ---
2    6
3    2
1    2
Name: Category, dtype: int64

--- after ---
2    6
3    5
1    5
Name: Category, dtype: int64

Edit: the same logic written in a more readable way

def myfunction(x):
    if x.size*0.01 < 5:
        return x.sample(n=5)
    else:
        return x.sample(frac=0.01)

df1 = df.groupby('Category').apply(myfunction)
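If some categories can hold fewer than 5 rows, sample(n=5) raises an error for them, so it may help to cap the sample at the group size. The sketch below is one possible way to combine that cap with the tiered fractions from the question. The helper names (pick_fraction, sample_with_floor, sample_per_category) and the demo data are assumptions made for illustration, not part of the original answer. It counts rows with len() and loops over the groups instead of calling apply.

```python
import pandas as pd


def pick_fraction(n_rows):
    """Sampling fraction by total row count, following the question's tiers."""
    if n_rows >= 15000:
        return 0.01
    if n_rows > 10000:
        return 0.03
    return 0.05


def sample_with_floor(group, frac, min_n=5, random_state=None):
    """Sample frac of the rows, at least min_n, but never more than exist."""
    wanted = max(min_n, round(frac * len(group)))
    n = min(wanted, len(group))
    return group.sample(n=n, random_state=random_state)


def sample_per_category(df, column="Category", min_n=5, random_state=None):
    """Apply the floor-and-cap sampling to every group and stack the results."""
    frac = pick_fraction(len(df))
    parts = [
        sample_with_floor(g, frac, min_n, random_state)
        for _, g in df.groupby(column)
    ]
    return pd.concat(parts)


# Hypothetical data: three categories with 600, 200 and 3 rows.
demo = pd.DataFrame({
    "Category": ["A"] * 600 + ["B"] * 200 + ["C"] * 3,
    "Retailer Item ID": range(803),
})

result = sample_per_category(demo, random_state=1)
counts = result["Category"].value_counts().to_dict()
print(counts)
```

With 803 rows in total the 5% tier applies, so A and B contribute 30 and 10 rows, while C, which has only 3 rows, contributes all 3 instead of failing.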

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/61104380
