我是Python的新手,目前正在尝试使用Pandas从我的CSV文件中删除符合特定条件的行。
CSV file1结构:
id product userCount warehouse date
1 Guitar 0 D-12 2020-03-10
2 Drums 1 F-1 2020-03-10
3 Piano 1 E-9 2020-03-10
1 Guitar 0 D-12 2020-03-09
2 Drums 12 F-1 2020-03-09
3 Piano 0 E-9 2020-03-09
1 Guitar 0 D-12 2020-03-08
2 Drums 10 F-1 2020-03-08
3 Piano 0 E-9 2020-03-08
...我的目标(条件):将所有行中userCount为0的特定in的记录(result1)与所有行中userCount中至少为1或更多的其余行(result2)分开。
result1:
id product userCount warehouse date
1 Guitar 0 D-12 2020-03-10
1 Guitar 0 D-12 2020-03-09
1 Guitar 0 D-12 2020-03-08
...result2:
id product userCount warehouse date
2 Drums 1 F-1 2020-03-10
3 Piano 1 E-9 2020-03-10
2 Drums 12 F-1 2020-03-09
3 Piano 0 E-9 2020-03-09
2 Drums 10 F-1 2020-03-08
3 Piano 0 E-9 2020-03-08
...作为期望的输出,我需要两个单独的文件。包含result1行的Output1.csv和包含result2行的Output2.csv。
到目前为止,我的代码如下。然而,不知何故,它将一些只有0值的行移动到Output2.csv中,而这些行根本不应该有这些值。有人能帮帮忙吗?
import pandas as pd
import os
#Read input file
df = pd.read_csv('path/to/my/input.csv')
#separate IDs that are =0 and >0 into separate files
mask = df['id'].isin(df.loc[df['userCount'] != 0, 'id'])
ids1 = df.loc[~mask, 'id'].unique()
ids2 = df.loc[mask, 'id'].unique()
df1 = pd.DataFrame({'id':ids1, 'userCount':0})
df2 = pd.DataFrame({'id':ids2, 'userCount':'>0'})
df1.to_csv('path/to/my/temp/file/zeroFlagID.csv')
df2.to_csv('path/to/my/temp/file/NonZeroFlagID.csv')
fields=['id','product','userCount','warehouse','date']
#read IDs from temp file and merge with relevant columns from input file
df3 = pd.read_csv('path/to/my/temp/file/zeroFlagID.csv', sep = ',')
df4 = pd.read_csv('path/to/my/input.csv', sep = ',', usecols=fields)
df5 = df3.merge(df4, on="id")
df5.to_csv('path/to/my/output1.csv', index = False)
fields2=['id','product','userCount','warehouse','date']
#read IDs from temp file and merge with relevant columns from input file
df6 = pd.read_csv('path/to/my/temp/file/NonZeroFlagID.csv', sep = ',')
df7 = pd.read_csv('path/to/my/input.csv', sep = ',', usecols=fields2)
df8 = df6.merge(df7, on="id")
df8.to_csv('path/to/my/output2.csv', index = False)
#remove temp files
os.remove('path/to/my/temp/file/NonZeroFlagID.csv')
os.remove('path/to/my/temp/file/zeroFlagID.csv')发布于 2020-03-12 16:28:37
您可以使用:
import pandas as pd
df = pd.read_csv('Output.csv', sep=',')
prod_cnt = df.groupby('product').userCount.sum()
df[df['product'].isin(prod_cnt[prod_cnt == 0].index)].to_csv('Output1.csv')
df[df['product'].isin(prod_cnt[prod_cnt != 0].index)].to_csv('Output2.csv')输入文件截图:

https://stackoverflow.com/questions/60650108
复制相似问题