
Python conditional filtering in a CSV file

Stack Overflow user
Asked 2014-10-07 14:07:56
2 answers · 4.4K views · 0 followers · 0 votes

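Please help! I've tried different approaches and packages to write a program that takes four inputs and, based on a CSV file, returns writing-score statistics for the group defined by that combination of inputs. This is my first project, so I'd appreciate any insights, hints, or tips!

Here is a sample of the CSV (200 rows in total):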

id  gender  ses schtyp  prog        write
70  male    low public  general     52
121 female  middle  public  vocation    68
86  male    high    public  general     33
141 male    high    public  vocation    63      
172 male    middle  public  academic    47
113 male    middle  public  academic    44
50  male    middle  public  general     59
11  male    middle  public  academic    34      
84  male    middle  public  general     57      
48  male    middle  public  academic    57      
75  male    middle  public  vocation    60      
60  male    middle  public  academic    57  

Here is what I have so far:

import csv
import numpy
csv_file_object=csv.reader(open('scores.csv', 'rU')) #reads file
header=csv_file_object.next() #skips header
data=[] #loads data into array for processing
for row in csv_file_object:
    data.append(row)
data=numpy.array(data)

#asks for inputs 
gender=raw_input('Enter gender [male/female]: ')
schtyp=raw_input('Enter school type [public/private]: ')
ses=raw_input('Enter socioeconomic status [low/middle/high]: ')
prog=raw_input('Enter program status [general/vocation/academic]: ')

#makes them lower case and strings
prog=str(prog.lower())
gender=str(gender.lower())
schtyp=str(schtyp.lower())
ses=str(ses.lower())

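What I'm missing is how to filter and get statistics for one specific group only. For example, if I enter male, public, middle, and academic, I'd like the average writing score for that subset. I tried pandas' groupby function, but that only gives statistics for broader groups (such as public vs. private). I also tried a pandas DataFrame, but that only let me filter on one input, and I wasn't sure how to get the writing scores. Any hints would be greatly appreciated!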


2 Answers

Stack Overflow user

Accepted answer

Posted 2014-10-07 18:10:54

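I agree with the other answer: pandas is definitely the way to go, and it has extraordinary filtering/subsetting capabilities once you get used to it. It can be hard to wrap your head around at first, though (at least it was for me!), so I dug up some examples of the kind of subsetting you need from some of my old code. The variable itu below is a pandas DataFrame with data on various countries over time.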

# Subsetting by using True/False:
subset = itu['CntryName'] == 'Albania'  # returns True/False values
itu[subset]  # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania']  # one-line command, equivalent to the above two lines

# Pandas has many built-in functions like .isin() to provide params to filter on    
itu[itu.cntrycode.isin(['USA','FRA'])]  # returns where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])]  # Returns all of itu for only years 2000-2002
# Advanced subsetting can include logical operations:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])]  # Both of above at same time

# Use .loc with two elements to simultaneously select by row/index & column:
itu.loc['USA','CntryName']
itu.iloc[204,0]
itu.loc[['USA','BHS'], ['CntryName', 'Year']]
itu.iloc[[204, 13], [0, 1]]

# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) & 
    itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]

# Finally, if you're comfortable with using map() and list comprehensions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu[criterion]['CntryName']  # gives us UAE, UK, & US
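
Applied to the data in the question, the same True/False masks can be chained with & across all four inputs. What follows is only a minimal sketch, not code from the original answer: the helper name write_scores is hypothetical, and the inline SAMPLE rows are copied from the question so the snippet runs on its own (the real program would read scores.csv instead).

```python
import io
import pandas as pd

# A few rows copied from the question's sample; the real file has 200 rows.
SAMPLE = """\
id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
172 male middle public academic 47
113 male middle public academic 44
11 male middle public academic 34
48 male middle public academic 57
60 male middle public academic 57
"""


def write_scores(df, gender, schtyp, ses, prog):
    """Return the 'write' column for rows matching all four criteria.

    Hypothetical helper; column names are taken from the question's header.
    """
    mask = (
        (df['gender'] == gender)
        & (df['schtyp'] == schtyp)
        & (df['ses'] == ses)
        & (df['prog'] == prog)
    )
    return df.loc[mask, 'write']


# sep=r'\s+' splits on runs of whitespace, matching the sample as shown.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')
scores = write_scores(df, 'male', 'public', 'middle', 'academic')

print(scores.mean())      # mean of the five matching sample rows: 47.8
print(scores.describe())  # count, mean, std, min, quartiles, max
```

Each comparison needs its own parentheses because & binds more tightly than == in Python. The four arguments could come straight from the lower-cased user inputs collected in the question.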
Votes: 1

Stack Overflow user

Posted 2014-10-07 17:28:31

Take a look at pandas. I think it will shorten your CSV parsing work and give you the subsetting functionality you're asking for.

import pandas as pd
data = pd.read_csv('fileName.txt', sep=r'\s+')  # whitespace-delimited; delim_whitespace is deprecated in newer pandas

#get all of the male students
data[data['gender'] == 'male']
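
To go from a single condition to the four-way group the question asks about, one option is to group by all four columns at once and then look up one combination. This is a hedged sketch rather than part of the original answer: the inline rows are copied from the question's sample, and the names GROUP_COLS, stats, and key are illustrative assumptions.

```python
import io
import pandas as pd

# A few rows copied from the question's sample; the real file has 200 rows.
SAMPLE = """\
id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
172 male middle public academic 47
113 male middle public academic 44
11 male middle public academic 34
48 male middle public academic 57
60 male middle public academic 57
"""

data = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')

# Grouping on all four columns gives one row of statistics per combination,
# rather than the broader one-column groups mentioned in the question.
GROUP_COLS = ['gender', 'schtyp', 'ses', 'prog']
stats = data.groupby(GROUP_COLS)['write'].agg(['count', 'mean', 'std'])

# Look up a single combination; the tuple order must match GROUP_COLS.
key = ('male', 'public', 'middle', 'academic')
print(stats.loc[key, 'count'])
print(stats.loc[key, 'mean'])
```

A combination that never occurs in the file raises a KeyError on lookup, so a real program would probably check `key in stats.index` first.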
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/26237985
