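Please help! I've tried different things/packages to write a program that takes 4 inputs and returns writing-score statistics for the group matching that combination of inputs, using data from a csv file. This is my first project, so I'd appreciate any insights/hints/tips!
Below is a sample of the csv (200 rows in total):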
id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
Here's what I have so far:
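import csv
import numpy
with open('scores.csv', newline='') as f:  # reads file (Python 3; newline='' as the csv docs recommend)
    csv_file_object = csv.reader(f)
    header = next(csv_file_object)  # skips header
    data = []  # loads data into array for processing
    for row in csv_file_object:
        data.append(row)
data = numpy.array(data)  # note: every field, including write, is still a string
#asks for inputs
gender = input('Enter gender [male/female]: ')
schtyp = input('Enter school type [public/private]: ')
ses = input('Enter socioeconomic status [low/middle/high]: ')
prog = input('Enter program status [general/vocation/academic]: ')
#makes them lower case (input() already returns strings)
prog = prog.strip().lower()
gender = gender.strip().lower()
schtyp = schtyp.strip().lower()
ses = ses.strip().lower()
What I'm missing is how to filter and get statistics for just one specific group. For example, if I enter male, public, middle and academic, I want the average writing score for that subset. I tried pandas' groupby function, but that only gets you statistics for broader groups (like public vs. private). I also tried pandas' DataFrame, but I could only filter on one input and couldn't figure out how to get the writing scores. Any tips would be greatly appreciated!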
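A minimal sketch that continues the question's own numpy approach without pandas, assuming the column order of the sample (id, gender, ses, schtyp, prog, write). The csv module returns every field as a string, so `numpy.array(data)` is a string array and the write column has to be converted with `astype(int)` before averaging. The sample rows are inlined so the snippet runs on its own, and `write_stats` is a hypothetical helper name; in the real program, `data` would be the array built from `scores.csv`.

```python
import numpy as np

# Rows as the csv module would return them (every field is a string).
# These are the sample rows from the question; columns are
# id, gender, ses, schtyp, prog, write.
rows = [
    ["70", "male", "low", "public", "general", "52"],
    ["121", "female", "middle", "public", "vocation", "68"],
    ["86", "male", "high", "public", "general", "33"],
    ["141", "male", "high", "public", "vocation", "63"],
    ["172", "male", "middle", "public", "academic", "47"],
    ["113", "male", "middle", "public", "academic", "44"],
    ["50", "male", "middle", "public", "general", "59"],
    ["11", "male", "middle", "public", "academic", "34"],
    ["84", "male", "middle", "public", "general", "57"],
    ["48", "male", "middle", "public", "academic", "57"],
    ["75", "male", "middle", "public", "vocation", "60"],
    ["60", "male", "middle", "public", "academic", "57"],
]
data = np.array(rows)


def write_stats(data, gender, ses, schtyp, prog):
    """Hypothetical helper: write-score statistics for one group, or None if empty."""
    # Each comparison gives a boolean array; & keeps rows matching all four inputs.
    mask = ((data[:, 1] == gender) & (data[:, 2] == ses)
            & (data[:, 3] == schtyp) & (data[:, 4] == prog))
    scores = data[mask, 5].astype(int)  # convert the string scores to integers
    if scores.size == 0:
        return None  # no students in this group
    return {"count": int(scores.size), "mean": float(scores.mean()),
            "min": int(scores.min()), "max": int(scores.max())}


print(write_stats(data, "male", "middle", "public", "academic"))
```

In the question's program, the call would be `write_stats(data, gender, ses, schtyp, prog)` once the inputs have been lower-cased.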
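Posted on 2014-10-07 18:10:54
I agree with the other answer: Pandas is definitely the way to go, and once you get used to it, its filtering/subsetting capabilities are remarkable. It can be hard to wrap your head around at first, though (at least it was for me!), so I dug up some examples of the kind of subsetting you need from some of my old code. The variable itu below is a Pandas DataFrame holding data for different countries over time.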
# Subsetting by using True/False:
subset = itu['CntryName'] == 'Albania' # returns True/False values
itu[subset] # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania'] # one-line command, equivalent to the above two lines
# Pandas has many built-in functions like .isin() to provide params to filter on
itu[itu.cntrycode.isin(['USA','FRA'])] # returns where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])] # Returns all of itu for only years 2000-2002
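`.isin()` works the same way on the question's data. A minimal sketch, assuming the user may pick more than one program (a hypothetical extension of the question's single-choice prompt); the sample rows are inlined with `io.StringIO` so the snippet runs on its own.

```python
import io

import pandas as pd

# The 12 sample rows from the question (whitespace-separated).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Hypothetical multi-choice input: the user picked two programs.
chosen_progs = ["general", "vocation"]
subset = df[df["prog"].isin(chosen_progs)]  # True where prog is one of the choices
print(len(subset), subset["write"].mean())
```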
# Advanced subsetting can include logical operations:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])] # Both of above at same time
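Applied to the question, this `&` pattern answers it directly. Each comparison must be wrapped in parentheses, because `&` binds more tightly than `==` in Python. The sketch below (sample rows inlined; `select_group` and `answers` are hypothetical names) builds the mask in a loop from a dict of the user's answers, so it works for any number of filters, and compares it with the spelled-out version.

```python
import io

import pandas as pd

# The 12 sample rows from the question (whitespace-separated).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")


def select_group(frame, criteria):
    """Hypothetical helper: rows where every column equals its requested value."""
    mask = pd.Series(True, index=frame.index)  # start by keeping every row
    for column, value in criteria.items():
        mask &= frame[column] == value  # narrow down one condition at a time
    return frame[mask]


# The same combination written out by hand, parentheses included.
explicit = df[(df["gender"] == "male") & (df["schtyp"] == "public")
              & (df["ses"] == "middle") & (df["prog"] == "academic")]

answers = {"gender": "male", "schtyp": "public", "ses": "middle", "prog": "academic"}
group = select_group(df, answers)
print(group["write"].mean())
```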
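# Use .loc with two elements to select by row label/index & column label at the same time,
# or .iloc to do the same by integer position:
itu.loc['USA','CntryName']  # label-based: row 'USA', column 'CntryName'
itu.iloc[204,0]  # position-based: row 204, column 0
itu.loc[['USA','BHS'], ['CntryName', 'Year']]  # lists select several rows/columns
itu.iloc[[204, 13], [0, 1]]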
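`.loc` also accepts a boolean mask as its row selector, so the matching rows and the `write` column can be picked in one step, which covers the "how do I get the writing scores" part of the question. A sketch on the sample rows; `describe()` returns count, mean, std, min, quartiles and max for the selected scores.

```python
import io

import pandas as pd

# The 12 sample rows from the question (whitespace-separated).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

mask = ((df["gender"] == "male") & (df["schtyp"] == "public")
        & (df["ses"] == "middle") & (df["prog"] == "academic"))
write_scores = df.loc[mask, "write"]  # boolean rows + one column label -> a Series
stats = write_scores.describe()
print(stats)
```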
# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) &
itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]
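When chained masks get hard to read, `DataFrame.query()` is an alternative (a different technique from the answer's chained indexing): the conditions go in a string, and Python variables are referenced with `@`. A sketch on the sample rows, with hard-coded stand-ins for the four values the question reads from the user.

```python
import io

import pandas as pd

# The 12 sample rows from the question (whitespace-separated).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Stand-ins for the user's (already lower-cased) inputs.
gender, schtyp, ses, prog = "male", "public", "middle", "academic"

# Bare names refer to columns; @name refers to the Python variable.
group = df.query("gender == @gender and schtyp == @schtyp "
                 "and ses == @ses and prog == @prog")
print(group.loc[:, ["id", "write"]])
```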
# Finally, if you're comfortable with using map() and list comprehensions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu[criterion]['CntryName'] # gives us UAE, UK, & US
Posted on 2014-10-07 17:28:31
Take a look at pandas. I think it will shorten your csv parsing and give you the subsetting functionality you're asking for.
import pandas as pd
data = pd.read_csv('fileName.txt', sep=r'\s+')  # whitespace-separated; same as delim_whitespace=True, which newer pandas deprecates
#get all of the male students
data[data['gender'] == 'male']
https://stackoverflow.com/questions/26237985
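On the `groupby` attempt mentioned in the question: grouping by all four columns at once yields one row per combination, so a single group's statistics can be looked up with a tuple. A sketch on the sample rows; a combination that never occurs in the data is simply absent from the index, so membership is checked before the lookup.

```python
import io

import pandas as pd

# The 12 sample rows from the question (whitespace-separated).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# One row per (gender, schtyp, ses, prog) combination present in the data.
by_group = df.groupby(["gender", "schtyp", "ses", "prog"])["write"].agg(["count", "mean"])

# Look up one combination with a tuple in the same order as the groupby keys.
key = ("male", "public", "middle", "academic")
if key in by_group.index:
    print(by_group.loc[key])
else:
    print("no students in this group")
```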