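I'm running Python 2.6. In the example below I'm trying to concatenate the date and time string columns from a csv file. Depending on the dtype I set (None vs object), I see differences in behavior that I can't explain; see Question 1 and Question 2 at the end of the post. The exceptions returned are not very descriptive, and the dtype documentation doesn't mention any specific behavior to expect when dtype is set to object.
Here is the snippet: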
#! /usr/bin/python
import numpy as np
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())
# (Fail) case 1: dtype=None splicing a column fails
next(data) # eat away the title line
header = [item.strip() for item in next(data).split(',')] # get the headers
arr1 = np.genfromtxt(data, dtype=None, delimiter=',',skiprows=1)# skiprows=1 for the row with units
arr1.dtype.names = header # assign the header to names
# so we can do y=arr['Speed']
y1 = arr1['Speed']
# Q1 IndexError: invalid index
#a1 = arr1[:,0]
#print a1
# EDIT1:
print "arr1.shape "
print arr1.shape # (3,)
# Fails as expected TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'
# z1 = arr1['Date'] + arr1['Time']
# This can be workaround by specifying dtype=object, which leads to case 2
data.seek(0) # resets
# (Fail) case 2: dtype=object assign header fails
next(data) # eat away the title line
header = [item.strip() for item in next(data).split(',')] # get the headers
arr2 = np.genfromtxt(data, dtype=object, delimiter=',',skiprows=1) # skiprows=1 for the row with units
# Q2 ValueError: there are no fields defined
#arr2.dtype.names = header # assign the header to names. so we can use it to do indexing
# ie y=arr['Speed']
# y2 = arr['Date'] + arr['Time'] # column headings were assigned previously by arr.dtype.names = header
data.seek(0) # resets
# (Good) case 3: dtype=object but don't assign headers
next(data) # eat away the title line
header = [item.strip() for item in next(data).split(',')] # get the headers
arr3 = np.genfromtxt(data, dtype=object, delimiter=',',skiprows=1) # skiprows=1 for the row with units
y3 = arr3[:,0] + arr3[:,1] # slice the columns
print y3
# case 4: dtype=None, all data are ints, array dimension 2-D
# simulate a csv file
from StringIO import StringIO
data2 = StringIO("""
Title
Date,Time,Speed
,,(m/s)
45,46,85
12,13,86
50,46,87
""".strip())
next(data2) # eat away the title line
header = [item.strip() for item in next(data2).split(',')] # get the headers
arr4 = np.genfromtxt(data2, dtype=None, delimiter=',',skiprows=1)# skiprows=1 for the row with units
#arr4.dtype.names = header # Value error
print "arr4.shape "
print arr4.shape # (3,3)
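data2.seek(0) # resets
Question 1: At comment Q1, why can't I slice a column when dtype=None? This could be avoided if a) arr1 = np.genfromtxt... was initialized with dtype=object as in case 3, and b) arr1.dtype.names = ... was commented out to avoid the ValueError in case 2.
Question 2: At comment Q2, why can't I set dtype.names when dtype=object?
EDIT1:
Added a case 4, which shows that if the values in the simulated csv file are all ints, the array is 2-D. Columns can then be sliced, but assigning dtype.names still fails.
Updated the term "splicing" to "slicing".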
Posted on 2013-07-19 10:00:46
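Question 1
This is indexing, not "splicing", and you can't index into the columns of arr1 for exactly the same reason I explained in my earlier answer. Look at arr1.shape: it is (3,), i.e. arr1 is 1D, not 2D. There are no columns to index into.
Now look at the shape of arr2: you'll see that it is (3,3). Why is this? If you do specify dtype=desired_type, np.genfromtxt will treat every delimited part of the input string the same way (i.e. as desired_type) and will give you back an ordinary, non-structured numpy array. With dtype=None, by contrast, the type of each column is inferred separately, and when the columns differ (strings and ints here) the result is a 1D structured array with one record per row.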
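The difference shows up directly in the shapes. The following is a minimal sketch, written for Python 3 and a recent NumPy rather than the Python 2.6 setup in the question. The names `csv_text`, `structured`, and `plain` are illustrative, `encoding="utf-8"` is only there so that strings come back as `str`, and `dtype=str` stands in for the question's `dtype=object` as the single explicit type.

```python
from io import StringIO

import numpy as np

# Same rows as in the question, without the title/header/units lines.
csv_text = "2012-04-01,00:10,85\n2012-04-02,00:20,86\n2012-04-03,00:30,87"

# dtype=None: a type is inferred per column, so each row becomes one record
# and the result is a 1-D structured array with named fields.
structured = np.genfromtxt(
    StringIO(csv_text), dtype=None, delimiter=",",
    names=["Date", "Time", "Speed"], encoding="utf-8",
)

# One explicit dtype: every field is converted the same way and the result
# is an ordinary 2-D array without field names.
plain = np.genfromtxt(StringIO(csv_text), dtype=str, delimiter=",")

print(structured.shape, structured.dtype.names)
print(plain.shape, plain.dtype.names)

speed = structured["Speed"]   # fields are selected by name
first_column = plain[:, 0]    # columns are selected by position

try:
    structured[:, 0]          # a 1-D array has no second axis to index
except IndexError as exc:
    print("IndexError:", exc)
```

Passing `names=` to `genfromtxt` is one way to avoid assigning `dtype.names` afterwards; it only applies when the result is a structured array.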
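I'm not quite sure what you intended to do with this line:
z1 = arr1['Date'] + arr1['Time']
Did you mean to concatenate the date and time strings together, e.g. '2012-04-01 00:10'? If so, you could do it like this:
z1 = [d + ' ' + t for d,t in zip(arr1['Date'],arr1['Time'])]
Whether this is suitable depends on what you want to do with the output (this gives you a list of strings, not a numpy array).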
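If the result should stay a NumPy array, element-wise string concatenation is one option. This is a small sketch for Python 3, not part of the original answer; `dates` and `times` are stand-ins for `arr1['Date']` and `arr1['Time']`, and `np.char.add` is swapped in for the list comprehension.

```python
import numpy as np

# Stand-ins for the 'Date' and 'Time' fields of the structured array.
dates = np.array(["2012-04-01", "2012-04-02", "2012-04-03"])
times = np.array(["00:10", "00:20", "00:30"])

# List comprehension from the answer: returns a Python list of strings.
z1_list = [d + " " + t for d, t in zip(dates, times)]

# Element-wise concatenation that returns a NumPy string array instead.
z1_array = np.char.add(np.char.add(dates, " "), times)

print(z1_list)
print(z1_array)
```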
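I should point out that, as of version 1.7, NumPy has built-in datetime support (np.datetime64). This will allow you to do a lot more useful things, like computing time deltas etc.
dts = np.array(z1,dtype=np.datetime64)
Edit: If you want to plot timeseries data, you can use matplotlib.dates.strpdate2num to convert your strings to matplotlib datenums, then use plot_date():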
from matplotlib import dates
from matplotlib import pyplot as pp
# convert date and time strings to matplotlib datenums
dtconv = dates.strpdate2num('%Y-%m-%d%H:%M')
datenums = [dtconv(d+t) for d,t in zip(arr1['Date'],arr1['Time'])]
# use plot_date to plot timeseries
pp.plot_date(datenums,arr1['Speed'],'-ob')
You should also take a look at Pandas, which has some nice tools for visualising timeseries data.
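To make the datetime conversion mentioned earlier in this answer concrete, here is a short sketch for Python 3 that assumes the strings have already been joined as 'YYYY-MM-DD HH:MM'. It also parses them with the standard library's `datetime.strptime`, which may be a useful alternative because, as far as I know, `strpdate2num` is no longer available in newer Matplotlib releases. The names `deltas` and `stamps` are illustrative.

```python
from datetime import datetime

import numpy as np

# Date and time strings joined with a space, as in z1 above.
z1 = ["2012-04-01 00:10", "2012-04-02 00:20", "2012-04-03 00:30"]

# datetime64 with minute resolution; differences come out as timedelta64.
dts = np.array(z1, dtype="datetime64[m]")
deltas = np.diff(dts)
print(deltas)

# Plain datetime objects parsed with the standard library; Matplotlib
# accepts these as x values when plotting.
stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in z1]
print(stamps[0])
```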
Question 2
You can't set the names of arr2 because it is not a structured array (see above).
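If named access is still wanted for data that was read as a plain array, one possible approach is to build a record array from its columns. This is a sketch under assumptions, written for Python 3: `plain` is a hand-built stand-in for arr2, and `np.rec.fromarrays` is my suggestion rather than something the original answer uses.

```python
import numpy as np

# Hand-built stand-in for arr2: a plain 2-D object array of strings.
plain = np.array(
    [["2012-04-01", "00:10", "85"],
     ["2012-04-02", "00:20", "86"],
     ["2012-04-03", "00:30", "87"]],
    dtype=object,
)

# A plain array has no fields, so there are no names to replace.
print(plain.dtype.names)  # None

# Build a record array from the individual columns, naming each one
# and converting the speed column to integers along the way.
records = np.rec.fromarrays(
    [plain[:, 0].astype(str), plain[:, 1].astype(str), plain[:, 2].astype(int)],
    names="Date,Time,Speed",
)
print(records["Speed"])
```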
https://stackoverflow.com/questions/17739072