I am using Cloudera VM 5.2 and pandas 0.18.0.
I have the following data:
import pandas as pd
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv',
                         parse_dates=['timestamp'],
                         skipinitialspace=True).assign(adCount=1)
adclicksDF.head(n=5)
Out[107]:
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 15:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 15:22:58 5980 5920 9 1027 20 clothing
adCount
0 1
1 1
2 1
3 1
4 1

The data types of the columns are:
for col in adclicksDF:
    print(col)
    print(type(adclicksDF[col][1]))
timestamp
<class 'pandas.tslib.Timestamp'>
txId
<class 'numpy.int64'>
userSessionId
<class 'numpy.int64'>
teamId
<class 'numpy.int64'>
userId
<class 'numpy.int64'>
adId
<class 'numpy.int64'>
adCategory
<class 'str'>
adCount
<class 'numpy.int64'>

I want to truncate the minutes and seconds from the timestamp.
I tried:
adclicksDF["timestamp"] = pd.to_datetime(adclicksDF["timestamp"],format='%Y-%m-%d %H')
adclicksDF.head(n=5)
Out[110]:
timestamp txId userSessionId teamId userId adId adCategory \
0 2016-05-26 15:13:22 5974 5809 27 611 2 electronics
1 2016-05-26 15:17:24 5976 5705 18 1874 21 movies
2 2016-05-26 15:22:52 5978 5791 53 2139 25 computers
3 2016-05-26 15:22:57 5973 5756 63 212 10 fashion
4 2016-05-26 15:22:58 5980 5920 9 1027 20 clothing
adCount
0 1
1 1
2 1
3 1
4 1

This does not truncate the minutes and seconds.
How can I truncate the minutes and seconds?
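
For illustration, here is a minimal sketch of one possible approach, not necessarily the only or best one. The `timestamp` column is already `datetime64` after `parse_dates`, so `pd.to_datetime` appears to have nothing left to parse and the `format` argument has no effect, which matches the unchanged output above. Rounding down with `Series.dt.floor` (added in pandas 0.18.0) is one way to drop the minutes and seconds. The small DataFrame `df` and the new `hour` column are hypothetical stand-ins for the real file, and the `"60min"` frequency string is used on the assumption that it is accepted by both old and recent pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of ad-clicks.csv
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-26 15:13:22",
        "2016-05-26 15:17:24",
        "2016-05-26 15:22:52",
    ]),
    "txId": [5974, 5976, 5978],
})

# The column is already datetime64, so to_datetime returns it unchanged;
# `format` only matters when parsing strings.
same = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H")

# Round every timestamp down to the start of its hour.
df["hour"] = df["timestamp"].dt.floor("60min")

print(df)
```

Writing the result to a new column keeps the original timestamps around; assigning back to `df["timestamp"]` would overwrite them instead.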
https://stackoverflow.com/questions/38271224