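I want to convert the True and False values in a DataFrame into specific values. The "time" column holds the gap to the previous row in seconds. Every row whose gap is below 300 seconds should get a specific number, for example "1". The first row with a gap above 300 seconds that follows such a run should get the same number, "1". The rows after that one, whose gaps are below 300 seconds again, should get the next number, for example "2", and so on.
Here is my code: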
import numpy as np
import pandas as pd

# Parse the timestamps and compute the gap to the previous row.
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['delta'] = df['timestamp'] - df['timestamp'].shift()
df['time'] = df['delta'].dt.total_seconds()
# Flag gaps longer than 300 seconds.
df['outlier'] = df['time'] > 300
# Current attempt: only marks the outlier rows with '1', everything else is 'na'.
df['Column1'] = np.where(df['outlier'], '1', 'na')

This is the input. Here is a sample of my DataFrame:
   timestamp            delta            time      outlier  output
0  2020-11-08 17:54:53  NaT              NaN       False    na
1  2020-11-08 17:54:56  0 days 00:00:03  3.0       False    na
2  2020-11-08 17:54:57  0 days 00:00:01  1.0       False    na
3  2020-11-08 21:04:41  0 days 03:09:44  11384.0   True     1
4  2020-11-08 21:04:52  0 days 00:00:11  11.0      False    na
5  2020-11-08 21:04:53  0 days 00:00:01  1.0       False    na
6  2020-11-10 20:36:32  1 days 23:31:39  171099.0  True     1
7  2020-11-10 20:37:01  0 days 00:00:29  29.0      False    na
8  2020-11-10 20:37:04  0 days 00:00:03  3.0       False    na

This is the actual output I am looking for:
   timestamp            delta            time      outlier  output
0  2020-11-08 17:54:53  NaT              NaN       False    NaN
1  2020-11-08 17:54:56  0 days 00:00:03  3.0       False    1
2  2020-11-08 17:54:57  0 days 00:00:01  1.0       False    1
3  2020-11-08 21:04:41  0 days 03:09:44  11384.0   True     1
4  2020-11-08 21:04:52  0 days 00:00:11  11.0      False    2
5  2020-11-08 21:04:53  0 days 00:00:01  1.0       False    2
6  2020-11-10 20:36:32  1 days 23:31:39  171099.0  True     2
7  2020-11-10 20:37:01  0 days 00:00:29  29.0      False    3
8  2020-11-10 20:37:04  0 days 00:00:03  3.0       False    3

Note that this is only a sample of the DataFrame, so please help me fix the code above and make it work for a DataFrame with a large number of rows.
Posted on 2021-01-03 04:20:05
Something like this?

df['output'] = (df.outlier.cumsum() + 1).map(str).shift()

If you prefer integers:

df['output'] = (df.outlier.cumsum() + 1).map(int).astype(object).shift()

https://stackoverflow.com/questions/65543166
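To see why a cumulative sum followed by a shift produces the requested labels, here is a minimal sketch that rebuilds the sample from the question and applies the same idea. Two details are assumptions of this sketch rather than part of the answer: the gap is computed with `diff()` instead of subtracting a shifted column, and the result is cast to pandas' nullable `Int64` dtype so that the first row can stay missing while the other labels remain integers.

```python
import pandas as pd

# Sample timestamps copied from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-11-08 17:54:53",
        "2020-11-08 17:54:56",
        "2020-11-08 17:54:57",
        "2020-11-08 21:04:41",
        "2020-11-08 21:04:52",
        "2020-11-08 21:04:53",
        "2020-11-10 20:36:32",
        "2020-11-10 20:37:01",
        "2020-11-10 20:37:04",
    ])
})

# Seconds elapsed since the previous row (NaN for the first row).
df["time"] = df["timestamp"].diff().dt.total_seconds()

# True where the gap exceeds 300 seconds; NaN > 300 evaluates to False.
df["outlier"] = df["time"] > 300

# cumsum() counts the large gaps seen so far and +1 starts the labels at 1.
# shift() moves every label down one row, so the row that holds the large
# gap still carries the label of the group it closes, and the first row
# has no label at all.
# Assumption of this sketch: cast to the nullable Int64 dtype.
df["output"] = (df["outlier"].cumsum() + 1).shift().astype("Int64")

print(df[["time", "outlier", "output"]])
```

Both steps are vectorized, so the same three lines should scale to a DataFrame with many rows without an explicit Python loop.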