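I have a column 'Flight_Time' that holds flight durations in hours and minutes (%H:%M format, e.g. 02:22; dtype timedelta64[ns]). I want to create a new column (df['val']) that classifies each flight time as '0-2 hrs', '2-4 hrs', '4-6 hrs' or '6+ hrs', so that I can plot the totals for these 4 new categories.
Can anyone suggest how to set up the if statement below so that it creates the 'val' column and classifies the rows into these 4 subsets? The Flight_Time column is a timedelta64[ns] object.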
if df['Flight_Time'] >= '0:00' & df['Flight_Time'] < '02:00':
    df['val'] = '0-2 hrs'
elif df['Flight_Time'] >= '2:00' & df['Flight_Time'] < '04:00':
    df['val'] = '2-4 hrs'
elif df['Flight_Time'] >= '4:00' & df['Flight_Time'] < '06:00':
    df['val'] = '4-6 hrs'
else:
    df['val'] = '6+ hrs'

The target dataframe output would look something like this:
   Flight Time  val
0  00:00        0-2 hr
1  00:01        0-2 hr
2  04:05        4-6 hr
3  10:08        6+ hr
4  02:10        2-4 hr

Update: I have changed my code to the version below, but now I only get '6+ hrs' in the newly created 'val' column.
from datetime import timedelta
for x in df["Flight_Time"]:
    if timedelta(hours = 0, minutes = 0) < x <= timedelta(hours = 2, minutes = 0):
        df['val'] = '0-2 hrs'
    elif timedelta(hours = 2, minutes = 0) < x <= timedelta(hours = 4, minutes = 0):
        df['val'] = '2-4 hrs'
    elif timedelta(hours = 4, minutes = 0) < x <= timedelta(hours = 6, minutes = 0):
        df['val'] = '4-6 hrs'
    else:
        df['val'] = '6+ hrs'

Answered on 2020-11-19 21:24:40

"""This implementation is a bit redundant,
but it shows how to handle rows and columns
with pandas the way you want.
Just swap in comparisons on datetime types
(here I used plain numbers).
"""
import pandas as pd
df = pd.DataFrame([
    [0.5],
    [3],
    [5],
    [10]
], columns=["flight_time"])
df["flight_interval"] = None
# Lower bounds are inclusive so that exactly 2, 4 or 6 hours is still classified.
df.loc[df["flight_time"] < 2, ["flight_interval"]] = "0-2 hrs"
df.loc[(df["flight_time"] >= 2) & (df["flight_time"] < 4), ["flight_interval"]] = "2-4 hrs"
df.loc[(df["flight_time"] >= 4) & (df["flight_time"] < 6), ["flight_interval"]] = "4-6 hrs"
df.loc[df["flight_time"] >= 6, ["flight_interval"]] = "6+ hrs"
print(df)

Output:
+----+---------------+-------------------+
| | flight_time | flight_interval |
|----+---------------+-------------------|
| 0 | 0.5 | 0-2 hrs |
| 1 | 3 | 2-4 hrs |
| 2 | 5 | 4-6 hrs |
| 3 | 10 | 6+ hrs |
+----+---------------+-------------------+

Edit: a version using the timedelta type:
import pandas as pd
from datetime import timedelta
df = pd.DataFrame([
    [0.5],
    [3],
    [5],
    [10]
], columns=["flight_time"])
# Convert the numeric hours to timedelta64[ns], matching the question's column type.
df["flight_time"] = pd.to_timedelta(df["flight_time"], unit="hours")
df["flight_interval"] = None
# Lower bounds are inclusive so that exactly 2, 4 or 6 hours is still classified.
df.loc[df["flight_time"] < timedelta(hours=2), ["flight_interval"]] = "0-2 hrs"
df.loc[(df["flight_time"] >= timedelta(hours=2)) & (df["flight_time"] < timedelta(hours=4)), ["flight_interval"]] = "2-4 hrs"
df.loc[(df["flight_time"] >= timedelta(hours=4)) & (df["flight_time"] < timedelta(hours=6)), ["flight_interval"]] = "4-6 hrs"
df.loc[df["flight_time"] >= timedelta(hours=6), ["flight_interval"]] = "6+ hrs"
print(df)

Output:
+----+-----------------+-------------------+
| | flight_time | flight_interval |
|----+-----------------+-------------------|
| 0 | 0 days 00:30:00 | 0-2 hrs |
| 1 | 0 days 03:00:00 | 2-4 hrs |
| 2 | 0 days 05:00:00 | 4-6 hrs |
| 3 | 0 days 10:00:00 | 6+ hrs |
+----+-----------------+-------------------+

Answered on 2020-11-19 21:03:58

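Something like this should work:
hours = df["Flight_Time"].dt.total_seconds() / 3600  # timedelta -> float hours
df["val"] = pd.cut(hours, bins=[0, 2, 4, 6, float("inf")], right=False, labels=["0-2 hrs", "2-4 hrs", "4-6 hrs", "6+ hrs"])

https://stackoverflow.com/questions/64919898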
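On the update in the question: inside the loop, `df['val'] = '...'` assigns a single string to the entire column on every iteration, so every row ends up with whatever label the last row produced. A row with a duration of exactly 00:00 also falls through to the `else` branch, because `timedelta(0) < x` is false. Below is a minimal sketch of a row-wise alternative using `Series.apply`. The helper name `flight_bucket`, the `LABELS` list and the sample durations are made up for illustration; the column names follow the question.

```python
import pandas as pd

LABELS = ["0-2 hrs", "2-4 hrs", "4-6 hrs", "6+ hrs"]

def flight_bucket(duration: pd.Timedelta) -> str:
    """Return the label for one flight duration (hypothetical helper)."""
    if duration < pd.Timedelta(hours=2):
        return LABELS[0]
    if duration < pd.Timedelta(hours=4):
        return LABELS[1]
    if duration < pd.Timedelta(hours=6):
        return LABELS[2]
    return LABELS[3]

# Illustrative data shaped like the question's example (timedelta64[ns] column).
df = pd.DataFrame(
    {"Flight_Time": pd.to_timedelta(
        ["00:00:00", "00:01:00", "04:05:00", "10:08:00", "02:10:00"]
    )}
)

# apply() calls flight_bucket once per row and collects one label per row,
# instead of overwriting the whole column on each loop iteration.
df["val"] = df["Flight_Time"].apply(flight_bucket)

# Per-category totals in label order, ready for plotting.
counts = df["val"].value_counts().reindex(LABELS, fill_value=0)
print(df)
print(counts)
```

If matplotlib is installed, `counts.plot(kind="bar")` should draw the four totals as a bar chart. For large frames, the vectorised `df.loc` or `pd.cut` approaches are likely to be faster than `apply`.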