I have a DataFrame that looks like this:
Asset name Risk Chief Risks Risk category Risk start Risk end
0 Stanton County, KS, US (Corn) Temperature high 2020-06-25 18:00:00 2020-07-10 00:00:00
1 Seward County, KS, US (Corn) Temperature high 2020-06-25 18:00:00 2020-07-10 00:00:00
2 Hansford County, TX, US (Corn) Temperature high 2020-06-29 12:00:00 2020-07-02 00:00:00
3 Haskell County, KS, US (Corn) Temperature high 2020-06-25 18:00:00 2020-07-10 00:00:00
4 Hale County, TX, US (Corn) Temperature high 2020-06-29 12:00:00 2020-07-02 00:00:00
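5 Greeley County, KS, US (Corn) Temperature high 2020-06-25 18:00:00 2020-07-01 00:00:00

I have created a new DataFrame whose columns run from today's date + 14 days. It looks like this:

25 Jun 26 Jun 27 Jun 28 Jun 29 Jun 30 Jun 01 Jul 02 Jul 03 Jul 04 Jul 05 Jul 06 Jul 07 Jul 08 Jul 09 Jul 10 Jul

For each day I want a count of how many assets are at high risk on that day. For example, the first city is at high risk from 6-25 to 7-10, so it adds one to the count of every one of those days. The third city is at high risk from 6-29 to 7-2, so it adds 1 only to those specific days. First, how can I do this? Second, is there a more elegant way to do or present this?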
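A minimal sketch of the count described above, written as a different technique from a row loop: expand each risk window into its calendar days with `pd.date_range`, flatten with `Series.explode`, and tally with `value_counts`. It assumes "today" is 2020-06-25 and that the grid ends on 10 Jul (matching the labels shown), and that a window covers every calendar day from the day it starts to the day it ends, both inclusive, so the first city counts on 10 Jul as the question describes. The rows are retyped from the table with only the asset name and the two timestamps; `days`, `grid`, and `counts` are illustrative names.

```python
import pandas as pd

# Hypothetical retyping of the question's rows: only the columns the count needs
df = pd.DataFrame({
    "Asset name": [
        "Stanton County, KS, US", "Seward County, KS, US", "Hansford County, TX, US",
        "Haskell County, KS, US", "Hale County, TX, US", "Greeley County, KS, US",
    ],
    "Risk start": pd.to_datetime([
        "2020-06-25 18:00:00", "2020-06-25 18:00:00", "2020-06-29 12:00:00",
        "2020-06-25 18:00:00", "2020-06-29 12:00:00", "2020-06-25 18:00:00",
    ]),
    "Risk end": pd.to_datetime([
        "2020-07-10 00:00:00", "2020-07-10 00:00:00", "2020-07-02 00:00:00",
        "2020-07-10 00:00:00", "2020-07-02 00:00:00", "2020-07-01 00:00:00",
    ]),
})

# Assumption: "today" is 2020-06-25 and the grid ends on 10 Jul, matching the labels above
grid = pd.date_range("2020-06-25", "2020-07-10", freq="D")

# One list of calendar days per asset, from the day its risk starts to the day it ends
days = pd.Series(
    [
        list(pd.date_range(start, end, freq="D"))
        for start, end in zip(df["Risk start"].dt.floor("D"), df["Risk end"].dt.floor("D"))
    ],
    index=df["Asset name"],
)

# One row per (asset, day) -> number of assets per day; days nobody covers get 0
counts = pd.to_datetime(days.explode()).value_counts().reindex(grid, fill_value=0)
counts.index = counts.index.strftime("%d %b")  # "25 Jun", "26 Jun", ...

print(counts.to_frame("High-risk assets").T.to_string())
```

Because `value_counts` only sees days that actually occur, the `reindex(..., fill_value=0)` step is what puts zero-count days back onto the grid.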
Posted 2020-06-25 18:39:34

I would do it like this:
import pandas as pd
from io import StringIO

# Data setup: fields are separated by two or more spaces, which sep=r"\s\s+" splits on.
# Only "Asset name", "Risk start" and "Risk end" are used below.
df = pd.read_csv(StringIO("""Asset name  Risk  Chief Risks  Risk category  Risk start  Risk end
0  Stanton County, KS, US  (Corn)  Temperature  high  2020-06-25 18:00:00  2020-07-10 00:00:00
1  Seward County, KS, US  (Corn)  Temperature  high  2020-06-25 18:00:00  2020-07-10 00:00:00
2  Hansford County, TX, US  (Corn)  Temperature  high  2020-06-29 12:00:00  2020-07-02 00:00:00
3  Haskell County, KS, US  (Corn)  Temperature  high  2020-06-25 18:00:00  2020-07-10 00:00:00
4  Hale County, TX, US  (Corn)  Temperature  high  2020-06-29 12:00:00  2020-07-02 00:00:00
5  Greeley County, KS, US  (Corn)  Temperature  high  2020-06-25 18:00:00  2020-07-01 00:00:00"""), sep=r"\s\s+", engine="python", index_col=0)

# Date range
df["Risk start"] = pd.to_datetime(df["Risk start"])
min_date = df["Risk start"].min().date()
df["Risk end"] = pd.to_datetime(df["Risk end"])
max_date = df["Risk end"].max().date()
dates = pd.date_range(min_date, max_date)

# New output DataFrame: one row per day, one column per city, all zeros to start
df1 = pd.DataFrame(0, index=dates, columns=df["Asset name"])

# First and last calendar day of each risk window
start_day = df["Risk start"].dt.floor("D")
end_day = df["Risk end"].dt.floor("D")

# Iterate over the cities and set 1 on every day of their window
# (.loc label slicing on a DatetimeIndex includes both endpoints)
for city_ind in df.index:
    df1.loc[start_day[city_ind]:end_day[city_ind], df.loc[city_ind, "Asset name"]] = 1

print(df1.T.to_string())

Output:
2020-06-25 2020-06-26 2020-06-27 2020-06-28 2020-06-29 2020-06-30 2020-07-01 2020-07-02 2020-07-03 2020-07-04 2020-07-05 2020-07-06 2020-07-07 2020-07-08 2020-07-09 2020-07-10
Asset name
Stanton County, KS, US 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Seward County, KS, US 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Hansford County, TX, US 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
Haskell County, KS, US 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Hale County, TX, US 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
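Greeley County, KS, US 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0

Note that I transpose at the end. In my opinion it is more elegant to express this with the dates as the index and the cities as the columns, but I suppose that's up to you.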
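If the row loop feels clunky, the same indicator matrix (dates as the index, cities as the columns) can be built with NumPy broadcasting, and the per-day count the question asks for is then a row sum. A minimal sketch, assuming each window covers every calendar day from its floored start to its floored end, inclusive; the rows are retyped from the question with only the needed columns, and `start_day`, `end_day`, `mask`, and `daily_count` are illustrative names.

```python
import pandas as pd

# Hypothetical retyping of the question's rows: only the columns the calculation needs
df = pd.DataFrame({
    "Asset name": [
        "Stanton County, KS, US", "Seward County, KS, US", "Hansford County, TX, US",
        "Haskell County, KS, US", "Hale County, TX, US", "Greeley County, KS, US",
    ],
    "Risk start": pd.to_datetime([
        "2020-06-25 18:00:00", "2020-06-25 18:00:00", "2020-06-29 12:00:00",
        "2020-06-25 18:00:00", "2020-06-29 12:00:00", "2020-06-25 18:00:00",
    ]),
    "Risk end": pd.to_datetime([
        "2020-07-10 00:00:00", "2020-07-10 00:00:00", "2020-07-02 00:00:00",
        "2020-07-10 00:00:00", "2020-07-02 00:00:00", "2020-07-01 00:00:00",
    ]),
})

# Each window covers every calendar day from its floored start to its floored end, inclusive
start_day = df["Risk start"].dt.floor("D").to_numpy()
end_day = df["Risk end"].dt.floor("D").to_numpy()
dates = pd.date_range(start_day.min(), end_day.max(), freq="D")

# Compare a (days x 1) column of dates with (1 x assets) rows of bounds: no Python loop
d = dates.to_numpy()[:, None]
mask = (d >= start_day[None, :]) & (d <= end_day[None, :])

# Dates as the index, cities as the columns
df1 = pd.DataFrame(mask.astype(int), index=dates, columns=df["Asset name"])

# The per-day count the question asks for is the sum across assets
daily_count = df1.sum(axis=1)
print(daily_count.to_string())
```

The broadcast allocates a days × assets array, which is negligible at this size but grows with both the date range and the number of assets.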
https://stackoverflow.com/questions/62580215