How to plot multiple time series by a specific category, e.g. Disaster Type == Flood

Stack Overflow user
Asked 2021-07-14 18:20:10
Answers: 1 · Views: 117 · Followers: 0 · Votes: 1

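I created the following df, which contains the number of times each type of disaster occurred in a given year. I want to create a multi-line chart showing how the yearly count of each disaster type changes over time. Each disaster type would have its own line, so that one could see, for example, whether winter storms are decreasing while droughts are increasing.

So far I have tried defining X and y, but I don't know how to group by Flood and get the count for each year. For some reason, when I run the code I get a KeyError: 'Start_year'. This may be because Start_year is being used as an index, but I reset it as shown below, which should have taken care of that. Sorry, I'm a bit new to this.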

#Number of each type of disaster each year
df_yearly_tcount = df_time.groupby(['Start_year', 'Disaster_Type']).size()

yearly_tcount=pd.DataFrame(df_yearly_tcount)
yearly_tcount.reset_index()

X = yearly_tcount['Start_year']
y = yearly_tcount(['Disaster_type']=='Flood')

plt.plot(X, y, label = 'Flood')

Full code:

import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd 
import seaborn as sns 


from scipy.stats import zscore

#Import Dataset
df = pd.read_csv('database.csv')

df_time = (df[['County','Disaster Type','Start Date', 'End Date']][0: :])

#Preprocessing      
     
#Number of NaN values          
df_nan = df[['County','Disaster Type','Start Date', 'End Date']].isna().sum()

#NaN values as a percentage as total 
df_nan_number = [(df_nan.sum(axis=0)), str((((539/45330)*100))) +'%']

#Remove NaN values
df_time.dropna(subset = ["County", 'End Date'], inplace=True)

#Set Date Format
df_time['Start_Date_A'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y')
df_time['End_Date_A'] = pd.to_datetime(df['End Date'], format='%m/%d/%Y')

#Create new column == Disaster Length
df_time['Disaster_Length'] = (df_time.Start_Date_A - df_time.End_Date_A).dt.days

#Create new column == start year
df_time['Start_year'] = df_time['Start_Date_A'].dt.year

#Dropped  Old Date Formats from df
df_time = df_time.drop(columns=['Start Date', 'End Date'], axis=1)

#Replace 0 day values with 1 to indicate a Disaster length of 1 Day
df_time['Disaster_Length'] = df_time['Disaster_Length'].replace({0:1})

#Replace all values with absolute values so all days are represented as positive numeric values
df_time['Disaster_Length'] = df_time['Disaster_Length'].abs()


# Locating man-made and non 'natural' disasters, sorting Disaster types, and analyzing value counts
df_DTypes= df_time['Disaster Type'].values

df_DTypes=pd.DataFrame(df_DTypes)

df_DType_VCounts=(df_DTypes.value_counts()).sort_values(ascending=True)

df_DType_Natural=(df_DType_VCounts.drop(['Human Cause', 'Chemical', 'Dam/Levee Break', 'Terrorism','Other'],axis=0)).sort_values(ascending=True)

df_time = df_time.rename(columns={'Disaster Type': 'Disaster_Type'})

#Removing non-natural disasters from main df_time
df_time = df_time[(df_time.Disaster_Type != 'Human Cause') & (df_time.Disaster_Type != 'Chemical') & (df_time.Disaster_Type != 'Dam/Levee Break') & (df_time.Disaster_Type != 'Terrorism') & (df_time.Disaster_Type != 'Other') ]

#Resetting index for final df Analysis 
df_time.reset_index(drop=True, inplace = True)

#Analysis 

#Dataframe with mean disaster length for each year
df_yearly_mean_len = df_time.groupby(['Start_year']).mean()

df_yearly_mean_len.reset_index().plot('Start_year','Disaster_Length')


#Number of disasters declared per year
yearly_dcount = df_time.groupby(['Start_year']).size()


yearly_dcount=pd.DataFrame(yearly_dcount)
yearly_dcount.columns=['Number_of_Disasters']



#Visualizing change in total number of disasters over time 
yearly_dcount.reset_index().plot('Start_year','Number_of_Disasters')


#Number of each type of disaster each year
df_yearly_tcount = df_time.groupby(['Start_year', 'Disaster_Type']).size()

yearly_tcount=pd.DataFrame(df_yearly_tcount)
yearly_tcount.reset_index()

X = yearly_tcount['Start_year']
y = yearly_tcount(['Disaster_type']=='Flood')

plt.plot(X, y, label = 'Flood')

Output of yearly_tcount:

                            0
Start_year Disaster_Type     
1959       Flood            1
1964       Flood          115
1965       Drought         51
           Earthquake       6
           Flood          198
           Hurricane       56
           Storm            6
           Tornado        112
1966       Flood          113
           Tornado          2
           Typhoon          5
1967       Fire            10
           Flood          121
           Hurricane       29
           Tornado         36
           Typhoon          1
1968       Flood           76
           Hurricane       14
           Ice             21
           Tornado         50
           Typhoon          1
1969       Flood          394
           Hurricane       64
           Storm            1
           Tornado         46
1970       Fire             6
           Flood          180
           Hurricane        7
           Storm           17
           Tornado         11

Original dataset: https://www.kaggle.com/fema/federal-disasters

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-07-14 19:13:01

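It looks like you are on the right track, and much of your code and style is heading in the right direction. I put your data into a CSV and reset the MultiIndex. After that, plotting the data is fairly straightforward. It might look better with more data, but at the moment there are several outliers and disasters with missing data (e.g. 1959 and 1964). Also, if you use a line plot, every series shares the same y-axis, which makes it hard to compare low-frequency and high-frequency disasters (e.g. earthquakes vs. floods). You could also plot the percent change, but with the data provided that does not look great either. Finally, you could use a stacked bar chart instead. Personally, I think this looks best. How you choose to represent your data depends on the goal of your chart, whether you want it to be quantitative or qualitative, and whether you want to show the raw data, as in a scatter plot. In any case, here are some charts and some code that should help.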
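
The plotting code in this answer expects a flat DataFrame named `df` with the columns `Start_year`, `Disaster_Type` and `Size`. The answer built it by going through a CSV; as a rough sketch of one alternative, the same shape can probably be produced directly from the question's `groupby` result. The tiny `df_time` below is a made-up stand-in for the real data, used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's df_time: one row per declared disaster.
df_time = pd.DataFrame({
    "Start_year": [1965, 1965, 1965, 1966, 1966],
    "Disaster_Type": ["Flood", "Flood", "Drought", "Flood", "Tornado"],
})

# size() returns a Series indexed by (Start_year, Disaster_Type).
# reset_index(name=...) turns both index levels into ordinary columns
# and gives the count column the name used by the plotting code.
df = (
    df_time.groupby(["Start_year", "Disaster_Type"])
    .size()
    .reset_index(name="Size")
)
print(df)
```

Passing `name="Size"` avoids ending up with a column literally called `0`, which is what the table in the question shows.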

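import matplotlib.pyplot as plt

# df is the flat table of yearly counts with the columns
# Start_year, Disaster_Type and Size (MultiIndex already reset).
types = ['Flood', 'Drought', 'Earthquake', 'Hurricane', 'Storm', 'Tornado',
       'Typhoon', 'Fire', 'Ice']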

fig, axes = plt.subplots(ncols=2, nrows=2, figsize=(16,14))
axes = axes.flatten()

ax = axes[0]
for i in range(len(types)):
    disaster_df = df[df.Disaster_Type == types[i]]
    ax.plot(disaster_df.Start_year, disaster_df.Size, linewidth=2.5, label=types[i])
ax.legend(ncol=3, edgecolor='w')
[ax.spines[s].set_visible(False) for s in ['top','right']]
ax.set_title('Disasters Raw', fontsize=16, fontweight='bold')

#remove 1959
ax = axes[1]
df2 = df.iloc[1:]

for i in range(len(types)):
    disaster_df = df2[df2.Disaster_Type == types[i]]
    ax.plot(disaster_df.Start_year, disaster_df.Size, linewidth=2.5, label=types[i])
ax.legend(ncol=3, edgecolor='w')
[ax.spines[s].set_visible(False) for s in ['top','right']]
ax.set_title('Remove 1959', fontsize=16, fontweight='bold')

#remove 1964
ax = axes[2]
df2 = df.iloc[2:]
for i in range(len(types)):
    disaster_df = df2[df2.Disaster_Type == types[i]]
    ax.plot(disaster_df.Start_year, disaster_df.Size, linewidth=2.5, label=types[i])
ax.legend(ncol=3, edgecolor='w')
[ax.spines[s].set_visible(False) for s in ['top','right']]
ax.set_title('Remove 1959 and 1964', fontsize=16, fontweight='bold')

#plot percent change
ax = axes[3]
df2 = df.iloc[2:]
for i in range(len(types)):
    disaster_df = df2[df2.Disaster_Type == types[i]]
    ax.plot(disaster_df.Start_year, disaster_df.Size.pct_change(), linewidth=2.5, label=types[i])
ax.legend(ncol=1, edgecolor='w', loc=(1, 0.5))
[ax.spines[s].set_visible(False) for s in ['top','right']]
ax.set_title('Try plotting percent change', fontsize=16, fontweight='bold')
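
The percent-change panel calls `pct_change()` on one disaster type at a time inside the loop. A possible vectorized variant, shown here only as a sketch, computes the same quantity for every type at once with a grouped `pct_change`. The rows are copied from the table in the question; the column name `Pct_change` is an assumption made for this example.

```python
import pandas as pd

# A few rows from the question's table, in the flat layout used by this answer.
df = pd.DataFrame({
    "Start_year": [1965, 1965, 1966, 1966, 1967, 1967],
    "Disaster_Type": ["Flood", "Tornado", "Flood", "Tornado", "Flood", "Tornado"],
    "Size": [198, 112, 113, 2, 121, 36],
})

# Sort so that each type's rows are in chronological order.
df = df.sort_values(["Disaster_Type", "Start_year"])

# Change relative to the previous row of the same disaster type.
# The first row of each type has no predecessor and becomes NaN.
df["Pct_change"] = df.groupby("Disaster_Type")["Size"].pct_change()
print(df)
```

Note that the change is measured against the previous row present for that type, not necessarily the previous calendar year, so gaps in the data (years with no recorded disaster of that type) are silently skipped.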

fig, ax = plt.subplots(figsize=(12,8))
df.pivot(index='Start_year', columns = 'Disaster_Type', values='Size' ).plot.bar(stacked=True, ax=ax, zorder=3)
ax.legend(ncol=3, edgecolor='w')
[ax.spines[s].set_visible(False) for s in ['top','right', 'left']]
ax.tick_params(axis='both', left=False, bottom=False)

ax.grid(axis='y', dashes=(8,3), color='gray', alpha=0.3)
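
As for the KeyError in the question's snippet, a likely cause is that `reset_index()` returns a new DataFrame rather than modifying `yearly_tcount` in place, so `Start_year` is still an index level when it is looked up as a column. In addition, `yearly_tcount(...)` calls the DataFrame like a function, and `Disaster_type` does not match the actual name `Disaster_Type`. The sketch below shows one way those lines could be written, using a few rows from the question's table; the column name `Size` follows the code in this answer, and the `Agg` backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Counts indexed by (Start_year, Disaster_Type), as in the question.
yearly_tcount = pd.DataFrame({
    "Start_year": [1965, 1965, 1966, 1966, 1967, 1967],
    "Disaster_Type": ["Flood", "Tornado", "Flood", "Tornado", "Flood", "Tornado"],
    "Size": [198, 112, 113, 2, 121, 36],
}).set_index(["Start_year", "Disaster_Type"])

# Assign the result: reset_index() does not change the DataFrame in place.
yearly_tcount = yearly_tcount.reset_index()

# A single category is selected with a boolean mask inside square brackets.
flood = yearly_tcount[yearly_tcount["Disaster_Type"] == "Flood"]

# One line per disaster type on the same axes.
fig, ax = plt.subplots()
for disaster_type, group in yearly_tcount.groupby("Disaster_Type"):
    ax.plot(group["Start_year"], group["Size"], label=disaster_type)
ax.legend()
```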
Votes: 2

Page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/68383292
