
Manipulating Pandas Dataframe with vaccination data from CSV to display on matplotlib

Code Review user
Asked on 2021-03-24 19:12:04
Answers: 1, Views: 101, Followers: 0, Votes: 3

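I have some code that manipulates a Pandas DataFrame containing COVID-19 vaccination data and displays it with Matplotlib.

The data is here: https://covid.ourworldindata.org/data/owid-covid-data.csv (downloads a CSV).

I manipulated the data so that it only shows countries whose current vaccinations-per-hundred rate is at least 10. It can't simply drop every row with a rate below 10: it has to go through each country, get that country's latest vaccinations-per-hundred rate, and remove the country from the chart if that rate is below 10.

This is highly time-sensitive and needs to run as quickly as possible.

Code: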

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator
import datetime

df = pd.read_csv(
    "https://covid.ourworldindata.org/data/owid-covid-data.csv", 
    usecols=["date", "location", "total_vaccinations_per_hundred"], 
    parse_dates=["date"])

df = df[df["total_vaccinations_per_hundred"].notna()]
countries = df["location"].unique().tolist()
countries_copy = countries.copy()

main_country = "United States"

for country in countries:
    df_with_current_country = df[df["location"] == country]
    # Build the mask from the per-country frame so its index matches the frame being filtered
    is_latest = df_with_current_country["date"] == df_with_current_country["date"].max()
    if df_with_current_country[is_latest]["total_vaccinations_per_hundred"].tolist()[0] < 10:
        if country != main_country:
            countries_copy.remove(country)

countries = countries_copy
df = df[df["location"].isin(countries)]

pivot = pd.pivot_table(
    data=df,                                    # What dataframe to use
    index="date",                               # The "rows" of your dataframe
    columns="location",                         # What values to show as columns
    values="total_vaccinations_per_hundred",    # What values to aggregate
    aggfunc="mean",                             # How to aggregate data
)

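# Carry each country's last known value forward over dates with no report
# (fillna(method="ffill") is deprecated in recent pandas versions)
pivot = pivot.ffill()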

# Step 4: Plot all countries
fig, ax = plt.subplots(figsize=(12,8))
fig.patch.set_facecolor("#F5F5F5")    # Change background color to a light grey
ax.patch.set_facecolor("#F5F5F5")     # Change background color to a light grey

for country in countries:
    if country == main_country:
        country_color = "#129583"
        alpha_color = 1.0
    else:
        country_color = "grey"
        alpha_color = 0.75
    ax.plot(
        pivot.index,              # What to use as your x-values
        pivot[country],           # What to use as your y-values
        color=country_color,    # How to color your line
        alpha=alpha_color     # What transparency to use for your line
    )
    if country_color != "grey":
        ax.text(
            x = pivot.index[-1] + datetime.timedelta(days=2),    # Where to position your text relative to the x-axis
            y = pivot[country].max(),                   # How high to position your text
            color = country_color,                    # What color to give your text
            s = country,                                # What to write
            alpha=alpha_color                       # What transparency to use
        )

# Step 5: Configures axes
## A) Format what shows up on axes and how it's displayed
date_form = DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_locator(WeekdayLocator(byweekday=(0), interval=1))
ax.xaxis.set_major_formatter(date_form)
plt.xticks(rotation=45)
plt.ylim(0,100)

## B) Customizing axes and adding a grid
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["bottom"].set_color("#3f3f3f")
ax.spines["left"].set_color("#3f3f3f")
ax.tick_params(colors="#3f3f3f")
ax.grid(alpha=0.1)

## C) Adding a title and axis labels
plt.ylabel("Total Vaccinations per 100 People", fontsize=12, alpha=0.9)
plt.xlabel("Date", fontsize=12, alpha=0.9)
plt.title("COVID-19 Vaccinations over Time", fontsize=18, weight="bold", alpha=0.9)

# D) Celebrate!
plt.show()
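
The pivot and forward-fill step in the code above can be tried without downloading the CSV. The following is only a minimal sketch on made-up data (the `toy` frame and its values are assumptions, not taken from the OWID file); it shows how a missing date for one country is filled with that country's last known value.

```python
import pandas as pd

# Made-up toy data: country "B" has no report on 2021-03-03.
toy = pd.DataFrame({
    "location": ["A", "A", "B"],
    "date": pd.to_datetime(["2021-03-01", "2021-03-03", "2021-03-01"]),
    "total_vaccinations_per_hundred": [1.0, 3.0, 2.0],
})

# One row per date, one column per country.
pivot = pd.pivot_table(
    data=toy,
    index="date",
    columns="location",
    values="total_vaccinations_per_hundred",
    aggfunc="mean",
)

# Forward-fill: a missing value takes the previous row's value in the same column.
filled = pivot.ffill()
print(filled)
```

Forward-filling keeps each line continuous on the chart, at the cost of drawing a flat segment where a country simply did not report.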

1 Answer

Code Review user

Accepted answer

Answered on 2021-03-28 17:29:57

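The loop and the countries section can be replaced with a single DataFrameGroupBy.filter:

  • .groupby('location') - split the rows by country
  • .sort_values('date') - sort each country's rows by date (latest last)
  • .tail(1) >= 10 - keep only countries whose latest rate is at least 10
  • | (country.name == main_country) - always keep main_country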
df2 = df.copy() # deep copy original df before loop (only to compare later)

df2 = df2.groupby('location').filter(lambda country:
    (country.sort_values('date').total_vaccinations_per_hundred.tail(1) >= 10)
    | (country.name == main_country)
)

#       location       date  total_vaccinations_per_hundred
# 1930   Andorra 2021-01-25                            0.75
# 1937   Andorra 2021-02-01                            1.34
# ...        ...        ...                             ...
# 74507  Uruguay 2021-03-26                           14.31
# 74508  Uruguay 2021-03-27                           14.68
# 
# [3507 rows x 3 columns]
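
To see the same filter at work without downloading the full dataset, here is a minimal sketch on made-up data. The `toy` frame, its values, and the helper name `keep_country` are assumptions for illustration only. The helper uses `.iloc[-1]` instead of `.tail(1)` so that it returns a plain boolean per group, which is a small variation on the code above rather than the answer's exact expression.

```python
import pandas as pd

main_country = "United States"

# Made-up toy data: "A" ends above 10, "B" ends below 10,
# and the main country ends below 10 but must still be kept.
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B", "United States", "United States"],
    "date": pd.to_datetime(["2021-03-01", "2021-03-02"] * 3),
    "total_vaccinations_per_hundred": [12.0, 15.0, 2.0, 4.0, 1.0, 3.0],
})

def keep_country(group):
    # Latest rate for this country: sort by date and take the last value.
    latest = group.sort_values("date")["total_vaccinations_per_hundred"].iloc[-1]
    # group.name holds the group key (the country name) inside filter().
    return bool(latest >= 10) or group.name == main_country

kept = toy.groupby("location").filter(keep_country)
kept_locations = sorted(kept["location"].unique())
print(kept_locations)
```

Note that all rows of a kept country survive, including early rows below 10, because the decision is made once per group rather than once per row.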

If we also run the original loop and countries section, we can verify that the filtered df2 matches the df produced by the loop:

df2.equals(df) # compare with df after loop

# True
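
Since the question asks for speed, a variant that avoids calling a Python lambda once per country may be worth timing. This is a sketch of a different technique than the one in the answer (sort once, take the last row per country with `drop_duplicates`, then select with `isin`), shown on made-up data; the `toy` frame and variable names are assumptions, and no timing claim is made here.

```python
import pandas as pd

main_country = "United States"

# Made-up toy data, not taken from the OWID file.
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B", "United States", "United States"],
    "date": pd.to_datetime(["2021-03-01", "2021-03-02"] * 3),
    "total_vaccinations_per_hundred": [12.0, 15.0, 2.0, 4.0, 1.0, 3.0],
})

# Latest row per country: sort by date, keep the last row for each location.
latest = toy.sort_values("date").drop_duplicates("location", keep="last")

# Countries whose latest rate is at least 10, plus the main country.
keep = set(latest.loc[latest["total_vaccinations_per_hundred"] >= 10, "location"])
keep.add(main_country)

filtered = toy[toy["location"].isin(keep)]

# Cross-check against the groupby/filter approach on the same data.
via_filter = toy.groupby("location").filter(
    lambda g: bool(g.sort_values("date")["total_vaccinations_per_hundred"].iloc[-1] >= 10)
    or g.name == main_country
)
print(filtered.equals(via_filter))
```

Both approaches select whole countries, so the resulting frame can be passed to the pivot and plotting code unchanged.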
Votes: 2
Original page content provided by Code Review; the Chinese translation was supported by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://codereview.stackexchange.com/questions/257631
