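I have some code that manipulates a Pandas DataFrame containing COVID-19 vaccination data and displays it with Matplotlib.
The data is here: https://covid.ourworldindata.org/data/owid-covid-data.csv (downloads a CSV).
I have manipulated the data so that it only shows countries whose current vaccinations-per-hundred rate is at least 10 (it can't simply drop every row with a rate below 10; it has to go through each country, get that country's latest per-hundred rate, and remove the country from the chart if that rate is below 10).
This is highly time-sensitive and needs to run as quickly as possible.
Code: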
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator
import datetime
df = pd.read_csv(
    "https://covid.ourworldindata.org/data/owid-covid-data.csv",
    usecols=["date", "location", "total_vaccinations_per_hundred"],
    parse_dates=["date"])
df = df[df["total_vaccinations_per_hundred"].notna()]
countries = df["location"].unique().tolist()
countries_copy = countries.copy()
main_country = "United States"
for country in countries:
    df_with_current_country = df[df["location"] == country]
    # Rate on this country's most recent date
    latest = df_with_current_country[df_with_current_country["date"] == df_with_current_country["date"].max()]["total_vaccinations_per_hundred"].tolist()[0]
    if latest < 10 and country != main_country:
        countries_copy.remove(country)
countries = countries_copy
df = df[df["location"].isin(countries)]
pivot = pd.pivot_table(
    data=df, # What dataframe to use
    index="date", # The "rows" of your dataframe
    columns="location", # What values to show as columns
    values="total_vaccinations_per_hundred", # What values to aggregate
    aggfunc="mean", # How to aggregate data
)
pivot = pivot.ffill()  # forward-fill gaps; fillna(method="ffill") is deprecated in recent pandas
# Step 4: Plot all countries
fig, ax = plt.subplots(figsize=(12,8))
fig.patch.set_facecolor("#F5F5F5") # Change background color to a light grey
ax.patch.set_facecolor("#F5F5F5") # Change background color to a light grey
for country in countries:
    if country == main_country:
        country_color = "#129583"
        alpha_color = 1.0
    else:
        country_color = "grey"
        alpha_color = 0.75
    ax.plot(
        pivot.index, # What to use as your x-values
        pivot[country], # What to use as your y-values
        color=country_color, # How to color your line
        alpha=alpha_color # What transparency to use for your line
    )
    if country_color != "grey":
        ax.text(
            x = pivot.index[-1] + datetime.timedelta(days=2), # Where to position your text relative to the x-axis
            y = pivot[country].max(), # How high to position your text
            color = country_color, # What color to give your text
            s = country, # What to write
            alpha=alpha_color # What transparency to use
        )
# Step 5: Configure axes
## A) Format what shows up on axes and how it's displayed
date_form = DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_locator(WeekdayLocator(byweekday=(0), interval=1))
ax.xaxis.set_major_formatter(date_form)
plt.xticks(rotation=45)
plt.ylim(0,100)
## B) Customizing axes and adding a grid
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["bottom"].set_color("#3f3f3f")
ax.spines["left"].set_color("#3f3f3f")
ax.tick_params(colors="#3f3f3f")
ax.grid(alpha=0.1)
## C) Adding a title and axis labels
plt.ylabel("Total Vaccinations per 100 People", fontsize=12, alpha=0.9)
plt.xlabel("Date", fontsize=12, alpha=0.9)
plt.title("COVID-19 Vaccinations over Time", fontsize=18, weight="bold", alpha=0.9)
# D) Celebrate!
plt.show()
Posted on 2021-03-28 17:29:57
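The loop and the countries section can be replaced with a single DataFrameGroupBy.filter:
.groupby('location'): group by country
.sort_values('date'): sort each country's rows by date (latest last)
.tail(1) >= 10: keep only countries whose latest rate is at least 10
| (country.name == main_country): always keep main_country
df2 = df.copy() # deep copy of the original df before the loop (only to compare later)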
df2 = df2.groupby('location').filter(lambda country:
    (country.sort_values('date').total_vaccinations_per_hundred.tail(1) >= 10)
    | (country.name == main_country)
)
#         location       date  total_vaccinations_per_hundred
# 1930     Andorra 2021-01-25                            0.75
# 1937     Andorra 2021-02-01                            1.34
# ...          ...        ...                             ...
# 74507    Uruguay 2021-03-26                           14.31
# 74508    Uruguay 2021-03-27                           14.68
#
# [3507 rows x 3 columns]
If we run the countries section, we can verify that the filtered df2 matches the df produced by the loop:
df2.equals(df) # compare with df after loop
# True
https://codereview.stackexchange.com/questions/257631
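To try the idea without downloading the full CSV, here is a minimal sketch on made-up data. The toy values, the helper names `keep_by_filter` and `keep_by_latest`, and the `threshold` parameter are assumptions for illustration only; they are not part of the original post. The second helper is an alternative (not the answer's method) that takes each country's latest value in one grouped operation and then filters with `isin`, which may be faster on large frames because it avoids a Python-level call per country. It assumes rows with missing rates have already been dropped, as the question's code does.

```python
import pandas as pd

# Toy stand-in for the OWID data; all values are invented for illustration.
df = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime(["2021-03-01", "2021-03-02"] * 3),
    "total_vaccinations_per_hundred": [5.0, 12.0, 1.0, 2.0, 0.5, 3.0],
})
main_country = "C"  # hypothetical highlighted country, always kept


def keep_by_filter(frame, main, threshold=10):
    """GroupBy.filter version: the lambda is called once per country."""
    return frame.groupby("location").filter(
        lambda g: g.sort_values("date")["total_vaccinations_per_hundred"].iloc[-1] >= threshold
        or g.name == main
    )


def keep_by_latest(frame, main, threshold=10):
    """Alternative: compute each country's latest rate, then filter with isin."""
    latest = (
        frame.sort_values("date")
        .groupby("location")["total_vaccinations_per_hundred"]
        .last()  # last non-null value per country after sorting by date
    )
    keep = set(latest[latest >= threshold].index) | {main}
    return frame[frame["location"].isin(keep)]


filtered = keep_by_filter(df, main_country)
alternative = keep_by_latest(df, main_country)
print(filtered["location"].unique())
print(filtered.equals(alternative))
```

In this toy frame, "A" is kept because its latest rate is 12 even though an earlier row is below 10, "B" is dropped, and "C" is kept only because it is the main country.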