
Alcohol consumption project

Code Review user
Asked 2020-02-25 08:51:47
2 answers · 824 views · 0 followers · 3 votes

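I am working on a pandas project about alcohol consumption. For your information, the dataset has the following columns:

Continent | Country | Beer | Spirit | Wine

Here is my code: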

# Separating data by continent
# ----------------------------
data_asia   = data[data['Continent'] == 'Asia']
data_africa = data[data['Continent'] == 'Africa']
data_europe = data[data['Continent'] == 'Europe']
data_north  = data[data['Continent'] == 'North America']
data_south  = data[data['Continent'] == 'South America']
data_ocean  = data[data['Continent'] == 'Oceania']

top_5_asia_beer = data_asia.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_asia_spir = data_asia.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_asia_wine = data_asia.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_asia_pure = data_asia.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_africa_beer = data_africa.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_africa_spir = data_africa.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_africa_wine = data_africa.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_africa_pure = data_africa.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_europe_beer = data_europe.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_europe_spir = data_europe.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_europe_wine = data_europe.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_europe_pure = data_europe.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_north_beer = data_north.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_north_spir = data_north.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_north_wine = data_north.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_north_pure = data_north.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_south_beer = data_south.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_south_spir = data_south.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_south_wine = data_south.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_south_pure = data_south.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_ocean_beer = data_ocean.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_ocean_spir = data_ocean.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_ocean_wine = data_ocean.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_ocean_pure = data_ocean.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

I understand how absurd my code is in terms of duplication and repetition. Can anyone share tips and tricks for refactoring it?

2 Answers

Code Review user

Answered 2020-02-25 09:03:17

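It depends on what you want to do with it. Storing every top 5 in its own variable seems a bit odd.

For starters, you can slice the DataFrame by continent using .groupby: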

for continent, continent_data in data.groupby("Continent"):
    # `continent` is now the name of the continent (you don't have to type the continent names manually)
    # `continent_data` is a dataframe, being a subset of the `data` dataframe
    print(continent, len(continent_data))  # placeholder body so the loop runs

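Edit based on the first comment: if you want to plot the data, then storing each result in a separate variable is definitely not a good idea. Do you already know how you want to visualize your data? That is what you need to work towards. I cannot really picture the top 5 countries for every type of alcoholic beverage on every continent as a single plot.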

continents = []
top5s = {}
for continent, continent_data in data.groupby("Continent"):
    continents.append(continent)
    for beverage_column in ["Beer Servings", "Spirit Servings", "Wine Servings"]:
        topcountries = continent_data.nlargest(5, beverage_column)
        # keep the result so it can be looked up later by (continent, beverage)
        top5s[(continent, beverage_column)] = topcountries
        # do something with the data, such as:
        print(f"Top 5 countries in {continent} for {beverage_column}:")
        # iterrows() yields (index, row) pairs, so unpack the row
        for _, row in topcountries.iterrows():
            print(f"- {row['Country']}: {row[beverage_column]} servings")

To be precise: groupby() does not return an iterable of tuples. It returns a GroupBy object that merely implements iterability (that is, the __iter__() method).
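
The remark about the GroupBy object can be checked on a small frame. The sketch below uses made-up data (the countries and numbers are hypothetical, not taken from the real dataset) and shows that iterating over the object yields (key, sub-DataFrame) pairs, and that `get_group` fetches a single group without a loop.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has more rows and columns.
data = pd.DataFrame(
    {
        "Continent": ["Asia", "Asia", "Europe", "Europe", "Europe"],
        "Country": ["A", "B", "C", "D", "E"],
        "Beer Servings": [10, 30, 50, 20, 40],
    }
)

grouped = data.groupby("Continent")

# The GroupBy object is not a list of tuples, but iterating over it
# yields (group key, sub-DataFrame) pairs.
groups = {continent: frame for continent, frame in grouped}

# A single group can also be fetched directly by its key.
europe = grouped.get_group("Europe")

print(type(grouped).__name__)
print(sorted(groups))
print(europe["Country"].tolist())
```

Building a dictionary from the iteration, as done for `groups` here, is one way to replace the six hand-written `data_*` variables from the question.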

Votes: 1

Code Review user

Answered 2020-02-26 13:11:11

Sample data

import numpy as np
import pandas as pd

np.random.seed(42)

drinks = ["Beer", "Spirit", "Wine"]
continents = [
    "Asia",
    "Africa",
    "Europe",
    "North America",
    "South America",
    "Oceania",
]
countries = [f"country_{i}" for i in range(10)]
index = pd.MultiIndex.from_product(
    (continents, countries), names=["continent", "country"]
)
data = np.random.randint(1_000_000, size=(len(index), len(drinks)))

# one column per drink; `columns` was undefined in the original snippet
df = pd.DataFrame(index=index, columns=drinks, data=data).reset_index()

Data structure

What bothers me most is that every data point has its own variable.

A first step would be to use dictionaries:

data_by_continent = {
    continent: df.loc[df["continent"] == continent]
    for continent in continents
}

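Note that I select with .loc and a boolean mask, which gives each continent its own DataFrame rather than a view of df, so that changes in one part of the code do not contaminate another.

Then the top 5 spirit consumers per continent are: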
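
The isolation described above can be sketched on a tiny frame. The data here is made up, and the trailing `.copy()` is an addition that is not in the answer's code: it states the intent explicitly instead of relying on how the selection happens to behave.

```python
import pandas as pd

# Hypothetical toy data with the same column layout as the sample frame.
df = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Europe"],
        "country": ["country_0", "country_1", "country_0"],
        "Beer": [100, 200, 300],
    }
)

# .copy() guarantees that `asia` owns its data.
asia = df.loc[df["continent"] == "Asia"].copy()

# Changing the subset must not leak back into the original frame.
asia["Beer"] = 0

print(df["Beer"].tolist())
print(asia["Beer"].tolist())
```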

spirit_per_continent = {
    continent: data.loc[
        data["Spirit"].nlargest(5).index, ["country", "Spirit"]
    ]
    for continent, data in data_by_continent.items()
}

And nested per drink:

consumption_per_drink_continent = {
    drink: {
        continent: data.loc[
            data[drink].nlargest(5).index, ["country", drink]
        ]
        for continent, data in data_by_continent.items()
    }
    for drink in drinks
}
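
A brief usage sketch for the nested dictionary, assuming a reduced version of the sample data (two continents, a different random generator and seed). It calls `DataFrame.nlargest` directly, as the question does, rather than the `.loc` form above; the two should select the same rows. A lookup is then just two keys.

```python
import numpy as np
import pandas as pd

# Reduced, hypothetical sample data: 2 continents x 10 countries.
rng = np.random.default_rng(0)
drinks = ["Beer", "Spirit", "Wine"]
continents = ["Asia", "Europe"]
countries = [f"country_{i}" for i in range(10)]
index = pd.MultiIndex.from_product(
    (continents, countries), names=["continent", "country"]
)
values = rng.integers(1_000_000, size=(len(index), len(drinks)))
df = pd.DataFrame(values, index=index, columns=drinks).reset_index()

data_by_continent = {
    continent: df.loc[df["continent"] == continent]
    for continent in continents
}

# drink -> continent -> top-5 frame
consumption = {
    drink: {
        continent: frame.nlargest(5, drink)[["country", drink]]
        for continent, frame in data_by_continent.items()
    }
    for drink in drinks
}

# Look up one result by drink and continent.
wine_europe = consumption["Wine"]["Europe"]
print(wine_europe)
```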

pandas groupby

If you bring your data into a tidy format, you can use a simple groupby.

pandas.melt is a very handy method for reshaping the data.

df2 = pd.melt(
    df,
    id_vars=["continent", "country"],
    var_name="drink",
    value_name="consumption",
)

    continent    country drink  consumption
...
175   Oceania  country_5  Wine       456551
176   Oceania  country_6  Wine       894498
177   Oceania  country_7  Wine       899684
178   Oceania  country_8  Wine       158338
179   Oceania  country_9  Wine       623094
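
To make the reshaping concrete, here is a minimal sketch of the same `pd.melt` call on a two-row frame. The values are made up for illustration; each drink column becomes rows of (drink, consumption) pairs, while the `id_vars` columns are repeated.

```python
import pandas as pd

# Hypothetical wide-format data: one column per drink.
wide = pd.DataFrame(
    {
        "continent": ["Asia", "Europe"],
        "country": ["country_0", "country_1"],
        "Beer": [10, 20],
        "Wine": [30, 40],
    }
)

# Tidy (long) format: one row per (continent, country, drink).
tidy = pd.melt(
    wide,
    id_vars=["continent", "country"],
    var_name="drink",
    value_name="consumption",
)

print(tidy)
```

The two input rows with two drink columns become four rows, one per country and drink.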

Now you can use groupby and then join on the index of df2 to bring the country back in.

(
    df2.groupby(["continent", "drink"])["consumption"]
    .nlargest(5)
    .reset_index(["continent", "drink"])
    .sort_values(
        ["continent", "drink", "consumption"], ascending=[True, True, False]
    )
    .join(df2["country"])
)

         continent drink  consumption    country
17          Africa  Beer       953277  country_7
19          Africa  Beer       902648  country_9
15          Africa  Beer       527035  country_5
13          Africa  Beer       500186  country_3
14          Africa  Beer       384681  country_4
...            ...   ...          ...        ...
162  South America  Wine       837646  country_2
160  South America  Wine       742139  country_0
167  South America  Wine       688519  country_7
161  South America  Wine       516588  country_1
166  South America  Wine       136330  country_6

90 rows × 4 columns
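
As a possible alternative that is not part of the answer above, the join can be avoided by sorting first and then taking the head of each group, which keeps every column of the tidy frame. The sketch below uses made-up data and a top 2 instead of a top 5 to stay small.

```python
import pandas as pd

# Hypothetical tidy data: 2 continents x 2 drinks x 3 countries.
df2 = pd.DataFrame(
    {
        "continent": ["Asia"] * 6 + ["Europe"] * 6,
        "country": ["c0", "c1", "c2"] * 4,
        "drink": (["Beer"] * 3 + ["Wine"] * 3) * 2,
        "consumption": [5, 9, 1, 4, 2, 8, 7, 3, 6, 10, 12, 11],
    }
)

top2 = (
    # largest values first, so head(2) picks the top 2 of each group
    df2.sort_values("consumption", ascending=False)
    .groupby(["continent", "drink"])
    .head(2)
    # restore a readable order for display
    .sort_values(
        ["continent", "drink", "consumption"], ascending=[True, True, False]
    )
)

print(top2)
```

Because `head` returns whole rows, the `country` column is already present and no join on the index is needed.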

Votes: 0
Original content provided by Code Review.
Original link: https://codereview.stackexchange.com/questions/237876
