
Q: Alcohol consumption project

Code Review user

Asked 2020-02-25 08:51:47

Answers: 2 | Views: 824 | Followers: 0 | Votes: 3

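I am working on a pandas project about alcohol consumption. For your information, the dataset has the following columns:

Continent, Country, Beer Servings, Spirit Servings, Wine Servings

Here is my code: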

# Separating data by continent
# ----------------------------
data_asia   = data[data['Continent'] == 'Asia']
data_africa = data[data['Continent'] == 'Africa']
data_europe = data[data['Continent'] == 'Europe']
data_north  = data[data['Continent'] == 'North America']
data_south  = data[data['Continent'] == 'South America']
data_ocean  = data[data['Continent'] == 'Oceania']

top_5_asia_beer = data_asia.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_asia_spir = data_asia.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_asia_wine = data_asia.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_asia_pure = data_asia.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_africa_beer = data_africa.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_africa_spir = data_africa.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_africa_wine = data_africa.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_africa_pure = data_africa.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_europe_beer = data_europe.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_europe_spir = data_europe.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_europe_wine = data_europe.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_europe_pure = data_europe.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_north_beer = data_north.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_north_spir = data_north.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_north_wine = data_north.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_north_pure = data_north.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_south_beer = data_south.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_south_spir = data_south.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_south_wine = data_south.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_south_pure = data_south.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_ocean_beer = data_ocean.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_ocean_spir = data_ocean.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_ocean_wine = data_ocean.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_ocean_pure = data_ocean.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

I realize how absurd my code is in terms of repetition. Could anyone share tips and tricks for refactoring it?

python

python-3.x

pandas

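2 Answers

Code Review user

Answered 2020-02-25 09:03:17

It depends on what you want to do with it. Storing each top 5 in its own variable seems a bit odd.

To start with, you can slice the DataFrame by continent using .groupby: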

for continent, continent_data in data.groupby("Continent"):
    # `continent` is now the name of the continent (you don't have to type the continent names manually)
    # `continent_data` is a dataframe, being a subset of the `data` dataframe
    ...  # placeholder so the loop body is valid Python

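Edit based on the first comment: if you want to plot the data, then storing each result in a separate variable is definitely not a good idea. Do you already know how you want to visualize your data? That is something you need to work out. I can't really picture the top 5 countries for every alcoholic beverage on every continent as a single plot.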

continents = []
top5s = {}
for continent, continent_data in data.groupby("Continent"):
    continents.append(continent)
    for beverage_column in ["Beer Servings", "Spirit Servings", "Wine Servings"]:
        topcountries = continent_data.nlargest(5, beverage_column)
        # do something with the data, such as:
        print(f"Top 5 countries in {continent} for {beverage_column}:")
        # iterrows() yields (index, row) pairs, so unpack and discard the index
        for _, row in topcountries.iterrows():
            print(f"- {row['Country']}: {row[beverage_column]} servings")

To be precise: groupby() does not return an iterable of tuples; it actually returns a GroupBy object that implements iteration (i.e. it has an __iter__() method).
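As a minimal sketch of both points, the snippet below checks what groupby() returns and then fills a dictionary keyed by (continent, column) instead of creating one variable per result. The toy values, the `tops` dictionary, and `top_n = 2` are assumptions made for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
data = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "Country": ["A", "B", "C", "D", "E"],
    "Beer Servings": [10, 30, 20, 50, 40],
    "Wine Servings": [3, 1, 2, 5, 4],
})

grouped = data.groupby("Continent")
# The result is a DataFrameGroupBy object: iterable, but not a list of tuples.
group_type = type(grouped).__name__

top_n = 2  # the question uses 5; 2 keeps the toy output short
tops = {}
# Iterating over the GroupBy object yields (group name, sub-DataFrame) pairs.
for continent, continent_data in grouped:
    for column in ["Beer Servings", "Wine Servings"]:
        tops[(continent, column)] = continent_data.nlargest(top_n, column)[
            ["Country", column]
        ]

print(group_type)
print(tops[("Asia", "Beer Servings")])
```

Any single result can then be looked up by key, for example `tops[("Europe", "Wine Servings")]`.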

Votes: 1

Code Review user

Answered 2020-02-26 13:11:11

sample_data

import numpy as np
import pandas as pd

np.random.seed(42)

drinks = ["Beer", "Spirit", "Wine"]
continents = [
    "Asia",
    "Africa",
    "Europe",
    "North America",
    "South America",
    "Oceania",
]
countries = [f"country_{i}" for i in range(10)]
index = pd.MultiIndex.from_product(
    (continents, countries), names=["continent", "country"]
)
data = np.random.randint(1_000_000, size=(len(index), len(drinks)))

# one column per drink; reset_index turns continent/country into columns
df = pd.DataFrame(index=index, columns=drinks, data=data).reset_index()

Data structure

The most jarring thing is that every data point has its own variable.

A first step would be to use a dictionary:

data_by_continent = {
    continent: df.loc[df["continent"] == continent]
    for continent in continents
}

Note that I use .loc explicitly to create a copy rather than a view, so that changes in one part of the code cannot contaminate another part.
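A small sketch of that isolation, assuming a toy frame with made-up values. Appending `.copy()` is an addition of this sketch, not part of the answer: it is one way to make the copy unmistakable regardless of how a given pandas version treats the result of a boolean-mask selection.

```python
import pandas as pd

# Hypothetical toy frame using the answer's lowercase column names.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe"],
    "Beer": [1, 2, 3],
})

# Select the Asian rows and request an explicit, independent copy.
asia = df.loc[df["continent"] == "Asia"].copy()

# Modifying the subset leaves the original frame untouched.
asia["Beer"] = 0

print(df["Beer"].tolist())
print(asia["Beer"].tolist())
```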

Then the top spirit consumption per continent is:

spirit_per_continent = {
    continent: data.loc[
        data["Spirit"].nlargest(5).index, ["country", "Spirit"]
    ]
    for continent, data in data_by_continent.items()
}

And nested per drink:

consumption_per_drink_continent = {
    drink: {
        continent: data.loc[
            data[drink].nlargest(5).index, ["country", drink]
        ]
        for continent, data in data_by_continent.items()
    }
    for drink in drinks
}
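To illustrate how such a nested dictionary is read back, here is a self-contained sketch on a tiny, made-up frame. The values and `top_n` are assumptions; it also swaps in `DataFrame.nlargest` (as used in the question) in place of indexing with `Series.nlargest(...).index`, which should select the same rows.

```python
import pandas as pd

drinks = ["Beer", "Wine"]
# Hypothetical toy data; the real frame has 10 countries per continent.
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "country": ["a", "b", "c", "d", "e"],
    "Beer": [5, 9, 1, 7, 3],
    "Wine": [2, 4, 8, 6, 0],
})

data_by_continent = {
    continent: df.loc[df["continent"] == continent]
    for continent in df["continent"].unique()
}

top_n = 2  # the answer uses 5
consumption = {
    drink: {
        continent: data.nlargest(top_n, drink)[["country", drink]]
        for continent, data in data_by_continent.items()
    }
    for drink in drinks
}

# Look up one result by drink first, then by continent.
asia_beer = consumption["Beer"]["Asia"]
print(asia_beer)
```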

pandas groupby

If you convert your data to a tidy format, you can use a simple groupby.

pandas.melt is a very convenient way to reshape a dataframe into that format.

df2 = pd.melt(
    df,
    id_vars=["continent", "country"],
    var_name="drink",
    value_name="consumption",
)

    continent    country drink  consumption
..        ...        ...   ...          ...
175   Oceania  country_5  Wine       456551
176   Oceania  country_6  Wine       894498
177   Oceania  country_7  Wine       899684
178   Oceania  country_8  Wine       158338
179   Oceania  country_9  Wine       623094
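To see what the reshaping does, here is a minimal sketch on a two-row frame with made-up values. Each drink column becomes a set of rows, with the drink name in `drink` and the number in `consumption`.

```python
import pandas as pd

# Hypothetical wide-format data: one column per drink.
wide = pd.DataFrame({
    "continent": ["Asia", "Europe"],
    "country": ["a", "b"],
    "Beer": [1, 2],
    "Wine": [3, 4],
})

# Long ("tidy") format: one row per (country, drink) pair.
long = pd.melt(
    wide,
    id_vars=["continent", "country"],
    var_name="drink",
    value_name="consumption",
)

print(long)
```

Two countries times two drinks gives four rows, and the identifier columns are repeated for each drink.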

groupby

Now you can use groupby, then join on the index of df2 to bring the country back in.

(
    df2.groupby(["continent", "drink"])["consumption"]
    .nlargest(5)
    .reset_index(["continent", "drink"])
    .sort_values(
        ["continent", "drink", "consumption"], ascending=[True, True, False]
    )
    .join(df2["country"])
)

         continent drink  consumption    country
17          Africa  Beer       953277  country_7
19          Africa  Beer       902648  country_9
15          Africa  Beer       527035  country_5
13          Africa  Beer       500186  country_3
14          Africa  Beer       384681  country_4
..             ...   ...          ...        ...
162  South America  Wine       837646  country_2
160  South America  Wine       742139  country_0
167  South America  Wine       688519  country_7
161  South America  Wine       516588  country_1
166  South America  Wine       136330  country_6

90 rows × 4 columns
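A possible alternative that avoids the join, offered here only as a sketch and not as part of the answer: sort the tidy frame by consumption first, then take the first rows of each group with `groupby(...).head(n)`, which keeps every column, including the country. The toy values and `top2` name are assumptions.

```python
import pandas as pd

# Hypothetical tidy data with a single drink to keep the example short.
df2 = pd.DataFrame({
    "continent": ["Asia"] * 3 + ["Europe"] * 3,
    "country": list("abcdef"),
    "drink": ["Beer"] * 6,
    "consumption": [5, 9, 1, 7, 3, 8],
})

top2 = (
    df2.sort_values("consumption", ascending=False)
    .groupby(["continent", "drink"])
    .head(2)  # first 2 rows of each group, i.e. the 2 largest after sorting
    .sort_values(
        ["continent", "drink", "consumption"], ascending=[True, True, False]
    )
    .reset_index(drop=True)
)

print(top2)
```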

Votes: 0

Original page content provided by Code Review.

Original link:

https://codereview.stackexchange.com/questions/237876
